21088026


This reference describes SUTTA.

PMID 21088026
Title Scoring-and-Unfolding Trimmed Tree Assembler: Concepts, Constructs and Comparisons.
Year 2010
Journal Bioinformatics
Author Narzisi G, Mishra B.
Volume
Start page






Full text description

MOTIVATION: Mired by its connection to a well-known NP-complete combinatorial optimization problem, namely the Shortest Common Superstring Problem (SCSP), the whole-genome sequence assembly (WGSA) problem has historically been assumed to be amenable only to greedy and heuristic methods. By placing efficiency as their first priority, these methods opted to rely only on local searches and are thus inherently approximate, ambiguous or error-prone, especially for genomes with complex structures. Furthermore, since the choice of the best heuristics depended critically on the properties of (e.g., errors in) the input data and the available long-range information, these approaches hindered the design of an error-free WGSA pipeline.

RESULTS: We dispense with the idea of limiting the solutions to just the approximated ones, and instead favor an approach that could potentially lead to an exhaustive (exponential-time) search of all possible layouts. Its computational complexity thus must be tamed through a constrained search (Branch-and-Bound) and quick identification and pruning of implausible overlays. For this purpose, such a method necessarily relies on a set of score-functions (oracles) that can combine different structural properties (e.g., transitivity, coverage, physical maps, etc.). We give a detailed description of this novel assembly framework, referred to as SUTTA (Scoring-and-Unfolding Trimmed Tree Assembler), and present experimental results on several bacterial genomes using next-generation sequencing technology data. We also report experimental evidence that the assembly quality strongly depends on the choice of the minimum overlap parameter k.

AVAILABILITY AND IMPLEMENTATION: SUTTA's binaries are freely available for download at http://www.bioinformatics.nyu.edu.
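
The branch-and-bound idea in the RESULTS paragraph can be illustrated with a toy sketch. This is not SUTTA's implementation: the function names `overlap` and `assemble` are hypothetical, the only score used is total suffix-prefix overlap (SUTTA is described as combining several oracles), and the bound is a simple optimistic estimate chosen for illustration. The sketch explores read orderings as a tree, prunes any branch that cannot beat the best layout found so far, and takes the minimum overlap k as a parameter.

```python
def overlap(a, b, k):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than k."""
    for n in range(min(len(a), len(b)), k - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, k):
    """Toy branch-and-bound: find the read order with maximum total overlap."""
    n = len(reads)
    ov = [[0 if i == j else overlap(reads[i], reads[j], k) for j in range(n)]
          for i in range(n)]
    # Optimistic gain for each read: the best overlap any predecessor offers.
    best_in = [max(ov[i][j] for i in range(n)) for j in range(n)]
    best_score, best_order = -1, []

    def extend(order, used, score):
        nonlocal best_score, best_order
        if len(order) == n:  # complete layout: keep it if strictly better
            if score > best_score:
                best_score, best_order = score, list(order)
            return
        remaining = [j for j in range(n) if j not in used]
        # Bound: even the optimistic completion cannot beat the incumbent.
        if score + sum(best_in[j] for j in remaining) <= best_score:
            return
        if order:  # branch on the most promising extensions first
            last = order[-1]
            remaining.sort(key=lambda j: -ov[last][j])
        for j in remaining:
            gain = ov[order[-1]][j] if order else 0
            order.append(j)
            used.add(j)
            extend(order, used, score + gain)
            order.pop()
            used.remove(j)

    extend([], set(), 0)
    if not best_order:
        return "", 0
    contig = reads[best_order[0]]
    for prev, cur in zip(best_order, best_order[1:]):
        contig += reads[cur][ov[prev][cur]:]  # append only the non-overlapping tail
    return contig, best_score
```

With the made-up reads `["CATTA", "ATGGC", "GGCAT"]`, calling `assemble(reads, 3)` merges them into a single 9-base contig, while `assemble(reads, 4)` finds no admissible overlap and simply concatenates the reads. This shows on a very small scale why the result depends on the choice of k.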