18202027

From SEQwiki
Jump to: navigation, search

This reference describes Figaro.

PMID PMID 18202027
Title Figaro: a novel statistical method for vector sequence removal.
Year 2008
Journal Bioinformatics
Author White JR, Roberts M, Yorke JA, Pop M
Volume 24
Start page


Error: No contents found at URL http://www.ebi.ac.uk/europepmc/webservices/rest/MED/18202027/citations/4000.

According to Europe PubMed Central, this reference has Error: no local variable "citations" was set. " Error: no local variable "citations" was set. " is not a number. citations.

For reference, you can check Google Scholar, which lacks an API because Google ...


Error: Invalid JSON. According to Almetric, this reference has an Altmetric score of Error: no local variable "altscore" was set. " Error: no local variable "altscore" was set. " is not a number..

Full text description

MOTIVATION: Sequences produced by automated Sanger sequencing machines frequently contain fragments of the cloning vector on their ends. Software tools currently available for identifying and removing the vector sequence require knowledge of the vector sequence, specific splice sites and any adapter sequences used in the experiment-information often omitted from public databases. Furthermore, the clipping coordinates themselves are missing or incorrectly reported. As an example, within the approximately 1.24 billion shotgun sequences deposited in the NCBI Trace Archive, as many as approximately 735 million (approximately 60%) lack vector clipping information. Correct clipping information is essential to scientists attempting to validate, improve and even finish the increasingly large number of genomes released at a 'draft' quality level. RESULTS: We present here Figaro, a novel software tool for identifying and removing the vector from raw sequence data without prior knowledge of the vector sequence. The vector sequence is automatically inferred by analyzing the frequency of occurrence of short oligo-nucleotides using Poisson statistics. We show that Figaro achieves 99.98% sensitivity when tested on approximately 1.5 million shotgun reads from Drosophila pseudoobscura. We further explore the impact of accurate vector trimming on the quality of whole-genome assemblies by re-assembling two bacterial genomes from shotgun sequences deposited in the Trace Archive. Designed as a module in large computational pipelines, Figaro is fast, lightweight and flexible. AVAILABILITY: Figaro is released under an open-source license through the AMOS package (http://amos.sourceforge.net/Figaro).