ZORRO

From SEQwiki

Application data

Created by: Gustavo Lacerda, Ramon Vidal and Marcelo Carazzolle
Biological application domain(s): Genomics
Principal bioinformatics method(s): Sequence assembly
Technology: Sanger, Illumina, PacBio, 454, Ion Torrent, ABI SOLiD
Created at: Laboratório de Genômica e Expressão - Universidade Estadual de Campinas
Maintained?: Yes
Input format(s): FASTA
Output format(s): FASTA
Programming language(s): Perl
Licence: GPL
Operating system(s): Linux
Contact: glacerda@lge.ibi.unicamp.br

Summary: ZORRO is a hybrid sequencing technology assembler. It takes two sets of pre-assembled contigs and merges them into a more contiguous and consistent assembly. The main characteristic of Zorro is the treatment applied before and after assembly to avoid errors.


Description

Zorro: The Masked Assembler

Zorro is based on the minimus2 pipeline (AMOS package) and uses MUMmer, AMOS and Bowtie internally. Zorro takes two FASTA files of contigs as input (each representing the assembled contigs of a whole-genome assembly) and one FASTA file containing some of the reads used for assembly (10X coverage is enough; more will slow down the pipeline and consume more resources). Note that the reads file is used only for ab initio repeat detection, so a sample of the reads is sufficient.
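Since only about 10X coverage of reads is needed, one way to prepare the reads file is to subsample it beforehand. The following is a minimal Python sketch of that idea and is not part of Zorro itself: the function names, the fixed random seed, and the `genome_size` argument (an estimate you must supply yourself) are all assumptions made for illustration.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_reads(reads, genome_size, target_coverage=10, seed=0):
    """Pick random reads until their total length reaches roughly
    target_coverage * genome_size bases.

    genome_size is a user-supplied estimate (hypothetical parameter,
    not something Zorro asks for).
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    wanted_bases = target_coverage * genome_size
    picked, total = [], 0
    for header, seq in shuffled:
        if total >= wanted_bases:
            break
        picked.append((header, seq))
        total += len(seq)
    return picked
```

To use it, parse the reads with `read_fasta(open("reads.fasta"))` (the file name is a placeholder), pass the result to `subsample_reads` with your genome size estimate, and write the selected records back out as FASTA.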

In its initial phase, Zorro detects inconsistencies in the assemblies and splits the contigs where they occur. Next, Zorro counts k-mers (default k=22) in the reads and uses the k-mer count table to detect and mask repeats in both assembly1 and assembly2. After repeat masking, Zorro uses nucmer to detect overlaps between assembly1 and assembly2 (overlaps between contigs from the same assembly are not allowed). All overlaps found in this phase are expected to lie in unique regions, because repeats are masked. The overlaps are used to lay out the merged contigs and generate their consensus, using AMOS tools. The merged contigs are built from the unmasked contigs, so the final merged assembly should include the repeat regions.
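The k-mer based repeat masking step can be illustrated with a short Python sketch. This is only an approximation of the idea described above, not Zorro's actual implementation: the `max_count` threshold, the use of `N` for hard masking, and the function names are assumptions, and reverse-complement k-mers are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k=22):
    """Count every k-mer occurring in an iterable of read sequences."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def mask_repeats(contig, counts, k=22, max_count=20):
    """Replace with 'N' every position covered by a k-mer that occurs
    more than max_count times in the reads.

    max_count is a hypothetical threshold chosen for illustration.
    """
    contig = contig.upper()
    masked = list(contig)
    for i in range(len(contig) - k + 1):
        if counts[contig[i:i + k]] > max_count:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

In practice the threshold would have to be chosen relative to the read coverage, since a k-mer from a unique region is expected to appear roughly as many times as the coverage depth. Overlap detection would then run on the masked sequences, while the original unmasked contigs are kept for building the merged consensus.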

A second, less stringent round of assembly then tries to merge contigs that were not merged in the first phase. All contigs are written to <prefix>.ZORRO.fasta. We recommend using SSPACE to scaffold the ZORRO contigs.

Links


References

none specified

