ZORRO
Application data | |
---|---|
Created by | Gustavo Lacerda, Ramon Vidal and Marcelo Carazzolle |
Biological application domain(s) | Genomics |
Principal bioinformatics method(s) | Sequence assembly |
Technology | Sanger, Illumina, PacBio, 454, Ion Torrent, ABI SOLiD |
Created at | Laboratório de Genômica e Expressão - Universidade Estadual de Campinas |
Maintained? | Yes |
Input format(s) | FASTA |
Output format(s) | FASTA |
Programming language(s) | Perl |
Licence | GPL |
Operating system(s) | Linux |
Contact: | glacerda@lge.ibi.unicamp.br |
Summary: ZORRO is a hybrid sequencing technology assembler. It takes two sets of pre-assembled contigs and merges them into a more contiguous and consistent assembly. Its main characteristic is the processing applied before and after assembly to avoid errors.
Description
Zorro: The Masked Assembler
Zorro is based on the minimus2 pipeline (AMOS package) and uses MUMmer, AMOS and Bowtie internally. It takes two contig FASTA files as input (each representing the assembled contigs of a whole-genome assembly) and one FASTA file containing some of the reads used for assembly. About 10X coverage is enough; more will slow down the pipeline and consume more resources. The reads file is used only to help with ab initio repeat detection, so a sample of the reads is sufficient.
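Since only about 10X of reads is needed, the reads file can be downsampled before running the pipeline. The following is a minimal sketch of one way to do that, not part of ZORRO itself. The function names `read_fasta` and `subsample_to_coverage` are hypothetical, and the genome size is an estimate that the user must supply.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_to_coverage(
    reads: Iterable[Tuple[str, str]],
    genome_size: int,
    target_coverage: float = 10.0,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Pick reads at random until their total length reaches
    roughly target_coverage * genome_size bases.

    genome_size is an assumed estimate supplied by the caller.
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    budget = target_coverage * genome_size
    picked, total = [], 0
    for header, seq in shuffled:
        if total >= budget:
            break
        picked.append((header, seq))
        total += len(seq)
    return picked


def write_fasta(records: Iterable[Tuple[str, str]], path: str) -> None:
    """Write (header, sequence) pairs to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
```

With a real reads file, this could be used as `write_fasta(subsample_to_coverage(read_fasta(open("reads.fasta")), genome_size=12_000_000), "reads.10x.fasta")`, where both file names and the genome size are placeholders. The sketch loads all reads into memory, which is acceptable for a quick sample but not for very large read sets.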
Zorro's initial phase detects inconsistencies in the assemblies and splits the contigs where they occur. Next, Zorro counts k-mers (default k=22) in the reads and uses the k-mer count table to detect and mask repeats in both assembly1 and assembly2. After repeat masking, Zorro uses nucmer to detect overlaps between assembly1 and assembly2 (overlaps between contigs from the same assembly are not allowed). All overlaps found in this phase are expected to be between unique regions, because repeats are masked. The overlaps are used to lay out the merged contigs and generate their consensus with AMOS tools. The merged contigs are built from the unmasked contigs, so the final merged assembly should include the repeat regions.
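The k-mer based repeat masking described above can be illustrated with a small sketch. This is not ZORRO's actual implementation: the function names, the `max_count` threshold and the use of `N` as the mask character are assumptions made for illustration, and reverse-complement k-mers are ignored for simplicity. Only the default k=22 comes from the description.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int = 22) -> Counter:
    """Build a k-mer count table from the read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(
    contig: str,
    counts: Counter,
    k: int = 22,
    max_count: int = 3,   # assumed threshold, not ZORRO's documented value
    mask_char: str = "N",  # assumed mask character
) -> str:
    """Mask every base covered by a k-mer seen more than max_count times in the reads."""
    contig = contig.upper()
    masked = [False] * len(contig)
    for i in range(len(contig) - k + 1):
        # Counter returns 0 for k-mers that never occurred in the reads.
        if counts[contig[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(mask_char if m else base for base, m in zip(contig, masked))
```

In practice the threshold would have to be chosen relative to the read coverage, since a k-mer from a unique region is expected to appear roughly as many times as the coverage depth. Overlap detection would then run on the masked contigs, while the consensus is built from the original, unmasked sequences.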
A second, less stringent round of assembly tries to merge contigs that were not included in the first Zorro phase. All contigs are written to <prefix>.ZORRO.fasta. We recommend using SSPACE to scaffold the ZORRO contigs.
Links
- ZORRO Homepage
- ZORRO
References
none specified