How-to/short read aligners
Revision as of 20:51, 2 November 2011
Short read mapping
Short read mapping is the process of aligning sequencing reads onto a reference sequence. The reference sequence is often pre-processed into an indexed form so that it can be searched rapidly. For a technical overview of mapping algorithms, see reference (1). An evaluation of short read aligners, current as of April 2011, is available in (2), and a further evaluation from August 2011 that also targets SNP discovery is available in (3). Updated ROC curves that include bowtie2 are available at http://lh3lh3.users.sourceforge.net/alnROC.shtml.
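To make "indexed form" concrete, here is a minimal sketch assuming a simple hash table of fixed-length k-mers used as seeds. This is not the Burrows-Wheeler (FM-index) structure that BWA and Bowtie actually build; it only shows why a pre-built index lets a mapper jump straight to candidate positions instead of scanning the whole reference for every read. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def exact_hits(read: str, reference: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Use the read's first k-mer as a seed, then verify the whole read at each candidate."""
    if len(read) < k:
        return []  # read too short to produce a seed
    seed = read[:k]
    return [pos for pos in index.get(seed, [])
            if reference[pos:pos + len(read)] == read]
```

For example, with `ref = "ACGTACGTTAGCACGTAC"` and `idx = build_kmer_index(ref, 4)`, the seed `ACGT` has three candidate positions, and `exact_hits("ACGTTAG", ref, idx, 4)` verifies only those three and returns `[4]`. The index is built once per reference and reused for every read, which is the trade-off the paragraph above describes.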
Decision Helper
This section is based on personal experience, on how widely each tool is used, and on published performance data. It is only meant to give you a quick primer.
- Genome data
Software Packages
Free Software
BWA
Compatible with Illumina, SOLiD and 454 data
- Pros
- The SAM/BAM output adheres to the SAM format, contains both mapped and unmapped reads, and is easy to parse
- Cons
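Because the output contains both mapped and unmapped reads, a quick sanity check is to count them. A minimal sketch, assuming plain SAM text (a BAM file can be turned into SAM text with `samtools view -h`). It relies only on the SAM specification: records are tab-separated, header lines start with `@`, column 2 is the bitwise FLAG, and bit 0x4 marks an unmapped read. Skipping secondary (0x100) and supplementary (0x800) records so each read is counted once is a choice of this sketch; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def count_mapped_unmapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Count primary SAM records as mapped or unmapped.

    Header lines (starting with '@') and blank lines are skipped, and
    secondary (0x100) / supplementary (0x800) records are ignored.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        if flag & 0x900:       # secondary or supplementary alignment
            continue
        if flag & 0x4:         # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Usage, with a hypothetical file name: `with open("reads.sam") as fh: print(count_mapped_unmapped(fh))`.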
Bowtie
Compatible with Illumina and SOLiD data. Bowtie is discussed in the forum.
- Pros
- Fast
- Cons
- No mapping quality reported
- Not as sensitive as Stampy and Novoalign
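Why a missing mapping quality matters: the SAM specification defines MAPQ as -10 * log10 of the probability that the reported mapping position is wrong, and reserves 255 to mean "not available". A downstream filter cannot tell a unique placement from an arbitrary pick among repeats when MAPQ is unavailable. A minimal sketch of the Phred scaling and of such a filter; the names and the default cap of 254 are assumptions of this sketch, not values used by any particular aligner.

```python
import math

MAPQ_UNAVAILABLE = 255  # SAM spec: a MAPQ of 255 means "not available"


def mapq_from_error_prob(p_wrong: float, cap: int = 254) -> int:
    """Phred-scale the probability that the reported mapping position is wrong.

    The cap keeps the result below the reserved value 255 (an assumption of
    this sketch).
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap  # log10(0) is undefined; treat as maximally confident
    return min(cap, round(-10 * math.log10(p_wrong)))


def passes_mapq(mapq: int, min_mapq: int) -> bool:
    """Keep an alignment only if its MAPQ is known and at least min_mapq."""
    return mapq != MAPQ_UNAVAILABLE and mapq >= min_mapq
```

For example, a 1-in-1000 chance of a wrong placement gives MAPQ 30, while a read equally likely to belong at two places (p = 0.5) gives MAPQ 3.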
Stampy
Compatible with Illumina data
- Pros
- Balance of speed and sensitivity
- Cons
- Can be slow, even when BWA is used as a premapper
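The premapper idea can be sketched generically: a fast mapper handles every read, and only the reads it leaves unmapped or places with low confidence go to the slow, sensitive mapper. This is a minimal sketch assuming "unmapped or MAPQ below a threshold" is the criterion for re-mapping; the hit format, the threshold and all names are hypothetical and are not Stampy's actual options.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hit format: (position, mismatches, mapq); None means "no hit".
Hit = Tuple[int, int, int]
Mapper = Callable[[str], Optional[Hit]]


def hybrid_map(reads: List[str], fast_mapper: Mapper, sensitive_mapper: Mapper,
               min_mapq: int = 20) -> Tuple[List[Tuple[str, Optional[Hit]]], int]:
    """Map every read with the fast mapper; re-map only unmapped or
    low-MAPQ reads with the slow, sensitive mapper.

    Returns the (read, hit) pairs and how many reads took the slow path.
    """
    results: List[Tuple[str, Optional[Hit]]] = []
    sent_to_slow = 0
    for read in reads:
        hit = fast_mapper(read)
        if hit is None or hit[2] < min_mapq:
            sent_to_slow += 1
            slow_hit = sensitive_mapper(read)
            if slow_hit is not None:  # keep the fast hit if the slow one finds nothing
                hit = slow_hit
        results.append((read, hit))
    return results, sent_to_slow
```

Total run time is roughly the fast mapper's time for all reads plus the slow mapper's time for the `sent_to_slow` reads, so a dataset with many divergent or low-quality reads can still be slow even with a premapper.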
- Pros
- Higher sensitivity than BWA
- One-step mapping; the genome does not need to be indexed
- Alignment can take less time than BWA if the reference sequence is short, e.g. when mapping reads against a targeted region
- Cons
- Alignment is slow if reads are mapped onto a large genome
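Why skipping the index helps on short references but hurts on large genomes: without an index, each read has to be compared against every offset of the reference. A minimal sketch, assuming a naive mismatch-only scan. The aligner described above surely uses a more sophisticated algorithm, and the function name is hypothetical.

```python
from typing import List, Tuple


def scan_align(read: str, reference: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Slide the read along every reference offset (no index needed) and
    return (position, mismatches) for each offset within the mismatch budget."""
    hits: List[Tuple[int, int]] = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, reference[pos:pos + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this offset cannot qualify
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

The work per read grows with the reference length times the read length. That is cheap for a targeted region of a few kilobases and prohibitive for a whole genome, while checking every offset also means no placement within the mismatch budget is missed.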
TMAP
Aligner specifically tuned for Ion Torrent PGM data
- Pros
- Uses a selection of algorithms to balance speed and sensitivity
- Cons
Commercial Software
CLC workstation
- Pros
- GUI, easy to use
- Cons
- Expensive
- Alignments were spurious on our dataset
- Alignment is much slower than with BWA or Bowtie (tested on an Intel Core i7 860 with 16 GB RAM, Windows Server 2008 R2 64-bit)
Further Reading Material and References
- Original Publications
- Li and Durbin, 2009 BWA
- Langmead et al., 2009 bowtie
- Lunter and Goodson, 2010 Stampy
- Comparisons
Appendix
A list of short read aligners
- Illumina: BWA, SHRiMP2, Bowtie, Stampy, Novoalign
- 454: GSMapper, SSAHA2, BLAT, Mosaik, BWA-SW
- SOLiD: Bfast, BWA, NovoalignCS
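SOLiD reads are commonly delivered in colour space, where each colour (0 to 3) encodes a pair of adjacent bases rather than a single base, so SOLiD data need an aligner that can handle this encoding. A minimal sketch of the standard two-base encoding, assuming the codes A=0, C=1, G=2, T=3, under which a colour is the XOR of the two base codes. The first character plays the role of the leading (primer) base, and the function names are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colorspace(seq: str) -> str:
    """Leading base, then one colour per adjacent base pair (XOR of base codes)."""
    colours = [str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colours)


def decode_colorspace(cs: str) -> str:
    """Invert encode_colorspace: each colour XOR the previous base gives the next base."""
    bases = [cs[0]]
    for colour in cs[1:]:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(colour)])
    return "".join(bases)
```

`encode_colorspace("ACGTTAG")` gives `"A131032"`. Changing just one colour, as in `decode_colorspace("A121032")`, yields `"ACTGGCT"`, where every base after the error is wrong. This is why colour-space reads are usually aligned in colour space rather than decoded to bases first.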
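(1) Brief review of alignment algorithms (Alignment section): http://www.nature.com/nmeth/journal/v7/n6/full/nmeth0610-479b.html
(2) Evaluation of next-generation sequencing software in mapping and assembly: http://www.nature.com/jhg/journal/v56/n6/pdf/jhg201143a.pdf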