
SEQwiki β


BWA

750 bytes added, 17:10, 25 March 2016
{{Bioinformatics application
|sw summary=Fast, accurate, memory-efficient aligner for short and long sequencing reads
|bio domain=Mapping, Read alignment
|bio method=FM-Index, Read mapping
|bio tech=Sanger, Illumina, 454, ABI SOLiD
|interface=Command line,
|created by=Heng Li and Richard Durbin
|created at=Sanger Institute
|maintained=Yes
|input format=FASTQ, FASTA
|output format=SAM
|sw feature=Gapped alignment, paired-end mapping
|language=C,
|licence=GPLv3, MIT,
|os=UNIX,
}}
BWA (Burrows-Wheeler Aligner) is an aligner that uses the Burrows-Wheeler transform to index the reference genome, which reduces memory usage compared with aligners based on k-mer hashing. BWA includes two read alignment algorithms. The first, usually meant when "the BWA algorithm" is mentioned without qualification, is invoked with the command <tt>bwa aln</tt>. The second is "BWA-SW", invoked with <tt>bwa bwasw</tt>; it is described in its own article, [[BWA-SW]]. Both algorithms use the same on-disk index, which is created with <tt>bwa index</tt>.
= Implementation notes =
The following notes were obtained by inspecting the source code.
== Ambiguous bases in reference sequences ==
According to the BWA paper, "Non-A/C/G/T bases on the reference genome are converted to random nucleotides."
BWA uses a '''fixed seed''' for the random number generator. This means that running <tt>bwa index</tt> twice on the same FASTA file results in the same index. (The seed is set to the value 11 in bntseq.c.)
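The conversion can be sketched in C as follows. This is a minimal illustration, not BWA's code: the function name <tt>randomize_ambiguous</tt> is hypothetical, and only the seed value 11 is taken from bntseq.c. The order in which BWA actually draws random numbers, and therefore which base each position receives, may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of BWA): replace every non-A/C/G/T base
 * with a pseudo-random nucleotide. Reseeding with the fixed value 11 on
 * every call makes the output depend only on the input sequence. */
static void randomize_ambiguous(char *seq)
{
    static const char nt[4] = { 'A', 'C', 'G', 'T' };

    assert(seq != NULL);
    srand48(11); /* fixed seed, the value reported for bntseq.c */
    for (size_t i = 0; seq[i] != '\0'; ++i) {
        /* keep regular bases (either case), randomize everything else */
        if (strchr("ACGTacgt", seq[i]) == NULL)
            seq[i] = nt[lrand48() & 3];
    }
}
```

Running the function on two copies of the same sequence yields identical output, which is the property that makes repeated <tt>bwa index</tt> runs reproducible.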
== The XT:A tag ==
The value "N" stands for <tt>BWA_TYPE_NO_MATCH</tt> (bwtaln.h). If the number of ambiguous reference bases within the aligned region (which is stored in the XN:i tag) is greater than 10, this tag is also set to "N".
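A sketch of the rule in C, for illustration only. The enum and function names are hypothetical; the letters U (unique), R (repeat), N and M (mate rescued by Smith-Waterman) follow the XT values listed in the BWA manual, but this is not an excerpt from bwase.c.

```c
#include <assert.h>

/* Hypothetical alignment types mirroring the XT:A letters. */
typedef enum { ALN_NO_MATCH, ALN_UNIQUE, ALN_REPEAT, ALN_MATE_SW } aln_type_t;

/* Return the XT:A character for an alignment. More than 10 ambiguous
 * reference bases (the XN:i value) forces "N", as described above. */
static char xt_char(aln_type_t type, int n_ambiguous)
{
    if (n_ambiguous > 10)
        return 'N';
    switch (type) {
    case ALN_UNIQUE:  return 'U';
    case ALN_REPEAT:  return 'R';
    case ALN_MATE_SW: return 'M';
    case ALN_NO_MATCH:
    default:          return 'N';
    }
}
```

Note that a threshold of exactly 10 ambiguous bases still keeps the original type; only values above 10 override it.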
== The NM and CM tags ==
If <tt>-c</tt> (color space input) was given on the command line, CM is written; otherwise NM is written. The two tags are mutually exclusive.
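The mutual exclusion amounts to choosing one tag name from a single flag. The helper below is a hypothetical sketch of that choice, not BWA's code; it only shows that a given record carries either CM or NM, never both, and does not try to define what the count means in color space.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the mismatch-count tag. With color space
 * input (-c) the tag is CM, otherwise NM; only one is ever written. */
static int format_distance_tag(char *buf, size_t len, int color_space, int count)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "\t%s:i:%d", color_space ? "CM" : "NM", count);
}
```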
== The XC:i tag ==
The XC:i tag is output when the clipped length of a read is less than the full read length.
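The condition can be sketched as follows. The helper and its parameter names are hypothetical, and using the clipped length as the tag's value is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: emit XC:i only when the read was clipped, i.e.
 * when the clipped length is shorter than the full read length.
 * Using the clipped length as the tag value is an assumption. */
static int format_xc_tag(char *buf, size_t len, int clipped_len, int full_len)
{
    assert(buf != NULL && len > 0);
    if (clipped_len >= full_len) {
        buf[0] = '\0'; /* read not clipped: no tag */
        return 0;
    }
    return snprintf(buf, len, "\tXC:i:%d", clipped_len);
}
```

An unclipped 100 bp read produces no XC tag at all, while the same read clipped to 76 bp would produce XC:i:76 under this assumption.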
== The XO and XG tags ==
The documentation says that XG is the number of gap extensions, but the source code (bwase.c) seems to indicate that XG is the total number of gaps (opens plus extensions):
<pre>
printf("\tXM:i:%d\tXO:i:%d\tXG:i:%d", p->n_mm, p->n_gapo, p->n_gapo+p->n_gape);
</pre>
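To make the difference concrete, the quoted line can be wrapped in a small self-contained function. The struct <tt>aln_counts_t</tt> and the function name are hypothetical stand-ins for BWA's alignment record; only the three field names and the format string come from the line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for BWA's alignment record; the field names
 * n_mm, n_gapo and n_gape are taken from the bwase.c line above. */
typedef struct {
    int n_mm;   /* mismatches */
    int n_gapo; /* gap opens */
    int n_gape; /* gap extensions */
} aln_counts_t;

/* Same format string as bwase.c: XG is printed as opens + extensions. */
static int format_gap_tags(char *buf, size_t len, const aln_counts_t *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, len, "\tXM:i:%d\tXO:i:%d\tXG:i:%d",
                    p->n_mm, p->n_gapo, p->n_gapo + p->n_gape);
}
```

For an alignment with one mismatch, one gap open and two gap extensions this prints XG:i:3, whereas reading XG as "number of gap extensions" would suggest XG:i:2.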
== Regular index vs. color space index ==
A color space index (created with the <tt>-c</tt> option of <tt>bwa index</tt>) and a regular index '''cannot coexist''' in the same directory unless different prefixes are chosen (with the <tt>-p</tt> option).
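The conflict presumably arises because both kinds of index derive their file names from the same prefix (by default the FASTA file name) plus fixed extensions such as .bwt, .pac and .sa. The hypothetical helper below only illustrates that naming scheme; the prefix <tt>ref_cs</tt> is an example value, not a BWA default.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build an index file name as prefix + extension,
 * the way BWA's index files are named (e.g. ref.fa.bwt). */
static void index_path(char *buf, size_t len, const char *prefix, const char *ext)
{
    assert(buf != NULL && prefix != NULL && ext != NULL);
    snprintf(buf, len, "%s%s", prefix, ext);
}
```

With the same prefix, the regular and color space builds would both write <tt>ref.fa.bwt</tt>, so the second build overwrites the first; a distinct <tt>-p</tt> prefix gives each index its own set of files.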
== Length of contig names ==
Contig names must not be longer than 1024 characters. If a name is longer, BWA gives no error message, but mapping does not work.
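Because BWA does not report the problem, checking FASTA headers before running <tt>bwa index</tt> can be worthwhile. The check below is a hypothetical pre-flight sketch, not part of BWA; taking the name to be the text after '>' up to the first whitespace character is an assumption about how names are parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_CONTIG_NAME 1024 /* limit stated above */

/* Hypothetical check: return 1 if the contig name in a FASTA header
 * line fits the limit, 0 otherwise. The name is assumed to run from
 * after '>' to the first whitespace character. */
static int contig_name_ok(const char *header)
{
    size_t n = 0;

    assert(header != NULL && header[0] == '>');
    for (const char *p = header + 1; *p != '\0' && !isspace((unsigned char)*p); ++p)
        ++n;
    return n <= MAX_CONTIG_NAME;
}
```

Applying this to every line starting with '>' flags the names that BWA would otherwise accept silently.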
== Unapplied patches ==
* [http://sourceforge.net/mailarchive/message.php?msg_id=26175347 SOLiD paired-end patch] plus [http://sourceforge.net/mailarchive/message.php?msg_id=27685167 its correction]
{{Links}}
{{References}}
{{Link box}}