BWA

2,728 bytes added, 17:10, 25 March 2016
{{Bioinformatics application
|sw summary=Fast, accurate, memory-efficient aligner for short and long sequencing reads
|bio domain=Mapping
|bio method=Read mapping
|bio tech=Sanger, Illumina, 454, ABI SOLiD
|interface=Command line
|created by=Heng Li and Richard Durbin
|created at=Sanger Institute
|maintained=Yes
|input format=FASTQ, FASTA
|output format=SAM
|sw feature=Gapped alignment, paired-end mapping
|language=C,
|licence=GPLv3, MIT,
|os=UNIX,
}}
BWA (Burrows-Wheeler Aligner) is an aligner using the Burrows-Wheeler transform to index the reference genome, which decreases memory usage compared to aligners using k-mer hashing.
 
BWA includes two read alignment algorithms. The first is usually what is meant by the "BWA algorithm"; it is invoked with the command <tt>bwa aln</tt>. The second algorithm, "BWA-SW", is invoked with the command <tt>bwa bwasw</tt> and is described in its own article, [[BWA-SW]].
 
Both algorithms use the same index on disk, which can be created with <tt>bwa index</tt>.
 
 
= Implementation notes =
 
The following notes were obtained by inspecting the source code.
 
== Ambiguous bases in reference sequences ==
 
According to the BWA paper, "Non-A/C/G/T bases on the reference genome are converted to random nucleotides."
 
BWA uses a '''fixed seed''' for the random number generator. This means that running <tt>bwa index</tt> twice on the same FASTA file will result in the same index. (That seed is set to the value 11 in bntseq.c.)
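
The effect of a fixed seed can be illustrated with a minimal sketch. It is not BWA's implementation: the function name <tt>replace_ambiguous</tt> is hypothetical, and Python's <tt>random</tt> module stands in for the C random number generator, so the bases chosen here differ from the ones BWA would pick. Only the property of interest carries over: the same input and the same seed always give the same output.

```python
import random


def replace_ambiguous(seq, seed=11):
    """Replace every non-A/C/G/T base with a pseudo-random nucleotide.

    Hypothetical helper for illustration only; BWA does this in C with
    its own generator, so the concrete replacements will differ.
    """
    rng = random.Random(seed)  # fixed seed: identical replacements on every run
    return "".join(
        base if base in "ACGT" else rng.choice("ACGT")
        for base in seq.upper()
    )


first = replace_ambiguous("ACGTNNNNRYACGT")
second = replace_ambiguous("ACGTNNNNRYACGT")
print(first == second)  # the two runs agree because the seed is fixed
```

If the seed were taken from the clock instead, two indexing runs over a reference containing N bases could produce different indexes.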
 
== The "XT:A" tag ==
 
The value "N" stands for <tt>BWA_TYPE_NO_MATCH</tt> (bwtaln.h).
If the number of ambiguous bases in the reference (which is stored in the "XN:i" tag) is greater than 10, this tag is also set to "N".
 
== The NM and CM tags ==
 
If <tt>-c</tt> is given on the command line, the CM tag is written; otherwise the NM tag is written. The two tags are mutually exclusive.
 
== The XC:i tag ==
 
The XC:i tag is output when the clipped length of a read is less than the full read length.
 
== The XO and XG tags ==
 
The documentation says that XG is the number of gap extensions, but the source code (bwase.c) seems to indicate that XG is the total number of gaps (opens plus extensions):
 printf("\tXM:i:%d\tXO:i:%d\tXG:i:%d", p->n_mm, p->n_gapo, p->n_gapo+p->n_gape);
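
If XG is indeed opens plus extensions, the number of gap extensions can be recovered as XG minus XO. The following sketch shows this under that assumption. The helper names <tt>parse_tags</tt> and <tt>gap_extensions</tt> are hypothetical, and the SAM record is made up for illustration rather than taken from real BWA output.

```python
def parse_tags(sam_line):
    """Return the optional fields of a SAM record as a dict, e.g. {"XO": 1}."""
    tags = {}
    # Optional fields start at column 12 and have the form NAME:TYPE:VALUE.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags


def gap_extensions(tags):
    """Derive the gap extension count, assuming XG = gap opens + gap extensions."""
    return tags["XG"] - tags["XO"]


# Hypothetical record; all values are chosen for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "37", "10M2D8M", "*", "0", "0",
    "ACGTACGTACGTACGTAC", "IIIIIIIIIIIIIIIIII",
    "XT:A:U", "NM:i:2", "XM:i:0", "XO:i:1", "XG:i:2",
])
tags = parse_tags(record)
print(gap_extensions(tags))
```

Scripts that read XG directly as the extension count would overestimate it by the value of XO if the reading of the source code above is correct.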
 
== Regular index vs. color space index ==
 
A color space index (created with the -c option to bwa index) and a regular index '''cannot coexist''' in the same directory unless different prefixes are chosen (with the -p option).
 
== Length of contig names ==
 
Contig names must not be longer than 1024 characters. If a name is longer, BWA prints no error message, but mapping silently fails.
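
Because the failure is silent, it can be worth checking a reference before indexing it. The sketch below is one way to do so. The helper <tt>long_contig_names</tt> is hypothetical, and it assumes that the contig name is the first whitespace-delimited token of the FASTA header line; whether the limit applies to that token or to the whole header is not established here.

```python
import io

MAX_NAME_LEN = 1024  # limit stated in the text above


def long_contig_names(fasta, limit=MAX_NAME_LEN):
    """Yield contig names from FASTA header lines that exceed `limit` characters.

    `fasta` is any iterable of lines, such as an open file handle.
    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    for line in fasta:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > limit:
                yield name


# Small in-memory example: one short name and one name of 1025 characters.
example = io.StringIO(">chr1 test\nACGT\n>" + "x" * 1025 + "\nACGT\n")
print([len(name) for name in long_contig_names(example)])
```

For a real reference, an open file handle can be passed in place of the in-memory example.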
 
== Unapplied Patches ==
 
* [http://sourceforge.net/mailarchive/message.php?msg_id=26175347 SOLiD paired-end patch] plus [http://sourceforge.net/mailarchive/message.php?msg_id=27685167 its correction]
{{Links}}
{{References}}
{{Link box}}