Latest revision as of 17:10, 25 March 2016

Application data
Created by	Heng Li and Richard Durbin
Biological application domain(s)	Mapping
Principal bioinformatics method(s)	Read mapping
Technology	Sanger, Illumina, 454, ABI SOLiD
Created at	Sanger Institute
Maintained?	Yes
Input format(s)	FASTQ, FASTA
Output format(s)	SAM
Software features	Gapped alignment, paired-end mapping
Programming language(s)	C
Interface type(s)	Command line
Licence	GPLv3, MIT
Operating system(s)	UNIX

Summary: Fast, accurate, memory-efficient aligner for short and long sequencing reads

BWA (Burrows-Wheeler Aligner) is an aligner using the Burrows-Wheeler transform to index the reference genome, which decreases memory usage compared to aligners using k-mer hashing.

BWA includes two read alignment algorithms. The first is usually what is meant when simply the "BWA algorithm" is mentioned; it is invoked with the command bwa aln. The second algorithm, "BWA-SW", is invoked with the command bwa bwasw and is described in its own article, BWA-SW.

Both algorithms use the same index on disk, which can be created with bwa index.
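As a rough illustration of how the shared index fits into a single-end run, the sketch below only builds the command lines and does not execute BWA. The backtrack subcommand is spelled bwa aln, and bwa samse is the BWA subcommand that converts its output to SAM. The file names ref.fa, reads.fq and reads.sai and the helper name are placeholders chosen for this example.

```python
import shlex

def bwa_backtrack_commands(reference, reads, sai="reads.sai"):
    """Return argv lists for index -> aln -> samse, in the order they are run.

    The file names are hypothetical. `bwa aln` writes the .sai data and
    `bwa samse` writes SAM to standard output, so a real run would redirect them.
    """
    return [
        ["bwa", "index", reference],              # build the on-disk index once
        ["bwa", "aln", reference, reads],         # redirect stdout to the .sai file
        ["bwa", "samse", reference, sai, reads],  # redirect stdout to a .sam file
    ]

# Print the commands in shell-quoted form for inspection.
for argv in bwa_backtrack_commands("ref.fa", "reads.fq"):
    print(shlex.join(argv))
```

The lists can be handed to subprocess.run once the output redirection is set up.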

Implementation notes

The following notes were obtained by inspecting the source code.

Ambiguous bases in reference sequences

According to the BWA paper, "Non-A/C/G/T bases on the reference genome are converted to random nucleotides."

BWA uses a fixed seed for the random number generator. This means that running bwa index twice on the same FASTA file will result in the same index. (That seed is set to the value 11 in bntseq.c.)
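The effect of a fixed seed can be illustrated with a small sketch. It is not BWA's code: BWA uses a C random number generator, so the bases chosen here will differ from those in a real index. Only the seed value 11 comes from the note above; the function name is made up for the example.

```python
import random

def replace_ambiguous(seq, seed=11):
    """Replace every non-A/C/G/T base with a pseudo-random nucleotide.

    A fresh generator with a fixed seed is created on each call, so the
    same input always yields the same output.
    """
    rng = random.Random(seed)
    return "".join(
        base if base in "ACGT" else rng.choice("ACGT")
        for base in seq.upper()  # treat lower-case bases like upper-case ones
    )

first = replace_ambiguous("ACGTNNRYACGT")
second = replace_ambiguous("ACGTNNRYACGT")
print(first == second)  # the two runs agree because the seed is fixed
```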

The "XT:A" tag

The value "N" stands for BWA_TYPE_NO_MATCH (bwtaln.h). If the number of ambiguous bases in the reference (which is stored in the "XN:i" tag) is greater than 10, this tag is also set to "N".
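To inspect these tags in practice, the optional fields of a SAM line can be read into a dictionary. The sketch below assumes the standard SAM layout of 11 mandatory tab-separated columns followed by TAG:TYPE:VALUE fields. The sample record is invented for illustration, and the meanings listed for XT values other than "N" are taken from the BWA manual page rather than from the notes here.

```python
def parse_sam_tags(line):
    """Return the optional fields of one SAM alignment line as a dict."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields start after column 11
        name, value_type, value = field.split(":", 2)
        tags[name] = int(value) if value_type == "i" else value
    return tags

# XT values as listed in the BWA manual page (Unique/Repeat/N/Mate-sw).
XT_MEANING = {"U": "unique", "R": "repeat", "N": "no match", "M": "mate-sw"}

# Hypothetical record, not real BWA output.
record = "read1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0\tXN:i:0"
tags = parse_sam_tags(record)
print(tags["XT"], XT_MEANING[tags["XT"]], tags["XN"])
```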

The NM and CM tags

If "-c" was given on the command line, CM is written; otherwise NM is written. The two tags are mutually exclusive.
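Code that reads the edit distance therefore has to accept either tag. A minimal sketch, assuming the optional fields have already been parsed into a dictionary keyed by tag name (the function name and return shape are choices made for this example):

```python
def edit_distance(tags):
    """Return (tag_name, value) for whichever of NM or CM is present.

    Returns None when neither tag is present and raises ValueError if
    both appear, since the two are described as mutually exclusive.
    """
    present = [name for name in ("NM", "CM") if name in tags]
    if len(present) > 1:
        raise ValueError("NM and CM should not both be present")
    if not present:
        return None
    return present[0], tags[present[0]]

print(edit_distance({"XT": "U", "NM": 2}))
print(edit_distance({"XT": "U", "CM": 1}))
```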

The XC:i tag

The XC:i tag is output when the clipped length of a read is less than the full read length.
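Because the tag is absent for unclipped reads, downstream code needs a fallback. The sketch below assumes that the value of XC is the clipped length itself, which the note above implies but does not state explicitly; the function name is hypothetical.

```python
def clipped_length(seq, tags):
    """Return the XC value if present, else the full read length.

    Assumption: XC holds the clipped length of the read.
    """
    return tags.get("XC", len(seq))

print(clipped_length("ACGTACGT", {"XC": 6}))
print(clipped_length("ACGTACGT", {}))
```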

The XO and XG tags

The documentation says that XG is the number of gap extensions, but the source code seems to indicate that XG is the total number of gap opens plus gap extensions (bwase.c):

 printf("\tXM:i:%d\tXO:i:%d\tXG:i:%d", p->n_mm, p->n_gapo, p->n_gapo+p->n_gape);
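If the reading of the source code above is right, the number of gap extensions can be recovered by subtracting XO from XG. A small sketch under that assumption, with a made-up function name:

```python
def gap_counts(xo, xg):
    """Split the XO and XG tag values into opens and extensions.

    Assumes XG = gap opens + gap extensions, as the printf call suggests.
    """
    if xg < xo:
        raise ValueError("XG smaller than XO contradicts XG = opens + extensions")
    return {"opens": xo, "extensions": xg - xo, "total": xg}

print(gap_counts(xo=1, xg=3))
```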

Regular index vs. color space index

A color space index (created with the -c option to bwa index) and a regular index cannot coexist in the same directory unless different prefixes are chosen (with the -p option).
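One way to keep both indexes side by side is to give each its own prefix. The sketch below only assembles the command lines from the -p and -c options named above and does not run them. The prefixes and file name are placeholders, and the -c option may not exist in every BWA release.

```python
def index_command(reference, prefix, color_space=False):
    """Build a `bwa index` command line that writes files under `prefix`."""
    argv = ["bwa", "index", "-p", prefix]
    if color_space:
        argv.append("-c")  # color space index, as described in the text
    argv.append(reference)
    return argv

# Distinct prefixes keep the two sets of index files apart.
print(index_command("ref.fa", "ref_nt"))
print(index_command("ref.fa", "ref_cs", color_space=True))
```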

Length of contig names

Contig names must not be longer than 1024 characters. If a name is longer, no error message is printed, but mapping silently fails.
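Since there is no error message, it can help to check the reference before indexing. The sketch below treats the contig name as the first whitespace-delimited token after ">" in each FASTA header; that definition and the function name are assumptions made for this example.

```python
def overlong_contig_names(fasta_lines, max_len=1024):
    """Return contig names longer than max_len characters.

    The name is taken to be the first whitespace-delimited token on a
    header line, which is an assumption about how names are read.
    """
    too_long = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if len(name) > max_len:
                too_long.append(name)
    return too_long

sample = [">chr1 test contig", "ACGT", ">" + "x" * 1025, "ACGT"]
print(len(overlong_contig_names(sample)))
```

For a real reference, pass an open file object instead of the list.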

Unapplied Patches

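SOLiD paired-end patch (http://sourceforge.net/mailarchive/message.php?msg_id=26175347) plus its correction (http://sourceforge.net/mailarchive/message.php?msg_id=27685167)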

Links

BWA Related
BWA Homepage

References

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics


