BWA
Revision as of 11:15, 14 September 2011
Application data |
|
Created by | Heng Li and Richard Durbin |
---|---|
Biological application domain(s) | Mapping, Read alignment |
Principal bioinformatics method(s) | FM-Index |
Technology | Sanger, Illumina, 454, ABI SOLiD |
Created at | Sanger Institute |
Maintained? | Yes |
Input format(s) | FASTQ, FASTA |
Output format(s) | SAM |
Software features | Gapped alignment, paired-end mapping |
Programming language(s) | C |
Licence | GPLv3, MIT License |
Operating system(s) | UNIX |
Summary: Fast, accurate, memory-efficient aligner for short and long sequencing reads
BWA (Burrows-Wheeler Aligner) is an aligner using the Burrows-Wheeler transform to index the reference genome, which decreases memory usage compared to aligners using k-mer hashing.
BWA includes two read alignment algorithms. The first is usually what is meant by the "BWA algorithm"; it is invoked with the command bwa aln. The second, "BWA-SW", is invoked with the command bwa bwasw and is described in its own article, BWA-SW.
Both algorithms use the same index on disk, which can be created with bwa index.
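The order of the commands can be sketched as follows. This is a minimal sketch, not part of BWA: the helper name bwa_commands and the output file names are hypothetical, the commands are only assembled and never executed, and the short-read branch assumes single-end reads, for which the output of bwa aln is converted to SAM with bwa samse.

```python
def bwa_commands(reference, reads, use_bwasw=False):
    """Return a list of (argv, stdout_file) pairs for one single-end run.

    stdout_file is the (hypothetical) file that the command's standard
    output would be redirected to, or None if nothing is captured.
    """
    # Both algorithms share the index built by "bwa index".
    steps = [(["bwa", "index", reference], None)]
    if use_bwasw:
        # BWA-SW writes SAM directly.
        steps.append((["bwa", "bwasw", reference, reads], "reads.sam"))
    else:
        # The short-read algorithm first writes an intermediate .sai file,
        # which "bwa samse" then converts to SAM.
        steps.append((["bwa", "aln", reference, reads], "reads.sai"))
        steps.append((["bwa", "samse", reference, "reads.sai", reads], "reads.sam"))
    return steps
```

Each pair could be passed to subprocess.run with the second element opened as the stdout target; the index step only needs to be run once per reference.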
Implementation notes
The following notes were obtained by inspecting the source code.
Ambiguous bases in reference sequences
According to the BWA paper, "Non-A/C/G/T bases on the reference genome are converted to random nucleotides."
BWA uses a fixed seed for the random number generator. This means that running bwa index twice on the same FASTA file will result in the same index. (That seed is set to the value 11 in bntseq.c.)
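Why a fixed seed makes the result reproducible can be illustrated with a small sketch. It only demonstrates the principle: the function name replace_ambiguous is hypothetical, and Python's random module is not the generator used in bntseq.c, so the bases chosen here will not match the ones BWA chooses.

```python
import random


def replace_ambiguous(seq, seed=11):
    """Replace every non-A/C/G/T base with a pseudo-random nucleotide.

    A private generator seeded with a constant is used, so the same input
    always yields the same output. The default of 11 mirrors the seed
    value mentioned above; the generator itself is NOT the one BWA uses.
    """
    rng = random.Random(seed)
    return "".join(
        base if base in "ACGT" else rng.choice("ACGT")
        for base in seq.upper()
    )
```

Calling the function twice on the same sequence returns identical strings, which is the property that makes two index builds from the same FASTA file identical.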
The "XT:A" tag
The value "N" stands for BWA_TYPE_NO_MATCH (bwtaln.h). The tag is also set to "N" if the number of ambiguous bases in the reference, which is stored in the "XN:i" tag, is greater than 10.
The NM and CM tags
If "-c" was given on the command line, the CM tag is written; otherwise the NM tag is written. The two tags are mutually exclusive.
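A script that reads BWA output therefore has to accept either tag. The following is a minimal sketch; the helper names parse_tags and distance_tag are hypothetical, and the input is assumed to be the list of optional SAM fields in the standard TAG:TYPE:VALUE form.

```python
def parse_tags(optional_fields):
    """Turn SAM optional fields such as "NM:i:2" into a dict."""
    tags = {}
    for field in optional_fields:
        name, value_type, value = field.split(":", 2)
        # Integer-typed tags ("i") are converted; everything else stays text.
        tags[name] = int(value) if value_type == "i" else value
    return tags


def distance_tag(tags):
    """Return (tag_name, value) for CM or NM, or None if neither is present."""
    for name in ("CM", "NM"):
        if name in tags:
            return name, tags[name]
    return None
```

Because the tags are mutually exclusive, the order in which the two names are checked does not matter for BWA output.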
The XC:i tag
The XC:i tag is output when the clipped length of a read is less than the full read length.
The XO and XG tags
The documentation says that XG is the number of gap extensions, but the source code (bwase.c) indicates that XG is the total number of gap opens plus gap extensions:
printf("\tXM:i:%d\tXO:i:%d\tXG:i:%d", p->n_mm, p->n_gapo, p->n_gapo+p->n_gape);
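If XG is indeed the sum of gap opens and gap extensions, as the printf call suggests, the number of gap extensions can be recovered as XG minus XO. The sketch below does that for one SAM line; the function name gap_counts is hypothetical, and the record is assumed to carry all three of the XM, XO and XG tags.

```python
def gap_counts(sam_line):
    """Extract mismatch and gap counts from the X? tags of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 SAM columns are mandatory; optional tags follow.
    for field in fields[11:]:
        name, value_type, value = field.split(":", 2)
        if value_type == "i":
            tags[name] = int(value)
    gap_opens = tags["XO"]
    # Assumption taken from the source line above: XG = n_gapo + n_gape.
    gap_extensions = tags["XG"] - gap_opens
    return {
        "mismatches": tags["XM"],
        "gap_opens": gap_opens,
        "gap_extensions": gap_extensions,
    }
```

Under the documented meaning of XG the subtraction would be wrong, so it is worth checking the result against the CIGAR string of a few records before relying on it.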
Regular index vs. color space index
A color space index (created with the -c option to bwa index) and a regular index cannot coexist in the same directory unless different prefixes are chosen (with the -p option).
Length of contig names
Contig names must not be longer than 1024 characters. If a name is longer, BWA prints no error message, but mapping does not work.
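Since BWA does not report the problem, the reference can be checked before indexing. This is a minimal sketch with a hypothetical function name; it assumes that the contig name is the text between ">" and the first whitespace character of a FASTA header line.

```python
def overlong_contig_names(fasta_lines, limit=1024):
    """Return the contig names that are longer than `limit` characters."""
    too_long = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split(None, 1)
        name = parts[0] if parts else ""
        if len(name) > limit:
            too_long.append(name)
    return too_long
```

The function accepts any iterable of lines, so an open file object for the FASTA file can be passed directly.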
Unapplied Patches
Links
- BWA Related
- BWA Homepage