MUMmer

9,435 bytes added, 13:18, 18 January 2010
Adding a 'Usage Notes' section
|created by=Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL
|created at=The Institute for Genomic Research, Maryland
|maintained=Yes
|input format=FASTA,
|os=POSIX
MUMmer is released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools.
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the hundreds or thousands of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.

The original MUMmer system, version 1.0, is described in our 1999 Nucleic Acids Research paper. Version 2.1 appeared a few years later and is described in our 2002 Nucleic Acids Research paper, while MUMmer 3.0 is described in our 2004 Genome Biology paper.

== USAGE NOTES ==

=== nucmer ===
<pre>nucmer -h

  USAGE: nucmer  [options]  <Reference>  <Query>

  DESCRIPTION:
    nucmer generates nucleotide alignments between two multi-FASTA input
    files. Two output files are generated. The .cluster output file lists
    clusters of matches between each sequence. The .delta file lists the
    distance between insertions and deletions that produce maximal scoring
    alignments between each sequence.

  MANDATORY:
    Reference       Set the input reference multi-FASTA filename
    Query           Set the input query multi-FASTA filename

  OPTIONS:
    --mum           Use anchor matches that are unique in both the reference
                    and query
    --mumcand       Same as --mumreference
    --mumreference  Use anchor matches that are unique in the reference
                    but not necessarily unique in the query (default behavior)
    --maxmatch      Use all anchor matches regardless of their uniqueness

    -b|breaklen     Set the distance an alignment extension will attempt to
                    extend poor scoring regions before giving up (default 200)
    -c|mincluster   Sets the minimum length of a cluster of matches (default 65)
    --[no]delta     Toggle the creation of the delta file (default --delta)
    --depend        Print the dependency information and exit
    -d|diagfactor   Set the clustering diagonal difference separation factor
                    (default 0.12)
    --[no]extend    Toggle the cluster extension step (default --extend)
    -f
    --forward       Use only the forward strand of the Query sequences
    -g|maxgap       Set the maximum gap between two adjacent matches in a
                    cluster (default 90)
    -h
    --help          Display help information and exit
    -l|minmatch     Set the minimum length of a single match (default 20)
    -o
    --coords        Automatically generate the original NUCmer1.1 coords
                    output file using the 'show-coords' program
    --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                    extension reaches the end of a sequence, it will backtrack
                    to optimize the alignment score instead of terminating the
                    alignment at the end of the sequence (default --optimize)
    -p|prefix       Set the prefix of the output files (default "out")
    -r
    --reverse       Use only the reverse complement of the Query sequences
    --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                    this option off if aligning a sequence to itself to look
                    for repeats (default --simplify)
    -V
    --version       Display the version information and exit
</pre>

=== delta-filter ===
Filters a delta alignment file produced by either nucmer or promer, leaving only the desired alignments, which are output to STDOUT in the same delta format as the input. Its primary function is the LIS algorithm, which calculates the longest increasing subset of alignments. This allows for the calculation of a global set of alignments (i.e. 1-to-1 and mutually consistent order) with the -g option, or a locally consistent set with -1 or -m. Reference sequences can be mapped to query sequences with -r, or queries to references with -q. This allows the user to exclude chance and repeat-induced alignments, leaving only the "best" alignments between the two data sets. Filtering can also be performed on length, identity, and uniqueness.

<pre>USAGE: delta-filter  [options]  <deltafile>

[options]     type 'delta-filter -h' for a list of options.
<deltafile>   the .delta output file from either nucmer or promer.

OUTPUT:
  stdout      The same delta alignment format as output by nucmer and promer.

NOTES:
  For most cases the -m option is recommended; however, -1 is useful
  for applications that require a 1-to-1 mapping, such as SNP finding.
  Use the -q option for mapping query contigs to their best reference
  location.

-1            1-to-1 alignment allowing for rearrangements
              (intersection of -r and -q alignments)
-g            1-to-1 global alignment not allowing rearrangements
-h            Display help information
-i float      Set the minimum alignment identity [0, 100], default 0
-l int        Set the minimum alignment length, default 0
-m            Many-to-many alignment allowing for rearrangements
              (union of -r and -q alignments)
-q            Maps each position of each query to its best hit in
              the reference, allowing for reference overlaps
-r            Maps each position of each reference to its best hit
              in the query, allowing for query overlaps
-u float      Set the minimum alignment uniqueness, i.e. percent of
              the alignment matching to unique reference AND query
              sequence [0, 100], default 0
-o float      Set the maximum alignment overlap for -r and -q options
              as a percent of the alignment length [0, 100], default 100
</pre>

=== show-aligns ===
<pre>USAGE: show-aligns  [options]  <deltafile>  <ref ID>  <qry ID>

-h            Display help information
-q            Sort alignments by the query start coordinate
-r            Sort alignments by the reference start coordinate
-w int        Set the screen width - default is 60
-x int        Set the matrix type - default is 2 (BLOSUM 62),
              other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
              note: only has effect on amino acid alignments
</pre>

Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line. Output is to STDOUT, and consists of all the alignments between the query and reference sequences identified on the command line. NOTE: No sorting is done by default; therefore the alignments will be ordered as found in the <deltafile> input.

=== mummerplot ===
<pre>mummerplot -h

  USAGE: mummerplot  [options]  <match file>

  DESCRIPTION:
    mummerplot generates plots of alignment data produced by mummer, nucmer,
    promer or show-tiling by using the GNU gnuplot utility. After generating
    the appropriate scripts and datafiles, mummerplot will attempt to run
    gnuplot to generate the plot. If this attempt fails, a warning will be
    output and the resulting .gp and .[frh]plot files will remain so that the
    user may run gnuplot independently. If the attempt succeeds, either an x11
    window will be spawned or an additional output file will be generated
    (.ps or .png depending on the selected terminal). Feel free to edit the
    resulting gnuplot script (.gp) and rerun gnuplot to change line thickness,
    labels, colors, plot size etc.

  MANDATORY:
    match file      Set the alignment input to 'match file'
                    Valid inputs are from mummer, nucmer, promer and
                    show-tiling (.out, .cluster, .delta and .tiling)

  OPTIONS:
    -b|breaklen     Highlight alignments with breakpoints further than
                    breaklen nucleotides from the nearest sequence end
    --[no]color     Color plot lines with a percent similarity gradient or
                    turn off all plot color (default color by match dir)
                    If the plot is very sparse, edit the .gp script to plot
                    with 'linespoints' instead of 'lines'
    -c
    --[no]coverage  Generate a reference coverage plot (default for .tiling)
    --depend        Print the dependency information and exit
    -f
    --filter        Only display .delta alignments which represent the "best"
                    hit to any particular spot on either sequence, i.e. a
                    one-to-one mapping of reference and query subsequences
    -h
    --help          Display help information and exit
    -l
    --layout        Layout a .delta multiplot in an intelligible fashion,
                    this option requires the -R -Q options
    --fat           Layout sequences using fattest alignment only
    -p|prefix       Set the prefix of the output files (default 'out')
    -rv             Reverse video for x11 plots
    -r|IdR          Plot a particular reference sequence ID on the X-axis
    -q|IdQ          Plot a particular query sequence ID on the Y-axis
    -R|Rfile        Plot an ordered set of reference sequences from Rfile
    -Q|Qfile        Plot an ordered set of query sequences from Qfile
                    Rfile/Qfile Can either be the original DNA multi-FastA
                    files or lists of sequence IDs, lens and dirs [ /+/-]
    -r|rport        Specify the port to send reference ID and position on
                    mouse double click in X11 plot window
    -q|qport        Specify the port to send query IDs and position on mouse
                    double click in X11 plot window
    -s|size         Set the output size to small, medium or large
                    --small --medium --large (default 'small')
    -S
    --SNP           Highlight SNP locations in each alignment
    -t|terminal     Set the output terminal to x11, postscript or png
                    --x11 --postscript --png (default 'x11')
    -t|title        Specify the gnuplot plot title (default none)
    -x|xrange       Set the xrange for the plot '[min:max]'
    -y|yrange       Set the yrange for the plot '[min:max]'
    -V
    --version       Display the version information and exit
</pre>
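
=== Example workflow ===
The four tools above are typically chained: nucmer writes a .delta file, delta-filter reduces it, and show-aligns or mummerplot displays the result. The following is a minimal Python sketch of that chain, not an official wrapper. It only builds argument lists from options listed in the help texts above and prints them without running anything. The helper function names, the file names (ref.fasta, qry.fasta), the prefix "demo" and the sequence IDs are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def nucmer_cmd(reference: str, query: str, prefix: str = "out",
               maxmatch: bool = False) -> List[str]:
    """Build a nucmer call; nucmer writes its alignments to <prefix>.delta."""
    cmd = ["nucmer", "-p", prefix]
    if maxmatch:
        # Use all anchor matches regardless of uniqueness.
        cmd.append("--maxmatch")
    return cmd + [reference, query]


def delta_filter_cmd(delta: str, mode: str = "-m",
                     min_identity: Optional[float] = None,
                     min_length: Optional[int] = None) -> List[str]:
    """Build a delta-filter call; the filtered delta is written to stdout."""
    if mode not in {"-1", "-g", "-m", "-q", "-r"}:
        raise ValueError(f"unsupported filter mode: {mode}")
    cmd = ["delta-filter", mode]
    if min_identity is not None:
        cmd += ["-i", str(min_identity)]   # minimum identity, 0 to 100
    if min_length is not None:
        cmd += ["-l", str(min_length)]     # minimum alignment length
    return cmd + [delta]


def show_aligns_cmd(delta: str, ref_id: str, qry_id: str,
                    sort_by_reference: bool = False) -> List[str]:
    """Build a show-aligns call for one reference/query sequence pair."""
    cmd = ["show-aligns"]
    if sort_by_reference:
        cmd.append("-r")
    return cmd + [delta, ref_id, qry_id]


def mummerplot_cmd(delta: str, prefix: str = "out",
                   png: bool = True) -> List[str]:
    """Build a mummerplot call; --png selects the png terminal."""
    cmd = ["mummerplot", "-p", prefix]
    if png:
        cmd.append("--png")
    return cmd + [delta]


if __name__ == "__main__":
    # Hypothetical inputs; only the command lines are printed.
    steps = [
        nucmer_cmd("ref.fasta", "qry.fasta", prefix="demo"),
        delta_filter_cmd("demo.delta", "-1", min_identity=90, min_length=1000),
        show_aligns_cmd("demo.filtered.delta", "ref1", "qry1",
                        sort_by_reference=True),
        mummerplot_cmd("demo.filtered.delta", prefix="demo"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Because delta-filter writes to STDOUT, actually running the second step would require redirecting its output to a file (demo.filtered.delta in this sketch), for example by passing an open file as the stdout argument of subprocess.run.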