ATAC

From SEQwiki

Application data

Created by Walenz B, Florea L, Mobarry C, Sutton G
Principal bioinformatics method(s) Sequence assembly validation, Sequence alignment
Created at Celera
Maintained? Maybe
Input format(s) FASTA
Output format(s) Custom
Operating system(s) Linux
Contact: brianwalenz@users.sf.net

Summary: ATAC is a computational process for comparative mapping between two genome assemblies, or between two different genomes.

Description

Assembly To Assembly Comparison (ATAC), sometimes referred to as 'A2Amapper', is a fast way to compare whole genome assemblies or whole genomes.

NOTE: ATAC is currently available as part of the k-mer package

Method

Taken from the paper:

A2Amapper is based on the identification of seed alignments, in this case unique exact matches, followed by a more aggressive local alignment phase between seeds within nonoverlapping chains of seeds. Cutoffs were carefully tuned to balance sensitivity (finding all correlations), specificity (finding only the true ones), and computational requirements (see Data Set 1). Details about A2Amapper will be presented elsewhere (H.S., J.R.M., C.M.M., M.J.F., S.Y., and G.G.S., unpublished work; R.L., X. Zhao, L.F., C.M.M., and S.I., unpublished work).

A2Amapper produces a set of one-to-one matches that are alignments of nearly identical pairs of segments imputed to be analogous up to polymorphisms. Each match aligns a segment of the target genome against a segment of NCBI-34. The segments are nonoverlapping by construction, and we consider the coverage of NCBI-34 to be the sum of the lengths of these segments.

This set of matches is the basis for further analysis regarding correctness of order and orientation for which we develop three concepts: runs, heaviest common subsequence, and clumps. One match is consistent with another if in each assembly the segments of the matches are in the same relative order and orientation with no intervening matches between them. A run is a maximal chain of consistent matches. The heaviest common subsequence between two genomes is a subset of the matches for which the sum of the lengths of the matches is maximal and removing all other matches from consideration leaves a single run. Intuitively, the heaviest common subsequence is a global measure of the largest subset of the two assemblies that agree with each other. A clump is a run of 50 kbp or more that can be obtained by eliminating out-of-order matches, giving a local equivalent of the heaviest common subsequence (Supporting Text 1).
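The definitions of coverage, runs, and heaviest common subsequence quoted above can be illustrated with a small sketch. This is not ATAC's implementation or its file format. The `Match` class, its field names, and the functions `coverage`, `find_runs`, and `heaviest_single_run` are hypothetical names chosen for illustration. The sketch assumes a single sequence per assembly and one length per match, and uses a simple quadratic dynamic program.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    """Hypothetical one-to-one match (not ATAC's own record format)."""
    a_start: int          # segment start in assembly A
    b_start: int          # segment start in assembly B
    length: int           # segment length (assumed equal in A and B)
    forward: bool = True  # True if both segments have the same orientation


def coverage(matches: List[Match]) -> int:
    # Segments are nonoverlapping, so coverage is the sum of their lengths.
    return sum(m.length for m in matches)


def find_runs(matches: List[Match]) -> List[List[Match]]:
    """Group matches into maximal chains of consistent neighbours."""
    by_a = sorted(matches, key=lambda m: m.a_start)
    by_b = sorted(matches, key=lambda m: m.b_start)
    b_rank = {m: i for i, m in enumerate(by_b)}
    runs: List[List[Match]] = []
    for m in by_a:
        if runs:
            prev = runs[-1][-1]
            # Adjacent in A; must also be adjacent in B (no intervening
            # match), stepping up if forward and down if reversed.
            step = 1 if prev.forward else -1
            if m.forward == prev.forward and b_rank[m] == b_rank[prev] + step:
                runs[-1].append(m)
                continue
        runs.append([m])
    return runs


def heaviest_single_run(matches: List[Match]) -> int:
    """Largest total length of a subset that, taken alone, is one run."""
    def best(ms: List[Match], sign: int) -> int:
        ms = sorted(ms, key=lambda m: m.a_start)
        weight: List[int] = []
        for i, m in enumerate(ms):
            w = m.length
            for j in range(i):
                # Same relative order in B (reversed order when sign is -1).
                if sign * ms[j].b_start < sign * m.b_start:
                    w = max(w, weight[j] + m.length)
            weight.append(w)
        return max(weight, default=0)

    fwd = [m for m in matches if m.forward]
    rev = [m for m in matches if not m.forward]
    return max(best(fwd, 1), best(rev, -1))


example = [Match(0, 0, 100), Match(200, 200, 50),
           Match(400, 900, 30), Match(600, 400, 80)]
print(len(find_runs(example)), coverage(example), heaviest_single_run(example))
# prints: 3 260 230
```

In the example, the third match is out of order in assembly B, so it splits the matches into three runs. Dropping it leaves a single run of total length 230, which is the weight of the heaviest common subsequence under these simplified assumptions.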


Usage

  • Download from SF [1]
    • There is no packaged download yet, so check out the source from SVN:
svn co https://kmer.svn.sourceforge.net/svnroot/kmer-code
  • Build. For instructions, see [2]
    • You will need the Python development libraries, which provide include/Python.h. Where the instructions call for gmake, plain make will probably work.




References

  1. . 2004. PNAS

