

ABySS

6,136 bytes added, 17:19, 3 December 2009
|created at=British Columbia Cancer Agency
|input format=FASTA, FASTQ,
|licence=Free to academics,
}}
Here is the README:

ABySS - assemble short reads into contigs

* Compiling ABySS

Compiling ABySS should be as easy as
./configure && make

To install ABySS in a specified directory
./configure --prefix=/opt/ABySS && make && sudo make install

If you wish to build the parallel assembler with MPI support,
MPI should be found in /usr/include and /usr/lib or its location
specified to configure:
./configure --with-mpi=/usr/lib/openmpi && make

ABySS should be built using Google sparsehash to reduce memory usage,
although it will build without. Google sparsehash should be found in
/usr/include or its location specified to configure:
./configure CPPFLAGS=-I/usr/local/include

The default maximum k-mer size is 64 and may be decreased to reduce
memory usage or increased at compile time:
./configure --enable-maxk=96 && make

To run ABySS, its binaries should be found in your PATH.

* Single-end assembly

Assemble short reads in a file named reads.fa into contigs in a
file named contigs.fa with the following command:
ABYSS -k25 reads.fa -o contigs.fa
where -k is an appropriate k-mer length. The only method to find the
optimal value of k is to run multiple trials and inspect the results.
The following shell snippet will assemble for every value of k from 20
to 40.
for k in {20..40}; do
    ABYSS -k$k reads.fa -o contigs-k$k.fa
done

The maximum value for k is 64. This limit may be changed at compile
time using the --enable-maxk option of configure. It may be decreased
to 32 to decrease memory usage, which is particularly useful for large
parallel jobs, or increased to 96.

* Paired-end assembly

To assemble paired short reads in two files named reads1.fa and
reads2.fa into contigs in a file named ecoli-contigs.fa, run the
command:
abyss-pe k=25 n=10 in='reads1.fa reads2.fa' name=ecoli
where k is the k-mer length as before.
n is the minimum number of pairs needed to consider joining two
contigs.
The optimal value for n must be found by trial.
in specifies the input files to read, which may be in FASTA, FASTQ,
qseq or export format and compressed with gz, bz2 or xz.
The assembled contigs will be stored in ${name}-contigs.fa.

The suffix of the read identifier for a pair of reads must be one of
'1' and '2', or 'A' and 'B', or 'F' and 'R', or 'F3' and 'R3', or
'forward' and 'reverse'. The reads may be interleaved in the same file
or found in different files; however, interleaved mates will use less
memory.

abyss-pe is a driver script implemented as a Makefile and runs a
single-end assembly, as described above, and the following commands,
which must be found in your PATH:
ABYSS - the single-end assembler
AdjList - finds overlaps of length k-1 between contigs
KAligner - aligns reads to contigs
ParseAligns - finds pairs of reads in alignments
DistanceEst - estimates distances between contigs
Overlap - finds overlaps between blunt contigs
SimpleGraph - finds paths between pairs of contigs
MergePaths - merges consistent paths
Consensus - for a colour-space assembly, converts the colour-space
    contigs to nucleotide contigs

* Paired-end assembly of multiple fragment libraries

The distribution of fragment sizes of each library is calculated
empirically by aligning paired reads to the contigs produced by the
single-end assembler, and the distribution is stored in a file with
the extension .hist, such as ecoli-4.hist. The N50 of the single-end
assembly must be well over the fragment-size to obtain an accurate
empirical distribution.

Here's an example scenario of assembling a data set with two different
fragment libraries and single-end reads:

Library lib1 has reads in two files, lib1_1.fa and lib1_2.fa.
Library lib2 has reads in two files, lib2_1.fa and lib2_2.fa.
Single-end reads are stored in two files se1.fa and se2.fa.

The command line to assemble this example data set is:
abyss-pe -j2 k=25 n=10 name=ecoli lib='lib1 lib2' \
    lib1='lib1_1.fa lib1_2.fa' lib2='lib2_1.fa lib2_2.fa' \
    se='se1.fa se2.fa'

The paired-end assembly of lib1 and lib2 may be run in parallel by
specifying the -j option of make to abyss-pe, which is implemented as
a Makefile script. The -j option should be set to the number of
libraries, but setting it higher will not cause any trouble.

The empirical distribution of fragment sizes will be stored in two
files named lib1-3.hist and lib2-3.hist. These files may be plotted to
check that the empirical distribution agrees with the expected
distribution. The assembled contigs will be stored in
${name}-contigs.fa.

Reads without mates should be placed in a file specified by the `se'
(single-end) parameter. Reads without mates in the paired-end files
will slow down the paired-end assembler considerably during the
ParseAligns stage.

* Parallel assembly

The `np' option of abyss-pe specifies the number of processes to
use for the ABYSS-P parallel MPI job. Without any MPI configuration,
this will allow you to make use of multiple cores on a single machine.
To use multiple machines for assembly, you must create a hostfile for
mpirun, which is described in the mpirun man page.

The paired-end assembly runs on a single processor. For very large
jobs, a good portion of the paired-end assembly (KAligner,
ParseAligns, DistanceEst) may be run in parallel as separate processes,
but this process is not automated by the driver script abyss-pe.

Open MPI integrates well with SGE (Sun Grid Engine). For example, to
submit an array of jobs to assemble every odd value of k between 51
and 63 using 64 processes for each job:
qsub -pe openmpi 64 -t 51-63:2 -N testing abyss-pe in=reads.fa n=10

For more information on using SGE and qsub, please refer to the qsub
manual page. Open MPI must have been compiled with support for SGE
using the ./configure --with-sge option.
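
The SGE array job above covers the task IDs 51, 53, ..., 63. As a rough sketch, the same sweep can be previewed without SGE by printing one abyss-pe command per value of k. The helper name `sweep_commands` and the `name=sweep-k$k` naming scheme are illustrative assumptions and not part of ABySS; only the `np`, `k`, `n`, `in` and `name` parameters are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABySS): print one abyss-pe command
# for each k from START to END in increments of STEP, using NP processes.
# Nothing is executed here; the commands are only written to stdout.
sweep_commands() {
    start=$1 end=$2 step=$3 np=$4
    k=$start
    while [ "$k" -le "$end" ]; do
        # name=sweep-k$k is an assumed naming scheme that keeps the
        # output files of each trial separate.
        echo "abyss-pe np=$np k=$k n=10 in=reads.fa name=sweep-k$k"
        k=$((k + step))
    done
}

# Preview the commands for every odd k between 51 and 63.
sweep_commands 51 63 2 64
```

Piping the output to `sh` would run the trials one after another on the local machine, which is only sensible if ABySS was built with a maximum k-mer size that allows these values.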

* See also

Try `ABYSS --help' for more information on command line options, or
see the manual page in the file `ABYSS.1'.
Please refer to the mpirun manual page for information on configuring
parallel jobs.

Written by Jared Simpson and Shaun Jackman.
Subscribe to the users' mailing list at
http://www.bcgsc.ca/platformmailman/bioinfo/softwarelistinfo/abyss-users
Contact the users' mailing list at <abyss-users@bcgsc.ca>
or the authors directly at <abyss@bcgsc.ca>.
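
The README asks the reader to inspect the results of several trial assemblies and to check that the single-end N50 is well over the fragment size. One possible way to get that number is the minimal sketch below, which uses only standard awk and sort. The function name `n50` is hypothetical and not part of ABySS, and the sketch assumes plain FASTA input and counts every contig without a minimum-length cutoff.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABySS): print the N50 of a FASTA file,
# that is, the contig length L such that contigs of length >= L together
# hold at least half of all assembled bases.
n50() {
    # First pass: emit one length per record, summing wrapped sequence lines.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Longest contigs first.
    sort -rn |
    # Second pass: accumulate lengths until half of the total is reached.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 sum += lens[i]
                 if (2 * sum >= total) { print lens[i]; exit }
             }
         }'
}

# Example use after the k sweep from the single-end section:
#   for f in contigs-k*.fa; do echo "$f $(n50 "$f")"; done
```

Comparing the printed values across the contigs-k*.fa files gives a quick first ranking of the trials, although N50 alone does not show whether an assembly is correct.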
