MuSICA 2

From SEQwiki

Application data

Biological application domain(s): Clone verification
Principal bioinformatics method(s): Sequence assembly
Maintained?: Maybe
Programming language(s): Perl

Summary: Assembles millions of short (36-nucleotide) reads collected from a single flow-cell lane of an Illumina Genome Analyzer in order to shotgun-sequence ~800 human full-length cDNA clones.

Sequencing full-length cDNA clones is the most robust way to annotate genes. With the advent of new-generation sequencing instruments such as the Illumina Genome Analyzer (Illumina GA) and the Applied Biosystems SOLiD, genome-wide approaches including RNA sequencing have succeeded in capturing both the expressed regions and the expression levels of genes without cloning individual mRNAs. However, RNA expression levels vary over at least four orders of magnitude, so weakly expressed transcripts cannot be captured robustly without cloning or other normalization techniques. Moreover, overlapping transcripts (alternative splicing variants, alternative transcription start/end sites) are difficult to distinguish by RNA sequencing unless the splice junctions are already known.

MuSICA 2 aims at shotgun sequencing ~800 full-length cDNA clones using one flow-cell lane of the Illumina GA. First, the target full-length cDNA clones are amplified by PCR and mixed in equal volumes, regardless of PCR efficiency. The mixture is nebulized to yield a shotgun library, which is then sequenced on one flow-cell lane of the Illumina GA. Finally, MuSICA 2 assembles the shotgun reads together with Sanger reads from the ends of the cDNA clones and reconstructs the cDNA sequences with the help of the reference genome sequence. Sanger reads from both ends are recommended; a read from at least one end is required to associate the output contigs with individual clones.

The key feature of MuSICA 2 is that it assembles shotgun reads with a hybrid of de novo assembly and reference assembly, in which shotgun reads are aligned against the reference genome. The hybrid strategy combines the advantages of both approaches.
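
The hybrid idea can be illustrated with a toy sketch in Python. This is not MuSICA 2's actual code (the tool itself is written in Perl) and it makes several simplifying assumptions: the exon and intron sequences are invented, reads are 8 nt rather than 36 nt, "alignment" is an exact substring match, and the de novo step is a naive greedy overlap merge. All function names are hypothetical. The sketch only shows why the two strategies might complement each other: reads inside exons can be placed on the genome, while reads spanning an exon-exon junction cannot, and have to be joined by overlap.

```python
# Toy hybrid assembly: reference placement plus de novo overlap merging.
# All sequences and names are invented for illustration (assumptions).
EXON1 = "ATGGCTAAGTTCACC"
INTRON = "GTGTGGGGGGGGCCCAG"
EXON2 = "GAGTCTGATTGCATA"
GENOME = EXON1 + INTRON + EXON2   # reference: exons separated by an intron
CDNA = EXON1 + EXON2              # clone insert: the spliced transcript
READ_LEN = 8                      # toy value; the source describes 36-nt reads


def place_on_reference(reads, reference):
    """Reference step: exact-match placement, then merge overlapping hits.

    Returns (reference-derived contigs, reads that could not be placed).
    """
    hits, unplaced = [], []
    for read in reads:
        pos = reference.find(read)            # first exact occurrence, or -1
        if pos >= 0:
            hits.append((pos, pos + len(read)))
        else:
            unplaced.append(read)
    merged = []
    for start, end in sorted(hits):
        if merged and start < merged[-1][1]:  # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [reference[s:e] for s, e in merged], unplaced


def suffix_prefix_overlap(a, b, min_overlap):
    """Longest suffix of a that equals a prefix of b, or 0 if too short."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs):
    """Remove duplicates and sequences that are substrings of another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_merge(seqs, min_overlap):
    """De novo step: repeatedly merge the pair with the longest overlap."""
    seqs = drop_contained(seqs)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i == j:
                    continue
                k = suffix_prefix_overlap(a, b, min_overlap)
                if k > best_k:
                    best_k, best_i, best_j = k, i, j
        if best_k == 0:                       # nothing left to merge
            return seqs
        joined = seqs[best_i] + seqs[best_j][best_k:]
        rest = [s for n, s in enumerate(seqs) if n not in (best_i, best_j)]
        seqs = drop_contained(rest + [joined])


def hybrid_assemble(reads, reference, min_overlap):
    """Place reads on the reference, then bridge contigs with unplaced reads."""
    ref_contigs, unplaced = place_on_reference(reads, reference)
    contigs = greedy_merge(ref_contigs + unplaced, min_overlap)
    return contigs, ref_contigs, unplaced


# Simulated shotgun reads: every 8-mer of the spliced insert.
reads = [CDNA[i:i + READ_LEN] for i in range(len(CDNA) - READ_LEN + 1)]
contigs, ref_contigs, unplaced = hybrid_assemble(reads, GENOME, min_overlap=4)
print(len(ref_contigs), "reference contigs,", len(unplaced), "unplaced reads")
print("insert reconstructed:", contigs == [CDNA])
```

In this toy case the reference step alone leaves two exon-level contigs that share too little sequence to be joined directly, and the junction-spanning reads left unplaced by that step are what bridge them in the overlap step. A real pipeline would use a spliced or gapped aligner, tolerate sequencing errors, and use the Sanger end reads to assign contigs to clones, none of which is modeled here.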