Gk arrays

From SEQwiki
Revision as of 12:58, 7 October 2011 by Rivals.

Application data

Biological application domain(s): Genomics, transcriptomics, assembly, ...
Principal bioinformatics method(s): Assembly, error correction, mapping
Technology: All
Maintained: Yes
Input format(s): FASTA, FASTQ, Multi-FASTA
Output format(s): none
Software features: programming library
Programming language(s): C++
Licence: CeCILL-C license
Operating system(s): Linux, Linux 64, Mac OS X, any

Summary: Gk-arrays are a data structure to index the k-mers in a collection of reads (short sequences provided by Next Generation Sequencing machines). It is distributed as a C++ library to build and query Gk-arrays. This code is freely available under a CeCILL-C license.


Description

 Gk-arrays are a data structure to index the k-mers in a collection
 of reads (short sequences provided by Next Generation Sequencing
 machines), much as the well-known suffix tree is a data structure
 for indexing the substrings of a text. Given a parameter k, the
 Gk-arrays index all occurrences of k-mers in the input collection of
 reads (given as a file containing many short sequences, some of which
 may be identical but have different names). Once the index is built
 and kept in memory, a program can send queries to it. Seven query
 types are supported; for example, given a position in the
 collection, compute the number of reads that share the k-mer
 occurring at that position. Gk-arrays are designed to answer large
 numbers of queries (thousands or even millions). Compared with
 alternative solutions such as hash tables, Gk-arrays are efficient
 both in memory usage and in construction and query time.
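
To make the example query concrete, here is a minimal C++ sketch of a naive k-mer index over a read collection. It is not the Gk-arrays library or its API: the names NaiveKmerIndex, KmerPos, occurrences and readsSharing are hypothetical. The sketch simply stores every k-mer occurrence as a (read, offset) pair, sorts these pairs by the text of their k-mer (in the spirit of a suffix array), and answers a query with binary search. It makes no attempt at the memory efficiency that Gk-arrays are designed for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical illustration only: NOT the Gk-arrays library or its API.
// Stores every k-mer occurrence as (read, offset), sorted by k-mer text.
struct KmerPos {
    std::size_t read;    // index of the read in the collection
    std::size_t offset;  // start of the k-mer inside that read
};

class NaiveKmerIndex {
public:
    // Assumes k >= 1. Reads shorter than k contribute no k-mers, so
    // collections with reads of varying length are handled naturally.
    NaiveKmerIndex(std::vector<std::string> reads, std::size_t k)
        : reads_(std::move(reads)), k_(k) {
        for (std::size_t r = 0; r < reads_.size(); ++r)
            for (std::size_t o = 0; o + k_ <= reads_[r].size(); ++o)
                positions_.push_back({r, o});
        // Lexicographic order of k-mers, in the spirit of a suffix array.
        std::sort(positions_.begin(), positions_.end(),
                  [this](const KmerPos& a, const KmerPos& b) {
                      return kmer(a) < kmer(b);
                  });
    }

    // Total number of occurrences of the k-mer starting at (read, offset).
    std::size_t occurrences(std::size_t read, std::size_t offset) const {
        auto [first, last] = equalRange(checkedKmer(read, offset));
        return static_cast<std::size_t>(last - first);
    }

    // Number of distinct reads containing the k-mer starting at (read, offset).
    std::size_t readsSharing(std::size_t read, std::size_t offset) const {
        auto [first, last] = equalRange(checkedKmer(read, offset));
        std::vector<std::size_t> ids;
        for (auto it = first; it != last; ++it) ids.push_back(it->read);
        std::sort(ids.begin(), ids.end());
        return static_cast<std::size_t>(
            std::unique(ids.begin(), ids.end()) - ids.begin());
    }

private:
    using Iter = std::vector<KmerPos>::const_iterator;

    // View of the k characters starting at the given occurrence.
    std::string_view kmer(const KmerPos& p) const {
        return std::string_view(reads_[p.read]).substr(p.offset, k_);
    }

    // Precondition: a full k-mer starts at (read, offset).
    std::string_view checkedKmer(std::size_t read, std::size_t offset) const {
        assert(read < reads_.size() && offset + k_ <= reads_[read].size());
        return kmer(KmerPos{read, offset});
    }

    // Two binary searches delimit the block of occurrences equal to key.
    std::pair<Iter, Iter> equalRange(std::string_view key) const {
        auto lo = std::lower_bound(
            positions_.begin(), positions_.end(), key,
            [this](const KmerPos& p, std::string_view k) { return kmer(p) < k; });
        auto hi = std::upper_bound(
            lo, positions_.end(), key,
            [this](std::string_view k, const KmerPos& p) { return k < kmer(p); });
        return {lo, hi};
    }

    std::vector<std::string> reads_;
    std::size_t k_;
    std::vector<KmerPos> positions_;
};
```

For example, with the reads ACGTA, CGTAC and AAAA and k = 3, the k-mer CGT starting at offset 1 of the first read occurs twice, in two different reads, whereas AAA occurs twice but within a single read. Building the sketch costs O(N log N) k-mer comparisons for N occurrences, and each query needs two binary searches.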

 Reads are produced by sequencing machines in genomics projects
 (genome, transcriptome, epigenome, metagenome or metatranscriptome
 sequencing). A single run can yield millions of reads, which poses
 a new challenge for indexing algorithms. Gk-arrays accept files in
 which all reads have the same length as well as files with reads of
 varying length. The files can be in FASTA or FASTQ format.
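
Since reads may differ in length, a program building a k-mer index only needs the sequences themselves. The following sketch extracts them from FASTA input (including multi-record, line-wrapped FASTA) or FASTQ input. The function name readSequences is hypothetical and this is not the library's own parser; it assumes unwrapped FASTQ records of exactly four lines.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of the Gk-arrays library.
// Returns every sequence from a FASTA (possibly multi-line) or a
// 4-line-per-record FASTQ stream. Record names are dropped; reads may
// have different lengths.
std::vector<std::string> readSequences(std::istream& in) {
    std::vector<std::string> seqs;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            seqs.emplace_back();  // FASTA header: start a new, empty record
        } else if (line[0] == '@') {
            // FASTQ record: sequence line, '+' separator line, quality line.
            // Consuming the quality line here means a quality string that
            // starts with '@' or '>' is never mistaken for a header.
            std::string seq, plus, qual;
            std::getline(in, seq);
            std::getline(in, plus);
            std::getline(in, qual);
            if (!seq.empty() && seq.back() == '\r') seq.pop_back();
            seqs.push_back(seq);
        } else if (!seqs.empty()) {
            seqs.back() += line;  // FASTA sequence line, possibly wrapped
        }
    }
    return seqs;
}
```

Because record names are discarded, identical sequences with different names are kept as separate reads, matching the input described above.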

 This page provides access to the code of a C++ library to build and
 use Gk-arrays. The code is freely available under a CeCILL-C
 license.


Links


References

  1. . 2011. BMC Bioinformatics

