Gk arrays

Application data

Created by Philippe N, Salson M, Lecroq T, Léonard M, Commes T, Rivals E.
Biological application domain(s) Genomics, Transcriptomics, Metagenomics
Principal bioinformatics method(s) Sequence assembly, Sequence error correction, Read mapping
Technology Illumina, 454, Sanger
Created at LIRMM, UMR 5506, CNRS and Université de Montpellier 2, CC 477, 161 rue Ada, 34095 Montpellier, France. rivals@lirmm.fr.

LITIS EA 4108, Université de Rouen, 1 rue Thomas Becket, 76821 Mont-Saint-Aignan Cedex, France.

LIFL, UMR 8022 CNRS and Université Lille 1 and INRIA Lille-Nord-Europe, Bât. M3 - UFR IEEA, 59655 Villeneuve d'Ascq Cedex, France.

Maintained? Yes
Input format(s) FASTA, Fastq, Multi-FastA
Software features programming library
Programming language(s) C++
Licence CeCILL-C license
Operating system(s) Linux, Linux 64, Mac OS X, any

Summary: Gk-arrays are a data structure to index the k-mers in a collection of reads.

Description

Gk-arrays are a data structure to index the k-mers in a collection of reads (short sequences produced by Next Generation Sequencing machines). By analogy, the well-known suffix tree is a data structure that indexes the (sub)words of a text. Given a parameter k, the Gk-arrays index all k-mer occurrences in the input collection of reads, which is given as a file containing many short sequences; some of them can be identical but have different names.

Once the index is built and kept in memory, a program can query it. There are 7 query types; for example: given a position in the collection, compute the number of reads that share the k-mer at this position. Gk-arrays are designed to answer many queries (thousands or even millions). Compared to other possible solutions, such as hash tables, Gk-arrays are efficient in terms of memory, construction time, and query time.

Reads are produced by sequencing machines in the context of genomics projects (genome, transcriptome, epigenome, metagenome, or metatranscriptome sequencing). A single run can yield millions of reads, which represents a novel challenge for indexing algorithms.
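To make the example query concrete, here is a minimal C++ sketch of a naive hash-table index, which is one of the alternative solutions mentioned above. It is not the Gk-arrays library and does not use its API: the name NaiveKmerIndex, its members, and the choice to address a position as a (read number, offset) pair are all assumptions made for illustration. It only shows what "the number of reads that share this k-mer" means, and makes no attempt at the memory or time efficiency of Gk-arrays.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical illustration only: NOT the Gk-arrays library or its API.
// A naive hash-table index over the k-mers of a collection of reads.
struct NaiveKmerIndex {
    std::size_t k;                   // k-mer length
    std::vector<std::string> reads;  // the read collection, kept in memory
    // k-mer -> numbers of the distinct reads that contain it
    std::unordered_map<std::string, std::unordered_set<std::size_t>> readsByKmer;

    NaiveKmerIndex(const std::vector<std::string>& input, std::size_t kmerLength)
        : k(kmerLength), reads(input) {
        if (k == 0) return;  // nothing to index
        for (std::size_t r = 0; r < reads.size(); ++r) {
            const std::string& read = reads[r];
            // Reads may have varying lengths; reads shorter than k have no k-mer.
            for (std::size_t p = 0; p + k <= read.size(); ++p) {
                readsByKmer[read.substr(p, k)].insert(r);
            }
        }
    }

    // Number of distinct reads containing the k-mer that starts at offset
    // `pos` of read `readId`. Returns 0 if no k-mer starts at that position.
    std::size_t countReadsSharingKmer(std::size_t readId, std::size_t pos) const {
        if (k == 0 || readId >= reads.size()) return 0;
        const std::string& read = reads[readId];
        if (pos + k > read.size()) return 0;
        auto it = readsByKmer.find(read.substr(pos, k));
        return it == readsByKmer.end() ? 0 : it->second.size();
    }
};
```

With the reads ACGTAC, CGTA and TTTT and k = 3, querying offset 1 of the first read (the k-mer CGT) counts two reads, because CGT also occurs in the second read. A k-mer that occurs several times within the same read, such as TTT in TTTT, is counted once for that read.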

Gk-arrays accept files where all reads have the same length, but also files with reads of varying length. The files can be in FASTA/FASTQ format.

This page provides access to the code of a C++ library to build and use Gk-arrays. This code is freely available under a CeCILL-C license.



References

  1. . 2011. BMC Bioinformatics

