Short description:
AREM: Aligning Short Reads from ChIP-sequencing by Expectation Maximisation
Software version:
Biological application domain(s) (Phylogenetics, Genomics, ...):
ChIP-seq
Principal bioinformatics method(s) (Assembly, Mapping, ...):
Peak calling, Read mapping
Technology (Sanger, Illumina, 454, SOLiD, Ion Torrent, ...):
Illumina
Interface (Command line, Web UI, Desktop GUI, SOAP WS, HTTP WS, API, QL):
Resource type (Command-line tool, Web application, Desktop application, Script, Suite, Workbench, Database portal, Workflow, Plug-in, Library, Web API, Web service, SPARQL endpoint):
University of California Irvine,
== Description ==

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-seq) is widely used to characterize the genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA-binding proteins. A key step in ChIP-seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-seq analysis, most existing methods consider only reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences.

Here, we introduce a probabilistic approach for ChIP-seq data analysis that utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (EM) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations.

AREM can be used both as a peak caller and for probabilistic determination of the most likely mapping locations for sequences. It takes the output of other mapping tools as input and determines the best mapping location(s) from it. Both peaks and the sequence locations (with associated probabilities) can be returned as output, along with plots showing the shift in entropy during the EM process.

This program was designed with low memory usage in mind, such that one can fit 120+ million alignments in memory while using less than 6 GB of memory (assuming one is including the optional plots, which require extra data to be stored in memory). Run time increases with the number of sequences involved, but is less than 30 minutes on average with 120+ million reads.
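The idea of updating alignment probabilities by EM can be illustrated with a toy sketch. This is not AREM's actual code or model: it assumes each read comes with a list of candidate regions and an alignment-quality weight, it replaces the mixture of K enriched regions plus background with a simple per-region enrichment weight, and the names `em_reassign`, `reads`, and `pseudocount` are hypothetical. The E-step redistributes each read across its candidate locations in proportion to quality times current region weight; the M-step re-estimates region weights from those fractional counts. The total entropy of the alignment probabilities is recorded at each iteration, loosely mirroring the entropy plots described above.

```python
import math
from collections import defaultdict


def em_reassign(reads, n_iter=20, pseudocount=1e-3):
    """Toy EM over multi-mapping reads (illustrative only, not AREM itself).

    reads: list of reads; each read is a list of (region_id, quality) pairs,
           one pair per candidate alignment reported by an upstream mapper.
    Returns (posteriors, region_weights, entropy_trace).
    """
    regions = sorted({region for cands in reads for region, _ in cands})
    # Start from uniform enrichment weights over all observed regions.
    weight = {region: 1.0 / len(regions) for region in regions}
    entropy_trace = []
    posteriors = []

    for _ in range(n_iter):
        # E-step: alignment probability is proportional to
        # alignment quality times the current weight of the region.
        posteriors = []
        for cands in reads:
            scores = [quality * weight[region] for region, quality in cands]
            total = sum(scores)
            posteriors.append([s / total for s in scores])

        # Track the total entropy of the alignment probabilities.
        entropy_trace.append(
            -sum(p * math.log(p) for post in posteriors for p in post if p > 0)
        )

        # M-step: region weights from fractional read counts.
        counts = defaultdict(float)
        for cands, post in zip(reads, posteriors):
            for (region, _), p in zip(cands, post):
                counts[region] += p
        norm = sum(counts[region] + pseudocount for region in regions)
        weight = {
            region: (counts[region] + pseudocount) / norm for region in regions
        }

    return posteriors, weight, entropy_trace


# Hypothetical input: one read maps equally well to regions "A" and "B";
# five reads map uniquely to "A" and one maps uniquely to "B".
reads = [[("A", 1.0), ("B", 1.0)]] + [[("A", 1.0)]] * 5 + [[("B", 1.0)]]
posteriors, weight, entropy_trace = em_reassign(reads)
print("multi-read alignment probabilities:", posteriors[0])
print("entropy, first vs last iteration:", entropy_trace[0], entropy_trace[-1])
```

In this toy input, the ambiguous read starts out split evenly between the two regions and is gradually pulled toward the region with more supporting reads, so the entropy of the alignment probabilities falls over the iterations. A real run would also need a background component and position-level modeling within regions, which this sketch omits.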