22035330

From SEQwiki
Revision as of 02:54, 10 November 2011 by Dnewkirk (talk | contribs) (Created page with "{{Reference |reference describes=AREM |pmid=22035330 |title=AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization |year=2011 |journal=Journal of Computation...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

This reference describes AREM.

PMID PMID 22035330
Title AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization
Year 2011
Journal Journal of Computational Biology
Author Newkirk D, Biesinger J, Chon A, Yokomori K, Xie X.
Volume 18
Start page 1-12"-12" can not be assigned to a declared number type with value 1.


Error: No contents found at URL http://www.ebi.ac.uk/europepmc/webservices/rest/MED/22035330/citations/4000.

According to Europe PubMed Central, this reference has Error: no local variable "citations" was set. " Error: no local variable "citations" was set. " is not a number. citations.

For reference, you can check Google Scholar, which lacks an API because Google ...


Error: Invalid JSON. According to Almetric, this reference has an Altmetric score of Error: no local variable "altscore" was set. " Error: no local variable "altscore" was set. " is not a number..

Full text description

High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here, we introduce a probabilistic approach for ChIP-Seq data analysis that utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem.