DiffBind

From SEQwiki

Application data

Created by Rory Stark
Biological application domain(s) ChIP-seq
Principal bioinformatics method(s) differential binding sites
Technology Illumina, ABI SOLiD, 454, Any, Illumina HiSeq, Illumina Solexa
Created at Cancer Research UK - Cambridge Institute, University of Cambridge
Maintained? Yes
Input format(s) BAM, BED, MACS
Output format(s) CSV, BED, Differential sites/peaks, Peaks
Software features Multiple replicates information used; automated pipeline; finding hotspots
Programming language(s) R
Software libraries Bioconductor
Licence Artistic-2.0
Operating system(s) Linux, Mac OS X, Windows
Contact: rory.stark@cruk.cam.ac.uk

Summary: Differential binding analysis of ChIP-seq peak data. Computes differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) data. Also provides occupancy (overlap) analysis and plotting functions.


Description

DiffBind is a Bioconductor package which provides functions for processing ChIP-seq data enriched for genomic loci where specific protein/DNA binding occurs, including peak sets identified by ChIP-seq peak callers and aligned sequence read datasets. It is designed to work with multiple peak sets simultaneously, representing different ChIP experiments (antibodies, transcription factor and/or histone marks, experimental conditions, replicates) as well as managing the results of multiple peak callers.

The primary emphasis of the package is on identifying sites that are differentially bound between two sample groups. It includes functions to support the processing of peak sets, including overlapping and merging peak sets, counting sequencing reads overlapping intervals in peak sets, and identifying statistically significantly differentially bound sites based on evidence of binding affinity (measured by differences in read densities). To this end it uses statistical routines developed in an RNA-Seq context (primarily the Bioconductor packages edgeR and DESeq). Additionally, the package builds on R graphics routines to provide a set of standardized plots to aid in binding analysis.
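As a rough illustration of what "differences in read densities" means, the following Python sketch normalizes per-site read counts by library size and computes a log2 fold change between two sample groups. It is a conceptual example only: it is not DiffBind code, it does not reproduce the edgeR or DESeq statistical tests, and the function names, the counts-per-million normalization, and the pseudocount are assumptions made for illustration.

```python
import math


def counts_per_million(counts):
    """Scale one sample's per-site read counts by its total (library size)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-site log2 ratio of mean normalized density, group A over group B.

    Each group is a list of samples; each sample is a list of read counts,
    one per candidate binding site. The pseudocount (an assumption of this
    sketch) avoids division by zero at sites with no reads.
    """
    n_sites = len(group_a[0])

    def mean_density(group):
        normalized = [counts_per_million(sample) for sample in group]
        return [sum(s[i] for s in normalized) / len(normalized)
                for i in range(n_sites)]

    mean_a = mean_density(group_a)
    mean_b = mean_density(group_b)
    return [math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(mean_a, mean_b)]


# Two replicates per group, two candidate sites (made-up counts).
group_a = [[30, 70], [40, 60]]
group_b = [[70, 30], [60, 40]]
lfc = log2_fold_change(group_a, group_b)
# Site 0 loses affinity in group A (negative value); site 1 gains it.
```

A fold change alone says nothing about significance; the packages named above model the variability between replicates to attach a p-value and FDR to each site.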

DiffBind works primarily with peaksets, which are sets of genomic intervals representing candidate protein binding sites. Each interval consists of a chromosome, a start and end position, and usually a score of some type indicating confidence in, or strength of, the peak. Associated with each peakset are metadata relating to the experiment from which the peakset was derived. Additionally, files containing mapped sequencing reads (BAM/BED) can be associated with each peakset (one for the ChIP data, and optionally another representing a control dataset).
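The shape of a peakset as described above can be sketched as a small data structure. This is a conceptual Python illustration, not DiffBind's internal representation (DiffBind is an R package); the class names, field names, and file name below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Peak:
    """One genomic interval: chromosome, start, end, and an optional score."""
    chrom: str
    start: int
    end: int
    score: Optional[float] = None  # confidence in, or strength of, the peak


@dataclass
class Peakset:
    """A set of peaks plus experiment metadata and optional read files."""
    peaks: list
    metadata: dict = field(default_factory=dict)
    chip_reads: Optional[str] = None     # path to BAM/BED reads for the ChIP
    control_reads: Optional[str] = None  # optional control dataset


# Hypothetical example: two peaks from one treated replicate, no control.
example = Peakset(
    peaks=[Peak("chr1", 100, 250, 8.5), Peak("chr2", 40, 90)],
    metadata={"condition": "treated", "replicate": 1},
    chip_reads="sample1_chip.bam",
)
```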

Generally, processing data with DiffBind involves five phases:

  1. Reading in peaksets: The first step is to read in a set of peaksets and associated metadata. Peaksets are derived either from ChIP-seq peak callers, such as MACS (Zhang et al. [2008]), or using some other criterion (e.g. all the promoter regions in a genome). The easiest way to read in peaksets is using a comma-separated value (csv) sample sheet with one line for each peakset. A single experiment can have more than one associated peakset, e.g. if multiple peak callers are used for comparison purposes, and hence have more than one line in the sample sheet. Once the peaksets are read in, a merging function finds all overlapping peaks and derives a single set of unique genomic intervals covering all the supplied peaks.
  2. Occupancy analysis: Peaksets, especially those generated by peak callers, provide an insight into the potential occupancy of the protein being ChIPed for at specific genomic loci. After the peaksets have been loaded, it can be useful to perform some exploratory plotting to determine how these occupancy maps agree with each other, e.g. between experimental replicates (re-doing the ChIP under the same conditions), between different peak callers on the same experiment, and within groups of samples representing a common experimental condition. DiffBind provides functions to enable overlaps to be examined, as well as functions to determine how well similar samples cluster together. Beyond quality control, the product of an occupancy analysis may be a consensus peakset, representing an overall set of candidate binding sites to be used in further analysis.
  3. Counting reads: Once a consensus peakset has been derived, DiffBind can use the supplied sequence read files to count how many reads overlap each interval for each unique sample. The result of this is a binding affinity matrix containing a (normalized) read count for each sample at every potential binding site. With this matrix, the samples can be re-clustered using affinity, rather than occupancy, data. The binding affinity matrix is used for QC plotting as well as for subsequent differential analysis.
  4. Differential binding affinity analysis: The core functionality of DiffBind is the differential binding affinity analysis, which identifies binding sites that are statistically significantly differentially bound between sample groups. To accomplish this, first a contrast (or contrasts) is established, dividing the samples into groups to be compared. Next the core analysis routines are executed, by default using edgeR. This assigns a p-value and FDR to each candidate binding site, indicating the significance of its being differentially bound.
  5. Plotting and reporting: Once one or more contrasts have been run, DiffBind provides a number of functions for reporting and plotting the results. MA plots give an overview of the results of the analysis, while correlation heatmaps and PCA plots show how the groups cluster based on differentially bound sites. Boxplots show the distribution of reads within differentially bound sites corresponding to whether they gain or lose affinity between the two sample groups. A reporting mechanism enables differentially bound sites to be extracted for further processing, such as annotation and/or pathway analysis.
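The first three phases above (merging peaks into unique intervals, checking occupancy, and counting reads) can be sketched in a few lines of Python. This is a simplified conceptual illustration, not DiffBind's implementation or API: the function names are hypothetical, intervals are assumed to be half-open `(chrom, start, end)` tuples, two intervals are treated as overlapping when they share at least one base, and the sample data are made up.

```python
from collections import defaultdict


def merge_peaks(peaksets):
    """Merge overlapping intervals from several peaksets into unique ones."""
    by_chrom = defaultdict(list)
    for peaks in peaksets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is not None and start < cur_end:
                cur_end = max(cur_end, end)      # extend the open interval
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end  # open a new interval
        merged.append((chrom, cur_start, cur_end))
    return merged


def overlaps(chrom, start, end, other):
    """True if the interval shares at least one base with `other`."""
    o_chrom, o_start, o_end = other
    return o_chrom == chrom and o_start < end and o_end > start


def occupancy(intervals, peaksets):
    """For each interval, flag which peaksets contain an overlapping peak."""
    return [[any(overlaps(c, s, e, p) for p in peaks) for peaks in peaksets]
            for c, s, e in intervals]


def count_reads(intervals, reads):
    """Count the reads overlapping each interval (one row of the matrix)."""
    return [sum(1 for r in reads if overlaps(c, s, e, r))
            for c, s, e in intervals]


rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 260), ("chr2", 10, 50)]
consensus = merge_peaks([rep1, rep2])
# consensus == [("chr1", 100, 260), ("chr1", 500, 600), ("chr2", 10, 50)]

reads_rep1 = [("chr1", 120, 156), ("chr1", 250, 286),
              ("chr1", 590, 626), ("chr2", 60, 96)]
counts_rep1 = count_reads(consensus, reads_rep1)  # [2, 1, 0]
```

Repeating `count_reads` for every sample and stacking the results gives a site-by-sample count matrix of the kind that phases 4 and 5 operate on. The linear scans here favour readability; a real tool working with millions of reads would use indexed or sorted lookups.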



