illumina450k_filtering

A collection of resources to filter 'bad' probes from the Illumina 450k and EPIC methylation arrays

https://github.com/sirselim/illumina450k_filtering


Keywords

bioinformatics methylation-analysis methylation-microarrays probes

Repository

Basic Info
  • Host: GitHub
  • Owner: sirselim
  • Default Branch: master
  • Homepage:
  • Size: 22.8 MB
Statistics
  • Stars: 30
  • Watchers: 2
  • Forks: 25
  • Open Issues: 0
  • Releases: 0
Created over 10 years ago · Last pushed over 2 years ago

README.md

Illumina methylation array probe filtering (450k and EPIC/850k)

A collection of resources to filter 'bad'/cross-reactive/variant probes from the Illumina methylation arrays during QC stages of pipelines/analysis.

450k array

BOWTIE2 mapping of 450k probes

All probe sequences were mapped to the human genome (hg19) using BOWTIE2 to identify potential hybridisation issues.

  • 33,457 probes were identified as aligning more than once
  • these are made available in HumanMethylation450_15017482_v.1.1_hg19_bowtie_multimap.txt

Additional non-specific probes

Chen et al. identified a series of non-specific probes across the 450k design.

Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R: Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 2013, 8:203–9.

  • there are a total of 29,233 probes
  • these are available in 48639-non-specific-probes-Illumina450k.csv

Note: there is overlap between the two probe sets.
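To see how much the two lists overlap before combining them, base R set operations are enough. The sketch below uses made-up probe IDs purely for illustration; in practice the two vectors would be read from the files listed above.

```R
# toy probe IDs (hypothetical, for illustration only)
multi.map.probes   <- c('cg00000001', 'cg00000002', 'cg00000003')
cross.react.probes <- c('cg00000002', 'cg00000003', 'cg00000004', 'cg00000005')

# probes present in both lists
shared.probes <- intersect(multi.map.probes, cross.react.probes)
length(shared.probes)

# combined, de-duplicated list
all.probes <- union(multi.map.probes, cross.react.probes)
length(all.probes)
```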

Remember to also include any probes that fail detection:

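```R
# process failed probes
library(minfi)  # provides detectionP(); RGset is assumed to be an RGChannelSet

detP <- detectionP(RGset)
failed <- detP > 0.01
colMeans(failed)               # fraction of failed positions per sample
sum(rowMeans(failed) > 0.5)    # how many positions failed in >50% of samples?
# index the rownames directly so a single failing probe is still returned
failed.probes <- rownames(detP)[rowMeans(failed) > 0.5]
```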
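The detection step can be tried without any array data. This is a minimal sketch on a made-up detection p-value matrix, using the same 0.01 cutoff and the same "failed in more than 50% of samples" rule; the probe and sample names are hypothetical.

```R
# toy detection p-value matrix: 4 probes x 3 samples (values are made up)
detP <- matrix(c(0.001, 0.200, 0.300,   # fails in 2 of 3 samples
                 0.001, 0.002, 0.001,   # passes in every sample
                 0.500, 0.600, 0.700,   # fails in every sample
                 0.001, 0.001, 0.050),  # fails in 1 of 3 samples
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0('cg0000000', 1:4),
                               paste0('sample', 1:3)))

failed <- detP > 0.01
# probes failing in more than half of the samples
failed.probes <- rownames(detP)[rowMeans(failed) > 0.5]
failed.probes
```

Only the first and third toy probes exceed the 50% threshold, so only those two end up in `failed.probes`.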

Example filtering strategy (in R)

```R
# generate 'bad' probes filter
# cross-reactive/non-specific
cross.react <- read.csv('48639-non-specific-probes-Illumina450k.csv', header = TRUE, as.is = TRUE)
cross.react.probes <- as.character(cross.react$TargetID)
# BOWTIE2 multi-mapped
multi.map <- read.csv('HumanMethylation450_15017482_v.1.1_hg19_bowtie_multimap.txt', header = FALSE, as.is = TRUE)
multi.map.probes <- as.character(multi.map$V1)
# determine unique probes
filter.probes <- unique(c(cross.react.probes, multi.map.probes))

# filter the matrix of beta values (beta_norm)
# CpG probes (IlmnID) should be rownames
# filter out 'bad' probes
table(rownames(beta_norm) %in% filter.probes)
filter.bad <- rownames(beta_norm) %in% filter.probes
beta_norm <- beta_norm[!filter.bad, ]
```

For a real-world example of a filtering strategy, see the methods section of our publication: http://www.genomebiology.com/2015/16/1/8


EPIC/850K array

Update (200827) - added manifest revision information

If you don't follow the Illumina website closely you may miss that the annotation manifest file is occasionally revised. It's important to keep an eye on this, as some revisions remove probes due to poor performance. The table below details the versions and changes. More detailed information can be found at the Illumina product page here.

Revision | Date | Description of Change
:-------:|:----:|:--------------------
v1.0 B5 | March 2020 | Manifest file annotation of discordant probes
v1.0 B4 | May 2017 | Manifest file formatting fix
v1.0 B3 | April 2017 | Removed 977 CpG sites from manifest
v1.0 B2 | February 2016 | Fixed switch in red/green signal for Infinium I SNP probes
v1.0 B1 | January 2016 | Removed one pair of bisulfite conversion controls and 1031 CpG sites from the manifest - probe list
v1.0 | November 2015 | Initial release

Full link to the detailed change log here.

I recommend always running the latest annotation release, which is currently B5 - download.
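Because some revisions remove probes, it can be worth checking whether an existing data set still contains probes that the current manifest no longer lists. A minimal sketch with made-up probe IDs follows; `beta.ids` and `manifest.ids` are hypothetical names, and in practice the second vector would come from the probe ID column of the downloaded manifest.

```R
# probe IDs present in an existing beta matrix (hypothetical)
beta.ids <- c('cg00000001', 'cg00000002', 'cg00000003', 'cg00000004')
# probe IDs listed in the newer manifest revision (hypothetical)
manifest.ids <- c('cg00000001', 'cg00000003', 'cg00000004', 'cg00000005')

# probes in the data that the newer manifest no longer lists
dropped.probes <- setdiff(beta.ids, manifest.ids)
dropped.probes
```

Any IDs returned here are candidates to add to the filter list before analysis.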

Update (170928) - addition of probes for EPIC/850k processing

Supplementary data from Pidsley et al. (2016) list cross-reactive and variant-containing probes to filter at QC.

Pidsley, R., Zotenko, E., Peters, T. J., Lawrence, M. G., Risbridger, G. P., Molloy, P., … Clark, S. J. (2016). Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biology, 17(1), 208. https://doi.org/10.1186/s13059-016-1066-1

  • there is overlap between the 450k and 850k lists; however, this will not cause any issues.
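The overlap is harmless because the combined list is de-duplicated and `%in%` simply ignores IDs that are not on the array. A small sketch with made-up probe IDs and beta values illustrates this; all object names and values are toy assumptions.

```R
# toy lists: one shared probe, and one probe that is not on the array
probes.450k <- c('cg00000001', 'cg00000002')
probes.epic <- c('cg00000002', 'cg00000099')
filter.probes <- unique(c(probes.450k, probes.epic))

# toy beta matrix: 3 probes x 2 samples
beta_norm <- matrix(c(0.1, 0.2,
                      0.5, 0.6,
                      0.8, 0.9),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c('cg00000001', 'cg00000002', 'cg00000003'),
                                    c('sample1', 'sample2')))

# duplicates and absent probes do not affect the result
beta_filtered <- beta_norm[!(rownames(beta_norm) %in% filter.probes), , drop = FALSE]
rownames(beta_filtered)
```

`drop = FALSE` keeps the result a matrix even when a single probe remains.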

Extension to the above to filter EPIC data (can apply 450k list as well)

Combine the below with the above 450k process to filter EPIC arrays at QC stage:

```R
# probes from Pidsley 2016 (EPIC)
epic.cross1 <- read.csv('EPIC/13059_2016_1066_MOESM1_ESM.csv', header = TRUE)
# epic.cross2 <- read.csv('EPIC/13059_2016_1066_MOESM2_ESM.csv', header = TRUE)
# epic.cross3 <- read.csv('EPIC/13059_2016_1066_MOESM3_ESM.csv', header = TRUE)
epic.variants1 <- read.csv('EPIC/13059_2016_1066_MOESM4_ESM.csv', header = TRUE)
epic.variants2 <- read.csv('EPIC/13059_2016_1066_MOESM5_ESM.csv', header = TRUE)
epic.variants3 <- read.csv('EPIC/13059_2016_1066_MOESM6_ESM.csv', header = TRUE)
# additional filter probes
epic.add.probes <- c(as.character(epic.cross1$X), as.character(epic.variants1$PROBE),
                     as.character(epic.variants2$PROBE), as.character(epic.variants3$PROBE))
# final list of unique probes
epic.add.probes <- unique(epic.add.probes)
```

The filtering process is the same as above (apply to the matrix of beta values), for example:

```R
# failed probes (those that fail detection)
beta_norm <- beta_norm[!(rownames(beta_norm) %in% failed.probes), ]
# additional epic probes
beta_norm <- beta_norm[!(rownames(beta_norm) %in% epic.add.probes), ]
```
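The separate filtering steps can also be wrapped in one call. The helper below, `filter_beta`, is a hypothetical convenience function and not part of this repository; it is a sketch that assumes probe IDs are the rownames of the beta matrix, and it is shown here on toy data.

```R
# hypothetical helper: drop every probe named in any of the supplied lists
filter_beta <- function(beta, ...) {
  bad  <- unique(unlist(list(...)))        # merge all probe lists
  keep <- !(rownames(beta) %in% bad)       # TRUE for probes to retain
  message(sum(!keep), ' of ', nrow(beta), ' probes removed')
  beta[keep, , drop = FALSE]               # stay a matrix even with one row
}

# toy beta matrix: 5 probes x 2 samples (values are made up)
beta_norm <- matrix(seq(0.1, 1, length.out = 10), nrow = 5,
                    dimnames = list(paste0('cg0000000', 1:5), c('s1', 's2')))

# toy probe lists standing in for the real ones
failed.probes   <- 'cg00000001'
filter.probes   <- c('cg00000002', 'cg00000003')
epic.add.probes <- c('cg00000003', 'cg00000004')

beta_filtered <- filter_beta(beta_norm, failed.probes, filter.probes, epic.add.probes)
rownames(beta_filtered)
```

Reporting the number of removed probes makes it easier to spot an empty or misnamed list during QC.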

Owner

  • Name: Miles
  • Login: sirselim
  • Kind: user
  • Location: Taranaki, New Zealand
  • Company: @nanoporetech

Senior Bioinformatician | Applications | @nanoporetech - passionate about science, technology, community empowerment, photography and heavy metal.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software and wish to cite it, please cite it as below."
authors:
- family-names: "Benton"
  given-names: "Miles C"
  orcid: "https://orcid.org/0000-0003-3442-965X"
title: "Illumina450K_filtering: A collection of resources to filter Illumina 450k and EPIC methylation arrays"
version: 1.0.4
date-released: 2016-11-29
url: "https://github.com/sirselim/illumina450k_filtering"
