illumina450k_filtering
A collection of resources to filter 'bad' probes from the Illumina 450k and EPIC methylation arrays
Illumina methylation array probe filtering (450k and EPIC/850k)
A collection of resources to filter 'bad'/cross-reactive/variant probes from the Illumina methylation arrays during QC stages of pipelines/analysis.
450k array
BOWTIE2 mapping of 450k probes
All probe sequences were mapped to the human genome (hg19) using BOWTIE2 to identify potential hybridisation issues.
- 33,457 probes were identified as aligning more than once
- these are made available in `HumanMethylation450_15017482_v.1.1_hg19_bowtie_multimap.txt`
Additional non-specific probes
Chen et al. identified a series of non-specific probes across the 450k design:
Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, Gallinger S, Hudson TJ, Weksberg R: Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 2013, 8:203–9.
- there are a total of 29,233 probes
- these are available in `48639-non-specific-probes-Illumina450k.csv`
Note: there is overlap between the two probe sets.
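To check how much two probe lists overlap before combining them, a set intersection is enough. The sketch below uses short, made-up probe IDs in place of the real files, so the vector contents are purely illustrative.

```R
# Toy probe ID vectors standing in for the two real lists (hypothetical IDs)
multi.map.probes   <- c("cg0001", "cg0002", "cg0003", "cg0004")
cross.react.probes <- c("cg0003", "cg0004", "cg0005")

# Probes flagged by both approaches
shared.probes <- intersect(multi.map.probes, cross.react.probes)
length(shared.probes)

# Combined, de-duplicated filter list
filter.probes <- union(multi.map.probes, cross.react.probes)
length(filter.probes)
```

Because `union()` removes duplicates, probes that appear in both lists are only counted once in the combined filter.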
Remember to also filter any probes that fail detection
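```R
# process failed probes
library(minfi)               # provides detectionP()
detP <- detectionP(RGset)    # detection p-values for each probe and sample
failed <- detP > 0.01
colMeans(failed)             # fraction of failed positions per sample
sum(rowMeans(failed) > 0.5)  # how many positions failed in >50% of samples?
# index the rownames directly so a single failing probe is still returned
failed.probes <- rownames(detP)[rowMeans(failed) > 0.5]
```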
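The thresholding logic can be tried without any array data. The following is a minimal sketch on a made-up detection p-value matrix (probe and sample names are hypothetical), using the same 0.01 p-value cut-off and 50% sample threshold.

```R
# Toy detection p-value matrix: 4 probes (rows) x 3 samples (columns).
# Values are made up for illustration only.
detP <- matrix(c(0.001, 0.200, 0.300,   # cg0001: fails in 2 of 3 samples
                 0.001, 0.002, 0.003,   # cg0002: passes everywhere
                 0.500, 0.600, 0.700,   # cg0003: fails everywhere
                 0.001, 0.050, 0.002),  # cg0004: fails in 1 of 3 samples
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("cg000", 1:4), paste0("sample", 1:3)))

failed <- detP > 0.01

# Probes failing in more than half of the samples
failed.probes <- rownames(detP)[rowMeans(failed) > 0.5]
failed.probes
```

Only the probes that fail in more than half of the samples end up in `failed.probes`; a probe failing in a single sample is kept.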
Example filtering strategy (in R)
```R
# generate 'bad' probes filter
# cross-reactive/non-specific
cross.react <- read.csv('48639-non-specific-probes-Illumina450k.csv', header = TRUE, as.is = TRUE)
cross.react.probes <- as.character(cross.react$TargetID)
# BOWTIE2 multi-mapped
multi.map <- read.csv('HumanMethylation450_15017482_v.1.1_hg19_bowtie_multimap.txt', header = FALSE, as.is = TRUE)
multi.map.probes <- as.character(multi.map$V1)
# determine unique probes
filter.probes <- unique(c(cross.react.probes, multi.map.probes))
# filter the matrix of beta values (beta_norm)
# CpG probes (IlmnID) should be rownames
# filter out 'bad' probes
table(rownames(beta_norm) %in% filter.probes)
filter.bad <- rownames(beta_norm) %in% filter.probes
beta_norm <- beta_norm[!filter.bad, ]
```
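If the same filtering step is applied several times, it may be convenient to wrap it in a small function. The sketch below is one possible way to do that; `filter_probes` and the toy matrix are hypothetical names introduced here for illustration and are not part of this repository.

```R
# Hypothetical helper: drop unwanted probes from a beta matrix and report counts
filter_probes <- function(beta, probes) {
  bad <- rownames(beta) %in% probes
  message(sum(bad), " of ", nrow(beta), " probes removed")
  beta[!bad, , drop = FALSE]  # drop = FALSE keeps a matrix even with one row or column
}

# Toy beta matrix: 4 probes x 2 samples (values are made up)
beta_toy <- matrix(c(0.1, 0.2, 0.8, 0.9, 0.5, 0.4, 0.3, 0.7),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(c("cg0001", "cg0002", "cg0003", "cg0004"),
                                   c("sample1", "sample2")))

# Probe IDs that are not in the matrix (here "cg9999") are simply ignored
beta_filtered <- filter_probes(beta_toy, c("cg0002", "cg0004", "cg9999"))
rownames(beta_filtered)
```

Using `drop = FALSE` is a defensive choice: without it, a matrix reduced to a single row or column would be silently converted to a vector and lose its rownames.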
For a real-world example of a filtering strategy, interested parties can refer to the methods section of our publication: http://www.genomebiology.com/2015/16/1/8
EPIC/850K array
Update (200827) - added manifest revision information
If you don't follow the Illumina website closely, you may miss that the annotation manifest file is occasionally revised. It's important to keep an eye on this, as some revisions remove probes due to poor performance. The table below details the versions and changes. More detailed information can be found on the Illumina product page.
Revision | Date | Description of Change
:-------:|:----:|:--------------------
v1.0 B5 | March 2020 | Manifest file annotation of discordant probes
v1.0 B4 | May 2017 | Manifest file formatting fix
v1.0 B3 | April 2017 | Removed 977 CpG sites from manifest
v1.0 B2 | February 2016 | Fixed switch in red/green signal for Infinium I SNP probes
v1.0 B1 | January 2016 | Removed one pair of bisulfite conversion controls and 1031 CpG sites from the manifest - probe list
v1.0 | November 2015 | Initial release
Full link to the detailed change log here.
I recommend always running the latest annotation release, which is currently B5 - download.
Update (170928) - addition of probes for EPIC/850k processing
Supplementary data from Pidsley et al. (2016) suggest cross-reactive and variant-containing probes to filter at QC.
Pidsley, R., Zotenko, E., Peters, T. J., Lawrence, M. G., Risbridger, G. P., Molloy, P., … Clark, S. J. (2016). Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biology, 17(1), 208. https://doi.org/10.1186/s13059-016-1066-1
- there is overlap between the 450k and 850k lists; however, this will not cause any issues.
Extension to the above to filter EPIC data (can apply 450k list as well)
Combine the below with the above 450k process to filter EPIC arrays at QC stage:
```R
# probes from Pidsley 2016 (EPIC)
epic.cross1 <- read.csv('EPIC/13059_2016_1066_MOESM1_ESM.csv', header = TRUE)
epic.cross2 <- read.csv('EPIC/13059_2016_1066_MOESM2_ESM.csv', header = TRUE)
epic.cross3 <- read.csv('EPIC/13059_2016_1066_MOESM3_ESM.csv', header = TRUE)
epic.variants1 <- read.csv('EPIC/13059_2016_1066_MOESM4_ESM.csv', header = TRUE)
epic.variants2 <- read.csv('EPIC/13059_2016_1066_MOESM5_ESM.csv', header = TRUE)
epic.variants3 <- read.csv('EPIC/13059_2016_1066_MOESM6_ESM.csv', header = TRUE)
# additional filter probes
epic.add.probes <- c(as.character(epic.cross1$X), as.character(epic.variants1$PROBE),
                     as.character(epic.variants2$PROBE), as.character(epic.variants3$PROBE))
# final list of unique probes
epic.add.probes <- unique(epic.add.probes)
```
Filtering process follows the same as above (apply to matrix of beta values), example:
```R
# failed probes (those that fail detection)
beta_norm <- beta_norm[!(rownames(beta_norm) %in% failed.probes), ]
# additional epic probes
beta_norm <- beta_norm[!(rownames(beta_norm) %in% epic.add.probes), ]
```
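When several lists are applied in sequence, it can help to see how many probes each list flags and what remains once they are all applied. The sketch below illustrates this with made-up probe IDs; the list names and contents are hypothetical stand-ins for the real probe lists.

```R
# Toy stand-ins for the real probe lists (hypothetical IDs)
probe.lists <- list(
  failed      = c("cg0001"),
  cross.react = c("cg0002", "cg0003"),
  epic.add    = c("cg0003", "cg0005")
)
probes.in.matrix <- paste0("cg000", 1:6)  # rownames of an imaginary beta matrix

# How many probes in the matrix does each list flag?
hits.per.list <- sapply(probe.lists, function(p) sum(probes.in.matrix %in% p))
hits.per.list

# Probes remaining after applying every list at once
all.bad <- unique(unlist(probe.lists, use.names = FALSE))
kept <- setdiff(probes.in.matrix, all.bad)
kept
```

Since the lists overlap, the per-list counts can sum to more than the number of probes actually removed; the combined `all.bad` vector is what determines the final result.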
Owner
- Name: Miles
- Login: sirselim
- Kind: user
- Location: Taranaki, New Zealand
- Company: @nanoporetech
- Website: http://sirselim.github.io/
- Twitter: miles_benton
- Repositories: 53
- Profile: https://github.com/sirselim
Senior Bioinformatician | Applications | @nanoporetech - passionate about science, technology, community empowerment, photography and heavy metal.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software and wish to cite it, please cite it as below."
authors:
  - family-names: "Benton"
    given-names: "Miles C"
    orcid: "https://orcid.org/0000-0003-3442-965X"
title: "Illumina450K_filtering: A collection of resources to filter Illumina 450k and EPIC methylation arrays"
version: 1.0.4
date-released: 2016-11-29
url: "https://github.com/sirselim/illumina450k_filtering"