hydroxymethylater

R workflow for preprocessing, analyzing, and annotating Illumina HumanMethylationEPIC hydroxymethylation data.

https://github.com/eirinisparaki/hydroxymethylater

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.4%) to scientific vocabulary

Repository

Basic Info

Host: GitHub
Owner: eirinisparaki
License: apache-2.0
Language: R
Default Branch: main
Size: 438 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 9 months ago · Last pushed 9 months ago

Metadata Files

Readme License Citation

HydroxymethylateR

R workflow for preprocessing, analyzing, and annotating Illumina HumanMethylationEPIC hydroxymethylation data.

Computational Environment Requirements

Developed and tested on Linux. Other platforms (e.g., macOS, Windows) might also work.

System Requirements

- A Linux-based computer (tested on Ubuntu)
- R >= 4.5
- Bioconductor >= 3.2
- ChAMP >= 2.36

Overview

This workflow is built to:
- Import and preprocess bisulfite (BS) and oxidative bisulfite (oxBS) array data.
- Normalize data using NOOB/FunNorm/RAW and filter out problematic probes.
- Estimate sex, cell type proportions and predict smoking status and age.
- Run the MLML method to quantify hydroxymethylation (5hmC) levels.

`preprocess_hydroxymethylation_data()`

Workflow Overview

Workflow of sample preprocessing and 5hmC quantification

Function Signature

```r
preprocess_hydroxymethylation_data(
  ox_file,
  bs_file,
  annotation_array = "IlluminaHumanMethylationEPICv2",
  annotation_version = "20a1.hg38",
  normalization = "NOOB",
  ChAMPfilter_arraytype_bs = "EPICv2",
  ChAMPfilter_ProbeCutoff_bs = 0.01,
  ChAMPfilter_arraytype_ox = "EPICv2",
  ChAMPfilter_ProbeCutoff_ox = 0.01,
  file_inaccuracies = NULL,
  low_variance_threshold_hmc = 0,
  predictSex = FALSE,
  predictSmoking = FALSE,
  predictAge = FALSE,
  calculateCellPropPCs = FALSE,
  plotCellProps = FALSE,
  plotPCA = FALSE,
  plotSVD = FALSE,
  plotHmC = FALSE,
  output_dir = getwd()
)
```

Arguments & Options

| Argument | Type / Accepted values | Default | Description |
| ---------------------------- | ------------------------------ | ---------------------------------- | ----------------------------------------------- |
| ox_file | character (path) | required | CSV of metadata for OxBS arrays. |
| bs_file | character (path) | required | CSV of metadata for BS arrays. |
| annotation_array | Valid minfi array string | "IlluminaHumanMethylationEPICv2" | Probe annotation. |
| annotation_version | character | "20a1.hg38" | Annotation version. |
| normalization | "NOOB", "FUNORM", "RAW" | "NOOB" | Normalization method. |
| ChAMPfilter_arraytype_bs | "450K", "EPIC", "EPICv2" | "EPICv2" | Array type for ChAMP filter (BS). |
| ChAMPfilter_ProbeCutoff_bs | numeric 0–1 | 0.01 | ProbeCutoff (BS). |
| ChAMPfilter_arraytype_ox | "450K", "EPIC", "EPICv2" | "EPICv2" | Array type for ChAMP filter (OxBS). |
| ChAMPfilter_ProbeCutoff_ox | numeric 0–1 | 0.01 | ProbeCutoff (OxBS). |
| file_inaccuracies | NULL or path | NULL | CSV listing inaccurate probes to exclude (column IlmnID). |
| low_variance_threshold_hmc | numeric ≥ 0 | 0 | Low-variance threshold for 5hmC. |
| predictSex | logical | FALSE | Add sex prediction via minfi::getSex(). |
| predictSmoking | logical | FALSE | Add smoking score via EpiSmokEr. |
| predictAge | logical | FALSE | Add DNAm age (Horvath) via wateRmelon. |
| calculateCellPropPCs | logical | FALSE | Estimate blood-cell composition PCs. |
| plotCellProps | logical | FALSE | Save stacked-bar chart of cell composition. |
| plotPCA | logical | FALSE | Save PCA plot. |
| plotSVD | logical | FALSE | Save ChAMP SVD plots. |
| plotHmC | logical | FALSE | Save 5hmC density plot. |
| output_dir | character (path) | getwd() | Destination folder for all outputs. |

Required Inputs

1. Metadata csv (`ox_file`, `bs_file`)

Each csv must contain one row per array and these five columns (case-sensitive):

| Column | Description | Example |
| ------------- | ---------------------------------------------------- | -------------- |
| Sample_Name | Unique experiment ID (overwritten internally). | S01 |
| Array | Illumina barcode (last 10 digits of iDAT filenames). | 1234567890 |
| Slide | Illumina slide ID (first part of iDAT filenames). | 204905210066 |
| iDAT_PATH | Directory containing Red + Grn iDATs for that slide. | /data/iDATs/ |
| status | Custom label (case, control, etc.). | case |
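Before running the workflow, it can help to check a metadata table against the five required columns. The following is a minimal sketch in base R; `check_metadata()` is a hypothetical helper written for illustration and is not part of the package, and the uniqueness check on the Slide/Array pair is an assumption derived from the "one row per array" requirement.

```r
# Required columns, taken from the table above.
required_cols <- c("Sample_Name", "Array", "Slide", "iDAT_PATH", "status")

# Hypothetical helper (not part of HydroxymethylateR).
check_metadata <- function(meta) {
  missing <- setdiff(required_cols, colnames(meta))
  if (length(missing) > 0) {
    stop("Missing required column(s): ", paste(missing, collapse = ", "))
  }
  # Assumption: "one row per array" means each Slide/Array pair is unique.
  key <- paste(meta$Slide, meta$Array, sep = "_")
  if (anyDuplicated(key) > 0) {
    stop("Duplicated Slide/Array combination(s): ",
         paste(unique(key[duplicated(key)]), collapse = ", "))
  }
  invisible(TRUE)
}

# Toy metadata mirroring the example values in the table.
meta <- data.frame(
  Sample_Name = "S01",
  Array = "1234567890",
  Slide = "204905210066",
  iDAT_PATH = "/data/iDATs/",
  status = "case",
  stringsAsFactors = FALSE
)
check_metadata(meta)
```

For a real file, the same check could be applied to `read.csv("metadata_bs.csv", stringsAsFactors = FALSE)`; column matching is case-sensitive, as in the table.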

Expected folder layout

    iDAT_PATH/
    └── SLIDE/
        ├── SLIDE_ARRAY_Red.iDAT
        └── SLIDE_ARRAY_Grn.iDAT
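The layout above implies that each iDAT file path can be derived from the `iDAT_PATH`, `Slide`, and `Array` columns. The sketch below builds those paths in base R so they can be checked with `file.exists()` before a run; `idat_paths()` is a hypothetical helper, and the exact way the package assembles paths internally is not documented here.

```r
# Hypothetical helper: derive the expected Red/Grn iDAT paths per array,
# following the iDAT_PATH/SLIDE/SLIDE_ARRAY_{Red,Grn}.iDAT layout.
idat_paths <- function(meta) {
  root <- sub("/+$", "", meta$iDAT_PATH)  # drop trailing slashes
  base <- file.path(root, meta$Slide, paste(meta$Slide, meta$Array, sep = "_"))
  data.frame(
    Red = paste0(base, "_Red.iDAT"),
    Grn = paste0(base, "_Grn.iDAT"),
    stringsAsFactors = FALSE
  )
}

# Toy metadata row using the example values from the metadata table.
meta <- data.frame(
  Array = "1234567890",
  Slide = "204905210066",
  iDAT_PATH = "/data/iDATs/",
  stringsAsFactors = FALSE
)
paths <- idat_paths(meta)

# Report any files that are not where the layout says they should be.
not_found <- unlist(paths)[!file.exists(unlist(paths))]
```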

2. Optional Probe Inaccuracies (`file_inaccuracies`)

A CSV file with a column named IlmnID that lists the probes to exclude.
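As an illustration of what such an exclusion list does, the sketch below removes the listed probes from a toy beta matrix. It assumes probes are stored as row names of a probes-by-samples matrix; the probe IDs and values are made up, and the package's internal implementation may differ.

```r
# Toy probe list standing in for the file passed as file_inaccuracies.
probe_file <- tempfile(fileext = ".csv")
write.csv(data.frame(IlmnID = c("cg0002", "cg0004")), probe_file,
          row.names = FALSE)

# Toy beta matrix: probes in rows, samples in columns (values are made up).
beta <- matrix((1:8) / 10, nrow = 4,
               dimnames = list(c("cg0001", "cg0002", "cg0003", "cg0004"),
                               c("S01", "S02")))

# Read the list, confirm the expected column exists, and drop those probes.
bad <- read.csv(probe_file, stringsAsFactors = FALSE)
stopifnot("IlmnID" %in% colnames(bad))
beta_clean <- beta[!rownames(beta) %in% bad$IlmnID, , drop = FALSE]
```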

Outputs

Everything is written to output_dir (default: working directory):

    output_dir/
    ├── phenotype_table.csv                        # per-sample metadata (+ optional sex, PCs, etc.)
    ├── filtered_hmC.csv                           # long-format 5hmC after variance filtering
    ├── cell_props.png                             # optional barplot of blood-cell composition
    ├── explained_variance.png                     # explained variance
    ├── Hydroxymethylation Density by Sample.png   # optional 5hmC densities
    └── SVDsummary.pdf                             # created when plotSVD = TRUE

The function returns an invisible list:

- `phenotype_df_bs`: BS sample metadata
- `filtered_hmC`: long-format 5hmC

Core Workflow (14 Steps)

1. Read and validate metadata
2. Read OxBS iDATs
3. Read BS iDATs
4. (Optional) predict sex
5. Normalise (NOOB / FunNorm / Raw)
6. ChAMP filter - BS
7. ChAMP filter - OxBS
8. Remove inaccurate probes
9. Build phenotype dataframe for BS
10. (Optional) estimate cell proportions
11. (Optional) predict smoking score
12. (Optional) predict DNAm age (Horvath)
13. Estimate 5hmC via MLML2R
14. Low-variance filter & write outputs

Each step frees memory with rm(); gc().
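To give some intuition for the last two steps, the sketch below uses a naive subtraction estimate rather than the MLML maximum-likelihood method the workflow actually runs. BS conversion measures 5mC + 5hmC while oxBS measures 5mC only, so the difference of beta values, truncated at zero, is a rough 5hmC estimate. All values and the output column names are made up, and keeping probes whose variance is strictly greater than the threshold is an assumption about how `low_variance_threshold_hmc` is applied.

```r
# Toy beta values (made up): probes in rows, samples in columns.
dn <- list(c("cg1", "cg2", "cg3"), c("S01", "S02"))
beta_bs <- matrix(c(0.80, 0.50, 0.30,
                    0.85, 0.50, 0.20), nrow = 3, dimnames = dn)
beta_ox <- matrix(c(0.60, 0.50, 0.35,
                    0.75, 0.50, 0.10), nrow = 3, dimnames = dn)

# Naive 5hmC estimate (NOT MLML): BS minus oxBS, negative values set to 0.
hmc <- pmax(beta_bs - beta_ox, 0)

# Low-variance filter: keep probes whose across-sample variance exceeds
# the threshold (assumed here to be a strict comparison).
threshold <- 0
probe_var <- apply(hmc, 1, var)
hmc_filtered <- hmc[probe_var > threshold, , drop = FALSE]

# Reshape to long format; the column names here are hypothetical.
hmc_long <- setNames(
  as.data.frame(as.table(hmc_filtered), stringsAsFactors = FALSE),
  c("probe", "sample", "hmC")
)
```

The subtraction can yield negative values that must be truncated, which is one reason a constrained maximum-likelihood estimate such as MLML is generally preferred.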

Minimal Example

```r
library(HydroxymethylateR)

results <- preprocess_hydroxymethylation_data(
  ox_file = "metadata_oxbs.csv",
  bs_file = "metadata_bs.csv",
  output_dir = "results"
)

# Access outputs
head(results$phenotype_df_bs)
head(results$filtered_hmC)
```

Required Packages

The following R packages (from CRAN, Bioconductor, and GitHub) are required:

CRAN Packages:

- `viridis`, `ggplot2`, `reshape2`, `MLML2R`

Bioconductor Packages:

- `FlowSorted.Blood.EPIC`, `sesame`, `wateRmelon`, `minfi`, `ChAMP`

GitHub Packages:

- `EpiSmokEr`

Installation Instructions

To install this workflow:

```r
devtools::install_github("eirinisparaki/HydroxymethylateR")
```
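Because the workflow depends on several packages, it may be useful to see which ones are missing before installing. The sketch below is one possible check in base R; `missing_pkgs()` is a hypothetical helper, and the commented install commands assume that `BiocManager` and `devtools` are available and that EpiSmokEr is installed from the GitHub repository linked in CITATIONS.md.

```r
# Packages listed under "Required Packages".
required <- c("viridis", "ggplot2", "reshape2",
              "FlowSorted.Blood.EPIC", "sesame", "wateRmelon",
              "MLML2R", "EpiSmokEr", "minfi", "ChAMP")

# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}

# Possible installation commands (not run here; they need network access):
# BiocManager::install(setdiff(to_install, "EpiSmokEr"))
# devtools::install_github("sailalithabollepalli/EpiSmokEr")
```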

📖 Citation

If you use HydroxymethylateR in your research, please cite:

This GitHub repository: eirinisparaki/HydroxymethylateR

For citations of the R packages used in this project, please refer to CITATIONS.md.

Contact

For questions or collaborations, feel free to contact:

Eirini Sparaki
📧 sparakiirini@gmail.com
🔗 https://github.com/eirinisparaki

Owner

Name: Eirini
Login: eirinisparaki
Kind: user

Repositories: 1
Profile: https://github.com/eirinisparaki

Citation (CITATIONS.md)

### Package Citations

This project makes use of the following R packages. Please cite them as follows:

| Package                 | Citation                                                                                                                                                      |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `viridis`               | Garnier S. *Colorblind-Friendly Color Maps for R*. [Link](https://sjmgarnier.github.io/viridis/)                                  |
| `ggplot2`               | *ggplot2: A system for declaratively creating graphics, based on “The Grammar of Graphics”*. [Link](https://ggplot2.tidyverse.org)     |
| `reshape2`              | Wickham H. *Reshaping data with the reshape package*. J. Stat. Softw., 21(12), 1–20 (2007). [DOI](https://doi.org/10.18637/jss.v021.i12)                      |
| `FlowSorted.Blood.EPIC` | Salas LA et al. *An optimized library for reference-based deconvolution...*. Genome Biol., 19(1), 64 (2018). [DOI](https://doi.org/10.1186/s13059-018-1448-7) |
| `sesame`                | Zhou W et al. *SeSAMe...*. Nucleic Acids Res., 46(20), e123 (2018). [DOI](https://doi.org/10.1093/nar/gky691)                                                 |
| `wateRmelon`            | Pidsley R et al. *A data-driven approach to preprocessing...*. BMC Genomics, 14:293 (2013). [DOI](https://doi.org/10.1186/1471-2164-14-293)                   |
| `MLML2R`                | *Maximum Likelihood Estimation of DNA Methylation and Hydroxymethylation Proportions*. [CRAN](https://cran.r-project.org/package=MLML2R)                      |
| `EpiSmokEr`             | Bollepalli S. *EpiSmokEr: Epigenetic Smoking status Estimator*. [GitHub](https://github.com/sailalithabollepalli/EpiSmokEr)                                   |
| `minfi`                 | Aryee MJ et al. *Minfi...*. Bioinformatics, 30(10), 1363–1369 (2014). [DOI](https://doi.org/10.1093/bioinformatics/btu049)                                    |
| `ChAMP`                 | Morris TJ et al. *ChAMP...*. Bioinformatics, 30(3), 428–430 (2014). [DOI](https://doi.org/10.1093/bioinformatics/btt684)                                      |
