getdmps

functions I use frequently to automate DMP analysis

https://github.com/bethan-mallabar-rimmer/getdmps

Repository

Basic Info

Host: GitHub
Owner: bethan-mallabar-rimmer
Language: R
Default Branch: main
Size: 48.8 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

getDMPs

functions I use frequently to automate DMP analysis with the limma R package

example use of getDMPs (look for differentially methylated positions (DMPs) between samples with and without diabetes)

```
# remove samples with no diabetes data from variable data frame
sampleswithnodiabetesdata <- variabledataframe$samplename[is.na(variabledataframe$diabetes_status)]
# note the trailing comma: this subsets rows (samples), not columns
diabetesdataframe <- variabledataframe[!(variabledataframe$samplename %in% sampleswithnodiabetesdata), ]

# remove samples with no diabetes data from beta matrix
diabetesbetamatrix <- betamatrix[, !(colnames(betamatrix) %in% sampleswithnodiabetesdata)]

# before running DMP analysis, probes in SNPs or on the X and Y chromosomes should also be removed

# import Illumina manifest (optional)
manifest <- read.csv('path-to-illumina-epicv2-manifest.csv')

# run limma analysis
library(limma)
library(dplyr)
diabetesDMPs <- getDMPs(cat_var = 'diabetes_status',
                        var_levels = c('Yes', 'No'),
                        s_sheet = diabetesdataframe,
                        beta_matrix = diabetesbetamatrix,
                        adj_var = c('age', 'sex', 'smokingstatus'), # etc...
                        annotate_with = manifest, # or if you don't have a manifest file just leave this as NULL
                        var_type = 'categorical',
                        return_bay = FALSE)

# explore results
nrow(diabetesDMPs$Bonferroni) # number of DMPs when using the more stringent Bonferroni correction
nrow(diabetesDMPs$FDR)        # number of DMPs when using the less stringent false discovery rate <5% correction

# names and details of DMPs are contained in diabetesDMPs$Bonferroni and diabetesDMPs$FDR
# view these data frames for more info on them
```
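The example above notes that probes on the X and Y chromosomes should be removed before the analysis. A minimal sketch of that step on toy data follows. It assumes the manifest has a chromosome column called `CHR` with values such as `chrX` (an assumption; check the column name and coding in your own manifest). The objects `manifest_toy` and `beta_toy` are made up for illustration. Removing probes in SNPs would need a separate probe list, which is not shown here.

```r
# Toy manifest: the CHR column name and its "chrX"/"chrY" coding are assumptions
manifest_toy <- data.frame(
  IlmnID = c("cg_a", "cg_b", "cg_c", "cg_d"),
  CHR = c("chr1", "chrX", "chr7", "chrY"),
  stringsAsFactors = FALSE
)

# Toy beta matrix: sites in rows, samples in columns (values are made up)
beta_toy <- matrix(
  c(0.10, 0.20, 0.30, 0.40,
    0.15, 0.25, 0.35, 0.45),
  nrow = 4,
  dimnames = list(manifest_toy$IlmnID, c("s1", "s2"))
)

# IDs of probes annotated to a sex chromosome
sex_probes <- manifest_toy$IlmnID[manifest_toy$CHR %in% c("chrX", "chrY", "X", "Y")]

# Keep only rows (sites) that are not on a sex chromosome
beta_autosomal <- beta_toy[!(rownames(beta_toy) %in% sex_probes), , drop = FALSE]
```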

function inputs for getDMPs

The function can be used for categorical or continuous variables. By default it assumes that categorical variables have only 2 levels, e.g. diabetes_status = 'Yes' or 'No'. It can be adapted for 3+ levels if necessary: the code for this is still in the function in case you need it, but I've commented it out because it's messy and horrible :/

`cat_var` = the variable being analysed, e.g. diabetes_status. Needs to have the same name as whichever column in `s_sheet` contains this data.

`var_levels` = the 2 groups of `cat_var` to find DMPs between. Only required for categorical analysis. E.g. var_levels = c('Yes','No') finds DMPs between samples with diabetes_status = 'Yes' vs diabetes_status = 'No'. Must be formatted the same as the data in `s_sheet`. E.g. if s_sheet$diabetes_status = c('diabetes','no diabetes','no diabetes','diabetes') then var_levels = c('diabetes','no diabetes').
Note: order does matter! If var_levels = c('Yes','No') then hypermethylated DMPs have increased methylation in samples with diabetes. If var_levels = c('No','Yes') then hypermethylated DMPs are hypermethylated in samples without diabetes.
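To make the effect of level order concrete, here is a small base R illustration on made-up values for a single site. It is not the limma model that getDMPs fits, just a first-level-minus-second-level mean difference, and `group_diff` is a hypothetical helper written for this example.

```r
# Made-up beta values for one CpG site across four samples
beta_site <- c(s1 = 0.80, s2 = 0.75, s3 = 0.40, s4 = 0.45)
status <- c("Yes", "Yes", "No", "No")

# Hypothetical helper: mean of the first level minus mean of the second level
group_diff <- function(values, groups, var_levels) {
  mean(values[groups == var_levels[1]]) - mean(values[groups == var_levels[2]])
}

# Positive: the site looks hypermethylated in samples with diabetes
d_yes_no <- group_diff(beta_site, status, c("Yes", "No"))

# Same data, reversed levels: the sign flips
d_no_yes <- group_diff(beta_site, status, c("No", "Yes"))
```

The data are identical in both calls; only the direction of the comparison changes.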

`s_sheet` = the data frame which contains all relevant variables as columns, with one row per sample. Column names must include both the input to `cat_var` (e.g. diabetes_status as a column name) and everything in `adj_var`. Any samples which do not have variable data (e.g. diabetes_status = NA) should be removed from `s_sheet` before running the function.

`beta_matrix` = your beta matrix, with sample names in columns and sites in rows. This can also be an M value matrix (in fact it probably should be, as this is more statistically valid). Any samples which do not have variable data (e.g. diabetes_status = NA) should be removed from `beta_matrix` before running the function.
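If you want to pass M values rather than beta values, the usual conversion is M = log2(beta / (1 - beta)). A minimal sketch on a toy matrix follows. `beta_to_m` is a hypothetical helper, not part of getDMPs, and the small offset that keeps betas away from exactly 0 and 1 is an assumption you may want to tune.

```r
# Toy beta matrix: 3 sites (rows) x 4 samples (columns); values are made up
beta <- matrix(
  c(0.10, 0.50, 0.90,
    0.20, 0.50, 0.80,
    0.15, 0.45, 0.85,
    0.05, 0.55, 0.95),
  nrow = 3,
  dimnames = list(c("cg_a", "cg_b", "cg_c"), c("s1", "s2", "s3", "s4"))
)

# Hypothetical helper: clamp betas into (0, 1), then take the base-2 logit
beta_to_m <- function(beta, offset = 1e-6) {
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Same shape and dimnames as the input, so it can be used wherever the beta matrix was
m_values <- beta_to_m(beta)
```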

`adj_var` = optional, a vector of variables to adjust for in the model, e.g. adj_var = c('age','sex','smokingstatus') or adj_var = 'age'. Any variables listed here need to be in the column names of `s_sheet`.
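The requirements above (columns present, no missing values, matching samples) can be checked before calling the function. The sketch below uses toy data. The `samplename` column follows the example earlier in this README, and the other object names are made up. limma's `lmFit` pairs design rows with matrix columns by position; whether getDMPs reorders samples internally isn't stated here, so aligning them yourself is a cautious assumption rather than a documented requirement.

```r
# Toy sample sheet: one row per sample (values are made up)
ssheet_toy <- data.frame(
  samplename = c("s1", "s2", "s3", "s4"),
  diabetes_status = c("Yes", "No", NA, "Yes"),
  age = c(50, 61, 47, 58),
  sex = c("F", "M", "F", "M"),
  stringsAsFactors = FALSE
)
cat_var <- "diabetes_status"
adj_var <- c("age", "sex")

# 1. Every requested variable must be a column of the sample sheet
missing_cols <- setdiff(c(cat_var, adj_var), colnames(ssheet_toy))
stopifnot(length(missing_cols) == 0)

# 2. Drop samples with no data for the variable of interest
ssheet_clean <- ssheet_toy[!is.na(ssheet_toy[[cat_var]]), ]

# Toy beta matrix whose columns are in a different order from the sample sheet
beta_toy <- matrix(
  seq(0.1, 0.8, length.out = 8),
  nrow = 2,
  dimnames = list(c("cg_a", "cg_b"), c("s4", "s2", "s1", "s3"))
)

# 3. Keep only the retained samples, in the same order as the sample sheet
beta_clean <- beta_toy[, ssheet_clean$samplename, drop = FALSE]
stopifnot(identical(colnames(beta_clean), ssheet_clean$samplename))
```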

`annotate_with` = optional! Before running this function, you can import the Illumina manifest for your platform (e.g. EPIC or EPICv2) as a data frame and set `annotate_with` to the name of that data frame. All DMPs that come out of the analysis will then be annotated automatically with the info in the manifest. The manifest must have the column IlmnID for this to work.
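To show what this kind of annotation looks like, here is a sketch that joins a toy results table to a toy manifest on `IlmnID` using base R's `merge()`. How getDMPs performs the join internally isn't shown in this README, so treat this as the general idea rather than the function's actual code. All column names other than `IlmnID` are made up.

```r
# Toy DMP results keyed by probe ID (values are made up)
dmps_toy <- data.frame(
  IlmnID = c("cg_b", "cg_a"),
  logFC = c(0.4, -0.3),
  stringsAsFactors = FALSE
)

# Toy manifest; only IlmnID is a column name taken from the text above
manifest_toy <- data.frame(
  IlmnID = c("cg_a", "cg_b", "cg_c"),
  CHR = c("chr1", "chr7", "chrX"),
  stringsAsFactors = FALSE
)

# The manifest must contain the IlmnID key column
stopifnot("IlmnID" %in% colnames(manifest_toy))

# Left join: keep every DMP and add manifest columns where the probe ID matches
annotated <- merge(dmps_toy, manifest_toy, by = "IlmnID", all.x = TRUE)
```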

`var_type` = either 'categorical' or 'continuous'

`return_bay` = ignore this and leave it as FALSE. I was just using it to return results at an earlier stage of the pipeline and fix some stuff.

Owner

Login: bethan-mallabar-rimmer
Kind: user

Repositories: 1
Profile: https://github.com/bethan-mallabar-rimmer

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Mallabar-Rimmer
    given-names: Bethan
    orcid: https://orcid.org/0009-0004-4345-8414
title: "getDMPs"
version: 1.3.0
date-released: 2025-02-13
url: "https://github.com/bethan-mallabar-rimmer/getDMPs"
