cfDNAPro

cfDNAPro specializes in standardized and robust cfDNA fragmentomic analysis

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
○ Academic publication links
✓ Committers with academic emails: 2 of 4 committers (50.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary

Keywords

bioinformatics cancer-genomics cancer-research cell-free-dna early-detection genomics-visualization liquid-biopsy r swgs whole-genome-sequencing

Keywords from Contributors

bioconductor-package gene sequencing ontology proteomics

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: hw538
License: gpl-3.0
Language: R
Default Branch: master
Homepage:
Size: 2.72 MB

Statistics

Stars: 34
Watchers: 6
Forks: 8
Open Issues: 2
Releases: 0

Created over 5 years ago · Last pushed 7 months ago

Metadata Files

Readme Changelog License

cfDNAPro

Official tutorials

This landing page aims to provide a quick start. For in-depth documentation, please visit: https://cfdnapro.readthedocs.io/en/latest/

Declaration

cfDNAPro is designed for research only.

Why cfDNAPro?

Unlike genomic DNA, cfDNA has specific fragmentation patterns. Alignment software defines "fragment length" ambiguously, which raises concerns: see the footnote on page 9 of the SAM file format specification: https://samtools.github.io/hts-specs/SAMv1.pdf
Fragmentomic analysis of cell-free DNA data requires single-molecule resolution, which makes accurate, unbiased feature extraction essential. Traditional tools built for solid-tissue sequencing do not account for the specific properties of cfDNA sequencing data (e.g., cfDNA is naturally fragmented, with a modal fragment size of 167 bp and di-/tri-nucleosome peaks in the length distribution). Researchers might therefore inadvertently extract features with a suboptimal method.
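
To make the ambiguity concrete, the sketch below computes a "fragment length" for one read pair under two definitions that alignment tools may use: the span from the leftmost to the rightmost mapped base, and the distance between the two 5' ends. The function names and toy coordinates are illustrative assumptions and are not part of cfDNAPro.

```R
# Two possible "fragment length" definitions for a paired-end alignment.
# Coordinates are 1-based and inclusive; the forward mate is on the plus
# strand and the reverse mate on the minus strand (hypothetical toy values).

# Definition A: leftmost mapped base to rightmost mapped base.
span_outermost <- function(fwd_start, fwd_end, rev_start, rev_end) {
  max(fwd_end, rev_end) - min(fwd_start, rev_start) + 1
}

# Definition B: 5' end of the forward mate to 5' end of the reverse mate.
# The 5' end of a minus-strand read is its rightmost mapped base.
span_five_prime <- function(fwd_start, rev_end) {
  abs(rev_end - fwd_start) + 1
}

# A typical inward-facing pair: both definitions agree.
typical_a <- span_outermost(100, 150, 217, 266)
typical_b <- span_five_prime(100, 266)

# A pair whose mates extend past each other: the definitions disagree.
overhang_a <- span_outermost(100, 180, 90, 150)
overhang_b <- span_five_prime(100, 150)

print(c(typical_a = typical_a, typical_b = typical_b,
        overhang_a = overhang_a, overhang_b = overhang_b))
```

For the inward-facing pair both definitions give the same value, while for the overhanging pair they differ, which is why a single, explicit definition matters for fragmentomic features.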

cfDNAPro is designed to resolve this issue and standardize cfDNA fragmentomic analysis, building on the existing components of the Bioconductor R ecosystem. We hope cfDNAPro will act as a catalyst for further improvements in the development and implementation of cfDNA biomarkers and multi-modal AI for various health conditions.

Input

A paired-end sequencing bam file with duplicates marked (e.g., using the MarkDuplicates function from Picard).
Please do not apply any filtering to the bam file; for example, do not filter by the proper-pair flag.
cfDNAPro filters reads using the following default criteria (you can toggle them with the parameters of the readBam() function):
(1) Reads with a mapping quality below 30 are discarded;
(2) Reads must be paired. Note that, by default, cfDNAPro does not filter by "proper pair";
(3) No duplicates;
(4) No secondary alignments;
(5) No supplementary alignments;
(6) No unmapped reads.
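
The six criteria above can be expressed as tests on the SAM flag and mapping quality of each read. The following is a minimal base-R sketch of that logic on toy values; it is not cfDNAPro's implementation, and the helper name `passes_default_filters()` is hypothetical. Only the flag bit values come from the SAM specification.

```R
# SAM flag bits (from the SAM specification).
FLAG_PAIRED        <- 0x1L    # read is paired in sequencing
FLAG_UNMAPPED      <- 0x4L    # read itself is unmapped
FLAG_SECONDARY     <- 0x100L  # secondary alignment
FLAG_DUPLICATE     <- 0x400L  # PCR or optical duplicate
FLAG_SUPPLEMENTARY <- 0x800L  # supplementary alignment

# Hypothetical helper: TRUE for reads that pass the six default criteria.
passes_default_filters <- function(flag, mapq, min_mapq = 30) {
  has <- function(bit) bitwAnd(as.integer(flag), bit) != 0
  mapq >= min_mapq &
    has(FLAG_PAIRED) &
    !has(FLAG_DUPLICATE) &
    !has(FLAG_SECONDARY) &
    !has(FLAG_SUPPLEMENTARY) &
    !has(FLAG_UNMAPPED)
}

# Toy reads: flag 65 is paired but not a "proper pair" and is still kept.
reads <- data.frame(
  flag = c(99L, 65L, 1123L, 355L, 2147L, 77L, 0L),
  mapq = c(60, 30, 60, 60, 60, 0, 60)
)
reads$keep <- passes_default_filters(reads$flag, reads$mapq)
print(reads)
```

Only the first two toy reads are kept; the others fail on the duplicate, secondary, supplementary, unmapped and unpaired checks respectively.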

Note: remember to choose the correct genome_label, a parameter of the readBam() function, based on the reference genome you used for alignment. At the moment, three reference genomes are supported: hg19, hg38 and hg38-NCBI. For details, see the readBam() R documentation by typing ?readBam in the R console, or see the source code: https://github.com/hw538/cfDNAPro/blob/master/R/readBam.R
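
A minimal sketch of passing the genome label to readBam() is shown below. The helper `check_genome_label()` is hypothetical and exists only to catch a mistyped label early; `readBam()`, its `bamfile` and `genome_label` parameters, and the three labels are taken from the note above. The bam path is a placeholder, so the call runs only when the package and the file are both available.

```R
# Labels listed in the note above.
supported_labels <- c("hg19", "hg38", "hg38-NCBI")

# Hypothetical helper: stop early with a clear message on an unsupported label.
check_genome_label <- function(label) {
  if (!is.character(label) || length(label) != 1 || !label %in% supported_labels) {
    stop("genome_label must be one of: ", paste(supported_labels, collapse = ", "))
  }
  label
}

bam_path <- "/path/to/bamfile.bam"   # placeholder path
label <- check_genome_label("hg38")  # must match the genome used for alignment

# Run readBam() only when the package and the file are both available.
if (requireNamespace("cfDNAPro", quietly = TRUE) && file.exists(bam_path)) {
  frags <- cfDNAPro::readBam(bamfile = bam_path, genome_label = label)
}
```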

Output

cfDNAPro can extract (i.e., "quantify in a standardised and robust way") these features/biomarkers:
- fragment length
- fragment start/end/upstream/downstream motifs
- copy number variation
- single nucleotide mutation
- more...

Feature extraction depends on essential data objects/R packages in the Bioconductor ecosystem, such as Rsamtools, plyranges, GenomicAlignments, GenomeInfoDb and Biostrings.
Data engineering depends on packages in the tidyverse ecosystem, such as dplyr and stringr.
All plots depend on the ggplot2 R package.

For issues/inquiries, please contact:
Generic enquiry: Nitzan Rosenfeld Lab admin mailbox: bci-nrlab-admin@qmul.ac.uk
Fragment length, motif and CNV related questions: Haichao Wang: wanghaichao2014@gmail.com
SNV/SNP related questions: Paulius D. Mennea: paulius.mennea@cruk.cam.ac.uk

Installation

Option 1 (recommended): Use Docker or Singularity:

Thanks to zetian-jia for building the Docker image;
please refer to github.com/zetian-jia/cfDNAPro_docker

Docker

```bash
# Step 1: Pull the Docker image
docker pull zetianjia/cfdnapro:1.7.3

# Step 2: Launch R inside the container
docker run -it zetianjia/cfdnapro:1.7.3 R --no-save
```

Singularity

```bash
# Step 1: Pull the Docker image
singularity pull docker://zetianjia/cfdnapro:1.7.3

# Step 2: Launch R inside the container
singularity exec -e cfdnapro_1.7.3.sif R --no-save
```

Option 2: Use Anaconda to build an environment with the following commands:

```bash
# Create and activate a conda environment with R 4.3.3
conda create -y -n cfdnapro_r4.3.3 r-base=4.3.3
conda activate cfdnapro_r4.3.3

# Install system-level dependencies
conda install -y -c conda-forge r-xml2 r-curl
conda install -y -c conda-forge libgdal
conda install -y r::r-libgeos
conda install -y -c conda-forge udunits2

# Install devtools if it's not already installed
Rscript -e 'if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools", repos = "https://cloud.r-project.org")'

# IMPORTANT: Install Matrix version 1.6-5 (compatible with R 4.3)
Rscript -e 'devtools::install_version("Matrix", version = "1.6-5", repos = "https://cloud.r-project.org")'

# IMPORTANT: Install MASS version 7.3-58.3 (compatible with R 4.3)
Rscript -e 'devtools::install_version("MASS", version = "7.3-58.3", repos = "https://cloud.r-project.org")'

# IMPORTANT: Install units package version 0.8-2 (compatible with R 4.3)
Rscript -e 'devtools::install_version("units", version = "0.8-2", repos = "https://cloud.r-project.org")'

# IMPORTANT: Install rtracklayer package version 1.62.0 (compatible with R 4.3)
Rscript -e 'devtools::install_version("rtracklayer", version = "1.62.0", repos = "https://cloud.r-project.org")'

Rscript -e 'if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman", repos = "https://cloud.r-project.org"); pacman::p_load(xml2, curl, httpuv, shiny, gh, gert, usethis, pkgdown, rcmdcheck, roxygen2, rversions, urlchecker, BiocManager)'

# IMPORTANT: Set a long timeout. These packages are large, and the
# installation can fail if a slow download exceeds a short timeout.
Rscript -e 'options(timeout=3600); if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager"); BiocManager::install("OrganismDbi")'
Rscript -e 'options(timeout=3600); pkgs <- c("GenomicAlignments", "rtracklayer", "GenomicFeatures", "BSgenome", "BSgenome.Hsapiens.UCSC.hg38", "BSgenome.Hsapiens.UCSC.hg19", "BSgenome.Hsapiens.NCBI.GRCh38", "Homo.sapiens", "plyranges", "TxDb.Hsapiens.UCSC.hg19.knownGene"); new <- pkgs[!pkgs %in% installed.packages()[,"Package"]]; if(length(new)) BiocManager::install(new)'

Rscript -e 'pacman::p_load(car, mgcv, pbkrtest, quantreg, lme4, ggplot2, ggrepel, ggsci, cowplot, ggsignif, rstatix, ggpubr, patchwork, ggpattern)'
Rscript -e 'devtools::install_github("asntech/QDNAseq.hg38@main")'

# Install cfDNAPro
Rscript -e 'devtools::install_github("hw538/cfDNAPro", build_vignettes = FALSE, force = TRUE)'
```
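
After either installation route, a quick check from within R can confirm that the packages named in this README can be loaded. The sketch below is an optional sanity check rather than part of the official instructions; the helper `missing_packages()` is hypothetical, and the package list is an assumption assembled from the dependencies mentioned in the Output section.

```R
# Hypothetical helper: return the names of packages that cannot be loaded.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Assumed list, taken from the packages mentioned in this README.
required <- c("cfDNAPro", "Rsamtools", "plyranges", "GenomicAlignments",
              "GenomeInfoDb", "Biostrings", "dplyr", "stringr", "ggplot2")

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```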

Quick Start 1

Read a bam file and return the fragment names (i.e., the read names in the bam file) and alignment coordinates as a GRanges object in R. If needed, you can convert the GRanges object into a data frame; the fragment length is stored in the "width" column.

```R
library(cfDNAPro)

# Read the bam file and curate the alignments
frags <- readBam(bamfile = "/path/to/bamfile.bam")

# Convert the GRanges object to a data frame
frag_df <- as.data.frame(frags)

# Calculate fragment lengths and motifs from the frags object
# (i.e., the output of the readBam() function)
frag_length <- callLength(frags)
frag_motif <- callMotif(frags)
```
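
Because the converted data frame holds one row per fragment, length counts can also be tallied with base R. The sketch below uses a small simulated data frame as a stand-in for the converted GRanges object (no bam file is read); the toy coordinates are assumptions, while `seqnames`, `start`, `end` and `width` are the standard columns produced when a GRanges object is converted with as.data.frame().

```R
# Simulated stand-in for as.data.frame(frags): one row per fragment.
toy_frag_df <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  start    = c(1000, 5000, 200, 9000, 750),
  end      = c(1166, 5166, 344, 9166, 1080)
)
toy_frag_df$width <- toy_frag_df$end - toy_frag_df$start + 1  # inclusive coordinates

# Count how many fragments have each length.
length_counts <- as.data.frame(table(toy_frag_df$width), stringsAsFactors = FALSE)
names(length_counts) <- c("fragment_length", "count")
length_counts$fragment_length <- as.integer(length_counts$fragment_length)

# The most common (modal) fragment length in this toy data set.
modal_length <- length_counts$fragment_length[which.max(length_counts$count)]
print(length_counts)
print(modal_length)
```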

Quick Start 2

Read a bam file and return the fragment length counts. This is a straightforward and frequent use case; to calculate the fragment sizes of a bam file, use the following code:

```R
# Install the newest version of cfDNAPro
if (!require(devtools)) install.packages("devtools")
devtools::install_github("hw538/cfDNAPro", build_vignettes = FALSE)

# Calculate the insert sizes of a bam file
library(cfDNAPro)
frag_lengths <- read_bam_insert_metrics(bamfile = "/path/to/bamfile.bam")
```

The returned data frame contains two columns, i.e., "insert_size" (fragment length) and "All_Reads.fr_count" (the count of fragments with that length).
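
The two-column output from Quick Start 2 lends itself to simple summaries. The sketch below works on a simulated data frame rather than real data; the column names are written here as `insert_size` and `All_Reads.fr_count`, which is an assumption about their exact spelling, and the 150 bp cut-off for "short" fragments is an illustrative choice, not a cfDNAPro default.

```R
# Simulated stand-in for the two-column fragment length count table.
toy_lengths <- data.frame(
  insert_size        = c(120, 145, 167, 180, 334),
  All_Reads.fr_count = c(50, 200, 500, 150, 100)
)

total <- sum(toy_lengths$All_Reads.fr_count)

# Modal fragment size: the length with the highest count.
modal_size <- toy_lengths$insert_size[which.max(toy_lengths$All_Reads.fr_count)]

# Proportion of fragments shorter than a chosen cut-off (assumed: 150 bp).
short_cutoff <- 150
is_short <- toy_lengths$insert_size < short_cutoff
short_prop <- sum(toy_lengths$All_Reads.fr_count[is_short]) / total

# Count-weighted mean fragment size.
mean_size <- sum(toy_lengths$insert_size * toy_lengths$All_Reads.fr_count) / total

print(c(modal_size = modal_size, short_prop = short_prop, mean_size = mean_size))
```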

News

cfDNAPro paper is online (May 2025)!

Our Genome Biology paper: https://doi.org/10.1186/s13059-025-03607-5

cfDNAPro 1.7.3 (Jan 2025)

Updated various functions for mutational analysis

cfDNAPro 1.7.2 (Jan 2025)

Improved various functions for mutation annotation analysis, etc.

cfDNAPro 1.7.1 (Jan 2025)

Improved the information and layout of this markdown quick start landing page

cfDNAPro 1.7.1 (Aug 2024)

Multiple updates

cfDNAPro 1.7.1 (May 2023)

Resolved issues when building the vignette
Various updates
Added/updated readBam() functions

cfDNAPro 1.5.4 (Nov 2022)

In addition to "bam" and "picard" files, various functions now accept "cfdnapro" as the input_type. This "cfdnapro" input is exactly the output of the `read_bam_insert_metrics` function in the cfDNAPro package. It is a tsv file containing two columns, i.e., "insert_size" (fragment length) and "All_Reads.fr_count" (the count of fragments with that length).

cfDNAPro 1.5.3 (Oct 2022)

Added support for the hg38-NCBI version, i.e., GRCh38

cfDNAPro 0.99.3 (July 2021)

Modified vignette.

cfDNAPro 0.99.2 (July 2021)

Modified vignette.

cfDNAPro 0.99.1 (May 2021)

Added 'cfDNAPro' into the "watched tag".

cfDNAPro 0.99.0 (May 2021)

cfDNAPro now supports bam files as input for data characterisation.
Coding style improvements.
Documentation improvements.
Submitted to Bioconductor.

Citation

Please cite this paper:

Wang, H., Mennea, P.D., Chan, Y.K.E., Cheng, Z. et al. A standardized framework for robust fragmentomic feature extraction from cell-free DNA sequencing data. Genome Biol 26, 141 (2025). https://doi.org/10.1186/s13059-025-03607-5

Owner

Name: haichaowang
Login: hw538
Kind: user
Location: Cambridge
Company: University of Cambridge

Website: https://sites.google.com/view/wanghc
Twitter: haichao_wang20
Repositories: 2
Profile: https://github.com/hw538

PhD student and Bioinformatician at University of Cambridge

GitHub Events

Total

Watch event: 12
Issue comment event: 1
Push event: 52
Pull request event: 2
Fork event: 4
Create event: 1

Last Year

Watch event: 12
Issue comment event: 1
Push event: 52
Pull request event: 2
Fork event: 4
Create event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 177
Total Committers: 4
Avg Commits per committer: 44.25
Development Distribution Score (DDS): 0.051

Past Year

Commits: 130
Committers: 2
Avg Commits per committer: 65.0
Development Distribution Score (DDS): 0.015

Top Committers

Name	Email	Commits
haichao	w**4@g**m	168
Nitesh Turaga	n**a@g**m	4
haichao wang	h**g@c**k	3
J Wokaty	j**y@s**u	2

Committer Domains (Top 20 + Academic)

sph.cuny.edu: 1 cruk.cam.ac.uk: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 5
Total pull requests: 2
Average time to close issues: 13 days
Average time to close pull requests: 1 minute
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 1.4
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 2
Average time to close issues: 4 days
Average time to close pull requests: 1 minute
Issue authors: 1
Pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

eibol1 (2)
zetian-jia (1)
hwm08 (1)

Pull Request Authors

pauliusmennea (2)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- bioconductor 8,273 total

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 5
Total maintainers: 1

bioconductor.org: cfDNAPro

cfDNAPro extracts and visualises biological features from whole genome sequencing data of cell-free DNA

Homepage: https://github.com/hw538/cfDNAPro
Documentation: https://bioconductor.org/packages/release/bioc/vignettes/cfDNAPro/inst/doc/cfDNAPro.pdf
License: GPL-3
Latest release: 1.14.0
published 10 months ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 8,273 Total

Rankings

Dependent repos count: 0.0%

Dependent packages count: 0.0%

Average: 27.1%

Downloads: 81.2%

Maintainers (1)

hw538@cam.ac.uk

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 4.1.0 depends
magrittr >= 1.5.0 depends
BiocGenerics * imports
GenomeInfoDb * imports
GenomicAlignments * imports
GenomicRanges * imports
IRanges * imports
Rsamtools >= 2.4.0 imports
dplyr >= 0.8.3 imports
ggplot2 >= 3.2.1 imports
plyranges * imports
quantmod >= 0.4 imports
rlang >= 0.4.0 imports
stats * imports
stringr >= 1.4.0 imports
tibble * imports
utils * imports
BSgenome.Hsapiens.UCSC.hg19 * suggests
BSgenome.Hsapiens.UCSC.hg38 * suggests
BiocStyle * suggests
devtools >= 2.3.0 suggests
ggpubr * suggests
knitr >= 1.23 suggests
rmarkdown >= 1.14 suggests
scales * suggests
testthat * suggests
