absp
Analysis of Bisulfite Sequencing PCR experiments, with R, to compute CpG methylation levels and identify methylation differences on specific DNA regions of interest.
Repository
Basic Info
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 7
Metadata Files
README.md
ABSP - Analysis of Bisulfite Sequencing PCR

ABSP, which stands for "Analysis of Bisulfite Sequencing PCR", is an R-based tool designed to analyze CpG methylation profiles using data from Bisulfite Sequencing PCR (BSP) experiments. It was developed to help researchers estimate and compare the methylation percentages of DNA regions studied through BSP experiments. ABSP offers a comprehensive automated workflow, from sequencing trace files to data visualization and statistical analysis.
Contents:
For further detailed information, please read the ABSP User Guide document.
Citation
If you use ABSP in your published work, please cite:
Denoulet et al. (2023), ABSP: an automated R tool to efficiently analyze region-specific CpG methylation from bisulfite sequencing PCR, Bioinformatics, Volume 39, Issue 1, btad008, https://doi.org/10.1093/bioinformatics/btad008
Availability
ABSP was initially developed on a Windows machine but has been tested and confirmed to be functional on macOS and Linux as well. Please be aware that on Linux, the package installation during the first launch of ABSP may take a considerable amount of time (R packages are typically compiled from source on Linux).
Quick start guide
Installation
- Download the ABSP files from GitHub (Go to "Releases" in the right panel and in the "Assets" section of the latest release, download the ABSP .zip or .tar.gz file) and extract/unzip the folder
- Install software: R at https://www.r-project.org/ and RStudio at https://www.rstudio.com/products/rstudio/download/
Open the ABSP app
- Open the "ABSP Rproject.Rproject" file with RStudio
- Open the "app.R" file with RStudio
- Click on the "Run App" button at the top right corner of the RStudio window (in the "Run App" dropdown menu, you can select "Run External" to open the app in your web browser instead of the RStudio window or viewer pane, if you prefer)
- If prompted with a pop-up window regarding the shiny package installation, click "Yes" to proceed with installing the shiny package

Analysis with ABSP
You can locate example data for both inputs and outputs in the "examples" folder.
Individual sample analysis:

- Navigate to the "Individual analysis" tab, complete the entries and add the input files, such as the reference sequence in .fasta format and the sequencing files in .ab1 format.
- Proceed to run the analysis to compute the CpG methylation levels of your sample
- Once the analysis is complete, review the results in the .html report generated in your "reports" folder
- Finally, find the output files in your "results" folder for further examination

Grouped samples analysis:

- After individually analyzing all of your samples, navigate to the "Grouped analysis" tab and complete input entries and select your parameters for plotting.
- Proceed to run the analysis which will gather samples, generate visualization plots (lollipop plots) and compare methylation data between groups using comparative statistics
- Once the analysis is complete, review the results in the .html report generated in your "reports" folder
- Finally, locate the output files in your "results" folder for further examination


Launch multiple analyses:

The "Multiple analyses" tab lets you launch several individual and/or grouped analyses in one click, using as input tables filled with the required entries.
FAQ - Frequently Asked Questions
1. My reference genome is not available, what can I do?
First, note that the reference genome is only needed to display the reference sequence in genomic plots; it is not needed to compute methylation percentages.
Still, there are several possibilities:
- Simplest option: Using an arbitrary reference genome
You can use ABSP with an arbitrary genome, ideally one that closely resembles the actual reference genome. This choice does not affect the computed results, but it does affect the plots and the coordinates displayed in plots and tables. If you set your coordinates to "chr1:1-XXX" (where "XXX" is the length of your sequence), the sequence will most likely correspond to a series of N nucleotides (unresolved bases at the start of the chromosome), which are displayed in grey in genomic plots. Because the coordinates are set arbitrarily, they should be disregarded in the output tables and plots. It is therefore recommended to choose either the "CpG numbers" or "None" option for CpG position labels in the Grouped Analysis.
- Accessible option: Using an arbitrary reference genome with minor code adjustments
You can make the following adjustments so that plots built with an arbitrary genome do not display a misleading reference sequence:
  - Remove the sequence track from the genomic plots by modifying the `ABSP_functions.R` script, in the `individual_meth_plot()` (line 499) and `genomic_plot()` (line 1172) functions. This consists in removing the `sTrack` object from the `plotTracks()` calls at lines 521 and 1285.
  - Change the colors of nucleotides in genomic plots to grey by modifying the `bases_colors` object in the `ABSP_individual_analysis.Rmd` script (line 365) and in the `ABSP_grouped_analysis.Rmd` script (line 361).
- Advanced option: Using your own assemblies instead of a reference genome
For genomic plots, the Gviz package works best with a BSgenome. Although a `DNAStringSet` object can replace the BSgenome, for simplicity within ABSP it must be added manually inside the plot functions (see the `DNAStringSet` class from Biostrings and the `SequenceTrack` class from Gviz). An example is given below.
Example with the cs10 assembly:
To test the `readDNAStringSet()` function from the Biostrings package before running ABSP (replace the file path):
```r
install.packages("BiocManager")
BiocManager::install("Biostrings")
library(Biostrings)
# Replace "/PATH/TO/FOLDER/GCF_900626175.2_cs10_genomic.fna.gz" with the actual file path
cs10 <- readDNAStringSet("/PATH/TO/FOLDER/GCF_900626175.2_cs10_genomic.fna.gz")
head(names(cs10), 10)  # check the sequence names before renaming them to chromosome names
```
The new assembly must then be incorporated at two locations:
- In the `ABSP_functions.R` script, apply these modifications in the `individual_meth_plot()` function (line 515):
```r
cs10 <- readDNAStringSet("/PATH/TO/FOLDER/GCF_900626175.2_cs10_genomic.fna.gz") # replace path
# select and rename chromosomes (names() works on any DNAStringSet)
names(cs10)[1:10] <- c("chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chrX")
# Sequence track
sTrack <- SequenceTrack(sequence = cs10, add53 = FALSE, noLetters = TRUE, fontcolor = bases_colors) # modify sequence and/or genome
```
- In the `ABSP_functions.R` script, apply these modifications in the `genomic_plot()` function (line 1263):
```r
cs10 <- readDNAStringSet("/PATH/TO/FOLDER/GCF_900626175.2_cs10_genomic.fna.gz") # replace path
# select and rename chromosomes (names() works on any DNAStringSet)
names(cs10)[1:10] <- c("chr1","chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chrX")
# Sequence track
sTrack <- SequenceTrack(sequence = cs10, chromosome = unique(meth_df$chromosome), genome = "cs10",
                        add53 = FALSE, noLetters = TRUE, fontcolor = bases_colors,
                        name = "sequence", col.border.title = "transparent", cex = 0.5) # modify sequence and/or genome
```
Then, run ABSP with any genome selected as the default input: it will be bypassed by the assembly loaded in these functions.
2. My reference sequence does not exist in a reference genome (e.g. a custom reporter sequence), what can I do?
In scenarios such as creating a custom reporter sequence, you can specify arbitrary coordinates in the header of the reference sequence FASTA file. These coordinates should match the length of your sequence; for example, 'chr1:1-200' for a 200 bp sequence. After setting the coordinates, please consult the previous question, 'My reference genome is not available, what can I do?', for guidance on selecting and adjusting the reference genome to suit your needs.
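As an illustration, the base R sketch below writes a single-sequence FASTA file whose header carries arbitrary coordinates matching the sequence length. The helper `write_custom_reference()` is hypothetical (it is not part of ABSP), the 24 bp sequence is made up, and it assumes the header should contain only the coordinates in the "chr1:1-length" form described above.

```r
# Hypothetical helper, not part of ABSP: write a one-sequence FASTA file whose
# header holds arbitrary coordinates spanning the whole sequence.
write_custom_reference <- function(sequence, file, chrom = "chr1") {
  sequence <- toupper(gsub("[[:space:]]", "", sequence))  # drop spaces/newlines
  header <- sprintf(">%s:1-%d", chrom, nchar(sequence))   # e.g. ">chr1:1-200"
  writeLines(c(header, sequence), con = file)
  invisible(header)
}

# Example with a made-up 24 bp reporter fragment
ref_file <- tempfile(fileext = ".fasta")
write_custom_reference("ACGTTCGGATCGCGTTAACGCGTA", ref_file)
readLines(ref_file)
```

Deriving the end coordinate from `nchar()` keeps the header consistent with the sequence if the sequence is later edited.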
3. Why is my sequence too short after trimming, and how can I fix it?
You may either end up with a very short final sequence with only a few CpG sites covered in the Methylation tab, or the analysis may be aborted with the error message: Error: Analysis has been stopped as no CpG sites were found covered by sequencing results.
First, check the Trimming plot in the Sequencing trimming tab, which gives insight into the results of the trimming step. The results of both trimming methods, based on quality scores (orange) and on primary peak ratios (blue), are combined to produce the final trimmed sequence (green). The purpose of trimming is to remove low-quality ends from the sequence: it should neither trim the sequence too short nor keep low-quality ends for the next steps of the analysis.
- If the trimming method based on Phred quality scores is too aggressive, consider relaxing the base-calling error probability threshold `th_quality_error` in the `ABSP_individual_analysis.Rmd` file (line 607). By default, this is set to 0.001, which corresponds to a base-calling accuracy of 99.9% and a Phred quality score of 30. Raising it to 0.01, for example, gives a less stringent threshold corresponding to a base-calling accuracy of 99% and a Phred quality score of 20.
- If the trimming method based on primary peak ratios is too strict, you can adjust the following thresholds:
  - `th_mixed_position` (default: 0.75, line 612 in the `ABSP_individual_analysis.Rmd` file): the minimum primary peak ratio for a position to be considered non-mixed, and/or
  - `th_mixed_perc` (default: 75%, line 614): the minimum percentage of non-mixed positions in the trimmed sequence.
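The error probabilities and Phred scores quoted above are linked by the standard Phred definition, Q = -10 log10(P). The base R sketch below converts between the two, which may help when choosing a value for `th_quality_error`; the helper names are hypothetical and not part of ABSP.

```r
# Phred quality score Q and base-calling error probability P:
# Q = -10 * log10(P), equivalently P = 10^(-Q / 10)
error_to_phred <- function(p) -10 * log10(p)
phred_to_error <- function(q) 10^(-q / 10)

error_to_phred(c(0.001, 0.01))  # Phred 30 and Phred 20
phred_to_error(c(30, 20))       # error probabilities 0.001 and 0.01
1 - c(0.001, 0.01)              # corresponding base-calling accuracies (99.9% and 99%)
```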
To assist in setting these thresholds, review the detailed values of Phred quality scores and primary peak ratios for each position in the Raw Sequence > Table tab. Keep in mind the potential impact that lowering thresholds may have on subsequent steps in the analysis.
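To get a feel for how the two primary peak ratio thresholds interact, here is a minimal base R sketch. It is a hypothetical illustration, not ABSP's implementation: it assumes a position is non-mixed when its primary peak ratio is greater than or equal to `th_mixed_position`, and that a trimmed sequence passes when its percentage of non-mixed positions is at least `th_mixed_perc`. The ratio values are made up.

```r
# Hypothetical illustration, not ABSP's code: apply the two mixed-position
# thresholds to the primary peak ratios of a trimmed sequence.
check_mixed_positions <- function(primary_ratio,
                                  th_mixed_position = 0.75,
                                  th_mixed_perc = 75) {
  non_mixed <- primary_ratio >= th_mixed_position   # TRUE = non-mixed position
  perc_non_mixed <- 100 * mean(non_mixed)           # % of non-mixed positions
  list(perc_non_mixed = perc_non_mixed,
       passes = perc_non_mixed >= th_mixed_perc)
}

# Made-up primary peak ratios for an 8-position stretch
ratios <- c(0.95, 0.90, 0.60, 0.88, 0.79, 0.70, 0.99, 0.92)
check_mixed_positions(ratios)                            # default thresholds
check_mixed_positions(ratios, th_mixed_position = 0.8)   # stricter position threshold
```

With these ratios, 6 of the 8 positions reach 0.75, so the sequence sits exactly at the 75% limit; raising `th_mixed_position` to 0.8 drops this to 62.5% and the check fails, which shows how sensitive the outcome can be to either threshold.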
License
ABSP, Analysis of Bisulfite Sequencing PCR
Copyright 2023 by the CANTHER laboratory, France (absp@univ-lille.fr)
Released under the GPL-3 license.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Owner
- Name: ABSP team
- Login: ABSP-methylation-tool
- Kind: user
- Repositories: 1
- Profile: https://github.com/ABSP-methylation-tool