https://github.com/ccicb/sigstart
Parse Mutation Files to Formats Compatible with Signature Analysis Toolkits
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.2%) to scientific vocabulary
Last synced: 6 months ago
·
JSON representation
Repository
Parse Mutation Files to Formats Compatible with Signature Analysis Toolkits
Basic Info
- Host: GitHub
- Owner: CCICB
- License: other
- Language: R
- Default Branch: main
- Size: 529 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 1
- Releases: 0
Created 9 months ago
· Last pushed 9 months ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# sigstart
[](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[](https://CRAN.R-project.org/package=sigstart)
[](https://app.codecov.io/gh/CCICB/sigstart)
[](https://github.com/CCICB/sigstart/actions/workflows/R-CMD-check.yaml)
**Sigstart** reads small variants, copy number changes, and structural rearrangements from common bioinformatics file formats like VCFs and segment files. It converts these data into formats compatible with **sigminer** and other signature analysis tools.
## Installation
You can install the development version of sigstart from [GitHub](https://github.com/) with:
``` r
if (!require("remotes"))
install.packages("remotes")
remotes::install_github("selkamand/sigstart")
```
## Quick Start
```{r}
library(sigstart)
# Convert SNVs and Indels from a tumor-normal VCF -> MAF-like dataframe for sigminer
path_vcf_snv <- system.file("tumor_normal.purple.somatic.vcf.gz",package = "sigstart")
sigminer_maf_like_dataframe <- parse_vcf_to_sigminer_maf(path_vcf_snv, sample_id = "tumor")
# Convert SNVs and INDELs from sample-chrom-pos-ref-alt TSV file to a MAF-like dataframe (Pos should be 1-based)
path_tsv <- system.file("snvs_indels.tsv", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_tsv, verbose = FALSE)
# Convert SNVs and INDELs from a single-sample chrom-pos-ref-alt TSV file to a MAF-like dataframe (Pos should be 1-based)
path_tsv <- system.file("snvs_indels.singlesample.tsv", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_tsv, sample_id = "tumor", verbose = FALSE)
# Convert SNVs and INDELs from Annovar annotated single-sample file to a MAF-like dataframe
path_annovar <- system.file("annovar.hg38.txt", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_annovar, sample_id = "tumor", verbose = FALSE,)
# Convert Purple SVs from VCF -> BEDPE-like structure for sigminer
path_vcf_sv <- system.file("tumor.purple.sv.vcf", package = "sigstart")
sigminer_sv_dataframe <- parse_purple_sv_vcf_to_sigminer(path_vcf_sv, sample_id = "tumor")
# Convert Purple SVs from VCF -> BEDPE structure for sigprofiler
bedpe_dataframe <- parse_purple_sv_vcf_to_sigprofiler(path_vcf_sv)
# Convert Purple SVs from VCF -> BEDPE structure (identical output to above)
bedpe_dataframe <- parse_purple_sv_vcf_to_bedpe(path_vcf_sv)
# Convert Purple CN Segment File -> Sigminer
path_cn <- system.file("purple.cnv.somatic.tsv", package = "sigstart")
sigminer_cn_dataframe <- parse_purple_cnv_to_sigminer(path_cn, sample_id = "tumor")
# Convert CN Segment files from other tools -> Sigminer
# (by manually specifying the column name mappings)
path_cn_notpurple <- system.file("single_sample.copynumber.tsv", package = "sigstart")
sigminer_cn_dataframe <- parse_cnv_to_sigminer(
path_cn_notpurple,
sample_id = "tumor",
col_chromosome = "chr",
col_start = "start",
col_end = "end",
col_copynumber = "total_cn",
col_minor_cn = "minor_cn"
)
```
## Cohort -> Single Sample Files
Signature analysis tools including sigscreen and signal are easier to run from single sample data files. Here we demonstrate how to split common cohort/multisample filetypes into a collection of single-sample files.
```{r remove_old_single_sample_dir, include=FALSE}
unlink("single_sample_files", recursive = TRUE)
```
### Converting cohort MAF to single sample VCFs
Signature analysis tools including sigscreen and signal are easier to run from single sample VCFs than cohort-MAFs. The convert_maf_to_vcfs function splits a cohort MAF file into a collection of minimal single sample vcfs.
```{r}
# Split MAF into single sample VCFs
path_maf <- system.file(package = "sigstart", "snvs_indel.5sample.maf")
convert_maf_to_vcfs(
path_maf,
outdir = "single_sample_files/vcfs",
verbose = FALSE
)
# Parse the vcfs
# (no need to specify a sample ID since these are single sample VCFs)
maf <- parse_vcf_to_sigminer_maf("single_sample_files/vcfs/sampleA.snv_indel.vcf.bgz", verbose = FALSE)
head(maf)
```
### Converting cohort copynumber calls into single sample segment VCFs
```{r}
# Split cohort CNV segment files into single sample files
path_copynumber <- system.file(package = "sigstart", "cohort.3sample.copynumber.hg38.tsv")
convert_cohort_segment_file_to_single_samples(
path_copynumber,
outdir = "single_sample_files/cnvs",
verbose = FALSE
)
# Parse the copynumber file
cnv <- parse_cnv_to_sigminer("single_sample_files/cnvs/sampleA.copynumber.tsv.bgz", sample_id = "sampleA")
head(cnv)
```
Owner
- Name: CCICB
- Login: CCICB
- Kind: organization
- Repositories: 4
- Profile: https://github.com/CCICB
GitHub Events
Total
- Issues event: 1
- Push event: 1
- Create event: 1
Last Year
- Issues event: 1
- Push event: 1
- Create event: 1
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- codecov/codecov-action v5 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- Rsamtools * imports
- StructuralVariantAnnotation * imports
- VariantAnnotation * imports
- assertions * imports
- cli * imports
- data.table * imports
- dplyr * imports
- plyranges * imports
- rlang * imports
- sigshared >= 0.0.0.9000 imports
- vcfR * imports
- testthat >= 3.0.0 suggests
- withr * suggests