https://github.com/ccicb/sigstart

Parse Mutation Files to Formats Compatible with Signature Analysis Toolkits

https://github.com/ccicb/sigstart

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Parse Mutation Files to Formats Compatible with Signature Analysis Toolkits

Basic Info
  • Host: GitHub
  • Owner: CCICB
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 529 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created 9 months ago · Last pushed 9 months ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# sigstart


[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![CRAN status](https://www.r-pkg.org/badges/version/sigstart)](https://CRAN.R-project.org/package=sigstart)
[![Codecov test coverage](https://codecov.io/gh/CCICB/sigstart/graph/badge.svg)](https://app.codecov.io/gh/CCICB/sigstart)
[![R-CMD-check](https://github.com/CCICB/sigstart/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/CCICB/sigstart/actions/workflows/R-CMD-check.yaml)


**Sigstart** reads small variants, copy number changes, and structural rearrangements from common bioinformatics file formats like VCFs and segment files. It converts these data into formats compatible with **sigminer** and other signature analysis tools.

## Installation

You can install the development version of sigstart from [GitHub](https://github.com/) with:

``` r
if (!require("remotes"))
    install.packages("remotes")

remotes::install_github("selkamand/sigstart")
```


## Quick Start

```{r}
library(sigstart)

# Convert SNVs and Indels from a tumor-normal VCF -> MAF-like dataframe for sigminer
path_vcf_snv <- system.file("tumor_normal.purple.somatic.vcf.gz",package = "sigstart")
sigminer_maf_like_dataframe <- parse_vcf_to_sigminer_maf(path_vcf_snv, sample_id = "tumor")

# Convert SNVs and INDELs from sample-chrom-pos-ref-alt TSV file to a MAF-like dataframe (Pos should be 1-based)
path_tsv <- system.file("snvs_indels.tsv", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_tsv, verbose = FALSE)

# Convert SNVs and INDELs from a single-sample chrom-pos-ref-alt TSV file to a MAF-like dataframe (Pos should be 1-based)
path_tsv <- system.file("snvs_indels.singlesample.tsv", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_tsv, sample_id = "tumor", verbose = FALSE)

# Convert SNVs and INDELs from Annovar annotated single-sample file to a MAF-like dataframe
path_annovar <- system.file("annovar.hg38.txt", package = "sigstart")
sigminer_maf_like_dataframe <- parse_tsv_to_sigminer_maf(path_annovar, sample_id = "tumor", verbose = FALSE,)

# Convert Purple SVs from VCF -> BEDPE-like structure for sigminer
path_vcf_sv <- system.file("tumor.purple.sv.vcf", package = "sigstart")
sigminer_sv_dataframe <-  parse_purple_sv_vcf_to_sigminer(path_vcf_sv, sample_id = "tumor")

# Convert Purple SVs from VCF -> BEDPE structure for sigprofiler
bedpe_dataframe <- parse_purple_sv_vcf_to_sigprofiler(path_vcf_sv)

# Convert Purple SVs from VCF -> BEDPE structure (identical output to above)
bedpe_dataframe <- parse_purple_sv_vcf_to_bedpe(path_vcf_sv)

# Convert Purple CN Segment File -> Sigminer 
path_cn <- system.file("purple.cnv.somatic.tsv", package = "sigstart")
sigminer_cn_dataframe <- parse_purple_cnv_to_sigminer(path_cn, sample_id = "tumor")

# Convert CN Segment files from other tools -> Sigminer
# (by manually specifying the column name mappings)
path_cn_notpurple <- system.file("single_sample.copynumber.tsv", package = "sigstart")
sigminer_cn_dataframe <- parse_cnv_to_sigminer(
  path_cn_notpurple,
  sample_id = "tumor",
  col_chromosome = "chr",
  col_start = "start",
  col_end = "end",
  col_copynumber = "total_cn",
  col_minor_cn = "minor_cn"
)

```

## Cohort -> Single Sample Files

Signature analysis tools including sigscreen and signal are easier to run from single sample data files. Here we demonstrate how to split common cohort/multisample filetypes into a collection of single-sample files.

```{r remove_old_single_sample_dir, include=FALSE}
unlink("single_sample_files", recursive = TRUE)
```


### Converting cohort MAF to single sample VCFs

Signature analysis tools including sigscreen and signal are easier to run from single sample VCFs than cohort-MAFs. The convert_maf_to_vcfs function splits a cohort MAF file into a collection of minimal single sample vcfs.

```{r}
# Split MAF into single sample VCFs
path_maf <- system.file(package = "sigstart", "snvs_indel.5sample.maf")
convert_maf_to_vcfs(
  path_maf, 
  outdir = "single_sample_files/vcfs", 
  verbose = FALSE
)

# Parse the vcfs 
# (no need to specify a sample ID since these are single sample VCFs)
maf <- parse_vcf_to_sigminer_maf("single_sample_files/vcfs/sampleA.snv_indel.vcf.bgz", verbose = FALSE)
head(maf)
```

### Converting cohort copynumber calls into single sample segment VCFs

```{r}
# Split cohort CNV segment files into single sample files 
path_copynumber <- system.file(package = "sigstart", "cohort.3sample.copynumber.hg38.tsv")
convert_cohort_segment_file_to_single_samples(
  path_copynumber, 
  outdir = "single_sample_files/cnvs", 
  verbose = FALSE
)

# Parse the copynumber file
cnv <- parse_cnv_to_sigminer("single_sample_files/cnvs/sampleA.copynumber.tsv.bgz", sample_id = "sampleA")
head(cnv)
```

Owner

  • Name: CCICB
  • Login: CCICB
  • Kind: organization

GitHub Events

Total
  • Issues event: 1
  • Push event: 1
  • Create event: 1
Last Year
  • Issues event: 1
  • Push event: 1
  • Create event: 1

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v4 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • codecov/codecov-action v5 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • Rsamtools * imports
  • StructuralVariantAnnotation * imports
  • VariantAnnotation * imports
  • assertions * imports
  • cli * imports
  • data.table * imports
  • dplyr * imports
  • plyranges * imports
  • rlang * imports
  • sigshared >= 0.0.0.9000 imports
  • vcfR * imports
  • testthat >= 3.0.0 suggests
  • withr * suggests