Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
    Organization cbg-ethz has institutional domain (www.bsse.ethz.ch)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cbg-ethz
  • License: gpl-3.0
  • Language: C++
  • Default Branch: master
  • Size: 47.9 MB
Statistics
  • Stars: 22
  • Watchers: 6
  • Forks: 11
  • Open Issues: 5
  • Releases: 0
Created over 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Zenodo

README.md

COMPASS

DOI: 10.5281/zenodo.10822292

COpy number and Mutations Phylogeny from Amplicon Single-cell Sequencing

This tool infers a tree of somatic events (mutations and copy number alterations) that occurred in a tumor. It is specifically designed for MissionBio's Tapestri data, where a small number of amplicons (50-300) are sequenced for thousands of single cells.

The method is described in the publication: COMPASS: joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data, Sollier et al., Nature Communications, 2023.

Quick start

git clone https://github.com/cbg-ethz/COMPASS.git
cd COMPASS
make
./COMPASS -i data/preprocessed_data_AML_Morita2020/AML-59-001 -o AML-59-001 --nchains 4 --chainlength 5000 --CNA 1
dot -Tpng -o AML-59-001_tree.png AML-59-001_tree.gv

Graphviz is required to plot the tree. On Ubuntu, it can be installed by running:
sudo apt-get install graphviz

Warning: If you are using macOS and fail to compile COMPASS because of OpenMP errors, please download this alternative version, which does not use OpenMP: https://github.com/cbg-ethz/COMPASS/archive/refs/heads/no_OMP.zip.

Usage

./COMPASS -i [sample_name] -o [output_name] --nchains 4 --chainlength 5000 --CNA 1 --sex female

Where:
  • -i is the input sample name (see the Input section below for the format of the input files).
  • -o is the output name. The output is a tree in graphviz format.
  • --nchains indicates the number of MCMC chains to run in parallel.
  • --chainlength indicates the number of iterations in each MCMC chain.
  • --CNA can be set to 1 to use CNAs, or 0 to use only SNVs.
  • --sex can be female (default, 2 X chromosomes) or male (1 X chromosome).
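As an illustration of how these options fit together, the following minimal Python sketch assembles the command line shown above as an argument list. The helper name build_compass_command and its keyword arguments are hypothetical and not part of COMPASS; only the flags themselves come from this README.

```python
import shlex


def build_compass_command(sample, output, nchains=4, chainlength=5000,
                          cna=True, sex="female", binary="./COMPASS"):
    """Assemble a COMPASS invocation as an argument list (hypothetical helper)."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    return [
        binary,
        "-i", sample,                       # input sample name
        "-o", output,                       # output name
        "--nchains", str(nchains),          # number of parallel MCMC chains
        "--chainlength", str(chainlength),  # iterations per chain
        "--CNA", "1" if cna else "0",       # 1: use CNAs, 0: SNVs only
        "--sex", sex,
    ]


cmd = build_compass_command(
    "data/preprocessed_data_AML_Morita2020/AML-59-001", "AML-59-001")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the binary has been compiled.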

Additional parameters can be changed if needed, although their default values should work for most cases:
  • -d (default: 1): if 1, COMPASS will use the model with doublets; if 0, COMPASS will use the model without doublets (faster).
  • --doubletrate (default: 0.08): doublet rate, used when -d is set to 1.
  • --dropoutrate (default: 0.05): prior mean of the allelic dropout rates. The dropout rates will be estimated for each SNV with a beta-binomial distribution.
  • --dropoutrate_concentration (default: 100): prior concentration parameter of the beta-binomial distribution for dropout rates. Higher values pull the estimated dropout rates closer to the prior mean.
  • --seqerror (default: 0.02): sequencing error rate.
  • --nodecost (default: 1): cost of adding a node to the tree.
  • --cnacost (default: 85): cost of adding a CNA event to the tree. Increasing this parameter results in trees with fewer CNA events; decreasing it results in more.
  • --lohcost (default: 85): cost of adding a LOH event to the tree.
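To give a feel for how the prior mean and the concentration interact, here is a small Python sketch. It assumes the common mean/concentration parameterization of a Beta distribution (alpha = mean × concentration, beta = (1 − mean) × concentration); this README does not state which parameterization COMPASS uses internally, so treat the numbers as illustrative only.

```python
import math


def beta_prior_from_mean(mean, concentration):
    """Return (alpha, beta, prior standard deviation) of a Beta distribution.

    Assumes alpha = mean * concentration and beta = (1 - mean) * concentration,
    which is an assumption, not a documented COMPASS formula.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    alpha = mean * concentration
    beta = (1.0 - mean) * concentration
    # Variance of Beta(alpha, beta) written in terms of mean and concentration.
    sd = math.sqrt(mean * (1.0 - mean) / (concentration + 1.0))
    return alpha, beta, sd


for concentration in (10, 100, 1000):
    alpha, beta, sd = beta_prior_from_mean(0.05, concentration)
    print(f"concentration={concentration}: "
          f"alpha={alpha:.1f}, beta={beta:.1f}, prior sd={sd:.4f}")
```

Under this assumption the prior standard deviation shrinks as the concentration grows, which matches the statement that higher values keep the estimated dropout rates closer to the prior mean.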

In targeted sequencing, different regions have different coverages, depending on the number of amplicons targeting each region and the efficiency of the primers. By default, COMPASS uses the cells attached to the root to estimate the proportion of reads falling on each region in the absence of CNAs. Optionally, the weight of each region can be provided with the argument --regionweights. An example csv file is provided in data/preprocessed_data_Morita2020/region_weights_50amplicons.csv and a script to generate such a csv file is provided at Experiments/preprocessing/estimate_region_weights.py.
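The idea of a region weight as "the proportion of reads falling on each region" can be sketched in a few lines of Python. This is not the repository's estimate_region_weights.py script; the function name, the input layout (a dict of per-cell read counts) and the region labels are made up for illustration, and the sketch assumes the supplied cells carry no CNAs.

```python
def estimate_region_weights(counts):
    """Average, over cells, the fraction of each cell's reads in each region.

    counts maps a region name to a list of read counts, one per cell.
    All lists are assumed to have the same length (same cells, same order).
    """
    regions = list(counts)
    n_cells = len(counts[regions[0]])
    sums = {region: 0.0 for region in regions}
    used_cells = 0
    for j in range(n_cells):
        depth = sum(counts[region][j] for region in regions)
        if depth == 0:
            continue  # skip cells without any reads
        used_cells += 1
        for region in regions:
            sums[region] += counts[region][j] / depth
    if used_cells == 0:
        raise ValueError("no cell has a non-zero read depth")
    return {region: total / used_cells for region, total in sums.items()}


# Toy example with three made-up regions and two cells.
toy_counts = {
    "1_GENE_A": [10, 20],
    "2_GENE_B": [30, 20],
    "4_GENE_C": [60, 60],
}
weights = estimate_region_weights(toy_counts)
for region, weight in weights.items():
    print(f"{region}: {weight:.2f}")
```

The weights sum to 1 by construction; the exact csv layout expected by --regionweights should be taken from the example file mentioned above.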

Use with Docker

docker run -t -v `pwd`:`pwd` -w `pwd` esollier/compass:v1.1 COMPASS -i data/preprocessed_data_AML_Morita2020/AML-59-001 -o AML-59-001 --nchains 4 --chainlength 5000 --CNA 1

Input

COMPASS takes two files as input:
  • [sample_name]_variants.csv: each line corresponds to a variant. The first columns contain metadata and the remaining columns contain REFCOUNT:ALTCOUNT, where REFCOUNT is the number of reference reads and ALTCOUNT is the number of alternative reads for this variant in this cell, separated by a ":". The example files in the data directory contain an additional GENOTYPE value, which is 0 for hom ref, 1 for het, 2 for hom alt and 3 for missing, but this value is optional and will be ignored by COMPASS.
  • [sample_name]_regions.csv: each line corresponds to a region (typically, a gene). The first column is CHR_REGIONNAME, and the remaining columns contain the number of reads in this region, for each cell. This file is only required when CNAs are used (--CNA 1).
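The per-cell entries described above can be parsed as in the following Python sketch. The function names are hypothetical, the example rows are invented, and the number of metadata columns is left as a parameter because this README does not state it; check the files in the data directory before relying on a specific value.

```python
def parse_cell_entry(entry):
    """Parse 'REFCOUNT:ALTCOUNT' or 'REFCOUNT:ALTCOUNT:GENOTYPE'."""
    parts = entry.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected cell entry: {entry!r}")
    ref_count, alt_count = int(parts[0]), int(parts[1])
    # GENOTYPE is optional (0 hom ref, 1 het, 2 hom alt, 3 missing).
    genotype = int(parts[2]) if len(parts) == 3 else None
    return ref_count, alt_count, genotype


def parse_variant_row(row, n_metadata_cols):
    """Split one variants row into metadata fields and per-cell counts."""
    metadata = row[:n_metadata_cols]
    cells = [parse_cell_entry(entry) for entry in row[n_metadata_cols:]]
    return metadata, cells


def parse_region_row(row):
    """Split one regions row into chromosome, region name and read counts."""
    chromosome, _, region_name = row[0].partition("_")
    return chromosome, region_name, [int(value) for value in row[1:]]


# Invented rows for illustration; real files may use different metadata.
variant_row = ["chr1", "12345", "GENE_A", "25:0:0", "12:9:1", "0:0:3"]
metadata, cells = parse_variant_row(variant_row, n_metadata_cols=3)
print(metadata, cells)

region_row = ["1_GENE_A", "40", "35", "0"]
print(parse_region_row(region_row))
```

Splitting CHR_REGIONNAME at the first underscore keeps region names that themselves contain underscores intact.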

The data directory contains preprocessed datasets. The Experiments/preprocessing directory contains scripts used to preprocess the loom files generated by the Tapestri pipeline, as well as workflows used to run simulations on synthetic data.

Output

If [output_name] ends with .gv, COMPASS will only output the tree in graphviz format, which can then be plotted. Otherwise, COMPASS will produce the following output files:
  • [output_name]_tree.gv: tree in graphviz format
  • [output_name]_tree.json: tree in json format
  • [output_name]_cellAssignments.tsv: hard assignments of cells to nodes, and whether or not the cell was inferred to be a doublet (in which case the node assignment is unreliable)
  • [output_name]_cellAssignmentsProbs.tsv: posterior attachment probabilities of cells to nodes
  • [output_name]_nodes_genotypes.tsv: genotype of each SNV for each node (0: no mutation; 1: heterozygous; 2: homozygous mutated)
  • [output_name]_nodes_copynumbers.tsv: copy number of each region for each node
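The naming rule above can be summarized with a short Python sketch that lists the files to expect for a given output name. The helper expected_outputs is hypothetical, and the suffixes are assumed to be joined to the output name with an underscore, as in the AML-59-001_tree.gv file from the quick start.

```python
# Suffixes assumed to be appended to the output name (underscore-joined).
OUTPUT_SUFFIXES = [
    "_tree.gv",
    "_tree.json",
    "_cellAssignments.tsv",
    "_cellAssignmentsProbs.tsv",
    "_nodes_genotypes.tsv",
    "_nodes_copynumbers.tsv",
]


def expected_outputs(output_name):
    """List the files COMPASS is expected to write for this output name."""
    if output_name.endswith(".gv"):
        # Only the graphviz tree is written, under the given name.
        return [output_name]
    return [output_name + suffix for suffix in OUTPUT_SUFFIXES]


print(expected_outputs("AML-59-001"))
print(expected_outputs("AML-59-001.gv"))
```

A check such as os.path.exists over this list is a simple way to confirm that a run completed and wrote all of its outputs.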

The data/output_example directory contains an example output.

Owner

  • Name: Computational Biology Group (CBG)
  • Login: cbg-ethz
  • Kind: organization
  • Location: Basel, Switzerland

Beerenwinkel Lab at ETH Zurich

GitHub Events

Total
  • Issues event: 2
  • Watch event: 4
  • Issue comment event: 8
  • Fork event: 4
Last Year
  • Issues event: 2
  • Watch event: 4
  • Issue comment event: 8
  • Fork event: 4

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 79
  • Total Committers: 2
  • Avg Commits per committer: 39.5
  • Development Distribution Score (DDS): 0.177
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
e-sollier e****r@g****m 65
murphycj c****j@g****m 14

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 17
  • Total pull requests: 3
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 5 hours
  • Total issue authors: 10
  • Total pull request authors: 2
  • Average comments per issue: 3.35
  • Average comments per pull request: 0.67
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 4.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • murphycj (5)
  • reJELIN (3)
  • andreyurch (2)
  • Kyung-TaeLee (1)
  • sofiedemeyer (1)
  • LukaP-BB (1)
  • johan-gson (1)
  • priyadhiman007 (1)
  • kjpg (1)
  • BenjaminPLille (1)
Pull Request Authors
  • e-sollier (1)
  • murphycj (1)