https://github.com/cbg-ethz/cojac

Keywords from Contributors

sequences archival projection bioinformatics interactive generic covid-19 clade sars-cov-2 observability

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: cbg-ethz
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: master
Size: 2.95 MB

Statistics

Stars: 19
Watchers: 10
Forks: 5
Open Issues: 13
Releases: 4

Created about 5 years ago · Last pushed 11 months ago

Metadata Files

Readme License

COJAC - CoOccurrence adJusted Analysis and Calling

The COJAC tool is part of the V-pipe workflow for analysing NGS data of short viral genomes.

Description

The cojac package comprises a set of command-line tools to analyse co-occurrence of mutations on amplicons. It is useful, for example, for early detection of viral variants of concern (e.g. Alpha, Delta, Omicron) in environmental samples, and has been designed to scan for multiple SARS-CoV-2 variants in wastewater samples, as analyzed jointly by ETH Zurich, EPFL and Eawag. Learn more about this project on its Dashboard.

The analysis requires the whole amplicon to be covered by sequencing read pairs. It currently works at the level of aligned reads, but we plan to be able to adjust confidence scores based on local (window) haplotypes (as generated, e.g., by ShoRAH, doi:10.1186/1471-2105-12-119).

Usage

Here are the available command-line tools:

| command | purpose |
| :---------------------- | :------ |
| cojac cooc-mutbamscan | scan an alignment BAM/CRAM/SAM file for mutation co-occurrences and output a JSON or YAML file |
| cojac cooc-colourmut | display a JSON or YAML file as a coloured output on the terminal |
| cojac cooc-pubmut | render a JSON or YAML file to a table as in the publication |
| cojac cooc-tabmut | export a JSON or YAML file as a CSV/TSV table for downstream analysis (e.g.: RStudio) |
| cojac cooc-curate | an (experimental) tool to assist evaluating the quality of variant definitions by looking at mutations' or cooccurrences' frequencies from covSPECTRUM |
| cojac phe2cojac | a tool to generate new variant definition YAMLs for cojac using YMLs available at UK Health Security Agency (UKHSA) Standardised Variant Definitions |
| cojac sig-generate | a tool to generate a list of mutations by querying covSPECTRUM and assist writing variant definition YAMLs for cojac |

Use option -h / --help to see available command-line options:

```console
$ cojac cooc-mutbamscan --help
Usage: cojac cooc-mutbamscan [OPTIONS]

  Scan amplicon (covered by long read pairs) for mutation cooccurrence

Options:
  -a, --alignments BAM/CRAM       alignment files
  -n, --name NAME                 when using alignment files, name to use for the output
  -s, --samples TSV               V-pipe samples list tsv
  --batchname SEP                 concatenate samplename/batchname from samples tsv
  -p, --prefix PATH               V-pipe work directory prefix for where to look at align files when using TSV samples list
  -r, --reference REFID           reference to look for in alignment files
  -m, --vocdir DIR                directory containing the yamls defining the variant of concerns
  -V, --voc VOC                   individual yamls defining the variant of concerns
  --rev, --with-revert / --no-rev, --without-revert
                                  also include reverts when compiling amplicons (requires VOC YAML files with revert category)
  -b, --bedfile BED               bedfile defining the amplicons, with format: ref\tstart\tstop\tamp_num\tpool\tstrand
  --sort / --no-sort              sort the bedfile by 'reference name' and 'start position' (default: sorted)
  -#, --cooc COOC                 minimum number of cooccurrences to search for
  --fix-subset, --fs / --no-fix-subset, --no-fs
                                  Fix variants attribution when cooccurrence are subset/superset of other variants
  -Q, --amplicons, --in-amp, --in-amplicons YAML
                                  use the supplied YAML file to query amplicons instead of building it from BED + voc's DIR
  -A, --out-amp, --out-amplicons YAML
                                  output amplicon query in a YAML file
  --comment / --no-comment        add comments in the out amplicon YAML with names from BED file (default: comment the YAML)
  -j, --json JSON                 output results to a JSON file
  -y, --yaml YAML                 output results to a yaml file
  -t, --tsv TSV                   output results to a (raw) tsv file
  -d, --dump                      dump the python object to the terminal
  -h, --help                      Show this message and exit.

  @listfile can be used to pass a long list of parameters (e.g.: a large number of BAMs) in a file instead of command line
```

```console
$ cojac cooc-colourmut --help
Usage: cojac cooc-colourmut [OPTIONS]

  Print coloured pretty table on terminal

Options:
  -a, --amplicons YAML  list of query amplicons, from mutbamscan  [required]
  -j, --json JSON       results generated by mutbamscan
  -y, --yaml YAML       results generated by mutbamscan
  --help                Show this message and exit.

  See cooc-pubmut for a CSV file that can be imported into an article
```

```console
$ cojac cooc-pubmut --help
Usage: cojac cooc-pubmut [OPTIONS]

  Make a pretty table

Options:
  -m, --vocdir DIR          directory containing the yamls defining the variant of concerns
  -a, --amplicons YAML      list of query amplicons, from mutbamscan
  -j, --json JSON           results generated by mutbamscan
  -y, --yaml YAML           results generated by mutbamscan
  -o, --output CSV          name of (pretty) csv file to save the table into
  -e, --escape / -E, --no-escape
                            use escape characters for newlines
  -x, --excel               use a semi-colon ';' instead of a comma ',' in the comma-separated-files as required by Microsoft Excel
  --batchname SEP           separator used to split samplename/batchname in separate column
  -q, --quiet               Run quietly: do not print the table
  --help                    Show this message and exit.

  You need to open the CSV in a spreadsheet that understands linebreaks
```

```console
$ cojac cooc-tabmut --help
Usage: cojac cooc-tabmut [OPTIONS]

  Make a table suitable for further processing: RStudio, etc

Options:
  -j, --json JSON           results generated by mutbamscan
  -y, --yaml YAML           results generated by mutbamscan
  --batchname SEP           separator used to split samplename/batchname in separate column
  -o, --output CSV          name of (raw) csv file to save the table into
  -l, --lines               Line-oriented table alternative
  -x, --excel               use a semi-colon ';' instead of a comma ',' in the comma-separated-files as required by Microsoft Excel
  -m, --multiindex          Use multi-level indexing (amplicons and counts categories)
  -a, --add-mutations, --am YAML
                            add mutations descriptions using list of query amplicons, from mutbamscan
  -q, --quiet               Run quietly: do not print the table
  -h, --help                Show this message and exit.
```

```console
$ cojac cooc-curate --help
Usage: cojac cooc-curate [OPTIONS] [VOC]...

  Helps determining specific mutations and cooccurrences by querying covSPECTRUM

Options:
  -u, --url URL             url to use when querying covspectrum (e.g. https://lapis.cov-spectrum.org/open/v2, https://lapis.cov-spectrum.org/gisaid/v2, etc.)
  --lintype FIELD           switch the lineage field queried on covspectrum (e.g. nextcladePangoLineage: as determined with nextclade, pangoLineage: as provided by upstream sequence repository)
  -a, --amplicons YAML      use the YAML file generated by mutbamscan to query amplicons instead of mutations
  -m, --mutations           always do mutations (even if amplicons YAML provided)
  -H, --high FLOAT          Fraction above which a mutation must be found among seeked lineages
  -l, --low FLOAT           Fraction under which a mutation must be found among other lineages
  --collapse / --no-collapse
                            combine counts of all sublineages together and consider a single value that corresponds to a lineages family (e.g.: count all B.1.612.2* together). This is especially useful for assessing signatures of old variants that have branched out by now.
  --colour / --no-colour    use coloured output
  --debug / --no-debug      show API calls details (urls and arguments)
  -h, --help                Show this message and exit.

  This tool queries LAPIS, see https://lapis-docs.readthedocs.io/en/latest/
```

```console
$ cojac phe2cojac --help
Usage: cojac phe2cojac [OPTIONS] IN_YAML

  convert phe-genomics to cojac's dedicated variant YAML format

Options:
  -s, --shortname SHRT  shortname to use (otherwise auto-build one based on phe-genomic's unique id)
  -y, --yaml OUT_YAML   write cojac variant to a YAML file instead of printing (if empty, build filename from shortname)
  --help                Show this message and exit.
```

```console
$ cojac sig-generate --help
Usage: cojac sig-generate [OPTIONS]

  Helps generating a list of mutations frequently found in a variant by querying covSPECTRUM

Options:
  -u, --url URL             url to use when querying covspectrum (e.g. https://lapis.cov-spectrum.org/open/v2, https://lapis.cov-spectrum.org/gisaid/v2, etc.)
  --lintype FIELD           switch the lineage field queried on covspectrum (e.g. nextcladePangoLineage: as determined with nextclade, pangoLineage: as provided by upstream sequence repository)
  --var, --variant PANGO    Pangolineage of the root variant to list  [required]
  --extras LAPIS            Additional LAPIS query arguments passed as a YAML flow, e.g.: '{dateFrom: "2022-02-01", variantQuery: "[6-of: S:147E, S:152R, S:157L, S:210V, S:257S, S:339H, S:446S, S:460K, ORF1a:1221L, ORF1a:1640S, ORF1a:4060S]"}'. For more information about LAPIS, see: https://lapis-docs.readthedocs.io/en/latest/
  -f, --minfreq FREQ        Minimum frequency for inclusion in list
  -d, --mindelfreq FREQ     Use a different minimum frequency for deletions (useful early on when there are few sequences and some of those were produced by pipelines that don't handle deletions)
  -s, --minseqs NUM         Minimum number of sequence supporting for inclusion in list
  --covariants TSV          import from a covariants.org TSV file instead of covSpectrum. (See: https://github.com/hodcroftlab/covariants/blob/master/defining_mutations/)
  --debug / --no-debug      show 'extra' query content, show API details (urls and arguments)
  -h, --help                Show this message and exit.

  This tool queries LAPIS, see https://lapis-docs.readthedocs.io/en/latest/
```

Howto

Input data requirements

The analysis needs to be performed on SARS-CoV-2 samples amplified with a tiled multiplexed PCR protocol, for which you need a BED (Browser Extensible Data) file describing the amplified regions, and sequenced with read settings that cover the entire amplicon.

We provide BED files for the following examples:
- nCoV-2019.insert.V3.bed for ARTIC V3
- SARS-CoV-2.insert.V4.txt for ARTIC V4

Note:
- if you have a BED file describing the primers' target binding regions, you can convert it into an insert BED file using the tool viramp-hub:
```bash
# download the primer BED file for Artic v5.3.2
curl -o SARS-CoV-2.v532.primer.bed 'https://raw.githubusercontent.com/artic-network/primer-schemes/master/nCoV-2019/V5.3.2/SARS-CoV-2.primer.bed'
# convert it into an insert BED file
scheme-convert SARS-CoV-2.v532.primer.bed --to bed --bed-type cojac -o SARS-CoV-2.v532.cojac_insert.bed
```
- for a useful application of primer BED files to searching for possible drop-outs, see section Mutations affecting primers below.
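Conceptually, an insert is the part of an amplicon between its primers: from the end of the LEFT primer to the start of the RIGHT primer. The Python sketch below illustrates that idea only; it is a hypothetical helper, not viramp-hub's implementation. It assumes tab-separated primer BED lines with the pool in column 5, ARTIC-style primer names such as SARS-CoV-2_25_RIGHT_1, and keeps the innermost coordinates when alternative primers exist. Its output rows follow the ref/start/stop/amp_num/pool/strand layout listed in the cooc-mutbamscan help, with the strand set to "+" as an assumption.

```python
import re

# Hypothetical helper, not viramp-hub's implementation.
# Assumed primer name pattern: <scheme>_<amplicon>_<LEFT|RIGHT>[_<alt>]
PRIMER_NAME = re.compile(r"_(\d+)_(LEFT|RIGHT)(?:_\w+)?$")


def primers_to_inserts(primer_bed_text: str) -> list:
    """Turn primer BED lines into insert rows: (ref, start, stop, amp_num, pool, strand)."""
    left_end, right_start, ref_of, pool_of = {}, {}, {}, {}
    for line in primer_bed_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        ref, start, end, name, pool = line.split("\t")[:5]
        match = PRIMER_NAME.search(name)
        if match is None:
            continue  # name does not follow the assumed convention
        amp, side = int(match.group(1)), match.group(2)
        ref_of[amp], pool_of[amp] = ref, pool
        if side == "LEFT":
            # with alternative primers, keep the innermost (largest) LEFT end
            left_end[amp] = max(int(end), left_end.get(amp, int(end)))
        else:
            # ... and the innermost (smallest) RIGHT start
            right_start[amp] = min(int(start), right_start.get(amp, int(start)))
    # an insert needs both a LEFT and a RIGHT primer; strand "+" is an assumption
    return [
        (ref_of[amp], left_end[amp], right_start[amp], amp, pool_of[amp], "+")
        for amp in sorted(left_end.keys() & right_start.keys())
    ]
```

For example, `primers_to_inserts(open("SARS-CoV-2.v532.primer.bed").read())` returns one row per amplicon; joining each row with tabs gives a BED-like insert file. For real analyses, prefer scheme-convert as shown above.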

These protocols produce ~400 bp long amplicons, and thus need to be sequenced with, e.g., paired-end sequencing with a read length of 250.
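A back-of-the-envelope check of this requirement, assuming idealised paired-end reads that start exactly at each end of the amplicon and are not trimmed (real data loses some bases to primer and quality trimming, so keep a margin):

```python
def pair_covers_amplicon(amplicon_len: int, read_len: int) -> bool:
    """Idealised check: can two reads, one from each end, span the whole amplicon?"""
    return 2 * read_len >= amplicon_len


def mate_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases read by both mates (a negative value is an uncovered gap in the middle)."""
    return 2 * read_len - amplicon_len


# ~400 bp amplicons, as in the text
print(pair_covers_amplicon(400, 250), mate_overlap(400, 250))  # True 100
print(pair_covers_amplicon(400, 50), mate_overlap(400, 50))    # False -300
```

With 2 x 250 reads the mates overlap by about 100 bases, while 2 x 50 reads leave most of the amplicon unread, which is why such short reads cannot link mutations across the amplicon.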

Select the desired bedfile using the -b / --bedfile option.

Note:
- this analysis method cannot work on read lengths much shorter than the amplicons (e.g.: it will not give reliable results for a read length of 50).
- to use different protocols (e.g. Nimagen), you need to provide a BED file describing the amplicons. Its columns "start" and "stop" are mandatory.

The analysis uses variant description YAML files that list the mutations to search for.

We provide several examples in the directory voc/. The current variants' mutation lists that we use in production as part of our wastewater-based surveillance of SARS-CoV-2 variants can be found in the repository COWWID, in the subdirectory voc/.

Select a directory containing a collection of variant definition YAMLs using the -m / --vocdir option, or list individual YAML file(s) with option --voc.

Note:
- you can create new YAML files if you need to look for new variants of concern.
- e.g., it is possible to automatically generate YAMLs listing a few key mutations for cojac from UK Health Security Agency (UKHSA) Standardised Variant Definitions:
```bash
# fetch the repository of standardised variant definitions
git clone https://github.com/ukhsa-collaboration/variant_definitions.git
# generate a YAML for omicron subvariant BA.2 using the corresponding standardised variant definitions
cojac phe2cojac --shortname 'om2' --yaml voc/omicron_ba2_mutations.yaml variant_definitions/variant_yaml/imagines-viewable.yml
# now have a look at the frequencies of mutations using covSPECTRUM
cojac cooc-curate voc/omicron_ba2_mutations.yaml
# adjust the content of the YAML files to your needs
```
- Another possibility is obtaining an exhaustive list of mutations from covSpectrum or [covariants.org's repository](https://github.com/hodcroftlab/covariants/tree/master/defining_mutations):
```bash
# display the exhaustive list of all mutations known to appear on Omicron BA.1 on covSPECTRUM:
cojac sig-generate --url https://lapis.cov-spectrum.org/open/v2 --variant BA.1 | tee list_ba1.yaml
# or, alternatively, download the TSV from covariants.org's repo and extract the list:
curl -O 'https://raw.githubusercontent.com/hodcroftlab/covariants/master/defining_mutations/21K.Omicron.tsv'
cojac sig-generate --covariants 21K.Omicron.tsv --variant 'BA.1' | tee list_ba1.yaml
# add a YAML header to the list:
# (at minimum you NEED to specify the 'pangolin' lineage and give it a 'short' handle)
# (source and 'nextstrain' lineages are optional)
cat - list_ba1.yaml > voc/omicron_ba1_mutations_full.yaml <<HEAD
variant:
  short: 'om1'
  nextstrain: '21K'
  pangolin: 'BA.1'
source:
  - https://github.com/cov-lineages/pango-designation/issues/361
  - https://github.com/hodcroftlab/covariants/blob/master/defining_mutations/21K.Omicron.tsv
mut:
HEAD
# now have a look at the frequencies of mutations using covSPECTRUM
cojac cooc-curate voc/omicron_ba1_mutations_full.yaml
```

Collect the co-occurrence data

If you're not executing COJAC as part of a larger workflow, such as V-pipe, you can analyse stand-alone BAM/CRAM/SAM alignment files.

Standalone files

Provide a list of BAM files using the -a / --alignments option. Run:

```bash
cojac cooc-mutbamscan -b nCoV-2019.insert.V3.bed -m voc/ -a sam1.bam sam2.bam -j cooc-test.json
```

Note:
- you can also use the -y / --yaml option to write to a YAML file instead of a JSON.
- as an optimisation tip for your workflow, try running one separate instance of COJAC on each BAM file, and combining the results afterwards into a single YAML (or JSON).
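YAML outputs can be concatenated directly, but JSON files cannot simply be joined end to end, so combining them needs a small merge step. The sketch below assumes (this is not confirmed by the documentation above) that each cooc-mutbamscan JSON output is a top-level object keyed by sample name, so merging reduces to a dictionary union; the helper names are hypothetical.

```python
import json


def merge_results(parts):
    """Combine per-BAM result dicts into one, refusing duplicate sample keys."""
    merged = {}
    for part in parts:
        duplicates = merged.keys() & part.keys()
        if duplicates:
            raise ValueError(f"sample(s) present twice: {sorted(duplicates)}")
        merged.update(part)
    return merged


def merge_json_files(paths, output):
    """Load each cooc-mutbamscan JSON output and write the combined object."""
    parts = []
    for path in paths:
        with open(path) as handle:
            parts.append(json.load(handle))
    with open(output, "w") as handle:
        json.dump(merge_results(parts), handle, indent=2)
```

Usage, assuming one JSON per BAM: `merge_json_files(["cooc-sam1.json", "cooc-sam2.json"], "cooc-test.json")`. Raising on duplicate keys avoids silently overwriting a sample that was scanned twice.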

Analyzing a cohort previously aligned by V-pipe

Before the integration of COJAC into V-pipe, this was the legacy method for analysing alignments produced by V-pipe.

```bash
cojac cooc-mutbamscan -b nCoV-2019.insert.V3.bed -m voc/ -s work/samples.tsv -p work/samples/ -j cooc-test.json
```

Note: this method is much slower, as each alignment is analysed sequentially.

Number of cooccurrences

By default cooc-mutbamscan will look for cooccurrences of at least 2 mutations on the same amplicon. You can change that number using option -#/--cooc:

you can increase it to e.g.: 3 if the variants you study require more stringent identification
you can set it to 1 to also count isolated occurrences; in this case cooc-mutbamscan also doubles as a generic (non cooccurrence-aware) variant caller, so you can get all counts with a single tool.
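To illustrate what this threshold controls, here is a simplified, hypothetical sketch (not cojac's code): given insert regions and the positions of one variant's mutations, keep only the amplicons that cover at least min_cooc of those positions. Inclusive bounds and a shared coordinate system for both inputs are assumptions, and the coordinates are made up.

```python
def amplicons_to_query(amplicons, mutations, min_cooc=2):
    """Keep amplicons that carry at least `min_cooc` of the variant's sites.

    amplicons: iterable of (name, start, stop) insert regions (inclusive bounds assumed)
    mutations: dict {position: alternative base} for one variant
    """
    query = {}
    for name, start, stop in amplicons:
        sites = {pos: alt for pos, alt in mutations.items() if start <= pos <= stop}
        if len(sites) >= min_cooc:
            query[name] = sites
    return query


# made-up coordinates, for illustration only
amplicons = [("72", 100, 500), ("73", 450, 900)]
mutations = {120: "T", 300: "G", 700: "A"}
print(amplicons_to_query(amplicons, mutations))              # {'72': {120: 'T', 300: 'G'}}
print(amplicons_to_query(amplicons, mutations, min_cooc=1))  # {'72': {120: 'T', 300: 'G'}, '73': {700: 'A'}}
```

Raising min_cooc drops amplicons with too few informative sites; lowering it to 1 keeps every amplicon that carries any site, which is what turns the scan into a per-site counter.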

Store the amplicon query

Using the -A / --out-amp / --out-amplicons option, it is possible to store the exact request that was used to analyse samples. You can then re-use the exact same request using the -Q / --in-amp / --amplicons option, or pass it to a visualisation tool. This is useful for sharing the exact same request across multiple parallel COJAC instances (e.g.: one per BAM file).

```bash
# store the request in a YAML file
cojac cooc-mutbamscan -b nCoV-2019.insert.V3.bed -m voc/ -A amplicons.v3.yaml
# adjust the content of amplicons.v3.yaml
# now have a look at the frequencies of mutation cooccurrences using covSPECTRUM
cojac cooc-curate -a amplicons.v3.yaml voc/omicron_ba2_mutations.yaml voc/omicron_ba1_mutations.yaml voc/delta_mutations.yaml
# reuse the amplicon query
cojac cooc-mutbamscan -Q amplicons.v3.yaml -a sam1.bam -y cooc-sam1.yaml
cojac cooc-mutbamscan -Q amplicons.v3.yaml -a sam2.bam -y cooc-sam2.yaml
cat cooc-sam1.yaml cooc-sam2.yaml > cooc-test.yaml
```
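The one-instance-per-BAM pattern can also be driven from Python. This is a minimal sketch using only the options shown above (-Q, -a, -y); the worker count, the cooc-<sample>.yaml naming and the helper names are illustrative choices, and a workflow manager is the more robust option for production runs.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scan_command(bam: str, amplicons_yaml: str) -> list:
    """One cooc-mutbamscan call that reuses the stored amplicon query (-Q)."""
    output = f"cooc-{Path(bam).stem}.yaml"  # sam1.bam -> cooc-sam1.yaml
    return ["cojac", "cooc-mutbamscan", "-Q", amplicons_yaml, "-a", bam, "-y", output]


def scan_in_parallel(bams, amplicons_yaml, workers=4):
    """Run one cojac process per BAM file; return the YAML files to concatenate."""
    commands = [scan_command(bam, amplicons_yaml) for bam in bams]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # threads only wait on external processes; check=True surfaces failures
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return [cmd[-1] for cmd in commands]
```

Calling `scan_in_parallel(["sam1.bam", "sam2.bam"], "amplicons.v3.yaml")` returns `["cooc-sam1.yaml", "cooc-sam2.yaml"]`, which can then be concatenated with `cat` as in the example above.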

Display data on terminal

The built-in -d / --dump option of cooc-mutbamscan does not display the data in a very user-friendly way. You can instead pass a JSON or YAML file to the display script. Run:

```bash
cojac cooc-colourmut -a amplicons.v3.yaml -j cooc-test.json
```

[terminal screenshot]

Notes:
- passing the -a / --amplicons parameter is currently mandatory; see section Store the amplicon query above.

Render table for publication

And now, let’s go beyond our terminal and produce a table that can be included in a publication (see the Citation below for a concrete example). Run:

```bash
cojac cooc-pubmut -m voc/ -a amplicons.v3.yaml -j cooc-test.json -o cooc-output.tsv
```

Note:
- if provided, options -m / --vocdir and -a / --amplicons help generate human-friendly headers (Amplicon 88, 26277-26635) in the table instead of short names (88_om).
- you can also output a comma-separated table (-o cooc-output.csv).
- Microsoft Excel requires option -x/--excel (using a semi-colon instead of a comma in comma-separated-value files). Some versions can also open TSV (but not the Office 365 web app).

You need to open the table with a spreadsheet that can understand line breaks, such as LibreOffice Calc, Google Docs Spreadsheet or, using special options (see above), Microsoft Excel.

| | 72_al | 78_al | 92_al | 93_al | 76_be | 77_d614g |
| :------- | ------------------: | ---------------: | -----------------: | ------------------: | ----------------: | ---------------------: |
| sam1.bam | 158 / 809<br>19.53% | 2 / 452<br>0.44% | 89 / 400<br>22.25% | 344 / 758<br>45.38% | 0 / 1090<br>0.00% | 371 / 371<br>100.00% |
| sam2.bam | 0 / 1121<br>0.00% | 0 / 255<br>0.00% | 58 / 432<br>13.43% | 142 / 958<br>14.82% | 0 / 1005<br>0.00% | 1615 / 1615<br>100.00% |
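The cell layout, and the reason the file needs a spreadsheet that understands line breaks, can be reproduced with Python's standard csv module. This is an illustrative sketch, not cooc-pubmut's code: each cell is mut_all / count followed by the percentage on a new line (matching the numbers above, e.g. 158 / 809 gives 19.53%), and the semi-colon delimiter stands in for what -x/--excel is described to do. How empty amplicons are rendered is an assumption.

```python
import csv
import io


def pub_cell(mut_all: int, count: int) -> str:
    """'hits / total', then the percentage on a second line, as in the table above."""
    if count == 0:
        return ""  # assumption: rendering of amplicons without coverage is not documented here
    return f"{mut_all} / {count}\n{100 * mut_all / count:.2f}%"


def to_csv(rows, excel: bool = False) -> str:
    """Write rows as CSV; excel=True mimics the -x/--excel semi-colon separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";" if excel else ",")
    writer.writerows(rows)  # cells containing a newline are quoted automatically
    return buffer.getvalue()


table = [["", "72_al", "78_al"], ["sam1.bam", pub_cell(158, 809), pub_cell(2, 452)]]
print(to_csv(table, excel=True))
```

Because the line break stays inside a quoted field, a spreadsheet that does not honour quoted newlines will split each cell across two rows.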

It is also possible to use the software pandoc to further convert the CSV to other formats. Run:

```bash
cojac cooc-pubmut -j cooc-test.json -o cooc-output.csv
pandoc cooc-output.csv -o cooc-output.pdf
pandoc cooc-output.csv -o cooc-output.html
pandoc cooc-output.csv -o cooc-output.md
```

Export table for downstream analysis

If you want to further analyse the data (e.g.: with RStudio), it's also possible to export the data into a more machine-readable CSV/TSV table. Run:

```bash
cojac cooc-tabmut -j cooc-test.json -o cooc-export.csv
```

You can try importing the resulting CSV into your favourite tool.

| | A72_al.count | A72_al.mut_all | A72_al.mut_oneless | A72_al.frac | A72_al.cooc | A78_al.count | A78_al.mut_all | A78_al.mut_oneless | A78_al.frac | A78_al.cooc | ... |
| :------- | -----------: | -------------: | -----------------: | ----------: | ----------: | -----------: | -------------: | -----------------: | ----------: | ----------: | --- |
| sam1.bam | 809 | 158 | 234 | 0.195303 | 2 | 452 | 2 | 7 | 0.004425 | 2 | ... |
| sam2.bam | 1121 | 0 | 0 | 0.000000 | 2 | 255 | 0 | 52 | 0.000000 | 2 | ... |

The columns are tagged as follows:

count: total count of amplicons carrying the sites of interest
mut_all: amplicons carrying mutations on all sites of interest (e.g.: variant mutations observed on all sites)
mut_oneless: amplicons where one mutation is missing (e.g.: only 2 out of 3 sites carry the variant mutation, 1 site carries wild-type)
frac: fraction (mut_all/count), or empty if no counts
cooc: number of considered sites (e.g.: 2 sites of interest), or empty if no counts
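The column definitions above can be restated as code. The sketch below is a simplified model, not cojac's implementation: it assumes every read pair has already been reduced to a mutant/wild-type call at each site of interest, and it ignores reads with other bases, deletions or incomplete coverage of the sites.

```python
def tally(calls, n_sites):
    """Summarise one amplicon from per-read-pair calls.

    calls: list of tuples of booleans, one entry per site of interest,
           True if the read pair carries the variant mutation at that site
    n_sites: number of sites of interest on the amplicon (the cooc value)
    """
    count = len(calls)
    if count == 0:
        # frac and cooc are left empty when there are no counts
        return {"count": 0, "mut_all": 0, "mut_oneless": 0, "frac": None, "cooc": None}
    mut_all = sum(1 for sites in calls if sum(sites) == n_sites)
    mut_oneless = sum(1 for sites in calls if sum(sites) == n_sites - 1)
    return {"count": count, "mut_all": mut_all, "mut_oneless": mut_oneless,
            "frac": mut_all / count, "cooc": n_sites}


# four hypothetical read pairs over a 2-site amplicon
print(tally([(True, True), (True, False), (False, False), (True, True)], 2))
# {'count': 4, 'mut_all': 2, 'mut_oneless': 1, 'frac': 0.5, 'cooc': 2}
```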

If your tool supports multi-level indexing, use the -m/--multiindex option. The resulting table will be bilevel indexed: the first level is the amplicon, the second is the category.

	A72_al					A78_al
	count	mut_all	mut_oneless	frac	cooc	count	mut_all	mut_oneless	frac	cooc
sam1.bam	809	158	234	0.195303	2	452	2	7	0.004425	2
sam2.bam	1121	0	0	0.0	2	255	0	52	0.0	2

An alternative table orientation is provided by -l/--lines:

| sample | amplicon | frac | cooc | count | mut_all | mut_oneless | al | be | d614g |
| :------- | -------: | :------- | :--: | ----: | ------: | ----------: | :--: | :--: | :---: |
| sam1.bam | 72 | 0.195303 | 2 | 809 | 158 | 234 | 1 | | |
| sam1.bam | 78 | 0.004425 | 2 | 452 | 2 | 7 | 1 | | |
| sam1.bam | 92 | 0.222500 | 3 | 400 | 89 | 3 | 1 | | |
| sam1.bam | 93 | 0.453826 | 2 | 758 | 344 | 140 | 1 | | |
| sam1.bam | 76 | 0.000000 | 2 | 1090 | 0 | 377 | | 1 | |
| sam1.bam | 77 | 1.000000 | 1 | 371 | 371 | 0 | | | 1 |
| sam2.bam | 72 | 0.000000 | 2 | 1121 | 0 | 0 | 1 | | |
| sam2.bam | 78 | 0.000000 | 2 | 255 | 0 | 52 | 1 | | |
| sam2.bam | 92 | 0.134259 | 3 | 432 | 58 | 3 | 1 | | |
| sam2.bam | 93 | 0.148225 | 2 | 958 | 142 | 80 | 1 | | |
| sam2.bam | 76 | 0.000000 | 2 | 1005 | 0 | 0 | | 1 | |
| sam2.bam | 77 | 1.000000 | 1 | 1615 | 1615 | 0 | | | 1 |

Mutations affecting primers

It is also possible to abuse the sub-command shown in section Store the amplicon query above to get a list of mutations which fall on primers' target sites (and thus could impact binding and cause drop-outs) by providing a primer BED file.

```bash
# get the primer BED file for Artic v5.3.2
curl -o SARS-CoV-2.v532.primer.bed 'https://raw.githubusercontent.com/artic-network/primer-schemes/master/nCoV-2019/V5.3.2/SARS-CoV-2.primer.bed'
# get the full list of Omicron BA.2.86 mutations
curl -O 'https://raw.githubusercontent.com/cbg-ethz/cowwid/master/voc/omicron_ba286_mutations_full.yaml'
# check which primers have at least 1 mutation falling in their target binding regions
cojac cooc-mutbamscan --voc omicron_ba286_mutations_full.yaml --bedfile SARS-CoV-2.v532.primer.bed --no-sort --cooc 1 --out-amp affected_primers.v532.yaml
```

This will yield entries like:
```yaml
50_ombba286: [7819, 7850, 7512, 7738, {7842: G}] # SARS-CoV-2_25_RIGHT
```

meaning:
- 50th line of the BED file
- primer's target binding region 7819-7850
- (that would be primer "SARS-CoV-2_25_RIGHT" with sequence CTCTCAGGTTGTCTAAGTTAACAAAATGAGA)
- is hit by one mutation 7842G

Installation

We recommend using bioconda software repositories for easy installation. You can find instructions to set up your bioconda environment at the following address:

https://bioconda.github.io/index.html#usage

In those instructions, please carefully follow the channel configuration instructions.

If you use V-pipe’s quick_install.sh, it will set up an environment that you can activate, e.g.:

```bash
bash quick_install.sh -b master -p testing -w work
. ./testing/miniconda3/bin/activate
```

Prebuilt package

cojac and its dependencies are all available in the bioconda repository. We strongly advise you to install this pre-built package for a hassle-free experience.

You can install cojac in its own environment and activate it:

```bash
conda create -n cojac cojac
conda activate cojac
# test it
cojac --help
```

And to update it to the latest version, run:

```bash
# activate the environment if not already active:
conda activate cojac
conda update cojac
```

Or you can add it to the current environment (e.g.: in environment base):

```bash
conda install cojac
```

Dependencies

If you want to install the software yourself, you can see the list of dependencies in conda_cojac_env.yaml.

We recommend using conda to install them:

```bash
conda env create -f conda_cojac_env.yaml
conda activate cojac
```

Install cojac using pip:
```bash
pip install .
# this will autodetect dependencies already installed by conda
```

cojac should now be accessible from your PATH

```bash
# activate the environment if not already active:
conda activate cojac
cojac --help
```

Remove conda environment

You can remove the conda environment if you don't need it any more:

```bash
# exit the cojac environment first:
conda deactivate
conda env remove -n cojac
```

Python poetry

COJAC has its dependencies in a pyproject.toml managed with poetry and can be installed with it.

```bash
# If not installed system-wide: manually run poetry-dynamic-versioning
poetry-dynamic-versioning
# (this sets the version string from the git currently cloned and checked out)
poetry install
```

Additional notebooks

The subdirectory notebooks/ contains Jupyter and Rstudio notebooks used in the publication.

Upcoming features

[x] ~~bioconda package~~
[x] ~~further jupyter and rstudio code from the publication~~
[x] ~~Move hard-coded amplicons to BED input file~~
[x] ~~Move hard-coded mutations to YAML configuration~~
[x] ~~Refactor code into proper Python package~~

Long term goal:

[x] ~~Integration as part of V-pipe~~
[ ] Integration with ShoRAH amplicons

Contributions

Package developers:

Additional notebooks:

Corresponding author:

Niko Beerenwinkel

Citation

If you use this software in your research, please cite:

Katharina Jahn, David Dreifuss, Ivan Topolsky, Anina Kull, Pravin Ganesanandamoorthy, Xavier Fernandez-Cassi, Carola Bänziger, Alexander J. Devaux, Elyse Stachler, Lea Caduff, Federica Cariti, Alex Tuñas Corzón, Lara Fuhrmann, Chaoran Chen, Kim Philipp Jablonski, Sarah Nadeau, Mirjam Feldkamp, Christian Beisel, Catharine Aquino, Tanja Stadler, Christoph Ort, Tamar Kohn, Timothy R. Julian & Niko Beerenwinkel

"Early detection and surveillance of SARS-CoV-2 genomic variants in wastewater using COJAC."

Nature Microbiology volume 7, pages 1151–1160 (2022); doi:10.1038/s41564-022-01185-x

Contacts

If you experience problems running the software:

We encourage you to use the issue tracker on GitHub
For further enquiries, you can also contact the V-pipe Dev Team
You can contact the publication’s corresponding author

Owner

Name: Computational Biology Group (CBG)
Login: cbg-ethz
Kind: organization
Location: Basel, Switzerland

Website: https://www.bsse.ethz.ch/cbg
Twitter: cbg_ethz
Repositories: 91
Profile: https://github.com/cbg-ethz

Beerenwinkel Lab at ETH Zurich

GitHub Events

Total

Create event: 7
Issues event: 6
Release event: 1
Watch event: 1
Delete event: 4
Issue comment event: 8
Push event: 9
Pull request event: 10

Last Year

Create event: 7
Issues event: 6
Release event: 1
Watch event: 1
Delete event: 4
Issue comment event: 8
Push event: 9
Pull request event: 10

Committers

Last synced: 9 months ago

All Time

Total Commits: 148
Total Committers: 10
Avg Commits per committer: 14.8
Development Distribution Score (DDS): 0.324

Past Year

Commits: 10
Committers: 3
Avg Commits per committer: 3.333
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
Ivan Blagoev Topolsky	i**y@b**h	100
kpj	k**i@g**m	26
dr-david	d**s@g**m	11
dependabot[bot]	4****]	3
Gordon Julian Koehn	k**g@e**h	2
Lara Fuhrmann	l**n@b**h	2
mcarrara	1****o	1
LaraFuhrmann	5****n	1
Jahn Katharina	k**n@b**h	1
David Dreifuss	g**s@D**l	1

Committer Domains (Top 20 + Academic)

bsse.ethz.ch: 2 bs-mbpr452.d.ethz.ch: 1 ethz.ch: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 17
Total pull requests: 6
Average time to close issues: almost 3 years
Average time to close pull requests: 30 days
Total issue authors: 11
Total pull request authors: 3
Average comments per issue: 1.24
Average comments per pull request: 0.67
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 3
Pull requests: 5
Average time to close issues: 5 days
Average time to close pull requests: 8 days
Issue authors: 2
Pull request authors: 2
Average comments per issue: 1.0
Average comments per pull request: 0.6
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 3


Top Authors

Issue Authors

DrYak (4)
carlottaolivero (2)
gordonkoehn (2)
evantroendle (1)
PlushZ (1)
Albaburi (1)
hoelzer (1)
ibseq (1)
hudenise2 (1)
rafischulman (1)
suskraem (1)

Pull Request Authors

dependabot[bot] (6)
gordonkoehn (4)
kpj (1)

Top Labels

Issue Labels

enhancement (4) documentation (1)

Pull Request Labels

dependencies (6) github_actions (4) python (2)


Science Score: 57.0%
