https://github.com/bihealth/wei_et_al_2024

https://github.com/bihealth/wei_et_al_2024

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.0%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: bihealth
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 8.55 MB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.md

WeietalCRC2024

The scripts for Wei et. al paper on cancer cell calling in colorectal cancer

Processed data is available from zenodo and should be unpacked into a directory structure like so (named ./data/):

. ├── cellbender │   ├── p007n_cellbender_counts.h5 │   ├── p007t_cellbender_counts.h5 │   ├── p008n_cellbender_counts.h5 │   └── ... ├── cellranger │   ├── p007n_raw_feature_bc_matrix.h5 │   ├── p007t_raw_feature_bc_matrix.h5 │   ├── p008n_raw_feature_bc_matrix.h5 │   └── ... ├── cellSNP │   ├── downsampled │   │   ├── p007t_05 │   │   │   ├── cellSNP.base.vcf │   │   │   ├── cellSNP.samples.tsv │   │   │   ├── cellSNP.tag.AD.mtx │   │   │   ├── cellSNP.tag.DP.mtx │   │   │   └── cellSNP.tag.OTH.mtx │   │   ├── ... │   ├── p007n │   │   ├── cellSNP.base.vcf │   │   ├── cellSNP.cells.vcf │   │   ├── cellSNP.samples.tsv │   │   ├── cellSNP.tag.AD.mtx │   │   ├── cellSNP.tag.DP.mtx │   │   └── cellSNP.tag.OTH.mtx │   ├── ... ├── numbat │   ├── p007_clone_post_2.tsv │   ├── p008_clone_post_2.tsv │   └── ... └── WGS ├── p007_filtered.vcf.gz ├── p008_filtered.vcf.gz └── ...

Analysis pipeline

Ambient RNA background removal (Snakefile)

  • Ambient RNA background removal by CellBender
    • input: {sample}rawfeaturebcmatrix.h5
    • output: {sample}cellbendercounts.h5

Preprocessing

Integrate all samples from CellBender output and filter for QC parameters

Cell type annotation (coarse, epithelial cells vs immune cells vs stromal cells)

  • run 1_preprocessing_h5.ipynb
    • input: all cellbender_counts.h5 object
    • output:
      • CBallcells.h5 and CBepicells.h5 that contains high-quality all/epithelial cells with calculated PCA, UMAP, diffusion map, and louvain embeddings, as well as coarse cell type annotation
      • anno/{sample}{celltype}.txt cell barcode list by sample and cell type

Identification of cancer cells

inferCNV: infer copy number alteration status from gene expression

  • run inferCNV/inferCNV.ipynb to execute inferCNV

    • input: CBallcells.h5
    • output: allcellCBcounts.rds, allepicellCBcounts.rds, allcellanno.txt, allepicellanno.txt, inferCNVexpressiondata.rds and other inferCNV intermediate objects
  • run inferCNV/inferCNV_result.ipynb to collect inferCNV result

    • collect inferCNV result from the previous run
    • output: infercnv_clone_scores.tsv

Numbat: infer copy number alterations from phased gene expression profiles

  • run script/Snakefile for all the samples except p009
  • run script//Numbat/combine_p009t1t2_and_run_numbat.ipynb since p009 have two normal samples and two tumour samples, they were run separately with modified scripts
  • run script/Numbat/collect_numbat_result.ipynb to collect Numbat result of all the samples
    • output: numbat_all_output_clone_post_combined9.csv

CCISM: identify cancer cells using SNVs

  • rub script/run_CCISM.sh to run CCISM on all the sample

iCMS: annotates cancer cell phenotypes (iCMS2/iCMS3)

  • input
    • download the h5 object from Joanito et al.:
    • CB_epi_cells.h5
  • run iCMS/preprocessing_Joanito.ipynb to filter the count matrix with the same criteria as our h5 objects
    • output adata_concat_with_joanito.h5
  • run iCMS/run_scvi_snakemake.sh for scVI model training
  • run iCMS/scvi_model_result_iCMS.ipynb to inspect scVI modeling result, train and inspect scANVI models
    • output `CBepicells_iCMS.h5

Integrate results from inferCNV, Numbat, CCISM, and iCMS

  • run 2_integrate_tools_result.ipynb
  • input
    • CB_epi_cells_iCMS.h5 which contains iCMS result
    • CCISM result
    • Numbat: numbat_all_output_clone_post_combined9.csv
    • inferCNV: infercnv_clone_scores.tsv
  • output: CB_epi_Numbat_CCISM_inferCNV_iCMS.h5

Cell type annotation at finer resolution

  • run 3_cell_type_annotation_finer.ipynb
    • output adata_all_full_cell_type_annotation.h5

Consensus cancer calls

  • run 4_consensus_calls.ipynb
    • output CB_epi_Numbat_CCISM_inferCNV_icms_Uhlitz_resolved_identity.h5

Pseudotime and cell type enrichment

  • run 5_cellrank.ipynb

Ligand and receptor expression

  • run 6a_DEG_epi.ipynb for expression and differential expression analysis of epithelial cells; 6b_DEG_imm_str.ipynb for immune and stromal cells

CRC pathways and signatures expression levels

  • run 7_CRC_pathway_signature.ipynb

Cell-cell interaction

  • run 8_cellchat.ipynb to infer cell-cell interactions in the normal versus tumour samples

Figures in the publication (Figures.ipynb)

  • run script/Figures.ipynb
    • input: data/data_consolidated.h5ad
    • output: figures/*
  • Figure 2 contains data simulation result, run script/Fig2_simulations.Rmd
    • output Fig2_simulation_results.csv

Owner

  • Name: Berlin Institute of Health
  • Login: bihealth
  • Kind: organization

BIH Core Unit Bioinformatics & BIH HPC IT

GitHub Events

Total
Last Year