https://github.com/bihealth/wei_et_al_2024
Repository
Basic Info
- Host: GitHub
- Owner: bihealth
- Language: Jupyter Notebook
- Default Branch: master
- Size: 8.55 MB
Statistics
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
WeietalCRC2024
Scripts for the Wei et al. paper on cancer cell calling in colorectal cancer.
Processed data is available from Zenodo and should be unpacked into a directory named ./data/ with the following structure:
.
├── cellbender
│   ├── p007n_cellbender_counts.h5
│   ├── p007t_cellbender_counts.h5
│   ├── p008n_cellbender_counts.h5
│   └── ...
├── cellranger
│   ├── p007n_raw_feature_bc_matrix.h5
│   ├── p007t_raw_feature_bc_matrix.h5
│   ├── p008n_raw_feature_bc_matrix.h5
│   └── ...
├── cellSNP
│   ├── downsampled
│   │   ├── p007t_05
│   │   │   ├── cellSNP.base.vcf
│   │   │   ├── cellSNP.samples.tsv
│   │   │   ├── cellSNP.tag.AD.mtx
│   │   │   ├── cellSNP.tag.DP.mtx
│   │   │   └── cellSNP.tag.OTH.mtx
│   │   ├── ...
│   ├── p007n
│   │   ├── cellSNP.base.vcf
│   │   ├── cellSNP.cells.vcf
│   │   ├── cellSNP.samples.tsv
│   │   ├── cellSNP.tag.AD.mtx
│   │   ├── cellSNP.tag.DP.mtx
│   │   └── cellSNP.tag.OTH.mtx
│   ├── ...
├── numbat
│   ├── p007_clone_post_2.tsv
│   ├── p008_clone_post_2.tsv
│   └── ...
└── WGS
    ├── p007_filtered.vcf.gz
    ├── p008_filtered.vcf.gz
    └── ...
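Before running the notebooks, it can help to confirm that the unpacked archive matches the layout above. The following is a minimal sketch using only the Python standard library; the helper names `missing_subdirs` and `list_samples` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Top-level folders named in the directory tree above.
EXPECTED_SUBDIRS = ("cellbender", "cellranger", "cellSNP", "numbat", "WGS")


def missing_subdirs(data_dir):
    """Return the expected sub-directories that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def list_samples(data_dir):
    """Derive sample IDs (e.g. p007n) from the CellBender file names."""
    suffix = "_cellbender_counts.h5"
    files = sorted((Path(data_dir) / "cellbender").glob("*" + suffix))
    return [f.name[: -len(suffix)] for f in files]


if __name__ == "__main__":
    absent = missing_subdirs("./data")
    print("missing:", absent if absent else "none")
    print("samples:", list_samples("./data"))
```

The check only looks at folder names and CellBender file names; it does not validate file contents.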
Analysis pipeline
Ambient RNA background removal (Snakefile)
- Ambient RNA background removal by CellBender
  - input: {sample}_raw_feature_bc_matrix.h5
  - output: {sample}_cellbender_counts.h5
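The input/output mapping for one sample can be sketched as a small helper that builds, but does not run, a CellBender command line. This is an illustration only: the Snakefile in this repository may pass additional options, the folder placement under data/ is assumed from the directory tree, and the function name `cellbender_command` is hypothetical. Only the `remove-background` subcommand with its `--input` and `--output` options is used.

```python
from pathlib import Path


def cellbender_command(sample, data_dir="data"):
    """Build (not run) a CellBender remove-background command for one sample.

    Assumes the raw matrix lives in <data_dir>/cellranger and the result is
    written to <data_dir>/cellbender, following the directory tree above.
    """
    root = Path(data_dir)
    raw = root / "cellranger" / f"{sample}_raw_feature_bc_matrix.h5"
    out = root / "cellbender" / f"{sample}_cellbender_counts.h5"
    return [
        "cellbender", "remove-background",
        "--input", raw.as_posix(),
        "--output", out.as_posix(),
    ]


if __name__ == "__main__":
    # Print the command for one sample; pass the list to subprocess.run to execute it.
    print(" ".join(cellbender_command("p007n")))
```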
Preprocessing
Integrate all samples from the CellBender output and filter on QC parameters
Coarse cell type annotation (epithelial vs. immune vs. stromal cells)
- run 1_preprocessing_h5.ipynb
  - input: all cellbender_counts.h5 objects
  - output:
    - CB_all_cells.h5 and CB_epi_cells.h5, which contain the high-quality cells (all cells and epithelial cells, respectively) with computed PCA, UMAP, and diffusion map embeddings, Louvain clustering, and the coarse cell type annotation
    - anno/{sample}{celltype}.txt: cell barcode lists by sample and cell type
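The per-sample, per-cell-type barcode lists can be illustrated with a short standard-library sketch. It is not the notebook's code: the notebook works on the integrated h5 object, whereas this example takes plain `(barcode, sample, celltype)` tuples. The function name `write_barcode_lists` and the file name template with an underscore separator are assumptions.

```python
from collections import defaultdict
from pathlib import Path


def write_barcode_lists(cells, out_dir="anno", template="{sample}_{celltype}.txt"):
    """Write one barcode list per (sample, cell type) pair.

    cells: iterable of (barcode, sample, celltype) tuples.
    template: assumed file naming scheme; adjust it to match the real files.
    Returns a dict mapping each written file name to its barcode count.
    """
    groups = defaultdict(list)
    for barcode, sample, celltype in cells:
        groups[(sample, celltype)].append(barcode)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for (sample, celltype), barcodes in sorted(groups.items()):
        path = out / template.format(sample=sample, celltype=celltype)
        path.write_text("\n".join(barcodes) + "\n")  # one barcode per line
        written[path.name] = len(barcodes)
    return written
```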
Identification of cancer cells
inferCNV: infer copy number alteration status from gene expression
- run inferCNV/inferCNV.ipynb to execute inferCNV
  - input: CB_all_cells.h5
  - output: allcellCBcounts.rds, allepicellCBcounts.rds, allcellanno.txt, allepicellanno.txt, inferCNVexpressiondata.rds and other inferCNV intermediate objects
- run inferCNV/inferCNV_result.ipynb to collect the inferCNV results from the previous run
  - output: infercnv_clone_scores.tsv
Numbat: infer copy number alterations from phased gene expression profiles
- run script/Snakefile for all samples except p009
- run script/Numbat/combine_p009t1t2_and_run_numbat.ipynb for p009, which has two normal samples and two tumour samples; these were run separately with modified scripts
- run script/Numbat/collect_numbat_result.ipynb to collect the Numbat results of all samples
  - output: numbat_all_output_clone_post_combined9.csv
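Collecting per-sample Numbat results into one table can be sketched as follows. This is a hedged standard-library illustration, not the code in collect_numbat_result.ipynb: it concatenates every `*_clone_post_2.tsv` file found in a folder and adds a `sample` column taken from the file name. The function name `combine_clone_posts` and the added `sample` column are assumptions, and no particular Numbat column names are assumed.

```python
import csv
from pathlib import Path


def combine_clone_posts(numbat_dir, out_csv):
    """Concatenate per-sample clone_post TSV files into a single CSV.

    Returns the number of data rows written.
    """
    suffix = "_clone_post_2.tsv"
    rows, fieldnames = [], ["sample"]
    for tsv in sorted(Path(numbat_dir).glob("*" + suffix)):
        sample = tsv.name[: -len(suffix)]  # e.g. "p007"
        with tsv.open(newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            # Keep the union of columns, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # The sample ID from the file name takes precedence.
                rows.append({**row, "sample": sample})

    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `combine_clone_posts("data/numbat", "numbat_combined.csv")` would merge the files shown in the directory tree; the output file name here is a placeholder.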
CCISM: identify cancer cells using SNVs
- run script/run_CCISM.sh to run CCISM on all samples
iCMS: annotates cancer cell phenotypes (iCMS2/iCMS3)
- input
- download the h5 object from Joanito et al.:
CB_epi_cells.h5
- run iCMS/preprocessing_Joanito.ipynb to filter the count matrix with the same criteria as our h5 objects
  - output: adata_concat_with_joanito.h5
- run iCMS/run_scvi_snakemake.sh for scVI model training
- run iCMS/scvi_model_result_iCMS.ipynb to inspect the scVI modeling results, and to train and inspect scANVI models
  - output: CB_epi_cells_iCMS.h5
Integrate results from inferCNV, Numbat, CCISM, and iCMS
- run 2_integrate_tools_result.ipynb
  - input:
    - CB_epi_cells_iCMS.h5, which contains the iCMS result
    - CCISM result
    - Numbat: numbat_all_output_clone_post_combined9.csv
    - inferCNV: infercnv_clone_scores.tsv
  - output: CB_epi_Numbat_CCISM_inferCNV_iCMS.h5
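Conceptually, this step joins the per-tool results onto the epithelial cells by cell barcode. The sketch below shows that join with plain dictionaries; it is an assumption-laden illustration rather than the notebook's implementation, which operates on the h5 objects listed above. The function name `integrate_calls` and the example values are hypothetical.

```python
def integrate_calls(epithelial_barcodes, **tool_results):
    """Left-join per-tool results onto a list of epithelial cell barcodes.

    tool_results: one keyword argument per tool, each a dict mapping
    barcode -> that tool's result. Cells a tool did not score get None;
    barcodes that are not in epithelial_barcodes are dropped.
    """
    return {
        barcode: {tool: calls.get(barcode) for tool, calls in tool_results.items()}
        for barcode in epithelial_barcodes
    }


if __name__ == "__main__":
    # Made-up barcodes and values, for illustration only.
    table = integrate_calls(
        ["AAAC-1", "AAAG-1"],
        numbat={"AAAC-1": "clone_2"},
        inferCNV={"AAAC-1": 0.8, "AAAG-1": 0.1},
    )
    for barcode, calls in table.items():
        print(barcode, calls)
```

Keeping missing results as explicit None values makes it visible which cells each tool could not score.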
Cell type annotation at finer resolution
- run 3_cell_type_annotation_finer.ipynb
  - output: adata_all_full_cell_type_annotation.h5
Consensus cancer calls
- run 4_consensus_calls.ipynb
  - output: CB_epi_Numbat_CCISM_inferCNV_icms_Uhlitz_resolved_identity.h5
Pseudotime and cell type enrichment
- run 5_cellrank.ipynb
Ligand and receptor expression
- run 6a_DEG_epi.ipynb for expression and differential expression analysis of epithelial cells, and 6b_DEG_imm_str.ipynb for immune and stromal cells
CRC pathways and signatures expression levels
- run 7_CRC_pathway_signature.ipynb
Cell-cell interaction
- run 8_cellchat.ipynb to infer cell-cell interactions in the normal versus tumour samples
Figures in the publication (Figures.ipynb)
- run script/Figures.ipynb
  - input: data/data_consolidated.h5ad
  - output: figures/*
- Figure 2 contains data simulation results; run script/Fig2_simulations.Rmd
  - output: Fig2_simulation_results.csv
Owner
- Name: Berlin Institute of Health
- Login: bihealth
- Kind: organization
- Website: https://www.cubi.bihealth.org/
- Repositories: 215
- Profile: https://github.com/bihealth
BIH Core Unit Bioinformatics & BIH HPC IT