https://github.com/cellgeni/c2l
Set of scripts to run cell2location on farm
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cellgeni
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 531 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
c2l
Set of scripts to run cell2location on farm
Overview
There are two steps:
1. Cell type signature estimation
2. Visium deconvolution
The first step needs a reference h5ad file with raw counts in adata.X and all covariates and the cell annotation in adata.obs. The default name for the input h5ad is ./ref.h5ad. The second step needs the results of the first step and an h5ad file with all Visium samples combined. The default name for the Visium input is ./viss.h5ad.
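The requirements on the reference can be checked before anything is submitted. The following is a minimal sketch and not part of the repository: the function names `looks_like_raw_counts` and `missing_obs_columns` are hypothetical, and the toy lists stand in for a dense `adata.X` and for `adata.obs.columns`. The loop is plain Python, so a large or sparse matrix would need a vectorised check instead.

```python
import math


def looks_like_raw_counts(rows):
    """Return True if every value is a non-negative whole number."""
    return all(v >= 0 and float(v).is_integer() for row in rows for v in row)


def missing_obs_columns(obs_columns, required):
    """Return the required columns that are absent, in the order requested."""
    present = set(obs_columns)
    return [c for c in required if c not in present]


# Toy data standing in for adata.X and adata.obs.columns (assumed values).
raw = [[0, 3, 1], [2, 0, 5]]
normalised = [[math.log1p(v) for v in row] for row in raw]

print(looks_like_raw_counts(raw))         # raw counts pass the check
print(looks_like_raw_counts(normalised))  # log-transformed values do not
print(missing_obs_columns(["sample", "celltype"], ["sample", "donor", "celltype"]))
```

A reference that fails the first check has probably been normalised or log-transformed and would need its raw counts restored before the first step is run.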
The two steps are separate jobs and have to be submitted to the farm one by one, using the src/01 and src/02 bash scripts. Input data are usually not well formatted, so some preparation is needed; src/check-n-prepare.input.h5ad.ipynb can be used for this.
Some QC plots can be produced with src/03.plot.c2l.R.
Prerequisites
The pipeline uses singularity to run cell2location. The path to the image is hardcoded in the bsub scripts.
The second step of cell2location can use a lot of GPU memory. It will most likely not fit into gpu-normal if the number of Visium samples is above 15-20 (more than 20k spots). In this case the gpu-cellgeni-a100 queue can be used (comment/uncomment the corresponding lines in 02.run.predict.cell.abundancies.sh).
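The rule of thumb above can be written down as a small helper. This is only an illustration: the function `suggest_queue` is hypothetical, and the 20,000-spot limit is the approximate figure from the text, not a measured memory bound.

```python
def suggest_queue(n_spots, spot_limit=20_000):
    """Suggest a farm queue from the total number of Visium spots.

    spot_limit is the approximate threshold quoted in the text (an assumption,
    not a hard limit); real memory use also depends on genes and cell types.
    """
    if n_spots < 0:
        raise ValueError("n_spots must be non-negative")
    return "gpu-cellgeni-a100" if n_spots > spot_limit else "gpu-normal"


print(suggest_queue(12_000))
print(suggest_queue(35_000))
```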
Details
The pipeline is designed to be run on one or more references and a single set of Visium samples, so the prepared input consists of one or more reference h5ad files and one Visium h5ad file.
Initialization
Set tic variable to ticket number and init:
tic=..
tick.sh -k $tic -j pm19@sanger.ac.uk -y https://github.com/cellgeni/c2l
cd /lustre/scratch127/cellgen/cellgeni/tickets/tic-$tic
mkdir ref pred figures logs
Check and prepare the input
Open actions/c2l/src/check-n-prepare.input.h5ad.ipynb in Jupyter, modify the paths and follow the notebook. You should get one or more reference h5ad files and one Visium h5ad file as output.
If Visium data are provided as iRODS paths, they can be downloaded with:
cd data
mkdir vis
cd vis
../../actions/c2l/src/iget_spaceranger.sh < samples.txt
Where samples.txt contains sample names and irods paths, one sample per line:
name1 /irods/path/1
name2 /irods/path/2
...
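A malformed samples.txt is easier to catch before the download starts. The sketch below is not part of the repository: the function `parse_samples` is hypothetical, and it assumes that the two fields are separated by whitespace and that sample names should be unique.

```python
def parse_samples(text):
    """Parse 'name path' lines into a list of (name, irods_path) tuples."""
    samples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        samples.append((fields[0], fields[1]))

    # Uniqueness of names is an assumption made for this sketch.
    names = [name for name, _ in samples]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample names: {duplicates}")
    return samples


example = "name1 /irods/path/1\nname2 /irods/path/2\n"
for name, path in parse_samples(example):
    print(name, path)
```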
Signature estimation
actions/c2l/src/01.run.estimate.signatures.sh submits the job to the farm in the gpu-normal queue. Edit the file according to the ticket: list all input reference h5ad files and specify the batch, covariate and cell type annotation columns of adata.obs. Internally, 01.run.estimate.signatures.sh calls a Python script, so a detailed manual is available via actions/c2l/src/py/01.estimate.signatures.py -h. After editing the file, submit it with bsub < actions/c2l/src/01.run.estimate.signatures.sh from the ticket directory. It runs as an array job, one item per reference.
Deconvolution
The second step can be submitted only after the first step has finished successfully. Check the QC plots in the ref/* subfolders first. The bsub script for the second step is actions/c2l/src/02.run.predict.cell.abundancies.sh, which calls actions/c2l/src/py/02.predict.cell.abundancies.py internally. Edit 02.run.predict.cell.abundancies.sh to include all references, all desired alpha levels and other parameters; use actions/c2l/src/py/02.predict.cell.abundancies.py -h to get help. Submit the job with bsub < actions/c2l/src/02.run.predict.cell.abundancies.sh from the ticket directory. It runs as an array job, one item per reference/alpha combination.
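Since the array has one item per reference/alpha combination, its size is the number of references multiplied by the number of alpha levels. The sketch below shows one way a 1-based array index (LSF exposes it to each item as LSB_JOBINDEX) could be mapped to a pair. How the repository scripts actually do this mapping is not described here, so the ordering, the function `combination_for_index`, the reference names and the alpha values are all assumptions made for illustration.

```python
from itertools import product


def combination_for_index(references, alphas, job_index):
    """Map a 1-based array job index to a (reference, alpha) pair."""
    combos = list(product(references, alphas))
    if not 1 <= job_index <= len(combos):
        raise IndexError(f"job index {job_index} outside 1..{len(combos)}")
    return combos[job_index - 1]


references = ["ref/a", "ref/b"]  # hypothetical reference folders
alphas = [20, 200]               # example alpha levels, not a recommendation

n_items = len(references) * len(alphas)  # size of the job array
for i in range(1, n_items + 1):
    print(i, combination_for_index(references, alphas, i))
```

With two references and two alpha levels this gives four array items, so the array range in the bsub script would have to cover indices 1 to 4.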
QC
Currently there are no numeric QC metrics for cell2location performance. Cell2location produces some QC plots related to training and to the comparison of observed vs predicted values; they can be found in the subfolders of ref and pred. actions/c2l/src/03.plot.c2l.R can be used for additional QC: it plots the UMI distribution across spots and the predicted cell abundances.
Output
Results of the pipeline are written to three folders:
1. ref contains the cell2location references; they can potentially be reused with other Visium samples.
2. pred contains the results of Visium deconvolution: cell type abundances in csv format.
3. figures contains QC figures.
Sharing of results
Use actions/c2l/src/04.share.sh to share the results with the customer. The script outputs a template of the message to be sent to the customer. Edit it according to the request.
Owner
- Name: Cellular Genetics Informatics
- Login: cellgeni
- Kind: organization
- Location: United Kingdom
- Website: https://www.sanger.ac.uk/science/groups/cellular-genetics-informatics
- Repositories: 19
- Profile: https://github.com/cellgeni
Wellcome Sanger Institute
GitHub Events
Total
- Watch event: 1
- Push event: 6
Last Year
- Watch event: 1
- Push event: 6