shunpyker
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
✓Academic publication links
Links to: zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: kousaa
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 13.1 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
The shunPykeR's guide to single cell analysis
Anastasia Kousa and Andri Lemarquis
Similar to the Hitchhiker's Guide to the Galaxy, this guide has probably ended up in your "hands" due to a great catastrophe: your single cell data need to be analyzed and you don't know how to do it. Well, *DON'T PANIC*, this guide is here to help you skip all the hurdles of coding and decision making and let you get the most out of your data.
This guide starts with a click on the Run button. Once that is done, just keep clicking...
GRAPHICAL OVERVIEW
Step 1 guides you through the quality control and necessary clean-up tread that scRNA-seq data require. Steps 2 and 3 are optional and can be skipped if initial inspection of the samples does not reveal any obvious technical artifacts that can bias results interpretation. Step 4 introduces a plethora of common visualization options. Step 5 provides again an optional practise to impute gene expression. This can be beneficial when interrogating in a correlative way the expression of lowly expressed genes, such as transcription factors, due to the high dropout rate that single cell data suffer from. Step 6 walks you through differential expression analysis for your comparisons of interest, advices on pathway and network enrichment analysis options and provides complementary visualization for the specific output files. Step 7 introduces ways of integrating your scRNA-seq data with (a) loom files to perform RNA velocity analysis, (b) visium files to probe the localization of the expressional data and (c) human/mouse orthologue to associate and inform the analysis from cross-species comparisons.
* Magenta denotes optional analysis modules
This guide in a nutshell
This is a complete guide for the analysis of scRNA-seq data interlaced with sections of common and complementary practises. This guide introduces all code in one notebook in a coherent order and it has been "seeded" -where necessary- to make it fully reproducible. These two main qualities make it simple and intuitive enough allowing non computational scientists to perform a robust and thorough analysis of their scRNA-seq datasets with (relative) peace of mind. Nonetheless, it can be used by bioinformaticians that may want to adjust and reuse it in their own way.
Within this guide we have brought together a combination of already established single cell analysis tools that we introduced in a logical order to simplify running your analysis. The main code is in python but it is interpolated with snippets of R code to take advantage of both worlds when using single cell developed algorithms.
Here is the list of tools that we implement across this guide: - scanpy1 provides the backbone of this pipeline and allows for visualization options (https://scanpy.readthedocs.io/en/stable/) - scrublet2 removes doublet cells (https://github.com/swolock/scrublet) - leiden3 performs unsupervised clustering of the cells (https://github.com/vtraag/leidenalg) - harmony4 corrects batch effects (if applicable!) (https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.harmonyintegrate.html) - magic5 imputes and denoises expression of selected genes (https://github.com/dpeerlab/magic) - wilcox6 performs differential expression analysis (https://en.wikipedia.org/wiki/Mann%E2%80%93WhitneyU_test) - GSEA7 runs pathway enrichment analysis (https://www.gsea-msigdb.org/gsea/index.jsp) - cytoscape::enrichmentmap8,9 runs network enrichment analysis (https://enrichmentmap.readthedocs.io/en/latest/) - velocyto10,11 performs rna velocity analysis (http://velocyto.org/) - scanorama12 carries out integration with spatial data (https://scanpy-tutorials.readthedocs.io/en/latest/spatial/integration-scanorama.html?highlight=scanorama#Integrating-spatial-data-with-scRNA-seq-using-scanorama) - mousipy integrates human with mouse datasets (https://github.com/stefanpeidli/mousipy)
Emojis to guide your way
This icon (🕹️) helps you move directly to specific sections of this pipeline
This icon (💡) gives you hints to assist with issues when running this pipeline
This icon (✍️) let's you know that manual input is required
This icon (❗) asks you to pay attention
This icon (⏳) let's you know that the process may take a long time to run
See the complete notebook here.
How-to guide to open a Jupyter notebook
- For Linux, Mac and Windows environments
1. Download and install the Anaconda-Navigator (https://docs.anaconda.com/anaconda/install/)
2. Open the Anaconda-Navigator
3. Go to `Environments` -> select an environment [default will be ` base(root)`] by clicking on the ▶️ button; select `shunPykeR` instead once you install it below
4. Go to `Home` again -> on the Jupyter notebook box click on the `Install` button; once installed click on the `Launch` button
5. Finally navigate through the browser that will pop up to the directory where your jupyter notebook exists and click on the file
How-to guide to install the shunPykeR environment
- For Linux and Mac environments
1. Install conda for your environment (https://docs.conda.io/projects/conda/en/latest/user-guide/install/windows.html)
2. Use the search button to find and open a `terminal` window
3. Type the command below
```
conda env create -f shunPykeR.yml
```
3. Once you see the cursor on the terminal blinking again the installation has finished
4. Launch the Anaconda-Navigator and proceed as described above
- Alternatively, in a Windows environment or if you prefer a graphical interface to facilitate the installation (even though this can end up more tricky and slow)
1. Open Anaconda-Navigator
2. Go to Environments -> click on `Import`
3. Select the `shunPykeR.yml` file and click `Import` again in the new window
❗Full environment installation has only been tested in a Mac environment via the terminal.
Citations
1. F. A. Wolf, P. Angerer, F. J. Theis, SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018). 2. S. L. Wolock, R. Lopez, A. M. Klein, Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst. 8, 281-291.e9 (2019). 3. V. A. Traag, L. Waltman, N. J. van Eck, From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019). 4. I. Korsunsky, N. Millard, J. Fan, K. Slowikowski, F. Zhang, K. Wei, Y. Baglaenko, M. Brenner, P. Loh, S. Raychaudhuri, Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods. 16, 1289–1296 (2019). 5. D. van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A. J. Carr, C. Burdziak, K. R. Moon, C. L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, D. Pe’er, Recovering Gene Interactions from Single-Cell Data Using Data Diffusion. Cell. 174, 716-729.e27 (2018). 6. C. Soneson, M. D. Robinson, Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods. 15, 255–261 (2018). 7. A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, J. P. Mesirov, Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102, 15545–15550 (2005). 8. D. Merico, R. Isserlin, O. Stueker, A. Emili, G. D. Bader, Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PloS One. 5, e13984 (2010). 9. P. Shannon, A. Markiel, O. Ozier, N. S. Baliga, J. T. Wang, D. Ramage, N. Amin, B. Schwikowski, T. Ideker, Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). 10. G. La Manno, R. Soldatov, A. Zeisel, E. Braun, H. Hochgerner, V. Petukhov, K. Lidschreiber, M. E. Kastriti, P. Lönnerberg, A. Furlan, J. Fan, L. E. Borm, Z. Liu, D. van Bruggen, J. Guo, X. He, R. Barker, E. Sundström, G. Castelo-Branco, P. Cramer, I. Adameyko, S. Linnarsson, P. V. Kharchenko, RNA velocity of single cells. Nature. 560, 494–498 (2018). 11. V. Bergen, M. Lange, S. Peidli, F. A. Wolf, F. J. Theis, Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020). 12. B. Hie, B. Bryson, B. Berger, Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
Owner
- Login: kousaa
- Kind: user
- Repositories: 2
- Profile: https://github.com/kousaa
Citation (CITATION.cff)
cff-version: 1.0.0 message: "If you use this software, please cite it as below." authors: - family-names: "Kousa" given-names: "Anastasia I" orcid: "https://orcid.org/0000-0002-4550-0389" - family-names: "Lemarquis" given-names: "Andri Leo" orcid: "https://orcid.org/0000-0001-5165-0247" title: "The shunPykeR's guide to single cell analysis" version: 1.0.0 doi: 10.5281/zenodo.7510613 date-released: 2023-01-06 url: "https://github.com/kousaa/shunPykeR"