https://github.com/broadinstitute/dino4cells_analysis
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org, zenodo.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization broadinstitute has institutional domain (www.broadinstitute.org)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: broadinstitute
- Language: Jupyter Notebook
- Default Branch: main
- Size: 340 MB
Statistics
- Stars: 13
- Watchers: 5
- Forks: 1
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
DINO4Cells_analysis
This repo will contain the code for reproducing the results and figures of the paper "Unbiased single-cell morphology with self-supervised vision transformers".
For the code to train the model, go to https://github.com/broadinstitute/Dino4Cells_code.
Installation
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.py
Typical installation time: 10 minutes
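After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch and not part of the repository: the helper names `parse_pins`, `installed_version`, and `find_mismatches` are hypothetical, and the pins shown are only a subset of the dependencies listed for this project. It relies solely on `importlib.metadata` from the standard library.

```python
from importlib import metadata

# Subset of the pinned dependencies, used here for illustration only.
PINS = """
numpy==1.23.5
pandas==1.5.3
scikit-learn==1.2.2
tqdm==4.64.1
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(parse_pins(PINS)).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

torch and torchvision are left out of the example pins on purpose: when installed from the CUDA index they report a local version such as 1.12.1+cu116, so an exact string comparison against 1.12.1 would flag them. Strip the part after `+` before comparing if you add them.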
Classification
To train classifiers on the features extracted using DINO, please use the following pipeline for the different datasets and tasks:
For each dataset, we want to publish:
- DINO model checkpoints
- metadata
- features
- embeddings
- classifiers
- misc (data partition indices, preprocessed data, etc.)
Typical running time: 1 hour
The expected output of the code is shown inside the Jupyter notebooks.
Data
HPA FOV
For the HPA FOV data, access https://zenodo.org/record/8061392
Model checkpoints
HPAFOVdata/DINOFOVcheckpoint.pth
HPAFOVdata/densenetmodelbatch.onnx
HPAFOVdata/densenet_model.onnx
Metadata
HPAFOVdata/whole_images.csv
Features
HPAFOVdata/DINOfeaturesforHPAFOV.pth
HPAFOVdata/bestfittingfeaturesforHPAFOV.pth
HPAFOVdata/pretrainedDINOfeaturesforHPA_FOV.pth
Embeddings
HPAFOVdata/DINOFOVharmonized_embeddings.csv
HPAFOVdata/DINOFOVembeddings.csv
Classifiers
HPAFOVdata/classifier_cells.pth
HPAFOVdata/classifier_proteins.pth
Misc
train / test divisions for protein localizations and cell line classification
HPAFOVdata/cellstrainIDs.pth
HPAFOVdata/cellsvalidIDs.pth
HPAFOVdata/train_IDs.pth
HPAFOVdata/valid_IDs.pth
HPA Kaggle protein localization competition (Human Protein Atlas Image Classification) public leaderboard, download from https://www.kaggle.com/competitions/human-protein-atlas-image-classification/leaderboard
HPAFOVdata/human-protein-atlas-image-classification-publicleaderboard.csv
HPA cell line RNASeq data
HPAFOVdata/rna_cellline.tsv
HPA FOV color visualization
HPAFOVdata/wholeimagecellcolorindices.pth
HPAFOVdata/wholeimageproteincolorindices.pth
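The train/validation ID files listed under Misc above define the data partition. As a hedged illustration of how such ID lists could be applied to metadata rows, the sketch below assumes the IDs have already been loaded into plain Python lists (for example with `torch.load`) and that each metadata row carries its identifier under a column named `ID`. Both the column name and the helper `split_by_ids` are hypothetical and not taken from the repository.

```python
def split_by_ids(rows, train_ids, valid_ids, id_key="ID"):
    """Partition metadata rows into train and validation lists by identifier.

    Rows whose ID appears in neither list are dropped. An ID present in both
    lists raises an error, since the partitions are expected to be disjoint.
    """
    train_set, valid_set = set(train_ids), set(valid_ids)
    overlap = train_set & valid_set
    if overlap:
        raise ValueError(f"{len(overlap)} IDs appear in both partitions")
    train = [row for row in rows if row[id_key] in train_set]
    valid = [row for row in rows if row[id_key] in valid_set]
    return train, valid


# Toy rows standing in for entries of the metadata CSV (hypothetical columns).
rows = [
    {"ID": "a", "label": 0},
    {"ID": "b", "label": 1},
    {"ID": "c", "label": 0},
]
train, valid = split_by_ids(rows, train_ids=["a", "c"], valid_ids=["b"])
print(len(train), len(valid))  # 2 1
```

Checking for overlap up front is a deliberate choice: a sample that leaks into both partitions would silently inflate validation scores.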
HPA single cells
For the HPA single cell data, access https://zenodo.org/record/8061426
Model checkpoints
HPAsinglecellsdata/DINOsinglecellcheckpoint.pth
HPAsinglecellsdata/HPAsinglecellmodel_checkpoint.pth
HPAsinglecellsdata/dualheadconfig.json
HPAsinglecellsdata/dualheadmatched_state.pth
Metadata
HPAsinglecellsdata/fixedsizemaskedsinglecellsfor_sc.csv
Features
HPAsinglecellsdata/DINOfeaturesforHPAsinglecells.pth
HPAsinglecellsdata/dualheadfeaturesforHPAsinglecells.pth
HPAsinglecellsdata/pretrainedDINOfeaturesforHPAsingle_cells.pth
Embeddings
HPAsinglecellsdata/DINOembeddingaverageumap.csv
HPAsinglecellsdata/DINOharmonizedembeddingaverage_umap.csv
Classifiers
HPAsinglecellsdata/classifiercells.pth
HPAsinglecellsdata/classifierproteins.pth
Misc
HPA single cell Kaggle protein localization competition public leaderboard, download from https://www.kaggle.com/competitions/hpa-single-cell-image-classification/leaderboard
HPAsinglecells_data/hpa-single-cell-image-classification-publicleaderboard.csv
HPA XML data
HPAsinglecellsdata/XMLHPA.csv
UniProt interaction dataset
HPAsinglecellsdata/uniportinteractions.tsv
HPA gene heterogeneity annotated by experts
HPAsinglecellsdata/geneheterogeneity.tsv
Single cell metadata with genetic information
HPAsinglecellsdata/MasterscKaggle.csv
HPA single cell color visualization
HPAsinglecellsdata/cellcolor_indices.pth
HPAsinglecellsdata/proteincolor_indices.pth
WTC11
For the WTC11 data, access https://zenodo.org/record/8061424
Model checkpoints
WTC11data/DINOcheckpoint.pth
Metadata
WTC11data/normalizedcell_df.csv
Features
WTC11data/DINOfeaturesanddf.pth
WTC11data/engineeredfeatures.pth
WTC11data/pretrainedfeaturesanddf.pth
Embeddings
WTC11data/DINOtrained_embedding.pth
WTC11data/DINOtrainedharmonizedembedding.pth
WTC11data/pretrainedembedding.pth
WTC11data/pretrainedharmonized_embedding.pth
WTC11data/engineeredembedding.pth
Classifiers
WTC11data/predictionsforWTC11trained_model.pth
WTC11data/predictionsforWTC11pretrained_model.pth
WTC11data/predictionsforWTC11xgb.pth
Misc
WTC11data/trainindices.pth
WTC11data/testindices.pth
Cell Painting
For the Cell Painting data, access https://zenodo.org/record/8061428
DINO model checkpoints
CellPaintingdata/DINOcellpaintingbasecheckpoint.pth
CellPaintingdata/DINOcellpaintingsmallcheckpoint.pth
Metadata and embeddings
CellPaintingdata/LINCSViTSmallCompresseddfandUMAP.csv
CellPaintingdata/CombinedCPdfandUMAP.csv
Features
Code to calculate the PUMA results is in: https://github.com/CaicedoLab/2023MoshkovNatComm
Misc (data partition indices, preprocessed data, etc.)
CellPaintingdata/scaffoldmedianpython_dino.csv
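Since the notebooks depend on files from four separate Zenodo records, a quick check of the local data directory can save a failed run. The sketch below is an assumption-laden illustration, not part of the repository: the helpers `missing_files` and `report` are hypothetical, and the list of expected relative paths must be supplied by the user exactly as the files are named on disk. Only the record URLs come from the sections above.

```python
from pathlib import Path

# Zenodo records named in the Data section above.
ZENODO_RECORDS = {
    "HPA FOV": "https://zenodo.org/record/8061392",
    "HPA single cells": "https://zenodo.org/record/8061426",
    "WTC11": "https://zenodo.org/record/8061424",
    "Cell Painting": "https://zenodo.org/record/8061428",
}


def missing_files(root, expected):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


def report(root, dataset, expected):
    """Summarize which files of a dataset are missing and where to get them."""
    missing = missing_files(root, expected)
    if missing:
        url = ZENODO_RECORDS[dataset]
        return f"{dataset}: {len(missing)} file(s) missing; see {url}"
    return f"{dataset}: all {len(expected)} file(s) present"
```

A call such as `report(".", "WTC11", my_wtc11_paths)`, where `my_wtc11_paths` is your own list of relative paths, returns a one-line summary that points back to the matching Zenodo record when something is absent.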
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
GitHub Events
Total
- Issues event: 2
- Watch event: 7
- Push event: 1
Last Year
- Issues event: 2
- Watch event: 7
- Push event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- danrlu (1)
- jccaicedo (1)
Dependencies
- Pillow ==9.5.0
- PyYAML ==6.0
- imageio ==2.19.3
- kornia ==0.6.8
- numpy ==1.23.5
- oyaml ==1.0
- pandas ==1.5.3
- scikit-image ==0.19.3
- scikit-learn ==1.2.2
- scipy ==1.10.1
- torch ==1.12.1
- torchvision ==0.13.1
- tqdm ==4.64.1