https://github.com/broadinstitute/dino4cells_analysis

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization broadinstitute has institutional domain (www.broadinstitute.org)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: broadinstitute
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 340 MB
Statistics
  • Stars: 13
  • Watchers: 5
  • Forks: 1
  • Open Issues: 3
  • Releases: 0
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

DINO4Cells_analysis

This repo contains the code for reproducing the results and figures of the paper "Unbiased single-cell morphology with self-supervised vision transformers".

For the code to train the model, go to https://github.com/broadinstitute/Dino4Cells_code.

Installation

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

pip install -r requirements.txt

Typical installation time: 10 minutes
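
After installing, it can be useful to confirm that the environment matches the pinned versions. The sketch below is not part of the repository. It compares the three pins from the install command above against what is installed, using only the standard library. The names `PINS`, `installed_version`, and `check_pins` are illustrative assumptions, and the pin list can be extended with other packages.

```python
from importlib import metadata

# Pins copied from the install command above; extend as needed.
PINS = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package to a tuple (expected, found, ok).

    Local build tags such as '+cu116' are ignored when comparing.
    """
    report = {}
    for package, expected in pins.items():
        found = lookup(package)
        base = found.split("+")[0] if found else None
        report[package] = (expected, found, base == expected)
    return report


if __name__ == "__main__":
    for package, (expected, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: expected {expected}, found {found} [{status}]")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.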

Classification

To train classifiers on the features extracted using DINO, please use the following pipeline for the different datasets and tasks:

For each dataset, we want to publish:

  • DINO model checkpoints
  • metadata
  • features
  • embeddings
  • classifiers
  • misc (data partition indices, preprocessed data, etc.)

Typical running time: 1 hour

The expected output of the code is shown inside the Jupyter notebooks.

Data

HPA FOV

For the HPA FOV data, access https://zenodo.org/record/8061392

Model checkpoints

HPAFOVdata/DINOFOVcheckpoint.pth

HPAFOVdata/densenetmodelbatch.onnx

HPAFOVdata/densenet_model.onnx

Metadata

HPAFOVdata/whole_images.csv

Features

HPAFOVdata/DINOfeaturesforHPAFOV.pth

HPAFOVdata/bestfittingfeaturesforHPAFOV.pth

HPAFOVdata/pretrainedDINOfeaturesforHPA_FOV.pth

Embeddings

HPAFOVdata/DINOFOVharmonized_embeddings.csv

HPAFOVdata/DINOFOVembeddings.csv
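
The column layout of the embedding CSV files is not described here, so a first step is usually to look at the header and a few rows. The following is a minimal sketch using only the standard library. The function name `peek_csv` is hypothetical, and nothing is assumed about the columns the files actually contain.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (column_names, first n_rows rows as dicts) for a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        # fieldnames is read from the first line of the file.
        return reader.fieldnames, rows


if __name__ == "__main__":
    import sys

    # Pass the path of a downloaded embeddings CSV on the command line.
    columns, rows = peek_csv(sys.argv[1])
    print("columns:", columns)
    for row in rows:
        print(row)
```

All values come back as strings; numeric columns need an explicit conversion before plotting or analysis.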

Classifiers

HPAFOVdata/classifier_cells.pth

HPAFOVdata/classifier_proteins.pth

Misc

Train/test splits for protein localization and cell line classification

HPAFOVdata/cellstrainIDs.pth

HPAFOVdata/cellsvalidIDs.pth

HPAFOVdata/train_IDs.pth

HPAFOVdata/valid_IDs.pth

HPA Kaggle protein localization competition leaderboard, download from https://www.kaggle.com/competitions/human-protein-atlas-image-classification/leaderboard

HPAFOVdata/human-protein-atlas-image-classification-publicleaderboard.csv

HPA cell line RNASeq data

HPAFOVdata/rna_cellline.tsv

HPA FOV color visualization

HPAFOVdata/wholeimagecellcolorindices.pth

HPAFOVdata/wholeimageproteincolorindices.pth
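
Once the Zenodo archive is downloaded, a quick check that the listed files are in place can save a failed notebook run. The sketch below is an illustration, not part of the repository. The paths in `EXPECTED_FILES` are copied as they appear in the HPA FOV listing above and may need adjusting to the exact names in the downloaded archive. `find_missing` is a hypothetical helper name.

```python
from pathlib import Path

# A few paths copied verbatim from the HPA FOV listing above; add more as needed.
EXPECTED_FILES = [
    "HPAFOVdata/DINOFOVcheckpoint.pth",
    "HPAFOVdata/whole_images.csv",
    "HPAFOVdata/DINOfeaturesforHPAFOV.pth",
    "HPAFOVdata/classifier_cells.pth",
    "HPAFOVdata/classifier_proteins.pth",
]


def find_missing(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed files are present.")
```

The same function works for the other datasets by passing a different `expected` list.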

HPA single cells

For the HPA single cell data, access https://zenodo.org/record/8061426

Model checkpoints

HPAsinglecellsdata/DINOsinglecellcheckpoint.pth

HPAsinglecellsdata/HPAsinglecellmodel_checkpoint.pth

HPAsinglecellsdata/dualheadconfig.json

HPAsinglecellsdata/dualheadmatched_state.pth

Metadata

HPAsinglecellsdata/fixedsizemaskedsinglecellsfor_sc.csv

Features

HPAsinglecellsdata/DINOfeaturesforHPAsinglecells.pth

HPAsinglecellsdata/dualheadfeaturesforHPAsinglecells.pth

HPAsinglecellsdata/pretrainedDINOfeaturesforHPAsingle_cells.pth

Embeddings

HPAsinglecellsdata/DINOembeddingaverageumap.csv

HPAsinglecellsdata/DINOharmonizedembeddingaverage_umap.csv

Classifiers

HPAsinglecellsdata/classifiercells.pth

HPAsinglecellsdata/classifierproteins.pth

Misc

HPA single-cell Kaggle protein localization competition leaderboard, download from https://www.kaggle.com/competitions/hpa-single-cell-image-classification/leaderboard

HPAsinglecells_data/hpa-single-cell-image-classification-publicleaderboard.csv

HPA XML data

HPAsinglecellsdata/XMLHPA.csv

UniProt interaction dataset

HPAsinglecellsdata/uniportinteractions.tsv

HPA gene heterogeneity annotated by experts

HPAsinglecellsdata/geneheterogeneity.tsv

Single-cell metadata with genetic information

HPAsinglecellsdata/MasterscKaggle.csv

HPA single cell color visualization

HPAsinglecellsdata/cellcolor_indices.pth

HPAsinglecellsdata/proteincolor_indices.pth

WTC11

For the WTC11 data, access https://zenodo.org/record/8061424

Model checkpoints

WTC11data/DINOcheckpoint.pth

Metadata

WTC11data/normalizedcell_df.csv

Features

WTC11data/DINOfeaturesanddf.pth

WTC11data/engineeredfeatures.pth

WTC11data/pretrainedfeaturesanddf.pth

Embeddings

WTC11data/DINOtrained_embedding.pth

WTC11data/DINOtrainedharmonizedembedding.pth

WTC11data/pretrainedembedding.pth

WTC11data/pretrainedharmonized_embedding.pth

WTC11data/engineeredembedding.pth

Classifiers

WTC11data/predictionsforWTC11trained_model.pth

WTC11data/predictionsforWTC11pretrained_model.pth

WTC11data/predictionsforWTC11xgb.pth

Misc

WTC11data/trainindices.pth

WTC11data/testindices.pth
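
Partition index files like the two above are typically applied by selecting rows by position. The on-disk layout of these `.pth` files is not described here, so the sketch below assumes they have already been loaded into plain sequences of integer positions. `split_by_indices` is a hypothetical helper, and the example data is a placeholder.

```python
def split_by_indices(items, train_idx, test_idx):
    """Return (train_items, test_items) selected by position.

    Raises ValueError if the two index lists share any index,
    which would leak test samples into training.
    """
    overlap = set(train_idx) & set(test_idx)
    if overlap:
        raise ValueError(f"{len(overlap)} indices appear in both splits")
    train = [items[i] for i in train_idx]
    test = [items[i] for i in test_idx]
    return train, test


if __name__ == "__main__":
    cells = ["cell_a", "cell_b", "cell_c", "cell_d"]  # placeholder data
    train, test = split_by_indices(cells, [0, 2, 3], [1])
    print(train, test)
```

The overlap check is a cheap guard worth running once per dataset before training any classifier.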

Cell Painting

For the Cell Painting data, access https://zenodo.org/record/8061428

DINO model checkpoints

CellPaintingdata/DINOcellpaintingbasecheckpoint.pth

CellPaintingdata/DINOcellpaintingsmallcheckpoint.pth

Metadata and embeddings

CellPaintingdata/LINCSViTSmallCompresseddfandUMAP.csv

CellPaintingdata/CombinedCPdfandUMAP.csv

Features

Code to calculate the PUMA results is in: https://github.com/CaicedoLab/2023MoshkovNatComm

Misc (data partition indices, preprocessed data, etc.)

CellPaintingdata/scaffoldmedianpython_dino.csv

Owner

  • Name: Broad Institute
  • Login: broadinstitute
  • Kind: organization
  • Location: Cambridge, MA

Broad Institute of MIT and Harvard

GitHub Events

Total
  • Issues event: 2
  • Watch event: 7
  • Push event: 1
Last Year
  • Issues event: 2
  • Watch event: 7
  • Push event: 1

Issues and Pull Requests

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • danrlu (1)
  • jccaicedo (1)

Dependencies

requirements.txt pypi
  • Pillow ==9.5.0
  • PyYAML ==6.0
  • imageio ==2.19.3
  • kornia ==0.6.8
  • numpy ==1.23.5
  • oyaml ==1.0
  • pandas ==1.5.3
  • scikit-image ==0.19.3
  • scikit-learn ==1.2.2
  • scipy ==1.10.1
  • torch ==1.12.1
  • torchvision ==0.13.1
  • tqdm ==4.64.1