cellshape

3D shape analysis using deep learning

https://github.com/sentinal4d/cellshape

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary

Keywords

cancer-biology cell-biology deep-learning geometric-deep-learning
Last synced: 6 months ago

Repository

Statistics
  • Stars: 26
  • Watchers: 2
  • Forks: 6
  • Open Issues: 5
  • Releases: 3
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Cellshape logo by Matt De Vries

3D single-cell shape analysis of cancer cells using geometric deep learning

This is a Python package for learning 3D cell shape features and classes using deep learning. Please refer to our preprint (doi: 10.1101/2022.06.17.496550).

cellshape is the main package, which imports from the following sub-packages:

- cellshape-helper: Facilitates point cloud generation from 3D binary masks.
- cellshape-cloud: Implementations of graph-based autoencoders for shape representation learning on point cloud input data.
- cellshape-voxel: Implementations of 3D convolutional autoencoders for shape representation learning on voxel input data.
- cellshape-cluster: Implementation of deep embedded clustering to add to autoencoder models.

Installation and requirements

Dependencies

The software requires Python 3.7 or greater. The following package dependencies are installed automatically when cellshape is installed: PyTorch, pyntcloud, numpy, scikit-learn, tensorboard, and tqdm (the full list is in the setup.py file). This repo makes extensive use of cellshape-cloud, cellshape-cluster, cellshape-helper, and cellshape-voxel. To reproduce the results in our paper, only cellshape-cloud and cellshape-cluster are needed.

To install

  1. We recommend creating a new conda environment. In the terminal, run:

     ```bash
     conda create --name cellshape-env python=3.8 -y
     conda activate cellshape-env
     pip install --upgrade pip
     ```

  2. Install cellshape from pip. In the same terminal, run:

     ```bash
     pip install cellshape
     ```

     This should take about 5 minutes or less.
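As a quick sanity check after installation, the sketch below reports which of the dependencies listed above cannot be imported in the current environment. The helper name `missing_modules` is hypothetical, and the import names (for example `torch` for PyTorch and `sklearn` for scikit-learn) are the usual ones, assumed here rather than taken from setup.py.

```python
import importlib.util
import sys

# Import names assumed for the dependencies listed above; adjust if setup.py differs.
REQUIRED = ["cellshape", "torch", "pyntcloud", "numpy", "sklearn", "tensorboard", "tqdm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7 or greater is required")
    print("Missing modules:", missing_modules(REQUIRED) or "none")
```

An empty result suggests the environment is ready; any listed name can be installed with pip before continuing.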

Hardware requirements

We have tested this software on Ubuntu 20.04 LTS and 18.04 LTS with 128 GB RAM and an NVIDIA Quadro RTX 6000 GPU.

Data availability and structure

Data availability

Update (19/10/2023): Our sample data was originally published on Zenodo Sandbox. However, there are currently issues with that website and the link to the data is broken. We are working to move the data to a public data store and will update this page when this is done.

Old: Datasets to reproduce the results in our paper were available on Zenodo Sandbox.

- SamplePointCloudData.zip contains a sample dataset of point clouds of cells for testing our code.
- FullData.zip contains 3 plates of point cloud representations of cells for several treatments. This data can be used to reproduce our results.
- Output.zip contains trained model weights and the deep learning cell geometric features extracted using these trained models.
- BinaryCellMasks.zip contains a sample set of binary masks of cells, which can be used as input to cellshape-helper to test our point cloud generation code.

Data structure

We suggest testing our code on the data contained in SamplePointCloudData.zip. This data is structured in the following way:

```
cellshapeSamplePointCloudDataset/
    small_data.csv
    Plate1/
        stacked_pointcloud/
            Binimetinib/
                0010_0120_accelerator_20210315_bakal01_erk_main_21-03-15_12-37-27.ply
                ...
            Blebbistatin/
                ...
    Plate2/
        stacked_pointcloud/
    Plate3/
        stacked_pointcloud/
```

This data structure is only necessary if you want to use our data. If you would like to use your own dataset, you may structure it in any way as long as the point cloud files have the .ply extension. If using your own data structure, please set the --dataset_type parameter to "Other".
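To check that a downloaded dataset follows this layout, a minimal standard-library sketch can count the .ply files per plate and treatment. The function name `count_point_clouds` is hypothetical, and the sketch assumes the Plate/stacked_pointcloud/Treatment nesting described above.

```python
from collections import Counter
from pathlib import Path


def count_point_clouds(root):
    """Count .ply files per (plate, treatment) under the layout described above."""
    counts = Counter()
    # Pattern: <root>/<Plate>/stacked_pointcloud/<Treatment>/<file>.ply
    for ply in Path(root).glob("*/stacked_pointcloud/*/*.ply"):
        treatment = ply.parent.name
        plate = ply.parents[2].name
        counts[(plate, treatment)] += 1
    return counts
```

Printing the sorted items of the returned counter gives a quick overview of how many cells are available for each treatment on each plate.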

Usage

The following steps assume that one already has point cloud representations of cells or nuclei. If you need to generate point clouds from 3D binary masks, please go to cellshape-helper.
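Before training, it can help to confirm that each input file really is a point cloud. The sketch below reads only the PLY header, which is plain text in both ASCII and binary PLY files, and returns the declared vertex count. `ply_vertex_count` is a hypothetical helper and not part of cellshape; for loading the points themselves, a library such as pyntcloud would be used instead.

```python
def ply_vertex_count(path):
    """Return the vertex count declared in the header of a PLY file."""
    with open(path, "rb") as handle:
        # Every PLY file starts with the magic line "ply".
        if handle.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with the PLY magic line")
        for raw in handle:
            tokens = raw.split()
            if len(tokens) == 3 and tokens[:2] == [b"element", b"vertex"]:
                return int(tokens[2])
            if tokens == [b"end_header"]:
                break
    raise ValueError(f"{path} has no 'element vertex' line in its header")
```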

Downloading the dataset

We suggest testing our code on the data contained in SamplePointCloudData.zip. Please download the data and unzip the contents into a directory of your choice; we recommend your ~/Documents/ folder. This location is used as a parameter in the steps below, so please remember where you downloaded the data to. Downloading and unzipping the data can be done in the terminal. You might first need to install wget and unzip with apt-get (e.g. apt-get install wget).

1. Download the data into the ~/Documents/ folder with wget:

   ```bash
   cd ~/Documents
   wget https://sandbox.zenodo.org/record/1080300/files/SamplePointCloudDataset.zip
   ```

2. Unzip the data with unzip:

   ```bash
   unzip SamplePointCloudDataset.zip
   ```

This will create a directory called cellshapeSamplePointCloudDataset under your ~/Documents/ folder, i.e. /home/USER/Documents/cellshapeSamplePointCloudDataset/ (USER will be different for you).

Training

The training procedure follows two steps:

1. Training the dynamic graph convolutional foldingnet (DFN) autoencoder to automatically learn shape features.
2. Adding the clustering layer to refine shape features and learn shape classes simultaneously.

Inference can be done after each step.
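For intuition about the second step, the sketch below follows the standard deep embedded clustering formulation: each embedded cell is softly assigned to the cluster centres with a Student's t kernel, and a sharpened target distribution is derived from those assignments. This is an illustrative pure-Python version under that assumption; the function names are hypothetical, and the actual implementation in cellshape-cluster may differ.

```python
def soft_assignments(embeddings, centres, alpha=1.0):
    """Student's t soft assignment q[i][j] of sample i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centres:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        # Normalise so that each sample's assignments sum to 1.
        q.append([value / total for value in row])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]**2 / sum_i q[i][j]."""
    cluster_freq = [sum(column) for column in zip(*q)]
    p = []
    for row in q:
        weights = [value ** 2 / freq for value, freq in zip(row, cluster_freq)]
        total = sum(weights)
        p.append([weight / total for weight in weights])
    return p
```

In the standard formulation, the clustering loss is the KL divergence between the targets and the soft assignments, which pulls each embedding towards the centre it is already closest to.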

Our training functions are run through a command line interface with the command cellshape-train. For help on all command line options, run the following in the terminal:

```bash
cellshape-train -h
```

1. Train DFN autoencoder

The first step trains the autoencoder without the additional clustering layer. Remember to change the --cloud_dataset_path, --dataframe_path, and --output_dir parameters to match your directories if you have saved the data somewhere else. To test the code, we train for 5 epochs. First make sure you are in the directory where you downloaded the data. If this is your ~/Documents/ folder, go into it:

```bash
cd ~/Documents
```

Then run the following:

```bash
cellshape-train \
    --model_type "cloud" \
    --pretrain "True" \
    --train_type "pretrain" \
    --cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
    --dataset_type "SingleCell" \
    --dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
    --output_dir "./cellshapeOutput/" \
    --num_epochs_autoencoder 5 \
    --encoder_type "dgcnn" \
    --decoder_type "foldingnetbasic" \
    --num_features 128
```

This step will create an output directory /home/USER/Documents/cellshapeOutput/ with the subfolders nets, reports, and runs, which contain the model weights, logged outputs, and tensorboard runs, respectively, for each experiment. Each experiment is named with the convention {encoder_type}_{decoder_type}_{num_features}_{train_type}_{xxx}, where {xxx} is a counter. For example, if this was the first experiment you have run, the trained model weights will be saved to /home/USER/Documents/cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt. This path will be used in the next step for the --pretrained_path parameter.
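The naming convention can be written down as a small helper, for example to locate the weights file to pass to the next step. This is a sketch: `weights_path` is a hypothetical function, the three-digit zero padding of the counter is inferred from the example above, and the label is passed in explicitly because the example file name uses `pretrained`.

```python
from pathlib import Path


def weights_path(output_dir, encoder_type, decoder_type, num_features, label, counter):
    """Build the expected .pt path under <output_dir>/nets/ for one experiment."""
    # Counter padding (three digits) is an assumption based on the "_001" example.
    name = f"{encoder_type}_{decoder_type}_{num_features}_{label}_{counter:03d}.pt"
    return Path(output_dir) / "nets" / name


# Example matching the first experiment described above.
print(weights_path("./cellshapeOutput", "dgcnn", "foldingnetbasic", 128, "pretrained", 1))
```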

2. Add clustering layer to refine shape features and learn shape classes simultaneously

The next step is to add the clustering layer to refine the model weights. As before, remember to change the --cloud_dataset_path, --dataframe_path, --output_dir, and --pretrained_path parameters to match your directories. If you have followed the previous steps, you will still be in the ~/Documents/ folder. In the same terminal, run:

```bash
cellshape-train \
    --model_type "cloud" \
    --train_type "DEC" \
    --pretrain False \
    --cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
    --dataset_type "SingleCell" \
    --dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
    --output_dir "./cellshapeOutput/" \
    --num_features 128 \
    --num_clusters 5 \
    --pretrained_path "./cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt"
```

To monitor the training using Tensorboard, run the following in a new terminal:

```bash
pip install tensorboard
cd ~/Documents
tensorboard --logdir "./cellshapeOutput/runs/"
```

Alternatively, the training steps can be run sequentially through one command line

To do this, set --pretrain to True and --train_type to "DEC" in a single call, so that the autoencoder is pretrained and then trained with the clustering layer:

```bash
cellshape-train \
    --model_type "cloud" \
    --train_type "DEC" \
    --pretrain True \
    --cloud_dataset_path "./cellshapeSamplePointCloudDataset/" \
    --dataset_type "SingleCell" \
    --dataframe_path "./cellshapeSamplePointCloudDataset/small_data.csv" \
    --output_dir "./cellshapeOutput/" \
    --num_features 128 \
    --num_clusters 5
```
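When several runs need to be scripted, the same command can be assembled and launched from Python. The sketch below only uses flags that appear in the commands in this README; `build_train_command` and `run_training` are hypothetical helpers, and the sketch assumes that `cellshape-train` is on the PATH of the active environment.

```python
import subprocess


def build_train_command(dataset_dir, output_dir, num_clusters=5, pretrain=True):
    """Assemble the argument list for a combined pretrain + DEC run."""
    return [
        "cellshape-train",
        "--model_type", "cloud",
        "--train_type", "DEC",
        "--pretrain", str(pretrain),
        "--cloud_dataset_path", f"{dataset_dir}/",
        "--dataset_type", "SingleCell",
        "--dataframe_path", f"{dataset_dir}/small_data.csv",
        "--output_dir", f"{output_dir}/",
        "--num_features", "128",
        "--num_clusters", str(num_clusters),
    ]


def run_training(command):
    """Launch training; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True)
```

Calling `run_training(build_train_command("./cellshapeSamplePointCloudDataset", "./cellshapeOutput"))` from the directory containing the data would start the run. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.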

Inference

Example inference notebooks can be found in the docs/notebooks/ folder.

Issues

If you have any problems, please raise an issue on the GitHub repository (https://github.com/sentinal4d/cellshape).

Citation

```bibtex
@article{DeVries2022single,
  author = {Matt De Vries and Lucas Dent and Nathan Curry and Leo Rowe-Brown and Vicky Bousgouni and Adam Tyson and Christopher Dunsby and Chris Bakal},
  title = {3D single-cell shape analysis using geometric deep learning},
  elocation-id = {2022.06.17.496550},
  year = {2023},
  doi = {10.1101/2022.06.17.496550},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2023/03/27/2022.06.17.496550},
  eprint = {https://www.biorxiv.org/content/early/2023/03/27/2022.06.17.496550.full.pdf},
  journal = {bioRxiv}
}
```

References

[1] An Tao, 'Unsupervised Point Cloud Reconstruction for Classific Feature Learning', GitHub Repo, 2020

Owner

  • Name: Sentinal4D
  • Login: Sentinal4D
  • Kind: organization
  • Email: mattdevries.ai@gmail.com
  • Location: United Kingdom

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "De Vries"
  given-names: "Matt"
  orcid: "https://orcid.org/0000-0002-4098-1611"
title: "cellshape"
version: 1.0.0
date-released: 2022-09-13
url: "https://github.com/Sentinal4D/cellshape"
preferred-citation: 
  type: article
  authors:
  - family-names: "De Vries"
    given-names: "Matt"
    orcid: "https://orcid.org/0000-0002-4098-1611"
  - family-names: "Dent"
    given-names: "Lucas"
    orcid: "https://orcid.org/0000-0001-8573-4617"
  - family-names: "Curry"
    given-names: "Nathan"
    orcid: "https://orcid.org/0000-0001-7642-8036"
  - family-names: "Rowe-Brown"
    given-names: "Leo"
    orcid: "https://orcid.org/0000-0002-0104-8052"
  - family-names: "Tyson"
    given-names: "Adam"
    orcid: "https://orcid.org/0000-0003-3225-1130"
  - family-names: "Dunsby"
    given-names: "Chris"
    orcid: "https://orcid.org/0000-0001-8782-0885"
  - family-names: "Bakal"
    given-names: "Chris"
    orcid: "https://orcid.org/0000-0002-0413-6744"
  doi: "10.1101/2022.06.17.496550"
  journal: "biorxiv"
  month: 6
  title: "3D single-cell shape analysis of cancer cells using geometric deep learning"
  year: 2022

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Issues and Pull Requests

All Time
  • Total issues: 6
  • Total pull requests: 5
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 1.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • DeVriesMatt (4)
  • alxndrkalinin (1)
  • caylamiller (1)
Pull Request Authors
  • DeVriesMatt (5)
Top Labels
Issue Labels
good first issue (3) enhancement (2) documentation (2) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 139 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 41
  • Total maintainers: 2
pypi.org: cellshape

  • Versions: 41
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 139 Last month
Rankings
Dependent packages count: 6.6%
Stargazers count: 15.3%
Forks count: 17.3%
Average: 17.6%
Downloads: 18.4%
Dependent repos count: 30.6%
Maintainers (2)

Dependencies

.github/workflows/test_and_deploy.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • actions/setup-python v1 composite