tyc-dataset
Official and maintained implementation of the dataset paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023].
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, scholar.google
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Keywords
Repository
Official and maintained implementation of the dataset paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023].
Basic Info
- Host: GitHub
- Owner: ChristophReich1996
- License: cc-by-4.0
- Language: Python
- Default Branch: main
- Homepage: https://christophreich1996.github.io/tyc_dataset/
- Size: 1.44 MB
Statistics
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures
Christoph Reich, Tim Prangemeier & Heinz Koeppl
| Project Page | Paper | Download Dataset |
This repository includes the official and maintained PyTorch validation (+ data loading & visualization) code of the TYC dataset proposed in The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures.
```
# Download labeled set
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/labeled_set.zip

# Download unlabeled set
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_1.zip
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_2.zip
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_3.zip

# Unzip files
unzip labeled_set.zip
unzip unlabeled_set_1.zip
unzip unlabeled_set_2.zip
unzip unlabeled_set_3.zip
```
Abstract
Segmenting cells and tracking their motion over time is a common task in biomedical applications. However, predicting accurate instance-wise segmentation and cell motions from microscopy imagery remains a challenging task. Using microstructured environments for analyzing single cells in a constant flow of media adds additional complexity. While large-scale labeled microscopy datasets are available, we are not aware of any large-scale dataset, including both cells and microstructures. In this paper, we introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release 105 dense annotated high-resolution brightfield microscopy images, including about 19k instance masks. We also release 261 curated video clips composed of 1293 high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology. TYC offers ten times more instance annotations than the previously largest dataset, including cells and microstructures. Our effort also exceeds previous attempts in terms of microstructure variability, resolution, complexity, and capturing device (microscopy) variability. We facilitate a unified comparison on our novel dataset by introducing a standardized evaluation strategy. TYC and evaluation code are publicly available under CC BY 4.0 license.
If you use our dataset or find this research useful in your work, please cite both of our dataset papers:
```bibtex
@inproceedings{Reich2023b,
    title={{The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures}},
    author={Reich, Christoph and Prangemeier, Tim and Koeppl, Heinz},
    booktitle={{IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)}},
    year={2023}
}

@inproceedings{Reich2023a,
    title={{An Instance Segmentation Dataset of Yeast Cells in Microstructures}},
    author={Reich, Christoph and Prangemeier, Tim and Fran{\c{c}}ani, Andr{\'e} O and Koeppl, Heinz},
    booktitle={{International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)}},
    year={2023}
}
```
Table of Contents
- Installation
- Dataformat
- Dataset Class
- Evaluation
- Visualization
- Unlabeled Data
- Dataset Statistics
- Acknowledgements
Installation
The validation, data loading, and visualization code can be installed as a Python package by running:
```shell script
pip install git+https://github.com/ChristophReich1996/TYC-Dataset.git
```
All dependencies are listed in requirements.txt.
Dataformat
The dataset is split into a training, validation, test, and out-of-distribution test (test_ood) set. Please refer to the paper for more information on these splits.
```
├── labeled_set
│   ├── test
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   ├── test_ood
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   ├── train
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   └── val
│       ├── images
│       └── labels
│           ├── classes
│           └── masks
│
└── unlabeled_set
    ├── 100x_10BF_200EGFP18_9Z_17P_120t_10s_050nM_1
    ├── 100x_10BF_200GFP_9Z_120t_10s_13P_005nM_2
    ├── 2018-02-13_AH_T2C_AH_T2C_60x__2
    ├── 2018-02-13_AH_T2C_AH_T2C_60x__4
    ├── 20211015
    ├── 20211104
    ├── 60x_10BF_200GFP_200RFP20_3Z_10min_3
    ├── BCS_Data-master_Additional_Data_Z_Project_BCS
    └── SegmentationPaperDeposit
```
Every subset (train, val, test, and test_ood) of the labeled set includes a folder holding the images and a folder holding the labels. The images have a resolution of 2048x2048 or larger and are located directly in the images folder. The labels folder contains two folders (classes and masks). The classes folder holds the semantic class labels of the object instances for each image as a separate JSON file. The masks folder holds an image containing the instance masks for each microscopy image. The resolution of the mask image is the same as that of the microscopy image.
The unlabeled set holds different folders for each experiment (e.g., 20211015). Each experiment folder holds multiple folders indicating the position of the video clips. Each position folder holds a folder indicating the focal position. The focal position folders hold the video clip itself as a sequence of TIFF images.
For details on the data loading please have a look at the dataset class implementation.
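As a rough sketch of what the raw files look like on disk, a single labeled sample can be inspected as follows. This assumes the folder layout above; the file names and extensions are placeholders, and OpenCV is just one of several ways to read 16-bit TIFFs.

```python
import json

import cv2

# Placeholder file names; the actual names depend on the downloaded dataset
image = cv2.imread("labeled_set/train/images/0.tif", -1)        # 16-bit brightfield microscopy image
masks = cv2.imread("labeled_set/train/labels/masks/0.tif", -1)  # instance masks, same resolution as the image
with open("labeled_set/train/labels/classes/0.json", "r") as f:
    classes = json.load(f)                                      # semantic class label per object instance
```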
Dataset Class
This repo includes a PyTorch dataset class implementation for the labeled TYC dataset, located in the tyc_dataset.data module. The dataset class loads the dataset and returns the images, instance maps, bounding boxes, and semantic classes.
```python
import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")

# Get first sample of the dataset
image, instances, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor

# Show shapes
print(image.shape)  # [1, H, W]
print(instances.shape)  # [N, H, W]
print(class_labels)  # [N, C=2 (trap=0 and cell=1)]
```
The dataset class implementation also offers support for custom Kornia data augmentations. You can pass an AugmentationSequential object to the dataset class. The following example utilizes random horizontal and vertical flipping as well as random Gaussian blur augmentations.
```python
import kornia.augmentation
import tyc_dataset
from torch.utils.data import Dataset

# Init augmentations
augmentations = kornia.augmentation.AugmentationSequential(
    kornia.augmentation.RandomHorizontalFlip(p=0.5),
    kornia.augmentation.RandomVerticalFlip(p=0.5),
    kornia.augmentation.RandomGaussianBlur(kernel_size=(31, 31), sigma=(9, 9), p=0.5),
    data_keys=["input", "mask"],
    same_on_batch=False,
)

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train", augmentations=augmentations)
```
Note that it is necessary to pass `["input", "mask"]` as data keys! If a different data key configuration is given, a runtime error is raised.
For wrapping the dataset with the PyTorch DataLoader, please use the custom collate function.
```python
from typing import List

import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset, DataLoader

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")
data_loader = DataLoader(
    dataset=dataset,
    num_workers=2,
    batch_size=2,
    drop_last=True,
    collate_fn=tyc_dataset.data.collate_function_tyc_dataset,
)

# Get a sample from dataloader
images, instances, class_labels = next(iter(data_loader))  # type: List[Tensor], List[Tensor], List[Tensor]
```
All Dataset Class Parameters
[TYCDataset](tyc_dataset/data/dataset.py) parameters:

| Parameter | Default value | Info |
|---|---|---|
| `path: str` | - | Path to dataset as a string. |
| `augmentations: Optional[AugmentationSequential]` | `None` | Augmentations to be used. If `None` no augmentation is employed. |

We provide a full dataset and data loader example in example_eval.py.
If this dataset class implementation is not sufficient for your application, please customize the existing code or open a pull request extending the existing implementation.
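If you only need a small tweak, wrapping the dataset class is often enough. The following hypothetical sketch (assuming the three-tensor return signature shown in the example above) downscales each sample by a factor of two:

```python
import torch.nn.functional as F

import tyc_dataset


class ResizedTYCDataset(tyc_dataset.data.TYCDataset):
    """Hypothetical wrapper that returns images and instance maps downscaled by a factor of two."""

    def __getitem__(self, index):
        image, instances, class_labels = super().__getitem__(index)
        # Bilinear interpolation for the image, nearest neighbor for the binary instance masks
        image = F.interpolate(image[None], scale_factor=0.5, mode="bilinear", align_corners=False)[0]
        instances = F.interpolate(instances[None].float(), scale_factor=0.5, mode="nearest")[0]
        return image, instances, class_labels
```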
Evaluation
We propose to validate segmentation predictions on our dataset by using
the Panoptic Quality and the cell class IoU. We implement both metrics as a
TorchMetrics metric in the tyc_dataset.eval module. Both metrics (PanopticQuality
and CellIoU) can be used like all TorchMetrics metrics. The input to both metrics is the
prediction, composed of the instance maps (list of tensors) and semantic class prediction (list of tensors), and the
label is also composed of instance maps and semantic classes. Note that the instance maps are not allowed to overlap.
Additionally, both metrics assume thresholded instance maps and hard semantic classes (no logits).
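For reference, the Panoptic Quality of Kirillov et al. (2019), to which the metric name refers, is commonly defined over matched instance segments as

$$\mathrm{PQ} = \frac{\sum_{(p,\, g) \in \mathit{TP}} \mathrm{IoU}(p, g)}{|\mathit{TP}| + \tfrac{1}{2}\,|\mathit{FP}| + \tfrac{1}{2}\,|\mathit{FN}|}$$

where TP, FP, and FN denote the matched, unmatched predicted, and unmatched ground-truth segments, respectively.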
Note that currently only a batch size of 1 is supported during validation!
```python
import tyc_dataset
from torchmetrics import Metric

pq: Metric = tyc_dataset.eval.PanopticQuality()
cell_iou: Metric = tyc_dataset.eval.CellIoU()

for index, (images, instances, bounding_boxes, class_labels) in enumerate(data_loader):
    # Make prediction
    instances_pred, class_labels_pred = model(images)  # type: List[Tensor], List[Tensor]
    # Get semantic classes from one-hot vector
    class_labels = [c.argmax(dim=-1) for c in class_labels]
    class_labels_pred = [c.argmax(dim=-1) for c in class_labels_pred]
    # Compute metrics
    pq.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )
    cell_iou.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )

# Compute final metric
print(f"Panoptic Quality: {pq.compute().item()}")
print(f"Cell class IoU: {cell_iou.compute().item()}")
```
A full working example is provided in example_eval.py.
Visualization
This implementation (tyc_dataset.vis module) also includes various functions for reproducing the plots from the paper.
The instance segmentation overlay (image + instance maps + classes), as shown at the top, can be achieved by:
```python
import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")

# Get first sample of the dataset
image, instances, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor

# Plot
tyc_dataset.vis.plot_image_instances(
    image=image,
    instances=instances,
    class_labels=class_labels.argmax(dim=1) + 1,
    save=True,
    show=True,
    file_path="plot_instance_seg_overlay.png",
)
```
All plot functions take the parameters show: bool and save: bool. If show=True, the plot is directly visualized by calling plt.show(). If you want to save the plot to a file, set save=True and provide the path and file name (file_path: str).
An example use of all visualization functions is provided in example_vis.py.
Unlabeled Data
Due to the vastly different usage of our unlabeled video clips, we don't provide a dataset class implementation.
However, you can load the individual frames of the provided video clips by using cv2.imread(path_to_image, -1). Note
that currently torchvision does not support loading 16-bit TIFF images.
For details on the data format of the unlabeled set, please refer to the Dataformat section.
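A minimal sketch of loading one clip as a frame stack with OpenCV is shown below; the clip directory is a placeholder, since the actual position and focal-position folder names depend on the experiment.

```python
import glob

import cv2
import numpy as np

# Placeholder clip directory: experiment / position / focal position
clip_dir = "unlabeled_set/20211015/position_00/focal_plane_00"
frame_paths = sorted(glob.glob(f"{clip_dir}/*.tif"))
# Load each 16-bit TIFF frame unchanged and stack to [T, H, W]
frames = np.stack([cv2.imread(path, -1) for path in frame_paths])
```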
Dataset Statistics
We do not normalize the loaded images in the provided dataset classes. If you want to normalize the images you might find the following statistics helpful. Note that all images are provided (and loaded) in the raw 16-bit TIFF format.
| Dataset | Mean | Std |
|---|---|---|
| Labeled set | 4818.1709 | 685.2440 |
| Unlabeled set | 6208.7651 | 949.4420 |
| Full dataset | 6104.3213 | 929.5987 |
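A minimal sketch of applying the labeled-set statistics from the table above to an image tensor returned by the dataset class; the normalization itself is not part of the package.

```python
import torch

# Labeled-set statistics from the table above
MEAN, STD = 4818.1709, 685.2440


def normalize(image: torch.Tensor) -> torch.Tensor:
    """Standardize a raw 16-bit microscopy image tensor."""
    return (image.float() - MEAN) / STD
```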
Acknowledgements
We thank Bastian Alt for insightful feedback, Klaus-Dieter Voss for aid with the microfluidics fabrication, Markus Baier for help with the data hosting, and Aigerim Khairullina for contributing to data labeling.
Credit to TorchMetrics (Lightning AI), Kornia, and PyTorch for providing the basis of this implementation.
This work was supported by the Landesoffensive für wissenschaftliche Exzellenz as part of the LOEWE Schwerpunkt CompuGene. H.K. acknowledges support from the European Research Council (ERC) with the consolidator grant CONSYN (nr. 773196). C.R. acknowledges the support of NEC Laboratories America, Inc.
Owner
- Name: Christoph Reich
- Login: ChristophReich1996
- Kind: user
- Location: Germany
- Company: Technical University of Munich
- Website: christophreich1996.github.io
- Twitter: ChristophR1996
- Repositories: 41
- Profile: https://github.com/ChristophReich1996
ELLIS Ph.D. Student @ Technical University of Munich, Technische Universität Darmstadt & University of Oxford | Prev. NEC Labs
Citation (CITATION.cff)
cff-version: 1.2.0
message: "Code of the paper: The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures"
authors:
- family-names: Reich
given-names: Christoph
- family-names: Prangemeier
given-names: Tim
- family-names: Koeppl
given-names: Heinz
title: "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures"
version: 0.1.0
date-released: 2023-08-22
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Christoph Reich | 3****6 | 6 |
| ChristophReich1996 | c****h@g****t | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- imzhangyd (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- matplotlib *
- numpy *
- opencv-python *
- torch >=1.0.0
- kornia *
- matplotlib *
- numpy *
- opencv-python *
- torch >=1.0.0
- torchmetrics *