tyc-dataset
Official and maintained implementation of the dataset paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023].
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, scholar.google
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Keywords
Repository
Official and maintained implementation of the dataset paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023].
Basic Info
- Host: GitHub
- Owner: ChristophReich1996
- License: cc-by-4.0
- Language: Python
- Default Branch: main
- Homepage: https://christophreich1996.github.io/tyc_dataset/
- Size: 1.44 MB
Statistics
- Stars: 5
- Watchers: 1
- Forks: 2
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures
Christoph Reich, Tim Prangemeier & Heinz Koeppl
| Project Page | Paper | Download Dataset |
This repository includes the official and maintained PyTorch validation (+ data loading & visualization) code of the TYC dataset proposed in The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures.
```
# Download labeled set
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/labeled_set.zip

# Download unlabeled set
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_1.zip
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_2.zip
wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3930/unlabeled_set_3.zip

# Unzip files
unzip labeled_set.zip
unzip unlabeled_set_1.zip
unzip unlabeled_set_2.zip
unzip unlabeled_set_3.zip
```
Abstract
Segmenting cells and tracking their motion over time is a common task in biomedical applications. However, predicting accurate instance-wise segmentation and cell motions from microscopy imagery remains a challenging task. Using microstructured environments for analyzing single cells in a constant flow of media adds additional complexity. While large-scale labeled microscopy datasets are available, we are not aware of any large-scale dataset, including both cells and microstructures. In this paper, we introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release 105 dense annotated high-resolution brightfield microscopy images, including about 19k instance masks. We also release 261 curated video clips composed of 1293 high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology. TYC offers ten times more instance annotations than the previously largest dataset, including cells and microstructures. Our effort also exceeds previous attempts in terms of microstructure variability, resolution, complexity, and capturing device (microscopy) variability. We facilitate a unified comparison on our novel dataset by introducing a standardized evaluation strategy. TYC and evaluation code are publicly available under CC BY 4.0 license.
If you use our dataset or find this research useful in your work, please cite both of our dataset papers:
```bibtex
@inproceedings{Reich2023b,
    title={{The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures}},
    author={Reich, Christoph and Prangemeier, Tim and Koeppl, Heinz},
    booktitle={{IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)}},
    year={2023}
}

@inproceedings{Reich2023a,
    title={{An Instance Segmentation Dataset of Yeast Cells in Microstructures}},
    author={Reich, Christoph and Prangemeier, Tim and Fran{\c{c}}ani, Andr{\'e} O and Koeppl, Heinz},
    booktitle={{International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)}},
    year={2023}
}
```
Table of Contents
- Installation
- Dataformat
- Dataset Class
- Evaluation
- Visualization
- Unlabeled Data
- Dataset Statistics
- Acknowledgements
Installation
The validation, data loading, and visualization code can be installed as a Python package by running:
```shell script
pip install git+https://github.com/ChristophReich1996/TYC-Dataset.git
```
All dependencies are listed in requirements.txt.
Dataformat
The dataset is split into a training, validation, test, and out-of-distribution test (test_ood) set. Please refer to the paper for more information on these splits.
```
├── labeled_set
│   ├── test
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   ├── test_ood
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   ├── train
│   │   ├── images
│   │   └── labels
│   │       ├── classes
│   │       └── masks
│   └── val
│       ├── images
│       └── labels
│           ├── classes
│           └── masks
│
└── unlabeled_set
    ├── 100x_10BF_200EGFP18_9Z_17P_120t_10s_050nM_1
    ├── 100x_10BF_200GFP_9Z_120t_10s_13P_005nM_2
    ├── 2018-02-13_AH_T2C_AH_T2C_60x__2
    ├── 2018-02-13_AH_T2C_AH_T2C_60x__4
    ├── 20211015
    ├── 20211104
    ├── 60x_10BF_200GFP_200RFP20_3Z_10min_3
    ├── BCS_Data-master_Additional_Data_Z_Project_BCS
    └── SegmentationPaperDeposit
```
Every subset (train, val, test, and test_ood) of the labeled set includes a folder holding the images and a folder holding the labels. The images have a resolution of 2048x2048 or larger and are located directly in the images folder. The labels folder contains two folders (classes and masks). The classes folder holds the semantic class labels of the object instances for each image as a separate JSON file. The masks folder holds an image containing the instance masks for each microscopy image. The resolution of the mask image is the same as that of the microscopy image.
The unlabeled set holds different folders for each experiment (e.g., 20211015). Each experiment folder holds multiple folders indicating the position of the video clips. Each position folder holds a folder indicating the focal position. The focal position folders hold the video clip itself as a sequence of TIFF images.
For details on the data loading please have a look at the dataset class implementation.
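As a rough sketch of what the raw files look like on disk, a single labeled sample can be inspected as follows. This assumes the folder layout above; the file names and extensions are placeholders, and OpenCV is just one of several ways to read 16-bit TIFFs.

```python
import json

import cv2

# Placeholder file names; the actual names depend on the downloaded dataset
image = cv2.imread("labeled_set/train/images/0.tif", -1)        # 16-bit brightfield microscopy image
masks = cv2.imread("labeled_set/train/labels/masks/0.tif", -1)  # instance masks, same resolution as the image
with open("labeled_set/train/labels/classes/0.json", "r") as f:
    classes = json.load(f)                                      # semantic class label per object instance
```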
Dataset Class
This repo includes a PyTorch dataset class implementation for the labeled TYC dataset, located in the tyc_dataset.data module. The dataset class loads the dataset and returns the images, instance maps, bounding boxes, and semantic classes.
```python
import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")

# Get first sample of the dataset
image, instances, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor

# Show shapes
print(image.shape)  # [1, H, W]
print(instances.shape)  # [N, H, W]
print(class_labels)  # [N, C=2 (trap=0 and cell=1)]
```
The dataset class implementation also offers support for custom Kornia data augmentations. You can pass an AugmentationSequential object to the dataset class. The following example utilizes random horizontal and vertical flipping as well as random Gaussian blur augmentations.
```python
import kornia.augmentation
import tyc_dataset
from torch.utils.data import Dataset

# Init augmentations
augmentations = kornia.augmentation.AugmentationSequential(
    kornia.augmentation.RandomHorizontalFlip(p=0.5),
    kornia.augmentation.RandomVerticalFlip(p=0.5),
    kornia.augmentation.RandomGaussianBlur(kernel_size=(31, 31), sigma=(9, 9), p=0.5),
    data_keys=["input", "mask"],
    same_on_batch=False,
)

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train", augmentations=augmentations)
```
Note that it is necessary to pass `["input", "mask"]` as data keys! If a different data key configuration is given, a runtime error is raised.
For wrapping the dataset with the PyTorch DataLoader, please use the custom collate function.
```python
from typing import List

import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset, DataLoader

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")
data_loader = DataLoader(
    dataset=dataset,
    num_workers=2,
    batch_size=2,
    drop_last=True,
    collate_fn=tyc_dataset.data.collate_function_tyc_dataset,
)

# Get a sample from dataloader
images, instances, class_labels = next(iter(data_loader))  # type: List[Tensor], List[Tensor], List[Tensor]
```
All Dataset Class Parameters
[TYCDataset](tyc_dataset/data/dataset.py) parameters:

| Parameter | Default value | Info |
|---|---|---|
| `path: str` | - | Path to dataset as a string. |
| `augmentations: Optional[AugmentationSequential]` | `None` | Augmentations to be used. If `None` no augmentation is employed. |

We provide a full dataset and data loader example in example_eval.py.
If this dataset class implementation is not sufficient for your application, please customize the existing code or open a pull request extending the existing implementation.
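If you only need a small tweak, wrapping the dataset class is often enough. The following hypothetical sketch (assuming the three-tensor return signature shown in the example above) downscales each sample by a factor of two:

```python
import torch.nn.functional as F

import tyc_dataset


class ResizedTYCDataset(tyc_dataset.data.TYCDataset):
    """Hypothetical wrapper that returns images and instance maps downscaled by a factor of two."""

    def __getitem__(self, index):
        image, instances, class_labels = super().__getitem__(index)
        # Bilinear interpolation for the image, nearest neighbor for the binary instance masks
        image = F.interpolate(image[None], scale_factor=0.5, mode="bilinear", align_corners=False)[0]
        instances = F.interpolate(instances[None].float(), scale_factor=0.5, mode="nearest")[0]
        return image, instances, class_labels
```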
Evaluation
We propose to validate segmentation predictions on our dataset by using
the Panoptic Quality and the cell class IoU. We implement both metrics as a
TorchMetrics metric in the tyc_dataset.eval module. Both metrics (PanopticQuality
and CellIoU) can be used like all TorchMetrics metrics. The input to both metrics is the
prediction, composed of the instance maps (list of tensors) and semantic class prediction (list of tensors), and the
label is also composed of instance maps and semantic classes. Note that the instance maps are not allowed to overlap.
Additionally, both metrics assume thresholded instance maps and hard semantic classes (no logits).
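For reference, the Panoptic Quality of Kirillov et al. (2019), to which the metric name refers, is commonly defined over matched instance segments as

$$\mathrm{PQ} = \frac{\sum_{(p,\, g) \in \mathit{TP}} \mathrm{IoU}(p, g)}{|\mathit{TP}| + \tfrac{1}{2}\,|\mathit{FP}| + \tfrac{1}{2}\,|\mathit{FN}|}$$

where TP, FP, and FN denote the matched, unmatched predicted, and unmatched ground-truth segments, respectively.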
Note that currently only a batch size of 1 is supported during validation!
```python
import tyc_dataset
from torchmetrics import Metric

pq: Metric = tyc_dataset.eval.PanopticQuality()
cell_iou: Metric = tyc_dataset.eval.CellIoU()

for index, (images, instances, bounding_boxes, class_labels) in enumerate(data_loader):
    # Make prediction
    instances_pred, class_labels_pred = model(images)  # type: List[Tensor], List[Tensor]
    # Get semantic classes from one-hot vector
    class_labels = [c.argmax(dim=-1) for c in class_labels]
    class_labels_pred = [c.argmax(dim=-1) for c in class_labels_pred]
    # Compute metrics
    pq.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )
    cell_iou.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )

# Compute final metric
print(f"Panoptic Quality: {pq.compute().item()}")
print(f"Cell class IoU: {cell_iou.compute().item()}")
```
A full working example is provided in example_eval.py.
Visualization
This implementation (tyc_dataset.vis module) also includes various functions for reproducing the plots from the paper.
The instance segmentation overlay (image + instance maps + classes), as shown at the top, can be achieved by:
```python
import tyc_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = tyc_dataset.data.TYCDataset(path="/some_path_to_data/train")

# Get first sample of the dataset
image, instances, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor

# Plot
tyc_dataset.vis.plot_image_instances(
    image=image,
    instances=instances,
    class_labels=class_labels.argmax(dim=1) + 1,
    save=True,
    show=True,
    file_path="plot_instance_seg_overlay.png",
)
```
All plot functions take the parameters show: bool and save: bool. If show=True, the plot is directly visualized by calling plt.show(). If you want to save the plot to a file, set save=True and provide the path and file name (file_path: str).
An example use of all visualization functions is provided in example_vis.py.
Unlabeled Data
Due to the vastly different usage of our unlabeled video clips, we don't provide a dataset class implementation.
However, you can load the individual frames of the provided video clips by using cv2.imread(path_to_image, -1). Note
that currently torchvision does not support loading 16-bit TIFF images.
For details on the data format of the unlabeled set, please refer to the Dataformat section.
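A minimal sketch of loading one clip as a frame stack with OpenCV is shown below; the clip directory is a placeholder, since the actual position and focal-position folder names depend on the experiment.

```python
import glob

import cv2
import numpy as np

# Placeholder clip directory: experiment / position / focal position
clip_dir = "unlabeled_set/20211015/position_00/focal_plane_00"
frame_paths = sorted(glob.glob(f"{clip_dir}/*.tif"))
# Load each 16-bit TIFF frame unchanged and stack to [T, H, W]
frames = np.stack([cv2.imread(path, -1) for path in frame_paths])
```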
Dataset Statistics
We do not normalize the loaded images in the provided dataset classes. If you want to normalize the images you might find the following statistics helpful. Note that all images are provided (and loaded) in the raw 16-bit TIFF format.
| Dataset | Mean | Std |
|---|---|---|
| Labeled set | 4818.1709 | 685.2440 |
| Unlabeled set | 6208.7651 | 949.4420 |
| Full dataset | 6104.3213 | 929.5987 |
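A minimal sketch of applying the labeled-set statistics from the table above to an image tensor returned by the dataset class; the normalization itself is not part of the package.

```python
import torch

# Labeled-set statistics from the table above
MEAN, STD = 4818.1709, 685.2440


def normalize(image: torch.Tensor) -> torch.Tensor:
    """Standardize a raw 16-bit microscopy image tensor."""
    return (image.float() - MEAN) / STD
```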
Acknowledgements
We thank Bastian Alt for insightful feedback, Klaus-Dieter Voss for aid with the microfluidics fabrication, Markus Baier for help with the data hosting, and Aigerim Khairullina for contributing to data labeling.
Credit to TorchMetrics (Lightning AI), Kornia, and PyTorch for providing the basis of this implementation.
This work was supported by the Landesoffensive für wissenschaftliche Exzellenz as part of the LOEWE Schwerpunkt CompuGene. H.K. acknowledges support from the European Research Council (ERC) with the consolidator grant CONSYN (nr. 773196). C.R. acknowledges the support of NEC Laboratories America, Inc.
Owner
- Name: Christoph Reich
- Login: ChristophReich1996
- Kind: user
- Location: Germany
- Company: Technical University of Munich
- Website: christophreich1996.github.io
- Twitter: ChristophR1996
- Repositories: 41
- Profile: https://github.com/ChristophReich1996
ELLIS Ph.D. Student @ Technical University of Munich, Technische Universität Darmstadt & University of Oxford | Prev. NEC Labs
Citation (CITATION.cff)
cff-version: 1.2.0
message: "Code of the paper: The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures"
authors:
- family-names: Reich
given-names: Christoph
- family-names: Prangemeier
given-names: Tim
- family-names: Koeppl
given-names: Heinz
title: "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures"
version: 0.1.0
date-released: 2023-08-22
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Christoph Reich | 3****6 | 6 |
| ChristophReich1996 | c****h@g****t | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- imzhangyd (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- matplotlib *
- numpy *
- opencv-python *
- torch >=1.0.0
- kornia *
- matplotlib *
- numpy *
- opencv-python *
- torch >=1.0.0
- torchmetrics *