https://github.com/alan-turing-institute/grace

Graph Representation Analysis for Connected Embeddings


Science Score: 20.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    4 of 9 committers (44.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary

Keywords

computer-vision data-science feature-extraction graphical-models image-processing latent-representations machine-learning neural-networks object-detection

Keywords from Contributors

hut23 hut23-1205
Last synced: 6 months ago

Repository

Graph Representation Analysis for Connected Embeddings

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 60.5 MB
Statistics
  • Stars: 34
  • Watchers: 5
  • Forks: 1
  • Open Issues: 50
  • Releases: 0
Topics
computer-vision data-science feature-extraction graphical-models image-processing latent-representations machine-learning neural-networks object-detection
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Badges: Ruff · Black · pre-commit · Actions status

GRACE - Graph Representation Analysis for Connected Embeddings 🌐 📊 🤓

project logo

This grace repository contains a Python library 🐍 for identifying patterns in imaging data. The package provides a method 🖥️ to find connected objects & regions of interest in images by constructing graph-like representations 🌐.

Read more about:

  • the science behind this project 👩‍🔬👨‍🔬
  • the workflow of the individual steps 👩‍💻👨‍💻
  • the contributors participating in the project design
  • how to bring in your own ideas
  • how to provide your feedback

...and don't forget to give us a '⭐' -> 😉


Science

The acronym grace stands for Graph Representation Analysis for Connected Embeddings 📈📉. This tool was developed by researchers as a scientific project at The Alan Turing Institute in the Data Science for Science programme.

As the initial use case, we (see the list of contributors below) developed grace to localise filaments in cryo-electron microscopy (cryoEM) imaging datasets: an image-processing tool that automatically identifies filamentous proteins and locates their regions of interest, such as an accessory or binding protein.

Find out more details about the project aims & objectives here & here or visit the citation panel below to check out the overarching research projects.


Workflow

workflow steps

The grace workflow consists of the following steps:

  1. Image data acquisition (e.g. cryo-electron microscopy)
  2. Object detection via bounding boxes (e.g. crYOLO, RELION, or FasterRCNN)
  3. Organisation of the bounding boxes as nodes connected via edges as a 2D graph structure (e.g. Delaunay triangulation)
  4. Cropping of image patches (at various scales) from each bounding box detected in the image
  5. Latent feature extraction from image patches (e.g. pre-trained neural network, such as ResNet-152)
  6. 'Human-in-the-loop' annotation of the desired pattern in the image data (see the napari plugin below)
  7. Classification of per-node 'nodeness' and per-edge 'edgeness' confidence via deep neural network classifiers. The network can be applied to the full graph, or to subgraphs around each node (e.g. the immediate 1-hop neighbourhood).
  8. Combinatorial optimisation via integer linear programming (ILP) to connect the candidate object nodes via edges (see the expected outcomes below)
  9. Quantitative evaluation of the filament detection performance
  10. Ta-da! 🥳
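Steps 3 and 7 above can be sketched in a few lines. The snippet below is a hedged illustration (not grace's actual API): it builds a 2D graph from hypothetical detection centres via Delaunay triangulation with scipy, then extracts the 1-hop subgraph around one node with networkx. The coordinates are random stand-ins for bounding-box centres.

```python
import networkx as nx
import numpy as np
from scipy.spatial import Delaunay

# hypothetical bounding-box centres (x, y) in a 512 x 512 image
rng = np.random.default_rng(42)
centres = rng.uniform(0, 512, size=(20, 2))

# Delaunay triangulation connects nearby detections without a distance threshold
tri = Delaunay(centres)

graph = nx.Graph()
graph.add_nodes_from((i, {"pos": tuple(p)}) for i, p in enumerate(centres))
# each simplex is a triangle; its three sides become graph edges
for a, b, c in tri.simplices:
    graph.add_edges_from([(a, b), (b, c), (a, c)])

# a 1-hop neighbourhood around node 0, as used for local classification
sub = nx.ego_graph(graph, 0, radius=1)
print(graph.number_of_nodes(), graph.number_of_edges(), sub.number_of_nodes())
```

In a real pipeline the node attributes would also carry the image patch (or its latent embedding from step 5) rather than just the position.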

Installation

grace has been tested with Python 3.8+ on OS X.

For local development, clone the repo and install in editable mode following these guidelines:

Note: choose which conda environment you'd like to use:

  • If you need to annotate your data (images) in napari, we recommend grace-env-with-napari -> environment-with-napari.yaml
  • If you rely on previously annotated data and do not require napari, we recommend grace-env-napari-free -> environment-napari-free.yaml

Specify your preference & follow the steps below:

```sh
# clone the grace GitHub repository
git clone https://github.com/alan-turing-institute/grace.git
cd ./grace

# create a conda environment from the respective environment.yaml
conda env create -f YOUR-CHOSEN-ENVIRONMENT.yaml

# activate the environment of your choice, either:
conda activate grace-env-with-napari
# or:
conda activate grace-env-napari-free
# (use `conda deactivate` to deactivate an active environment)

# install grace from the local folder (not on PyPI yet)
pip install -e ".[dev]"

# install pre-commit separately
conda install -c conda-forge pre_commit

# install the hooks from .pre-commit-config.yaml
pre-commit install
```

Note: when exporting your own grace conda environment, use the following:

```sh
conda env export --no-builds > new_environment.yaml
```

This allows environments to be shared across different platforms and operating systems. For a new install with a grace version not yet on PyPI, please remove grace from the pip requirements in the newly created yaml file.


If you currently do not have any data to test / implement GRACE on, have a look at the option of simulating a synthetic dataset as described in this README. An accessible link to some pre-annotated simulated images is coming soon! 🚧

Annotator GUI

Our repository contains a graphical user interface (GUI) which allows the user to manually annotate the regions of interest (motifs) in their cryo-EM data.

To test the annotator, make sure you've installed the repository using the annotation environment & run:

```sh
python examples/show_data.py
```

https://user-images.githubusercontent.com/48791041/233156173-cf2a69d3-d4be-4ba1-ae57-aebf6b9501cc.mov

Demonstration of the napari widget to annotate cryo-EM images.

The recording above 👆 shows a napari-based GUI widget for annotation of the desired motifs, in our case, filamentous proteins. Follow these steps to test the plugin out:

  1. Build the graph from all vertices (node, white circle) using the 'build graph' function in the right-hand panel.
  2. Navigate the triangulated graph by zooming in/out or moving along the image from either the 'nodes_...' or 'edges_...' layer list.
  3. Choose the 'annotation_...' layer in the left-hand layer list and click on the 'brush'🖌️ icon at the top of the layer control.
  4. Annotate nodes belonging to object instances by drawing over the nodes in a continuous line.
  5. Identify edges within connected objects (green 🟩 lines) versus edges outside of annotated objects (magenta 🟪 lines) by cutting the graph using the 'cut graph' function in the right-hand panel.
  6. In case of an annotation error ❌, choose the eraser icon at the top of the layer control to erase incorrect annotations. Re-cut the graph until you are happy with the overall annotation of the image.
  7. Note: not every single node / object has to be accounted for when annotating; take it easy 😎.
  8. Once happy with the annotations, save them out by exporting via the 'export...' button on the right-hand side. Conversely, you can load previously saved annotations using the 'import...' button.
  9. Ta-da! 🥳

Outcomes

🚧 Work in progress 🚧

The expected outcome of the grace workflow is to identify all connected objects as individual filament instances. We tested the combinatorial optimisation step on simulated data with 3 levels of 'line-seeding' densities: dense, medium and sparse.

optimising dummy graphs

As you can see, the optimiser works well to identify filamentous object instances simulated at various densities, and appears to work across object cross-overs (middle image, pink objects).
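The edge-linking idea behind the combinatorial optimisation step can be illustrated with a toy integer linear program. This is a hedged sketch using scipy.optimize.milp, not grace's actual formulation (the repository depends on cvxopt, so its real solver setup likely differs): select candidate edges to maximise summed 'edgeness' confidence while capping each node's degree at 2, so the selected edges form filament-like chains. The edge list and confidence scores below are made up for illustration.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# hypothetical candidate edges and their predicted 'edgeness' confidences
edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
scores = np.array([0.9, 0.8, 0.2, 0.7, 0.95])
n_nodes = 5

# node-edge incidence matrix: one row per node, one column per edge
A = np.zeros((n_nodes, len(edges)))
for j, (u, v) in enumerate(edges):
    A[u, j] = A[v, j] = 1

result = milp(
    c=-(scores - 0.5),                  # reward keeping edges with confidence > 0.5
    integrality=np.ones(len(edges)),    # one binary decision variable per edge
    bounds=Bounds(0, 1),
    constraints=LinearConstraint(A, ub=np.full(n_nodes, 2)),  # degree <= 2
)
selected = [e for e, x in zip(edges, result.x) if x > 0.5]
print(selected)  # the low-confidence edge (1, 3) is dropped
```

The degree-2 cap is what makes the solution decompose into chains (filament instances); relaxing or changing that constraint would allow branched structures instead.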

More details about how this type of graph representation analysis could be applied to other image data processing will become available soon - stay tuned! 😎👌


Contributors

Methodology / software development [The Alan Turing Institute]:

Dataset generation / processing [The University of Bristol]:

...and many others...

If you'd like to contribute to our ongoing work, please do not hesitate to let us know your suggestions for potential improvements by raising an issue on GitHub.


Citation

🚧 Work in progress 🚧

Project:ML_for_CryoEM

Project:Mol_Structures

We are currently writing up our methodology and key results, so please stay tuned for future updates!

In the meantime, please use the template below to cite our work:

@unpublished{grace_repository,
  year         = {2023},
  month        = {April},
  publisher    = {{CCP-EM} Collaborative Computational Project for Electron cryo-Microscopy},
  howpublished = {Paper presented at the 2023 {CCP-EM} Spring Symposium},
  url          = {https://www.ccpem.ac.uk/downloads/symposium/ccp-em_symp_schedule_2023.pdf},
  author       = {Beatriz Costa-Gomes and Kristina Ulicna and Christopher Soelistyo and Marjan Famili and Alan Lowe},
  title        = {Deconstructing cryoEM micrographs with a graph-based analysis for effective structure detection},
  abstract     = {Reliable detection of structures is a fundamental step in analysis of cryoEM micrographs. Despite intense developments of computational approaches in recent years, time-consuming hand annotating remains inevitable and represents a rate-limiting step in the analysis of cryoEM data samples with heterogeneous objects. Furthermore, many of the current solutions are constrained by image characteristics: the large sizes of individual micrographs, the need to perform extensive re-training of the detection models to find objects of various categories in the same image dataset, and the presence of artefacts that might have similar shapes to the intended targets. To address these challenges, we developed GRACE (Graph Representation Analysis for Connected Embeddings), a computer vision-based Python package for identification of structural motifs in complex imaging data. GRACE sources from large images populated with low-fidelity object detections to build a graph representation of the entire image. This global graph is then traversed to find structured regions of interest via extracting latent node representations from the local image patches and connecting candidate objects in a supervised manner with a graph neural network. Using a human-in-the-loop approach, the user is encouraged to annotate the desired motifs of interest, making our tool agnostic to the type of object detections. The user-nominated structures are then localised and connected using a combinatorial optimisation step, which uses the latent embeddings to decide whether the graph nodes belong to an object instance. Importantly, GRACE reduces the search space from millions of pixels to hundreds of nodes, which allows for fast and efficient implementation and potential tool customisation. In addition, our method can be repurposed to search for different motifs of interest within the same dataset in a significantly smaller time scale to the currently available open-source methods. We envisage that our end-to-end approach could be extended to other types of imaging data where object segmentation and detection remains challenging.}
}


Happy graphing! 🎮

  • Your GRACE development research team 👋

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 595
  • Total Committers: 9
  • Avg Commits per committer: 66.111
  • Development Distribution Score (DDS): 0.363
Past Year
  • Commits: 593
  • Committers: 9
  • Avg Commits per committer: 65.889
  • Development Distribution Score (DDS): 0.361
Top Committers
Name Email Commits
KristinaUlicna k****a@t****k 379
Chris-Soelistyo c****5@u****k 93
quantumjot c****e@a****k 58
KristinaUlicna k****8@u****k 18
Kristina Ulicna k****a@t****m 18
mooniean b****s@g****m 17
Alden Conner a****r@t****k 9
Kristina Ulicna 4****a 2
marjanfamili m****i@g****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 164
  • Total pull requests: 50
  • Average time to close issues: 29 days
  • Average time to close pull requests: 9 days
  • Total issue authors: 6
  • Total pull request authors: 5
  • Average comments per issue: 0.18
  • Average comments per pull request: 1.56
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 164
  • Pull requests: 50
  • Average time to close issues: 29 days
  • Average time to close pull requests: 9 days
  • Issue authors: 6
  • Pull request authors: 5
  • Average comments per issue: 0.18
  • Average comments per pull request: 1.56
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • KristinaUlicna (140)
  • quantumjot (11)
  • mooniean (6)
  • marjanfamili (3)
  • chris-soelistyo (3)
  • crangelsmith (1)
Pull Request Authors
  • KristinaUlicna (44)
  • mooniean (2)
  • chris-soelistyo (2)
  • marjanfamili (1)
  • quantumjot (1)
Top Labels
Issue Labels
enhancement (62) methodology (61) documentation (22) bug (22) visualisation (8) question (8) help wanted (8) setup (6) tests (5) duplicate (1) good first issue (1)
Pull Request Labels
documentation (24) enhancement (17) methodology (15) bug (8) tests (8) visualisation (1)

Dependencies

.github/workflows/workflow.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
requirements.txt pypi
  • PyYAML ==6.0
  • click ==8.1.3
  • cvxopt ==1.3.1
  • magicgui ==0.7.2
  • matplotlib ==3.7.1
  • mrcfile ==1.4.3
  • networkx ==3.1
  • numpy ==1.24.2
  • opencv_python ==4.7.0.72
  • pandas ==2.0.1
  • pyarrow ==11.0.0
  • pytest ==7.3.1
  • qtpy ==2.3.1
  • scikit_learn ==1.2.2
  • scipy ==1.10.1
  • seaborn ==0.12.2
  • starfile ==0.4.12
  • tifffile ==2023.4.12
  • torch ==2.0.0
  • torch_geometric ==2.3.1
  • torchvision ==0.15.1
pyproject.toml pypi