distributed-cellprofiler
Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.
https://github.com/distributedscience/distributed-cellprofiler
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DistributedScience
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://distributedscience.github.io/Distributed-CellProfiler/
- Size: 9.81 MB
Statistics
- Stars: 41
- Watchers: 8
- Forks: 26
- Open Issues: 14
- Releases: 6
Metadata Files
README.md
Distributed-CellProfiler
Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.
This code is an example of how to use AWS distributed infrastructure to run CellProfiler. The AWS resources are configured using boto3 and the AWS CLI. The worker is written in Python and encapsulated in a Docker container. Running distributed jobs requires at least four AWS components:
- An SQS queue
- An ECS cluster
- An S3 bucket
- A spot fleet of EC2 instances
All of them can be managed through the AWS Management Console. However, this code helps you get started quickly and runs a job autonomously, provided the configuration is correct. The code prepares the infrastructure to run a distributed job. When the job is completed, the code can also stop resources and clean up components. It also adds logging and alarms via CloudWatch, helping the user troubleshoot runs and destroy stuck machines.
Documentation
Comprehensive documentation, including troubleshooting, is available in the Distributed-CellProfiler documentation (https://distributedscience.github.io/Distributed-CellProfiler/).
Running the code
Step 1
Edit the config.py file with all the relevant information for your job. Then, start creating the basic AWS resources by running the following script:
$ python run.py setup
This script initializes the resources in AWS. Note that the Docker registry is built separately, and you can modify the worker code to build your own. Whenever you modify the worker code, update the Docker registry using the Makefile inside the worker directory.
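Since setup depends on config.py being filled in correctly, a quick sanity check before running it can catch blank values early. The following is only a minimal sketch: the setting names are hypothetical stand-ins chosen for illustration and may not match the actual fields of config.py.

```python
# Hypothetical setting names; the real config.py may use different ones.
REQUIRED_SETTINGS = (
    "APP_NAME",
    "AWS_REGION",
    "AWS_BUCKET",
    "SQS_QUEUE_NAME",
    "ECS_CLUSTER",
)


def find_missing_settings(settings, required=REQUIRED_SETTINGS):
    """Return the required setting names that are absent or blank."""
    missing = []
    for name in required:
        value = settings.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Placeholder values for illustration only.
example = {
    "APP_NAME": "MyProject_Analysis",
    "AWS_REGION": "us-east-1",
    "AWS_BUCKET": "",  # left blank on purpose
    "SQS_QUEUE_NAME": "MyProject_AnalysisQueue",
}
print(find_missing_settings(example))
```

With the placeholder values above, the check reports the blank bucket and the absent cluster name.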
Step 2
After the first script runs successfully, the job can now be submitted to AWS using EITHER of the following commands:
$ python run.py submitJob files/exampleJob.json
OR
$ python run_batch_general.py
Running either script uploads the tasks that are configured in the JSON file.
This assumes that your data is stored in S3 and that the JSON file contains the paths to the input and output directories.
You have to customize the exampleJob.json file or the run_batch_general.py file with paths that make sense for your project.
The tasks that compose your job are CP groups, and each one will be run in parallel.
You need to define each task in your input file to guide the parallelization.
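To illustrate how a job description can be split into independent tasks, here is a minimal sketch. The key names (`pipeline`, `input`, `output`, `groups`, `Metadata`) and all paths are assumptions made for this example; consult files/exampleJob.json for the actual schema.

```python
import json


def expand_job(job):
    """Return one task dict per group, each carrying the shared job settings."""
    shared = {key: value for key, value in job.items() if key != "groups"}
    return [dict(shared, **group) for group in job["groups"]]


# Hypothetical job description; keys and paths are placeholders.
job = {
    "pipeline": "projects/demo/pipelines/analysis.cppipe",
    "input": "projects/demo/images",
    "output": "projects/demo/results",
    "groups": [
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01"},
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A02"},
    ],
}

tasks = expand_job(job)
for task in tasks:
    # Each serialized task could serve as the body of one queue message.
    print(json.dumps(task, sort_keys=True))
```

Each group becomes its own task, so the number of groups you define sets the upper bound on how many tasks can run in parallel.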
Step 3
After submitting the job to the queue, we can add computing power to process all tasks in AWS. This code starts a fleet of spot EC2 instances that run the worker code. The worker code is encapsulated in Docker containers, and the code uses ECS to place them on the EC2 instances. All of this is automated with the following command:
$ python run.py startCluster files/exampleFleet.json
After the cluster is ready, the code informs you that everything is set up and saves the spot fleet identifier in a file for further reference.
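As a rough illustration of saving the spot fleet identifier for later use, the sketch below writes it to a JSON file and reads it back. The file name pattern follows the APP_NAMESpotFleetRequestId.json convention used by the monitor command; the JSON key `SpotFleetRequestId`, the helper names, and the identifier value are assumptions, not the code's actual format.

```python
import json
import os
import tempfile


def fleet_id_path(app_name, directory="files"):
    """Build the path of the fleet identifier file for a given app name."""
    return os.path.join(directory, app_name + "SpotFleetRequestId.json")


def save_fleet_id(path, fleet_id):
    """Write the fleet identifier to a JSON file, creating the directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as handle:
        json.dump({"SpotFleetRequestId": fleet_id}, handle)


def load_fleet_id(path):
    """Read the fleet identifier back from the JSON file."""
    with open(path) as handle:
        return json.load(handle)["SpotFleetRequestId"]


# Use a temporary directory so the demo does not touch a real project tree.
workdir = tempfile.mkdtemp()
path = fleet_id_path("MyApp", directory=os.path.join(workdir, "files"))
save_fleet_id(path, "sfr-00000000-placeholder")  # placeholder identifier
print(os.path.basename(path), load_fleet_id(path))
```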
Step 4
When the cluster is up and running, you can monitor progress using the following command:
$ python run.py monitor files/APP_NAMESpotFleetRequestId.json
The file APP_NAMESpotFleetRequestId.json is created after the cluster is set up in Step 3. Keep this monitor running if you want computing resources to shut down automatically when there are no more tasks in the queue (recommended).
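The monitoring behavior described above can be pictured as a polling loop that shuts resources down once the queue is empty. This is a simplified sketch, not the actual monitor: `get_pending_tasks` and `shut_down` are hypothetical callables standing in for the real queue query and fleet cancellation, and they are replaced by stubs here so the example runs offline.

```python
import time


def monitor(get_pending_tasks, shut_down, poll_seconds=60, sleep=time.sleep):
    """Poll until no tasks remain, then call shut_down once.

    Returns the number of polls performed.
    """
    polls = 0
    while True:
        polls += 1
        if get_pending_tasks() == 0:
            shut_down()
            return polls
        sleep(poll_seconds)


# Stand-ins for real queue and fleet calls, so the sketch runs without AWS.
remaining = iter([3, 1, 0])
events = []
polls = monitor(
    get_pending_tasks=lambda: next(remaining),
    shut_down=lambda: events.append("fleet stopped"),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(polls, events)
```

Passing the queue check and the shutdown action in as functions keeps the loop testable; in a real run they would presumably wrap calls to the queue and the spot fleet.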

Owner
- Name: DistributedScience
- Login: DistributedScience
- Kind: organization
- Website: https://arxiv.org/abs/2210.01073
- Repositories: 5
- Profile: https://github.com/DistributedScience
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite at least McQuin et al. Also cite Weisbart et al. for updated DCP functions"
type: software
authors:
- name: 'Imaging Platform, Broad Institute of Harvard and MIT'
city: Cambridge
country: US
repository: https://github.com/DistributedScience/Distributed-CellProfiler
title: "Distributed-CellProfiler"
doi: 10.1371/journal.pbio.2005970
date-released: 2018-07-03
preferred-citation:
type: article
authors:
- family-names: "McQuin"
given-names: "Claire"
orcid: "https://orcid.org/0000-0002-3664-2318"
- family-names: "Goodman"
given-names: "Allen"
orcid: "https://orcid.org/0000-0002-6434-2320"
- family-names: "Chernyshev"
given-names: "Vasiliy"
orcid: "https://orcid.org/0000-0003-2372-7037"
- family-names: "Kamentsky"
given-names: "Lee"
orcid: "https://orcid.org/0000-0002-8161-3604"
- family-names: "Cimini"
given-names: "Beth A."
orcid: "https://orcid.org/0000-0001-9640-9318"
- family-names: "Karhohs"
given-names: "Kyle W."
orcid: "https://orcid.org/0000-0002-5126-5805"
- family-names: "Doan"
given-names: "Minh"
orcid: "https://orcid.org/0000-0002-3235-0457"
- family-names: "Ding"
given-names: "Liya"
- family-names: "Rafelski"
given-names: "Susanne M."
orcid: "https://orcid.org/0000-0002-1399-5970"
- family-names: "Thirstrup"
given-names: "Derek"
orcid: "https://orcid.org/0000-0002-2702-2010"
- family-names: "Wiegraebe"
given-names: "Winfried"
orcid: "https://orcid.org/0000-0002-1099-4817"
- family-names: "Singh"
given-names: "Shantanu"
orcid: "https://orcid.org/0000-0003-3150-3025"
- family-names: "Becker"
given-names: "Tim"
orcid: "https://orcid.org/0000-0001-9615-0799"
- family-names: "Caicedo"
given-names: "Juan C."
orcid: "https://orcid.org/0000-0002-1277-4631"
- family-names: "Carpenter"
given-names: "Anne E."
orcid: "https://orcid.org/0000-0003-1555-8261"
doi: "10.1371/journal.pbio.2005970"
journal: "PLOS Biology"
month: 7
start: 0 # First page number
end: 0 # Last page number
title: "CellProfiler 3.0: Next-generation image processing for biology."
issue: 7
volume: 16
year: 2018
reference:
type: article
authors:
- family-names: "Weisbart"
given-names: "Erin"
orcid: "https://orcid.org/0000-0002-6437-2458"
- family-names: "Cimini"
given-names: "Beth A."
orcid: "https://orcid.org/0000-0001-9640-9318"
doi: "10.1038/s41592-023-01918-8"
journal: "Nature Methods"
month: 6
start: 0 # First page number
end: 0 # Last page number
title: "Distributed-Something: scripts to leverage AWS storage and computing for distributed workflows at scale"
issue: 0
volume: 0
year: 2023
GitHub Events
Total
- Create event: 7
- Issues event: 11
- Release event: 1
- Watch event: 3
- Delete event: 6
- Issue comment event: 14
- Push event: 36
- Pull request event: 18
- Pull request review event: 14
- Pull request review comment event: 11
- Fork event: 2
Last Year
- Create event: 7
- Issues event: 11
- Release event: 1
- Watch event: 3
- Delete event: 6
- Issue comment event: 14
- Push event: 36
- Pull request event: 18
- Pull request review event: 14
- Pull request review comment event: 11
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 7
- Total pull requests: 6
- Average time to close issues: 4 months
- Average time to close pull requests: 22 days
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 0.14
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 7
- Pull requests: 6
- Average time to close issues: 4 months
- Average time to close pull requests: 22 days
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 0.14
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ErinWeisbart (12)
- bethac07 (6)
- znavidi (2)
Pull Request Authors
- ErinWeisbart (15)
- bethac07 (2)
- emiglietta (2)
- sugan89 (1)
- gareth-rogers-healx (1)
- Zitong-Chen-16 (1)
- kate-bowers-broad (1)
- znavidi (1)
Dependencies
- boto3 >=1.0.0
- actions/checkout v2 composite
- actions/setup-python v2 composite
- peaceiris/actions-gh-pages v3.6.1 composite
- cellprofiler/cellprofiler 3.1.9 build
- cellprofiler/cellprofiler 4.2.4 build