Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization mcity has institutional domain (mcity.umich.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.5%) to scientific vocabulary
Repository
Mcity Data Engine
Basic Info
- Host: GitHub
- Owner: mcity
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://mcity.umich.edu/what-we-do/data-for-ai/
- Size: 22.3 MB
Statistics
- Stars: 16
- Watchers: 4
- Forks: 1
- Open Issues: 53
- Releases: 3
Metadata Files
README.md
Acknowledgements
Mcity would like to thank Amazon Web Services (AWS) for their pivotal role in providing the cloud infrastructure on which the Data Engine depends. We couldn’t have done it without their tremendous support!
Mcity Data Engine
The Mcity Data Engine is an essential tool in the Mcity makerspace for transportation innovators building AI algorithms and seeking actionable data insights through machine learning. Details on the Data Engine can be found in the Wiki. The Data Engine supports every stage of continuously improving AI models based on raw visual data.
Online Demo: Data Selection with Embeddings
To get a first feel for the Mcity Data Engine, we provide an online demo in a Google Colab environment. The demo loads the Fisheye8K dataset and walks through the Embedding Selection workflow of the Mcity Data Engine. This workflow uses a set of models to compute image embeddings, which are then used to identify both representative and rare samples. The dataset is then visualized in the Voxel51 UI, highlighting how often each sample was picked by the workflow.
Note that most of the Mcity Data Engine workflows require a more powerful GPU, so the possibilities within the Colab environment are limited. Other workflows may not work.
Online demo on Google Colab: Mcity Data Engine Web Demo
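The idea behind the Embedding Selection workflow described above can be illustrated with a small, self-contained sketch. This is not the Data Engine's implementation: the heuristics (distance to the mean embedding for "representative", nearest-neighbor distance for "rare") and the function names select_samples and count_picks are assumptions chosen for illustration only.

```python
from collections import Counter
import math


def select_samples(embeddings, n_representative, n_rare):
    """Return (representative, rare) sample indices for one embedding model.

    Assumed heuristics, for illustration only:
    - representative: samples closest to the mean embedding
    - rare: samples whose nearest neighbor is farthest away
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / n for d in range(dim)]

    # Sort sample indices by Euclidean distance to the centroid.
    by_centroid = sorted(range(n), key=lambda i: math.dist(embeddings[i], centroid))

    def nearest_neighbor_distance(i):
        return min(math.dist(embeddings[i], embeddings[j]) for j in range(n) if j != i)

    # Sort sample indices by isolation, most isolated first.
    by_isolation = sorted(range(n), key=nearest_neighbor_distance, reverse=True)
    return by_centroid[:n_representative], by_isolation[:n_rare]


def count_picks(embeddings_per_model, n_representative=1, n_rare=1):
    """Count how often each sample index is picked across several models."""
    picks = Counter()
    for embeddings in embeddings_per_model.values():
        representative, rare = select_samples(embeddings, n_representative, n_rare)
        # A sample counts at most once per model, even if picked twice.
        picks.update(set(representative) | set(rare))
    return picks
```

With embeddings from several models passed as a dictionary (model name to list of vectors), count_picks returns a per-sample pick count, which mirrors the "how often a sample was picked" value that the demo visualizes.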
Local Execution
At least one GPU is required for many of the Mcity Data Engine workflows. Check the hardware setups we have tested in the Wiki. To download the repository and install the requirements, run:
git clone --recurse-submodules git@github.com:mcity/mcity_data_engine.git
cd mcity_data_engine
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Log in with your Weights and Biases and Hugging Face accounts:
wandb login
huggingface-cli login
Launch a Voxel51 session in one terminal:
python session_v51.py
Configure your run in config/config.py and launch the Mcity Data Engine in a second terminal:
python main.py
Notebooks and Submodules
To exclude the output of Jupyter notebooks from Git tracking, add the following lines to your .git/config:
[filter "strip-notebook-output-engine"]
clean = <your_path>/mcity_data_engine/.venv/bin/jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout
smudge = cat
required = true
Add the following lines to .git/modules/mcity_data_engine_scripts/config:
[filter "strip-notebook-output-scripts"]
clean = <your_path>/mcity_data_engine/.venv/bin/jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout
smudge = cat
required = true
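To make clear what such a clean filter does to a notebook before it is committed, here is a rough standard-library sketch. It only approximates the nbconvert preprocessors used above: it clears code-cell outputs, execution counts, and cell metadata, and leaves notebook-level metadata untouched. The function name strip_notebook is hypothetical.

```python
import json


def strip_notebook(notebook_json: str) -> str:
    """Return notebook JSON with outputs and cell metadata removed.

    Simplified approximation of an output-stripping clean filter;
    not a replacement for jupyter nbconvert.
    """
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the In[n] counter
        cell["metadata"] = {}               # drop per-cell metadata
    return json.dumps(notebook, indent=1) + "\n"
```

Because the cell sources are left as they are, only the volatile parts of the notebook disappear from the diff.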
To keep the submodules up to date, add the following lines to the top of your .git/hooks/pre-commit:
git submodule update --recursive --remote
git add .gitmodules $(git submodule foreach --quiet 'echo $name')
Repository Structure
.
├── main.py # Entry point of the framework → Terminal 1
├── session_v51.py # Script to launch Voxel51 session → Terminal 2
├── workflows/ # Workflows for the Mcity Data Engine
├── config/ # Local configuration files
├── utils/ # General-purpose utility functions
├── cloud/ # Scripts run in the cloud to pre-process data
├── docs/ # Documentation generated with `pdoc`
├── tests/ # Tests using Pytest
├── custom_models/ # External models with containerized environments
├── mcity_data_engine_scripts/ # Experiment scripts and one-time operations (Mcity internal)
├── .vscode # Settings for VS Code IDE
├── .github/workflows/ # GitHub Action workflows
├── .gitignore # Files and directories to be ignored by Git
├── .gitattributes # Rules for handling files like Notebooks during commits
├── .gitmodules # Configuration for managing Git submodules
├── .secret # Secret tokens (not tracked by Git)
└── requirements.txt # Python dependencies (pip install -r requirements.txt)
Training
Training runs are logged with Weights and Biases (WandB).
To change the default WandB directory, run:
echo 'export WANDB_DIR="<your_path>/mcity_data_engine/logs"' >> ~/.profile
source ~/.profile
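As an alternative to editing ~/.profile, the same WANDB_DIR environment variable can be set per process from Python before WandB is initialized. The sketch below is an assumption about how one might do this, not part of the Data Engine; the helper name configure_wandb_dir is hypothetical.

```python
import os
from pathlib import Path


def configure_wandb_dir(path):
    """Create the log directory and point WANDB_DIR at it for this process."""
    log_dir = Path(path).expanduser()
    log_dir.mkdir(parents=True, exist_ok=True)
    # Must be set before WandB is initialized for the setting to apply.
    os.environ["WANDB_DIR"] = str(log_dir)
    return log_dir
```

Calling configure_wandb_dir("<your_path>/mcity_data_engine/logs") at the start of a script affects only that run and leaves the shell configuration unchanged.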
Contribution
Contributions are very welcome! The Mcity Data Engine is a blueprint for data curation and model training and will not support every use case out of the box. Please find instructions on how to contribute here:
Special thanks to these amazing people for contributing to the Mcity Data Engine! 🙌
Citation
If you use the Mcity Data Engine in your research, feel free to cite the project:
@article{bogdoll2025mcitydataengine,
title={Mcity Data Engine},
author={Bogdoll, Daniel and Anata, Rajanikant Patnaik and Stevens, Gregory},
journal={GitHub. Note: https://github.com/mcity/mcity_data_engine},
year={2025}
}
Owner
- Name: Mcity
- Login: mcity
- Kind: organization
- Location: Ann Arbor, Michigan
- Website: https://mcity.umich.edu
- Twitter: UMichMcity
- Repositories: 9
- Profile: https://github.com/mcity
Repository for software used to support the Mcity autonomous/connected vehicle test facility and member system access.
Citation (CITATION)
cff-version: 1.2.0
message: "If you use the Mcity Data Engine, please cite it as below."
authors:
  - family-names: "Bogdoll"
    given-names: "Daniel"
    orcid: "https://orcid.org/0000-0003-0432-4937"
  - family-names: "Stevens"
    given-names: "Gregory"
title: "Mcity Data Engine"
date-released: 2025-01-16
url: "https://github.com/mcity/mcity_data_engine"
GitHub Events
Total
- Fork event: 1
- Create event: 118
- Release event: 2
- Issues event: 14
- Watch event: 12
- Delete event: 99
- Issue comment event: 98
- Public event: 3
- Push event: 339
- Pull request review comment event: 83
- Gollum event: 8
- Pull request review event: 60
- Pull request event: 231
Last Year
- Fork event: 1
- Create event: 118
- Release event: 2
- Issues event: 14
- Watch event: 12
- Delete event: 99
- Issue comment event: 98
- Public event: 3
- Push event: 339
- Pull request review comment event: 83
- Gollum event: 8
- Pull request review event: 60
- Pull request event: 231
Dependencies
- actions/checkout v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/upload-pages-artifact v3 composite
- peter-evans/create-pull-request v7 composite
- pytorch/pytorch 1.11.0-cuda11.3-cudnn8-devel build
- redhat/ubi8 8.8 build
- GitPython ==3.1.43
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- annotated-types ==0.7.0
- boto3 ==1.35.63
- botocore ==1.35.80
- certifi ==2024.8.30
- charset-normalizer ==3.4.0
- click ==8.1.7
- docker-pycreds ==0.4.0
- gitdb ==4.0.11
- idna ==3.10
- jmespath ==1.0.1
- mpmath ==1.3.0
- numpy ==2.1.3
- packaging ==24.2
- platformdirs ==4.3.6
- protobuf ==5.29.1
- psutil ==6.1.0
- pydantic ==2.10.3
- pydantic_core ==2.27.1
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- requests ==2.32.3
- s3transfer ==0.10.4
- sentry-sdk ==2.19.2
- setproctitle ==1.3.4
- setuptools ==75.6.0
- six ==1.17.0
- smmap ==5.0.1
- tensorboardX ==2.6.2.2
- tqdm ==4.67.0
- typing_extensions ==4.12.2
- urllib3 ==2.2.3
- wandb ==0.19.1
- Brotli ==1.1.0
- Deprecated ==1.2.14
- Django ==4.2.17
- Faker ==33.1.0
- FrEIA ==0.2
- GitPython ==3.1.43
- Hypercorn ==0.17.3
- Jinja2 ==3.1.4
- Mako ==1.3.5
- Markdown ==3.7
- MarkupSafe ==2.1.5
- PyJWT ==2.8.0
- PyYAML ==6.0.2
- Pygments ==2.18.0
- Rtree ==1.3.0
- SQLAlchemy ==2.0.35
- Send2Trash ==1.8.3
- Werkzeug ==3.0.4
- absl-py ==2.1.0
- accelerate ==1.3.0
- aiofiles ==24.1.0
- aiohappyeyeballs ==2.4.0
- aiohttp ==3.10.6
- aiosignal ==1.3.1
- alabaster ==1.0.0
- albucore ==0.0.23
- albumentations ==2.0.4
- alembic ==1.13.3
- annotated-types ==0.7.0
- anomalib ==1.2.0
- antlr4-python3-runtime ==4.9.3
- anyio ==4.6.0
- appdirs ==1.4.4
- argcomplete ==3.5.0
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- arrow ==1.3.0
- asgiref ==3.8.1
- asttokens ==2.4.1
- async-lru ==2.0.4
- attr ==0.3.1
- attrs ==24.2.0
- awscli ==1.35.0
- azure-containerregistry ==1.2.0
- azure-core ==1.31.0
- azure-identity ==1.18.0
- azure-storage-blob ==12.23.1
- babel ==2.16.0
- beautifulsoup4 ==4.12.3
- black ==24.10.0
- bleach ==5.0.1
- boto ==2.49.0
- boto3 ==1.35.25
- botocore ==1.35.34
- build ==1.2.2.post1
- cachetools ==5.5.0
- certifi ==2024.8.30
- cffi ==1.17.1
- chardet ==5.2.0
- charset-normalizer ==3.3.2
- click ==8.1.7
- colorama ==0.4.6
- colorlog ==6.8.2
- comm ==0.2.2
- contourpy ==1.3.0
- cryptography ==44.0.1
- cycler ==0.12.1
- dacite ==1.7.0
- datamodel-code-generator ==0.26.1
- datasets ==3.2.0
- debugpy ==1.8.6
- decorator ==5.1.1
- defusedxml ==0.7.1
- descartes ==1.1.0
- dill ==0.3.8
- distro ==1.9.0
- django-annoying ==0.10.6
- django-cors-headers ==3.6.0
- django-csp ==3.7
- django-debug-toolbar ==3.2.1
- django-environ ==0.10.0
- django-extensions ==3.2.3
- django-filter ==2.4.0
- django-migration-linter ==5.1.0
- django-model-utils ==4.1.1
- django-ranged-fileresponse ==0.1.2
- django-rq ==2.5.1
- django-storages ==1.12.3
- django-user-agents ==0.4.0
- djangorestframework ==3.15.2
- dnspython ==2.6.1
- docker-pycreds ==0.4.0
- docstring_parser ==0.16
- docutils ==0.16
- drf-dynamic-fields ==0.3.0
- drf-flex-fields ==0.9.5
- drf-generators ==0.3.0
- durationpy ==0.9
- einops ==0.8.0
- email_validator ==2.2.0
- evaluate ==0.4.3
- executing ==2.1.0
- expiringdict ==1.2.2
- fastjsonschema ==2.20.0
- fiftyone ==1.3.0
- fiftyone-brain ==0.19.0
- fiftyone-desktop ==0.34.1
- fiftyone_db ==1.1.6
- filelock ==3.16.1
- fire ==0.6.0
- flake8 ==7.1.1
- fonttools ==4.54.1
- fqdn ==1.5.1
- frozenlist ==1.4.1
- fsspec ==2024.6.1
- ftfy ==6.2.3
- furl ==2.1.3
- future ==1.0.0
- genson ==1.3.0
- gitdb ==4.0.11
- glob2 ==0.7
- google-api-core ==2.20.0
- google-auth ==2.35.0
- google-cloud-aiplatform ==1.69.0
- google-cloud-appengine-logging ==1.5.0
- google-cloud-artifact-registry ==1.11.5
- google-cloud-audit-log ==0.3.0
- google-cloud-bigquery ==3.26.0
- google-cloud-compute ==1.19.2
- google-cloud-core ==2.4.1
- google-cloud-logging ==3.11.3
- google-cloud-resource-manager ==1.12.5
- google-cloud-storage ==2.18.2
- google-crc32c ==1.6.0
- google-resumable-media ==2.7.2
- googleapis-common-protos ==1.65.0
- graphql-core ==3.2.4
- greenlet ==3.1.1
- grpc-google-iam-v1 ==0.13.1
- grpcio ==1.66.2
- grpcio-status ==1.62.3
- gviz-api ==1.10.0
- h11 ==0.14.0
- h2 ==4.1.0
- hpack ==4.0.0
- httpcore ==1.0.5
- httpx ==0.27.2
- huggingface-hub ==0.28.1
- humanize ==4.10.0
- humansignal-drf-yasg ==1.21.10.post1
- hydra-core ==1.3.2
- hyperframe ==6.0.1
- idna ==3.10
- ijson ==3.3.0
- imageio ==2.35.1
- imagesize ==1.4.1
- importlib_metadata ==8.5.0
- importlib_resources ==6.4.5
- inflate64 ==1.0.0
- inflect ==5.6.2
- inflection ==0.5.1
- iniconfig ==2.0.0
- iopath ==0.1.10
- ipykernel ==6.29.5
- ipython ==8.27.0
- ipywidgets ==8.1.5
- iso8601 ==2.1.0
- isodate ==0.6.1
- isoduration ==20.11.0
- isort ==5.13.2
- jedi ==0.19.1
- jiter ==0.8.2
- jmespath ==1.0.1
- joblib ==1.4.2
- jsf ==0.11.2
- json5 ==0.10.0
- jsonargparse ==4.33.1
- jsonlines ==4.0.0
- jsonpointer ==3.0.0
- jsonschema ==4.23.0
- jsonschema-specifications ==2023.12.1
- jupyter ==1.1.1
- jupyter-console ==6.6.3
- jupyter-events ==0.10.0
- jupyter-lsp ==2.2.5
- jupyter_client ==8.6.3
- jupyter_core ==5.7.2
- jupyter_server ==2.14.2
- jupyter_server_terminals ==0.5.3
- jupyterlab ==4.3.3
- jupyterlab_pygments ==0.3.0
- jupyterlab_server ==2.27.3
- jupyterlab_widgets ==3.0.13
- kaleido ==0.2.1
- kiwisolver ==1.4.7
- kornia ==0.7.4
- kornia_rs ==0.1.5
- kubernetes ==31.0.0
- kubernetes_asyncio ==31.1.0
- launchdarkly-server-sdk ==8.2.1
- lazy_loader ==0.4
- lightning ==2.4.0
- lightning-utilities ==0.11.9
- llvmlite ==0.44.0
- lockfile ==0.12.2
- lxml ==5.3.0
- lxml_html_clean ==0.4.1
- markdown-it-py ==3.0.0
- matplotlib ==3.9.2
- matplotlib-inline ==0.1.7
- mccabe ==0.7.0
- mdurl ==0.1.2
- mistune ==3.0.2
- mongoengine ==0.29.1
- motor ==3.6.0
- mpmath ==1.3.0
- msal ==1.31.0
- msal-extensions ==1.2.0
- multidict ==6.1.0
- multiprocess ==0.70.16
- multivolumefile ==0.2.3
- mypy-extensions ==1.0.0
- nbclient ==0.10.0
- nbconvert ==7.16.4
- nbformat ==5.10.4
- nest-asyncio ==1.6.0
- networkx ==3.3
- nltk ==3.9.1
- notebook ==7.3.1
- notebook_shim ==0.2.4
- numba ==0.61.0
- numpy ==2.1.1
- nvidia-cublas-cu12 ==12.4.5.8
- nvidia-cuda-cupti-cu12 ==12.4.127
- nvidia-cuda-nvrtc-cu12 ==12.4.127
- nvidia-cuda-runtime-cu12 ==12.4.127
- nvidia-cudnn-cu12 ==9.1.0.70
- nvidia-cufft-cu12 ==11.2.1.3
- nvidia-curand-cu12 ==10.3.5.147
- nvidia-cusolver-cu12 ==11.6.1.9
- nvidia-cusparse-cu12 ==12.3.1.170
- nvidia-cusparselt-cu12 ==0.6.2
- nvidia-nccl-cu12 ==2.21.5
- nvidia-nvjitlink-cu12 ==12.4.127
- nvidia-nvtx-cu12 ==12.4.127
- oauthlib ==3.2.2
- omegaconf ==2.3.0
- onnx ==1.17.0
- open_clip_torch ==2.30.0
- openai ==1.62.0
- opencv-python ==4.10.0.84
- opencv-python-headless ==4.10.0.84
- opentelemetry-api ==1.28.2
- openvino ==2024.5.0
- openvino-telemetry ==2024.1.0
- optuna ==4.0.0
- ordered-set ==4.0.2
- orderedmultidict ==1.0.1
- overrides ==7.7.0
- packaging ==24.1
- pandas ==2.2.3
- pandocfilters ==1.5.1
- parso ==0.8.4
- pathlib2 ==2.3.7.post1
- pathspec ==0.12.1
- pdoc ==15.0.1
- pexpect ==4.9.0
- pillow ==10.4.0
- platformdirs ==4.3.6
- plotly ==5.24.1
- pluggy ==1.5.0
- portalocker ==2.10.1
- pprintpp ==0.4.0
- priority ==2.0.0
- prometheus_client ==0.21.1
- prompt_toolkit ==3.0.48
- proto-plus ==1.24.0
- protobuf ==5.26.1
- psutil ==6.1.1
- psycopg2-binary ==2.9.10
- ptyprocess ==0.7.0
- pure_eval ==0.2.3
- py-cpuinfo ==9.0.0
- py7zr ==0.22.0
- pyRFC3339 ==2.0.1
- pyarrow ==17.0.0
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.1
- pybcj ==1.0.2
- pyboxen ==1.3.0
- pycocotools ==2.0.8
- pycodestyle ==2.12.1
- pycparser ==2.22
- pycryptodomex ==3.20.0
- pydantic ==2.9.2
- pydantic_core ==2.23.4
- pydash ==8.0.3
- pyflakes ==3.2.0
- pymongo ==4.9.2
- pynndescent ==0.5.13
- pyparsing ==3.1.4
- pyppmd ==1.1.0
- pyproject_hooks ==1.2.0
- pyquaternion ==0.9.9
- pytest ==8.3.4
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-json-logger ==2.0.4
- pytorch-lightning ==2.4.0
- pytz ==2022.7.1
- pyzmq ==26.2.0
- pyzstd ==0.16.1
- rarfile ==4.2
- redis ==3.5.3
- referencing ==0.35.1
- regex ==2024.9.11
- requests ==2.32.3
- requests-mock ==1.12.1
- requests-oauthlib ==2.0.0
- requirements-parser ==0.11.0
- retrying ==1.3.4
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.9.1
- rich-argparse ==1.5.2
- rpds-py ==0.20.0
- rq ==1.10.1
- rsa ==4.7.2
- rstr ==3.2.2
- rules ==3.4
- s3transfer ==0.10.2
- safetensors ==0.5.2
- sam2 ==1.1.0
- scikit-image ==0.25.1
- scikit-learn ==1.6.1
- scipy ==1.15.1
- seaborn ==0.13.2
- semver ==3.0.2
- sentry-sdk ==2.19.2
- setproctitle ==1.3.3
- setuptools ==75.8.0
- shapely ==2.0.6
- simsimd ==6.2.1
- six ==1.16.0
- smart-open ==7.0.5
- smmap ==5.0.1
- sniffio ==1.3.1
- snowballstemmer ==2.2.0
- sortedcontainers ==2.4.0
- soupsieve ==2.6
- sphinxcontrib-applehelp ==2.0.0
- sphinxcontrib-devhelp ==2.0.0
- sphinxcontrib-htmlhelp ==2.1.0
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==2.0.0
- sphinxcontrib-serializinghtml ==2.0.0
- sqlparse ==0.5.3
- sse-starlette ==0.10.3
- sseclient-py ==1.8.0
- stack-data ==0.6.3
- starlette ==0.39.0
- strawberry-graphql ==0.138.1
- stringzilla ==3.11.3
- sympy ==1.13.1
- tabulate ==0.9.0
- tenacity ==9.0.0
- tensorboard ==2.18.0
- tensorboard-data-server ==0.7.2
- termcolor ==2.4.0
- terminado ==0.18.1
- texttable ==1.7.0
- threadpoolctl ==3.5.0
- tifffile ==2024.9.20
- timm ==1.0.14
- tinycss2 ==1.3.0
- tokenizers ==0.21.0
- toml ==0.10.2
- tomli ==2.0.2
- torch ==2.5.1
- torch-tb-profiler ==0.4.3
- torchaudio ==2.5.1
- torchmetrics ==1.6.1
- torchvision ==0.20.1
- tornado ==6.4.1
- tqdm ==4.67.1
- traitlets ==5.14.3
- transformers ==4.49.0
- triton ==3.1.0
- types-python-dateutil ==2.9.0.20241206
- types-setuptools ==75.6.0.20241126
- typeshed_client ==2.7.0
- typing_extensions ==4.12.2
- tzdata ==2024.2
- tzlocal ==5.2
- ua-parser ==1.0.0
- ua-parser-builtins ==0.18.0.post1
- ujson ==5.10.0
- ultralytics ==8.3.75
- ultralytics-thop ==2.0.12
- umap-learn ==0.5.7
- universal-analytics-python3 ==1.1.1
- uri-template ==1.3.0
- uritemplate ==4.1.1
- urllib3 ==1.26.20
- user-agents ==2.2.0
- voxel51-eta ==0.14.0
- wandb ==0.19.6
- wcwidth ==0.2.13
- webcolors ==24.11.1
- webencodings ==0.5.1
- websocket-client ==1.8.0
- wheel ==0.45.1
- widgetsnbextension ==4.0.13
- wrapt ==1.16.0
- wsproto ==1.2.0
- xmljson ==0.2.1
- xmltodict ==0.13.0
- xxhash ==3.5.0
- yarl ==1.12.1
- zipp ==3.21.0
- Flask ==2.3.1
- FrEIA *
- PyYAML ==6.0.2
- SQLAlchemy ==2.0.35
- absl-py ==2.1.0
- accelerate ==1.1.1
- aiohttp ==3.10.6
- alembic ==1.13.3
- anomalib ==1.1.1
- boto3 ==1.35.25
- colorlog *
- datasets *
- fiftyone *
- fiftyone-brain *
- fiftyone-desktop *
- fiftyone_db *
- huggingface-hub ==0.27.0
- kornia *
- lightning ==2.3.0
- nuscenes-devkit *
- open-clip-torch *
- opencv-python ==4.10.0.84
- openvino *
- psycopg2-binary ==2.9.10
- requests ==2.32.3
- scikit-learn ==1.6.0
- tensorflow ==2.12.0
- torch ==2.5.1
- tqdm ==4.67.1
- transformers ==4.48.1
- umap-learn *