augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/

https://github.com/chaoss/augur

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    19 of 152 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.9%) to scientific vocabulary

Keywords

chaoss data-collection data-modeling data-visualization defined-metrics facade git github hacktoberfest hacktoberfest2020 health linux linux-foundation metrics open-source opensource python-library research sustainability unix

Keywords from Contributors

transformation cryptocurrencies embedded genomics sequencing cameratrap charts energy-system-model energy-system jax
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 652
  • Watchers: 21
  • Forks: 899
  • Open Issues: 164
  • Releases: 181
Created almost 9 years ago · Last pushed 4 months ago
Metadata Files
Readme Contributing License Citation Codeowners Security

README.md

Augur NEW Release v0.90.0

Augur is primarily a data engineering tool that makes it possible for data scientists to gather open source software community data - less data carpentry for everyone else! The primary way of looking at Augur data is through 8Knot. A public instance of 8Knot is available here, and it is tied to a public instance of Augur.

We follow the First Timers Only philosophy of tagging issues for first timers only, and walking one newcomer through the resolution process weekly. You can find these issues tagged with first timers only on our issues list.

NEW RELEASE ALERT!

If you want to jump right in, the updated docker, docker-compose and bare metal installation instructions are available here.

Augur is now releasing a dramatically improved new version. It is also available here.

  • The release branch is a stable version of our new architecture, which features:
    • Dramatic improvement in the speed of large scale data collection (100,000+ repos). All data is obtained for 100k+ repos within 2 weeks.
    • A new job management architecture that uses Celery and Redis to manage queues, and enables users to run a Flower job monitoring dashboard.
    • Materialized views to increase the snappiness of APIs and frontends on large-scale data.
    • Changes to primary keys, which now employ a UUID strategy that ensures unique keys across all Augur instances.
    • Support for 8Knot dashboards (view a sample here). Beautification coming soon!
    • Data collection completeness assurance enabled by a structured, relational data set that is easily compared with platform API Endpoints.
  • The next release of the new version will include a hosted version of Augur where anyone can create an account and add repos they care about. If the hosted instance already has a requested organization or repository, it will be added to the user’s view. If it’s a new repository or organization, the user will be notified that collection will take (time required for the scale of repositories added).
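
To make the primary-key point in the list above concrete, here is a minimal sketch of why UUID keys avoid collisions when rows from separate instances are combined. It is only an illustration: the `merge_by_key` helper and the sample rows are hypothetical, and random `uuid.uuid4()` keys are an assumption, since the README does not describe Augur's actual UUID scheme.

```python
import uuid


def merge_by_key(*instances):
    """Merge rows from several instances into one dict and report key collisions."""
    merged, collisions = {}, []
    for rows in instances:
        for key, row in rows.items():
            if key in merged:
                collisions.append(key)
            merged[key] = row
    return merged, collisions


# Two hypothetical instances that both use auto-increment integer keys.
int_a = {1: "chaoss/augur", 2: "chaoss/grimoirelab"}
int_b = {1: "torvalds/linux", 2: "python/cpython"}
int_merged, int_collisions = merge_by_key(int_a, int_b)

# The same rows keyed by random UUIDs (an assumed strategy for illustration).
uuid_a = {uuid.uuid4(): name for name in int_a.values()}
uuid_b = {uuid.uuid4(): name for name in int_b.values()}
uuid_merged, uuid_collisions = merge_by_key(uuid_a, uuid_b)

print("integer key collisions:", int_collisions)
print("rows kept with UUID keys:", len(uuid_merged))
```

With integer keys the second instance overwrites rows from the first, while the UUID-keyed rows all survive the merge.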

What is Augur?

Augur is a software suite for collecting and measuring structured data about free and open-source software (FOSS) communities.

We gather trace data for a group of repositories, normalize it into our data model, and provide a variety of metrics about said data. The structure of our data model enables us to synthesize data across various platforms to provide meaningful context for meaningful questions about the way these communities evolve.

Augur’s main focus is to measure the overall health and sustainability of open source projects, as these types of projects are system critical for nearly every software organization or company. We do this by gathering data about project repositories and normalizing that into our data model to provide useful metrics about your project’s health.

For example, one of our metrics is burstiness: how often a project shows short timeframes of intense activity followed by a corresponding return to its typical pattern of activity. This can paint a picture of a project’s focus and give insight into its potential stability and its typical cycle of updates.
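
As a rough illustration of the idea, the sketch below flags periods whose activity rises well above a project's typical level. It is not Augur's implementation: the `find_bursts` function, the mean-plus-standard-deviation cutoff, the `threshold` parameter, and the sample data are all assumptions made for this example.

```python
from statistics import mean, pstdev


def find_bursts(counts, threshold=1.5):
    """Return indices of periods whose activity exceeds mean + threshold * std dev."""
    if not counts:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Perfectly flat activity has no bursts.
        return []
    cutoff = mu + threshold * sigma
    return [i for i, count in enumerate(counts) if count > cutoff]


# Hypothetical weekly commit counts with one intense week.
weekly_commits = [4, 5, 3, 6, 40, 5, 4, 3, 5, 4]
print(find_bursts(weekly_commits))
```

In the sample series only the week with 40 commits is flagged, and activity returns to its usual range afterwards.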

We are a CHAOSS project, and many of our metrics are implementations of the metrics defined by our awesome community. You can find a full list of them here.

For more information on how to get involved, see the CHAOSS website.

Collecting Data

Augur supports Python 3.7 through Python 3.11 on all platforms. Python 3.12 and above do not yet work because of machine learning worker dependencies. On macOS, you can create a Python 3.11 environment by running: $ python3.11 -m venv path/to/venv
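
If you want to check an interpreter before installing, a small script along these lines can help. This is a sketch only: the `is_supported` helper is hypothetical and not part of Augur, and the version bounds are taken from the paragraph above.

```python
import sys

# Supported range taken from the paragraph above: 3.7 through 3.11 inclusive.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 11)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range."""
    if version is None:
        version = sys.version_info
    major, minor = version[0], version[1]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


status = "supported" if is_supported() else "not supported"
print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```

Run it with the interpreter you plan to use for the virtual environment, for example `python3.11`.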

Augur collects more data about open source software projects than any other available software. Augur's main focus is to measure the overall health and sustainability of open source projects.

One of Augur's core tenets is a desire to openly gather data that people can trust, and then provide useful and well-defined metrics that help give important context to the larger stories being told by that data.

We do this in a variety of ways, one of which is doing all our own data collection in house. We currently collect data from a few main sources:

  1. Raw Git commit logs (commits, contributors)
  2. GitHub's API (issues, pull requests, contributors, releases, repository metadata)
  3. The Linux Foundation's Core Infrastructure Initiative API (repository metadata)
  4. Succinct Code Counter, a blazingly fast Sloc, Cloc, and Code tool that also performs COCOMO calculations
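
To give a feel for the first source in the list above, the following sketch counts commits per contributor from raw `git log` output. It is not Augur's collection code: the `commits_per_author` function and the sample log are hypothetical, and the input is assumed to come from `git log --pretty=format:'%an|%ae'` (author name and email separated by a pipe).

```python
from collections import Counter


def commits_per_author(log_text):
    """Count commits per (name, email) from `git log --pretty=format:'%an|%ae'` output."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last pipe so a pipe inside a name does not break parsing.
        name, _, email = line.rpartition("|")
        counts[(name, email)] += 1
    return counts


# Hypothetical log output; in practice this would come from running git.
sample_log = """\
Ada Example|ada@example.org
Bob Example|bob@example.org
Ada Example|ada@example.org
"""

for (name, email), n in commits_per_author(sample_log).most_common():
    print(name, email, n)
```

Parsing a captured string keeps the example runnable without a repository; a real collector would also need to reconcile contributors who commit under several names or emails.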

This data is collected by dedicated data collection workers controlled by Augur, each of which is responsible for querying some subset of these data sources. We are also hard at work building workers for new data sources. If you have an idea for a new one, please tell us - we'd love your input!

Getting Started

If you're interested in collecting data with our tool, the Augur team has worked hard to develop a detailed guide to get started with our project which can be found in our documentation.

If you're looking to contribute to Augur's code, you can find installation instructions, development guides, architecture references (coming soon), best practices and more in our developer documentation.

Please know that while it's still rather sparse right now, we are actively adding to it all the time.

If you get stuck, please feel free to ask for help!

Contributing

To contribute to Augur, please follow the guidelines found in our CONTRIBUTING.md and our Code of Conduct. Augur is a welcoming community that is open to all, regardless of whether you're working on your 1000th contribution to open source or your 1st. We strongly believe that much of what makes open source so great is the incredible communities it brings together, so we invite you to join us!

License, Copyright, and Funding

Copyright © 2025 University of Nebraska at Omaha, University of Missouri, Brian Warner, and the CHAOSS Project.

Augur is free software: you can redistribute it and/or modify it under the terms of the MIT License as published by the Open Source Initiative. See the LICENSE file for more details.

This work has been funded through the Alfred P. Sloan Foundation, Mozilla, The Reynolds Journalism Institute, contributions from VMWare, Red Hat Software, Grace Hopper's Open Source Day, GitHub, Microsoft, Twitter, Adobe, the Gluster Project, Open Source Summit (NA/Europe), and the Linux Foundation Compliance Summit.

Significant design contributors include Kate Stewart, Dawn Foster, Duane O'Brien, Remy Decausemaker, others omitted due to the memory limitations of project maintainers, and 15 Google Summer of Code Students.

Current maintainers

  • Derek Howard <https://github.com/howderek>
  • Andrew Brain <https://github.com/ABrain7710>
  • Isaac Milarsky <https://github.com/IsaacMilarky>
  • John McGinnis <https://github.com/Ulincys>
  • Sean P. Goggins <https://github.com/sgoggins>

Former maintainers

  • Carter Landis <https://github.com/ccarterlandis>
  • Gabe Heim <https://github.com/gabe-heim>
  • Matt Snell <https://github.com/Nebrethar>
  • Christian Cmehil-Warn <https://github.com/christiancme>
  • Jonah Zukosky <https://github.com/jonahz5222>
  • Carolyn Perniciaro <https://github.com/CMPerniciaro>
  • Elita Nelson <https://github.com/ElitaNelson>
  • Michael Woodruff <https://github.com/michaelwoodruffdev/>
  • Max Balk <https://github.com/maxbalk/>

Contributors

  • Dawn Foster <https://github.com/geekygirldawn/>
  • Ivana Atanasova <https://github.com/ivanayov/>
  • Georg J.P. Link <https://github.com/GeorgLink/>
  • Gary P White <https://github.com/garypwhite/>

GSoC 2025 Participants

GSoC 2022 participants

  • Kaxada <https://github.com/kaxada>
  • Mabel F <https://github.com/mabelbot>
  • Priya Srivastava <https://github.com/Priya730>
  • Ramya Kappagantu <https://github.com/RamyaKappagantu>
  • Yash Prakash <https://gist.github.com/yash-yp>

GSoC 2021 participants

  • Dhruv Sachdev <https://github.com/Dhruv-Sachdev1313>
  • Rashmi K A <https://github.com/Rashmi-K-A>
  • Yash Prakash <https://github.com/yash2002109/>
  • Anuj Lamoria <https://github.com/anujlamoria/>
  • Yeming Gu <https://github.com/gymgym1212/>
  • Ritik Malik <https://gist.github.com/ritik-malik>

GSoC 2020 participants

  • Akshara P <https://github.com/aksh555/>
  • Tianyi Zhou <https://github.com/tianyichow/>
  • Pratik Mishra <https://github.com/pratikmishra356/>
  • Sarit Adhikari <https://github.com/sarit-adh/>
  • Saicharan Reddy <https://github.com/mrsaicharan1/>
  • Abhinav Bajpai <https://github.com/abhinavbajpai2012/>

GSoC 2019 participants

  • Bingwen Ma <https://github.com/bing0n3/>
  • Parth Sharma <https://github.com/parthsharma2/>

GSoC 2018 participants

  • Keanu Nichols <https://github.com/kmn5409/>

Owner

  • Name: CHAOSS
  • Login: chaoss
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff reference content was generated from Zotero.
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
    - family-names: Goggins
      given-names: Sean
    - family-names: Lumbard
      given-names: Kevin
    - family-names: Germonprez
      given-names: Matt
title: "Open Source Community Health: Analytical Metrics and Their Corresponding Narratives"
doi: 10.1109/SoHeal52568.2021.00010
date-released: 2021
url: https://www.seangoggins.net/wp-content/plugins/zotpress/lib/request/request.dl.php?api_user_id=655145&dlkey=HNG22ZSU&content_type=application/pdf

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 9,313
  • Total Committers: 152
  • Avg Commits per committer: 61.27
  • Development Distribution Score (DDS): 0.662
Past Year
  • Commits: 488
  • Committers: 11
  • Avg Commits per committer: 44.364
  • Development Distribution Score (DDS): 0.551
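
The All Time figures above can be reproduced with a short calculation. This sketch assumes that DDS is defined as 1 minus the top committer's share of all commits, which is consistent with the numbers on this page (3,148 of 9,313 commits for the top committer); the function names are hypothetical.

```python
def average_commits(total_commits, total_committers):
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return 1 - top_committer_commits / total_commits


# All Time figures from this page: 9,313 commits, 152 committers,
# and 3,148 commits by the top committer.
print(round(average_commits(9313, 152), 2))                  # 61.27
print(round(development_distribution_score(3148, 9313), 3))  # 0.662
```

A score near 0 means one person wrote almost everything, while a score near 1 means the work is spread across many committers.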
Top Committers
Name Email Commits
Sean P. Goggins s@g****m 3,148
Isaac Milarsky i****y@g****m 1,471
Andrew Brain 6****0 1,328
Carter Landis c****s@g****m 725
gabe-heim g****m@y****m 709
Derek Howard d****k@h****m 329
Bingwen Ma m****y@g****m 125
Parth Sharma p****7@g****m 122
Ulincsys u****s@g****m 110
ChristianCme c****n@g****m 90
Matt Snell m****l@u****u 82
Dhruv-Sachdev1313 d****v@g****m 77
michaelwoodruffdev m****v@g****m 71
abuhman a****n 65
dependabot[bot] 4****] 44
Preshh0 u****1@g****m 43
isaacmilarky i****y@g****m 39
Michael Scherer m****c@r****m 36
mrsaicharan1 s****1@g****m 32
Jonah Zukosky j****y@g****m 32
flyagaricdev 8****v 27
Matt Germonprez g****z@g****m 25
John Strunk j****k@r****m 25
Priya Srivastava s****0@g****m 23
Nodira n****a@g****m 23
Stuart Aldrich s****1@g****m 21
Robert Lincoln Truesdale III 3****e 21
Akshara P a****2@n****n 20
sarit-adh s****i@g****m 18
Isaac Wengler i****3@g****m 16
and 122 more...

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 308
  • Total pull requests: 706
  • Average time to close issues: 5 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 53
  • Total pull request authors: 47
  • Average comments per issue: 1.43
  • Average comments per pull request: 0.45
  • Merged pull requests: 499
  • Bot issues: 0
  • Bot pull requests: 54
Past Year
  • Issues: 111
  • Pull requests: 329
  • Average time to close issues: 29 days
  • Average time to close pull requests: 12 days
  • Issue authors: 19
  • Pull request authors: 30
  • Average comments per issue: 1.41
  • Average comments per pull request: 0.65
  • Merged pull requests: 179
  • Bot issues: 0
  • Bot pull requests: 12
Top Authors
Issue Authors
  • sgoggins (142)
  • cdolfi (59)
  • ABrain7710 (15)
  • mhauru (9)
  • DinneK (9)
  • MoralCode (6)
  • officialasishkumar (5)
  • GregSutcliffe (4)
  • JohnStrunk (4)
  • KhanRayyan3622 (4)
  • JamesKunstle (3)
  • jarrah42 (2)
  • Akshatb2006 (2)
  • jberkus (2)
  • Ulincsys (2)
Pull Request Authors
  • sgoggins (244)
  • ABrain7710 (181)
  • IsaacMilarky (99)
  • Ulincsys (72)
  • dependabot[bot] (67)
  • JohnStrunk (30)
  • MoralCode (26)
  • cdolfi (15)
  • officialasishkumar (12)
  • KhanRayyan3622 (12)
  • Xiaoha-cloud (12)
  • GaryPWhite (10)
  • Seltyk (7)
  • adityajha2005 (6)
  • Mbaoma (5)
Top Labels
Issue Labels
API (69) first-timers-only (65) bug (21) add-feature (16) database (9) feature-request (9) good first issue (9) server (7) documentation (7) workers (6) admin (4) devops (4) frontend (4) python (3) deployed version (3) GSoC (3) metric (3) triage (2) usability (2) question (2) docker (2) release (1) dependencies (1) GSoC 2022 (1) discussion (1) CLI (1) installation (1) NDE (1) CHAOSS (1) critical-fix (1)
Pull Request Labels
dependencies (76) python (75) bug-fix (40) database (18) documentation (17) release (16) add-feature (15) admin (13) devops (12) deployed version (10) server (9) docker (8) CHAOSS (7) API (6) frontend (5) metric (5) bug (5) CLI (5) security (4) installation (3) feature-request (3) discussion (2) critical-fix (1) workers (1) usability (1)

Dependencies

.github/workflows/build_docker.yml actions
  • actions/checkout v2 composite
docker/backend/Dockerfile docker
  • python 3.8.11-slim-buster build
docker/database/Dockerfile docker
  • postgres 12 build
docker-compose-externalDB.yml docker
  • augurlabs/augur-new latest
  • redis alpine
docker-compose.yml docker
  • augur-new latest
  • postgres 14
  • redis alpine
augur/tasks/data_analysis/clustering_worker/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • matplotlib ==3.5.1
  • nltk ==3.6.6
  • numpy ==1.22.0
  • pandas ==1.3.5
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
  • scikit-learn ==1.1.3
  • seaborn ==0.11.1
  • sklearn ==0.0.0
augur/tasks/data_analysis/contributor_breadth_worker/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
augur/tasks/data_analysis/discourse_analysis/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • click ==8.0.3
  • nltk ==3.6.6
  • pandas ==1.3.5
  • psycopg2-binary ==2.9.3
  • python-crfsuite ==0.9.8
  • requests ==2.28.0
  • scikit-learn ==1.1.3
  • scipy ==1.7.3
  • sklearn-crfsuite ==0.3.6
  • tabulate ==0.8.9
  • textblob ==0.15.3
augur/tasks/data_analysis/insight_worker/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • click ==8.0.3
  • numpy ==1.22.0
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
  • scipy >=1.7.3
  • sklearn ==0.0
augur/tasks/data_analysis/message_insights/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • Keras <2.9.0rc0
  • Keras-Preprocessing ==1.1.2
  • bs4 ==0.0.1
  • click ==8.0.3
  • emoji ==1.2.0
  • gensim ==4.2.0
  • h5py *
  • joblib ==1.0.1
  • nltk ==3.6.6
  • numpy ==1.22.0
  • pandas ==1.3.5
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
  • scikit-image ==0.19.1
  • scikit-learn ==1.1.3
  • scipy ==1.7.3
  • tensorflow ==2.8.0
  • xgboost *
  • xlrd ==2.0.1
augur/tasks/data_analysis/pull_request_analysis_worker/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • emoji ==1.2.0
  • joblib ==1.0.1
  • nltk ==3.6.6
  • numpy ==1.22.0
  • pandas ==1.3.5
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
  • scipy ==1.7.3
  • sklearn ==0.0
  • xgboost ==1.4.2
augur/tasks/git/util/facade_worker/setup.py pypi
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • XlsxWriter ==1.3.7
  • click ==8.0.3
  • psycopg2-binary ==2.9.3
  • requests ==2.28.0
setup.py pypi
  • Beaker ==1.11.0
  • Flask ==2.0.2
  • Flask-Cors ==3.0.10
  • Flask-Login ==0.5.0
  • Flask-WTF ==1.0.0
  • Jinja2 *
  • SQLAlchemy ==1.3.23
  • Werkzeug *
  • XlsxWriter ==1.3.7
  • alembic ==1.8.1
  • blinker ==1.4
  • bokeh ==2.0.2
  • boto3 ==1.17.57
  • celery ==5.2.7
  • click ==8.0.3
  • cloudpickle *
  • coloredlogs ==15.0
  • dask >=2021.6.2
  • distributed *
  • dnspython ==2.2.1
  • eventlet ==0.33.3
  • flower ==1.2.0
  • fsspec *
  • gunicorn ==20.1.0
  • h5py *
  • httpx ==0.23.0
  • itsdangerous ==2.0.1
  • mdpdf ==0.0.18
  • mistune ==0.8.4
  • nltk ==3.6.6
  • numpy ==1.22
  • pandas ==1.3.5
  • partd *
  • protobuf <3.22
  • psutil ==5.8.0
  • psycopg2-binary ==2.9.3
  • pyYaml *
  • pylint ==2.15.5
  • redis ==4.3.3
  • requests ==2.28.0
  • scipy ==1.7.3
  • selenium ==3.141.0
  • sendgrid *
  • six ==1.15.0
  • slack ==0.0.2
  • toml *
  • toolz *
  • tornado ==6.1
  • wheel *