https://github.com/ccao-data/data-architecture
Codebase for CCAO data infrastructure construction and management
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.2%) to scientific vocabulary
Keywords
aws
aws-athena
aws-s3
data-architecture
data-engineering
Last synced: 5 months ago
Repository
Basic Info
- Host: GitHub
- Owner: ccao-data
- Language: R
- Default Branch: master
- Homepage: https://ccao-data.github.io/data-architecture/
- Size: 31.2 MB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 4
- Open Issues: 80
- Releases: 0
Created over 2 years ago · Last pushed 5 months ago
Metadata Files
Readme
Codeowners
README.md
CCAO Data Infrastructure
This repository stores the code for the CCAO Data Department's ETL pipelines and data lakehouse. This infrastructure supports the Data Team's modeling, reporting, and data integrity work.
Quick Links
- :file_folder: dbt Data Catalog - Documentation for all CCAO data lakehouse tables and views
- :nut_and_bolt: dbt README - How to develop CCAO data infrastructure using dbt
- :test_tube: dbt Tests and QC Reports - How to add and run data tests, unit tests, and QC reports using dbt
- :pencil: dbt Generic Test Documentation - Definitions for CCAO generic dbt tests, which are functions that we use to define our QC tests
Repository Structure
- ./dbt contains the models and tests that build our Athena data lakehouse; dbt mainly acts as a transformation and documentation layer on top of our raw data
- ./docs contains design documents and other supplemental documentation
- ./etl contains ETL scripts that load raw and lightly cleaned data into the lakehouse as dbt sources
- ./socrata contains column transformations for the CCAO's Open Data Portal assets
Owner
- Name: Cook County Assessor's Office
- Login: ccao-data
- Kind: organization
- Email: assessor.data@cookcountyil.gov
- Website: https://www.cookcountyassessor.com
- Twitter: AssessorCook
- Repositories: 1
- Profile: https://github.com/ccao-data
Issues and Pull Requests
Last synced: 5 months ago
All Time
- Total issues: 262
- Total pull requests: 446
- Average time to close issues: about 2 months
- Average time to close pull requests: 8 days
- Total issue authors: 7
- Total pull request authors: 5
- Average comments per issue: 0.52
- Average comments per pull request: 0.65
- Merged pull requests: 327
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 77
- Pull requests: 220
- Average time to close issues: 15 days
- Average time to close pull requests: 5 days
- Issue authors: 7
- Pull request authors: 5
- Average comments per issue: 0.32
- Average comments per pull request: 0.43
- Merged pull requests: 157
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- wrridgeway (96)
- jeancochrane (91)
- dfsnow (51)
- ccao-jardine (9)
- Damonamajor (7)
- wagnerlmichael (7)
- kyrasturgill (1)
Pull Request Authors
- jeancochrane (173)
- wrridgeway (155)
- Damonamajor (44)
- dfsnow (42)
- wagnerlmichael (32)
Top Labels
Issue Labels
dbt (56)
new data/feature (19)
bug (7)
aws (7)
documentation (5)
open data (5)
ci (2)
Pull Request Labels
dbt (11)
aws (11)
new data/feature (7)
open data (3)
documentation (2)
bug (2)
Dependencies
.github/workflows/pre-commit.yaml
actions
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pre-commit/action v3.0.0 composite
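For orientation, the three actions listed above are the ones a typical pre-commit workflow uses. The following is a minimal sketch of what such a `.github/workflows/pre-commit.yaml` could look like. It is not the repository's actual file. The workflow name, job name, triggers, and runner are assumptions. Only the file path, the three action versions, and the `master` default branch come from the listing.

```yaml
# Hypothetical sketch: name, triggers, job name, and runner are assumed,
# not taken from the repository.
name: pre-commit

on:
  pull_request:
  push:
    branches: [master]  # default branch per the listing

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the hooks can inspect its files
      - uses: actions/checkout@v3
      # Provide a Python interpreter for the pre-commit tool
      - uses: actions/setup-python@v3
      # Run the hooks defined in the repository's pre-commit configuration
      - uses: pre-commit/action@v3.0.0
```

With a layout like this, every pull request and every push to `master` would run the repository's configured hooks, and the job would fail if any hook reports a problem.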