https://github.com/catalyst-cooperative/pudl-scrapers
Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 10 committers (20.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 3
- Watchers: 5
- Forks: 3
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
PUDL Scrapers
Deprecated
This repo has been replaced by the new pudl-archiver repo, which combines the scraping and archiving processes.
Installation
We recommend using conda to create and manage your environment.
Run:
conda env create -f environment.yml
conda activate pudl-scrapers
Output location
Logs are collected in:
[your home]/Downloads/pudl_scrapers/scraped/
Data from the scrapers is stored in:
[your home]/Downloads/pudl_scrapers/scraped/[source_name]/[today #]
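The layout above can be composed programmatically when you need to locate scraped files. The following is a minimal sketch, not code from this repository: `scraped_root` and `scraped_dir` are hypothetical helper names, and because the exact format of the `[today #]` component is not spelled out here, the caller passes it in as an opaque string (`stamp`).

```python
from pathlib import Path
from typing import Optional


def scraped_root(home: Optional[Path] = None) -> Path:
    """Return [home]/Downloads/pudl_scrapers/scraped, where logs are collected."""
    return (home or Path.home()) / "Downloads" / "pudl_scrapers" / "scraped"


def scraped_dir(source_name: str, stamp: str, home: Optional[Path] = None) -> Path:
    """Return the data directory for one source and one run.

    `stamp` stands in for the [today #] path component; its format is an
    assumption left to the caller.
    """
    return scraped_root(home) / source_name / stamp
```

Passing `home` explicitly is only a convenience for testing; by default the current user's home directory is used.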
Running the scrapers
The general pattern is scrapy crawl [source_name] for one of the supported
sources. Typically an additional "year" argument is available, in the form
scrapy crawl [source_name] -a year=[year].
See below for exact commands and available arguments.
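If you drive the scrapers from a script, the general pattern can be assembled as an argument list. This is a rough sketch under stated assumptions: `crawl_command` is a hypothetical helper, not part of pudl-scrapers, and it only builds the command (it does not check that a given source actually accepts the arguments you pass).

```python
import shlex


def crawl_command(source_name, **spider_args):
    """Build `scrapy crawl [source_name] -a key=value ...` as an argv list."""
    cmd = ["scrapy", "crawl", source_name]
    for key, value in spider_args.items():
        # Each spider argument is passed with its own -a flag.
        cmd += ["-a", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # Prints: scrapy crawl eia860 -a year=2007
    print(shlex.join(crawl_command("eia860", year=2007)))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the repository directory, assuming the `pudl-scrapers` conda environment is active.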
2010 Census DP1 GeoDatabase
scrapy crawl censusdp1tract
No other options.
EPA CEMS
For full instructions:
epacems --help
EIA Bulk Electricity Data
eia_bulk_elec
No other options.
EPA CAMD to EIA Crosswalk
To collect the data and field descriptions:
scrapy crawl epacamd_eia
EIA860
To collect all the data:
scrapy crawl eia860
To collect a specific year (eg, 2007):
scrapy crawl eia860 -a year=2007
EIA860M
To collect all the data:
scrapy crawl eia860m
To collect a specific month & year (eg, August 2020):
scrapy crawl eia860m -a month=August -a year=2020
EIA861
To collect all the data:
scrapy crawl eia861
To collect a specific year (eg, 2007):
scrapy crawl eia861 -a year=2007
EIA923
To collect all the data:
scrapy crawl eia923
To collect a specific year (eg, 2007):
scrapy crawl eia923 -a year=2007
FERC Forms 1, 2, 6, & 60
To collect all the data:
scrapy crawl ferc1
scrapy crawl ferc2
scrapy crawl ferc6
scrapy crawl ferc60
There are no subsets enabled.
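To run the four FERC form scrapers in sequence from a script, something like the following could work. It is a hedged sketch: `crawl_ferc_forms` and its `dry_run` flag are hypothetical, and actually executing the commands assumes `scrapy` is on your PATH and that you are in the repository directory.

```python
import subprocess

# Spider names taken from the commands listed above.
FERC_SPIDERS = ["ferc1", "ferc2", "ferc6", "ferc60"]


def crawl_ferc_forms(dry_run=True):
    """Return the crawl commands for all FERC forms; run them unless dry_run."""
    commands = [["scrapy", "crawl", name] for name in FERC_SPIDERS]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first scraper that exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

With the default `dry_run=True` the function only returns the commands, which is a cheap way to inspect what would be executed.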
FERC 714
To collect the data:
scrapy crawl ferc714
There are no subsets.
Owner
- Name: Catalyst Cooperative
- Login: catalyst-cooperative
- Kind: organization
- Email: hello@catalyst.coop
- Location: United States of America
- Website: https://catalyst.coop
- Twitter: CatalystCoop
- Repositories: 82
- Profile: https://github.com/catalyst-cooperative
Catalyst is a small data engineering cooperative working on electricity regulation and climate change.
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Pablo Virgo | m****x@p****m | 41 |
| Zane Selvans | z****s@c****p | 38 |
| zschira | z****3@c****u | 28 |
| dependabot[bot] | 4****] | 24 |
| pre-commit-ci[bot] | 6****] | 15 |
| Austen Sharpe | a****e@g****m | 12 |
| bendnorman | b****9@c****u | 7 |
| Christina Gosnell | c****l@c****p | 4 |
| karldw | k****w | 2 |
| t-desktop | t****h@g****m | 2 |
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 25
- Total pull requests: 54
- Average time to close issues: 4 months
- Average time to close pull requests: 5 days
- Total issue authors: 5
- Total pull request authors: 10
- Average comments per issue: 1.12
- Average comments per pull request: 0.8
- Merged pull requests: 51
- Bot issues: 0
- Bot pull requests: 37
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zaneselvans (7)
- cmgosnell (6)
- zschira (6)
- ptvirgo (5)
- aesharpe (1)
Pull Request Authors
- dependabot[bot] (24)
- pre-commit-ci[bot] (13)
- aesharpe (3)
- bendnorman (3)
- ptvirgo (2)
- zschira (2)
- cmgosnell (2)
- zaneselvans (2)
- TrentonBush (2)
- karldw (1)
Dependencies
- factory_boy >=2.12
- pytest >=5.2
- scrapy >=1.7