https://github.com/callumrollo/github-scraper
Scraping GitHub to visualise users' commits during a hackweek
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.6%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
GitHub scraping for fun and not-for-profit
I was inspired by the work done during Oceanhackweek2020 (OHW20) to investigate the GitHub API. This started as an idea to count the GitHub commits made by participants during OHW20 and the weeks on either side.
I could not find a good tutorial on scraping commits from a list of GitHub usernames (though this out-of-date one by kb22 set me on the right track), so I've written a fairly rough one here.
Check it out on Binder from your browser.
You won't be able to run much of the scraping notebook on Binder though, as it requires a GitHub API token.
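For anyone running the notebook locally, a token is normally sent in the `Authorization` header of each API request. The sketch below is not taken from the notebook: it assumes GitHub's commit search endpoint as one possible way to count a user's commits in a date range, and the function names and the `GITHUB_TOKEN` variable name are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def commit_search_url(username, start, end):
    """Build a commit-search URL for one author and an inclusive date range.

    Dates are ISO strings such as "2020-08-10".
    """
    query = f"author:{username} committer-date:{start}..{end}"
    params = urllib.parse.urlencode({"q": query, "per_page": 1})
    return f"{API_ROOT}/search/commits?{params}"


def build_request(url, token):
    """Attach the token and the JSON media type to a request."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def count_commits(username, start, end, token):
    """Return the match count reported by the search API (needs network access)."""
    request = build_request(commit_search_url(username, start, end), token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["total_count"]


# Example call, assuming the token is stored in an environment variable
# named GITHUB_TOKEN (an assumed name) and that the dates are placeholders:
#   import os
#   count_commits("some-user", "2020-08-10", "2020-08-16", os.environ["GITHUB_TOKEN"])
```

Only public commits that GitHub has indexed for search are counted this way, and the search endpoints have their own, stricter rate limit, so a long list of usernames may need pauses between requests.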
How to
All the instructions for carrying out this scraping are found in the notebook ohw_github_scrape.ipynb.
If you don't fancy doing the scraping yourself but are interested in the results, you will find a CSV of anonymised data in this repo, along with some very basic analysis in ohw_analysis.ipynb.
If you found this work interesting, please feel free to fork it and consider submitting a PR, or contact me on Twitter.
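How the CSV in this repo was anonymised is not described here. One simple approach, sketched under that caveat, replaces each username with a sequential label before the counts are written out. The column names (`username`, `week`, `commits`, `participant`) and function names are hypothetical and may not match the actual file.

```python
import csv
import io


def anonymise(rows):
    """Swap each username for a stable label such as 'participant_001'."""
    labels = {}
    anonymised = []
    for row in rows:
        # The same username always maps to the same label.
        label = labels.setdefault(row["username"], f"participant_{len(labels) + 1:03d}")
        anonymised.append(
            {"participant": label, "week": row["week"], "commits": row["commits"]}
        )
    return anonymised


def to_csv(rows):
    """Serialise anonymised rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["participant", "week", "commits"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Labels follow the order of the input, so shuffling the rows first avoids leaking any ordering (for example, an alphabetical participant list).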
Aims
- [x] Scrape GitHub for OHW20 participants' public information
- [x] Total the number of commits during hack week and the weeks either side
- [ ] Investigate further, including file types used and wording of commit messages
- [ ] Plot and analyse this info in a blog post
- [x] Provide a how-to, with anonymised data
- [x] Write a tutorial for anyone else curious about scraping GitHub data
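The second aim, totalling commits for the hack week and the weeks either side, comes down to bucketing commit dates into three seven-day windows. A minimal sketch, assuming the first day of the hack week is known; the function names are hypothetical and this is not necessarily how the notebook does it.

```python
from collections import Counter
from datetime import date


def week_label(commit_day, hackweek_start):
    """Classify a date as 'before', 'during' or 'after'; None if outside the three weeks."""
    offset = (commit_day - hackweek_start).days
    if -7 <= offset < 0:
        return "before"
    if 0 <= offset < 7:
        return "during"
    if 7 <= offset < 14:
        return "after"
    return None


def total_by_week(commit_days, hackweek_start):
    """Count commits in each of the three windows, ignoring dates outside them."""
    counts = Counter({"before": 0, "during": 0, "after": 0})
    for day in commit_days:
        label = week_label(day, hackweek_start)
        if label is not None:
            counts[label] += 1
    return dict(counts)
```

Passing the start date as an argument keeps the function reusable for other events; the value to use for OHW20 should be checked against the event schedule.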
Owner
- Name: Callum Rollo
- Login: callumrollo
- Kind: user
- Location: Gothenburg, Sweden
- Company: Voice of the Ocean Foundation
- Website: https://callumrollo.github.io/
- Repositories: 86
- Profile: https://github.com/callumrollo
Oceanographer, Pythonista and data science-ish. Breaks things on Fridays
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Callum Rollo | c****o@o****m | 8 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0