https://github.com/callumrollo/github-scraper

Scraping GitHub to visualise users' commits during a hackweek

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

api github json jupyter-notebook notebook python scraper wordcloud
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: callumrollo
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 683 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
api github json jupyter-notebook notebook python scraper wordcloud
Created over 5 years ago · Last pushed over 5 years ago

README.md

GitHub scraping for fun and not-for-profit

I was inspired by the work done during Oceanhackweek2020 (OHW20) to investigate the GitHub API. This started as an idea to count the GitHub commits made by participants during OHW20 and the weeks on either side.

I could not find a good tutorial on scraping commits from a list of GitHub usernames (though this out-of-date one by kb22 set me on the right track), so I've written a fairly rough one here.

Check it out on Binder from your browser.

You won't be able to run much of the scraping notebook on Binder though, as it requires a GitHub API token.
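
For readers wondering what an authenticated call might look like, here is a minimal sketch. It is not taken from the notebook: it assumes GitHub's commit search endpoint (`/search/commits`) with the `author:` and `author-date:` qualifiers, and the helper names `build_commit_search_request` and `count_commits` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_commit_search_request(username, start, end, token):
    """Build (without sending) a commit-search request for one user.

    start and end are ISO dates (YYYY-MM-DD) bounding the author date.
    """
    query = f"author:{username} author-date:{start}..{end}"
    params = urllib.parse.urlencode({"q": query, "per_page": 100})
    headers = {
        "Accept": "application/vnd.github+json",
        # The personal access token is what Binder sessions lack.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(
        f"{API_ROOT}/search/commits?{params}", headers=headers
    )


def count_commits(request):
    """Send the request and return the number of matching commits."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["total_count"]
```

With a token read from, say, a `GITHUB_TOKEN` environment variable (an assumption, not something the README specifies), `count_commits(build_commit_search_request("some-user", "2020-08-03", "2020-08-23", token))` would return that user's public commit count for the window. Search results are paginated and rate-limited, so looping over many usernames needs a pause between requests.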

How to

All the instructions for carrying out this scraping are found in the notebook ohw_github_scrape.ipynb.

If you don't fancy the scraping yourself, but are interested in the results, you will find a CSV of anonymised data in this repo along with some very basic analysis in ohw_analysis.ipynb.
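
As an illustration of how per-user counts could be anonymised before being written to CSV, here is one possible sketch. The README does not say how the repo's data was anonymised, so the shuffling approach, the `user_001`-style labels, and the column names `participant` and `commits` are all assumptions.

```python
import csv
import io
import random


def anonymise(counts_by_user, seed=None):
    """Replace usernames with opaque labels assigned in shuffled order."""
    users = list(counts_by_user)
    # Shuffle so the label order does not reveal the input order.
    random.Random(seed).shuffle(users)
    return {
        f"user_{i:03d}": counts_by_user[user]
        for i, user in enumerate(users, start=1)
    }


def to_csv(counts_by_label):
    """Serialise the anonymised counts as CSV text (hypothetical columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["participant", "commits"])
    for label, commits in counts_by_label.items():
        writer.writerow([label, commits])
    return buffer.getvalue()
```

Calling `to_csv(anonymise({"alice": 12, "bob": 3}))` yields a header row followed by one row per participant, with no usernames in the output.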

If you found this work interesting, please feel free to fork it and consider submitting a PR. Or contact me on Twitter.

Aims

  • [x] Scrape GitHub for OHW20 participant public information
  • [x] Total the number of commits during hack week and the weeks either side
  • [ ] Investigate deeper including file types used and wording of commit messages
  • [ ] Plot and analyse this info in a blog post
  • [x] Provide a how-to, with anonymised data
  • [x] Write a tutorial for anyone else curious about scraping GitHub data
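
The second aim, totalling commits during the hack week and the weeks either side, can be sketched roughly as below. This is an illustration rather than the notebook's code: the function name `count_by_window` is hypothetical, and the start date used in the example (Monday 10 August 2020) is an assumption about when OHW20 began.

```python
from datetime import date, timedelta


def count_by_window(commit_dates, hackweek_start):
    """Count commits in the week before, during, and after the hack week.

    Each window is seven days long; dates outside all three are ignored.
    """
    week = timedelta(days=7)
    windows = {
        "week_before": (hackweek_start - week, hackweek_start),
        "hackweek": (hackweek_start, hackweek_start + week),
        "week_after": (hackweek_start + week, hackweek_start + 2 * week),
    }
    counts = {name: 0 for name in windows}
    for commit_date in commit_dates:
        for name, (start, end) in windows.items():
            # Half-open interval: start is included, end is excluded.
            if start <= commit_date < end:
                counts[name] += 1
                break
    return counts


# Assumed start date, for illustration only.
example = count_by_window(
    [date(2020, 8, 5), date(2020, 8, 10), date(2020, 8, 14), date(2020, 8, 17)],
    hackweek_start=date(2020, 8, 10),
)
```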

Owner

  • Name: Callum Rollo
  • Login: callumrollo
  • Kind: user
  • Location: Gothenburg, Sweden
  • Company: Voice of the Ocean Foundation

Oceanographer, Pythonista and data science-ish. Breaks things on Fridays

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 8
  • Total Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Callum Rollo (c****o@o****m): 8 commits
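
The committer figures above can be reproduced with a short calculation. This sketch assumes the Development Distribution Score is defined in the usual way, as one minus the top committer's share of all commits; the page itself does not state the formula, and the function name `committer_stats` is hypothetical.

```python
def committer_stats(commits_by_committer):
    """Return (total commits, committers, average per committer, DDS)."""
    total = sum(commits_by_committer.values())
    committers = len(commits_by_committer)
    if total == 0:
        # No commits in the period: report zeros, as in the Past Year figures.
        return 0, committers, 0.0, 0.0
    average = total / committers
    # Assumed definition: DDS = 1 - (top committer's commits / total commits).
    dds = 1 - max(commits_by_committer.values()) / total
    return total, committers, average, dds


all_time = committer_stats({"Callum Rollo": 8})
```

With a single committer responsible for all 8 commits, the average is 8.0 and the DDS is 0.0, matching the All Time figures. A project where one person made 6 of 8 commits would score 0.25.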

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0