https://github.com/cthoyt/chemrxiv-summarize

Summarize usage of ChemRxiv by using Figshare's API endpoint

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

chemrxiv preprints
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: cthoyt
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 11.5 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Topics
chemrxiv preprints
Created over 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme License

README.md

WARNING: ChemRxiv has switched from Figshare to Engage and no longer offers an API, making this whole project effectively moot.

chemrxiv-summarize

This repo summarizes usage of ChemRxiv by using Figshare's API endpoint.

I've already run the download script on 2020-11-02. The scripts can be run in this order to get nice charts.

    pip install -r requirements.txt
    python 01_download.py
    python 02_process.py
    python 03_visualize.py

Downloading takes a bit of time (40 minutes, maybe?) but there's a tqdm bar to keep you entertained in the meantime.

I did a full write-up on the experience of writing this code and the results in this blog post.

See also the ChemRxiv dashboard (source code) that displays similar statistics and is automatically updated daily.

Charts

How many papers were submitted each month to ChemRxiv?

Articles per Month

How many unique authors have contributed per month to ChemRxiv? This only counts using the ORCID iDs of the first authors; it's pretty inconsistent what other identifying information is included in the metadata for each article.

Unique Authors per Month
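
The counting described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code (which, judging by `requirements.txt`, likely relies on pandas). The record layout, the sample ORCID iDs, and the function name `unique_authors_per_month` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (posted date, first author's ORCID iD or None).
records = [
    (date(2020, 1, 5), "0000-0001-0000-0001"),
    (date(2020, 1, 20), "0000-0001-0000-0001"),  # same author, same month
    (date(2020, 1, 22), "0000-0001-0000-0002"),
    (date(2020, 2, 3), None),  # no ORCID iD in the metadata, so it is skipped
    (date(2020, 2, 9), "0000-0001-0000-0003"),
]


def unique_authors_per_month(records):
    """Map 'YYYY-MM' to the number of distinct first-author ORCID iDs."""
    seen = defaultdict(set)
    for posted, orcid in records:
        if orcid is not None:
            seen[posted.strftime("%Y-%m")].add(orcid)
    return {month: len(orcids) for month, orcids in sorted(seen.items())}


print(unique_authors_per_month(records))
```

Articles whose first author has no ORCID iD are simply dropped here, which is one plausible reading of "only counts using the ORCID iDs".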

How many authors submitted more than once per month? This chart shows spikes in August, which I will guess is when most people are submitting before their summer breaks :)

Percent Duplicate Authors per Month
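
One way such a percentage could be computed is sketched below. It assumes the denominator is the number of distinct first authors in that month; the README does not spell this out, so treat it as a guess. The `(month, author)` pairs and the function name `percent_repeat_authors` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (month, first-author identifier) pairs, one per article.
records = [
    ("2020-08", "A"), ("2020-08", "A"),
    ("2020-08", "B"), ("2020-08", "C"), ("2020-08", "D"),
    ("2020-09", "A"), ("2020-09", "B"),
]


def percent_repeat_authors(records):
    """Per month, the percentage of distinct authors with more than one article."""
    per_month = defaultdict(Counter)
    for month, author in records:
        per_month[month][author] += 1
    result = {}
    for month, counts in sorted(per_month.items()):
        repeats = sum(1 for n in counts.values() if n > 1)
        result[month] = 100.0 * repeats / len(counts)
    return result


print(percent_repeat_authors(records))
```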

How many authors contributed for the first time each month?

First Time First Authors per Month

How many first authors had contributed to ChemRxiv by each month? We can take the date of each author's first submission, then count how many first-time authors appear in each month. A cumulative sum over those monthly counts then shows how many authors had contributed to ChemRxiv at any point in time.

Historical Authorship
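
The first-appearance count and cumulative sum described above might look like this in plain Python. It is a sketch under assumed inputs rather than the repository's implementation; the `(month, author)` pairs and the function name `historical_authorship` are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical (month, first-author identifier) pairs, one per article.
records = [
    ("2020-01", "A"), ("2020-01", "B"),
    ("2020-02", "A"), ("2020-02", "C"),
    ("2020-03", "B"),
    ("2020-04", "D"), ("2020-04", "E"),
]


def historical_authorship(records):
    """Return (month, new first-time authors, cumulative authors) per month."""
    first_month = {}
    # 'YYYY-MM' strings sort chronologically, so the first hit is the earliest.
    for month, author in sorted(records):
        first_month.setdefault(author, month)
    new_per_month = Counter(first_month.values())
    months = sorted({month for month, _ in records})
    new_counts = [new_per_month[m] for m in months]
    return list(zip(months, new_counts, accumulate(new_counts)))


for row in historical_authorship(records):
    print(row)
```

Months with no new authors (such as `2020-03` here) still appear, so the cumulative curve stays flat instead of skipping a point.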

If we aggregate that data, we can ask how many authors have submitted lots of articles:

Author Prolificness
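
A possible form of that aggregation is a "count of counts": first count articles per author, then count how many authors share each total. The input list and the function name `prolificness` are hypothetical.

```python
from collections import Counter

# Hypothetical first-author identifiers, one entry per article.
first_authors = ["A", "A", "A", "B", "B", "C", "D"]


def prolificness(first_authors):
    """Map 'number of articles' to 'number of authors with that many articles'."""
    articles_per_author = Counter(first_authors)
    authors_per_total = Counter(articles_per_author.values())
    return dict(sorted(authors_per_total.items()))


print(prolificness(first_authors))
```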

Licensing

The following chart shows the popularity of different licenses over time:

Historical Licenses

Gender Related

The gender-guesser package was used to infer authors' genders based on their first name. This obviously comes with the caveat that some names can't be automatically assigned to the male/female dichotomy. The "mostly male" and "mostly female" results were respectively grouped with the male and female names. The "androgynous" results were evenly split between male and female.
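
The grouping rule above can be sketched as a small tally over label strings. The labels (`male`, `mostly_male`, `female`, `mostly_female`, `andy`, `unknown`) are assumed to be the ones gender-guesser's detector returns, with `andy` standing for androgynous. The function name `tally_genders` is hypothetical, and splitting each androgynous name half and half is just one way to realize "evenly split".

```python
# Assumed gender-guesser labels folded into the two main groups.
GROUPS = {
    "male": "male",
    "mostly_male": "male",
    "female": "female",
    "mostly_female": "female",
}


def tally_genders(labels):
    """Aggregate inferred labels into male, female, and unknown totals."""
    totals = {"male": 0.0, "female": 0.0, "unknown": 0.0}
    for label in labels:
        if label == "andy":
            # Androgynous names count half towards each group.
            totals["male"] += 0.5
            totals["female"] += 0.5
        else:
            # Anything unrecognized falls back to "unknown".
            totals[GROUPS.get(label, "unknown")] += 1
    return totals


labels = ["male", "mostly_male", "female", "andy", "andy", "unknown"]
print(tally_genders(labels))
```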

The first chart shows the first author frequencies inferred as male, female, and unknown.

Genders of First Authors by Month

This chart shows the percentage of male first authors with respect to male + female first authors. It shows that even as the number of submissions changes, the ratio remains quite skewed towards male first authorship. Notably, there is no change in this pattern during the first COVID-19 pandemic lockdown (April-August 2020).

Male Percentage by Month
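
The ratio plotted here is male / (male + female), with unknown names left out of the denominator. A minimal sketch follows; the counts are made-up toy numbers, not results from the repository, and the function name `male_percentage` is hypothetical.

```python
# Toy (male, female) first-author counts per month; not real data.
counts_by_month = {
    "2020-04": (30, 10),
    "2020-05": (45, 15),
    "2020-06": (0, 0),
}


def male_percentage(counts_by_month):
    """Per month, male / (male + female) as a percentage, or None if empty."""
    result = {}
    for month, (male, female) in sorted(counts_by_month.items()):
        total = male + female
        result[month] = 100.0 * male / total if total else None
    return result


print(male_percentage(counts_by_month))
```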

Owner

  • Name: Charles Tapley Hoyt
  • Login: cthoyt
  • Kind: user
  • Location: Bonn, Germany
  • Company: RWTH Aachen University

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 33
  • Total Committers: 1
  • Avg Commits per committer: 33.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name                 Email           Commits
Charles Tapley Hoyt  c****t@g****m   33

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 3.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jannisborn (2)
  • cthoyt (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • click *
  • gender_guesser *
  • matplotlib *
  • pandas *
  • requests *
  • seaborn *
  • tqdm *