https://github.com/cthoyt/chemrxiv-summarize
Summarize usage of ChemRxiv by using Figshare's API endpoint
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.8%) to scientific vocabulary
Repository
Summarize usage of ChemRxiv by using Figshare's API endpoint
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
WARNING: ChemRxiv has switched from Figshare to Engage and no longer offers an API, making this whole project effectively moot.
chemrxiv-summarize
Motivated by this tweet from Egon Willighⓐgen (@egonwillighagen) on January 20, 2020:
"makes me wonder about the stats at @ChemRxiv" https://t.co/Ml5X8F4ckJ
This repository summarizes usage of ChemRxiv using Figshare's API endpoint.
I ran the download script on 2020-11-02. Run the scripts in the following order to regenerate the charts:
```bash
pip install -r requirements.txt
python 01_download.py
python 02_process.py
python 03_visualize.py
```
Downloading takes a while (perhaps 40 minutes), but there's a tqdm progress bar to keep you entertained in the meantime.
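The download step pages through Figshare's article listing. The code below is a minimal, hypothetical sketch of that loop, not the code in `01_download.py`. It assumes that Figshare's public v2 `/articles` endpoint accepts `institution`, `page`, and `page_size` query parameters and returns a JSON list for each page. The names `fetch_page_from_figshare` and `iterate_articles` are illustrative. The ChemRxiv institution ID is left as a parameter because it is not given here.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

# Assumed endpoint: Figshare's public v2 API for listing articles.
FIGSHARE_ARTICLES_URL = "https://api.figshare.com/v2/articles"


def fetch_page_from_figshare(page: int, page_size: int, institution: int) -> List[dict]:
    """Fetch one page of article records (hypothetical helper; makes a network call)."""
    query = urllib.parse.urlencode(
        {"institution": institution, "page": page, "page_size": page_size}
    )
    with urllib.request.urlopen(f"{FIGSHARE_ARTICLES_URL}?{query}") as response:
        return json.load(response)


def iterate_articles(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 100
) -> Iterator[dict]:
    """Yield records page by page, stopping after a short or empty page."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        if len(batch) < page_size:  # a short page means nothing is left
            break
        page += 1
```

To query the live API, bind the institution with `functools.partial(fetch_page_from_figshare, institution=...)` and wrap the generator in `tqdm(...)` to get a progress bar. Passing the fetcher in as a function also means the paging logic can be tested without network access.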
I did a full write-up on the experience of writing this code and the results in this blog post.
See also the ChemRxiv dashboard (source code) that displays similar statistics and is automatically updated daily.
Charts
How many papers were submitted each month to ChemRxiv?
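Counting submissions per month only needs each article's publication date. This standard-library sketch assumes that each downloaded record is a dict with an ISO 8601 `published_date` string (for example `"2020-08-14T00:00:00Z"`), so its first seven characters give the `YYYY-MM` month. The repository lists pandas as a dependency and may compute this differently.

```python
from collections import Counter
from typing import Dict, List


def submissions_per_month(articles: List[dict]) -> Dict[str, int]:
    """Count articles per 'YYYY-MM' month, in chronological order."""
    # Assumes ISO 8601 dates, so the first 7 characters are 'YYYY-MM'.
    counts = Counter(article["published_date"][:7] for article in articles)
    # 'YYYY-MM' strings sort chronologically, so sorting the keys orders the months.
    return dict(sorted(counts.items()))
```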

How many unique authors contributed to ChemRxiv each month? This counts only first authors, identified by their ORCID iDs, because the other identifying information in each article's metadata is pretty inconsistent.
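Here is a minimal sketch of the unique-author count. It assumes, hypothetically, that each record has been flattened so that it carries a `first_author_orcid` field (set to `None` when the first author has no ORCID iD) and an ISO 8601 `published_date` string. Articles without an ORCID iD are skipped, which matches the ORCID-only counting described above.

```python
from collections import defaultdict
from typing import Dict, List, Set


def unique_first_authors_per_month(articles: List[dict]) -> Dict[str, int]:
    """Count distinct first-author ORCID iDs per 'YYYY-MM' month."""
    orcids_by_month: Dict[str, Set[str]] = defaultdict(set)
    for article in articles:
        orcid = article.get("first_author_orcid")
        if not orcid:  # no ORCID iD for the first author, so it cannot be counted
            continue
        orcids_by_month[article["published_date"][:7]].add(orcid)
    return {month: len(ids) for month, ids in sorted(orcids_by_month.items())}
```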

How many authors submitted more than once per month? This chart shows spikes in August, which I'd guess is when many people submit before their summer breaks :)
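Repeat submitters can be found by counting (month, author) pairs and keeping the pairs that occur more than once. This sketch assumes hypothetical records with an ISO 8601 `published_date` string and a `first_author_orcid` field. Months with no repeat submitters are absent from the result.

```python
from collections import Counter
from typing import Dict, List


def repeat_authors_per_month(articles: List[dict]) -> Dict[str, int]:
    """Count first authors with more than one submission in the same month."""
    # Articles per (month, ORCID iD) pair; records without an ORCID iD are skipped.
    pair_counts = Counter(
        (article["published_date"][:7], article["first_author_orcid"])
        for article in articles
        if article.get("first_author_orcid")
    )
    repeaters = Counter(month for (month, _), n in pair_counts.items() if n > 1)
    return dict(sorted(repeaters.items()))
```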

How many authors contributed for their first time each month?
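To count first-time authors, find each author's earliest month and then count how many authors share each earliest month. This sketch assumes hypothetical records with an ISO 8601 `published_date` string and a `first_author_orcid` field, and it skips records without an ORCID iD.

```python
from collections import Counter
from typing import Dict, List


def first_time_authors_per_month(articles: List[dict]) -> Dict[str, int]:
    """Count first authors whose earliest article falls in each 'YYYY-MM' month."""
    first_month: Dict[str, str] = {}
    for article in articles:
        orcid = article.get("first_author_orcid")
        if not orcid:
            continue
        month = article["published_date"][:7]
        # 'YYYY-MM' strings compare chronologically, so min() keeps the earliest.
        first_month[orcid] = min(month, first_month.get(orcid, month))
    return dict(sorted(Counter(first_month.values()).items()))
```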

How many first authors have contributed to ChemRxiv as of each month? We can take each author's first date of authorship, then count how many unique first-time authors appear each month. A cumulative sum of those monthly counts then shows how many authors have contributed to ChemRxiv at any point in time.
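The cumulative step is a running total over the monthly first-time author counts. This standard-library sketch uses `itertools.accumulate` (in pandas, `Series.cumsum()` does the same job). It assumes the input maps `YYYY-MM` strings to counts. If the chart needs an entry for every month, months with no newcomers should be filled with zero first.

```python
from itertools import accumulate
from typing import Dict


def cumulative_authors(first_time_counts: Dict[str, int]) -> Dict[str, int]:
    """Turn per-month first-time author counts into a running total."""
    months = sorted(first_time_counts)  # 'YYYY-MM' keys sort chronologically
    running_totals = accumulate(first_time_counts[month] for month in months)
    return dict(zip(months, running_totals))
```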

If we aggregate the submissions per author, we can ask how many authors have submitted many articles:
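This aggregation is a histogram of histograms. First count articles per author, then count how many authors share each article count. The sketch assumes hypothetical records with a `first_author_orcid` field and ignores records without one.

```python
from collections import Counter
from typing import Dict, List


def authors_by_submission_count(articles: List[dict]) -> Dict[int, int]:
    """Map 'number of articles' to 'number of first authors with that many'."""
    per_author = Counter(
        article["first_author_orcid"]
        for article in articles
        if article.get("first_author_orcid")
    )
    return dict(sorted(Counter(per_author.values()).items()))
```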

Licensing
The following chart shows the popularity of different licenses over time:
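License popularity over time is a two-way count of month by license. This sketch assumes each record has already been flattened so that it has an ISO 8601 `published_date` string and a hypothetical `license` field holding a plain string such as `"CC BY 4.0"`.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def licenses_per_month(articles: List[dict]) -> Dict[str, Dict[str, int]]:
    """Count articles per license within each 'YYYY-MM' month."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for article in articles:
        table[article["published_date"][:7]][article["license"]] += 1
    return {month: dict(counts) for month, counts in sorted(table.items())}
```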

Gender Related
The gender-guesser package was used to infer authors' genders from their first names. This comes with the obvious caveat that some names can't be automatically assigned to the male/female dichotomy. The "mostly male" and "mostly female" results were grouped with the male and female names, respectively, and the "androgynous" results were split evenly between male and female.
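The grouping rule can be written as a small tally over the labels returned by gender-guesser's `Detector.get_gender()`: `"male"`, `"female"`, `"mostly_male"`, `"mostly_female"`, `"andy"` (androgynous), and `"unknown"`. This is a sketch of the rule as described, not the repository's code. It takes labels rather than names, so it runs without the package installed.

```python
from typing import Dict, Iterable

# "mostly_*" labels are folded into the corresponding gender.
GROUPED_LABELS = {
    "male": "male",
    "mostly_male": "male",
    "female": "female",
    "mostly_female": "female",
}


def tally_genders(labels: Iterable[str]) -> Dict[str, float]:
    """Tally gender-guesser labels into male / female / unknown counts."""
    totals = {"male": 0.0, "female": 0.0, "unknown": 0.0}
    for label in labels:
        if label in GROUPED_LABELS:
            totals[GROUPED_LABELS[label]] += 1
        elif label == "andy":  # androgynous: half counts as male, half as female
            totals["male"] += 0.5
            totals["female"] += 0.5
        else:  # "unknown" and any unexpected label
            totals["unknown"] += 1
    return totals
```

With the package installed, the labels could come from `gender_guesser.detector.Detector().get_gender(first_name)` for each first author.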
The first chart shows the first author frequencies inferred as male, female, and unknown.

This chart shows the percentage of male first authors relative to male plus female first authors. Even as the number of submissions changes, the ratio remains strongly skewed towards male first authorship. Notably, the pattern does not change during the first COVID-19 pandemic lockdown (April-August 2020).
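The ratio in this chart uses only the inferred male and female counts as its denominator. This sketch assumes hypothetical per-month tallies shaped like `{"male": ..., "female": ..., "unknown": ...}`. Months in which no first name could be assigned to either gender are skipped to avoid dividing by zero.

```python
from typing import Dict


def percent_male(monthly_totals: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Percentage of male first authors among male + female, per month."""
    percentages = {}
    for month, totals in sorted(monthly_totals.items()):
        known = totals["male"] + totals["female"]  # "unknown" is excluded
        if known > 0:  # skip months with no assignable first names
            percentages[month] = 100 * totals["male"] / known
    return percentages
```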

Owner
- Name: Charles Tapley Hoyt
- Login: cthoyt
- Kind: user
- Location: Bonn, Germany
- Company: RWTH Aachen University
- Website: https://cthoyt.com
- Repositories: 489
- Profile: https://github.com/cthoyt
Committers
Last synced: 12 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Charles Tapley Hoyt | c****t@g****m | 33 |
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 3.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jannisborn (2)
- cthoyt (1)
Dependencies
- click *
- gender_guesser *
- matplotlib *
- pandas *
- requests *
- seaborn *
- tqdm *