https://github.com/alan-turing-institute/turing-publications
Analysing Turing publications and outputs from publishers and other data sources
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: alan-turing-institute
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 686 KB
Statistics
- Stars: 2
- Watchers: 4
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
DataCite Publication Scraper for The Alan Turing Institute
Overview
This project aims to download and analyze publications that declare an affiliation with "The Alan Turing Institute", using DataCite's API and other resources. Note that Zenodo and arXiv both use DataCite as their DOI provider.
The project is implemented in Python and uses Poetry for dependency management.
Prerequisites
- Python 3.x
- Poetry (for dependency management)
Installation
Clone the Repository
    git clone https://github.com/alan-turing-institute/turing-publications.git
Navigate to Project Directory
    cd turing-publications
Install Poetry
If you haven't installed Poetry yet, you can install it by following the official Poetry installation instructions.
Install Dependencies
    poetry install
Usage
Activate the Poetry Environment
    poetry shell
Run the Script
Download data from DataCite:
    python src/datacite_api.py
Parse downloaded data into a CSV file:
    python src/datacite2csv.py
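As a rough illustration of the parsing step, the sketch below flattens DataCite-style records into CSV rows. It is not the code in src/datacite2csv.py. It assumes each record follows the DataCite JSON layout (an `attributes` object holding `doi`, `titles`, `publicationYear`, `publisher` and `creators`), and the column set and the helper names `record_to_row` and `records_to_csv` are hypothetical.

```python
import csv
import io

# Hypothetical column set; the real script may export different fields.
FIELDS = ["doi", "title", "year", "publisher", "creators"]


def record_to_row(record):
    """Flatten one DataCite-style record into a flat dict of CSV cells."""
    attrs = record.get("attributes", {})
    titles = attrs.get("titles") or []
    creators = attrs.get("creators") or []
    publisher = attrs.get("publisher", "")
    # Assumption: the publisher may be a plain string or an object with a name.
    if isinstance(publisher, dict):
        publisher = publisher.get("name", "")
    return {
        "doi": attrs.get("doi", ""),
        "title": titles[0].get("title", "") if titles else "",
        "year": attrs.get("publicationYear", ""),
        "publisher": publisher,
        "creators": "; ".join(c.get("name", "") for c in creators),
    }


def records_to_csv(records, handle):
    """Write a header line plus one row per record to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record_to_row(record))


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = [{
        "attributes": {
            "doi": "10.1234/example",
            "titles": [{"title": "An example output"}],
            "publicationYear": 2023,
            "publisher": "Zenodo",
            "creators": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
        }
    }]
    buffer = io.StringIO()
    records_to_csv(sample, buffer)
    print(buffer.getvalue())
```

Joining creator names with "; " keeps one row per publication, which is convenient for loading into pandas; a one-row-per-author layout would suit affiliation analysis better.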
Contributing
If you would like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.
License
MIT License. See LICENSE for details.
Current status
Downloading data from DataCite works by looking for "Alan Turing Institute" within the text of each record (different systems put affiliations into different places). This seems to download records largely from Zenodo.
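The matching described above can be sketched as follows. This is a minimal illustration, not the code in src/datacite_api.py. It assumes the public DataCite REST API endpoint https://api.datacite.org/dois with its `query` and `page[size]` parameters; the helper names `build_query_url`, `iter_strings` and `mentions_phrase`, and the sample records, are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://api.datacite.org/dois"
PHRASE = "Alan Turing Institute"


def build_query_url(phrase=PHRASE, page_size=100):
    """Build a DataCite search URL for an exact phrase (assumed query style)."""
    params = {"query": f'"{phrase}"', "page[size]": page_size}
    return f"{API_URL}?{urlencode(params)}"


def iter_strings(value):
    """Yield every string found anywhere inside nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def mentions_phrase(record, phrase=PHRASE):
    """Return True if the phrase occurs in any text field of the record.

    Searching the whole record rather than one field reflects the fact that
    different systems store affiliations in different places.
    """
    needle = phrase.casefold()
    return any(needle in text.casefold() for text in iter_strings(record))


if __name__ == "__main__":
    # Made-up records, for illustration only.
    in_affiliation = {"attributes": {"creators": [
        {"name": "Doe, Jane", "affiliation": ["The Alan Turing Institute"]}]}}
    unrelated = {"attributes": {"titles": [{"title": "Turing machines revisited"}]}}
    print(build_query_url())
    print(mentions_phrase(in_affiliation), mentions_phrase(unrelated))
```

A check like this can only match text that is present in the metadata record. If the Turing is named only inside the paper itself, the record is missed, which is consistent with the gaps listed below.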
Below I'm trying to collect some examples of outputs created by The Alan Turing Institute that are not included, for reference and debugging. This list is biased and incomplete:
- Most arxiv.org papers!
- Some papers from arxiv.org that include the Turing somewhere in the text
- Random example from arxiv.org that came out of the Turing where Turing is only listed as an affiliation in the actual paper
- Some Zenodo outputs:
Owner
- Name: The Alan Turing Institute
- Login: alan-turing-institute
- Kind: organization
- Email: info@turing.ac.uk
- Website: https://turing.ac.uk
- Repositories: 477
- Profile: https://github.com/alan-turing-institute
The UK's national institute for data science and artificial intelligence.
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- fedenanni (1)
Dependencies
- appnope 0.1.3
- asttokens 2.4.0
- backcall 0.2.0
- certifi 2023.7.22
- cffi 1.15.1
- charset-normalizer 3.2.0
- colorama 0.4.6
- comm 0.1.4
- contourpy 1.1.1
- cycler 0.11.0
- debugpy 1.8.0
- decorator 5.1.1
- executing 1.2.0
- fonttools 4.42.1
- idna 3.4
- ipykernel 6.25.2
- ipython 8.15.0
- jedi 0.19.0
- jupyter-client 8.3.1
- jupyter-core 5.3.1
- kiwisolver 1.4.5
- matplotlib 3.8.0
- matplotlib-inline 0.1.6
- nest-asyncio 1.5.8
- numpy 1.26.0
- packaging 23.1
- pandas 2.1.1
- parso 0.8.3
- pexpect 4.8.0
- pickleshare 0.7.5
- pillow 10.0.1
- platformdirs 3.10.0
- prompt-toolkit 3.0.39
- psutil 5.9.5
- ptyprocess 0.7.0
- pure-eval 0.2.2
- pycparser 2.21
- pygments 2.16.1
- pyparsing 3.1.1
- python-dateutil 2.8.2
- pytz 2023.3.post1
- pywin32 306
- pyzmq 25.1.1
- requests 2.31.0
- seaborn 0.12.2
- setuptools 68.2.2
- setuptools-scm 8.0.2
- six 1.16.0
- stack-data 0.6.2
- tornado 6.3.3
- traitlets 5.10.0
- tzdata 2023.3
- urllib3 2.0.5
- wcwidth 0.2.6
- matplotlib ^3.8.0
- pandas ^2.1.1
- python >=3.11,<3.13
- requests ^2.31.0
- seaborn ^0.12.2