https://github.com/alan-turing-institute/turing-publications

Analysing Turing publications and outputs from publishers and other data sources

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 686 KB
Statistics
  • Stars: 2
  • Watchers: 4
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created almost 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

DataCite Publication Scraper for The Alan Turing Institute

Overview

This project aims to download and analyze publications that declare an affiliation with "The Alan Turing Institute", using DataCite's API and other sources. Note that Zenodo and arXiv both use DataCite as their DOI provider.

The project is implemented in Python and uses Poetry for dependency management.
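Because Zenodo and arXiv both register their DOIs through DataCite, a record's DOI prefix is one way to tell which repository it came from. The following is a minimal sketch and is not taken from this repository: the helper name `doi_source` is hypothetical, and it assumes the prefixes `10.5281` (Zenodo) and `10.48550` (arXiv).

```python
# Hypothetical helper, not part of this repository.
# Assumed prefixes: 10.5281 for Zenodo, 10.48550 for arXiv.
KNOWN_PREFIXES = {
    "10.5281": "Zenodo",
    "10.48550": "arXiv",
}


def doi_source(doi: str) -> str:
    """Return the repository name for a DOI, or "other" if the prefix is unknown."""
    cleaned = doi.strip().lower()
    # Accept DOIs written as URLs or with a "doi:" label.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.startswith(lead):
            cleaned = cleaned[len(lead):]
    # The prefix is everything before the first slash.
    prefix = cleaned.split("/", 1)[0]
    return KNOWN_PREFIXES.get(prefix, "other")
```

Grouping downloaded records by this label would be one way to check how much of the data comes from each repository.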

Table of Contents

  1. Overview
  2. Prerequisites
  3. Installation
  4. Usage
  5. Contributing
  6. License
  7. Current status

Prerequisites

  • Python 3.11 or 3.12 (pyproject.toml requires >=3.11,<3.13)
  • Poetry (for dependency management)

Installation

  1. Clone the Repository

         git clone https://github.com/alan-turing-institute/turing-publications.git

  2. Navigate to Project Directory

         cd turing-publications

  3. Install Poetry

     If you haven't installed Poetry yet, follow the installation instructions in the official Poetry documentation.

  4. Install Dependencies

         poetry install

Usage

  1. Activate the Poetry Environment

         poetry shell

  2. Run the Scripts

     Download data from DataCite:

         python src/datacite_api.py

     Parse the downloaded data into a CSV file:

         python src/datacite2csv.py
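The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/datacite_api.py` or `src/datacite2csv.py`: it assumes the public DataCite REST endpoint `https://api.datacite.org/dois` with its `query`, `page[size]` and `page[cursor]` parameters, and it uses the standard library instead of `requests` so that it runs without installing anything. The function names and the choice of CSV columns are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

DATACITE_DOIS = "https://api.datacite.org/dois"  # assumed endpoint


def http_get_json(url: str, params: Optional[dict] = None) -> dict:
    """Fetch one page of JSON over HTTP."""
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_records(phrase: str,
                 get_json: Callable[..., dict] = http_get_json) -> Iterator[dict]:
    """Yield records matching a quoted phrase, following cursor pagination."""
    url: Optional[str] = DATACITE_DOIS
    params: Optional[dict] = {
        "query": f'"{phrase}"',
        "page[size]": 100,
        "page[cursor]": 1,
    }
    while url:
        page = get_json(url, params)
        data = page.get("data", [])
        if not data:
            break  # guard against an endless loop on an empty page
        yield from data
        # The "next" link already carries the query and the cursor.
        url = page.get("links", {}).get("next")
        params = None


def records_to_csv(records: Iterable[dict]) -> str:
    """Flatten a few attributes of each record into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["doi", "title", "year"])
    for record in records:
        attrs = record.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        writer.writerow([
            attrs.get("doi", ""),
            titles[0].get("title", ""),
            attrs.get("publicationYear", ""),
        ])
    return out.getvalue()
```

Passing the page fetcher in as `get_json` keeps the pagination logic testable without network access; calling `records_to_csv(iter_records("The Alan Turing Institute"))` would run the whole pipeline against the live API.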

Contributing

If you would like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

License

MIT License. See LICENSE for details.


Current status

Downloading data from DataCite works by looking for "Alan Turing Institute" within the text of each record (different systems put affiliations into different places). This seems to download records largely from Zenodo.
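Since affiliations can sit in different places within a record, it can help when debugging to see exactly where the phrase matched. The sketch below walks a nested JSON record and reports the path of every string that contains the phrase. It is an illustration only: the function name `find_phrase` is hypothetical, the sample record is made up, and the repository's own matching may work differently (for example, on the server side through the API query).

```python
from typing import Any, Iterator


def find_phrase(value: Any, phrase: str, path: str = "$") -> Iterator[str]:
    """Yield the path of every string inside value that contains phrase."""
    if isinstance(value, str):
        # Case-insensitive substring match.
        if phrase.casefold() in value.casefold():
            yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_phrase(child, phrase, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_phrase(child, phrase, f"{path}[{index}]")


# Made-up record for illustration; real records have many more fields.
sample = {
    "attributes": {
        "creators": [
            {"name": "Doe, Jane", "affiliation": ["The Alan Turing Institute"]},
        ],
        "descriptions": [
            {"description": "Funded by the Alan Turing Institute."},
        ],
    }
}

matches = list(find_phrase(sample, "Alan Turing Institute"))
# matches == ["$.attributes.creators[0].affiliation[0]",
#             "$.attributes.descriptions[0].description"]
```

Counting how often each path occurs across the downloaded records would show which fields different systems use for affiliations.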

Below I'm trying to collect some examples of outputs created by The Alan Turing Institute that are not included, for reference and debugging. This is biased and incomplete:

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • fedenanni (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

poetry.lock pypi
  • appnope 0.1.3
  • asttokens 2.4.0
  • backcall 0.2.0
  • certifi 2023.7.22
  • cffi 1.15.1
  • charset-normalizer 3.2.0
  • colorama 0.4.6
  • comm 0.1.4
  • contourpy 1.1.1
  • cycler 0.11.0
  • debugpy 1.8.0
  • decorator 5.1.1
  • executing 1.2.0
  • fonttools 4.42.1
  • idna 3.4
  • ipykernel 6.25.2
  • ipython 8.15.0
  • jedi 0.19.0
  • jupyter-client 8.3.1
  • jupyter-core 5.3.1
  • kiwisolver 1.4.5
  • matplotlib 3.8.0
  • matplotlib-inline 0.1.6
  • nest-asyncio 1.5.8
  • numpy 1.26.0
  • packaging 23.1
  • pandas 2.1.1
  • parso 0.8.3
  • pexpect 4.8.0
  • pickleshare 0.7.5
  • pillow 10.0.1
  • platformdirs 3.10.0
  • prompt-toolkit 3.0.39
  • psutil 5.9.5
  • ptyprocess 0.7.0
  • pure-eval 0.2.2
  • pycparser 2.21
  • pygments 2.16.1
  • pyparsing 3.1.1
  • python-dateutil 2.8.2
  • pytz 2023.3.post1
  • pywin32 306
  • pyzmq 25.1.1
  • requests 2.31.0
  • seaborn 0.12.2
  • setuptools 68.2.2
  • setuptools-scm 8.0.2
  • six 1.16.0
  • stack-data 0.6.2
  • tornado 6.3.3
  • traitlets 5.10.0
  • tzdata 2023.3
  • urllib3 2.0.5
  • wcwidth 0.2.6
pyproject.toml pypi
  • matplotlib ^3.8.0
  • pandas ^2.1.1
  • python >=3.11,<3.13
  • requests ^2.31.0
  • seaborn ^0.12.2