pyalex

A Python library for OpenAlex (openalex.org)

https://github.com/j535d165/pyalex

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary

Keywords

openalex openalexapi publications research scholarly-articles scholarly-metadata science utrecht-university
Last synced: 6 months ago

Repository

A Python library for OpenAlex (openalex.org)

Basic Info
  • Host: GitHub
  • Owner: J535D165
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 155 KB
Statistics
  • Stars: 258
  • Watchers: 8
  • Forks: 34
  • Open Issues: 7
  • Releases: 16
Topics
openalex openalexapi publications research scholarly-articles scholarly-metadata science utrecht-university
Created over 3 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

PyAlex - a Python wrapper for OpenAlex

PyAlex

PyAlex is a Python library for OpenAlex. OpenAlex is an index of hundreds of millions of interconnected scholarly papers, authors, institutions, and more. OpenAlex offers a robust, open, and free REST API to extract, aggregate, or search scholarly data. PyAlex is a lightweight and thin Python interface to this API. PyAlex tries to stay as close as possible to the design of the original service.

The following features of OpenAlex are currently supported by PyAlex:

  • [x] Get single entities
  • [x] Filter entities
  • [x] Search entities
  • [x] Group entities
  • [x] Search filters
  • [x] Select fields
  • [x] Sample
  • [x] Pagination
  • [x] Autocomplete endpoint
  • [x] N-grams
  • [x] Authentication

We aim to cover the entire API and are looking for help. Pull requests are welcome.

Key features

  • Pipe operations - PyAlex can handle multiple operations in a sequence. This allows the developer to write understandable queries. For examples, see code snippets.
  • Plaintext abstracts - OpenAlex doesn't include plaintext abstracts due to legal constraints. PyAlex can convert the inverted abstracts into plaintext abstracts on the fly.
  • Permissive license - OpenAlex data is CC0 licensed :raised_hands:. PyAlex is published under the MIT license.

Installation

PyAlex requires Python 3.8 or later.

sh pip install pyalex

Getting started

PyAlex offers support for all Entity Objects: Works, Authors, Sources, Institutions, Topics, Publishers, and Funders.

python from pyalex import Works, Authors, Sources, Institutions, Topics, Publishers, Funders

The polite pool

The polite pool has much faster and more consistent response times. To get into the polite pool, you set your email:

```python
import pyalex

pyalex.config.email = "mail@example.com"
```

Max retries

By default, PyAlex raises an error at the first failure when querying the OpenAlex API. Set max_retries to a number higher than 0 to let PyAlex retry when an error occurs. retry_backoff_factor controls the delay between two retries, and retry_http_codes lists the HTTP error codes that should trigger a retry.

```python
from pyalex import config

config.max_retries = 0
config.retry_backoff_factor = 0.1
config.retry_http_codes = [429, 500, 503]
```
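
As a rough illustration of how these three settings interact, the sketch below computes an exponential retry schedule. The helpers `backoff_delays` and `should_retry` are hypothetical and not part of PyAlex; the formula `backoff_factor * 2 ** attempt` is an assumption modeled on common exponential backoff, and the exact delays depend on the underlying HTTP retry implementation.

```python
def backoff_delays(backoff_factor, max_retries):
    """Return the assumed wait time (in seconds) before each retry.

    Hypothetical helper: assumes plain exponential backoff,
    delay = backoff_factor * 2 ** attempt, with attempt starting at 0.
    """
    return [backoff_factor * 2 ** attempt for attempt in range(max_retries)]


def should_retry(status_code, retry_http_codes, attempt, max_retries):
    """Retry only for listed status codes and while attempts remain."""
    return status_code in retry_http_codes and attempt < max_retries


delays = backoff_delays(backoff_factor=0.1, max_retries=3)
print(delays)
```

With `max_retries = 0` the schedule is empty, which matches the default behaviour of failing on the first error.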

Standards

OpenAlex uses standard ISO 3166-1 alpha-2 country codes.

Get single entity

Get a single Work, Author, Source, Institution, Concept, Topic, Publisher or Funder from OpenAlex by the OpenAlex ID, or by DOI or ROR.

```python
Works()["W2741809807"]

# same as
Works()["https://doi.org/10.7717/peerj.4375"]
```

The result is a Work object, which is very similar to a dictionary. Find the available fields with .keys().

For example, get the open access status:

python Works()["W2741809807"]["open_access"]

python {'is_oa': True, 'oa_status': 'gold', 'oa_url': 'https://doi.org/10.7717/peerj.4375'}

The same approach works for Authors, Sources, Institutions, Concepts, and Topics:

```python
Authors()["A5027479191"]
Authors()["https://orcid.org/0000-0002-4297-0502"]  # same
```

Get random

Get a random Work, Author, Source, Institution, Concept, Topic, Publisher or Funder.

```python
Works().random()
Authors().random()
Sources().random()
Institutions().random()
Topics().random()
Publishers().random()
Funders().random()
```

See also sample, which does support filters.

Get abstract

Only for Works. Request a work from the OpenAlex database:

python w = Works()["W3128349626"]

All attributes are available as documented under Works, as well as abstract (only if abstract_inverted_index is not None). The human-readable abstract is created on the fly.

python w["abstract"]

python 'Abstract To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.'

Please respect the legal constraints when using this feature.
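
For readers curious what this conversion involves: OpenAlex represents an abstract as an inverted index that maps each word to the positions where it occurs. The following is a minimal sketch of rebuilding plaintext from such a mapping. The function name `invert_abstract` and the sample data are illustrative assumptions, not necessarily how PyAlex implements it.

```python
def invert_abstract(inverted_index):
    """Rebuild plaintext from a {word: [positions]} mapping.

    Returns None when no inverted index is available.
    """
    if inverted_index is None:
        return None
    # Expand to (position, word) pairs, then order them by position.
    pairs = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(pairs))


# Made-up sample; "open" occurs at positions 0 and 3.
sample = {"open": [0, 3], "science": [1], "needs": [2], "data": [4]}
print(invert_abstract(sample))  # open science needs open data
```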

Get lists of entities

python results = Works().get()

For lists of entities, you can also count the number of records found instead of returning the results. This also works for search queries and filters.

```python
Works().count()
# 10338153
```

For lists of entities, you can return the result as well as the metadata. By default, only the results are returned.

python topics = Topics().get()

```python
print(topics.meta)
# {'count': 65073, 'db_response_time_ms': 16, 'page': 1, 'per_page': 25}
```

Filter records

python Works().filter(publication_year=2020, is_oa=True).get()

which is identical to:

python Works().filter(publication_year=2020).filter(is_oa=True).get()

Nested attribute filters

Some attribute filters are nested and separated with dots by OpenAlex. For example, filter on authorships.institutions.ror.

In case of nested attribute filters, use a dict to build the query.

```python
Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .get()
```
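
To make the mapping between a nested dict and the dotted OpenAlex filter syntax concrete, here is a small sketch. `flatten_filter` is a hypothetical helper written for this illustration and is not PyAlex's internal function. It assumes filter terms take the form `attribute:value` and that booleans are written in lowercase.

```python
def flatten_filter(params, prefix=""):
    """Turn a nested dict into a list of dotted 'attribute:value' terms."""
    terms = []
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend one level and extend the dotted attribute name.
            terms.extend(flatten_filter(value, name))
        else:
            if isinstance(value, bool):
                value = str(value).lower()  # assumption: true/false in lowercase
            terms.append(f"{name}:{value}")
    return terms


nested = {"authorships": {"institutions": {"ror": "04pp8hn57"}}}
query = ",".join(flatten_filter(nested))
print(query)  # authorships.institutions.ror:04pp8hn57
```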

Search entities

OpenAlex reference: The search parameter

python Works().search("fierce creatures").get()

Search filter

OpenAlex reference: The search filter

python Authors().search_filter(display_name="einstein").get()

python Works().search_filter(title="cubist").get()

python Funders().search_filter(display_name="health").get()

Sort entity lists

OpenAlex reference: Sort entity lists.

python Works().sort(cited_by_count="desc").get()

Select

OpenAlex reference: Select fields.

python Works().filter(publication_year=2020, is_oa=True).select(["id", "doi"]).get()

Sample

OpenAlex reference: Sample entity lists.

python Works().sample(100, seed=535).get()

Get 10 random German-based institutions:

python Institutions().filter(country_code="DE").sample(10).get()

See also random, which does not support filters.

Logical expressions

OpenAlex reference: Logical expressions

Inequality:

python Sources().filter(works_count=">1000").get()

Negation (NOT):

python Institutions().filter(country_code="!us").get()

Intersection (AND):

```python
Works().filter(institutions={"country_code": ["fr", "gb"]}).get()

# same
Works().filter(institutions={"country_code": "fr"}).filter(institutions={"country_code": "gb"}).get()
```

Addition (OR):

python Works().filter(institutions={"country_code": "fr|gb"}).get()
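
The semantics of these expressions can be mimicked locally. The toy matcher below is only a sketch for building intuition: `matches` and `matches_all` are hypothetical names, the real evaluation happens on the OpenAlex servers, and case-insensitive comparison is an assumption made here.

```python
def matches(actual, expr):
    """Check one value against an OpenAlex-style expression string."""
    expr = str(expr)
    if expr.startswith("!"):  # negation (NOT)
        return not matches(actual, expr[1:])
    if "|" in expr:  # addition (OR)
        return any(matches(actual, part) for part in expr.split("|"))
    if expr.startswith(">"):  # inequality
        return float(actual) > float(expr[1:])
    if expr.startswith("<"):
        return float(actual) < float(expr[1:])
    return str(actual).lower() == expr.lower()


def matches_all(actual_values, exprs):
    """Intersection (AND): each expression must match some value of the record."""
    return all(any(matches(v, e) for v in actual_values) for e in exprs)


print(matches(1500, ">1000"))                         # True
print(matches("nl", "!us"))                           # True
print(matches("gb", "fr|gb"))                         # True
print(matches_all(["fr", "gb", "de"], ["fr", "gb"]))  # True
print(matches_all(["fr"], ["fr", "gb"]))              # False
```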

Paging

OpenAlex offers two methods for paging: basic (offset) paging and cursor paging. Both methods are supported by PyAlex.

Cursor paging (default)

Use the method paginate() to paginate results. Each returned page is a list of records, with a maximum of per_page (default 25). By default, the n_max argument of paginate() is set to 10000. Use n_max=None to retrieve all results.

```python
from pyalex import Authors

pager = Authors().search_filter(display_name="einstein").paginate(per_page=200)

for page in pager:
    print(len(page))
```

Looking for an easy method to iterate the records of a pager?

```python
from itertools import chain
from pyalex import Authors

query = Authors().search_filter(display_name="einstein")

for record in chain(*query.paginate(per_page=200)):
    print(record["id"])
```
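
One design consideration when flattening pages: `chain(*pager)` unpacks the whole pager before iteration starts, so every page is produced up front, while `chain.from_iterable(pager)` pulls pages one at a time. The sketch below shows the lazy variant with a stand-in generator. `fake_pages` is a hypothetical substitute for a real pager, so no network access is needed.

```python
from itertools import chain, islice


def fake_pages(total, per_page):
    """Stand-in for a pager: yields lists of records, page by page."""
    for start in range(0, total, per_page):
        stop = min(start + per_page, total)
        yield [{"id": f"A{i}"} for i in range(start, stop)]


# Pages are only generated as the iteration reaches them.
records = chain.from_iterable(fake_pages(total=450, per_page=200))

first_three = [record["id"] for record in islice(records, 3)]
print(first_three)  # ['A0', 'A1', 'A2']
```

With a real pager, the same pattern would presumably avoid requesting pages that are never consumed.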

Basic paging

See limitations of basic paging in the OpenAlex documentation.

```python
from pyalex import Authors

pager = Authors().search_filter(display_name="einstein").paginate(method="page", per_page=200)

for page in pager:
    print(len(page))
```

Autocomplete

OpenAlex reference: Autocomplete entities.

Autocomplete a string:

```python
from pyalex import autocomplete

autocomplete("stockholm resilience centre")
```

Autocomplete a string to get a specific type of entities:

```python
from pyalex import Institutions

Institutions().autocomplete("stockholm resilience centre")
```

You can also use the filters to autocomplete:

```python
from pyalex import Works

r = Works().filter(publication_year=2023).autocomplete("planetary boundaries")
```

Get N-grams

OpenAlex reference: Get N-grams.

python Works()["W2023271753"].ngrams()

Serialize

All results from PyAlex can be serialized. For example, save the results to a JSON file:

```python
import json
from pathlib import Path
from pyalex import Work, Works

# save the results to a JSON file
with open(Path("works.json"), "w") as f:
    json.dump(Works().get(), f)

# load the records back as Work objects
with open(Path("works.json")) as f:
    works = [Work(w) for w in json.load(f)]
```

Code snippets

A list of awesome use cases of the OpenAlex dataset.

Search author by name and affiliation

This requires searching for the affiliation first, retrieving the affiliation ID, and then searching for the author while filtering for the affiliation:

```python
from pyalex import Authors, Institutions
import logging

# Search for the institution
insts = Institutions().search("MIT").get()
logging.info(f"{len(insts)} search results found for the institution")
inst_id = insts[0]["id"].replace("https://openalex.org/", "")

# Search for the author within the institution
auths = Authors().search("Daron Acemoglu").filter(affiliations={"institution":{"id": inst_id}}).get()
logging.info(f"{len(auths)} search results found for the author")
auth = auths[0]
```

Cited publications (works referenced by this paper, outgoing citations)

```python
from pyalex import Works

# the work to extract the referenced works of
w = Works()["W2741809807"]

Works()[w["referenced_works"]]
```
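
A work can reference many other works, and a single request may not accept an arbitrarily long list of IDs. The sketch below splits a list of IDs into batches. The helper `chunked`, the batch size of 100, and the generated IDs are assumptions for illustration only, so check the OpenAlex documentation for the actual limit.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up IDs standing in for w["referenced_works"].
referenced = [f"https://openalex.org/W{n}" for n in range(1, 251)]

batches = list(chunked(referenced, 100))  # 100 is an assumed limit
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch could then be looked up separately, for example with `Works()[batch]`, and the results combined.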

Citing publications (other works that reference this paper, incoming citations)

```python
from pyalex import Works

Works().filter(cites="W2741809807").get()
```

Get works of a single author

```python
from pyalex import Works

Works().filter(author={"id": "A2887243803"}).get()
```

[!WARNING] This gets only the first 25 works of the author. To get all of them, see the paging section.

Dataset publications in the global south

```python
from pyalex import Works

# dataset works from institutions in the global south, grouped by country
w = Works() \
  .filter(institutions={"is_global_south":True}) \
  .filter(type="dataset") \
  .group_by("institutions.country_code") \
  .get()
```

Most cited publications in your organisation

```python
from pyalex import Works

Works() \
  .filter(authorships={"institutions": {"ror": "04pp8hn57"}}) \
  .sort(cited_by_count="desc") \
  .get()
```

Experimental

Authentication

OpenAlex is currently experimenting with authenticated requests. Authenticate your requests with:

```python
import pyalex

pyalex.config.api_key = "<MY_KEY>"
```

To check whether your API key works, you can use the following code:

```python
import logging

import pyalex
import requests

# disable retries so an HTTP error surfaces immediately
pyalex.config.retry_http_codes = None
try:
    pyalex.Works().filter(from_updated_date="2023-01-12").get()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 403:
        logging.info("API key is NOT working 🔴")
    else:
        logging.error(f"Unexpected HTTP error: {e}")
        raise
else:
    logging.info("API key is working 👍")
```

Alternatives

R users can use the excellent OpenAlexR library.

License

MIT

Contact

This library is a community contribution. The authors of this Python library aren't affiliated with OpenAlex.

This library is maintained by J535D165 and PeterLombaers. Feel free to reach out with questions, remarks, and suggestions. The issue tracker is a good starting point. You can also reach out via jonathandebruinos@gmail.com.

Owner

  • Name: Jonathan de Bruin
  • Login: J535D165
  • Kind: user
  • Location: Netherlands
  • Company: Utrecht University

Research engineer working on software, datasets, and tools to advance open science 👐 @UtrechtUniversity @asreview

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'We appreciate, but do not require, attribution.'
authors:
- family-names: "De Bruin"
  given-names: "Jonathan"
  orcid: "https://orcid.org/0000-0002-4297-0502"
title: "PyAlex"
version: 0.8
date-released: 2023-03-19
url: "https://github.com/J535D165/pyalex"

GitHub Events

Total
  • Create event: 7
  • Release event: 2
  • Issues event: 14
  • Watch event: 104
  • Delete event: 6
  • Issue comment event: 32
  • Push event: 51
  • Pull request review comment event: 8
  • Pull request review event: 14
  • Pull request event: 25
  • Fork event: 21
Last Year
  • Create event: 7
  • Release event: 2
  • Issues event: 14
  • Watch event: 104
  • Delete event: 6
  • Issue comment event: 32
  • Push event: 51
  • Pull request review comment event: 8
  • Pull request review event: 14
  • Pull request event: 25
  • Fork event: 21

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 128
  • Total Committers: 13
  • Avg Commits per committer: 9.846
  • Development Distribution Score (DDS): 0.18
Past Year
  • Commits: 26
  • Committers: 8
  • Avg Commits per committer: 3.25
  • Development Distribution Score (DDS): 0.538
Top Committers
Name Email Commits
Jonathan de Bruin j****s@g****m 105
pre-commit-ci[bot] 6****] 6
PeterLombaers 7****s 4
Raffaele Mancuso r****4@u****t 3
Romain THOMAS 3****4 2
fa-se 5****e 1
Zlatko Minev z****v@g****m 1
Phil Gooch 5****y 1
Michele Pasin m****n@g****m 1
Iunio Quarto Russo 4****o 1
Ed Summers e****s@p****m 1
Dennis Priskorn p****n@r****t 1
Andrew Hundt A****t@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 19
  • Total pull requests: 62
  • Average time to close issues: 2 months
  • Average time to close pull requests: 22 days
  • Total issue authors: 19
  • Total pull request authors: 14
  • Average comments per issue: 1.32
  • Average comments per pull request: 0.47
  • Merged pull requests: 46
  • Bot issues: 0
  • Bot pull requests: 9
Past Year
  • Issues: 7
  • Pull requests: 27
  • Average time to close issues: 25 days
  • Average time to close pull requests: 6 days
  • Issue authors: 7
  • Pull request authors: 8
  • Average comments per issue: 0.86
  • Average comments per pull request: 0.11
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 5
Top Authors
Issue Authors
  • ainamdar-ag (1)
  • drumstick90 (1)
  • yasirroni (1)
  • paniterka (1)
  • thomaswolgast (1)
  • Nicolas0208 (1)
  • Bubblbu (1)
  • OliverKulinski (1)
  • lfoppiano (1)
  • hauschke (1)
  • patrickmineault (1)
  • normalnormie (1)
  • mazzespazze (1)
  • PeterLombaers (1)
  • J535D165 (1)
Pull Request Authors
  • J535D165 (31)
  • pre-commit-ci[bot] (14)
  • raffaem (8)
  • PeterLombaers (5)
  • romain894 (5)
  • iqrusso (2)
  • fa-se (2)
  • ahundt (2)
  • lambdamusic (2)
  • dpriskorn (1)
  • adamAfro (1)
  • phil-scholarcy (1)
  • zlatko-minev (1)
  • edsu (1)
Top Labels
Issue Labels
help wanted (1) good first issue (1) enhancement (1) question (1)
Pull Request Labels
enhancement (5) documentation (2) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 21,799 last-month
  • Total docker downloads: 113
  • Total dependent packages: 5
  • Total dependent repositories: 2
  • Total versions: 16
  • Total maintainers: 1
pypi.org: pyalex

Python interface to the OpenAlex database

  • Versions: 16
  • Dependent Packages: 5
  • Dependent Repositories: 2
  • Downloads: 21,799 Last month
  • Docker Downloads: 113
Rankings
Dependent packages count: 1.9%
Downloads: 2.6%
Docker downloads count: 2.9%
Average: 4.7%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/python-lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v1 composite
.github/workflows/python-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/scorecard.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • github/codeql-action/upload-sarif v2 composite
  • ossf/scorecard-action 80e868c13c90f172d68d1f4501dee99e2479f7af composite
pyproject.toml pypi
  • requests *
  • urllib3 *