scholarly

Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!

https://github.com/scholarly-python-package/scholarly

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: scholar.google, zenodo.org
  • Committers with academic emails
    3 of 40 committers (7.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.9%) to scientific vocabulary

Keywords

citation-analysis citation-index citation-network citations googlescholar publication-data python python-3 python3 scholar scholarly-articles scholarly-communications
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 1,697
  • Watchers: 27
  • Forks: 339
  • Open Issues: 52
  • Releases: 0
Topics
citation-analysis citation-index citation-network citations googlescholar publication-data python python-3 python3 scholar scholarly-articles scholarly-communications
Created about 11 years ago · Last pushed 8 months ago
Metadata Files
Readme, Changelog, Contributing, License, Code of conduct, Citation

README.md

scholarly

scholarly is a module that allows you to retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to solve CAPTCHAs.

Installation

scholarly can be installed with either conda or pip. To install using conda, run:

```bash
conda install -c conda-forge scholarly
```

Alternatively, use pip to install the latest release from PyPI:

```bash
pip3 install scholarly
```

or use pip to install from GitHub:

```bash
pip3 install -U git+https://github.com/scholarly-python-package/scholarly.git
```

We are constantly developing new features, so please update your local package regularly. scholarly follows Semantic Versioning, which means that code written against an earlier version of scholarly should keep working with newer releases within the same major version.
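Semantic Versioning promises backward compatibility only within a major version, so it can help to check versions before upgrading. The snippet below is a minimal sketch of that rule, not part of scholarly: `parse_version` and `is_compatible` are hypothetical helpers, the version strings are illustrative, and plain `MAJOR.MINOR.PATCH` strings without pre-release tags are assumed.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(installed: str, written_for: str) -> bool:
    """Return True if code written for `written_for` should run on `installed`.

    Under Semantic Versioning this holds when the major version is unchanged
    and the installed release is not older than the one the code targeted.
    """
    have, need = parse_version(installed), parse_version(written_for)
    return have[0] == need[0] and have >= need


# Illustrative version strings only.
print(is_compatible("1.6.0", "1.5.1"))  # same major version, newer release
print(is_compatible("2.0.0", "1.5.1"))  # a major bump may change the API
```

To check a real installation, the installed version string can be read with the standard library call `importlib.metadata.version("scholarly")`.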

Optional dependencies

  • Tor:

    scholarly comes with a handful of APIs to set up proxies to circumvent anti-bot measures. Tor methods have been deprecated since v1.5 and are not actively tested or supported. If you wish to use Tor, install scholarly with the tor tag:

    ```bash
    pip3 install scholarly[tor]
    ```

    If you use zsh (which is now the default shell on recent versions of macOS), quote the tag:

    ```zsh
    pip3 install scholarly'[tor]'
    ```

    Note: The Tor option is unavailable with the conda installation.

Tests

To check whether your installation was successful, run the tests by executing the test_module.py file:

```bash
python3 test_module.py
```

or

```bash
python3 -m unittest -v test_module.py
```

Documentation

Check the documentation for a complete API reference and a quickstart guide.

Examples

```python
from scholarly import scholarly

# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)

# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)

# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)

# Which papers cited that publication?
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)
```
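Once an author record has been filled, it can be processed like any nested dictionary. The sketch below counts publications per year. `publications_per_year` is a hypothetical helper, `sample_author` is made-up data standing in for the output of `scholarly.fill`, and the `pub_year` key inside `bib` is an assumption about the record layout that should be checked against the documentation.

```python
from collections import Counter


def publications_per_year(author: dict) -> dict:
    """Count an author's publications by year, skipping entries without one."""
    years = (
        pub.get("bib", {}).get("pub_year")  # 'pub_year' is an assumed key
        for pub in author.get("publications", [])
    )
    counts = Counter(year for year in years if year)
    return dict(sorted(counts.items()))


# Made-up stand-in for a filled author record (not real Google Scholar data).
sample_author = {
    "publications": [
        {"bib": {"title": "Paper A", "pub_year": "2013"}},
        {"bib": {"title": "Paper B", "pub_year": "2016"}},
        {"bib": {"title": "Paper C", "pub_year": "2013"}},
        {"bib": {"title": "Paper D"}},  # no year recorded
    ]
}

print(publications_per_year(sample_author))
```

Working on the plain dictionary keeps the helper independent of network access, so it can be tested without querying Google Scholar.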

IMPORTANT: Making certain types of queries, such as scholarly.citedby or scholarly.search_pubs, will cause Google Scholar to block your requests and may eventually get your IP address blocked. You must use proxy services to avoid this situation. See the "Using proxies" section in the documentation for more details. Here's a short example:

```python
from scholarly import scholarly, ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))
```

scholarly also has APIs that work with several premium (paid) proxy services. scholarly is smart enough to know which queries need proxies and which do not. It is therefore recommended to always set up a proxy in the beginning of your application.
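Even behind a proxy, an individual request can fail, so callers may want to retry with increasing pauses. The following is a generic sketch of exponential backoff and is not part of scholarly: `call_with_backoff` and `flaky_query` are hypothetical names, and a bare `Exception` is caught because the source does not say which exceptions a blocked query raises. It complements proxies rather than replacing them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    query: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `query`, pausing base_delay * 2**n seconds after the n-th failure."""
    for attempt in range(attempts):
        try:
            return query()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo with a stand-in query that fails twice before succeeding.
calls = {"n": 0}


def flaky_query() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("blocked")
    return "result"


# The sleep function is replaced so that the demo finishes immediately.
print(call_with_backoff(flaky_query, sleep=lambda seconds: None))
```

With scholarly, the wrapped callable could be something like `lambda: next(scholarly.search_pubs("some query"))`, which needs network access and a configured proxy.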

Disclaimer

The developers use ScraperAPI to run the tests in GitHub Actions. The developers of scholarly are not affiliated with any of the proxy services and do not profit from them. If your favorite service is not supported, please submit an issue or, even better, follow it up with a pull request.

Contributing

We welcome contributions from you. Please create an issue, fork this repository and submit a pull request. Read the contributing document for more information.

Acknowledging scholarly

If you have used this codebase in a scientific publication, please cite this software as follows:

```bibtex
@software{cholewiak2021scholarly,
  author  = {Cholewiak, Steven A. and Ipeirotis, Panos and Silva, Victor and Kannawadi, Arun},
  title   = {{SCHOLARLY: Simple access to Google Scholar authors and citation using Python}},
  year    = {2021},
  doi     = {10.5281/zenodo.5764801},
  license = {Unlicense},
  url     = {https://github.com/scholarly-python-package/scholarly},
  version = {1.5.1}
}
```

License

The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with this mentality, all code is released under the Unlicense.

Owner

  • Name: scholarly-python-package
  • Login: scholarly-python-package
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: scholarly
message: >-
  If your publication used this software, please cite
  it as below
type: software
authors:
  - given-names: Steven
    family-names: Cholewiak
    email: steven@cholewiak.com
    affiliation: Google LLC
    orcid: 'https://orcid.org/0000-0003-0605-4395'
  - given-names: Panos
    family-names: Ipeirotis
    email: panos@stern.nyu.edu
    orcid: 'https://orcid.org/0000-0002-2966-7402'
    affiliation: >-
      New York University Stern School of Business:
      New York, NY, US
  - given-names: Victor
    family-names: Silva
    email: 'vsilva@ualberta.ca'
    affiliation: >-
      Department of Computing Science
      University of Alberta, Edmonton, Alberta
      Canada
    orcid: 'https://orcid.org/0000-0001-6702-6334'
  - given-names: Arun
    family-names: Kannawadi
    email: arunkannawadi@astro.princeton.edu
    affiliation: >-
      Department of Astrophysical Sciences, Princeton
      University, 4 Ivy Lane, Princeton NJ 08544 USA
    orcid: 'https://orcid.org/0000-0001-8783-6529'
identifiers:
  - type: doi
    value: 10.5281/zenodo.5764802
    description: DOI
repository-code: >-
  https://github.com/scholarly-python-package/scholarly
abstract: >-
  Retrieve author and publication information from
  Google Scholar in a friendly, Pythonic way without
  having to worry about CAPTCHAs!
keywords:
  - >-
    citations publication-data
    scholarly-communications citation-network
    citation-index scholarly-articles
    citation-analysis scholar googlescholar
license: Unlicense
version: 1.5.0

GitHub Events

Total
  • Issues event: 8
  • Watch event: 298
  • Issue comment event: 29
  • Push event: 7
  • Pull request review event: 6
  • Pull request event: 11
  • Fork event: 39
Last Year
  • Issues event: 8
  • Watch event: 298
  • Issue comment event: 29
  • Push event: 7
  • Pull request review event: 6
  • Pull request event: 11
  • Fork event: 39

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 551
  • Total Committers: 40
  • Avg Commits per committer: 13.775
  • Development Distribution Score (DDS): 0.608
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| arunkannawadi | k****j@g****m | 216 |
| Panos Ipeirotis | i****s@g****m | 94 |
| silvavn | 3****n | 54 |
| Dimitris Mylonopoulos | d****s@p****m | 38 |
| Steven A. Cholewiak | s****k@g****m | 31 |
| Luciano Bello | l****o@d****g | 22 |
| Stefan Tauner | s****r@g****t | 19 |
| Programize Admin | 3****n | 10 |
| Luciano Bello | b****o@c****e | 9 |
| Steve Cholewiak | s****n@c****m | 6 |
| Wei (Wayne) Hu | w****u@d****u | 6 |
| Matthew | m****u@g****m | 6 |
| Tom Brien | t****m@b****k | 4 |
| Francisco Knebel | f****l@g****m | 3 |
| jonasengelmann | j****b@g****m | 3 |
| Steve Cholewiak | s****k@f****u | 2 |
| Panagiotis Georgakopoulos | p****g@p****m | 2 |
| Leopold Talirz | l****z@g****m | 2 |
| Remi Rampin | r@r****m | 2 |
| Bedanec | p****1@g****m | 2 |
| nikitabalabin | n****a@m****u | 1 |
| mmontevil | 7****l | 1 |
| cdacosta | c****f@g****m | 1 |
| Santiago Castro | b****t@m****y | 1 |
| Roberto Natella | 2****a | 1 |
| Javier Martinez Lizama | j****z@b****m | 1 |
| Pablo Prietz | p****o@p****g | 1 |
| Matthew Pfeiffer | s****l@g****m | 1 |
| Masataro Asai | g****8@g****m | 1 |
| Mark Abspoel | m****l@m****l | 1 |
and 10 more...
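The summary figures in this section can be reproduced from the commit counts. The sketch below assumes the common definition of the Development Distribution Score, one minus the top committer's share of all commits. That definition is not stated on this page, and both function names are hypothetical. With 216 of 551 commits belonging to the top committer, it gives the reported 0.608, and 551 commits across 40 committers gives the reported average of 13.775.

```python
def average_commits(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer (0.0 when there are none)."""
    if total_committers == 0:
        return 0.0
    return total_commits / total_committers


def development_distribution_score(top_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    if total_commits == 0:
        return 0.0
    return 1 - top_commits / total_commits


# All-time figures taken from the statistics above.
print(round(average_commits(551, 40), 3))
print(round(development_distribution_score(216, 551), 3))
```

A score near 0 means one person wrote almost every commit, while a score near 1 means the work is spread across many committers.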

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 98
  • Total pull requests: 66
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 74
  • Total pull request authors: 23
  • Average comments per issue: 3.15
  • Average comments per pull request: 0.52
  • Merged pull requests: 53
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 months
  • Issue authors: 11
  • Pull request authors: 8
  • Average comments per issue: 0.36
  • Average comments per pull request: 0.5
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • arunkannawadi (11)
  • frafra (4)
  • ipeirotis (3)
  • mzhukovs (3)
  • tigerjack (2)
  • stanleyrhodes (2)
  • pannone (2)
  • hp0404 (2)
  • NisoD (2)
  • kostrykin (2)
  • gboeing (1)
  • HowardZJU (1)
  • timapage (1)
  • zhjwy9343 (1)
  • TingxunShi (1)
Pull Request Authors
  • arunkannawadi (32)
  • dlebedinsky (3)
  • Luen (2)
  • nkxxll (2)
  • jjshoots (2)
  • DLu (2)
  • yarikoptic (2)
  • brokenjade3000 (2)
  • ltalirz (2)
  • cyyever (2)
  • amchagas (2)
  • tZimmermann98 (2)
  • ssdv1 (1)
  • ipeirotis (1)
  • ma-ji (1)
Top Labels
Issue Labels
  • bug (56)
  • enhancement (19)
  • proxy (10)
  • invalid (4)
  • help wanted (4)
  • wontfix (3)
  • documentation (3)
  • good first issue (3)
  • duplicate (2)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 116,076 last-month
  • Total docker downloads: 191,396,802
  • Total dependent packages: 16
    (may contain duplicates)
  • Total dependent repositories: 252
    (may contain duplicates)
  • Total versions: 68
  • Total maintainers: 4
pypi.org: scholarly

Simple access to Google Scholar authors and citations

  • Versions: 60
  • Dependent Packages: 16
  • Dependent Repositories: 252
  • Downloads: 116,076 Last month
  • Docker Downloads: 191,396,802
Rankings
Docker downloads count: 0.4%
Dependent repos count: 0.9%
Average: 1.8%
Stargazers count: 2.0%
Dependent packages count: 2.2%
Downloads: 2.4%
Forks count: 3.1%
Last synced: 4 months ago
conda-forge.org: scholarly
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 10.2%
Stargazers count: 12.9%
Average: 27.1%
Dependent repos count: 34.0%
Dependent packages count: 51.2%
Last synced: 4 months ago

Dependencies

.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite
.github/workflows/lint.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/proxytests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • codecov/codecov-action v2 composite
.github/workflows/publish-to-pypi.yml actions
  • actions/checkout main composite
  • actions/setup-python v1 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/pythonpackage.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • codecov/codecov-action v2 composite
requirements-dev.txt pypi
  • coverage * development
  • flake8 * development
  • pandas * development
  • sphinx_rtd_theme * development
requirements.txt pypi
  • arrow *
  • beautifulsoup4 *
  • bibtexparser *
  • deprecated *
  • fake_useragent *
  • free-proxy *
  • httpx *
  • python-dotenv *
  • requests *
  • selenium *
  • stem *
setup.py pypi
  • arrow *
  • beautifulsoup4 *
  • bibtexparser *
  • deprecated *
  • fake_useragent *
  • free-proxy *
  • httpx *
  • python-dotenv *
  • requests *