scholarly
Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to scholar.google, zenodo.org
- ✓ Committers with academic emails: 3 of 40 committers (7.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: scholarly-python-package
- License: unlicense
- Language: Python
- Default Branch: main
- Homepage: https://scholarly.readthedocs.io/
- Size: 6.41 MB
Statistics
- Stars: 1,697
- Watchers: 27
- Forks: 339
- Open Issues: 52
- Releases: 0
Metadata Files
README.md
scholarly
scholarly is a module that allows you to retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to solve CAPTCHAs.
Installation
scholarly can be installed either with conda or with pip.
To install using conda, simply run

```bash
conda install -c conda-forge scholarly
```

Alternatively, use pip to install the latest release from PyPI:

```bash
pip3 install scholarly
```

or use pip to install directly from GitHub:

```bash
pip3 install -U git+https://github.com/scholarly-python-package/scholarly.git
```
We are constantly developing new features.
Please update your local package regularly.
scholarly follows Semantic Versioning.
This means code written against an earlier version of scholarly is expected to keep working with newer releases that share the same major version number.
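Under Semantic Versioning, backward compatibility is promised only within a major version, so a script can check the installed release before relying on it. The following is a minimal sketch of such a check; the helper names (`major_version`, `is_compatible`, `installed_scholarly_version`) are hypothetical and not part of scholarly, and the parsing assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading (major) component of a version string like '1.7.11'."""
    return int(version_string.split(".")[0])


def is_compatible(installed, tested_with):
    """True when both versions share a major number and `installed` is not older."""
    def as_tuple(v):
        # Keep only the numeric release components, e.g. '1.7.11' -> (1, 7, 11)
        return tuple(int(part) for part in v.split(".") if part.isdigit())

    return (major_version(installed) == major_version(tested_with)
            and as_tuple(installed) >= as_tuple(tested_with))


def installed_scholarly_version():
    """Return the installed scholarly version, or None if it is not installed."""
    try:
        return version("scholarly")
    except PackageNotFoundError:
        return None
```

A script could call `is_compatible(installed_scholarly_version(), "1.5.1")` at startup (after checking for `None`) and warn the user when the result is `False`.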
Optional dependencies
Tor:
scholarly comes with a handful of APIs to set up proxies to circumvent anti-bot measures. Tor methods are deprecated since v1.5 and are not actively tested or supported. If you wish to use Tor, install scholarly using the tor tag as

```bash
pip3 install scholarly[tor]
```

If you use zsh (which is now the default in latest macOS), you should type this as

```zsh
pip3 install scholarly'[tor]'
```

Note: Tor option is unavailable with conda installation.
Tests
To check if your installation is successful, run the tests by executing the test_module.py file as:

```bash
python3 test_module.py
```

or

```bash
python3 -m unittest -v test_module.py
```
Documentation
Check the documentation for a complete API reference and a quickstart guide.
Examples
```python
from scholarly import scholarly

# Retrieve the author's data, fill-in, and print
# Get an iterator for the author results
search_query = scholarly.search_author('Steven A Cholewiak')
# Retrieve the first result from the iterator
first_author_result = next(search_query)
scholarly.pprint(first_author_result)

# Retrieve all the details for the author
author = scholarly.fill(first_author_result)
scholarly.pprint(author)

# Take a closer look at the first publication
first_publication = author['publications'][0]
first_publication_filled = scholarly.fill(first_publication)
scholarly.pprint(first_publication_filled)

# Print the titles of the author's publications
publication_titles = [pub['bib']['title'] for pub in author['publications']]
print(publication_titles)

# Which papers cited that publication?
citations = [citation['bib']['title'] for citation in scholarly.citedby(first_publication_filled)]
print(citations)
```
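The example above indexes straight into `author['publications']` and `pub['bib']['title']`, which raises `KeyError` if a key is absent (for instance, if the author entry has not been filled yet). A more defensive variant might look like the sketch below; `publication_titles` is a hypothetical helper, and `sample_author` is stand-in data that only mimics the nesting used above, not real Google Scholar output.

```python
def publication_titles(author):
    """Collect titles from an author dict, skipping entries without one."""
    titles = []
    for pub in author.get("publications", []):
        title = pub.get("bib", {}).get("title")
        if title:
            titles.append(title)
    return titles


# Stand-in data with the same nesting as the example above (not real results)
sample_author = {
    "publications": [
        {"bib": {"title": "First paper"}},
        {"bib": {}},  # entry without a title
        {"bib": {"title": "Second paper"}},
    ]
}
print(publication_titles(sample_author))
```

The same function can be applied to the `author` dictionary returned by `scholarly.fill` in the example above.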
IMPORTANT: Making certain types of queries, such as scholarly.citedby or scholarly.search_pubs, will lead to Google Scholar blocking your requests and may eventually block your IP address.
You must use proxy services to avoid this situation.
See the "Using proxies" section in the documentation for more details. Here's a short example:
```python
from scholarly import scholarly, ProxyGenerator

# Set up a ProxyGenerator object to use free proxies
# This needs to be done only once per session
pg = ProxyGenerator()
pg.FreeProxies()
scholarly.use_proxy(pg)

# Now search Google Scholar from behind a proxy
search_query = scholarly.search_pubs('Perception of physical stability and center of mass of 3D objects')
scholarly.pprint(next(search_query))
```
scholarly also has APIs that work with several premium (paid) proxy services.
scholarly is smart enough to know which queries need proxies and which do not.
It is therefore recommended to always set up a proxy in the beginning of your application.
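Because the search functions return iterators, one complementary way to keep the number of requests down is to consume only as many results as are needed. The sketch below illustrates this with `itertools.islice`; `first_results` is a hypothetical helper, and `fake_search` is a stand-in generator used so the example runs without network access. That fewer consumed results mean fewer requests is an assumption about how results are paged, not a documented guarantee.

```python
from itertools import islice


def first_results(result_iterator, limit=5):
    """Consume at most `limit` items from an iterator of search results."""
    return list(islice(result_iterator, limit))


# Stand-in generator; in practice this would be e.g. scholarly.search_pubs(...)
def fake_search():
    n = 0
    while True:  # endless, like paging through a long result list
        n += 1
        yield {"bib": {"title": f"Result {n}"}}


top = first_results(fake_search(), limit=3)
print([item["bib"]["title"] for item in top])
```

With a real query, `first_results(scholarly.search_pubs('some query'), limit=3)` would stop after three results instead of exhausting the iterator.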
Disclaimer
The developers use ScraperAPI to run the tests in GitHub Actions.
The developers of scholarly are not affiliated with any of the proxy services and do not profit from them. If your favorite service is not supported, please submit an issue or even better, follow it up with a pull request.
Contributing
We welcome contributions from you. Please create an issue, fork this repository and submit a pull request. Read the contributing document for more information.
Acknowledging scholarly
If you have used this codebase in a scientific publication, please cite this software as follows:

```bibtex
@software{cholewiak2021scholarly,
  author  = {Cholewiak, Steven A. and Ipeirotis, Panos and Silva, Victor and Kannawadi, Arun},
  title   = {{SCHOLARLY: Simple access to Google Scholar authors and citation using Python}},
  year    = {2021},
  doi     = {10.5281/zenodo.5764801},
  license = {Unlicense},
  url     = {https://github.com/scholarly-python-package/scholarly},
  version = {1.5.1}
}
```
License
The original code that this project was forked from was released by Luciano Bello under a WTFPL license. In keeping with this mentality, all code is released under the Unlicense.
Owner
- Name: scholarly-python-package
- Login: scholarly-python-package
- Kind: organization
- Repositories: 1
- Profile: https://github.com/scholarly-python-package
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: scholarly
message: >-
  If your publication used this software, please cite
  it as below
type: software
authors:
  - given-names: Steven
    family-names: Cholewiak
    email: steven@cholewiak.com
    affiliation: Google LLC
    orcid: 'https://orcid.org/0000-0003-0605-4395'
  - given-names: Panos
    family-names: Ipeirotis
    email: panos@stern.nyu.edu
    orcid: 'https://orcid.org/0000-0002-2966-7402'
    affiliation: >-
      New York University Stern School of Business:
      New York, NY, US
  - given-names: Victor
    family-names: Silva
    email: 'vsilva@ualberta.ca'
    affiliation: >-
      Department of Computing Science
      University of Alberta, Edmonton, Alberta
      Canada
    orcid: 'https://orcid.org/0000-0001-6702-6334'
  - given-names: Arun
    family-names: Kannawadi
    email: arunkannawadi@astro.princeton.edu
    affiliation: >-
      Department of Astrophysical Sciences, Princeton
      University, 4 Ivy Lane, Princeton NJ 08544 USA
    orcid: 'https://orcid.org/0000-0001-8783-6529'
identifiers:
  - type: doi
    value: 10.5281/zenodo.5764802
    description: DOI
repository-code: >-
  https://github.com/scholarly-python-package/scholarly
abstract: >-
  Retrieve author and publication information from
  Google Scholar in a friendly, Pythonic way without
  having to worry about CAPTCHAs!
keywords:
  - >-
    citations publication-data
    scholarly-communications citation-network
    citation-index scholarly-articles
    citation-analysis scholar googlescholar
license: Unlicense
version: 1.5.0
GitHub Events
Total
- Issues event: 8
- Watch event: 298
- Issue comment event: 29
- Push event: 7
- Pull request review event: 6
- Pull request event: 11
- Fork event: 39
Last Year
- Issues event: 8
- Watch event: 298
- Issue comment event: 29
- Push event: 7
- Pull request review event: 6
- Pull request event: 11
- Fork event: 39
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| arunkannawadi | k****j@g****m | 216 |
| Panos Ipeirotis | i****s@g****m | 94 |
| silvavn | 3****n | 54 |
| Dimitris Mylonopoulos | d****s@p****m | 38 |
| Steven A. Cholewiak | s****k@g****m | 31 |
| Luciano Bello | l****o@d****g | 22 |
| Stefan Tauner | s****r@g****t | 19 |
| Programize Admin | 3****n | 10 |
| Luciano Bello | b****o@c****e | 9 |
| Steve Cholewiak | s****n@c****m | 6 |
| Wei (Wayne) Hu | w****u@d****u | 6 |
| Matthew | m****u@g****m | 6 |
| Tom Brien | t****m@b****k | 4 |
| Francisco Knebel | f****l@g****m | 3 |
| jonasengelmann | j****b@g****m | 3 |
| Steve Cholewiak | s****k@f****u | 2 |
| Panagiotis Georgakopoulos | p****g@p****m | 2 |
| Leopold Talirz | l****z@g****m | 2 |
| Remi Rampin | r@r****m | 2 |
| Bedanec | p****1@g****m | 2 |
| nikitabalabin | n****a@m****u | 1 |
| mmontevil | 7****l | 1 |
| cdacosta | c****f@g****m | 1 |
| Santiago Castro | b****t@m****y | 1 |
| Roberto Natella | 2****a | 1 |
| Javier Martinez Lizama | j****z@b****m | 1 |
| Pablo Prietz | p****o@p****g | 1 |
| Matthew Pfeiffer | s****l@g****m | 1 |
| Masataro Asai | g****8@g****m | 1 |
| Mark Abspoel | m****l@m****l | 1 |
| and 10 more... | | |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 98
- Total pull requests: 66
- Average time to close issues: about 2 months
- Average time to close pull requests: about 2 months
- Total issue authors: 74
- Total pull request authors: 23
- Average comments per issue: 3.15
- Average comments per pull request: 0.52
- Merged pull requests: 53
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 11
- Pull requests: 14
- Average time to close issues: N/A
- Average time to close pull requests: 4 months
- Issue authors: 11
- Pull request authors: 8
- Average comments per issue: 0.36
- Average comments per pull request: 0.5
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- arunkannawadi (11)
- frafra (4)
- ipeirotis (3)
- mzhukovs (3)
- tigerjack (2)
- stanleyrhodes (2)
- pannone (2)
- hp0404 (2)
- NisoD (2)
- kostrykin (2)
- gboeing (1)
- HowardZJU (1)
- timapage (1)
- zhjwy9343 (1)
- TingxunShi (1)
Pull Request Authors
- arunkannawadi (32)
- dlebedinsky (3)
- Luen (2)
- nkxxll (2)
- jjshoots (2)
- DLu (2)
- yarikoptic (2)
- brokenjade3000 (2)
- ltalirz (2)
- cyyever (2)
- amchagas (2)
- tZimmermann98 (2)
- ssdv1 (1)
- ipeirotis (1)
- ma-ji (1)
Packages
- Total packages: 2
- Total downloads:
  - pypi: 116,076 last month
- Total docker downloads: 191,396,802
- Total dependent packages: 16 (may contain duplicates)
- Total dependent repositories: 252 (may contain duplicates)
- Total versions: 68
- Total maintainers: 4
pypi.org: scholarly
Simple access to Google Scholar authors and citations
- Homepage: https://github.com/scholarly-python-package/scholarly
- Documentation: https://scholarly.readthedocs.io/
- License: Unlicense
- Latest release: 1.7.11, published almost 3 years ago
Maintainers (4)
conda-forge.org: scholarly
- Homepage: https://github.com/scholarly-python-package/scholarly
- License: Unlicense
- Latest release: 1.7.3, published about 3 years ago
Dependencies
- actions/checkout v2 composite
- github/codeql-action/analyze v1 composite
- github/codeql-action/autobuild v1 composite
- github/codeql-action/init v1 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- codecov/codecov-action v2 composite
- actions/checkout main composite
- actions/setup-python v1 composite
- pypa/gh-action-pypi-publish master composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- codecov/codecov-action v2 composite
- coverage * development
- flake8 * development
- pandas * development
- sphinx_rtd_theme * development
- arrow *
- beautifulsoup4 *
- bibtexparser *
- deprecated *
- fake_useragent *
- free-proxy *
- httpx *
- python-dotenv *
- requests *
- selenium *
- stem *
- arrow *
- beautifulsoup4 *
- bibtexparser *
- deprecated *
- fake_useragent *
- free-proxy *
- httpx *
- python-dotenv *
- requests *