googlescholar
A Python module that implements a querier and parser for Google Scholar's output.
Science Score: 28.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: scholar.google
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.2%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Google Scholar Grabber
Google Scholar Grabber is a Python module that implements a querier and parser for Google Scholar's output. Its classes can be used independently, but it can also be invoked as a command-line tool.
The script was originally written by christian@icir.org (link is here). I have changed it substantially to support new features.
Usage
- Initialize the environment:
    $ git clone https://github.com/ThreeCatsLoveFish/GoogleScholar.git
    $ cd GoogleScholar/
    $ pip install bs4 requests tqdm
- Add your cookies and proxies in data/config.json.
- Follow the given examples.
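Since every example passes `--config-file data/config.json`, it can help to confirm that the file exists and parses before running a long query. The following is a minimal sketch only: the schema of the config file is not described in this README, so the hypothetical helper `load_config` checks nothing beyond "this is valid JSON".

```python
import json
from pathlib import Path


def load_config(path="data/config.json"):
    """Return the parsed config, or raise a readable error if it is unusable.

    Hypothetical helper: it does not know which keys scholar.py expects,
    so it only verifies that the file exists and contains valid JSON.
    """
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    try:
        return json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{config_path} is not valid JSON: {exc}") from exc
```

Calling `load_config()` from the repository root before invoking scholar.py surfaces a missing or malformed file early.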
Features
- Supports the new version of citation.
- Supports retrieving all articles written by a specific author.
- Extracts publication title, most relevant web link, PDF link, number of citations, number of online versions, link to Google Scholar's article cluster for the work, Google Scholar's cluster of all works referencing the publication, and excerpt of content.
- Extracts total number of hits as reported by Scholar.
- Supports the full range of advanced query options provided by Google Scholar, such as title-only search, publication date timeframes, and inclusion/exclusion of patents and citations.
- Supports article cluster IDs, i.e., information relating to the variants of an article already identified by Google Scholar.
- Supports retrieval of citation details in standard external formats as provided by Google Scholar, including BibTeX and EndNote.
- Command-line tool prints entries in CSV format, simple plain text, or in the citation export format.
Examples
Try scholar.py --help for all available options. A few examples:
Retrieve 100 articles written by Einstein on quantum theory:
$ scholar.py -c 100 --author "albert einstein" --phrase "quantum theory" --config-file data/config.json
Title On the quantum theory of radiation
URL http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
Year 1917
Citations 184
Versions 3
Cluster ID 17749203648027613321
PDF link http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
Versions list http://scholar.google.com/scholar?cluster=17749203648027613321&hl=en&as_sdt=0,5
Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]
......
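If you want to post-process this plain-text output in Python, a small parser is enough. The sketch below assumes the field labels are exactly the ones visible in the sample above and that each article starts with a `Title` line; `FIELD_LABELS` and `parse_records` are illustrative names, not part of scholar.py. The CSV output mentioned under Features may be a more robust choice for real pipelines.

```python
# Field labels copied from the sample output; longer labels are tried first
# so that "Citations list" is not mistaken for "Citations".
FIELD_LABELS = sorted(
    ["Title", "URL", "Year", "Citations", "Versions", "Cluster ID",
     "PDF link", "Citations list", "Versions list", "Excerpt"],
    key=len,
    reverse=True,
)


def parse_records(text):
    """Split plain-text scholar.py output into one dict per article."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for label in FIELD_LABELS:
            if line == label or line.startswith(label + " "):
                # A "Title" line opens a new record.
                if label == "Title" or not records:
                    records.append({})
                records[-1][label] = line[len(label):].strip()
                break
        # Lines without a known label (such as "......") are ignored.
    return records
```

All values are kept as strings; convert `Citations` or `Year` with `int()` where needed.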
Note the cluster ID in the above. Using this ID, you can directly access the cluster of articles Google Scholar has already determined to be variants of the same paper. So, let's see the versions:
$ scholar.py -C 17749203648027613321 --config-file data/config.json
Title On the quantum theory of radiation
URL http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
Citations 184
Versions 0
Cluster ID 17749203648027613321
PDF link http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]
Title ON THE QUANTUM THEORY OF RADIATION
URL http://www.informationphilosopher.com/solutions/scientists/einstein/1917_Radiation.pdf
Citations 0
Versions 0
PDF link http://www.informationphilosopher.com/solutions/scientists/einstein/1917_Radiation.pdf
Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]
Title The Quantum Theory of Radiation
URL http://web.ihep.su/dbserv/compas/src/einstein17/eng.pdf
Citations 0
Versions 0
PDF link http://web.ihep.su/dbserv/compas/src/einstein17/eng.pdf
Excerpt 1 on the assumption that there are discrete elements of energy, from which quantum [...]
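The "Versions list" and "Citations list" links in the output follow a visible pattern built around the cluster ID. As a rough sketch, the same URLs can be rebuilt from an ID; the parameter values (`hl`, `as_sdt`, `sciodt`) are copied verbatim from the sample output above, their meaning is not documented here, and Google may change them. The function names are illustrative.

```python
from urllib.parse import urlencode

SCHOLAR_URL = "http://scholar.google.com/scholar"


def versions_url(cluster_id):
    """URL of all versions in a cluster, mirroring the 'Versions list' field."""
    params = {"cluster": cluster_id, "hl": "en", "as_sdt": "0,5"}
    # safe="," keeps the comma literal, as in the sample URLs.
    return SCHOLAR_URL + "?" + urlencode(params, safe=",")


def citations_url(cluster_id):
    """URL of all citing works, mirroring the 'Citations list' field."""
    params = {"cites": cluster_id, "as_sdt": "2005", "sciodt": "0,5", "hl": "en"}
    return SCHOLAR_URL + "?" + urlencode(params, safe=",")
```

For cluster ID 17749203648027613321 these reproduce the two links shown in the first example.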
Let's retrieve a BibTeX entry for that quantum theory paper. The best BibTeX often seems to be the one linked from search results, not those in the article cluster, so let's do a search again:
$ scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt --config-file data/config.json
@article{einstein1917quantum,
title={On the quantum theory of radiation},
author={Einstein, Albert},
journal={Phys. Z},
volume={18},
pages={121--128},
year={1917}
}
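To use such an entry programmatically, a few lines of Python can turn it into a dictionary. This is a minimal sketch that handles only the simple layout shown above (one brace-delimited field per line); it is not a general BibTeX parser, and `parse_bibtex_entry` is an illustrative name rather than something scholar.py provides.

```python
import re

# "@article{einstein1917quantum," -> entry type and citation key
ENTRY_HEAD = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# "title={...}," -> field name and value, one field per line
FIELD = re.compile(r"(\w+)\s*=\s*\{(.*?)\}\s*,?\s*$", re.MULTILINE)


def parse_bibtex_entry(text):
    """Parse a single, simply formatted BibTeX entry into a dict."""
    head = ENTRY_HEAD.search(text)
    if head is None:
        raise ValueError("no BibTeX entry found")
    entry = {"type": head.group(1).lower(), "key": head.group(2)}
    for name, value in FIELD.findall(text):
        entry[name.lower()] = value
    return entry
```

For entries with quoted values, string concatenation, or fields spanning several lines, a dedicated BibTeX library is the safer choice.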
Report the total number of articles Google Scholar has for Einstein:
$ scholar.py --txt-globals --author "albert einstein" --config-file data/config.json | grep '\[G\]' | grep Results
[G] Results 4190
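The same filtering can be done in Python when the output is captured by a script. The sketch below assumes the global line has the form shown above (`[G]`, then `Results`, then an integer); `total_results` is an illustrative helper.

```python
import re

# Matches a line such as "[G] Results 4190", tolerating extra spacing.
GLOBAL_RESULTS = re.compile(r"^\[G\]\s+Results\s+(\d+)\s*$", re.MULTILINE)


def total_results(output):
    """Return the hit count reported in the [G] Results line, or None."""
    match = GLOBAL_RESULTS.search(output)
    return int(match.group(1)) if match else None
```

Returning `None` rather than raising lets the caller decide how to treat output that lacks the line.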
Find all articles that Google Scholar lists as citing Einstein's paper "On the quantum theory of radiation":
$ scholar.py --citations-only -c 150 -a "albert einstein" --phrase "On the quantum theory of radiation" --citation bt --config-file data/config.json -o test.txt
License
Google Scholar Grabber is released under the standard BSD license.
Owner
- Name: Zhimin Sun
- Login: ThreeCatsLoveFish
- Kind: user
- Location: Shanghai Jiao Tong University
- Company: 东川路男子职业技术学院
- Repositories: 12
- Profile: https://github.com/ThreeCatsLoveFish
Nothing is true, everything is permitted.
Citation (citation_helper.py)
#! /usr/bin/env python
"""
Given a list of papers in the file `data/paper_list.txt`,
this script fetches the citations of each paper in the list.
Please make sure to update the `author` variable with your name.
"""
import os

from tqdm import tqdm

author = "Your Name"


def get_citations(citation):
    # Citations for each paper are written to data/output/<sanitized title>.bib
    os.makedirs('data/output', exist_ok=True)
    filename = f'data/output/{"_".join(citation.replace(":", "-").split(" "))}.bib'
    cmd = f'python scholar.py --citations-only -c 150 -a "{author}" --phrase "{citation}" --config-file data/config.json -o {filename}'
    print(cmd)
    # Skip papers whose output file already exists
    if not os.path.exists(filename):
        os.system(cmd)


def main():
    with open('data/paper_list.txt', 'r') as file:
        citations = file.readlines()
    # One paper title per line; ignore blank lines
    citations = [citation.strip() for citation in citations if citation.strip()]
    for citation in tqdm(citations, desc='Citations'):
        get_citations(citation)


if __name__ == '__main__':
    main()
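The helper above builds a shell string, so a paper title containing a double quote would break the command. One possible variant, sketched here under the assumption that the flags behave as in the examples, passes the arguments as a list to `subprocess.run`, which needs no shell quoting. The names `output_path`, `build_command`, and `fetch_citations` are illustrative; the sketch also uses `sys.executable` instead of `python` and collapses repeated spaces in the title, both small departures from the original helper.

```python
import subprocess
import sys
from pathlib import Path


def output_path(title, out_dir="data/output"):
    """Derive the .bib path for a title: ':' becomes '-', spaces become '_'."""
    safe = "_".join(title.replace(":", "-").split())
    return Path(out_dir) / f"{safe}.bib"


def build_command(author, title, out_file):
    """Argument list for scholar.py, using the flags from the examples."""
    return [
        sys.executable, "scholar.py",
        "--citations-only", "-c", "150",
        "-a", author,
        "--phrase", title,
        "--config-file", "data/config.json",
        "-o", str(out_file),
    ]


def fetch_citations(author, title):
    """Run scholar.py for one paper unless its output file already exists."""
    out_file = output_path(title)
    if out_file.exists():
        return out_file
    out_file.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if scholar.py exits with an error.
    subprocess.run(build_command(author, title, out_file), check=True)
    return out_file
```

Because each argument is passed separately, titles with quotes, dollar signs, or backticks reach scholar.py unchanged.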
GitHub Events
Total
- Watch event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Fork event: 1