citation_parser

Automatic citation parsing from Google Scholar and ResearchGate for creating GitHub badges

https://github.com/mmhs013/citation_parser

Science Score: 28.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to researchgate.net, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (3.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: mmhs013
Language: Python
Default Branch: main
Size: 76.2 KB

Statistics

Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created about 5 years ago · Last pushed almost 4 years ago

Metadata Files

Readme Citation

Automatic Citation Parser

This repository automatically parses citation counts from Google Scholar and ResearchGate and creates GitHub badges from them. The targeted papers from the Google Scholar and ResearchGate profiles are listed below:

|Profile|Paper|GitHub Page|Citation Badge|
|-----|------|------|------|
|Google Scholar|pyMannKendall: a python package for non parametric Mann Kendall family of trend tests.|pyMannKendall||
|ResearchGate|pyMannKendall: a python package for non parametric Mann Kendall family of trend tests.|pyMannKendall||
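The badges in the table are generated through shields.io (see Citation_Parser.py). As a minimal sketch of how such a static badge URL can be assembled from a citation count, the helper below reproduces the URL pattern used in the script. The function names `escape_badge_text` and `citation_badge_url` are hypothetical and not part of the repository; the escaping step assumes the shields.io static badge convention that a literal dash or underscore in a path segment is written doubled.

```python
from urllib.parse import quote, urlencode


def escape_badge_text(text: str) -> str:
    """Escape text for one path segment of a shields.io static badge URL."""
    # Assumption: shields.io uses "-" as the segment separator and "_" as a
    # space, so literal dashes and underscores are doubled before quoting.
    escaped = text.replace("-", "--").replace("_", "__")
    return quote(escaped, safe="")


def citation_badge_url(count: int, logo: str = "google-scholar",
                       color: str = "3388ee") -> str:
    """Build a badge URL following the pattern used in Citation_Parser.py."""
    query = urlencode({"logo": logo, "labelColor": "4f4f4f", "color": color})
    return "https://img.shields.io/badge/{label}-{message}-_.svg?{query}".format(
        label=escape_badge_text("Citations"),
        message=escape_badge_text(str(count)),
        query=query,
    )


if __name__ == "__main__":
    print(citation_badge_url(42))
```

Passing `logo="researchgate"` and `color="00bb88"` would give the ResearchGate variant that is commented out in the script.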

Owner

Name: Md. Manjurul Hussain Shourov
Login: mmhs013
Kind: user

Website: https://www.researchgate.net/profile/Md_Manjurul_Shourov
Repositories: 4
Profile: https://github.com/mmhs013

Citation (Citation_Parser.py)

# import urllib.request
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Google Scholar Profile Parsing
gs_profile_link = 'https://scholar.google.com/citations?user=ub2WBpoAAAAJ'

page = requests.get(gs_profile_link).text
soup = BeautifulSoup(page,"html.parser")

total_cite = int(soup.findAll('td',{'class':'gsc_rsb_std'})[0].text)

div = soup.findAll('tr',{'class':'gsc_a_tr'})

# Collect one record per publication row, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0).
records = []
for row in div:
    gray = row.findAll('div',{'class':'gs_gray'})
    records.append({
        'Name' : row.findAll('a',{'class':'gsc_a_at'})[0].text,
        'Authors' : gray[0].text,
        # Drop the last 6 characters (presumably the trailing ", YYYY" year suffix)
        'Publisher' : gray[1].text[:-6],
        'Year' : row.findAll('td',{'class':'gsc_a_y'})[0].text,
        'Citation' : row.findAll('td',{'class':'gsc_a_c'})[0].text,
    })

gs_profile_papers = pd.DataFrame(records)

gs_pymk_cite = int(gs_profile_papers[gs_profile_papers.Name == 'pyMannKendall: a python package for non parametric Mann Kendall family of trend tests.'].Citation.iloc[0])


# ResearchGate parsing is currently not working due to Cloudflare protection
# # ResearchGate pyMannKendall paper citation parsing
# rg_profile_link = 'https://www.researchgate.net/publication/334688255_pyMannKendall_a_python_package_for_non_parametric_Mann_Kendall_family_of_trend_tests/citations'

# page = requests.get(rg_profile_link).text
# soup = BeautifulSoup(page,"html.parser")

# rg_pymk_cite = soup.findAll('div',{'class':'nova-legacy-e-text nova-legacy-e-text--size-m nova-legacy-e-text--family-sans-serif nova-legacy-e-text--spacing-none nova-legacy-e-text--color-inherit nova-legacy-c-nav__item-label'})[0].text
# rg_pymk_cite = int(rg_pymk_cite.replace('Citations (','').replace(')',''))


# Create badges via shields.io
badge_link = {
    'gs_pymk_cite' : "https://img.shields.io/badge/Citations-{cite}-_.svg?logo=google-scholar&labelColor=4f4f4f&color=3388ee".format(cite = gs_pymk_cite),
    # 'rg_pymk_cite' : "https://img.shields.io/badge/Citations-{cite}-_.svg?logo=researchgate&labelColor=4f4f4f&color=00bb88".format(cite = rg_pymk_cite),
}

for name, url in badge_link.items():
    with open('images/' + name + '.svg', 'wb') as f:
        f.write(requests.get(url).content)
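The script above converts scraped cell text straight to `int`, which raises `ValueError` if a cell is empty or carries extra characters. A more tolerant conversion could look like the sketch below. It is an illustration only: `parse_count` is a hypothetical helper, and the sample inputs are assumed shapes of scraped text, not captured output from either site.

```python
import re


def parse_count(text: str) -> int:
    """Return the first integer found in text, or 0 when there is none."""
    # Remove thousands separators so "1,024" is read as a single number.
    match = re.search(r"\d+", text.replace(",", ""))
    return int(match.group()) if match else 0


if __name__ == "__main__":
    # Assumed examples: a plain count, a count with a trailing mark,
    # an empty cell, and the "Citations (N)" label format handled in
    # the commented-out ResearchGate code.
    for sample in ["128", "128*", "", "Citations (57)"]:
        print(repr(sample), "->", parse_count(sample))
```

Returning 0 for text without digits is a design choice for badge generation; a caller that needs to distinguish "no citations" from "page layout changed" may prefer to raise an error instead.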

GitHub Events

Total

Fork event: 1

Last Year

Fork event: 1

Dependencies

requirements.txt pypi

beautifulsoup4 *
pandas *
requests *
