https://github.com/climatecompatiblegrowth/scrape_eeg

Scripts to scrape all PDFs from the EEG website

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (5.2%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: ClimateCompatibleGrowth
License: MIT
Language: Python
Default Branch: main
Size: 7.81 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme License

Webscraper to archive EEG papers

Two scripts to:

- obtain the URLs of all PDFs on the EEG website
- download all the PDFs to a local folder
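The first step, collecting PDF URLs from a page, can be sketched as follows. This is a minimal illustration and not the repository's actual `scrape.py`: it uses the standard library's `html.parser` in place of beautifulsoup4 so that it runs without extra packages, and the class name `PdfLinkParser`, the function `extract_pdf_urls`, the sample HTML and the `example.org` base URL are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        is_pdf = urlparse(url).path.lower().endswith(".pdf")
        if is_pdf and url not in self.pdf_urls:
            self.pdf_urls.append(url)


def extract_pdf_urls(html, base_url):
    """Return de-duplicated PDF URLs found in an HTML string, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


# Hypothetical page content and base URL, for illustration only.
SAMPLE_HTML = """
<a href="/files/report.pdf">Report</a>
<a href="about.html">About</a>
<a href="https://example.org/docs/Paper.PDF?download=1">Paper</a>
<a href="/files/report.pdf">Report again</a>
"""

pdf_urls = extract_pdf_urls(SAMPLE_HTML, "https://example.org/publications/")
print(pdf_urls)
```

In a real run the HTML would come from an HTTP response rather than a string, and the resulting list would be handed to the download step.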

Install dependencies

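Create a conda environment containing the dependencies, replacing <env-name> with a name of your choice:

conda create -n <env-name> python=3.12 requests requests_cache beautifulsoup4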

Now run the scripts:

python scrape.py

Then:

python get_pdf.py

When both scripts have finished, you should see a folder named webscraping containing all the PDF files, a log file app.log containing debugging messages, and a file metadata.csv with the details of each file scraped from the site, including title, publication date, summary and authors.
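Once metadata.csv exists, it can be inspected with Python's `csv` module. The sketch below is an assumption-laden example: the column names `title`, `publication_date`, `summary` and `authors` are guesses based on the fields listed above, the sample rows are invented, and `load_metadata` is a hypothetical helper, so check the real header row before relying on any of it.

```python
import csv
import io

# Hypothetical sample standing in for metadata.csv; real header names may differ.
SAMPLE_CSV = """title,publication_date,summary,authors
Paper A,2021-03-01,First summary,"Smith, J.; Jones, K."
Paper B,2022-11-15,Second summary,"Lee, M."
"""


def load_metadata(handle):
    """Return the rows of a metadata CSV as a list of dictionaries."""
    return list(csv.DictReader(handle))


rows = load_metadata(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["publication_date"], row["title"], "|", row["authors"])
```

To read the real file, pass an open file handle instead of the in-memory sample, for example `open("metadata.csv", newline="")`, adding an explicit `encoding` argument if the file's encoding is known.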

Dependencies

beautifulsoup4
requests_cache

Owner

Name: Climate Compatible Growth
Login: ClimateCompatibleGrowth
Kind: organization
Location: United Kingdom

Website: www.climatecompatiblegrowth.com
Twitter: ResearchCcg
Repositories: 41
Profile: https://github.com/ClimateCompatibleGrowth
