https://github.com/climatecompatiblegrowth/scrape_eeg
Scripts to scrape all PDFs from the EEG website
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.2%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: ClimateCompatibleGrowth
- License: mit
- Language: Python
- Default Branch: main
- Size: 7.81 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
- Readme
- License
readme.md
Webscraper to archive EEG papers
Two scripts to:
- obtain the URLs of all PDFs on the EEG website
- download all the PDFs to a local folder
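The first step, collecting PDF URLs from a page, can be sketched as follows. This is not the repository's scrape.py: it assumes PDFs are linked through ordinary `<a href>` tags whose path ends in `.pdf`, and it swaps BeautifulSoup for the standard library's `html.parser` so the example runs without extra packages (with BeautifulSoup, `BeautifulSoup(html, "html.parser").find_all("a", href=True)` yields the same anchors). `PdfLinkParser`, `find_pdf_links` and the example.org URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point at PDFs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_urls: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href; compare the path part only,
        # so "file.PDF?v=2" still counts as a PDF link.
        if not href or not href.split("?")[0].lower().endswith(".pdf"):
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if url not in self._seen:  # keep the first occurrence only
            self._seen.add(url)
            self.pdf_urls.append(url)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_urls


if __name__ == "__main__":
    sample = '<a href="/papers/a.pdf">A</a> <a href="about.html">About</a>'
    print(find_pdf_links(sample, "https://www.example.org/publications/"))
    # ['https://www.example.org/papers/a.pdf']
```

A real crawl would fetch each listing page first (for example with `requests.get(url).text`) and feed the HTML to `find_pdf_links`; how the EEG site paginates its listings is not described here.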
Install dependencies
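conda create -n scrape_eeg -c conda-forge python=3.12 requests requests-cache beautifulsoup4
conda activate scrape_eeg
(-n needs an environment name; scrape_eeg is an arbitrary choice. On conda-forge, requests_cache is packaged as requests-cache.)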
Now run the scripts:
python scrape.py
Then:
python get_pdf.py
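The download step might look like the sketch below, which is not the repository's get_pdf.py. It uses the standard library's `urllib.request` instead of requests/requests_cache, saves each file into the webscraping folder named in this README, skips files already on disk (a crude substitute for, not an equivalent of, requests_cache's HTTP response caching) and reports progress through the `logging` module. `local_path_for` and `download_all` are hypothetical names.

```python
import logging
import os
from pathlib import Path
from urllib.parse import unquote, urlparse
from urllib.request import urlopen

log = logging.getLogger("get_pdf")  # hypothetical logger name


def local_path_for(url: str, folder: str = "webscraping") -> Path:
    """Map a PDF URL to a file inside the output folder.

    Uses the last path segment as the file name, so two different URLs
    ending in the same name would overwrite each other (a simplification).
    """
    name = unquote(os.path.basename(urlparse(url).path)) or "download.pdf"
    return Path(folder) / name


def download_all(urls: list[str], folder: str = "webscraping") -> None:
    Path(folder).mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = local_path_for(url, folder)
        if target.exists():  # skip files fetched on an earlier run
            log.debug("Skipping %s (already at %s)", url, target)
            continue
        try:
            with urlopen(url, timeout=30) as response:
                target.write_bytes(response.read())
            log.debug("Saved %s -> %s", url, target)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            log.error("Failed to fetch %s: %s", url, exc)
```

To route these messages into a file such as app.log, call `logging.basicConfig(filename="app.log", level=logging.DEBUG)` before `download_all(urls)`.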
After running both scripts, you should see:
- a folder webscraping containing all the PDF files
- a log file app.log containing debugging messages
- metadata.csv, which holds details of each file scraped from the site, including title, publication date, summary and authors
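A file like metadata.csv can be written and read back with the standard `csv` module. The sketch below assumes one row per paper and hypothetical column names (`title`, `publication_date`, `summary`, `authors`, `pdf_url`); the README names the fields but not the actual headers, and joining several authors with `"; "` is also an assumption.

```python
import csv
from typing import TextIO

# Hypothetical headers; the real metadata.csv may use different names.
FIELDS = ["title", "publication_date", "summary", "authors", "pdf_url"]


def write_metadata(records: list[dict], out: TextIO) -> None:
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        # Join the author list into one cell so each paper stays on one row.
        row = dict(record, authors="; ".join(record.get("authors", [])))
        writer.writerow(row)


def read_metadata(src: TextIO) -> list[dict]:
    rows = []
    for row in csv.DictReader(src):
        # Split the author cell back into a list, dropping empty entries.
        row["authors"] = [a for a in (row.get("authors") or "").split("; ") if a]
        rows.append(row)
    return rows
```

For the real file, open it with `open("metadata.csv", "w", newline="", encoding="utf-8")`; the csv module expects `newline=""` so that quoted fields containing line breaks survive a round trip.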
Dependencies
- beautifulsoup4
- requests_cache
Owner
- Name: Climate Compatible Growth
- Login: ClimateCompatibleGrowth
- Kind: organization
- Location: United Kingdom
- Website: www.climatecompatiblegrowth.com
- Twitter: ResearchCcg
- Repositories: 41
- Profile: https://github.com/ClimateCompatibleGrowth