gep-scraper
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.5%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: InspireQualityeu
- License: other
- Language: PHP
- Default Branch: main
- Size: 55.7 KB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Created almost 2 years ago · Last pushed 12 months ago
Metadata Files
Readme
License
Citation
README.md
This package provides scripts that use the serpapi.com API to research whether organisations have a gender equality plan (GEP) in place or not (prevalence).
- webcrawler/crawler-prevalence.php: checks whether GEPs are in place.
- webcrawler/crawler.php: extracts the most relevant GEP PDFs.
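The README does not document how the two scripts decide what counts as a GEP. As a rough sketch only, assuming SerpApi's Google engine JSON shape (an `organic_results` array whose entries carry `title`, `link` and `snippet`), a prevalence check and a PDF filter could look like this. The function names and the default keyword are hypothetical, not taken from the repository.

```php
<?php
/**
 * Hypothetical prevalence heuristic: true if any organic result mentions
 * one of the given lower-case keywords in its title or snippet.
 *
 * @param array<string, mixed> $response Decoded SerpApi JSON response.
 * @param list<string>         $keywords Lower-case phrases to look for.
 */
function hasGepEvidence(array $response, array $keywords = ['gender equality plan']): bool
{
    foreach ($response['organic_results'] ?? [] as $result) {
        // ASCII case folding; mb_strtolower() would also fold accented letters.
        $text = strtolower(($result['title'] ?? '') . ' ' . ($result['snippet'] ?? ''));
        foreach ($keywords as $keyword) {
            if (str_contains($text, $keyword)) {
                return true;
            }
        }
    }
    return false;
}

/**
 * Hypothetical PDF filter: keep only result links whose URL path ends in .pdf.
 *
 * @param array<string, mixed> $response Decoded SerpApi JSON response.
 * @return list<string>
 */
function pdfLinks(array $response): array
{
    $links = [];
    foreach ($response['organic_results'] ?? [] as $result) {
        $link = $result['link'] ?? '';
        // Check the path only, so a query string such as ?v=2 does not hide the extension.
        $path = (string) parse_url($link, PHP_URL_PATH);
        if (str_ends_with(strtolower($path), '.pdf')) {
            $links[] = $link;
        }
    }
    return $links;
}
```

Checking the URL path rather than the raw link is a deliberate choice: many institutional sites append version or download parameters to PDF URLs.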
Requirements
- Apache or Nginx web server with shell access
- PHP 8.1 or later
- Composer (latest version)
- Required PHP extensions:
  - ext-json
  - ext-mbstring
  - ext-zip
  - ext-gd
  - ext-iconv
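To confirm these requirements before running composer install, a short PHP CLI check could be used. This is a sketch, not part of the repository; it relies only on PHP's built-in `extension_loaded()` function and `PHP_VERSION_ID` constant, and the function name `missingExtensions` is hypothetical.

```php
<?php
/**
 * Return the required extensions that are not loaded in this PHP build.
 *
 * @param list<string> $required Extension names as listed by `php -m`.
 * @return list<string>
 */
function missingExtensions(array $required): array
{
    return array_values(array_filter(
        $required,
        static fn (string $ext): bool => !extension_loaded($ext)
    ));
}

// PHP 8.1 corresponds to PHP_VERSION_ID 80100.
$versionOk = PHP_VERSION_ID >= 80100;
$missing = missingExtensions(['json', 'mbstring', 'zip', 'gd', 'iconv']);

echo 'PHP ' . PHP_VERSION . ($versionOk ? ' (OK)' : ' (8.1 or later required)') . PHP_EOL;
echo $missing === []
    ? 'All required extensions are loaded.' . PHP_EOL
    : 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
```

Run it with the same php binary that will execute the crawler, since the CLI and the web server can load different php.ini files.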
Installation
Clone the Repository
git clone https://github.com/InspireQualityeu/gep-scraper.git
cd gep-scraper/webcrawler
Install Composer Dependencies
Ensure Composer is installed. If not, download and install it from Composer's official website.
Then, run:
composer install
Configure API Key and Variables
Before running the scripts, update the following configuration variables:
- API Key: Obtain your API key from SerpAPI and replace the placeholder.
- Countries: Modify the list of countries as needed.
- Search Terms: Update the terms according to your requirements.
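The README does not show how the countries and search terms are stored inside the scripts. Assuming they are plain PHP arrays (the names `$countries`, `$searchTerms` and `buildQueries` below are hypothetical), one way to turn them into SerpApi request parameters is to take every country/term combination. `engine`, `q`, `gl` (two-letter country code) and `api_key` are SerpApi Google Search parameters; the query template that appends the country name is an illustrative assumption.

```php
<?php
/**
 * Build one SerpApi parameter set per (country, search term) pair.
 *
 * @param string                $apiKey      Your SerpApi key.
 * @param array<string, string> $countries   Two-letter country code => country name (hypothetical layout).
 * @param list<string>          $searchTerms Phrases to search for.
 * @return list<array<string, string>>
 */
function buildQueries(string $apiKey, array $countries, array $searchTerms): array
{
    $queries = [];
    foreach ($countries as $code => $name) {
        foreach ($searchTerms as $term) {
            $queries[] = [
                'engine'  => 'google',
                // Quote the term so Google matches it as a phrase.
                'q'       => sprintf('"%s" %s', $term, $name),
                'gl'      => strtolower($code), // Google country parameter
                'api_key' => $apiKey,
            ];
        }
    }
    return $queries;
}

// Hypothetical configuration values; replace them with your own.
$apiKey      = 'YOUR_SERPAPI_KEY';
$countries   = ['DE' => 'Germany', 'GR' => 'Greece'];
$searchTerms = ['gender equality plan'];

foreach (buildQueries($apiKey, $countries, $searchTerms) as $params) {
    echo $params['gl'] . ': ' . $params['q'] . PHP_EOL;
}
```

Every pair is a separate API request, so SerpApi credit usage grows with the number of countries times the number of search terms. Each parameter array could then be handed to the serpapi/google-search-results-php client listed under Dependencies (its README shows a `GoogleSearch` class with a `get_json()` method; verify this against the installed version).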
Run the PHP Scripts
php crawler-prevalence.php
php crawler.php
Troubleshooting
- If composer install fails, ensure PHP and Composer are installed correctly.
- If missing extensions are reported, install them using your system's package manager. For example:
  - Ubuntu/Debian: sudo apt install php-mbstring php-zip php-gd php-json php-iconv
  - CentOS/RHEL: sudo yum install php-mbstring php-zip php-gd php-json php-iconv
  - Windows: Enable the extensions in php.ini and restart your web server.
Owner
- Name: INSPIRE
- Login: InspireQualityeu
- Kind: user
- Website: https://www.inspirequality.eu
- Twitter: INSPIREquality_
- Repositories: 1
- Profile: https://github.com/InspireQualityeu
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: inspirequality.eu web-crawler
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: INNOSYSTEMS
    city: Athens
    country: GR
    website: 'https://www.innosystems.gr'
    email: info@innosystems.gr
  - name: Mazlum Karataş
    orcid: 0000-0002-4096-0212
    affiliation: GESIS - Leibniz-Institute for the Social Sciences
    city: Cologne
    country: DE
    email: mazlum.karatas@gesis.org
repository-code: 'https://github.com/InspireQualityeu/web-crawler'
url: 'https://www.inspirequality.eu'
license: CC-BY-NC-SA-4.0
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2
Dependencies
webcrawler/composer.json
packagist
- phpoffice/phpspreadsheet ^1.29
- serpapi/google-search-results-php ^1.2
- setasign/fpdi ^2.6
- smalot/pdfparser ^2.8
- tecnickcom/tc-lib-pdf *
- tecnickcom/tcpdf *
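smalot/pdfparser is among the dependencies, which suggests the crawler reads text out of downloaded PDFs, but the README does not say how relevance is judged. As an illustrative sketch only, text obtained from that library (for example `(new \Smalot\PdfParser\Parser())->parseFile($path)->getText()`) could be ranked with a weighted keyword count. The function name `gepRelevanceScore`, the phrases and the weights are hypothetical.

```php
<?php
/**
 * Hypothetical relevance score: weighted count of GEP-related phrases in
 * text extracted from a PDF. Overlapping phrases are counted independently,
 * so "gender equality plan" also adds to "gender equality".
 *
 * @param string             $text    Extracted PDF text.
 * @param array<string, int> $weights Lower-case phrase => weight.
 */
function gepRelevanceScore(string $text, array $weights): int
{
    $haystack = strtolower($text); // ASCII case folding only
    $score = 0;
    foreach ($weights as $phrase => $weight) {
        $score += substr_count($haystack, (string) $phrase) * $weight;
    }
    return $score;
}

$weights = [
    'gender equality plan' => 5,
    'gender equality'      => 1,
    'action plan'          => 1,
];

$sample = 'Gender Equality Plan 2022-2025. This action plan sets gender equality targets.';
echo gepRelevanceScore($sample, $weights) . PHP_EOL; // prints 8
```

A plain count like this favours long documents; dividing by the word count, or keeping only the highest-scoring PDF per organisation, are simple refinements if that becomes a problem.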