Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: InspireQualityeu
  • License: other
  • Language: PHP
  • Default Branch: main
  • Size: 55.7 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

This package provides scripts that use the serpapi.com API to investigate whether organisations have a gender equality plan (GEP) in place (prevalence).

webcrawler/crawler-prevalence.php : Check whether GEPs are in place.

webcrawler/crawler.php : Extract the most relevant GEP PDFs.
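As a rough illustration of the PDF extraction step, the sketch below keeps only the search results whose link points to a PDF file. It assumes the decoded SerpAPI response carries an `organic_results` array whose entries have a `link` field. The function name `filterPdfResults` and the sample data are hypothetical, and the actual logic in crawler.php may differ.

```php
<?php
// Hypothetical helper, not taken from crawler.php.
// Returns the links of all results whose URL path ends in ".pdf".
function filterPdfResults(array $organicResults): array
{
    $pdfs = [];
    foreach ($organicResults as $result) {
        // Look only at the path so query strings such as "?download=1" are ignored.
        $path = parse_url($result['link'] ?? '', PHP_URL_PATH);
        if (is_string($path) && str_ends_with(strtolower($path), '.pdf')) {
            $pdfs[] = $result['link'];
        }
    }
    return $pdfs;
}

// Made-up sample shaped like a decoded search response (assumption).
$response = [
    'organic_results' => [
        ['title' => 'Gender Equality Plan', 'link' => 'https://example.org/docs/gep.pdf'],
        ['title' => 'About us',             'link' => 'https://example.org/about'],
    ],
];

foreach (filterPdfResults($response['organic_results']) as $link) {
    echo $link, PHP_EOL;
}
```

Matching on the URL path is only a heuristic: a PDF served from a URL without a `.pdf` suffix would be missed.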

Requirements

  • Apache or Nginx web server with shell access
  • PHP 8.1 or later
  • Composer (latest version)
  • Required PHP extensions:
    • ext-json
    • ext-mbstring
    • ext-zip
    • ext-gd
    • ext-iconv
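The requirements above can be verified before installation with a short pre-flight script. This is a minimal sketch: the function name `missingRequirements` is hypothetical and not part of the package. It relies only on PHP's built-in `version_compare` and `extension_loaded`.

```php
<?php
// Hypothetical pre-flight check, not part of the package.
// Returns a list of human-readable problems; an empty list means all checks passed.
function missingRequirements(string $minPhp, array $extensions): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, $minPhp, '<')) {
        $problems[] = "PHP $minPhp or later is required, found " . PHP_VERSION;
    }
    foreach ($extensions as $ext) {
        if (!extension_loaded($ext)) {
            $problems[] = "Missing PHP extension: $ext";
        }
    }
    return $problems;
}

// Version and extension names taken from the requirements list above.
$problems = missingRequirements('8.1.0', ['json', 'mbstring', 'zip', 'gd', 'iconv']);
echo $problems ? implode(PHP_EOL, $problems) . PHP_EOL : 'All requirements met.' . PHP_EOL;
```

Run it with the same PHP binary that will run the scripts, since the CLI and the web server can load different sets of extensions.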

Installation

  1. Clone the Repository

    git clone https://github.com/InspireQualityeu/gep-scraper.git
    cd gep-scraper/webcrawler
  2. Install Composer Dependencies

    Ensure Composer is installed. If not, download and install it from Composer's official website.

    Then, run:
    composer install
  3. Configure API Key and Variables

    Before running the scripts, update the following configuration variables:

    • API Key: Obtain your API key from SerpAPI and replace the placeholder.
    • Countries: Modify the list of countries as needed.
    • Search Terms: Update the terms according to your requirements.
  4. Run the PHP Scripts

    php crawler-prevalence.php
    php crawler.php
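To show how the three configuration values from step 3 might fit together, the sketch below builds one SerpAPI request URL per country and search term. The variable names (`$apiKey`, `$countries`, `$searchTerms`), the example values, and the function `buildSearchUrls` are all assumptions for illustration; the names used in the scripts may differ. The scripts themselves may call the API through the `serpapi/google-search-results-php` client rather than building URLs by hand.

```php
<?php
// Hypothetical configuration; the real variable names in the scripts may differ.
$apiKey      = 'YOUR_SERPAPI_KEY';                 // placeholder, replace with your own key
$countries   = ['Germany', 'Greece'];              // example values only
$searchTerms = ['"gender equality plan" filetype:pdf'];

// Build one request URL for every (country, term) combination.
function buildSearchUrls(string $apiKey, array $countries, array $terms): array
{
    $urls = [];
    foreach ($countries as $country) {
        foreach ($terms as $term) {
            // http_build_query URL-encodes the values (spaces become "+").
            $urls[] = 'https://serpapi.com/search.json?' . http_build_query([
                'engine'  => 'google',
                'q'       => "$term $country",
                'api_key' => $apiKey,
            ]);
        }
    }
    return $urls;
}

foreach (buildSearchUrls($apiKey, $countries, $searchTerms) as $url) {
    echo $url, PHP_EOL;
}
```

The number of requests grows with the product of countries and terms, so it is worth keeping both lists short while testing.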

Troubleshooting

  • If composer install fails, ensure PHP and Composer are installed correctly.
  • If missing extensions are reported, install them using your system's package manager. For example:

    • Ubuntu/Debian: sudo apt install php-mbstring php-zip php-gd php-json php-iconv
    • CentOS/RHEL: sudo yum install php-mbstring php-zip php-gd php-json php-iconv
    • Windows: Enable the extensions in php.ini and restart your web server.

Owner

  • Name: INSPIRE
  • Login: InspireQualityeu
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: inspirequality.eu web-crawler
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: INNOSYSTEMS
    city: Athens
    country: GR
    website: 'https://www.innosystems.gr'
    email: info@innosystems.gr
  - name: Mazlum Karataş
    orcid: 0000-0002-4096-0212
    affiliation: GESIS - Leibniz-Institute for the Social Sciences
    city: Cologne
    country: DE
    email: mazlum.karatas@gesis.org
repository-code: 'https://github.com/InspireQualityeu/web-crawler'
url: 'https://www.inspirequality.eu'
license: CC-BY-NC-SA-4.0

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

webcrawler/composer.json packagist
  • phpoffice/phpspreadsheet ^1.29
  • serpapi/google-search-results-php ^1.2
  • setasign/fpdi ^2.6
  • smalot/pdfparser ^2.8
  • tecnickcom/tc-lib-pdf *
  • tecnickcom/tcpdf *