rat-software

Streamline your search engine research. With the Result Assessment Tool (RAT) you can easily collect results from different search engines, let participants evaluate the results and analyse your findings.

https://github.com/rat-software/rat-software

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

informationretrieval python research scraping searchengines
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rat-software
  • License: gpl-3.0
  • Language: JavaScript
  • Default Branch: main
  • Homepage: https://searchstudies.org/rat
  • Size: 27.2 MB
Statistics
  • Stars: 7
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created over 3 years ago · Last pushed 8 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

RAT

The Result Assessment Tool (RAT) is a software toolkit that allows researchers to conduct large-scale studies based on results from (commercial) search engines and other information retrieval systems. It is developed by the Search Studies research group at the Hamburg University of Applied Sciences in Germany. The RAT project is funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) from 8/2021 until 10/2024, project number 460676551.

RAT Project resources

  • For detailed information about the research project and additional resources, visit: https://searchstudies.org/research/rat/
  • Information about how to contribute: https://searchstudies.org/rat-how-to-contribute/
  • An installation of RAT can be accessed at: https://rat-software.org/
  • Datasets generated using RAT and supplementary documentation can be found at: https://osf.io/t3hg9/
  • Videos from the RAT Community Meeting are available at: https://www.youtube.com/watch?v=K2Gev8C7Xxw&list=PLiTHQpIQWsZwRaDAgFTANPvI3fHMncXUO
  • Overview of the technical implementation: https://osf.io/5v48w

Contributors to RAT

  • Project Lead: Professor Dirk Lewandowski - https://github.com/dirklew
  • Lead Software Engineer and Developer: Sebastian Sünkler - https://github.com/sebsuenkler
  • Current Frontend Developer and Assistant: Tuhina Kumar - https://github.com/tuhinak
  • Former Frontend Developer: Nurce Yagci - https://github.com/yagci
  • Usability and User Experience Specialist: Sebastian Schultheiß - https://github.com/SebastianSchultheiss
  • Student Assistant for Software Engineering: Sophia Bosnak - https://github.com/kyuja
  • Developers who created extensions for RAT: https://github.com/rat-extensions
    • https://github.com/MnM3
    • https://github.com/mohamedsaeed21
    • https://github.com/g1thub-4cc0unt
    • https://github.com/Samustafa
    • https://github.com/PhilippUDE
    • https://github.com/tanveerx/
    • https://github.com/ritushetkar
    • https://github.com/EstherKuerbis/

Publications related to the project

  • Sünkler, S.; Yagci, N.; Schultheiß, S.; von Mach, S.; Lewandowski, D. (2024). Result Assessment Tool: Software to Support Studies Based on Data from Search Engines. In: Lecture Notes in Computer Science. https://link.springer.com/chapter/10.1007/978-3-031-56069-9_19
  • Sünkler, S.; Yagci, N.; Sygulla, D.; von Mach, S.; Schultheiß, S.; Lewandowski, D. (2023). Result Assessment Tool (RAT): A Software Toolkit for Conducting Studies Based on Search Results. In: Proceedings of the Association for Information Science and Technology. https://doi.org/10.1002/pra2.972
  • Schultheiß, S.; Lewandowski, D.; von Mach, S.; Yagci, N. (2023). Query sampler: generating query sets for analyzing search engines using keyword research tools. In: PeerJ Computer Science 9(e1421). http://doi.org/10.7717/peerj-cs.1421
  • Schultheiß, S.; Sünkler, S.; Yagci, N.; Sygulla, D.; von Mach, S.; Lewandowski, D. (2023). Simplify your Search Engine Research: wie das Result Assessment Tool (RAT) Studien auf der Basis von Suchergebnissen unterstützt. In: Proceedings des 17. Internationalen Symposiums für Informationswissenschaft (ISI 2023), 429-437. https://zenodo.org/records/10009338
  • Sünkler, S.; Yagci, N.; Sygulla, D.; von Mach, S.; Schultheiß, S.; Lewandowski, D. (2023). Result Assessment Tool (RAT): Software-Toolkit für die Durchführung von Studien auf der Grundlage von Suchergebnissen. In: Proceedings des 17. Internationalen Symposiums für Informationswissenschaft (ISI 2023), 438-444. https://zenodo.org/records/10009338
  • Sünkler, S.; Yagci, N.; Sygulla, D.; von Mach, S.; Schultheiß, S.; Lewandowski, D. (2022). Result Assessment Tool (RAT). Informationswissenschaft im Wandel. Wissenschaftliche Tagung 2022 (IWWT22), Düsseldorf. https://zenodo.org/records/7092079

RAT Extensions

The repository at https://github.com/rat-extensions provides an overview of extensions created by our developer community:

  • Imprint Crawler: A web crawler that automatically extracts legal notice information from websites while taking German legal aspects into account: https://github.com/rat-extensions/imprint-crawler. Developed by Marius Messer - https://github.com/MnM3
  • Readability Score: A Python tool that extracts the main text content of a web document and analyzes its readability: https://github.com/rat-extensions/readability-score. Developed by Mohamed Elnaggar - https://github.com/mohamedsaeed21
  • Forum Scraper: An extension to extract comments from German online news services: https://github.com/rat-software/forum-scraper. Developed by Paul Kirch - https://github.com/g1thub-4cc0unt
  • EILoggerBA: A browser extension for conducting interactive information retrieval studies. With this extension, study participants can work on search tasks with search engines of their choice, and both the search queries and the clicks on search results are saved: https://github.com/rat-extensions/EILoggerBA. Developed by Hossam Al Mustafa - https://github.com/Samustafa
  • Identifying affiliate links in webpages: https://github.com/rat-extensions/Identifying-affiliate-links-in-webpages. Developed by Philipp Krueger - https://github.com/PhilippUDE
  • App Reviews Scraper: https://github.com/rat-extensions/app-reviews-scraper. Developed by Tanveer Ahmed - https://github.com/tanveerx/
  • Visualizations of IR measures: https://github.com/rat-extensions/ir-evaluation. Developed by Ritu Suhas Shetkar - https://github.com/ritushetkar
  • Scraping News Articles: https://github.com/rat-extensions/NewsArticlesScraper. Developed by Esther von der Weiden - https://github.com/EstherKuerbis/

Installation of RAT

The source code consists of two individual applications:

  1. Web Interface (frontend)
  2. Server backend (backend)

RAT runs on Python and uses a PostgreSQL database; the web interface is a Flask app. You can install both applications on one server or split them across servers to share the workload, e.g. two backends for scraping on one server and the Flask app on another.

To set up your own version of RAT, you need to clone the repository and follow these steps:

Set up the database for all applications

  • Download and install PostgreSQL
  • Import the database:

        (rat-demo) > createdb -T template0 dbname
        (rat-demo) > psql dbname < install_database/rat-db-install.sql
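
The two import commands above can also be driven from Python, which is convenient when the setup is scripted. The following is only a sketch, not part of RAT: the function names and the database name `rat_db` are hypothetical, and it uses `psql -f` to read the same SQL file instead of shell redirection. By default it performs a dry run and only returns the commands it would execute.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_import_commands(dbname: str, dump: Path) -> list[list[str]]:
    """Return the createdb and psql invocations for importing the dump."""
    return [
        # Create an empty database from template0, as in the README step.
        ["createdb", "-T", "template0", dbname],
        # -f makes psql read the SQL file instead of using `< file`.
        ["psql", "-d", dbname, "-f", str(dump)],
    ]


def import_database(dbname: str, dump: Path, dry_run: bool = True) -> list[str]:
    """Return the commands as strings; run them only if dry_run is False."""
    printable = []
    for cmd in build_import_commands(dbname, dump):
        printable.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)
    return printable


if __name__ == "__main__":
    # "rat_db" is a placeholder database name chosen for this example.
    dump_file = Path("install_database/rat-db-install.sql")
    for line in import_database("rat_db", dump_file):
        print(line)
```

Calling `import_database(..., dry_run=False)` requires the PostgreSQL client tools on the `PATH` and a user that is allowed to create databases.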

Install Python

Installation of dependencies for both applications on one server:

  • Create a virtual environment

        python -m venv venv_rat
        source venv_rat/bin/activate

  • Install Python packages from the requirements.txt in the root folder:

        python -m pip install --no-cache-dir -r requirements.txt

Set up the web interface (Flask Application / Frontend)

Access the documentation for the frontend at: https://searchstudies.org/rat-frontend-documentation/

  • Create a virtual environment

        python -m venv venv_rat_frontend
        source venv_rat_frontend/bin/activate

  • Install Python packages from /frontend/requirements.txt:

        python -m pip install --no-cache-dir -r requirements.txt
  • Add your own data to the config file config.py:

| Setting | Example |
| ---- | ---- |
| SQLALCHEMY_DATABASE_URI | 'postgresql://USERNAME:PASSWORD@SERVER/DBNAME' |
| SECRET_KEY | How to generate |
| SECURITY_PASSWORD_SALT | How to generate |
| MAIL_SERVER | server.domain.de |
| MAIL_USERNAME | name@mail.de |
| MAIL_PASSWORD | password |

  • Google Mail no longer allows third-party apps to send mail. If you have no other mail address, you can use Mailtrap.
  • Start Flask:

        export FLASK_APP=rat.py
        flask run
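
The settings in the table above could be filled in along the following lines. This is a hedged sketch, not RAT's actual config.py. It assumes the settings use the usual Flask spelling with underscores (`SQLALCHEMY_DATABASE_URI`, `SECRET_KEY`, and so on). The helper functions and the `RAT_*` environment variable names are hypothetical. Random values come from the standard library's `secrets` module.

```python
import os
import secrets
from urllib.parse import quote


def generate_secret(nbytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    return secrets.token_hex(nbytes)


def database_uri(user: str, password: str, server: str, dbname: str) -> str:
    """Build a URI in the 'postgresql://USERNAME:PASSWORD@SERVER/DBNAME' form.

    User and password are percent-encoded so that characters such as '@'
    or '/' cannot be mistaken for URI delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{server}/{dbname}"
    )


# Hypothetical environment variable names; the fallbacks are placeholders.
SQLALCHEMY_DATABASE_URI = database_uri(
    os.environ.get("RAT_DB_USER", "USERNAME"),
    os.environ.get("RAT_DB_PASSWORD", "PASSWORD"),
    os.environ.get("RAT_DB_SERVER", "SERVER"),
    os.environ.get("RAT_DB_NAME", "DBNAME"),
)

# Generate these once and store them: a key regenerated on every start
# would invalidate existing sessions.
SECRET_KEY = os.environ.get("RAT_SECRET_KEY") or generate_secret()
SECURITY_PASSWORD_SALT = os.environ.get("RAT_PASSWORD_SALT") or generate_secret(16)

MAIL_SERVER = "server.domain.de"
MAIL_USERNAME = "name@mail.de"
MAIL_PASSWORD = os.environ.get("RAT_MAIL_PASSWORD", "password")
```

Reading credentials from the environment keeps them out of version control. Hard-coding them in config.py, as the table suggests, works equally well for a local test installation.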

Result Assessment Tool (RAT) Backend application.

Access the documentation for the backend at: https://searchstudies.org/rat-backend-documentation/

Setting Up the Server Backend

  1. Install Google Chrome
    Ensure that Google Chrome is installed on your system. You can download it from here.

  2. Copy Backend Files
    Transfer all files from the backend directory to your server.

  3. Set Up a Virtual Environment
    It is highly recommended to set up the backend in a virtual environment. Create and activate one with the following commands:

        python -m venv venv_rat_backend
        source venv_rat_backend/bin/activate

  4. Install Dependencies
    Install the required packages from the requirements.txt file located in the backend directory:

        python -m pip install --no-cache-dir -r requirements.txt

  5. Initialize SeleniumBase
    Run the script initializeseleniumbase.py to download the latest WebDriver:

        python initializeseleniumbase.py

The RAT backend application consists of three sub-applications, which can be installed separately for better resource management. However, installing all sub-applications on one server is generally recommended.

Backend Applications

  • classifier: A toolkit for using and adding classifiers based on data provided by RAT.
  • scraper: A library for scraping search engines.
  • sources: A library for scraping content from URLs.

Configuration

All applications share the /config/ folder, which contains JSON files for configuring:

  • Database Connection: config_db.ini
  • Scraping Options: config_sources.ini
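
Since the configuration files are described as JSON, a small loader can catch a missing setting before the backend starts. The sketch below is an illustration only. The function name is hypothetical, and the README does not list the keys stored in these files, so any key names passed to `required` are assumptions.

```python
import json
from pathlib import Path


def load_json_config(path, required=()):
    """Parse a JSON config file and check that the required keys exist."""
    path = Path(path)
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Collect every missing key so the error names all of them at once.
    missing = [key for key in required if key not in data]
    if missing:
        raise KeyError(f"{path.name} is missing keys: {', '.join(missing)}")
    return data
```

A call such as `load_json_config("config/config_db.ini", required=("dbname", "user"))` would then fail early with a readable message. Here `dbname` and `user` are placeholder key names, not RAT's documented schema.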

Running the Backend Application

  • The backend applications use APScheduler to run in the background. To start all services simultaneously, use:

        nohup python backend_controller_start.py &
  • Alternatively, each application has its own controller if you prefer to run them separately on different machines.
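
To illustrate what running jobs in the background on a schedule means here, the sketch below runs a set of jobs at a fixed interval. It is not RAT's controller code and it does not use the scheduler library that RAT relies on. As a stand-in it uses the standard library's `sched` module, and the job names simply mirror the three backend applications.

```python
import sched
import time


def run_periodically(jobs, interval, rounds):
    """Run every job once per interval for a fixed number of rounds.

    jobs maps a name to a zero-argument callable; the return value is a
    list of (name, result) tuples in execution order.
    """
    log = []
    if rounds <= 0:
        return log
    scheduler = sched.scheduler(time.monotonic, time.sleep)

    def tick(remaining):
        for name, job in jobs.items():
            log.append((name, job()))
        # Re-schedule this function until all rounds have been used up.
        if remaining > 1:
            scheduler.enter(interval, 1, tick, (remaining - 1,))

    scheduler.enter(0, 1, tick, (rounds,))
    scheduler.run()  # blocks until no scheduled events remain
    return log


if __name__ == "__main__":
    # Placeholder jobs standing in for the three backend applications.
    demo_jobs = {
        "scraper": lambda: "scraped",
        "sources": lambda: "fetched",
        "classifier": lambda: "classified",
    }
    for entry in run_periodically(demo_jobs, interval=0.1, rounds=2):
        print(entry)
```

Unlike this blocking loop, a background scheduler returns control to the caller while the jobs keep running, which is why the README command combines the controller with `nohup` and `&`.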

Owner

  • Name: RAT
  • Login: rat-software
  • Kind: user
  • Location: Hamburg, Germany

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Result Assessment Tool (RAT)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Dirk
    family-names: Lewandowski
    email: dirk.lewandowski@dmi-haw-hamburg.de
    affiliation: HAW Hamburg
    orcid: 'https://orcid.org/0000-0002-2674-9509'
  - given-names: Sebastian
    family-names: Sünkler
    email: sebastian.suenkler@dmi-haw-hamburg.de
    affiliation: HAW Hamburg
    orcid: 'https://orcid.org/0000-0001-9848-1137'
  - given-names: Nurce
    family-names: Yagci
    orcid: 'https://orcid.org/0000-0001-9634-5916'
    email: nurce.yagci@dmi-haw-hamburg.de
identifiers:
  - type: doi
    value: 10.17605/OSF.IO/25UAT
repository-code: 'https://github.com/rat-software'
url: 'http://rat-software.org/'
abstract: >-
  The Result Assessment Tool (RAT) is a software toolkit
  that allows researchers to conduct large-scale studies
  based on results from (commercial) search engines and
  other information retrieval systems. It consists of
  modules for (1) designing studies, (2) collecting data
  from search systems, (3) collecting judgments on the
  results, (4) downloading/analysing the results.
keywords:
  - search engines
  - scraper
  - information retrieval
  - information retrieval evaluation
  - search engine data
license: GPL-3.0

GitHub Events

Total
  • Release event: 1
  • Watch event: 4
  • Member event: 1
  • Push event: 8
  • Fork event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 4
  • Member event: 1
  • Push event: 8
  • Fork event: 1
  • Create event: 1