https://github.com/kuleuven-cosic/formguard

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.1%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: KULeuven-COSIC
  • License: other
  • Language: JavaScript
  • Default Branch: main
  • Size: 21.6 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

Formguard

Folder content

The src folder contains the code for Formguard's record-replay and crawler components. The codegensrecordings folder contains the 75 recordings used for the long-term robustness test, divided into a donation folder and a login folder. The notebooks folder contains the notebooks used for the final analysis. The inputbackup folder contains the site lists used in the crawls. The leak_detector folder contains the detector used to analyze the crawl results.

Running the crawler

Run from the src folder with either:

    python crawler_main.py -create <path to config>   # Create a default config file at the specified path

or

    python crawler_main.py -config <path to config>   # Run a crawl with the specified config file
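When crawls are launched from another script, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of Formguard: the helper name `crawler_command` is hypothetical, and only the `-create` and `-config` flags are taken from this README.

```python
import sys


def crawler_command(action, config_path, python=sys.executable):
    """Build the argument list for crawler_main.py.

    `action` is "create" (write a default config) or "config" (run a crawl),
    matching the -create and -config flags described in the README.
    This helper is hypothetical and only assembles the command.
    """
    if action not in ("create", "config"):
        raise ValueError(f"unsupported action: {action!r}")
    return [python, "crawler_main.py", f"-{action}", config_path]
```

The returned list can be passed to `subprocess.run(..., cwd="src", check=True)` so that the script runs from the src folder, as required above.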

The config file contains the following options:

  • url: (str) URL to visit for a single-site crawl.
  • list: (str) path to a file containing the list of URLs to visit for a multi-site crawl. Each line is a separate site.

  • headless: (bool) option to run in headless (true) or headed mode (false).
  • windowposx: (int) x coordinate of the window in headed mode. (Can be placed off screen.)
  • windowposy: (int) y coordinate of the window in headed mode. (Can be placed off screen.)

  • mode: (int) crawl mode:
      0: full crawl
      1: fill input on a limited number of pages, specified by the "amount" value
      2: only interact with the landing page
      3: record and replay an interaction
      4: replay an interaction
      5: compare the detection of Fathom and the simplified model

  • amount: (int) number of pages to fill for mode 1

  • cpu_slice: (float) fraction of cores to use out of the maximum when crawling a list of sites

  • crawlmaxduration: (int) maximum time in seconds of a single site visit before a timeout is triggered

  • output_path: (str) folder where the output will be stored.

  • record_full: (bool) whether to record additional data such as script information.

  • resume: (bool) if true, will check the files at the output path and skip the already completed sites found there when running a new crawl.

  • screenshot: (bool) whether to take a screenshot before and after interacting with the cookie consent dialog.

  • video: (bool) whether to record the visit to the page as a video.

  • pierce: (bool) whether to force newly created shadow roots to be open (and therefore accessible) instead of closed.

  • accept_cookies: (bool) whether to accept cookies or ignore them when crawling automatically.

  • replay_path: (str) path to the file/folder for replaying an interaction.

  • recordwithfirefox: (bool) whether to record interactions with Chrome (default) or Firefox. Some interactions are not picked up in Chrome.

  • replay_multiple: (bool) whether to replay multiple files at once.

  • waitforclose: (bool) whether to automatically close the window after completing the replay.

  • clear_fields: (bool) whether to clear the fields before starting to fill them.
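The option list above lends itself to a quick sanity check before starting a long crawl. The sketch below is an illustration only and not part of Formguard: it assumes the config has already been loaded into a Python dict (the README does not state the on-disk format), covers only a subset of the documented options, and the helpers `validate_config` and `worker_count` are hypothetical. In particular, the way `cpu_slice` is turned into a worker count is an assumed interpretation of "fraction of cores to use".

```python
import os

# Subset of the option names and types documented above.
# Options not listed here are ignored by the check.
OPTION_TYPES = {
    "url": str,
    "list": str,
    "headless": bool,
    "mode": int,
    "amount": int,
    "cpu_slice": float,
    "crawlmaxduration": int,
    "output_path": str,
    "resume": bool,
    "replay_path": str,
}


def _matches(value, expected):
    """Return True if value is acceptable for the expected option type."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool is a subclass of int, so reject it explicitly
    if expected is float:
        return isinstance(value, (int, float))
    return isinstance(value, expected)


def validate_config(config):
    """Return a list of problems found in config; an empty list means none."""
    problems = []
    for name, value in config.items():
        expected = OPTION_TYPES.get(name)
        if expected is not None and not _matches(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    mode = config.get("mode")
    if _matches(mode, int) and mode not in range(6):
        problems.append("mode must be an integer from 0 to 5")
    return problems


def worker_count(cpu_slice, total_cores=None):
    """Assumed reading of cpu_slice: a fraction of the cores, at least one."""
    total = total_cores or os.cpu_count() or 1
    return max(1, int(total * cpu_slice))
```

For example, `validate_config({"mode": 9})` reports that the mode is out of range, and `worker_count(0.5, total_cores=8)` yields 4 under the stated assumption.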

Owner

  • Name: KU Leuven - COSIC
  • Login: KULeuven-COSIC
  • Kind: organization

GitHub Events

Total
  • Push event: 3
  • Create event: 2
Last Year
  • Push event: 3
  • Create event: 2