https://github.com/kuleuven-cosic/formguard
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: KULeuven-COSIC
- License: other
- Language: JavaScript
- Default Branch: main
- Size: 21.6 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Formguard
Folder content
The src folder contains the code for Formguard's record-and-replay tool and crawler. The codegensrecordings folder contains the 75 recordings used for the long-term robustness test, divided into a donation folder and a login folder. The notebooks folder contains the notebooks used for the final analysis. The inputbackup folder contains the site lists used in the crawls. The leak_detector folder contains the detector used to analyze the crawl results.
Running the crawler
Run from the src folder with either:
python crawler_main.py -create <path to config> # Create a default config file to specified path
or
python crawler_main.py -config <path to config> # Run crawl with specified config file
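The two invocations above can also be scripted, for example to create a config and start a crawl from another tool. The following is a minimal sketch, assuming only the `-create` and `-config` flags shown above; `build_crawler_command`, `run_crawler`, and the `src_dir` default are hypothetical names introduced here for illustration and are not part of Formguard.

```python
import subprocess
import sys

# The two flags documented above; nothing else is assumed about the CLI.
ACTIONS = {"create": "-create", "run": "-config"}


def build_crawler_command(action, config_path):
    """Return the argv list for one of the two documented invocations.

    `action` is "create" (write a default config) or "run" (start a crawl).
    Hypothetical helper: it only assembles the command line.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {sorted(ACTIONS)}")
    return [sys.executable, "crawler_main.py", ACTIONS[action], str(config_path)]


def run_crawler(action, config_path, src_dir="src"):
    """Run the command with src_dir as working directory.

    The crawler is documented as being run from the src folder, so a relative
    config_path is resolved against src_dir, not the caller's directory.
    """
    return subprocess.run(build_crawler_command(action, config_path), cwd=src_dir, check=True)
```

Calling `run_crawler("create", "my_config.json")` followed by `run_crawler("run", "my_config.json")` would mirror the two commands above; the config path here is only a placeholder.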
The config file contains the following options:
- url: (str) URL to visit for a single-site crawl.
- list: (str) path to a file containing the list of URLs to visit for a multi-site crawl. Each line is a separate site.
- headless: (bool) whether to run in headless mode (true) or headed mode (false).
- windowposx: (int) x coordinate of the window in headed mode. (Can be placed off screen.)
- windowposy: (int) y coordinate of the window in headed mode. (Can be placed off screen.)
- mode: (int) crawl mode:
  - 0: full crawl
  - 1: fill input on a limited number of pages, specified by the "amount" value
  - 2: only interact with the landing page
  - 3: record and replay an interaction
  - 4: replay an interaction
  - 5: compare the detection of Fathom and the simplified model
- amount: (int) number of pages to fill for mode 1.
- cpu_slice: (float) fraction of the available cores to use when crawling a list of sites.
- crawlmaxduration: (int) maximum time in seconds of a single site visit before a timeout is triggered.
- output_path: (str) folder where the output will be stored.
- record_full: (bool) whether or not to record additional data such as script information.
- resume: (bool) if true, checks the files at the output path and skips the already completed sites found there when running a new crawl.
- screenshot: (bool) whether or not to take a screenshot before and after interacting with the cookie consent dialog.
- video: (bool) whether or not to record the visit to the page as a video.
- pierce: (bool) whether or not to force created shadow roots to be open and accessible instead of closed.
- accept_cookies: (bool) whether to accept cookies or ignore them when automatically crawling.
- replay_path: (str) path to the file or folder for replaying an interaction.
- recordwithfirefox: (bool) whether to record interactions with Firefox instead of Chrome (the default). Certain interactions are sometimes not picked up in Chrome.
- replay_multiple: (bool) whether or not to replay multiple files at once.
- waitforclose: (bool) whether or not to automatically close the window after completing the replay.
- clear_fields: (bool) whether or not to clear the fields before starting to fill them.
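Because the options above are typed, a config can be sanity-checked before starting a long crawl. The sketch below is not part of Formguard: `check_config`, `OPTION_TYPES`, and `EXAMPLE_CONFIG` are hypothetical names, the on-disk format is assumed to be JSON (the README does not state it), and the example values are illustrative rather than the defaults written by `-create`.

```python
import json

# Option names and types taken from the list above.
# Assumption: windowposx/windowposy are coordinates, so they are checked as integers here.
OPTION_TYPES = {
    "url": str, "list": str, "headless": bool, "windowposx": int,
    "windowposy": int, "mode": int, "amount": int, "cpu_slice": float,
    "crawlmaxduration": int, "output_path": str, "record_full": bool,
    "resume": bool, "screenshot": bool, "video": bool, "pierce": bool,
    "accept_cookies": bool, "replay_path": str, "recordwithfirefox": bool,
    "replay_multiple": bool, "waitforclose": bool, "clear_fields": bool,
}


def _matches(value, expected):
    """Type check that does not let bool pass as a number (bool subclasses int)."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if expected is float:
        return isinstance(value, (int, float))  # accept 1 as well as 1.0
    return isinstance(value, expected)


def check_config(config):
    """Return a list of problems found in `config`; an empty list means none."""
    problems = []
    for name, value in config.items():
        expected = OPTION_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown option: {name}")
        elif not _matches(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    mode = config.get("mode")
    if _matches(mode, int) and not 0 <= mode <= 5:
        problems.append("mode should be between 0 and 5")
    return problems


# Illustrative values only: a headless visit that interacts with the landing page (mode 2).
EXAMPLE_CONFIG = {
    "url": "https://example.com",
    "headless": True,
    "mode": 2,
    "crawlmaxduration": 120,
    "output_path": "output",
    "resume": False,
}


def load_and_check(path):
    """Load a config file, assuming it is JSON, and return (config, problems)."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    return config, check_config(config)
```

Here `check_config(EXAMPLE_CONFIG)` returns an empty list, while a config with `"mode": 9` or `"headless": "yes"` yields one problem each. Rejecting booleans for numeric options is a deliberate choice in this sketch, since Python would otherwise accept `True` as the integer 1.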
Owner
- Name: KU Leuven - COSIC
- Login: KULeuven-COSIC
- Kind: organization
- Repositories: 19
- Profile: https://github.com/KULeuven-COSIC
GitHub Events
Total
- Push event: 3
- Create event: 2
Last Year
- Push event: 3
- Create event: 2