collectress
Collectress (/kəˈlɛktɹɪs/) is a Python tool designed for downloading web data feeds periodically and consistently.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.0%) to scientific vocabulary
Statistics
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 1
- Releases: 3
Metadata Files
README.md
Collectress is a Python tool designed for downloading web data feeds periodically and consistently. The feeds to download are specified in a YAML feed file. Each feed gets its own directory, and downloaded content is stored in directories named after the current date.
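The storage layout described above can be illustrated with a short sketch. It assumes the date directories (YYYY/MM/DD) are nested inside each feed's directory; the README does not state the nesting order explicitly. The function name `feed_output_dir` and the feed name `example_feed` are hypothetical and not taken from the Collectress source.

```python
from datetime import date
from pathlib import Path


def feed_output_dir(workdir, feed_name, day=None):
    """Return <workdir>/<feed_name>/YYYY/MM/DD for the given day.

    Hypothetical helper: the nesting order (feed first, then date)
    is an assumption, not confirmed by the Collectress source.
    """
    day = day or date.today()
    return Path(workdir) / feed_name / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


# Example with a fixed date so the result is reproducible.
example = feed_output_dir("data_feeds", "example_feed", date(2024, 3, 7))
print(example.as_posix())  # data_feeds/example_feed/2024/03/07
```

Zero-padded month and day components keep the directories sorted chronologically when listed by name.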
Features
- Downloads content from multiple feeds specified in a YAML file
- Creates a directory for each feed
- Content stored in a date-structured directory format (YYYY/MM/DD)
- Handles errors gracefully, allowing the tool to continue even if a single operation fails
- Command-line arguments for input, output, and cache
- Download optimisation through an ETag cache
- Logs a JSON-formatted comprehensive activity summary per script run
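As an illustration of how an ETag cache can avoid repeated downloads, the sketch below shows the general HTTP technique: send the cached ETag in an `If-None-Match` header and skip the download when the server answers `304 Not Modified`. This is not the Collectress implementation. The function names and the cache layout (a plain URL-to-ETag mapping) are assumptions made for this example.

```python
def conditional_headers(cache, url):
    """Build request headers that ask the server to skip unchanged content."""
    etag = cache.get(url)
    return {"If-None-Match": etag} if etag else {}


def record_response(cache, url, status_code, response_headers):
    """Update the cache from a response; return True if the body is new.

    Assumes response_headers is a plain dict with an "ETag" key; real
    HTTP libraries usually expose case-insensitive header mappings.
    """
    if status_code == 304:  # Not Modified: the cached copy is still current
        return False
    etag = response_headers.get("ETag")
    if etag:
        cache[url] = etag
    return status_code == 200


# Simulated exchange; no network access is needed for the demonstration.
cache = {}
url = "https://example.org/feed.txt"  # placeholder URL
print(conditional_headers(cache, url))  # {}
print(record_response(cache, url, 200, {"ETag": '"abc123"'}))  # True
print(conditional_headers(cache, url))  # {'If-None-Match': '"abc123"'}
print(record_response(cache, url, 304, {}))  # False
```

In practice the headers would be passed to an HTTP client call such as `requests.get(url, headers=...)`, and the cache dictionary could be saved as JSON between runs.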
Usage
Collectress can be run from the command line as follows (each run creates a log.json file):
```bash
python collectress.py -f data_feeds.yml -w data_feeds/ -e etag_cache.json
```
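The two behaviours listed under Features, continuing after a single failure and producing a JSON summary per run, can be sketched together. The summary fields (`started`, `ok`, `failed`), the function `run_feeds`, and the `fake_download` stand-in are all hypothetical. The actual structure of log.json is not documented here.

```python
import json
from datetime import datetime, timezone


def run_feeds(feeds, download):
    """Try every feed; record failures instead of aborting the run."""
    summary = {
        "started": datetime.now(timezone.utc).isoformat(),
        "ok": [],
        "failed": {},
    }
    for name, url in feeds.items():
        try:
            download(name, url)
            summary["ok"].append(name)
        except Exception as exc:  # one failing feed must not stop the others
            summary["failed"][name] = str(exc)
    return summary


def fake_download(name, url):
    """Stand-in for a real download; fails for URLs containing 'bad'."""
    if "bad" in url:
        raise ConnectionError(f"cannot reach {url}")


feeds = {
    "feed_a": "https://example.org/a",
    "feed_b": "https://bad.example.org/b",
}
summary = run_feeds(feeds, fake_download)
print(json.dumps(summary, indent=2))
```

Catching exceptions per feed keeps a single unreachable source from hiding the results of the others, and the summary still records what went wrong.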
Parameters:
```bash
  -h, --help            show this help message and exit
  -e ECACHE, --ecache ECACHE
                        eTag cache for optimizing downloads
  -f FEED, --feed FEED  YAML file containing the feeds
  -w WORKDIR, --workdir WORKDIR
                        The root of the output directory
```
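For readers who want to see how such an interface is typically declared, here is a minimal `argparse` sketch that reproduces the documented options and help strings. Whether the real script marks any option as required or sets defaults is not stated in the README, so none are assumed here. `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the three documented options; -h/--help is added by argparse."""
    parser = argparse.ArgumentParser(prog="collectress.py")
    parser.add_argument("-e", "--ecache", help="eTag cache for optimizing downloads")
    parser.add_argument("-f", "--feed", help="YAML file containing the feeds")
    parser.add_argument("-w", "--workdir", help="The root of the output directory")
    return parser


# Parse the same arguments used in the usage example above.
args = build_parser().parse_args(
    ["-f", "data_feeds.yml", "-w", "data_feeds/", "-e", "etag_cache.json"]
)
print(args.feed, args.workdir, args.ecache)
# data_feeds.yml data_feeds/ etag_cache.json
```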
Usage with Docker
Collectress can be used through its Docker image:
```bash
docker run --rm \
  -e TZ=$(readlink /etc/localtime | sed -e 's,/usr/share/zoneinfo/,,' ) \
  -v ${PWD}/data_feeds.yml:/collectress/data_feeds.yml \
  -v ${PWD}/log.json:/collectress/log.json \
  -v ${PWD}/etag_cache.json:/collectress/etag_cache.json \
  -v ${PWD}/data_output:/data ghcr.io/stratosphereips/collectress:main \
  python collectress.py -f data_feeds.yml -e etag_cache.json -w /data
```
About
This tool was developed at the Stratosphere Laboratory at the Czech Technical University in Prague.
Owner
- Name: Stratosphere IPS
- Login: stratosphereips
- Kind: organization
- Location: Prague
- Website: https://www.stratosphereips.org
- Twitter: StratosphereIPS
- Repositories: 25
- Profile: https://github.com/stratosphereips
Cybersecurity Research Laboratory at the Czech Technical University in Prague. Creators of Slips, a free software machine learning-based behavioral IDS/IPS.
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
  Collectress: Automated Framework To Collect Web Data
message: 'If you use this software, please cite it as specified below.'
type: software
authors:
  - given-names: Veronica
    family-names: Valeros
    email: valerver@fel.cvut.cz
    affiliation: >-
      Stratosphere Laboratory, AIC, FEL, Czech
      Technical University in Prague
    orcid: 'https://orcid.org/0000-0003-2554-3231'
GitHub Events
Total
- Watch event: 2
- Delete event: 1
- Push event: 1
- Pull request event: 2
- Create event: 2
Last Year
- Watch event: 2
- Delete event: 1
- Push event: 1
- Pull request event: 2
- Create event: 2
Dependencies
- actions/checkout v2 composite
- anothrNick/github-tag-action 1.36.0 composite
- actions/checkout v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/metadata-action v4 composite
- actions/checkout v3 composite
- docker/build-push-action ac9327eae2b366085ac7f6a2d02df8aa8ead720a composite
- docker/login-action 28218f9b04b4f3f62068d7b6ce6ca5b26e35336c composite
- docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite
- docker/setup-buildx-action 79abd3f86f79a9d68a23c75a09a9a85889262adf composite
- sigstore/cosign-installer v3.6.0 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v2 composite
- python 3.11-alpine build
- python-json-logger *
- pyyaml *
- requests *
- tqdm *