srf-comment-collector
Collect comments from srf.ch. Can be used for research (data science / political analysis)
Repository
Basic Info
- Host: GitHub
- Owner: maurosbicego
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 21.5 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
SRF.ch Comment Collector
Collect comments from srf.ch. The tool can be used for research such as data science or political analysis and, to some extent, OSINT. IMPORTANT: Use at your own risk. The data collected by this tool contains personal data that can be used to track the behaviour of specific people (as with all data from social media). Check the applicable rules, data ownership, etc., and anonymise the data before publishing anything.
About
This tool can be used to periodically collect the comments available on articles on srf.ch. SRF is a media platform from Switzerland; its content is in German, and the topics with comments usually concern Switzerland. The comments are curated and users are required to comment under their real name. This leads to a higher quality of comments compared to other platforms.
Structure
Database models / tables
Article
Whenever we discover an article, we add an entry as "Article" and check whether it has comments activated. If it does, we set "hascomments" to true. While the comment section is still open, we do not fetch the comments. As long as "commentsfetched" is false, we check on each run whether the comment section has closed. Once it has closed, we fetch all the comments and set "commentsfetched" to true.
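The per-run decision described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the field names "hascomments" and "commentsfetched" come from this README, while `url`, `process_article` and the `fetch_comments` callback are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # "hascomments" and "commentsfetched" are named in the README;
    # "url" is a hypothetical identifier added for this sketch.
    url: str
    hascomments: bool = False
    commentsfetched: bool = False


def process_article(article, comments_enabled, comments_closed, fetch_comments):
    """Apply one collector run to an article.

    Returns the fetched comments, or None if nothing was fetched in this run.
    """
    if comments_enabled:
        article.hascomments = True
    if not article.hascomments or article.commentsfetched:
        return None  # no comment section, or already collected
    if not comments_closed:
        return None  # section still open: check again on the next run
    comments = fetch_comments(article.url)
    article.commentsfetched = True
    return comments
```

Waiting until the comment section has closed means each article is fetched only once and the stored comments are complete.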
User
Contains all users who have posted a comment, with their username and full name.
Comment
Represents a comment. References the article it was written for and the User who wrote it. Can also reference another comment if it is a reply. Contains the number of likes the comment received.
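To make the relationships between the three models concrete, here is a minimal sketch of how they could map to SQLite tables. The table and column names (apart from "hascomments" and "commentsfetched") are assumptions made for illustration; the tool's real schema may differ, so inspect the actual database before relying on any of them.

```python
import sqlite3

# Hypothetical schema: names other than "hascomments" and "commentsfetched"
# are assumptions for this sketch, not taken from the tool.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE,
    hascomments INTEGER NOT NULL DEFAULT 0,
    commentsfetched INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS user (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE,
    fullname TEXT
);
CREATE TABLE IF NOT EXISTS comment (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(id),
    user_id INTEGER NOT NULL REFERENCES user(id),
    parent_id INTEGER REFERENCES comment(id),  -- set only for replies
    text TEXT,
    likes INTEGER NOT NULL DEFAULT 0
);
"""


def create_schema(conn):
    """Create the three sketch tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print([row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```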
Running the tool
The tool needs to be started only once (ideally with docker-compose or podman); it then reloads the data every hour. This interval can be set in settings.py when running directly with Python, or in settings-docker.py when using docker-compose, docker or podman.
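The "start once, reload every hour" behaviour amounts to a simple loop around the collection job. The sketch below is an assumption about how such a loop might look, not the tool's implementation; `RELOAD_INTERVAL_SECONDS` and `run_periodically` are hypothetical names, and the real setting in settings.py may be called something else.

```python
import time

# Hypothetical setting name; the README only says the period is configurable.
RELOAD_INTERVAL_SECONDS = 3600  # one hour


def run_periodically(job, interval_seconds=RELOAD_INTERVAL_SECONDS,
                     max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, sleeping interval_seconds between runs.

    max_runs=None loops forever; a number limits the runs (useful for tests).
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting an hour.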
Run with docker-compose
sudo docker-compose up -d
Run with podman (no compose, no root)
- Build:
podman build -f ./Dockerfile --tag srf-collector
- Make sure the directory database exists
- Run with volume:
podman run -v $(pwd)/database:/collector/database srf-collector
Run directly with python3
- Install requirements:
pip3 install -r requirements.txt
- Run it:
python3 main.py
Read the data
Data is saved to an SQLite database located in the database directory where the tool is run. The database contains the tables described above. Use SQL to perform your analysis. I might implement some functionality to analyse the data in the tool itself later.
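As a starting point for such an analysis, here is a small sketch that counts comments per article using Python's built-in sqlite3 module. The table names (`article`, `comment`) and column names (`id`, `url`, `article_id`, `likes`) are assumptions for illustration; check the real schema first, for example with `.schema` in the sqlite3 shell, and adjust the query accordingly.

```python
import sqlite3


def comments_per_article(conn, limit=10):
    """Return (url, comment count, average likes) for the most-commented articles.

    Table and column names are assumed for this sketch and may not match
    the tool's real schema.
    """
    query = """
        SELECT a.url, COUNT(c.id) AS n_comments, AVG(c.likes) AS avg_likes
        FROM article AS a
        JOIN comment AS c ON c.article_id = a.id
        GROUP BY a.id
        ORDER BY n_comments DESC, a.url
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

To use it, open the database file from the database directory with `sqlite3.connect(...)` and pass the connection to `comments_per_article`. Aggregating per article rather than per person also keeps this kind of exploratory query away from individual users' data.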
Citation
Should you use this tool for an academic publication, please cite it as outlined in the CITATION.cff file.
Owner
- Name: Mauro Sbicego
- Login: maurosbicego
- Kind: user
- Location: Switzerland
- Company: @tryption
- Twitter: maurosbicego
- Repositories: 2
- Profile: https://github.com/maurosbicego
Citation (CITATION.cff)
cff-version: 1.2.0
title: srf.ch comment collector
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0009-0007-1776-2916'
    given-names: Mauro Mattia
    family-names: Sbicego
repository-code: 'https://github.com/maurosbicego/srf-comment-collector'
abstract: >-
  A tool that can be used to collect comments from srf.ch
  (Swiss media platform in german language). srf.ch curates
  the comments which results in high quality data that can
  be used for different research purposes
license: GPL-3.0