telegram-dataset-builder
This code uses Telethon and the Telegram API to retrieve messages from public channels, allowing fast and easy dataset creation.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.7%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Telegram Dataset Builder
This project uses Telethon to build datasets from public Telegram groups. To do this, it needs the channel name or ID. The result is a JSON file with the channel information, plus several JSON files, organised in folders, that contain the messages in batches.
Requirements
The required packages and versions are the following:
Telethon==1.35.0
python-dotenv==1.0.1
Telegram API credentials
The default configuration loads the credentials from a telegram.env file in the root folder. This file must follow the schema below (note that the phone number must include the country prefix):
PHONE_NUMBER= "+34..."
TELEGRAM_APP_ID = 9...
TELEGRAM_APP_HASH = "d..."
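As a minimal sketch of how such a file could be read and checked, the snippet below uses python-dotenv's dotenv_values and converts the app ID to an integer, as Telethon expects. The function names validate_credentials and load_credentials are hypothetical and are not taken from this repository; only the three key names come from the schema above.

```python
from pathlib import Path


def validate_credentials(values):
    """Check the three keys from telegram.env and normalise their types."""
    required = ("PHONE_NUMBER", "TELEGRAM_APP_ID", "TELEGRAM_APP_HASH")
    missing = [key for key in required if not values.get(key)]
    if missing:
        raise KeyError(f"Missing credentials: {', '.join(missing)}")

    phone = values["PHONE_NUMBER"].strip()
    if not phone.startswith("+"):
        raise ValueError("PHONE_NUMBER must include the country prefix, e.g. +34")

    # Telethon expects the app ID as an integer and the hash as a string.
    return phone, int(values["TELEGRAM_APP_ID"]), values["TELEGRAM_APP_HASH"].strip()


def load_credentials(env_path="telegram.env"):
    """Read the env file with python-dotenv (imported lazily) and validate it."""
    from dotenv import dotenv_values  # provided by the python-dotenv package

    if not Path(env_path).is_file():
        raise FileNotFoundError(env_path)
    return validate_credentials(dotenv_values(env_path))
```

Calling load_credentials() with no argument assumes the default telegram.env location in the root folder; pass another path if the file lives elsewhere.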
How to gather group messages?
To get all messages from a set of groups, run dataset_creator.py after adjusting the following elements:
- Set channel_names = ["foo", "bar"] to the names of the channels you want to extract.
- Optionally, set a different BATCH_SIZE.
- If you keep your Telegram credentials in a different path, modify telegram_env_path.
- output_chats_path is the folder where everything is stored, both the channel chats and the channel info. It can be modified.
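The batching idea behind these settings can be sketched as follows. This is an illustration under stated assumptions, not the code in dataset_creator.py: the helper names batched, write_batches and dump_channel, the batch_N.json file naming, and the default batch size of 1000 are all hypothetical. The only Telethon calls used are client.iter_messages and the message's to_dict method, on a client that is assumed to be already connected and authorised.

```python
import json
from itertools import islice
from pathlib import Path


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def write_batches(messages, channel_dir, batch_size=1000):
    """Write message dicts to channel_dir as batch_0.json, batch_1.json, ..."""
    channel_dir = Path(channel_dir)
    channel_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, batch in enumerate(batched(messages, batch_size)):
        path = channel_dir / f"batch_{index}.json"  # hypothetical file name
        # default=str converts datetimes and bytes into strings for JSON.
        path.write_text(
            json.dumps(batch, ensure_ascii=False, default=str), encoding="utf-8"
        )
        paths.append(path)
    return paths


async def dump_channel(client, channel_name, output_chats_path, batch_size=1000):
    """Fetch every message of one channel with a connected Telethon client."""
    messages = []
    # reverse=True iterates from the oldest message to the newest.
    async for message in client.iter_messages(channel_name, reverse=True):
        messages.append(message.to_dict())
    return write_batches(messages, Path(output_chats_path) / channel_name, batch_size)
```

This sketch collects a whole channel in memory before writing, which keeps the example short; for very large channels, flushing each batch as soon as it fills would be the safer choice.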
How to monitor group messages?
To monitor new messages sent to a set of groups, run engagement_monitor.py after adjusting the following elements:
- Set channel_names = ["foo", "bar"] to the names of the channels you want to monitor.
- Optionally, set a different BATCH_SIZE.
- If you keep your Telegram credentials in a different path, modify telegram_env_path.
- output_chats is the folder where everything is stored, both the channel chats and the channel info. It can be modified.
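One way monitoring with batched output could work is sketched below. It is not the implementation in engagement_monitor.py: the BatchBuffer class, the register_monitor function, the new_messages_N.json file naming, and the default batch size of 100 are hypothetical. The Telethon parts rely only on events.NewMessage with its chats filter and the client.on decorator, and assume an already connected client.

```python
import json
from pathlib import Path


class BatchBuffer:
    """Collect message dicts and flush them to a JSON file every batch_size items."""

    def __init__(self, output_dir, batch_size=100):
        self.output_dir = Path(output_dir)
        self.batch_size = batch_size
        self.pending = []
        self.flushed = 0  # number of batch files written so far

    def add(self, message):
        """Store one message; return the written path when a batch fills up."""
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Write pending messages to disk; return None if there are none."""
        if not self.pending:
            return None
        self.output_dir.mkdir(parents=True, exist_ok=True)
        path = self.output_dir / f"new_messages_{self.flushed}.json"  # hypothetical name
        path.write_text(
            json.dumps(self.pending, ensure_ascii=False, default=str), encoding="utf-8"
        )
        self.pending = []
        self.flushed += 1
        return path


def register_monitor(client, channel_names, buffer):
    """Attach a NewMessage handler to an existing Telethon client."""
    from telethon import events  # imported lazily; requires Telethon

    @client.on(events.NewMessage(chats=channel_names))
    async def on_new_message(event):
        buffer.add(event.message.to_dict())

    return on_new_message
```

After registering the handler, client.run_until_disconnected() keeps the client listening; calling buffer.flush() on shutdown writes any partially filled batch.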
Owner
- Name: Ontology Engineering Group (UPM)
- Login: oeg-upm
- Kind: organization
- Email: oeg-dev@delicias.dia.fi.upm.es
- Location: Boadilla del Monte, Madrid, Spain
- Website: https://oeg.fi.upm.es/
- Repositories: 294
- Profile: https://github.com/oeg-upm
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Guillén-Pacho"
    given-names: "Ibai"
    orcid: "https://orcid.org/0000-0001-7801-8815"
title: "oeg-upm/telegram-dataset-builder"
doi: 10.5281/zenodo.12773159
date-released: 2024-07-18
url: "https://github.com/oeg-upm/telegram-dataset-builder"
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2