constructed
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AymenRaouf
- License: other
- Language: Python
- Default Branch: main
- Size: 33 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ConstrucTED : Constructing Tailored Educational Datasets From Online Courses

ConstrucTED is a tool built on top of Google APIs, enabling the efficient creation of custom educational datasets from YouTube playlists. It creates datasets from video course transcripts, providing a ready-to-use solution that significantly shortens the time required to create such datasets. The resulting datasets are versatile and suitable for tasks like classification and learning path creation.

Installation
Download the project, then use the package manager pip to install the dependencies.
```bash
pip install -r requirements.txt
```

Usage
- Before using ConstrucTED, you should first get a Google API personal key.
- Create a file called .env in the root of the project and add this line with your personal API key:
```bash
GOOGLE_API_KEY='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
```
- You can run the main.ipynb file to create datasets.
- There are some pre-made input files in the Input folder. You can use these sample input files to create datasets.
- The datasets that can be created using these sample input files are available in the Output folder for direct usage.
- You can create your own input files and use them in the code:
```python
input_file = 'path_to_your_input_file'
my_dataset.create_series(input_file)
my_dataset.save(path='path_to_an_output_location')
```
- This code generates three files, as explained in the article: series.csv, episodes.csv, and chapters.csv.
- These files contain the created dataset.
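
As a quick sanity check around the steps above, the following is a minimal sketch (not part of ConstrucTED itself) that reads the key from the .env file and summarizes whichever of the three output files exist. The helper names `read_env_key` and `summarize_dataset` are hypothetical, the .env parsing assumes the simple `KEY='value'` form shown above, and no column names are assumed because the README does not list them.

```python
import csv
from pathlib import Path


def read_env_key(env_path, name="GOOGLE_API_KEY"):
    """Return the value of `name` from a simple KEY='value' .env file, or None."""
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


def summarize_dataset(output_dir,
                      names=("series.csv", "episodes.csv", "chapters.csv")):
    """Map each file that exists to (header columns, number of data rows)."""
    summary = {}
    for file_name in names:
        path = Path(output_dir) / file_name
        if not path.is_file():
            continue  # Missing files are skipped rather than treated as errors.
        with path.open(newline="", encoding="utf-8") as handle:
            # csv.reader counts records, so quoted multi-line fields count once.
            reader = csv.reader(handle)
            header = next(reader, [])
            summary[file_name] = (header, sum(1 for _ in reader))
    return summary
```

For example, `summarize_dataset('path_to_an_output_location')` returns a dictionary keyed by file name, which makes it easy to confirm that all three files were written and are non-empty before using them downstream.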

Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Citation
If you use this code in your research or projects, please cite the following article:
ConstrucTED : Constructing Tailored Educational Datasets From Online Courses
Aymen Bazouzi, Zoltan Miklos, Mickaël Foursov, Hoël Le Capitaine.
Proceedings of the 16th International Conference on Computer Supported Education (CSEDU), 2024.
📖 BibTeX
```bibtex
@conference{ekm24,
  author={Aymen Bazouzi and Zoltan Miklos and Mickaël Foursov and Hoël {Le Capitaine}},
  title={ConstrucTED: Constructing Tailored Educational Datasets from Online Courses},
  booktitle={Proceedings of the 16th International Conference on Computer Supported Education - Volume 1: EKM},
  year={2024},
  pages={645-652},
  publisher={SciTePress},
  organization={INSTICC},
  doi={10.5220/0012745000003693},
  isbn={978-989-758-697-2},
  issn={2184-5026},
}
```
License
Owner
- Name: Aymen
- Login: AymenRaouf
- Kind: user
- Location: France
- Repositories: 3
- Profile: https://github.com/AymenRaouf
PhD researcher
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite the following paper."
authors:
  - family-names: Bazouzi
    given-names: Aymen
  - family-names: Miklos
    given-names: Zoltan
  - family-names: Foursov
    given-names: Mickaël
  - family-names: Le Capitaine
    given-names: Hoël
title: "ConstrucTED : Constructing Tailored Educational Datasets From Online Courses"
conference: "Proceedings of the 16th International Conference on Computer Supported Education (CSEDU)"
year: 2024
doi: "10.5220/0012745000003693"
url: "https://www.scitepress.org/Link.aspx?doi=10.5220/0012745000003693"
```
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Dependencies
- Pygments ==2.17.2
- asttokens ==2.4.1
- cachetools ==5.3.3
- certifi ==2024.2.2
- charset-normalizer ==3.3.2
- comm ==0.2.1
- debugpy ==1.8.1
- decorator ==5.1.1
- exceptiongroup ==1.2.0
- executing ==2.0.1
- google-api-core ==2.17.1
- google-api-python-client ==2.119.0
- google-auth ==2.28.1
- google-auth-httplib2 ==0.2.0
- googleapis-common-protos ==1.62.0
- httplib2 ==0.22.0
- idna ==3.6
- ipykernel ==6.29.3
- ipython ==8.22.1
- jedi ==0.19.1
- jupyter_client ==8.6.0
- jupyter_core ==5.7.1
- matplotlib-inline ==0.1.6
- nest-asyncio ==1.6.0
- numpy ==1.26.4
- packaging ==23.2
- pandas ==2.2.1
- parso ==0.8.3
- pexpect ==4.9.0
- platformdirs ==4.2.0
- prompt-toolkit ==3.0.43
- protobuf ==4.25.3
- psutil ==5.9.8
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyasn1 ==0.5.1
- pyasn1-modules ==0.3.0
- pyparsing ==3.1.1
- python-dateutil ==2.8.2
- pytz ==2024.1
- pyzmq ==25.1.2
- requests ==2.31.0
- rsa ==4.9
- six ==1.16.0
- stack-data ==0.6.3
- tornado ==6.4
- traitlets ==5.14.1
- tzdata ==2024.1
- uritemplate ==4.1.1
- urllib3 ==2.2.1
- wcwidth ==0.2.13
- youtube-transcript-api ==0.6.2