mooccubex
A large-scale knowledge repository for adaptive learning, learning analytics, and knowledge discovery in MOOCs, hosted by THU KEG.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (2.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: THU-KEG
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 1.28 MB
Statistics
- Stars: 113
- Watchers: 10
- Forks: 18
- Open Issues: 10
- Releases: 0
Metadata Files
README-cn.md
MOOCCubeX
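MOOCCubeX is maintained by the Knowledge Engineering Laboratory of Tsinghua University (THU KEG) and supported by XuetangX, one of the largest MOOC platforms in China. The repository contains 4,216 courses, 230,263 videos, 358,265 exercises, 637,572 fine-grained concepts, and more than 296 million raw behavioral records from 3,330,294 students, to support research on adaptive learning in MOOCs.
We summarize the contributions of MOOCCubeX as follows.
- High coverage. MOOCCubeX collects diverse MOOC resources and external educational resources, together with records of students' learning, exercise, and discussion activities.
- Large scale. Compared with other open educational data repositories, MOOCCubeX is larger, which supports the exploration of data-hungry deep models.
- Concept-centered. Heterogeneous data are organized by fine-grained concepts, which makes resources more targeted and easier to express, retrieve, and model.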
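News!!
- Prerequisite relations for mathematics, psychology, and computer science have been completed!!
- Our paper has been submitted to the CIKM 2021 resource track!!
- The MOOCCubeX dataset generator toolkit has been updated!!
- Our paper has been accepted to the CIKM 2021 resource track!!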
Resource Structure
The structure of MOOCCubeX is shown in the diagram below.

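MOOCCubeX data are organized around a fine-grained concept graph. Its resources are listed in the tables below.
Course resource data (see course.md for details).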
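| Course resource type | Description | Download link | File size |
| -------------------- | ----------- | ------------- | --------- |
| Course information | Course overview, covering both video and exercise resources | entities/course.json | 43M |
| Video | Video information such as titles and subtitles | entities/video.json | 580M |
| Exercise | An exercise in a course is a group of problems | relations/exercise-problem.txt | 129M |
| Problem | The problems contained in an exercise | entities/problem.json | 1.2G |
| School | School information | entities/school.json | 613K |
| Teacher | Teacher information | entities/teacher.json | 8.7M |
| Subject field | Manually annotated field(s) a course belongs to (possibly several) | relations/course-field.json | 62K |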
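The entity files listed above can be read with a few lines of standard-library Python. The sketch below assumes each `entities/*.json` file stores one JSON object per line (JSON Lines) and that every record carries an `id` field; both the layout and the field name are assumptions to verify against course.md, and `iter_json_lines` and `load_entities` are hypothetical helper names, not part of the repository.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_json_lines(path: Path) -> Iterator[dict]:
    """Yield one parsed object per non-empty line (assumed JSON Lines layout)."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_entities(path: Path, key: str = "id") -> Dict[str, dict]:
    """Index every record by its identifier field; `key` is an assumed field name."""
    return {record[key]: record for record in iter_json_lines(path)}
```

For example, `load_entities(Path("entities/course.json"))` would return a dict from course ID to course record. For the larger files, such as entities/problem.json (1.2G), iterating with `iter_json_lines` avoids holding every record in memory at once.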
Student behavior data (see user.md for details).
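| Behavior type | Description | Download link | File size |
| ------------- | ----------- | ------------- | --------- |
| User profile | User ID, school, course enrollment order, and more | entities/user.json | 770M |
| Video watching | Playback speed and skipping behavior while users watch videos | relations/user-video.json | 3.0G |
| Problem answering | How users answered the problems in exercises | relations/user-problem.json | 21G |
| Comment | User comments on videos or exercises | entities/comment.json | 2.1G |
| Reply | User replies to other users' comments | entities/reply.json | 50M |
| Xiaomu | Interactions between users and Xiaomu, the intelligent Q&A bot of XuetangX | relations/user-xiaomu.json | 9.7M |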
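The behavior files are large (relations/user-problem.json alone is 21G), so per-user lookups are best done by streaming rather than loading the whole file. The sketch below is a hypothetical pure-Python counterpart to the repository's problems_by_user.sh; it assumes one JSON object per line with `user_id` and `problem_id` fields, which should be checked against user.md before use.

```python
import json
from pathlib import Path
from typing import List


def problems_by_user(path: Path, user_id: str) -> List[str]:
    """Return the distinct problem IDs answered by `user_id`, in file order.

    Streams the file line by line so it never has to fit in memory.
    Assumes one JSON object per line with "user_id" and "problem_id" fields.
    """
    seen = set()
    result: List[str] = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            # Cheap substring pre-filter: skip JSON parsing for unrelated lines.
            if user_id not in line:
                continue
            record = json.loads(line)
            # Exact comparison, so "U_1" does not match "U_10000835".
            if record.get("user_id") != user_id:
                continue
            pid = record.get("problem_id")
            if pid is not None and pid not in seen:
                seen.add(pid)
                result.append(pid)
    return result
```

The substring test lets most lines through without a `json.loads` call; the exact field comparison afterwards keeps short IDs from matching longer ones that contain them.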
Fine-grained concepts and their links to other MOOC resources, including external (extracurricular) resources. See concept.md for details.
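| Concept or link | Description | Download link | File size |
| --------------- | ----------- | ------------- | --------- |
| Concept | Course concepts extracted from video subtitles | entities/concept.json | 156M |
| Concept prerequisites | Partial prerequisite relations between concepts, manually annotated and algorithmically predicted, for psychology, mathematics, and computer science | prerequisites/psy.json, prerequisites/math.json, prerequisites/cs.json | 87M, 59M, 133M |
| Concept-course link | Concepts associated with each course | relations/concept-course.txt | 19M |
| Concept-video link | Concepts associated with each video | relations/concept-video.txt | 39M |
| Concept-problem link | Concepts associated with each problem | relations/concept-problem.txt | 1.3M |
| Concept-comment link | Concepts associated with each comment | relations/concept-comment.txt | 1.2M |
| Concept-external resource link | Concepts associated with external resources | relations/concept-other.txt | 19M |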
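The relation files connect concepts to other resources, and many queries need both directions (which resources mention a concept, and which concepts a resource covers). Assuming each line of a file such as relations/concept-video.txt holds a concept ID and a resource ID separated by a tab (an assumption; see concept.md for the actual layout), both indices can be built in a single pass. `load_links` is a hypothetical name, not a repository function.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Set, Tuple


def load_links(path: Path) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build concept -> resources and resource -> concepts indices.

    Assumes each non-empty line is "<concept_id>\t<resource_id>".
    """
    by_concept: Dict[str, Set[str]] = defaultdict(set)
    by_resource: Dict[str, Set[str]] = defaultdict(set)
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\r\n").split("\t")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            concept, resource = parts[0], parts[1]
            by_concept[concept].add(resource)
            by_resource[resource].add(concept)
    return dict(by_concept), dict(by_resource)
```

Sets discard duplicate pairs automatically; converting back to plain dicts makes a lookup of an unknown ID raise `KeyError` instead of silently inserting an empty entry.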
Toolkits
For convenience, we provide two types of toolkits.
- MOOCCubeX dataset generator
| Tool | Description | Example usage |
| ------------ | --------------------- | ------------------------- |
| download_dataset.sh | Download the entire dataset | ./scripts/download_dataset.sh |
| count.sh | Count the number of courses, videos, and other resources | ./scripts/count.sh |
| user_freq_histgram.py | Plot user-frequency histograms for videos, problems, etc. (Figure 4 in the paper) | python3 scripts/user_freq_histgram.py |
| concept_course.py | Generate relations/concept-course.txt | python scripts/concept_course.py |
| concept_finder.sh | Find the ccids of videos that contain a given concept | ./scripts/concept_finder.sh K_晶体三极管组态放大器_电子科学与技术 |
| course_info_finder.sh | Find course information containing a given string | ./scripts/course_info_finder.sh 数据结构 |
| video_viewed_by_user_and_course.sh | List the resource_ids of all videos a given user watched in a given course | ./scripts/video_viewed_by_user_and_course.sh U_94015 C_1824928 |
| problems_by_user.sh | List all problems a given user has answered | ./scripts/problems_by_user.sh U_10000835 |
| concepts_of_video.sh | List all concepts of a given video | ./scripts/concepts_of_video.sh V_479945 |
| who_replied.sh | List all other users who have replied to a given user | ./scripts/who_replied.sh U_10006544 |
Some of the tools above depend on jq or on Python libraries such as matplotlib and tqdm.
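Where jq is not installed, simple record counts can be obtained with the standard library alone. This sketch assumes one record per line, which may not match exactly how count.sh counts resources; `count_records` is a hypothetical helper, not part of the toolkit.

```python
from pathlib import Path
from typing import Dict, Iterable


def count_records(paths: Iterable[Path]) -> Dict[str, int]:
    """Count non-empty lines in each file, assuming one record per line."""
    counts: Dict[str, int] = {}
    for path in paths:
        # Binary mode skips UTF-8 decoding, which matters for multi-gigabyte files.
        with path.open("rb") as f:
            counts[str(path)] = sum(1 for line in f if line.strip())
    return counts
```

For example, `count_records([Path("entities/course.json"), Path("entities/video.json")])` would report one count per file, keyed by the full path so that files with the same name in entities/ and relations/ do not collide.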
MOOCCube concept helpers
- Concept extraction pipeline: https://github.com/yujifan0326/Concept-Acquisition-Pipeline
- Concept prerequisite discovery tool: https://github.com/luogan1234/prerequisite-prediction-co-training
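Tips and Characteristics
The concept and behavior data in MOOCCubeX show some notable statistical characteristics.
- Compared with the previous version, MOOCCube, MOOCCubeX contains finer-grained concepts.
- Video-watching behavior follows a long-tail distribution, whereas exercise-answering behavior tends toward a normal distribution.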
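The contrast between long-tailed video watching and roughly normal exercise answering can be checked by counting how often each item is used. The sketch below is an independent stand-in, not the repository's scripts/user_freq_histgram.py: it assumes you have already extracted one item ID per interaction (for instance by streaming relations/user-video.json), and `frequency_histogram` and `top_share` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable


def frequency_histogram(item_ids: Iterable[str]) -> Dict[int, int]:
    """Map each usage count k to the number of items used exactly k times.

    item_ids should contain one ID per interaction event, e.g. one video ID
    per viewing record or one problem ID per answer record.
    """
    per_item = Counter(item_ids)               # item ID -> number of events
    per_count = Counter(per_item.values())     # k -> number of items with k events
    return dict(sorted(per_count.items()))


def top_share(item_ids: Iterable[str], fraction: float = 0.1) -> float:
    """Fraction of all events that fall on the most-used `fraction` of items.

    A value far above `fraction` suggests a head-heavy, long-tailed distribution.
    """
    counts = sorted(Counter(item_ids).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, int(len(counts) * fraction))    # always keep at least one item
    return sum(counts[:k]) / sum(counts)
```

Plotting the dictionary returned by `frequency_histogram` on log-log axes (for example with matplotlib, which the toolkit already depends on) is a common way to make a long tail visible.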

References
@inproceedings{yu2021mooccubex,
title={{MOOCCubeX}: A Large Knowledge-centered Repository for Adaptive Learning in {MOOCs}},
author={Yu, Jifan and Wang, Yuquan and Zhong, Qingyang and Luo, Gan and Mao, Yiming and Sun, Kai and Feng, Wenzheng and Xu, Wei and Cao, Shulin and Zeng, Kaisheng and others},
booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages={4643--4652},
year={2021}
}
Owner
- Name: THU-KEG
- Login: THU-KEG
- Kind: organization
- Location: Tsinghua University, Beijing, China
- Repositories: 23
- Profile: https://github.com/THU-KEG
Citation (CITATION.bib)
@inproceedings{yu2021mooccubex,
title={{MOOCCubeX}: A Large Knowledge-centered Repository for Adaptive Learning in {MOOCs}},
author={Yu, Jifan and Wang, Yuquan and Zhong, Qingyang and Luo, Gan and Mao, Yiming and Sun, Kai and Feng, Wenzheng and Xu, Wei and Cao, Shulin and Zeng, Kaisheng and others},
booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages={4643--4652},
year={2021}
}
GitHub Events
Total
- Issues event: 5
- Watch event: 35
- Fork event: 2
Last Year
- Issues event: 5
- Watch event: 35
- Fork event: 2
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yuquan Wang | 1****1@q****m | 4 |
| luzixiao | 1****5@q****m | 3 |
| yujifan | y****6@1****m | 1 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 13
- Total pull requests: 7
- Average time to close issues: 4 days
- Average time to close pull requests: about 18 hours
- Total issue authors: 12
- Total pull request authors: 4
- Average comments per issue: 0.69
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 4
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hongxiaDu (2)
- lmh122 (1)
- dnsang1611 (1)
- ghzha0 (1)
- DengHanlong (1)
- luzixiao (1)
- happy-dogsss (1)
- leexzhuo (1)
- innocentc (1)
- healerccz (1)
- wahr0411 (1)
- nanang725 (1)
Pull Request Authors
- yuq-1s (4)
- PuzzlingMojito (1)
- SergioSim (1)
- luzixiao (1)