llm-lct-sequencing
AI Semantic Insights: LLM Toolkit for Analysing Educational Practices and Knowledge Building.
https://github.com/sydney-informatics-hub/llm-lct-sequencing
Science Score: 62.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ✓ Institutional organization owner: Organization sydney-informatics-hub has institutional domain (sydney.edu.au)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary
Statistics
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 2
- Releases: 16
Metadata Files
README.md
AI Semantic Insights: LLM Toolkit for Analysing Educational Practices and Knowledge Building
Introduction
This repository provides AI-assisted tools, based on Large Language Models (LLMs), for classifying sequencing relations between text clauses for Legitimation Code Theory (LCT). LCT is a framework developed by Prof. Karl Maton for analysing knowledge practices in various social fields, including education; among other things, it is used to identify and classify the ‘epistemic-semantic density (ESD)’ of English discourse. LCT is often used to examine the underlying principles that guide knowledge building, curriculum design, pedagogical practices, and the evaluation of student work. Using the LCT analytic method, researchers can conceptualise and reveal the complexity of knowledge practices and knowledge building in educational texts, such as lecture transcripts.
The Sydney Informatics Hub (SIH) previously developed the ‘clausing tool’ (project PIPE-156). This tool combines word groupings into short, coherent, standalone passages (clauses). It then classifies each clause as one of eight predefined types, so that epistemological condensation (EC) can be measured quantitatively and an information density profile of the text can be generated.
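The sketch below illustrates one way to derive a profile from classified clauses. It counts how often each clause-type label occurs and normalises the counts to proportions. The labels are hypothetical placeholders, since the eight clausing types are not listed here. This simple frequency profile is an assumption for illustration only; it is not the EC measure defined by Maton and Doran or implemented in PIPE-156.

```python
from collections import Counter


def clause_type_profile(labels: list[str]) -> dict[str, float]:
    """Return the proportion of clauses assigned to each clause-type label.

    `labels` holds one (hypothetical) clause-type label per clause, in text order.
    """
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    # Sort by label so the profile has a stable order for display or plotting.
    return {label: counts[label] / total for label in sorted(counts)}


# Hypothetical labels for illustration only.
profile = clause_type_profile(["type_a", "type_b", "type_a", "type_c"])
print(profile)  # {'type_a': 0.5, 'type_b': 0.25, 'type_c': 0.25}
```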
This project implements the next level of epistemological condensation analysis: the “sequencing tool”. When two or more short passages (clauses) are combined, the sequencing pattern affects how meanings from those passages are condensed and carried across passages. Like the clausing typology, the sequencing typology is a 3-level hierarchical system with 8 sub-types of sequencing at the finest granularity.
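Here is a minimal sketch of turning a sequence of clauses into the clause pairs that a sequencing step classifies. It assumes each clause is paired with the clause immediately after it. The actual tool may pair clauses differently, for example by letting the user select non-adjacent pairs. `ClausePair` and `adjacent_pairs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClausePair:
    """Hypothetical container for two clauses whose sequencing relation is classified."""
    first_index: int
    second_index: int
    first: str
    second: str


def adjacent_pairs(clauses: list[str]) -> list[ClausePair]:
    """Pair each clause with the one that immediately follows it."""
    return [
        ClausePair(i, i + 1, clauses[i], clauses[i + 1])
        for i in range(len(clauses) - 1)
    ]


clauses = ["Heat the water", "so that it boils", "then add the salt"]
for pair in adjacent_pairs(clauses):
    print(pair.first_index, pair.second_index, "|", pair.first, "->", pair.second)
```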
Overview
This project provides two processing tools:
1. An LLM experiment workbench for experimenting with different sequencing definitions and optimising the LLM prompts. See the notebook llm_experiment_multi.ipynb in the notebooks folder.
2. A sequencing annotation tool that automatically extracts clauses from text and classifies the clause pairs using a predefined LLM (e.g., one optimised via Step 1). To start this tool, run the notebook annotation-tool.ipynb or use the Web version as outlined below.
The schematic process diagrams for these two tools are shown below.
Overview of LLM Experiment Workbench
```mermaid
graph TD
    Expert --> LLM-Instructions
    Expert --> Sequencing-Definitions
    Expert --> Sequencing-Examples
    Sequencing-Examples --> Examples-Split
    Examples-Split --> Prompt-Examples
    Examples-Split --> Test-Examples
    Test-Examples --> True-Classification
    Test-Examples --> Test-text-input
    Test-text-input --> LLM-Model
    Prompt-Examples --> Prompt
    LLM-Instructions --> Prompt
    Sequencing-Definitions --> Prompt
    Prompt --> LLM-Model
    LLM-Model --> Test-Predictions
    True-Classification --> LLM-Optimisation
    Test-Predictions --> LLM-Optimisation
    LLM-Optimisation --> Expert
    LLM-Optimisation --> LLM-Model
```
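The workbench loop in the diagram above has four steps: split the expert examples into prompt examples and test examples, predict the test examples with the LLM, compare the predictions against the true classification, then refine. The sketch below covers the split and the comparison. The function names, the 50/50 split and the stand-in predictor (which takes the place of the real LLM call) are assumptions for illustration. The notebook llm_experiment_multi.ipynb may organise this differently.

```python
import random
from typing import Callable

Example = tuple[str, str]  # (clause-pair text, expert label)


def split_examples(
    examples: list[Example], test_fraction: float = 0.5, seed: int = 0
) -> tuple[list[Example], list[Example]]:
    """Shuffle the expert examples and split them into prompt and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled items are held out for testing; the rest go into the prompt.
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(test_examples: list[Example], predict: Callable[[str], str]) -> float:
    """Fraction of test examples whose predicted label matches the expert label."""
    if not test_examples:
        return 0.0
    correct = sum(predict(text) == label for text, label in test_examples)
    return correct / len(test_examples)


# Hypothetical examples and labels for illustration only.
examples = [(f"clause pair {i}", "label_a" if i % 2 else "label_b") for i in range(10)]
prompt_examples, test_examples = split_examples(examples)


def always_label_a(text: str) -> str:
    """Stand-in for the LLM call, which in the workbench would use a prompt built
    from the instructions, sequencing definitions and prompt_examples."""
    return "label_a"


print(len(prompt_examples), len(test_examples))  # 5 5
print(accuracy(test_examples, always_label_a))
```

Fixing the random seed keeps the split reproducible, so prompt variants can be compared on the same test examples.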
Overview of Sequencing Tool for Production
```mermaid
graph TD
    Input-text --> Clausing-Tool
    Clausing-Tool --> Clausing-Pairs
    Optimised-LLM-Instructions --> LLM-Model
    LLM-Model --> Sequencing-Prediction
    Sequencing-Prediction --> Annotation-Tool
    Clausing-Pairs --> Annotation-Tool
    Clausing-Pairs --> LLM-Model
    Annotation-Tool --> Sequencing-Classification
    User-Input --> Annotation-Tool
    Sequencing-Classification --> EC-Analysis
    Annotation-Tool --> Clausing-Pairs
```
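In the production diagram, the annotation tool combines the LLM's sequencing predictions with user input to produce the final sequencing classification. Below is a minimal sketch of that merge. It assumes predictions and user corrections are both keyed by clause-pair index, and that a user correction always overrides the LLM. The function name and data layout are hypothetical.

```python
def final_classification(
    predictions: dict[int, str], user_labels: dict[int, str]
) -> dict[int, str]:
    """Combine LLM predictions with user corrections, keyed by clause-pair index.

    A user label replaces the LLM prediction for the same pair; pairs that the
    user labelled but the LLM did not predict are included as well.
    """
    merged = dict(predictions)
    merged.update(user_labels)
    # Return the pairs in index order.
    return dict(sorted(merged.items()))


llm = {0: "label_a", 1: "label_b", 2: "label_a"}
user = {1: "label_c"}
print(final_classification(llm, user))  # {0: 'label_a', 1: 'label_c', 2: 'label_a'}
```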
Functionality
The aim of the LCT analysis tool is to provide researchers with an automatic classification system that identifies the sequencing type of each combination of passages (clauses). Existing large language models (LLMs), such as OpenAI's GPT models, are used to perform this classification.
Usage and dependencies
Web version
Click the launch binder link below to use a web version of the tool hosted on BinderHub, courtesy of the Australian Text Analytics Platform.
Note: CILogon authentication is required. You can use your institutional, Google or Microsoft account to log in. If you have trouble authenticating, please refer to the CILogon troubleshooting guide.
Self-host
Run the following commands in a terminal or any Bash environment.
Clone the repository and navigate into the newly created directory:
```shell
git clone https://github.com/Sydney-Informatics-Hub/LLM-LCT-sequencing.git
cd LLM-LCT-sequencing
```
To install dependencies, ensure you have Python 3.10 and pip installed, then run the following command:
```shell
pip install -r requirements.txt
```
Serve the application locally using the following command:
```shell
panel serve annotation-tool.ipynb --show
```
The application will launch in a browser at the link provided (http://localhost:5006/annotation-tool).
References
- Maton, Karl, and Yaegan J. Doran. "Condensation: A translation device for revealing complexity of knowledge practices in discourse, part 2—clausing and sequencing." Onomázein (2017): 77-110
Attribution and Acknowledgement
Acknowledgments are an important way for us to demonstrate the value we bring to your research. Your research outcomes are vital for ongoing funding of the Sydney Informatics Hub.
If you make use of this software for your research project, please include the following acknowledgment:
“This research was supported by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.”
Owner
- Name: Sydney Informatics Hub
- Login: Sydney-Informatics-Hub
- Kind: organization
- Email: sih.admin@sydney.edu.au
- Location: University of Sydney, Sydney Australia
- Website: https://sydney.edu.au/sydney-informatics-hub
- Twitter: Sydney_CRF
- Repositories: 189
- Profile: https://github.com/Sydney-Informatics-Hub
The Sydney Informatics Hub is a Core Research Facility of the University of Sydney, providing training and expertise on research data, analysis and computing.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  AI Semantic Insights: LLM Toolkit for Analysing
  Educational Practices and Knowledge Building
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Karl
    family-names: Maton
    name-particle: Prof.
    email: karl.maton@sydney.edu.au
    affiliation: >-
      Professor of Sociology; Director, LCT Centre for
      Knowledge-Building, University of Sydney
  - given-names: Yaegan
    email: yaegan.doran@acu.edu.au
    family-names: Doran
    name-particle: Dr.
    affiliation: >-
      Senior Lecturer, Language and Literacy Education,
      Australian Catholic University
  - given-names: Chao
    family-names: Sun
    name-particle: Dr.
    email: chao.sun@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Sebastian
    family-names: Haan
    email: sebastian.haan@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Hamish
    family-names: Croser
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
    email: hamish.croser@sydney.edu.au
repository-code: >-
  https://github.com/Sydney-Informatics-Hub/LLM-LCT-sequencing
keywords:
  - Legitimation Code Theory
  - LCT
license: MIT
GitHub Events
Total
- Release event: 2
- Push event: 8
- Pull request event: 4
- Create event: 3
Last Year
- Release event: 2
- Push event: 8
- Pull request event: 4
- Create event: 3
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Seb Haan | s****n@g****m | 128 |
| Hamish Croser | h****r@s****u | 128 |
| Hamish Croser | h****1@g****m | 8 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 2
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: about 4 hours
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.5
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- h-croser (1)
- sebhaan (1)
Pull Request Authors
- h-croser (13)
- sebhaan (1)
Dependencies
- celery *
- flask *
- gunicorn *
- html5lib *
- ipython *
- ipywidgets *
- itables *
- joblib *
- jsonschema *
- lxml *
- markdown *
- matplotlib *
- nltk *
- notebook *
- numpy *
- openai *
- openpyxl *
- pandas *
- panel *
- pip *
- scikit-learn *
- seaborn *
- simplejson *
- spacy *
- tiktoken *
- xlrd *
- amqp 5.1.1.*
- annotated-types 0.5.0.*
- anyio 4.0.0.*
- appnope 0.1.3.*
- argon2-cffi 23.1.0.*
- argon2-cffi-bindings 21.2.0.*
- arrow 1.2.3.*
- asttokens 2.4.0.*
- async-lru 2.0.4.*
- attrs 23.1.0.*
- babel 2.12.1.*
- backcall 0.2.0.*
- backports 1.0.*
- backports.functools_lru_cache 1.6.5.*
- backports.zoneinfo 0.2.1.*
- beautifulsoup4 4.12.2.*
- billiard 4.1.0.*
- bleach 6.0.0.*
- blinker 1.6.2.*
- brotli 1.1.0.*
- brotli-bin 1.1.0.*
- brotli-python 1.1.0.*
- bzip2 1.0.8.*
- ca-certificates 2023.7.22.*
- cached-property 1.5.2.*
- cached_property 1.5.2.*
- catalogue 2.0.9.*
- celery 5.3.1.*
- certifi 2023.7.22.*
- cffi 1.15.1.*
- charset-normalizer 3.2.0.*
- click 8.1.7.*
- click-didyoumean 0.3.0.*
- click-plugins 1.1.1.*
- click-repl 0.3.0.*
- colorama 0.4.6.*
- comm 0.1.4.*
- confection 0.1.3.*
- contourpy 1.1.1.*
- cycler 0.11.0.*
- cymem 2.0.8.*
- cython-blis 0.7.10.*
- dataclasses 0.8.*
- debugpy 1.8.0.*
- decorator 5.1.1.*
- defusedxml 0.7.1.*
- entrypoints 0.4.*
- et_xmlfile 1.1.0.*
- exceptiongroup 1.1.3.*
- executing 1.2.0.*
- flask 2.3.3.*
- fonttools 4.42.1.*
- fqdn 1.5.1.*
- freetype 2.12.1.*
- gunicorn 21.2.0.*
- html5lib 1.1.*
- icu 73.2.*
- idna 3.4.*
- importlib-metadata 6.8.0.*
- importlib_metadata 6.8.0.*
- importlib_resources 6.0.1.*
- ipykernel 6.25.2.*
- ipython 8.14.0.*
- isoduration 20.11.0.*
- itsdangerous 2.1.2.*
- jedi 0.19.0.*
- jinja2 3.1.2.*
- joblib 1.3.2.*
- json5 0.9.14.*
- jsonpointer 2.4.*
- jsonschema 4.19.0.*
- jsonschema-specifications 2023.7.1.*
- jsonschema-with-format-nongpl 4.19.0.*
- jupyter-lsp 2.2.0.*
- jupyter_client 8.3.1.*
- jupyter_core 5.3.1.*
- jupyter_events 0.7.0.*
- jupyter_server 2.7.3.*
- jupyter_server_terminals 0.4.4.*
- jupyterlab 4.0.6.*
- jupyterlab_pygments 0.2.2.*
- jupyterlab_server 2.25.0.*
- kiwisolver 1.4.5.*
- kombu 5.3.2.*
- langcodes 3.3.0.*
- lcms2 2.15.*
- lerc 4.0.0.*
- libblas 3.9.0.*
- libbrotlicommon 1.1.0.*
- libbrotlidec 1.1.0.*
- libbrotlienc 1.1.0.*
- libcblas 3.9.0.*
- libcxx 16.0.6.*
- libdeflate 1.19.*
- libffi 3.4.2.*
- libgfortran 5.0.0.*
- libgfortran5 13.2.0.*
- libiconv 1.17.*
- libjpeg-turbo 2.1.5.1.*
- liblapack 3.9.0.*
- libopenblas 0.3.24.*
- libpng 1.6.39.*
- libsodium 1.0.18.*
- libsqlite 3.43.0.*
- libtiff 4.6.0.*
- libwebp-base 1.3.2.*
- libxcb 1.15.*
- libxml2 2.11.5.*
- libxslt 1.1.37.*
- libzlib 1.2.13.*
- llvm-openmp 16.0.6.*
- lxml 4.9.3.*
- markdown-it-py 3.0.0.*
- markupsafe 2.1.3.*
- matplotlib 3.7.2.*
- matplotlib-base 3.7.2.*
- matplotlib-inline 0.1.6.*
- mdurl 0.1.0.*
- mistune 3.0.1.*
- munkres 1.1.4.*
- murmurhash 1.0.10.*
- nbclient 0.8.0.*
- nbconvert-core 7.8.0.*
- nbformat 5.9.2.*
- ncurses 6.4.*
- nest-asyncio 1.5.6.*
- nltk 3.8.1.*
- notebook 7.0.2.*
- notebook-shim 0.2.3.*
- numpy 1.25.2.*
- openjpeg 2.5.0.*
- openpyxl 3.1.2.*
- openssl 3.1.2.*
- overrides 7.4.0.*
- packaging 23.1.*
- pandas 2.1.0.*
- pandocfilters 1.5.0.*
- parso 0.8.3.*
- pathy 0.10.2.*
- patsy 0.5.3.*
- pexpect 4.8.0.*
- pickleshare 0.7.5.*
- pillow 10.0.1.*
- pip 23.2.1.*
- pkgutil-resolve-name 1.3.10.*
- platformdirs 3.10.0.*
- preshed 3.0.8.*
- prometheus_client 0.17.1.*
- prompt-toolkit 3.0.39.*
- prompt_toolkit 3.0.39.*
- psutil 5.9.5.*
- pthread-stubs 0.4.*
- ptyprocess 0.7.0.*
- pure_eval 0.2.2.*
- pycparser 2.21.*
- pydantic-core 2.6.3.*
- pygments 2.16.1.*
- pyobjc-core 9.2.*
- pyobjc-framework-cocoa 9.2.*
- pyparsing 3.0.9.*
- pysocks 1.7.1.*
- python 3.10.12.*
- python-dateutil 2.8.2.*
- python-fastjsonschema 2.18.0.*
- python-json-logger 2.0.7.*
- python-tzdata 2023.3.*
- python_abi 3.10.*
- pytz 2023.3.post1.*
- pyyaml 6.0.1.*
- pyzmq 25.1.1.*
- readline 8.2.*
- referencing 0.30.2.*
- regex 2023.8.8.*
- requests 2.31.0.*
- rfc3339-validator 0.1.4.*
- rfc3986-validator 0.1.1.*
- rich 13.5.3.*
- rpds-py 0.10.3.*
- scikit-learn 1.3.0.*
- scipy 1.11.2.*
- seaborn 0.12.2.*
- seaborn-base 0.12.2.*
- send2trash 1.8.2.*
- setuptools 68.2.2.*
- shellingham 1.5.3.*
- six 1.16.0.*
- smart_open 5.2.1.*
- sniffio 1.3.0.*
- soupsieve 2.5.*
- spacy 3.6.1.*
- spacy-legacy 3.0.12.*
- spacy-loggers 1.0.5.*
- srsly 2.4.7.*
- stack_data 0.6.2.*
- statsmodels 0.14.0.*
- terminado 0.17.1.*
- thinc 8.1.12.*
- threadpoolctl 3.2.0.*
- tinycss2 1.2.1.*
- tk 8.6.12.*
- tomli 2.0.1.*
- tornado 6.3.3.*
- tqdm 4.66.1.*
- traitlets 5.10.0.*
- typer 0.9.0.*
- typing-extensions 4.7.1.*
- typing_extensions 4.7.1.*
- typing_utils 0.1.0.*
- tzdata 2023c.*
- unicodedata2 15.0.0.*
- uri-template 1.3.0.*
- vine 5.0.0.*
- wasabi 1.1.2.*
- wcwidth 0.2.6.*
- webcolors 1.13.*
- webencodings 0.5.1.*
- werkzeug 2.3.7.*
- wheel 0.41.2.*
- xlrd 2.0.1.*
- xorg-libxau 1.0.11.*
- xorg-libxdmcp 1.1.3.*
- xz 5.2.6.*
- yaml 0.2.5.*
- zeromq 4.3.4.*
- zipp 3.16.2.*
- zstd 1.5.5.*