llm-lct-sequencing

AI Semantic Insights: LLM Toolkit for Analysing Educational Practices and Knowledge Building.

https://github.com/sydney-informatics-hub/llm-lct-sequencing

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
    Organization sydney-informatics-hub has institutional domain (sydney.edu.au)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords

classification knowledge-representation llm
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Sydney-Informatics-Hub
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.55 MB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 0
  • Open Issues: 2
  • Releases: 16
Topics
classification knowledge-representation llm
Created over 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

AI Semantic Insights: LLM Toolkit for Analysing Educational Practices and Knowledge Building

Introduction

This repository provides AI-assisted Large Language Model (LLM) tools for classifying sequencing relations between text clauses for Legitimation Code Theory (LCT). LCT is a framework developed by Prof. Karl Maton for identifying and classifying the ‘epistemic-semantic density (ESD)’ in English discourse, and is an approach to analysing knowledge practices in various social fields, including education. LCT is often used to examine the underlying principles that guide knowledge building, curriculum design, pedagogical practices, and the evaluation of student work. Using the LCT analytic method, the complexity of knowledge practices and knowledge building in educational texts, such as lecture transcriptions, can be conceptualised and revealed.

The Sydney Informatics Hub (SIH) has previously completed the ‘clausing tool’ (project PIPE-156), which combines word groupings into short, coherent standalone passages (clauses) and then classifies each clause as one of eight predefined types, so that the epistemological condensation (EC) can be measured quantitatively and an information density profile of the text can be generated.

This project focuses on implementing the next level of the epistemological condensation of texts, the ‘sequencing tool’. When two or more short passages (clauses) are combined, the sequencing pattern affects how meanings are condensed from those passages and transported across them. Like the clausing typology, the sequencing typology is a three-level hierarchical system with eight sub-types of sequencing at the finest granularity.

Overview

This project provides two processing tools:

  1. An LLM experiment workbench for experimenting with different sequencing definitions and optimising the LLM prompts. Please see the notebook llm_experiment_multi.ipynb in the notebooks folder.

  2. A sequencing annotation tool that lets the user automatically extract clauses from text and classify the clause pairs using a pre-defined LLM (e.g. one optimised with tool 1). To start this tool, run the notebook annotation-tool.ipynb or use the Web version as outlined below.

The schematic process diagrams for these two tools are shown below.

Overview of LLM Experiment Workbench

```mermaid
graph TD
    Expert --> LLM-Instructions
    Expert --> Sequencing-Definitions
    Expert --> Sequencing-Examples
    Sequencing-Examples --> Examples-Split
    Examples-Split --> Prompt-Examples
    Examples-Split --> Test-Examples
    Test-Examples --> True-Classification
    Test-Examples --> Test-text-input
    Test-text-input --> LLM-Model
    Prompt-Examples --> Prompt
    LLM-Instructions --> Prompt
    Sequencing-Definitions --> Prompt
    Prompt --> LLM-Model
    LLM-Model --> Test-Predictions
    True-Classification --> LLM-Optimisation
    Test-Predictions --> LLM-Optimisation
    LLM-Optimisation --> Expert
    LLM-Optimisation --> LLM-Model
```
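The evaluation part of the workbench loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the notebook's implementation: the function names (`split_examples`, `build_prompt`, `evaluate`), the placeholder labels `TYPE_A` and `TYPE_B`, and the stub classifier are all hypothetical, and a real run would replace the stub with a call to an LLM.

```python
import random
from typing import Callable

# An expert-labelled example: (clause_a, clause_b, sequencing_label).
Example = tuple[str, str, str]


def split_examples(examples: list[Example], n_prompt: int, seed: int = 0):
    """Shuffle the examples and split them into prompt examples and test examples."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_prompt], shuffled[n_prompt:]


def build_prompt(instructions: str, definitions: dict[str, str],
                 prompt_examples: list[Example]) -> str:
    """Combine instructions, sequencing definitions and few-shot examples."""
    lines = [instructions, "", "Definitions:"]
    lines += [f"- {label}: {text}" for label, text in definitions.items()]
    lines += ["", "Examples:"]
    lines += [f"- A: {a} | B: {b} -> {label}" for a, b, label in prompt_examples]
    return "\n".join(lines)


def evaluate(classify: Callable[[str, str, str], str], prompt: str,
             test_examples: list[Example]) -> float:
    """Return the fraction of test examples whose prediction matches the true label."""
    if not test_examples:
        return 0.0
    hits = sum(classify(prompt, a, b) == label for a, b, label in test_examples)
    return hits / len(test_examples)


# Hypothetical placeholder labels and examples (the real typology has eight sub-types).
definitions = {"TYPE_A": "placeholder definition A", "TYPE_B": "placeholder definition B"}
examples = [
    ("clause 1", "clause 2", "TYPE_A"),
    ("clause 3", "clause 4", "TYPE_B"),
    ("clause 5", "clause 6", "TYPE_A"),
    ("clause 7", "clause 8", "TYPE_B"),
]

prompt_examples, test_examples = split_examples(examples, n_prompt=2)
prompt = build_prompt("Classify the relation between clause A and clause B.",
                      definitions, prompt_examples)


def stub_classifier(prompt: str, clause_a: str, clause_b: str) -> str:
    """Stand-in for an LLM call; always answers TYPE_A."""
    return "TYPE_A"


accuracy = evaluate(stub_classifier, prompt, test_examples)
print(f"accuracy on held-out examples: {accuracy:.2f}")
```

Keeping the prompt examples and the test examples disjoint, as the diagram does, means the score is not inflated by examples the model has already seen in its prompt.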

Overview of Sequencing Tool for Production

```mermaid
graph TD
    Input-text --> Clausing-Tool
    Clausing-Tool --> Clausing-Pairs
    Optimised-LLM-Instructions --> LLM-Model
    LLM-Model --> Sequencing-Prediction
    Sequencing-Prediction --> Annotation-Tool
    Clausing-Pairs --> Annotation-Tool
    Clausing-Pairs --> LLM-Model
    Annotation-Tool --> Sequencing-Classification
    User-Input --> Annotation-Tool
    Sequencing-Classification --> EC-Analysis
    Annotation-Tool --> Clausing-Pairs
```
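The production flow can be illustrated with a small Python sketch. It assumes, purely for illustration, that clause pairs are formed from neighbouring clauses and that a user correction overrides the model prediction; the source does not specify either detail. The names `adjacent_pairs` and `annotate`, the placeholder labels, and the stub predictor are hypothetical.

```python
from typing import Callable, Optional


def adjacent_pairs(clauses: list[str]) -> list[tuple[str, str]]:
    """Pair each clause with the one that follows it (an assumed pairing rule)."""
    return list(zip(clauses, clauses[1:]))


def annotate(pairs: list[tuple[str, str]],
             predict: Callable[[str, str], str],
             overrides: Optional[dict[int, str]] = None) -> list[dict]:
    """Attach a predicted label to each pair; a user-supplied label takes precedence."""
    overrides = overrides or {}
    rows = []
    for index, (clause_a, clause_b) in enumerate(pairs):
        predicted = predict(clause_a, clause_b)
        rows.append({
            "clause_a": clause_a,
            "clause_b": clause_b,
            "predicted": predicted,
            "final": overrides.get(index, predicted),
        })
    return rows


clauses = ["first clause", "second clause", "third clause"]
pairs = adjacent_pairs(clauses)

# The lambda stands in for the LLM; the override mimics a correction made by the user.
rows = annotate(pairs, predict=lambda a, b: "TYPE_A", overrides={1: "TYPE_B"})
for row in rows:
    print(row["clause_a"], "|", row["clause_b"], "->", row["final"])
```

Storing both the predicted and the final label keeps a record of where the user disagreed with the model, which could feed back into prompt optimisation.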

Functionality

The aim of the LCT analysis tool is to provide researchers with an automatic classification system that identifies the sequencing types of combinations of passages (clauses). Existing large language models (LLMs), such as OpenAI GPT, are used for the automatic sequencing classification.
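Because an LLM returns free text, an automatic classifier usually needs a step that maps the reply onto one of the permitted labels. The following is a hedged sketch of such a step, not code from this repository: `parse_label`, the uppercase placeholder labels, and the `UNKNOWN` fallback are assumptions made for illustration.

```python
def parse_label(reply: str, allowed: set[str], fallback: str = "UNKNOWN") -> str:
    """Map a free-text model reply onto one allowed (uppercase) label, or the fallback."""
    cleaned = reply.strip().strip(".\"'").upper()
    if cleaned in allowed:
        return cleaned
    # Accept replies such as "Label: TYPE_B" when exactly one allowed label is mentioned.
    mentioned = [label for label in allowed if label in reply.upper()]
    return mentioned[0] if len(mentioned) == 1 else fallback


allowed_labels = {"TYPE_A", "TYPE_B"}  # hypothetical placeholders
print(parse_label(" type_a. ", allowed_labels))
print(parse_label("Label: TYPE_B", allowed_labels))
print(parse_label("TYPE_A or TYPE_B", allowed_labels))
```

Ambiguous or unrecognised replies fall back to `UNKNOWN` rather than being guessed, so they can be reviewed by hand. Note that the substring check would need refining if one label name were contained in another.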

Usage and dependencies

Web version

Click the launch binder link below to use a web version of the tool hosted on BinderHub, courtesy of the Australian Text Analytics Platform.

Note: CILogon authentication is required. You can use your institutional, Google or Microsoft account to log in. If you have trouble authenticating, please refer to the CILogon troubleshooting guide.

Binder

Self-host

Run the following commands in a terminal or any Bash environment.

Clone the repository and navigate into the newly created directory:

```shell
git clone https://github.com/Sydney-Informatics-Hub/LLM-LCT-sequencing.git
cd LLM-LCT-sequencing
```

To install dependencies, ensure you have Python 3.10 and pip installed, then run the following command:

```shell
pip install -r requirements.txt
```

Serve the application locally using the following command:

```shell
panel serve annotation-tool.ipynb --show
```

The application will launch in a browser at the link provided (http://localhost:5006/annotation-tool).

References

  • Maton, Karl, and Yaegan J. Doran. "Condensation: A translation device for revealing complexity of knowledge practices in discourse, part 2—clausing and sequencing." Onomázein (2017): 77-110

Attribution and Acknowledgement

Acknowledgments are an important way for us to demonstrate the value we bring to your research. Your research outcomes are vital for ongoing funding of the Sydney Informatics Hub.

If you make use of this software for your research project, please include the following acknowledgment:

“This research was supported by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.”

Owner

  • Name: Sydney Informatics Hub
  • Login: Sydney-Informatics-Hub
  • Kind: organization
  • Email: sih.admin@sydney.edu.au
  • Location: University of Sydney, Sydney Australia

The Sydney Informatics Hub is a Core Research Facility of the University of Sydney, providing training and expertise on research data, analysis and computing.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  AI Semantic Insights: LLM Toolkit for Analysing
  Educational Practices and Knowledge Building
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Karl
    family-names: Maton
    name-particle: Prof.
    email: karl.maton@sydney.edu.au
    affiliation: >-
      Professor of Sociology; Director, LCT Centre for
      Knowledge-Building, University of Sydney
  - given-names: Yaegan
    email: yaegan.doran@acu.edu.au
    family-names: Doran
    name-particle: Dr.
    affiliation: >-
      Senior Lecturer, Language and Literacy Education,
      Australian Catholic University
  - given-names: Chao
    family-names: Sun
    name-particle: Dr.
    email: chao.sun@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Sebastian
    family-names: Haan
    email: sebastian.haan@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Hamish
    family-names: Croser
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
    email: hamish.croser@sydney.edu.au
repository-code: >-
  https://github.com/Sydney-Informatics-Hub/LLM-LCT-sequencing
keywords:
  - Legitimation Code Theory
  - LCT
license: MIT

GitHub Events

Total
  • Release event: 2
  • Push event: 8
  • Pull request event: 4
  • Create event: 3
Last Year
  • Release event: 2
  • Push event: 8
  • Pull request event: 4
  • Create event: 3

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 264
  • Total Committers: 3
  • Avg Commits per committer: 88.0
  • Development Distribution Score (DDS): 0.515
Past Year
  • Commits: 8
  • Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Seb Haan s****n@g****m 128
Hamish Croser h****r@s****u 128
Hamish Croser h****1@g****m 8
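The Development Distribution Score values above are consistent with a simple formula: one minus the top committer's share of all commits. The sketch below assumes that definition, which the page itself does not state, and the function name is hypothetical; the commit counts are taken from the Top Committers table and the Past Year statistics.

```python
def development_distribution_score(commits_per_committer: list[int]) -> float:
    """Return 1 minus the share of commits made by the most active committer."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


all_time = development_distribution_score([128, 128, 8])  # three committer identities
past_year = development_distribution_score([8])           # a single committer

print(round(all_time, 3))   # 0.515
print(round(past_year, 3))  # 0.0
```

A score near 0 means one committer made almost all commits, while higher values indicate that work is spread across more contributors.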

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 2
  • Total pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.5
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • h-croser (1)
  • sebhaan (1)
Pull Request Authors
  • h-croser (13)
  • sebhaan (1)
Top Labels
Issue Labels
enhancement (1) bug (1)
Pull Request Labels
enhancement (8)

Dependencies

requirements.txt pypi
  • celery *
  • flask *
  • gunicorn *
  • html5lib *
  • ipython *
  • ipywidgets *
  • itables *
  • joblib *
  • jsonschema *
  • lxml *
  • markdown *
  • matplotlib *
  • nltk *
  • notebook *
  • numpy *
  • openai *
  • openpyxl *
  • pandas *
  • panel *
  • pip *
  • scikit-learn *
  • seaborn *
  • simplejson *
  • spacy *
  • tiktoken *
  • xlrd *
environment.yaml conda
  • amqp 5.1.1.*
  • annotated-types 0.5.0.*
  • anyio 4.0.0.*
  • appnope 0.1.3.*
  • argon2-cffi 23.1.0.*
  • argon2-cffi-bindings 21.2.0.*
  • arrow 1.2.3.*
  • asttokens 2.4.0.*
  • async-lru 2.0.4.*
  • attrs 23.1.0.*
  • babel 2.12.1.*
  • backcall 0.2.0.*
  • backports 1.0.*
  • backports.functools_lru_cache 1.6.5.*
  • backports.zoneinfo 0.2.1.*
  • beautifulsoup4 4.12.2.*
  • billiard 4.1.0.*
  • bleach 6.0.0.*
  • blinker 1.6.2.*
  • brotli 1.1.0.*
  • brotli-bin 1.1.0.*
  • brotli-python 1.1.0.*
  • bzip2 1.0.8.*
  • ca-certificates 2023.7.22.*
  • cached-property 1.5.2.*
  • cached_property 1.5.2.*
  • catalogue 2.0.9.*
  • celery 5.3.1.*
  • certifi 2023.7.22.*
  • cffi 1.15.1.*
  • charset-normalizer 3.2.0.*
  • click 8.1.7.*
  • click-didyoumean 0.3.0.*
  • click-plugins 1.1.1.*
  • click-repl 0.3.0.*
  • colorama 0.4.6.*
  • comm 0.1.4.*
  • confection 0.1.3.*
  • contourpy 1.1.1.*
  • cycler 0.11.0.*
  • cymem 2.0.8.*
  • cython-blis 0.7.10.*
  • dataclasses 0.8.*
  • debugpy 1.8.0.*
  • decorator 5.1.1.*
  • defusedxml 0.7.1.*
  • entrypoints 0.4.*
  • et_xmlfile 1.1.0.*
  • exceptiongroup 1.1.3.*
  • executing 1.2.0.*
  • flask 2.3.3.*
  • fonttools 4.42.1.*
  • fqdn 1.5.1.*
  • freetype 2.12.1.*
  • gunicorn 21.2.0.*
  • html5lib 1.1.*
  • icu 73.2.*
  • idna 3.4.*
  • importlib-metadata 6.8.0.*
  • importlib_metadata 6.8.0.*
  • importlib_resources 6.0.1.*
  • ipykernel 6.25.2.*
  • ipython 8.14.0.*
  • isoduration 20.11.0.*
  • itsdangerous 2.1.2.*
  • jedi 0.19.0.*
  • jinja2 3.1.2.*
  • joblib 1.3.2.*
  • json5 0.9.14.*
  • jsonpointer 2.4.*
  • jsonschema 4.19.0.*
  • jsonschema-specifications 2023.7.1.*
  • jsonschema-with-format-nongpl 4.19.0.*
  • jupyter-lsp 2.2.0.*
  • jupyter_client 8.3.1.*
  • jupyter_core 5.3.1.*
  • jupyter_events 0.7.0.*
  • jupyter_server 2.7.3.*
  • jupyter_server_terminals 0.4.4.*
  • jupyterlab 4.0.6.*
  • jupyterlab_pygments 0.2.2.*
  • jupyterlab_server 2.25.0.*
  • kiwisolver 1.4.5.*
  • kombu 5.3.2.*
  • langcodes 3.3.0.*
  • lcms2 2.15.*
  • lerc 4.0.0.*
  • libblas 3.9.0.*
  • libbrotlicommon 1.1.0.*
  • libbrotlidec 1.1.0.*
  • libbrotlienc 1.1.0.*
  • libcblas 3.9.0.*
  • libcxx 16.0.6.*
  • libdeflate 1.19.*
  • libffi 3.4.2.*
  • libgfortran 5.0.0.*
  • libgfortran5 13.2.0.*
  • libiconv 1.17.*
  • libjpeg-turbo 2.1.5.1.*
  • liblapack 3.9.0.*
  • libopenblas 0.3.24.*
  • libpng 1.6.39.*
  • libsodium 1.0.18.*
  • libsqlite 3.43.0.*
  • libtiff 4.6.0.*
  • libwebp-base 1.3.2.*
  • libxcb 1.15.*
  • libxml2 2.11.5.*
  • libxslt 1.1.37.*
  • libzlib 1.2.13.*
  • llvm-openmp 16.0.6.*
  • lxml 4.9.3.*
  • markdown-it-py 3.0.0.*
  • markupsafe 2.1.3.*
  • matplotlib 3.7.2.*
  • matplotlib-base 3.7.2.*
  • matplotlib-inline 0.1.6.*
  • mdurl 0.1.0.*
  • mistune 3.0.1.*
  • munkres 1.1.4.*
  • murmurhash 1.0.10.*
  • nbclient 0.8.0.*
  • nbconvert-core 7.8.0.*
  • nbformat 5.9.2.*
  • ncurses 6.4.*
  • nest-asyncio 1.5.6.*
  • nltk 3.8.1.*
  • notebook 7.0.2.*
  • notebook-shim 0.2.3.*
  • numpy 1.25.2.*
  • openjpeg 2.5.0.*
  • openpyxl 3.1.2.*
  • openssl 3.1.2.*
  • overrides 7.4.0.*
  • packaging 23.1.*
  • pandas 2.1.0.*
  • pandocfilters 1.5.0.*
  • parso 0.8.3.*
  • pathy 0.10.2.*
  • patsy 0.5.3.*
  • pexpect 4.8.0.*
  • pickleshare 0.7.5.*
  • pillow 10.0.1.*
  • pip 23.2.1.*
  • pkgutil-resolve-name 1.3.10.*
  • platformdirs 3.10.0.*
  • preshed 3.0.8.*
  • prometheus_client 0.17.1.*
  • prompt-toolkit 3.0.39.*
  • prompt_toolkit 3.0.39.*
  • psutil 5.9.5.*
  • pthread-stubs 0.4.*
  • ptyprocess 0.7.0.*
  • pure_eval 0.2.2.*
  • pycparser 2.21.*
  • pydantic-core 2.6.3.*
  • pygments 2.16.1.*
  • pyobjc-core 9.2.*
  • pyobjc-framework-cocoa 9.2.*
  • pyparsing 3.0.9.*
  • pysocks 1.7.1.*
  • python 3.10.12.*
  • python-dateutil 2.8.2.*
  • python-fastjsonschema 2.18.0.*
  • python-json-logger 2.0.7.*
  • python-tzdata 2023.3.*
  • python_abi 3.10.*
  • pytz 2023.3.post1.*
  • pyyaml 6.0.1.*
  • pyzmq 25.1.1.*
  • readline 8.2.*
  • referencing 0.30.2.*
  • regex 2023.8.8.*
  • requests 2.31.0.*
  • rfc3339-validator 0.1.4.*
  • rfc3986-validator 0.1.1.*
  • rich 13.5.3.*
  • rpds-py 0.10.3.*
  • scikit-learn 1.3.0.*
  • scipy 1.11.2.*
  • seaborn 0.12.2.*
  • seaborn-base 0.12.2.*
  • send2trash 1.8.2.*
  • setuptools 68.2.2.*
  • shellingham 1.5.3.*
  • six 1.16.0.*
  • smart_open 5.2.1.*
  • sniffio 1.3.0.*
  • soupsieve 2.5.*
  • spacy 3.6.1.*
  • spacy-legacy 3.0.12.*
  • spacy-loggers 1.0.5.*
  • srsly 2.4.7.*
  • stack_data 0.6.2.*
  • statsmodels 0.14.0.*
  • terminado 0.17.1.*
  • thinc 8.1.12.*
  • threadpoolctl 3.2.0.*
  • tinycss2 1.2.1.*
  • tk 8.6.12.*
  • tomli 2.0.1.*
  • tornado 6.3.3.*
  • tqdm 4.66.1.*
  • traitlets 5.10.0.*
  • typer 0.9.0.*
  • typing-extensions 4.7.1.*
  • typing_extensions 4.7.1.*
  • typing_utils 0.1.0.*
  • tzdata 2023c.*
  • unicodedata2 15.0.0.*
  • uri-template 1.3.0.*
  • vine 5.0.0.*
  • wasabi 1.1.2.*
  • wcwidth 0.2.6.*
  • webcolors 1.13.*
  • webencodings 0.5.1.*
  • werkzeug 2.3.7.*
  • wheel 0.41.2.*
  • xlrd 2.0.1.*
  • xorg-libxau 1.0.11.*
  • xorg-libxdmcp 1.1.3.*
  • xz 5.2.6.*
  • yaml 0.2.5.*
  • zeromq 4.3.4.*
  • zipp 3.16.2.*
  • zstd 1.5.5.*