ai-assisted-protocol-analysis-in-design-research
Code repository for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy." Presented at Design Computing and Cognition’24
https://github.com/ahmedshahriar/ai-assisted-protocol-analysis-in-design-research
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (10.9%) to scientific vocabulary
Keywords
Repository
Code repository for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy." Presented at Design Computing and Cognition’24
Basic Info
- Host: GitHub
- Owner: ahmedshahriar
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://rdcu.be/d03uv
- Size: 36.1 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
AI-Assisted Protocol Analysis in Design Research
This repository contains the code and documentation for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy."
Presented at the DCC 2024, the 11th International Conference on Design Computing and Cognition, Montreal, Canada. 8–10 July 2024.
Getting Started
Create a python virtual environment and install the required dependencies -
pip insrall -r requirements.txt
Update .env with your settings. You can use .env.example as a reference:
OPENAI_API_KEY=<your-key>: Your OpenAI API key.OPENAI_MODEL=gpt-4-1106-preview: GPT model version.PROMPT_COST_PER_1000=0.01: Cost for 1,000 prompt tokens in USD.COMPLETION_COST_PER_1000=0.03: Cost for 1,000 completion tokens in USD.DATA_DIR=dataset: Dataset directory.DATA_FILE=convo-qs-eris-labelled.xlsx: Your dataset. A sample dataset is available in thedatasetfolder.
Update the system message for the OpenAI Chat Completion API in the system-message.txt file.
Experiments
The experiments folder contains Jupyter notebooks detailing the experiments conducted for the paper.
- Determine the baseline performance by classifying a test set of standalone question utterances, with/without training set.
- Determine the effect of the size of the training set on the accuracy of labelling by the GPT-4.
- Determine the sensitivity of the results across multiple “runs” of the experiment.
- Determine whether the GPT-4 can also use context in the labelling task, and if it improves the labelling performance.
Findings
- Training set could be useful
- Labelling is probabilistic; a larger training set reduces uncertainty.
- Providing context surrounding each question results in degraded performance which aligns with recent findings on LLMs’ struggle with long context
- One notable study by Liu et al. (2024) Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12:157–173.
Owner
- Name: Ahmed Shahriar Sakib
- Login: ahmedshahriar
- Kind: user
- Location: Ontario, Canada
- Company: @criticalml-uw
- Website: https://ahmedshahriar.com
- Twitter: ahmed__shahriar
- Repositories: 5
- Profile: https://github.com/ahmedshahriar
Software Engineer, an expert in web scraping & automation, data analytics, and machine learning. Kaggle Master.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
Towards AI-Assisted Protocol Analysis in Design Research:
Automating Question Labelling with GPT-4 According to
Eris' (2004) Taxonomy
message: >-
If you use this code repository, please cite the
associated paper using the metadata provided in this
repository
type: software
authors:
- given-names: Ahmed Shahriar
family-names: Sakib
email: assakib@uwaterloo.ca
affiliation: 'University of Waterloo, Canada'
orcid: 'https://orcid.org/0000-0001-8611-9616'
- given-names: Ada
family-names: Hurst
orcid: 'https://orcid.org/0000-0002-2481-8566'
affiliation: 'University of Waterloo, Canada'
email: ada.hurst@uwaterloo.ca
- given-names: Frank
family-names: Safayeni
email: fsafayeni@uwaterloo.ca
affiliation: 'University of Waterloo, Canada'
identifiers:
- type: doi
value: 10.1007/978-3-031-71918-9_3
description: >-
Link to the published conference paper at Springer
Nature.
- type: url
value: 'https://rdcu.be/dYUou'
description: >-
Springer Nature SharedIt link for free access to the
full-text PDF of the conference paper.
- type: other
value: 978-3-031-71917-2
description: Print ISBN
- type: other
value: 978-3-031-71918-9
description: Online ISBN
repository-code: >-
https://github.com/ahmedshahriar/AI-Assisted-Protocol-Analysis-in-Design-Research
url: >-
https://link.springer.com/chapter/10.1007/978-3-031-71918-9_3
abstract: >-
This study explores the potential of large language models
(LLM)-based tools, specifically GPT-4 -- a
state-of-the-art language processing model - to assist in
the analysis of verbal protocols of design. We focus on
Eris' taxonomy, a well-established framework that
classifies questions asked by participants in a
design-focused task according to three broad categories:
low-level, deep reasoning, and generative design
questions. Using a large dataset of pre-classified
questions from design review meetings, a series of
experiments test GPT-4's capability in the categorization
task and evaluate how different factors influence its
precision. Results indicate that GPT-4 matches performance
by human coders -- a promising result for design
researchers who can benefit from this tool with little
prior natural language processing expertise. Overall,
findings offer insights into the strengths and limitations
of LLMs in this context and suggest directions for future
research into the use of LLM-based tools in qualitative
analyses of design activity.
keywords:
- design research
- protocol analysis
- Artificial Intelligence
- GPT-4
- Design cognition
- LLM
- Eris' taxonomy
- qualitative research
- Question classification
- NLP
- Automated question labeling
- text analytics
- Machine Learning
- DCC’24
- design computing
license: Apache-2.0
references:
- type: conference-paper
authors:
- given-names: Ahmed Shahriar
family-names: Sakib
email: assakib@uwaterloo.ca
affiliation: 'University of Waterloo, Canada'
orcid: 'https://orcid.org/0000-0001-8611-9616'
- given-names: Ada
family-names: Hurst
email: ada.hurst@uwaterloo.ca
affiliation: 'University of Waterloo, Canada'
orcid: 'https://orcid.org/0000-0002-2481-8566'
- given-names: Frank
family-names: Safayeni
email: fsafayeni@uwaterloo.ca
affiliation: 'University of Waterloo, Canada'
title: >-
Towards AI-Assisted Protocol Analysis in Design Research: Automating
Question Labelling with GPT-4 According to Eris' (2004) Taxonomy
collection-title: Design Computing and Cognition '24
year: 2024
month: 9
editors:
- given-names: John
family-names: Gero
email: jgero1@charlotte.edu
affiliation: 'University of North Carolina at Charlotte, USA'
orcid: 'https://orcid.org/0000-0001-9026-535X'
publisher:
name: Springer Nature Switzerland
address: 'Cham, Switzerland'
conference:
name: Design Computing and Cognition'24
location: Concordia University
city: Montreal
country: CA
date-start: '2024-07-07'
date-end: '2024-07-10'
start: 38
end: 55
doi: 10.1007/978-3-031-71918-9_3
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| ahmedshahriar | a****b@g****m | 3 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- openai ==1.35.13
- openpyxl ==3.1.2
- pandas ==2.2.1
- python-dotenv ==1.0.0
- scikit-learn ==1.5.1
- tiktoken ==0.7.0