ai-assisted-protocol-analysis-in-design-research

Code repository for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy." Presented at Design Computing and Cognition’24

https://github.com/ahmedshahriar/ai-assisted-protocol-analysis-in-design-research

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

chatgpt chatgpt-api conference-paper conversational-data dcc-24 design-computing-cognition-2024 design-research gpt-4 jupyter-notebook openai-api python research-paper research-paper-implementation
Last synced: 6 months ago · JSON representation ·

Repository

Code repository for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy." Presented at Design Computing and Cognition’24

Basic Info
  • Host: GitHub
  • Owner: ahmedshahriar
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage: https://rdcu.be/d03uv
  • Size: 36.1 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
chatgpt chatgpt-api conference-paper conversational-data dcc-24 design-computing-cognition-2024 design-research gpt-4 jupyter-notebook openai-api python research-paper research-paper-implementation
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

AI-Assisted Protocol Analysis in Design Research

This repository contains the code and documentation for the paper "Towards AI-Assisted Protocol Analysis in Design Research: Automating Question Labeling with GPT-4 According to Eris' (2004) Taxonomy."

Presented at the DCC 2024, the 11th International Conference on Design Computing and Cognition, Montreal, Canada. 8–10 July 2024.

Getting Started

Create a python virtual environment and install the required dependencies - pip insrall -r requirements.txt

Update .env with your settings. You can use .env.example as a reference:

  • OPENAI_API_KEY=<your-key>: Your OpenAI API key.
  • OPENAI_MODEL=gpt-4-1106-preview: GPT model version.
  • PROMPT_COST_PER_1000=0.01: Cost for 1,000 prompt tokens in USD.
  • COMPLETION_COST_PER_1000=0.03: Cost for 1,000 completion tokens in USD.
  • DATA_DIR=dataset: Dataset directory.
  • DATA_FILE=convo-qs-eris-labelled.xlsx: Your dataset. A sample dataset is available in the dataset folder.

Update the system message for the OpenAI Chat Completion API in the system-message.txt file.

Experiments

The experiments folder contains Jupyter notebooks detailing the experiments conducted for the paper.

  1. Determine the baseline performance by classifying a test set of standalone question utterances, with/without training set.
  2. Determine the effect of the size of the training set on the accuracy of labelling by the GPT-4.
  3. Determine the sensitivity of the results across multiple “runs” of the experiment.
  4. Determine whether the GPT-4 can also use context in the labelling task, and if it improves the labelling performance.

Findings

  • Training set could be useful
  • Labelling is probabilistic; a larger training set reduces uncertainty.
  • Providing context surrounding each question results in degraded performance which aligns with recent findings on LLMs’ struggle with long context

Owner

  • Name: Ahmed Shahriar Sakib
  • Login: ahmedshahriar
  • Kind: user
  • Location: Ontario, Canada
  • Company: @criticalml-uw

Software Engineer, an expert in web scraping & automation, data analytics, and machine learning. Kaggle Master.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Towards AI-Assisted Protocol Analysis in Design Research:
  Automating Question Labelling with GPT-4 According to
  Eris' (2004) Taxonomy
message: >-
  If you use this code repository, please cite the
  associated paper using the metadata provided in this
  repository
type: software
authors:
  - given-names: Ahmed Shahriar
    family-names: Sakib
    email: assakib@uwaterloo.ca
    affiliation: 'University of Waterloo, Canada'
    orcid: 'https://orcid.org/0000-0001-8611-9616'
  - given-names: Ada
    family-names: Hurst
    orcid: 'https://orcid.org/0000-0002-2481-8566'
    affiliation: 'University of Waterloo, Canada'
    email: ada.hurst@uwaterloo.ca
  - given-names: Frank
    family-names: Safayeni
    email: fsafayeni@uwaterloo.ca
    affiliation: 'University of Waterloo, Canada'
identifiers:
  - type: doi
    value: 10.1007/978-3-031-71918-9_3
    description: >-
      Link to the published conference paper at Springer
      Nature.
  - type: url
    value: 'https://rdcu.be/dYUou'
    description: >-
      Springer Nature SharedIt link for free access to the
      full-text PDF of the conference paper.
  - type: other
    value: 978-3-031-71917-2
    description: Print ISBN
  - type: other
    value: 978-3-031-71918-9
    description: Online ISBN
repository-code: >-
  https://github.com/ahmedshahriar/AI-Assisted-Protocol-Analysis-in-Design-Research
url: >-
  https://link.springer.com/chapter/10.1007/978-3-031-71918-9_3
abstract: >-
  This study explores the potential of large language models
  (LLM)-based tools, specifically GPT-4 -- a
  state-of-the-art language processing model - to assist in
  the analysis of verbal protocols of design. We focus on
  Eris' taxonomy, a well-established framework that
  classifies questions asked by participants in a
  design-focused task according to three broad categories:
  low-level, deep reasoning, and generative design
  questions. Using a large dataset of pre-classified
  questions from design review meetings, a series of
  experiments test GPT-4's capability in the categorization
  task and evaluate how different factors influence its
  precision. Results indicate that GPT-4 matches performance
  by human coders -- a promising result for design
  researchers who can benefit from this tool with little
  prior natural language processing expertise. Overall,
  findings offer insights into the strengths and limitations
  of LLMs in this context and suggest directions for future
  research into the use of LLM-based tools in qualitative
  analyses of design activity.
keywords:
  - design research
  - protocol analysis
  - Artificial Intelligence
  - GPT-4
  - Design cognition
  - LLM
  - Eris' taxonomy
  - qualitative research
  - Question classification
  - NLP
  - Automated question labeling
  - text analytics
  - Machine Learning
  - DCC’24
  - design computing
license: Apache-2.0
references:
  - type: conference-paper
    authors:
      - given-names: Ahmed Shahriar
        family-names: Sakib
        email: assakib@uwaterloo.ca
        affiliation: 'University of Waterloo, Canada'
        orcid: 'https://orcid.org/0000-0001-8611-9616'
      - given-names: Ada
        family-names: Hurst
        email: ada.hurst@uwaterloo.ca
        affiliation: 'University of Waterloo, Canada'
        orcid: 'https://orcid.org/0000-0002-2481-8566'
      - given-names: Frank
        family-names: Safayeni
        email: fsafayeni@uwaterloo.ca
        affiliation: 'University of Waterloo, Canada'
    title: >-
      Towards AI-Assisted Protocol Analysis in Design Research: Automating
      Question Labelling with GPT-4 According to Eris' (2004) Taxonomy
    collection-title: Design Computing and Cognition '24
    year: 2024
    month: 9
    editors:
      - given-names: John
        family-names: Gero
        email: jgero1@charlotte.edu
        affiliation: 'University of North Carolina at Charlotte, USA'
        orcid: 'https://orcid.org/0000-0001-9026-535X'
    publisher:
      name: Springer Nature Switzerland
      address: 'Cham, Switzerland'
    conference:
      name: Design Computing and Cognition'24
      location: Concordia University
      city: Montreal
      country: CA
      date-start: '2024-07-07'
      date-end: '2024-07-10'
    start: 38
    end: 55
    doi: 10.1007/978-3-031-71918-9_3

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 3
  • Total Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
ahmedshahriar a****b@g****m 3

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • openai ==1.35.13
  • openpyxl ==3.1.2
  • pandas ==2.2.1
  • python-dotenv ==1.0.0
  • scikit-learn ==1.5.1
  • tiktoken ==0.7.0