KokoMind: Can LLMs Understand Social Interactions?

https://github.com/chats-lab/kokomind

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

chatgpt deep-learning gpt-4 language-model neural-network nlp
Last synced: 6 months ago

Repository

KokoMind: Can LLMs Understand Social Interactions?

Basic Info
Statistics
  • Stars: 105
  • Watchers: 5
  • Forks: 8
  • Open Issues: 3
  • Releases: 1
Topics
chatgpt deep-learning gpt-4 language-model neural-network nlp
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

KokoMind

Python 3.9+

This is the repo for KokoMind, a dataset of multi-party social interactions for evaluating LLMs' social understanding abilities.


Logo of KokoMind.

News

  • [2023.07.05] KokoMind is released at https://chats-lab.github.io/KokoMind/.

Demo

https://github.com/CHATS-lab/KokoMind/assets/13882237/731427bf-0d3c-4870-b36e-e146f954309b

Dataset

KokoMind contains 150 complex multi-party social interactions (50 per source) with free-text questions and answers. To ensure diversity and scalability and to avoid data contamination, all social interactions, questions, and answers are generated by GPT-4 and later verified by human experts. These generations are based on three different sources:

  • GPT-4-only: This subset is created solely by GPT-4 through prompting, without grounding on existing sources.
  • Movie-based: To avoid data contamination, this portion of the data is grounded on diverse scenarios pulled from movies released after 2022. GPT-4 shapes these situations, maintaining the core essence while adding its own elements.
  • ToMi-based: This segment contains data backboned by a simulated dataset, ToMi, which involves moving physical objects to different places, a classic test for theory of mind. These social interactions are again embellished and expanded by GPT-4.

For each social interaction, we ask various questions designed to probe the following aspects of social understanding.

  • Theory of Mind: Questions evaluating understanding of others' mental states and perspectives.
  • Social Norm: Questions aiming to discern societal values and norms within the situations.
  • Emotion Recognition: Questions targeted at identifying and understanding emotional elements within the context.
  • Social Relation: Queries focusing on interpersonal dynamics and relationships.
  • Counterfactual Questions: Hypothetical queries designed to explore alternative outcomes or possibilities.
  • Social Advice: Questions eliciting advice or action recommendations relevant to the given situation.

question_nonverbal_yes_v0.1.json contains 770 samples in total. This JSON Lines file holds one dictionary per line, and each dictionary contains the following fields:

  • question_id: int, the unique ID of the question.
  • text: str, social interaction context and question.
  • answer: str, GPT-4 answer that has been further verified by humans.
  • source: str, one of the three data sources: gpt-4, movie, tomi.
  • category: str, one of six question categories: ToM, Social Norm, Emotion Recognition, Social Relation, Counterfactual, Social Advice.

question_nonverbal_no_v0.1.json contains the same social interactions and questions, but with the non-verbal cues in parentheses (e.g., nervously sipping coffee) removed from the context.
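To make the schema above concrete, here is a minimal sketch of loading the JSON Lines answer data and filtering it by category. The two sample records are invented for illustration; only the field names (question_id, text, answer, source, category) come from the schema described above.

```python
import json

# Two invented sample records following the documented schema.
SAMPLE_JSONL = """\
{"question_id": 1, "text": "(nervously sipping coffee) ... What does Sam believe?", "answer": "Sam likely believes ...", "source": "gpt-4", "category": "ToM"}
{"question_id": 2, "text": "At the dinner party ... Is this behavior appropriate?", "answer": "No, because ...", "source": "movie", "category": "Social Norm"}
"""

def load_questions(jsonl_text):
    """Parse a JSON Lines string into a list of question dicts."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def by_category(questions, category):
    """Filter questions by one of the six question categories."""
    return [q for q in questions if q["category"] == category]

questions = load_questions(SAMPLE_JSONL)
tom = by_category(questions, "ToM")
print(len(questions), len(tom))  # 2 1
```

In practice you would read data/question_nonverbal_yes_v0.1.jsonl from disk instead of the inline string.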

Evaluation

Pre-requisite

```bash
pip install -r requirements.txt
export OPENAI_API_KEY=<your_api_key>
export ANTHROPIC_API_KEY=<your_api_key>
```

Generate model answers

```bash
# Generate local model answers
# Use vicuna-7b as an example
python eval/get_model_answer.py --model-path ${PATH_TO_LOCAL_HF_MODEL} --model-id vicuna-7b --question-file data/question_nonverbal_yes_v0.1.jsonl --answer-file data/answer/answer_vicuna-7b.jsonl --num-gpus 8

# GPT-3 answer (reference model used by alpaca-eval)
python eval/qa_baseline_gpt3.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt3.jsonl

# GPT-3.5 answer
python eval/qa_baseline_gpt35.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt35.jsonl

# GPT-4.0 answer
python eval/qa_baseline_gpt4.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_gpt4.jsonl

# Claude answer
python eval/qa_baseline_claude.py -q data/question_nonverbal_yes_v0.1.jsonl -o data/answer/answer_claude.jsonl
```

Run evaluation

Our evaluation is based on Alpaca-Eval.

```bash
# Convert to alpaca_eval input format
python eval/generate_alpaca_eval.py -q data/question_nonverbal_yes_v0.1.jsonl -a data/answer/answer_gpt3.jsonl -o data/alpaca_eval/answer_gpt3.json

alpaca_eval make_leaderboard --leaderboard_path data/alpaca_results/leaderboard.csv --all_model_outputs "./data/alpaca_eval/answer*" --reference_outputs data/alpaca_eval/answer_gpt3.json --is_overwrite_leaderboard True
```
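The conversion step pairs each question with a model's answer. As a rough sketch of what that join might look like (the instruction/output/generator field names follow alpaca_eval's input convention; the helper name and sample records here are invented, not taken from the repo's script):

```python
import json

def to_alpaca_eval(questions, answers, generator):
    """Join questions and answers on question_id, producing
    alpaca_eval-style records: instruction / output / generator."""
    answer_by_id = {a["question_id"]: a for a in answers}
    return [
        {
            "instruction": q["text"],
            "output": answer_by_id[q["question_id"]]["answer"],
            "generator": generator,
        }
        for q in questions
        if q["question_id"] in answer_by_id
    ]

# Invented sample data for illustration.
questions = [{"question_id": 1, "text": "Why does Ana hesitate?"}]
answers = [{"question_id": 1, "answer": "She suspects the plan has changed."}]
records = to_alpaca_eval(questions, answers, "gpt3")
print(json.dumps(records, indent=2))
```

Writing the resulting list to a .json file yields something shaped like the input that make_leaderboard consumes.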

License

This project is an early-stage research showcase, designed solely for non-commercial purposes. It adheres to OpenAI's data usage terms and ShareGPT's privacy practices. Let us know if you spot any potential violations. The software's code is available under the Apache License 2.0.

Acknowledgement

We would like to thank Yejin Choi from UW, Louis-Philippe Morency from CMU, Jason Weston from Meta, and Diyi Yang from Stanford for their enlightening discussions and constructive input. The theoretical foundation of KokoMind is based on Liang's PhD research with Song-Chun Zhu from Peking University, Tsinghua University, and Beijing Institute for General Artificial Intelligence (BIGAI), and with Ying Nian Wu from UCLA.

Citation

Please cite our work if you find it useful.

```bib
@misc{Shi_KokoMind_Can_Large_2023,
  author = {Shi, Weiyan and Qiu, Liang and Xu, Dehong and Sui, Pengwei and Lu, Pan and Yu, Zhou},
  title = {{KokoMind: Can Large Language Models Understand Social Interactions?}},
  month = jul,
  year = {2023},
  url = {https://chats-lab.github.io/KokoMind/}
}
```

Owner

  • Name: CHATS-lab
  • Login: CHATS-lab
  • Kind: organization

Conversation, Human-AI Technology, and Security Lab

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
title: "KokoMind: Can Large Language Models Understand Social Interactions?"
authors:
  - family-names: Shi
    given-names: Weiyan
  - family-names: Qiu
    given-names: Liang
  - family-names: Xu
    given-names: Dehong
  - family-names: Sui
    given-names: Pengwei
  - family-names: Lu
    given-names: Pan
  - family-names: Yu
    given-names: Zhou
date-released: 2023-07-05
url: https://github.com/CHATS-lab/KokoMind
preferred-citation:
  type: data
  title: "KokoMind: Can Large Language Models Understand Social Interactions?"
  authors:
  - family-names: Shi
    given-names: Weiyan
  - family-names: Qiu
    given-names: Liang
  - family-names: Xu
    given-names: Dehong
  - family-names: Sui
    given-names: Pengwei
  - family-names: Lu
    given-names: Pan
  - family-names: Yu
    given-names: Zhou
  month: 7
  year: 2023
  url: https://chats-lab.github.io/KokoMind/

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1