llm-math-education

Retrieval augmented generation for middle-school math question answering and hint generation.

https://github.com/digitalharborfoundation/llm-math-education

Keywords

education hint-generation language-models python question-answering retrieval-augmented-generation

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: DigitalHarborFoundation
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 5.15 MB

Statistics

Stars: 41
Watchers: 4
Forks: 5
Open Issues: 1
Releases: 0

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

llm-math-education: Retrieval augmented generation for middle-school math question answering and hint generation

How can we incorporate trusted, external math knowledge in generated answers to student questions?

llm-math-education is a Python package that implements basic retrieval augmented generation (RAG) and contains prompts for two primary use cases: general math question-answering (QA) and hint generation. It is currently designed to work only with the OpenAI generative chat API.

This project is hosted on GitHub. Feel free to open an issue with questions, comments, or requests.

A fork of this repository at DigitalHarborFoundation/rag-for-math-qa contains research code and data used to publish our workshop paper.

Demo

You can explore the effects of the retrieval-augmented generation approach by using our Streamlit app. You'll need to provide your own OpenAI API key.

Demo link: https://llm-math-education.streamlit.app

Installation

The llm-math-education package is available on PyPI.

```bash
pip install llm-math-education
```

Usage

We assume that OPENAI_API_KEY is provided as an environment variable or set via openai.api_key = your_api_key.

Preliminary setup: specify a directory in which to save the embedding database.

```python
from pathlib import Path

demo_dir = Path("data") / "demo"
# parents=True also creates the "data" directory if it does not exist yet
demo_dir.mkdir(parents=True, exist_ok=True)
```

We'll use llm-math-education to answer a student question.

```python
student_question = "How do I identify common factors?"
```

These usage examples can be seen together in src/usage_demo.py.

Acquiring textbook data for retrieval augmented generation

To do retrieval augmented generation, we need data. We'll use an OpenStax Pre-algebra textbook as our retrieval data.

Note: the llm_math_education.openstax module relies on requests and beautifulsoup4, which are not listed as dependencies. Install them yourself with pip if you want to download and parse OpenStax textbooks.

```python
from llm_math_education import openstax

prealgebra_textbook_url = "https://openstax.org/books/prealgebra-2e/pages/1-introduction"
textbook_data = openstax.cache_openstax_textbook_contents(prealgebra_textbook_url, demo_dir / "openstax")
df = openstax.get_subsection_dataframe(textbook_data)

df.columns
# Index(['title', 'content', 'index', 'chapter', 'section'], dtype='object')
```

The parsing code is probably very brittle; it has only been tested with the Pre-algebra textbook.

Creating an embedding lookup database from a dataframe

```python
from llm_math_education import retrieval

db_name = "openstax_prealgebra"
text_column_to_embed = "content"
openstax_db = retrieval.RetrievalDb(demo_dir, db_name, text_column_to_embed, df)
openstax_db.create_embeddings()
openstax_db.save_df()
```
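The create-then-save pattern above can be illustrated with a minimal, standard-library-only sketch. Everything here is hypothetical: `toy_embed` is a stand-in for a real embedding API call, and the JSON file layout is an assumption for illustration, not the format RetrievalDb actually writes.

```python
import json
import math
import tempfile
from pathlib import Path


def toy_embed(text, dim=8):
    """Hypothetical stand-in for an embedding call: a unit-length letter-count vector."""
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def create_and_save(texts, path):
    """Embed every text once and persist texts and vectors side by side."""
    records = [{"content": t, "embedding": toy_embed(t)} for t in texts]
    path.write_text(json.dumps(records))


def load(path):
    """Reload the saved records without recomputing any embeddings."""
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "toy_db.json"
    create_and_save(
        ["Factors of 12 are 1, 2, 3, 4, 6, 12.", "A prime has two factors."],
        db_path,
    )
    loaded = load(db_path)
```

Persisting the vectors alongside the text is what lets a later session skip the embedding step, which is the costly part when a remote API computes the embeddings.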

Loading an existing embedding database

Here, we compute the "distance" in embedding space between the student question and the documents in the database.

```python
openstax_db = retrieval.RetrievalDb(demo_dir, "openstax_prealgebra", "content")
distances = openstax_db.compute_string_distances(student_question)

distances
# [0.21348877 0.24298186 0.25825211 ... 0.25500673 0.24491884 0.22458498]
```
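To make the notion of "distance" in embedding space concrete, here is a minimal sketch that assumes cosine distance; the README does not state which metric the package uses, so treat that choice as an assumption. The three-dimensional vectors are toy values, not real embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 for the same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
query_embedding = [1.0, 0.0, 0.0]
document_embeddings = [
    [1.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees away from the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
]

# One distance per document, in the same order as the documents.
toy_distances = [cosine_distance(query_embedding, d) for d in document_embeddings]
```

Smaller values mean the document points in a direction closer to the query, so retrieval amounts to picking the documents with the smallest distances.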

Using the database to do retrieval augmented generation

Defining a retrieval strategy

```python
from llm_math_education import retrieval_strategies

db_info = retrieval.DbInfo(
    openstax_db,
    max_texts=1,
)
strategy = retrieval_strategies.MappedEmbeddingRetrievalStrategy(
    {
        "openstax_section": db_info,
    },
)
```

The key in the dictionary passed to MappedEmbeddingRetrievalStrategy names the placeholder to be replaced in the prompt, written in Python string-formatting notation (here, {openstax_section}).
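The slot-filling idea can be sketched with plain `str.format`. This is an illustration under stated assumptions, not the package's implementation: `closest_texts` and `fill_prompt` are hypothetical helpers, the texts and distances are made up, and `max_texts` is assumed to cap how many of the closest texts are used.

```python
def closest_texts(texts, distances, max_texts):
    """Return up to max_texts texts, ordered from smallest to largest distance."""
    order = sorted(range(len(texts)), key=lambda i: distances[i])
    return [texts[i] for i in order[:max_texts]]


def fill_prompt(template, user_query, slot_texts):
    """Fill {user_query} plus one named placeholder per retrieval source."""
    return template.format(user_query=user_query, **slot_texts)


template = (
    "Answer this question: {user_query}\n\n"
    "Reference this text in your answer: {openstax_section}"
)

# Made-up documents and distances standing in for a real embedding lookup.
texts = [
    "Adding fractions requires a common denominator.",
    "A common factor divides each of the given numbers.",
    "Exponents apply to each factor of a product.",
]
distances = [0.30, 0.12, 0.25]

# The dictionary key matches the placeholder name used in the template.
slot_texts = {
    "openstax_section": "\n".join(closest_texts(texts, distances, max_texts=1)),
}
prompt = fill_prompt(template, "How do I identify common factors?", slot_texts)
```

Keying the mapping by placeholder name means a template could carry several slots, each filled from a different database, without the formatting code changing.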

Starting a chat conversation with RAG

We'll use a PromptManager to build chat messages from a prompt, a retrieval strategy, and a user query.

```python
from llm_math_education import prompt_utils

pm = prompt_utils.PromptManager()
pm.set_retrieval_strategy(strategy)
pm.set_intro_messages(
    [
        {
            "role": "user",
            "content": """Answer this question: {user_query}

Reference this text in your answer: {openstax_section}""",
        },
    ],
)
messages = pm.build_query(student_question)

messages
# [{'role': 'user',
#   'content': 'Answer this question: How do I identify common factors?'
#              ''
#              'Reference this text in your answer:'
#              'We will now look at an expression containing a product that is raised to a power. Look for a pattern. The exponent applies to each of the factors. This leads to the Product to a Power Property for Exponents. An example with numbers helps to verify this property:'}]
```

We can pass the formatted messages to the OpenAI API.

```python
import openai

# Legacy interface: openai.ChatCompletion was removed in openai 1.0,
# so this call requires openai<1.0.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
)
assistant_message = completion["choices"][0]["message"]

assistant_message
# {
#   "role": "assistant",
#   "content": "To identify common factors, you need to look for a pattern in an expression containing a product raised to a power. The exponent applies to each of the factors in this case. \n\nFor example, let's consider the expression (ab)^2. Here, (ab) is the product, and the exponent 2 applies to both 'a' and 'b'. To identify the common factors, you can separate the product into its individual factors:\n\n(ab)^2 = ab * ab\n\nNow, you can see that both 'a' and 'b' appear as factors in the expression. Therefore, 'a' and 'b' are the common factors. By identifying the factors that appear in multiple terms, you can determine the common factors of an expression.\n\nUsing numbers to verify this property, suppose we have the expression (2*3)^2, which simplifies to (6)^2. In this case, the common factor is 6, as both 2 and 3 are factors of 6."
# }
```

Using PromptManager for multi-turn chat conversations

Add stored messages to continue the conversation.

```python
pm.add_stored_message(assistant_message)
messages = pm.build_query("I have a follow-up question...")
```

Clear stored messages to start a new conversation on the next call to build_query().

```python
pm.clear_stored_messages()
```
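The store-and-clear behaviour described above can be sketched with a small stand-in class. `ToyConversation` is hypothetical and omits retrieval entirely; it only assumes that the intro messages are formatted on the first turn and that later turns append to the stored history, which may differ from what PromptManager actually does.

```python
class ToyConversation:
    """Hypothetical stand-in for a prompt manager; not the package's PromptManager."""

    def __init__(self, intro_messages):
        self.intro_messages = list(intro_messages)
        self.stored_messages = []

    def build_query(self, user_query):
        if not self.stored_messages:
            # First turn: fill the intro template with the user's question.
            self.stored_messages = [
                {"role": m["role"], "content": m["content"].format(user_query=user_query)}
                for m in self.intro_messages
            ]
        else:
            # Later turns: append the follow-up to the existing history.
            self.stored_messages.append({"role": "user", "content": user_query})
        return list(self.stored_messages)

    def add_stored_message(self, message):
        self.stored_messages.append(message)

    def clear_stored_messages(self):
        self.stored_messages = []


chat = ToyConversation([{"role": "user", "content": "Answer this question: {user_query}"}])
first_turn = chat.build_query("How do I identify common factors?")
chat.add_stored_message({"role": "assistant", "content": "List the factors of each number."})
second_turn = chat.build_query("I have a follow-up question...")
chat.clear_stored_messages()
fresh_turn = chat.build_query("What is a prime number?")
```

Because chat APIs are stateless, the full history has to be resent on every call; keeping it in one object makes it easy to reset between students or topics.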

Using built-in prompts for math QA or hint generation

```python
from llm_math_education.prompts import mathqa as mathqa_prompts

pm.set_intro_messages(mathqa_prompts.intro_prompts["general_math_qa_intro"])
```

Development

See the developer's guide.

Primary contributor:

Zachary Levonian (levon003@umn.edu)

Other contributors:

Owen Henkel
Bill Roberts

FAQ

How can I cite this work?

You should cite our paper at the NeurIPS’23 Workshop on Generative AI for Education (GAIED).

You can cite this using the CITATION.cff file above (and the "Cite this repository" drop-down on GitHub for BibTeX) or the following citation:

Zachary Levonian, Chenglu Li, Wangda Zhu, Anoushka Gade, Owen Henkel, Millie-Ellen Postle, and Wanli Xing. 2023. Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference. In NeurIPS’23 Workshop on Generative AI for Education (GAIED), New Orleans, USA. DOI:https://doi.org/10.48550/arXiv.2310.03184

How should I use this code?

We aren't currently planning to add additional features to this package, although pull requests and bug reports are welcome.

You should use the Python package as a dependency if you want a quick way to try retrieval augmented generation with the OpenAI API. However, this code is likely more useful as inspiration: fork or otherwise borrow from individual components if you want some of the specific functionality implemented here. Here's a quick overview of the most important modules and their implementation:

- llm_math_education.prompts.{mathqa,hints} - Contains the prompt templates we use for math QA and hint generation.
- llm_math_education.prompt_utils - PromptManager is an abstraction for iteratively creating conversations that include a retrieval component.
- llm_math_education.retrieval_strategies - RetrievalStrategy and its subclasses demonstrate implementations that use embeddings to fill a slot within a prompt template with relevant documents.
- llm_math_education.retrieval - RetrievalDb creates an embedding-backed in-memory lookup database for a Pandas DataFrame with a text column.
- llm_math_education.logit_bias - Using the most frequent tokens in a retrieved document, creates a logit_bias that can be used to increase the faithfulness of generations based on that retrieved document.
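The logit_bias idea in the last item can be sketched as follows. This is a toy illustration, not the module's code: `toy_tokenize` is a hypothetical whitespace tokenizer standing in for the model's real tokenizer, and the `top_n` and `bias` values are arbitrary assumptions. The only API fact relied on is that the OpenAI chat API accepts a `logit_bias` mapping from token IDs to bias values between -100 and 100.

```python
from collections import Counter


def toy_tokenize(text, vocab):
    """Hypothetical tokenizer: split on whitespace and assign integer IDs on first sight."""
    return [vocab.setdefault(word, len(vocab)) for word in text.lower().split()]


def build_logit_bias(token_ids, top_n, bias):
    """Map the top_n most frequent token IDs to a positive bias value."""
    counts = Counter(token_ids)
    return {str(token_id): bias for token_id, _ in counts.most_common(top_n)}


vocab = {}
retrieved_document = "factor factor factor common common number"
token_ids = toy_tokenize(retrieved_document, vocab)

# Favour the two most frequent tokens of the retrieved document.
logit_bias = build_logit_bias(token_ids, top_n=2, bias=2.0)
```

A real version would need the tokenizer that matches the target model and would probably want to exclude very common tokens such as punctuation, since boosting those says little about the retrieved document.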

What license does this repository use?

The code is released under the MIT license. The example data used in the Streamlit app is released under CC BY-SA 4.0; see the data/app_data folder for more info. Additional details on the data are available in the developer's guide.

Owner

Name: Digital Harbor Foundation
Login: DigitalHarborFoundation
Kind: organization
Location: Baltimore, MD

Website: http://www.digitalharbor.org
Repositories: 31
Profile: https://github.com/DigitalHarborFoundation

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the paper as below."
date-released: 2023-10-04
preferred-citation:
  type: conference-paper
  title: "Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference"
  abstract: "For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) has led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources."
  doi: 10.48550/arXiv.2310.03184
  year: 2023
  conference:
      name: "NeurIPS'23 Workshop on Generative AI for Education (GAIED)"
      city: "New Orleans"
      country: "US"
      date-start: "2023-12-15"
      date-end: "2023-12-15"
  authors:
  - family-names: Levonian
    given-names: Zachary
    orcid: https://orcid.org/0000-0002-8932-1489
  - family-names: Li
    given-names: Chenglu
  - family-names: Zhu
    given-names: Wangda
  - family-names: Gade
    given-names: Anoushka
  - family-names: Henkel
    given-names: Owen
  - family-names: Postle
    given-names: Millie-Ellen
  - family-names: Xing
    given-names: Wanli
authors:
  - family-names: Levonian
    given-names: Zachary
    orcid: https://orcid.org/0000-0002-8932-1489
  - family-names: Henkel
    given-names: Owen
  - family-names: Roberts
    given-names: Bill
title: "llm-math-education: Retrieval augmented generation for middle-school math question answering and hint generation"
abstract: "How can we incorporate trusted, external math knowledge in generated answers to student questions? llm-math-education is a Python package that implements basic retrieval augmented generation (RAG) and contains prompts for two primary use cases: general math question-answering (QA) and hint generation."
version: 0.5.1
doi: 10.5281/zenodo.8284412
date-released: 2023-08-25
license: MIT
repository-code: "https://github.com/levon003/llm-math-education"

GitHub Events

Total

Watch event: 15
Delete event: 1
Push event: 4
Pull request event: 2
Create event: 1

Last Year

Watch event: 15
Delete event: 1
Push event: 4
Pull request event: 2
Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 1 day
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 1 day
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 1