generate-a-corpus-with-an-llm

The notebook in this repository is provided for students in DIGI405 at the University of Canterbury to query a Large Language Model (LLM) to generate a corpus. Students can adapt the code to generate their own data for an assignment.

https://github.com/polsci/generate-a-corpus-with-an-llm

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary
Last synced: 10 months ago · JSON representation ·

Repository

The notebook in this repository is provided for students in DIGI405 at the University of Canterbury to query a Large Language Model (LLM) to generate a corpus. Students can adapt the code to generate their own data for an assignment.

Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created almost 2 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

Generate a corpus with an LLM

Geoff Ford
https://geoffford.nz

GitHub Release DOI

The notebook in this repository is provided for students in DIGI405 at the University of Canterbury to query a Large Language Model (LLM) to generate a corpus. Students can adapt the examples to generate their own data.

I appreciate this notebook may be relevant for others. If you use it please retain the authorship information and links or cite it.

To learn more take a look at the notebook. I've also written a post on my website about how we are used this in class in 2024.

Changes are documented in the CHANGELOG.

Note on OpenRouter support

Note: that version 1.1.1 of this repository used the OpenRouter API. From version 1.1.2 the notebook uses Cerebras. You can download the 1.1.1 release from the repository if you want to use OpenRouter.

Note on Cerebras

The notebook provides code to query Cerebras's API. Cerebras provides an API endpoint that provides access to multiple LLMs with generous rate limits for development and testing. Cerebras provides good documentation and access to a range of well-regarded models. API calls are rate limited.

If anyone from Cerebras sees this - free access to API calls and the provided rate limits are very helpful for educators. Thank you!

Create a Cerebras API key

Go to Cerebras and click the link to "Get an API key". For students in DIGI405, you can signup with your UC email address. You should indicate you are a student. You will be shown an API key (partially obscured) and sample code. Copy and paste the key into your password manager for future use. There is a field in the notebook where you need to paste in your key. Don't share your key with anyone else.

Instructions for DIGI405 students - warning about excessive, rapid or repeated requests during lab times

This is the first semester we are using the Cerebras service in DIGI405, please avoid making excessive, rapid or repeated requests during the lab times as there is the potential this could cause our network to be flagged as malicious and create problems for your classmates accessing the API.

Installation

If you are a DIGI405 student running this on our JupyterHub instance, all required libraries are pre-installed. If you want to install this on your own machine, there is a requirements.txt file with required libraries. To install the required libraries run:

pip install -r requirements.txt

Owner

  • Name: Geoff Ford
  • Login: polsci
  • Kind: user
  • Location: Ōtautahi, NZ
  • Company: University of Canterbury Arts Digital Lab

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository for your teaching or research, please cite it as below."
authors:
- family-names: "Ford"
  given-names: "Geoffrey"
  orcid: "https://orcid.org/0000-0001-7088-4073"
title: "Generate a corpus with an LLM"
version: 1.1.2
doi: 10.5281/zenodo.13364418
date-released: 2024-08-23
url: "https://github.com/polsci/generate-a-corpus-with-an-LLM"

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Push event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 2
  • Push event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • python-slugify *
  • requests *