https://github.com/data-prompt-query/dpq

dpq is an open-source Python library that makes prompt-based data transformations and feature engineering easy.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
✓ DOI references (found 6 DOI reference(s) in README)
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 15.3%, to scientific vocabulary)

Keywords

data data-analytics data-engineering data-science feature-engineering llm machine-learning prompt python

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: data-prompt-query
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 17.6 KB

Statistics

Stars: 24
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Topics

data data-analytics data-engineering data-science feature-engineering llm machine-learning prompt python

Created almost 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

dpq: data. prompt. query.

dpq is a Python library that makes it easy to process data and engineer features using generative AI.

installation

    pip install dpq

quick start

```python
import dpq

# Initialize dpq agent with API configuration
dpq_agent = dpq.Agent(
    url="ENDPOINT_URL",
    api_key="YOUR_API_KEY",
    model="MODEL_ID",
    custom_messages_path="OPTIONAL_PATH_TO_CUSTOM_PROMPTS",
)

# Apply prompt to each item in list-like iterable such as pandas series
dpq_agent.classify_sentiment(df['some_column'])
```

adding functionalities

A function is defined by a JSON holding messages.

    [
      {
        "role": "system",
        "content": "You are a sentiment classifier. You classify statements as having either a positive or negative sentiment. You return only one of two words: positive, negative."
      },
      {
        "role": "user",
        "content": "I like dpq. It makes prompt-based feature engineering a breeze."
      },
      {
        "role": "assistant",
        "content": "positive"
      }
    ]

To add a new function, simply add the JSON file to a prompts folder on your system and initialize the dpq agent with the respective custom_messages_path pointing to the folder. The function name is automatically set to the name of the JSON file.
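As a minimal sketch of the file-based workflow, the snippet below writes a messages file into a folder using only the standard library. The folder name `my_prompts` and the file name `return_country.json` are hypothetical; under the naming rule described above, the resulting function would be called `return_country`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical prompts folder; a temporary directory keeps the sketch self-contained.
prompts_dir = Path(tempfile.mkdtemp()) / "my_prompts"
prompts_dir.mkdir(parents=True, exist_ok=True)

# Messages in the same format as the sentiment classifier example.
messages = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]

prompt_file = prompts_dir / "return_country.json"
prompt_file.write_text(json.dumps(messages, indent=2), encoding="utf-8")

# The file name without its extension is what would become the function name.
function_name = prompt_file.stem
print(function_name)  # return_country
```

Pointing `custom_messages_path` at this folder when initializing the agent should then make the function available.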

Alternatively, you can pass the messages to generate a new function directly in your code.

```python
# Define messages
messages = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]

# Add new function
dpq_agent.return_country = dpq_agent.generate_function(messages)

# Apply to a list
dpq_agent.return_country(["Berlin", "London", "Paris"])
```

examples

In addition to the prompts in the prompts directory, which are loaded by default when initializing the dpq.Agent(), we maintain a library of additional examples in the examples directory. These are typically slightly less general-purpose. Feel free to open a pull request and share prompts you have found useful with everyone!

features

feature engineering using prompts
library of standard functions
parallelized by default
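
To illustrate what "parallelized" can mean for network-bound prompt calls, here is a minimal sketch using a thread pool from the standard library. It is not dpq's actual implementation; `apply_parallel` and `fake_prompt_call` are hypothetical names, and the stand-in function makes no API request.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_parallel(func, items, max_workers=8):
    """Apply func to every item concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(func, items))


# Stand-in for a network-bound prompt call (hypothetical; no request is sent).
def fake_prompt_call(text):
    return text.upper()


results = apply_parallel(fake_prompt_call, ["berlin", "london", "paris"])
print(results)  # ['BERLIN', 'LONDON', 'PARIS']
```

Threads suit this kind of workload because each call spends most of its time waiting on the API rather than using the CPU.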

compatibility

dpq uses the requests library to send OpenAI-style Chat Completions API requests. For GPT-3.5 Turbo, the configuration is as follows.

    dpq_agent = dpq.Agent(
        url="https://api.openai.com/v1/chat/completions",
        api_key="YOUR_API_KEY",
        model="gpt-3.5-turbo",
    )
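
For orientation, the sketch below shows what an OpenAI-style Chat Completions request could look like for a single item. How dpq assembles its requests internally is an assumption here; `build_chat_request` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def build_chat_request(messages, item, model, api_key):
    """Return (headers, payload) for an OpenAI-style Chat Completions request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        # Example messages first, then the new item as the final user turn.
        "messages": list(messages) + [{"role": "user", "content": str(item)}],
    }
    return headers, payload


example_messages = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]

headers, payload = build_chat_request(
    example_messages, "Paris", model="gpt-3.5-turbo", api_key="YOUR_API_KEY"
)
print(json.dumps(payload, indent=2))

# Sending it would look roughly like:
#   requests.post(url, headers=headers, json=payload)
```

Any endpoint that accepts this request shape should work with the same `url`, `api_key`, and `model` settings.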

costs and speed

dpq currently comes as is, without cost or speed guarantees. As a very rough estimate: on a test data set of 1,000 product reviews, the classify_sentiment prompt finishes in approx. 30 seconds (parallelized) on a standard MacBook and costs $0.05 using gpt-3.5-turbo.
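
The reference figures can be scaled to other data set sizes with simple arithmetic. The sketch below assumes cost and runtime grow linearly with the number of items, which is an assumption rather than a guarantee; real numbers depend on text length, model pricing, and rate limits.

```python
# Reference figures from the rough estimate: 1,000 reviews, $0.05, 30 seconds.
REFERENCE_ITEMS = 1000
REFERENCE_COST_USD = 0.05
REFERENCE_SECONDS = 30


def estimate(n_items):
    """Linearly scale the reference cost and runtime to n_items (assumption)."""
    scale = n_items / REFERENCE_ITEMS
    return REFERENCE_COST_USD * scale, REFERENCE_SECONDS * scale


cost, seconds = estimate(50_000)
print(f"~${cost:.2f}, ~{seconds / 60:.0f} min")  # ~$2.50, ~25 min
```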

is using LLMs a good idea?

Recent studies have shown promising results using general-purpose LLMs for text annotation and classification. For example, Gilardi, Alizadeh, and Kubli (2023) and Törnberg (2023) report better-than-human performance. This is an active research area, and we look forward to seeing more results in this field. In general, we believe that LLMs can deliver consistent, high-quality output, which improves scalability and reduces time and costs (see also Aguda (2024)).

debugging

dpq logs detailed error information to help with debugging. You can view these logs by simply inspecting the errors variable of the class.

Owner

Name: data-prompt-query
Login: data-prompt-query
Kind: organization

Repositories: 1
Profile: https://github.com/data-prompt-query

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Packages

Total packages: 1
Total downloads:
- pypi 20 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 6
Total maintainers: 1

pypi.org: dpq

dpq is an open-source Python library that makes prompt-based data processing and feature engineering easy.

Documentation: https://dpq.readthedocs.io/
License: Apache-2.0
Latest release: 0.1.5
published almost 2 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 20 Last month

Rankings

Dependent packages count: 9.6%

Average: 36.4%

Dependent repos count: 63.2%

Maintainers (1)

dpq

Last synced: 6 months ago

Dependencies

pyproject.toml pypi

python >=3.8.1,<3.13
requests ^2.31.0
tqdm ^4.66.2
