https://github.com/data-prompt-query/dpq
dpq is an open-source python library that makes prompt-based data transformations and feature engineering easy
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 6 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.3%) to scientific vocabulary
Statistics
- Stars: 24
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
dpq: data. prompt. query.
dpq is a Python library that makes it easy to process data and engineer features using generative AI.
installation
```bash
pip install dpq
```
quick start
```python
import dpq

# Initialize dpq agent with API configuration
dpq_agent = dpq.Agent(
    url="ENDPOINT_URL",
    api_key="YOUR_API_KEY",
    model="MODEL_ID",
    custom_messages_path="OPTIONAL_PATH_TO_CUSTOM_PROMPTS",
)

# Apply prompt to each item in a list-like iterable such as a pandas Series
dpq_agent.classify_sentiment(df['some_column'])
```
adding functionalities
A function is defined by a JSON holding messages.
```json
[
  {
    "role": "system",
    "content": "You are a sentiment classifier. You classify statements as having either a positive or negative sentiment. You return only one of two words: positive, negative."
  },
  {
    "role": "user",
    "content": "I like dpq. It makes prompt-based feature engineering a breeze."
  },
  {
    "role": "assistant",
    "content": "positive"
  }
]
```
To add a new function, simply add the JSON file to a prompts folder on your system and
initialize the dpq agent with the respective custom_messages_path pointing to the
folder. The function name is automatically set to the name of the JSON file.
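As a rough sketch of the file-based approach, the snippet below writes a messages list to a JSON file using only the standard library. The folder name `my_prompts`, the file name `return_country.json`, and the helper `save_prompt` are made up for illustration and are not part of dpq. Per the description above, the file name without its extension would become the function name.

```python
import json
import tempfile
from pathlib import Path

# Few-shot messages in the same format as the sentiment example above.
messages = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]


def save_prompt(folder, name, messages):
    """Write messages to <folder>/<name>.json and return the file path.

    Hypothetical helper for illustration; not a dpq function.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.json"
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return path


# Hypothetical folder; a temporary directory keeps the sketch side-effect free.
prompts_dir = Path(tempfile.mkdtemp()) / "my_prompts"
prompt_path = save_prompt(prompts_dir, "return_country", messages)

# The stem is the name the function would be exposed under.
print(prompt_path.stem)
```

Passing this folder as `custom_messages_path` when creating the agent should then make the prompt available under the file's name.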
Alternatively, you can pass the messages to generate a new function directly in your code.
```python
# Define messages
messages = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]

# Add new function
dpq_agent.return_country = dpq_agent.generate_function(messages)

# Apply to a list
dpq_agent.return_country(["Berlin", "London", "Paris"])
```
examples
In addition to the prompts in the prompts directory, which are loaded by default when
initializing the dpq.Agent(), we maintain a library of additional examples in the
examples directory. These are typically slightly less general-purpose. Feel free to
open a pull request and share prompts you have found useful with everyone!
features
- feature engineering using prompts
- library of standard functions
- parallelized by default
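To illustrate what applying a function to every item in parallel can look like, here is a minimal sketch built on the standard library's thread pool. It is not dpq's implementation: `apply_parallel` and `fake_sentiment` are hypothetical names, and the stand-in classifier replaces the real API call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_parallel(func, items, max_workers=8):
    """Apply func to every item concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(func, items))


def fake_sentiment(text):
    """Stand-in for a per-item API call; not a real model."""
    return "positive" if "like" in text.lower() else "negative"


reviews = ["I like dpq.", "This broke my pipeline.", "Would like more examples."]
labels = apply_parallel(fake_sentiment, reviews)
print(labels)
```

Threads suit this kind of workload because each call spends most of its time waiting on the network rather than computing.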
compatibility
dpq uses the requests library to send OpenAI-style
Chat Completions API requests. For GPT-3.5 Turbo, the configuration is as follows.
```python
dpq_agent = dpq.Agent(
    url="https://api.openai.com/v1/chat/completions",
    api_key="YOUR_API_KEY",
    model="gpt-3.5-turbo",
)
```
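For readers unfamiliar with the request format, the sketch below assembles an OpenAI-style Chat Completions body and headers without sending anything. The helpers `build_payload` and `build_headers` are hypothetical, and appending the item as a final user message is an assumption about how few-shot prompts are typically used, not a description of dpq's internals.

```python
import json


def build_payload(model, messages, item):
    """Return an OpenAI-style Chat Completions request body for one item."""
    # Few-shot messages first, then the item to process as the final user turn
    # (assumed ordering, for illustration only).
    return {
        "model": model,
        "messages": list(messages) + [{"role": "user", "content": str(item)}],
    }


def build_headers(api_key):
    """Return the bearer-token headers used by OpenAI-style endpoints."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


few_shot = [
    {"role": "system", "content": "You return the country of a city."},
    {"role": "user", "content": "Berlin"},
    {"role": "assistant", "content": "Germany"},
]

payload = build_payload("gpt-3.5-turbo", few_shot, "Paris")
headers = build_headers("YOUR_API_KEY")
print(json.dumps(payload, indent=2))
```

Any endpoint that accepts this request shape should, in principle, work by pointing `url` and `model` at it.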
costs and speed
dpq currently comes as is, without cost or speed guarantees. As a very rough
estimate: on a test data set of 1000 product reviews, the classify_sentiment.json
prompt finishes in approx. 30 seconds (parallelized) on a standard MacBook and costs
$0.05 using gpt-3.5-turbo.
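To get a feel for larger jobs, the quoted figures can be scaled linearly. This is a naive back-of-the-envelope sketch: it assumes time and cost grow in proportion to the number of items and ignores rate limits, review length, and price changes. The function `extrapolate` is hypothetical.

```python
# Figures quoted above: 1000 reviews, about 30 seconds, about $0.05.
BASE_ITEMS = 1000
BASE_SECONDS = 30.0
BASE_COST_USD = 0.05


def extrapolate(n_items):
    """Linearly scale the quoted figures to n_items (a naive assumption)."""
    factor = n_items / BASE_ITEMS
    return BASE_SECONDS * factor, BASE_COST_USD * factor


seconds, cost = extrapolate(1_000_000)
print(f"{seconds / 3600:.1f} hours, ${cost:.2f}")
```

Under these assumptions, one million reviews would take on the order of eight hours and cost about $50.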
is using LLMs a good idea?
Recent studies have shown promising results using general-purpose LLMs for text annotation and classification. For example, Gilardi, Alizadeh, and Kubli (2023) and Törnberg (2023) report better-than-human performance. This is an active research area and we are looking forward to seeing more results in this field. In general, we believe that LLMs can deliver consistent, high-quality output resulting in scalability, reduced time and costs (see also Aguda (2024)).
debugging
dpq logs detailed error information to help with debugging. You can view these logs by
simply inspecting the errors variable of the class.
Owner
- Name: data-prompt-query
- Login: data-prompt-query
- Kind: organization
- Repositories: 1
- Profile: https://github.com/data-prompt-query
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Packages
- Total packages: 1
- Total downloads: pypi 20 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
pypi.org: dpq
dpq is an open-source python library that makes prompt-based data processing and feature engineering easy.
- Documentation: https://dpq.readthedocs.io/
- License: Apache-2.0
- Latest release: 0.1.5, published almost 2 years ago
Dependencies
- python >=3.8.1,<3.13
- requests ^2.31.0
- tqdm ^4.66.2