nl2query

A framework for converting natural language text inputs to corresponding Pandas, MongoDB, Kusto and Neo4j (Cypher) queries.

https://github.com/chirayu-tripathi/nl2query

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary

Keywords

cypher-query-language database kusto-query-language mongodb mql natural-language-processing nl2query pandas text-to-sql

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: Chirayu-Tripathi
License: mit
Language: Python
Default Branch: main
Homepage: https://pypi.org/project/nl2query/
Size: 65.4 KB

Statistics

Stars: 88
Watchers: 7
Forks: 8
Open Issues: 2
Releases: 5

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Changelog License Citation

nl2query

Convert natural language text inputs to Pandas, MongoDB, Kusto, and Cypher (Neo4j) queries. The underlying models are fine-tuned versions of CodeT5+ 220m and Phi2.

Getting started

You can install nl2query from PyPI using:

```bash
python -m pip install nl2query
```

Example usage

1. Pandas Query

To convert a natural-language question into a pandas query, follow the code below.

```py
import pandas as pd
from nl2query import PandasQuery

# Load the data, then pass the DataFrame together with its name.
titanic = pd.read_csv('/path/titanic.csv')
queryfier = PandasQuery(titanic, 'titanic')

queryfier.generate_query('''list all people who paid more fare than the fare paid by 'Braund, Mr. Owen Harris' ''')
queryfier.generate_query('''find the names of passengers with age greater than 35 and containing Heath in their name''')
queryfier.generate_query('''which cabinet has average age less than 21?''')  # Groupby query
```
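The source does not show what `generate_query` returns for these questions. Purely as an illustration, the first question corresponds to a pandas expression along the following lines. The toy rows and the column names `Name` and `Fare` are assumptions made for this sketch, not output from the model.

```python
import pandas as pd

# Toy stand-in for the Titanic data; rows and column names are invented for illustration.
titanic = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heath, Mrs. Jane", "Allen, Mr. William"],
    "Fare": [7.25, 71.28, 8.05],
})

# Fare paid by the reference passenger.
braund_fare = titanic.loc[titanic["Name"] == "Braund, Mr. Owen Harris", "Fare"].iloc[0]

# Everyone who paid strictly more than that fare.
paid_more = titanic[titanic["Fare"] > braund_fare]
names = paid_more["Name"].tolist()
```

Comparing a generated query against a hand-written one like this on a small sample is a simple way to sanity-check the model's output before running it on the full data.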

2. MongoDB Query

To convert a natural-language question into a MongoDB query, follow the code below.

MongoDB query using CodeT5

The `generate_query` method takes a textual query and returns a MongoDB query. It also accepts optional parameters to control the generation process, such as `num_beams`, `max_length`, `repetition_penalty`, `length_penalty`, `early_stopping`, `top_p`, `top_k`, and `num_return_sequences`.
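One way to keep these generation settings together is to collect them in a dictionary and unpack it into the call. The sketch below assumes the parameters use the usual underscore spellings (`num_beams`, `max_length`, and so on) and are accepted as keyword arguments. `build_generation_kwargs` is a hypothetical helper, and the values are arbitrary examples rather than recommended defaults.

```python
from typing import Any, Dict

# Option names assumed from the parameter list above; verify them against the library.
ALLOWED_OPTIONS = {
    "num_beams", "max_length", "repetition_penalty", "length_penalty",
    "early_stopping", "top_p", "top_k", "num_return_sequences",
}

def build_generation_kwargs(**options: Any) -> Dict[str, Any]:
    """Hypothetical helper: reject misspelled option names and drop unset (None) values."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"Unknown generation option(s): {sorted(unknown)}")
    return {name: value for name, value in options.items() if value is not None}

# Arbitrary example values, not recommended defaults.
gen_kwargs = build_generation_kwargs(num_beams=4, max_length=256, top_k=None)

# Assumed usage: queryfier.generate_query(question, **gen_kwargs)
```

Validating the names up front turns a silent typo such as `numbeams` into an immediate error.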

NOTE: A GPU is required to run Phi2, because quantization is enabled using `load_in_4bit`.

```py
from nl2query import MongoQuery
import pymongo  # import if performing analysis using the Python client

# Keys present in the collection to be queried.
keys = ['id', 'index', 'passengerid', 'survived', 'Pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'embarked']
queryfier = MongoQuery('T5', collection_keys = keys, collection_name = 'titanic')
queryfier.generate_query('''which pclass has the minimum average fare?''')

keys = ['id', 'index', 'totalbill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']
queryfier = MongoQuery('T5', collection_keys = keys, collection_name = 'tips')
queryfier.generate_query('''find the day on which combined sales was highest''')
```

In the above code, the keys can be found by running `db.tips.find_one({}).keys()`.
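To make the key lookup concrete, the sketch below uses a plain dictionary as a stand-in for the document that `find_one` would return, so it runs without a database. The sample values are invented for illustration; against a live collection you would call `find_one` on your own collection object instead.

```python
# Stand-in for the result of a find_one call; the values are invented.
sample_document = {
    "id": "abc123",
    "index": 0,
    "totalbill": 16.99,
    "tip": 1.01,
    "sex": "Female",
    "smoker": "No",
    "day": "Sun",
    "time": "Dinner",
    "size": 2,
}

# Dictionaries preserve insertion order, so the keys come back in document order.
keys = list(sample_document.keys())
```

Note that a single document only reveals the keys it happens to contain, so a collection with optional fields may need more than one sample.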

MongoDB query using Phi2

The `generate_query` method takes a database schema and a textual query and returns a MongoDB query. It also accepts optional parameters to control the generation process, such as `max_length`, `no_repeat_ngram_size`, and `repetition_penalty`. The Phi2 model performs better than the CodeT5+ model.

```py
from nl2query import MongoQuery

# JSON description of the collection: its indexes and the BSON type of each field.
schema = '''{
  "collections": [
    {
      "name": "shipwrecks",
      "indexes": [
        { "key": { "id": 1 } },
        { "key": { "featuretype": 1 } },
        { "key": { "chart": 1 } },
        { "key": { "latdec": 1, "londec": 1 } }
      ],
      "uniqueIndexes": [],
      "document": {
        "properties": {
          "id": { "bsonType": "string" },
          "recrd": { "bsonType": "string" },
          "vesslterms": { "bsonType": "string" },
          "featuretype": { "bsonType": "string" },
          "chart": { "bsonType": "string" },
          "latdec": { "bsonType": "double" },
          "londec": { "bsonType": "double" },
          "gpquality": { "bsonType": "string" },
          "depth": { "bsonType": "string" },
          "soundingtype": { "bsonType": "string" },
          "history": { "bsonType": "string" },
          "quasou": { "bsonType": "string" },
          "watlev": { "bsonType": "string" },
          "coordinates": { "bsonType": "array", "items": { "bsonType": "double" } }
        }
      }
    }
  ],
  "version": 1
}'''

queryfier = MongoQuery('Phi2')

text = 'Find the count of shipwrecks for each unique combination of "latdec" and "londec"'
queryfier.generate_query(schema, text, max_length = 1024)

text = 'Find the total count of shipwreck for each unique category of chart'
queryfier.generate_query(schema, text, max_length = 1024)
```
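Writing the schema as a JSON string by hand is error-prone. As a sketch, the same structure can be assembled as a Python dictionary and serialized with `json.dumps`. The helper name `build_schema` and the trimmed-down field list are illustrative, and the source does not say whether the model is sensitive to whitespace or key order in the schema string.

```python
import json
from typing import Dict, List

def build_schema(collection: str, fields: Dict[str, str], index_keys: List[str]) -> str:
    """Hypothetical helper: serialize a one-collection schema in the layout shown above."""
    schema = {
        "collections": [
            {
                "name": collection,
                # One single-field ascending index per listed key.
                "indexes": [{"key": {key: 1}} for key in index_keys],
                "uniqueIndexes": [],
                "document": {
                    "properties": {
                        name: {"bsonType": bson_type} for name, bson_type in fields.items()
                    }
                },
            }
        ],
        "version": 1,
    }
    return json.dumps(schema)

# Trimmed-down example; a real schema would list every field in the collection.
schema = build_schema(
    "shipwrecks",
    fields={"chart": "string", "latdec": "double", "londec": "double"},
    index_keys=["chart"],
)
```

The resulting string could then be passed as the first argument to `generate_query` in place of the hand-written schema.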

3. Kusto Query

To convert a natural-language question into a Kusto query, follow the code below.

```py
from nl2query import KustoQuery

# Columns of the table to be queried.
cols = ['conference', 'sessionid', 'sessiontitle', 'sessiontype', 'owner', 'participants', 'URL', 'level', 'sessionlocation', 'starttime', 'duration', 'timeandduration', 'kustoaffinity']

queryfier = KustoQuery(cols, 'ConferenceSessions')
queryfier.generate_query('''find the session ids which have duration greater than 10 and having Manoj Raheja as the owner''')
```

4. Cypher (Neo4j) Query

To convert a natural-language question into a Cypher query, follow the code below.

```py
from nl2query import CypherQuery

# Node labels mapped to their properties, plus the relationship types in the graph.
nodelabels = {'User': ['displayname', 'uuid'], 'Comment': ['score', 'link', 'uuid']}
relationships = ['COMMENTED']
queryfier = CypherQuery(nodelabels, relationships)
queryfier.generate_query('list the links of all the comments done by "jose_bacoy"')

nodelabels = {'Case': ['gender', 'reportdate', 'ageunit', 'reporteroccupation', 'primaryid', 'age', 'eventDate'], 'Outcome': ['code', 'outcome']}
relationships = ['RESULTEDIN']
queryfier = CypherQuery(nodelabels, relationships)
queryfier.generate_query('find the outcomes of people who are female and below the age of 32')

nodelabels = {'Person': ['id', 'name', 'dob']}
relationships = []
queryfier = CypherQuery(nodelabels, relationships)
res = queryfier.generate_query('find the dob of people who have "Andreia" in their name')
```
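All four classes in the examples above expose a `generate_query` method, so several questions can be run through the same object in a loop. The sketch below is one possible wrapper. `run_questions` and `EchoQueryfier` are hypothetical names, the string return type is an assumption, and the stub only stands in for a real model so that the example runs without downloading weights.

```python
from typing import Dict, Iterable, Protocol

class Queryfier(Protocol):
    """Anything with a generate_query(text) method, as in the examples above."""
    def generate_query(self, text: str) -> str: ...

def run_questions(queryfier: Queryfier, questions: Iterable[str]) -> Dict[str, str]:
    """Map each question to the query generated for it."""
    return {question: queryfier.generate_query(question) for question in questions}

class EchoQueryfier:
    """Stub standing in for a real model; it does not generate real queries."""
    def generate_query(self, text: str) -> str:
        return f"<query for: {text}>"

results = run_questions(EchoQueryfier(), ["find the dob of people named Andreia"])
```

The Phi2 variant of `MongoQuery` takes the schema as an extra first argument, so it would need a small adapter before it fits this wrapper.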

Changelog

Refer to the CHANGELOG.md file.

Owner

Name: Chirayu Tripathi
Login: Chirayu-Tripathi
Kind: user
Location: Gainesville, FL
Company: University of Florida, College of Medicine

Twitter: ChirayuTripath7
Repositories: 2
Profile: https://github.com/Chirayu-Tripathi

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: nl2query
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Chirayu
    family-names: Tripathi
    email: chirayutripathi7@gmail.com
    orcid: 'https://orcid.org/0000-0001-9495-0063'
repository-code: 'https://github.com/Chirayu-Tripathi/nl2query.git'
abstract: >-
  Convert natural language text inputs to Pandas, MongoDB,
  Kusto, and Cypher(Neo4j) queries. The models used are
  fine-tuned versions of CodeT5+ 220m and Phi2 models.
license: MIT
version: 0.1.6
date-released: '2024-04-27'

GitHub Events

Total

Issues event: 1
Watch event: 14
Issue comment event: 4

Last Year

Issues event: 1
Watch event: 14
Issue comment event: 4

Packages

Total packages: 1
Total downloads:
- pypi 196 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 9
Total maintainers: 1

pypi.org: nl2query

Documentation: https://nl2query.readthedocs.io/
License: mit
Latest release: 0.1.8
published almost 2 years ago

Versions: 9
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 196 Last month

Rankings

Dependent packages count: 7.5%

Downloads: 15.2%

Average: 30.8%

Dependent repos count: 69.8%

Maintainers (1)

Chirayu07
