softpandas
Repository
Basic Info
- Host: GitHub
- Owner: idobenshaul10
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 84.1 MB
Statistics
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SoftPandas - Pandas with Semantic Querying

https://github.com/idobenshaul10/SoftPandas/assets/41121256/82c467b7-701a-4cfd-9277-df7b63a66330
Description:
SoftPandas is an early-stage package for querying pandas DataFrames by semantic similarity. This lets you write conditions that are soft (e.g. all products that are similar to "red and black swim shorts"). The current version supports text and image data types; if an image link is present, the image is downloaded and embedded using OpenClip. Currently supported encoders:
1. Language Encoder Model: any model using SentenceTransformer
2. MultiModal Encoder Model: any model using OpenClip

At the moment, queries can only be expressed as text.
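To make the idea of a soft condition concrete, here is a minimal sketch of threshold-based matching on cosine similarity. It is not SoftPandas's actual implementation: the `soft_match` helper and the toy 3-dimensional vectors are hypothetical stand-ins for what a real encoder would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def soft_match(row_embeddings, query_embedding, threshold):
    """Return one boolean per row: True when similarity >= threshold."""
    return [
        cosine_similarity(embedding, query_embedding) >= threshold
        for embedding in row_embeddings
    ]

# Toy "embeddings" standing in for real encoder output (assumption).
rows = {
    "red and black swim shorts": [0.9, 0.1, 0.0],
    "blue wool sweater": [0.0, 0.2, 0.9],
    "black swim trunks": [0.8, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

mask = soft_match(list(rows.values()), query, threshold=0.8)
matches = [name for name, keep in zip(rows, mask) if keep]
print(matches)
```

The threshold plays the same role as the `threshold` argument passed to the embedders in the usage example: a higher value keeps fewer, closer matches.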
This project is a work in progress! If you find any issues, please report them.
Installation:
Requires Python 3.10 or later. Install the latest version from the GitHub repository:
pip install git+https://github.com/idobenshaul10/SoftPandas.git
Then install the requirements:
pip install -r requirements.txt
Example Usage:
Let's say we want to get all red and black swim shorts that cost less than $600. We can load example data from a CSV file and then query it using SoftPandas.
For the full script, see:
demo.ipynb
Imports:
import pandas as pd
from softpandas.core.data_types import InputDataType
from softpandas.core.soft_dataframe import SoftDataFrame
from softpandas.embedders.clip_embedder import OpenClipEmbedder
from softpandas.embedders.sentence_transformer_embedder import SentenceTransformerEmbedder
from sklearn.metrics.pairwise import cosine_similarity
Let's set up our encoders:
```python
lang_model = SentenceTransformerEmbedder('thenlper/gte-small', metric=cosine_similarity, threshold=0.8, device="cpu")
vision_model = OpenClipEmbedder('ViT-B-32-256', metric=cosine_similarity, threshold=0.22, pretrained="datacomp_s34b_b86k")
```
Then let's query using soft + hard queries:
```python
df = pd.read_csv("sample_data/men-swimwear.csv")
df = SoftDataFrame(
    df,
    soft_columns={
        'NAME': InputDataType.text,
        'DESCRIPTION & COLOR': InputDataType.text,
        'IMAGE': InputDataType.image,
    },
    models={InputDataType.text: lang_model, InputDataType.image: vision_model},
)

# Soft (semantic) filters, one predicate per call
df = df.soft_query("'DESCRIPTION & COLOR' ~= 'swim shorts'")
df = df.soft_query("'IMAGE' ~= 'red and black'")
# Hard filter using regular pandas querying
df = df.query("PRICE < 600")
print(df.head()['DESCRIPTION & COLOR'].values)
```
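The soft queries above all follow the pattern `'COLUMN' ~= 'query text'`. As an illustration of that syntax only, here is a hypothetical parser sketch; it assumes both parts are single-quoted and that the string holds exactly one predicate, and it is not the parser SoftPandas itself uses.

```python
import re

# Assumed grammar: 'COLUMN' ~= 'query text', both single-quoted, one predicate.
SOFT_PREDICATE = re.compile(
    r"^\s*'(?P<column>[^']+)'\s*~=\s*'(?P<text>[^']+)'\s*$"
)

def parse_soft_predicate(query):
    """Split a soft query string into (column, text); raise ValueError otherwise."""
    match = SOFT_PREDICATE.match(query)
    if match is None:
        raise ValueError(f"not a single soft predicate: {query!r}")
    return match.group("column"), match.group("text")

parsed = parse_soft_predicate("'DESCRIPTION & COLOR' ~= 'swim shorts'")
print(parsed)
```

Rejecting anything other than a single predicate with a clear error is a deliberate choice in this sketch, so that chained conditions are applied as separate calls, as in the example above.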
Saving and loading:
df.to_pickle("relevant_items.p")
df = pd.read_pickle("relevant_items.p")
TODOs:
- ~~Add saving methods for SoftDataFrame~~
- ~~Method for adding new columns~~
- Add handling of NaNs
  - ~~if a column is NaN, just ignore it~~
  - If a value isn't there, it shouldn't pass the condition, similar to normal querying
- Add handling of multiple queries. At the moment, a query with more than one predicate will crash.
- Add indices instead of cosine similarity, which is too slow
- Batching of initial encoding
  - don't do it one by one
  - ~~use device (cuda, mps, tpu, etc.)~~
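The batching item above could be approached along the following lines. This is a sketch under stated assumptions, not code from SoftPandas: `batched`, `encode_all`, and the stand-in `fake_encode_batch` are hypothetical names, and the fake encoder only exists to show that the encoder is called once per batch rather than once per row.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def encode_all(texts, encode_batch, batch_size=32):
    """Encode every text, calling `encode_batch` once per batch."""
    embeddings = []
    for batch in batched(texts, batch_size):
        embeddings.extend(encode_batch(batch))
    return embeddings

# Stand-in encoder (assumption): maps each text to a 1-d "embedding"
# and records the size of every batch it receives.
calls = []

def fake_encode_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

texts = ["swim shorts", "red", "black trunks", "sweater", "hat"]
embeddings = encode_all(texts, fake_encode_batch, batch_size=2)
print(len(embeddings), calls)
```

A real `encode_batch` could wrap an encoder call that accepts a list of inputs, such as SentenceTransformer's `encode`, which takes a list of sentences.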
Long Term Goals:
- Add automatic feature extraction from images into new columns
  - allows hard querying using visual data!
- Add ability to soft query based on Image
- Expand to more modalities
Citation (CITATION.cff)
cff-version: 1.2.0
preferred-citation:
type: software
message: If you use SoftPandas, please cite it as below.
authors:
- family-names: Ben-Shaul
given-names: Ido
orcid: "https://orcid.org/0000-0002-3954-035X"
title: "SoftPandas"
version: 0.01
doi: 10.5281/zenodo.3908559
date-released: 2024-02-03
license: Apache-2.0
url: "https://github.com/idobenshaul10/SoftPandas?tab=readme-ov-file"