2023.0410
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: INFORMSJoC
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 3.49 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md

pyJedAI: A Library with Resolution-related Structures and Procedures for Products
This project is distributed in association with the INFORMS Journal on Computing under the Apache 2.0 License.
The software and data in this repository are associated with the paper pyJedAI: A Library with Resolution-related Structures and Procedures for Products by Ekaterini Ioannou, Konstantinos Nikoletos and George Papadakis.
Version
The version used in the paper is
Cite
To cite this software, please cite the paper and the software, using the following DOI.
@misc{pyjedaiProductMatching,
author = {Ekaterini Ioannou and Konstantinos Nikoletos and George Papadakis},
publisher = {INFORMS Journal on Computing},
title = {pyJedAI: A Library with Resolution-related Structures and Procedures for Products},
year = {2024},
doi = {10.1287/ijoc.2023.0410.cd},
note = {Available for download at https://github.com/INFORMSJoC/2023.0410},
}
Authors
- Ekaterini Ioannou, Assistant Professor at Tilburg University, The Netherlands
- Konstantinos Nikoletos, Research Associate at University of Athens, Greece
- George Papadakis, Senior Researcher at University of Athens, Greece
Description
This work presents an open-source Python library, named pyJedAI, which provides functionalities supporting the creation of algorithms related to Product Entity Resolution. Building on existing state-of-the-art resolution algorithms (Papadakis et al. 2021a), the tool supports many of the tasks required for processing product data collections. It can easily be used by researchers and practitioners to create algorithms that analyze products, for example for real-time ads bidding, sponsored search, or pricing determination. In essence, it allows users to import product data from the possible sources, compare products in order to detect either similar or identical products, generate a graph representation using the products and desired relationships, and either visualize or export the outcome in various forms. Our experimental evaluation on data from well-known online retailers illustrates high accuracy and low execution time for the supported tasks. To the best of our knowledge, this is the first Python package to focus on product entities and provide this range of Product Entity Resolution functionalities.
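To make the "graph representation" idea concrete, the following is a minimal sketch in plain Python. It does not use the pyJedAI API; the function names (`build_graph`, `connected_components`) and the product identifiers are hypothetical. It assumes that pairs of similar products have already been detected, stores them as an undirected graph, and groups products that are transitively linked.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from (product, product) similarity pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """Group products that are linked, directly or transitively, by a relationship."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            # Every node was added as a key in build_graph, so this lookup
            # never inserts new keys while we iterate over the graph.
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical product identifiers from three sources (a, b, c).
similar_pairs = [("a1", "b1"), ("b1", "c1"), ("a2", "b2")]
clusters = connected_components(build_graph(similar_pairs))
print(clusters)
```

Treating connected components as groups of identical products is only one possible choice; a real workflow may use a stricter clustering rule.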
Building
In Linux, to build the version used for this paper, execute the following two steps.
Create and then activate a conda environment with Python 3.9 or 3.10:
conda create --name pyJedAI_env python==3.9
conda activate pyJedAI_env
Then, install the tool using either:
pip install pyjedai==0.1.7
or from source, by cloning the repository and installing from its root directory:
git clone https://github.com/AI-team-UoA/pyJedAI.git
cd pyJedAI
pip install .
Please note that the tool requires pystringmatching, which can be installed (before step 2) using the command:
```
conda install conda-forge::pystringmatching
```
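After installation, it can be useful to confirm that the active environment holds the version used for the paper. The sketch below relies only on the standard library (`importlib.metadata`); the helper name `installed_version` is hypothetical, and the expected version string is taken from the pip command in this section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


EXPECTED = "0.1.7"  # version pinned in the pip install command of this section
found = installed_version("pyjedai")

if found is None:
    print("pyjedai is not installed in this environment")
elif found != EXPECTED:
    print(f"pyjedai {found} is installed, but the paper used {EXPECTED}")
else:
    print(f"pyjedai {found} matches the version used for the paper")
```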
Usage
As described in the paper, the tool implements a comprehensive end-to-end process for realizing possible similarity relation operators. The process, shown in the above figure, consists of four steps: 1. data reading, 2. filtering, 3. verification, and 4. data writing and evaluation.
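The four steps can be illustrated with a small, self-contained sketch. This is plain Python and not the pyJedAI API: the toy records, the function names, token blocking as the filtering step, Jaccard similarity as the verification step, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def tokens(text):
    """Lowercase a product description and split it into a set of word tokens."""
    return set(text.lower().split())


def token_blocking(left, right):
    """Filtering: keep only cross-source pairs that share at least one token."""
    index = {}
    for rid, text in right.items():
        for tok in tokens(text):
            index.setdefault(tok, set()).add(rid)
    candidates = set()
    for lid, text in left.items():
        for tok in tokens(text):
            for rid in index.get(tok, ()):
                candidates.add((lid, rid))
    return candidates


def jaccard(a, b):
    """Token-set Jaccard similarity between two descriptions."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def verify(left, right, candidates, threshold=0.5):
    """Verification: keep candidate pairs whose similarity reaches the threshold."""
    return {(l, r) for l, r in candidates if jaccard(left[l], right[r]) >= threshold}


def evaluate(predicted, truth):
    """Evaluation: precision and recall of predicted matches against known matches."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# 1. Data reading (hard-coded toy records stand in for real product files).
shop_a = {"a1": "Sony Bravia 55 inch LED TV", "a2": "Apple iPhone 13 128GB"}
shop_b = {"b1": "Sony Bravia LED TV 55 inch", "b2": "Samsung Galaxy S21 128GB"}
ground_truth = {("a1", "b1")}

# 2. Filtering, 3. verification, 4. evaluation (writing results is omitted).
candidates = token_blocking(shop_a, shop_b)
matches = verify(shop_a, shop_b, candidates)
precision, recall = evaluate(matches, ground_truth)
print(sorted(candidates), sorted(matches), precision, recall)
```

Filtering reduces the number of pairs that the more expensive verification step has to compare; here it prunes the two pairs that share no token at all.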
Google Colab Hands-on demo:
The simplest way to reproduce and view the results of this paper is to use the Colab notebook here:
Alternatively, first run the installation, then go to the src directory and run:
- Blocking-based workflow:
python blocking_workflow.py --dataset 'Abt - Buy'
- Similarity join-based workflow:
python similarity_joins_workflow.py --dataset 'Amazon - Google Products'
- Nearest neighbor-based workflow:
python nn_workflow.py --dataset 'Abt - Buy' --schema 'schema-agnostic'
where:
- the --dataset flag accepts one of {'Abt - Buy', 'Amazon - Google Products', 'Wallmart - Amazon'}, and
- the --schema flag accepts one of {'schema-agnostic', 'schema-based'} and is available only for the NN workflow.
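To run every workflow on every dataset, the command lines can be generated programmatically. The sketch below only builds and prints the commands; it assumes that each script accepts every listed dataset value and that it is launched from the src directory. The helper name `build_commands` is hypothetical.

```python
import shlex
import sys

# Flag values as listed above.
DATASETS = ["Abt - Buy", "Amazon - Google Products", "Wallmart - Amazon"]
SCHEMAS = ["schema-agnostic", "schema-based"]


def build_commands():
    """Return one argument list per (workflow, dataset[, schema]) combination."""
    commands = []
    for dataset in DATASETS:
        commands.append([sys.executable, "blocking_workflow.py", "--dataset", dataset])
        commands.append([sys.executable, "similarity_joins_workflow.py", "--dataset", dataset])
        # The --schema flag applies only to the nearest neighbor workflow.
        for schema in SCHEMAS:
            commands.append(
                [sys.executable, "nn_workflow.py", "--dataset", dataset, "--schema", schema]
            )
    return commands


commands = build_commands()
for cmd in commands:
    # shlex.join quotes values that contain spaces, such as 'Abt - Buy'.
    print(shlex.join(cmd))
```

To execute a command rather than print it, pass the argument list to `subprocess.run(cmd, check=True)` while the working directory is src.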
For the scalability test:
python dbpedia_scalability.py
Ongoing Development
The main tool is being developed on an ongoing basis at the authors' GitHub site.
Documentation page
To view more examples of this software, visit the Read the Docs website.
Support
For support in using this software, submit an issue.
Owner
- Name: INFORMS Journal on Computing
- Login: INFORMSJoC
- Kind: organization
- Website: https://pubsonline.informs.org/journal/ijoc
- Repositories: 32
- Profile: https://github.com/INFORMSJoC
Repository for software and data associated with papers published in the INFORMS Journal on Computing
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "YOUR_NAME_HERE"
  given-names: "YOUR_NAME_HERE"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Lisa"
  given-names: "Mona"
  orcid: "https://orcid.org/0000-0000-0000-0000"
title: "2023.0410"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2024-07-12
url: "https://github.com/Nikoletos-K/2023.0410"
Dependencies
- faiss-cpu *
- gensim *
- matplotlib *
- networkx *
- nltk *
- numpy >= 1.7.0,<2.0
- ordered-set *
- pandas *
- py-stringmatching *
- scipy ==1.12
- seaborn *
- sentence-transformers *
- shapely *
- tqdm *
- transformers *
- valentine python_version > '3.7'