pyjedai
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AI-team-UoA
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://pyjedai.readthedocs.io
- Size: 139 MB
Statistics
- Stars: 79
- Watchers: 4
- Forks: 12
- Open Issues: 0
- Releases: 32
Metadata Files
README.md
Overview
pyJedAI is a Python framework that aims to offer both expert and novice users robust and fast solutions for multiple types of Entity Resolution problems. It is built using state-of-the-art Python frameworks. pyJedAI is the only open-source Link Discovery tool capable of exploiting the latest breakthroughs in Deep Learning and NLP techniques that are publicly available through the Python data science ecosystem. This applies to both blocking and matching, ensuring high time efficiency, high scalability, and high effectiveness without requiring any labelled instances from the user.
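Blocking and matching, the two stages named above, can be illustrated with a toy, stdlib-only sketch. It is not pyJedAI's API or its algorithms: token blocking, Jaccard similarity, the 0.5 threshold, and all function names (`token_blocking`, `candidate_pairs`, `match`) are hypothetical choices made for illustration. Blocking groups records that share a token so that only co-occurring pairs are compared; matching then keeps pairs whose similarity passes a threshold, with no labelled data involved.

```python
from __future__ import annotations

import re
from collections import defaultdict
from itertools import combinations


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens of a record's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_blocking(records: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical blocking step: each token is a block of the record ids containing it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for rid, text in records.items():
        for tok in tokenize(text):
            blocks[tok].add(rid)
    # Blocks holding a single record produce no comparisons, so drop them.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Distinct record pairs that co-occur in at least one block."""
    pairs: set[tuple[str, str]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(records: dict[str, str], pairs: set[tuple[str, str]],
          threshold: float = 0.5) -> set[tuple[str, str]]:
    """Unsupervised matching: keep candidate pairs whose token Jaccard reaches the threshold."""
    return {
        (x, y) for x, y in pairs
        if jaccard(tokenize(records[x]), tokenize(records[y])) >= threshold
    }


records = {
    "a1": "Apple iPhone 13 128GB black",
    "a2": "iPhone 13 Apple 128 GB black",
    "b1": "Samsung Galaxy S21 128GB",
    "b2": "Galaxy S21 Samsung 128GB phantom gray",
}

print(sorted(match(records, candidate_pairs(token_blocking(records)))))
```

On this toy input, blocking leaves 4 of the 6 possible pairs to compare (a2 shares no token with b1 or b2), and matching keeps `('a1', 'a2')` and `('b1', 'b2')`. Real workflows add block cleaning, embedding-based similarity and clustering on top of this skeleton.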
Key-Features
- Input-data-type independent: both structured and semi-structured data can be processed.
- A variety of implemented algorithms.
- Easy to use.
- Utilizes well-known, cutting-edge machine learning packages.
- Offers supervised and unsupervised ML techniques.
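One common way to make processing independent of the input format is a schema-agnostic view, where every record is reduced to the bag of its attribute values regardless of column names or nesting. The sketch below assumes that approach for illustration only; `flatten`, `to_profile`, and the `skip` parameter are hypothetical names, not pyJedAI functions. It turns a CSV row and a nested JSON document into comparable text profiles.

```python
from __future__ import annotations

import csv
import io
import json
from typing import Any, Iterator


def flatten(value: Any, prefix: str = "") -> Iterator[tuple[str, str]]:
    """Yield (attribute path, value) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, sub in value.items():
            yield from flatten(sub, f"{prefix}{key}.")
    elif isinstance(value, list):
        for item in value:
            yield from flatten(item, prefix)
    elif value is not None and str(value).strip():
        yield prefix.rstrip("."), str(value).strip()


def to_profile(record: dict, skip: tuple[str, ...] = ("id",)) -> str:
    """Hypothetical schema-agnostic profile: all non-empty values, attribute names ignored."""
    return " ".join(v for attr, v in flatten(record) if attr not in skip)


# Structured input: a CSV row with fixed columns.
structured = io.StringIO("id,title,brand\n1,iPhone 13 128GB,Apple\n")
rows = list(csv.DictReader(structured))

# Semi-structured input: nested JSON with a different schema.
semi_structured = json.loads(
    '{"id": "x9", "name": "iPhone 13", "specs": {"storage": "128GB"}, "tags": ["Apple", "black"]}'
)

print(to_profile(rows[0]))          # iPhone 13 128GB Apple
print(to_profile(semi_structured))  # iPhone 13 128GB Apple black
```

Both profiles can then feed the same blocking and matching steps even though the two sources share no attribute names.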
Open demos are available in:
Google Colab Hands-on demo:
Install
pyJedAI has been tested on Windows and Linux.
Basic requirements:
- Python version greater than or equal to 3.8.
- On Windows, Microsoft Visual C++ 14.0 is required. Download it from the official Microsoft site.
PyPI
Install the latest version of pyjedai:
pip install pyjedai
More on PyPI.
Git
Set up locally:
git clone https://github.com/AI-team-UoA/pyJedAI.git
Go to the root directory with cd pyJedAI and run:
pip install .
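After installing via PyPI or from the cloned repository, a quick sanity check can confirm that the interpreter meets the Python 3.8 requirement and that the `pyjedai` distribution is visible. This is a minimal sketch using only the standard library (`importlib.metadata` exists from Python 3.8); the helper name `check_environment` is hypothetical.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 8), package="pyjedai"):
    """Return (python_ok, installed_version_or_None) for the current interpreter."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed = None  # not installed in this environment
    return python_ok, installed


if __name__ == "__main__":
    ok, version = check_environment()
    print(f"Python {sys.version.split()[0]} meets >= 3.8: {ok}")
    print(f"pyjedai installed: {version or 'not found'}")
```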
Docker
Available on Docker Hub; alternatively, clone this repository and run:
docker build -f Dockerfile .
Dependencies
See the full list of dependencies and all versions used in this file.
Bugs, Discussions & News
GitHub Discussions is the forum for general questions and discussions, and our recommended starting point. Please report any bugs you find here.
Java - Web Application

Java users can check out the original JedAI, which provides Java-based code and a web application for interactively creating ER workflows.
JedAI is an open-source, highly scalable toolkit that offers out-of-the-box solutions for any data integration task, e.g., Record Linkage, Entity Resolution, and Link Discovery. At its core lies a set of domain-independent, state-of-the-art techniques that apply to both RDF and relational data.
Team & Authors

- Lefteris Stetsikas, Research Associate at University of Athens, Greece
- Konstantinos Nikoletos, Fellow Research Associate at University of Athens, Greece
- Jakub Maciejewski, Research Associate at University of Athens, Greece
- George Papadakis, Senior Researcher at University of Athens, Greece
- Ekaterini Ioannou, Assistant Professor at Tilburg University, The Netherlands
- Manolis Koubarakis, Professor at University of Athens, Greece
This is a research project by the AI-Team of the Department of Informatics and Telecommunications at the University of Athens.
Cite us
If you use this code or find it helpful in your research, please cite it with the following BibTeX entry:
@inproceedings{DBLP:conf/semweb/Nikoletos0K22,
author = {Konstantinos Nikoletos and
George Papadakis and
Manolis Koubarakis},
editor = {Anastasia Dimou and
Armin Haller and
Anna Lisa Gentile and
Petar Ristoski},
title = {pyJedAI: a Lightsaber for Link Discovery},
booktitle = {Proceedings of the {ISWC} 2022 Posters, Demos and Industry Tracks:
From Novel Ideas to Industrial Practice co-located with 21st International
Semantic Web Conference {(ISWC} 2022), Virtual Conference, Hangzhou,
China, October 23-27, 2022},
series = {{CEUR} Workshop Proceedings},
volume = {3254},
publisher = {CEUR-WS.org},
year = {2022},
url = {https://ceur-ws.org/Vol-3254/paper366.pdf},
timestamp = {Fri, 10 Mar 2023 16:23:05 +0100},
biburl = {https://dblp.org/rec/conf/semweb/Nikoletos0K22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
License
Released under the Apache-2.0 license (see LICENSE.txt).
Copyright 2024 AI-Team, University of Athens
Owner
- Name: AI Team - University of Athens
- Login: AI-team-UoA
- Kind: organization
- Email: ai.team@di.uoa.gr
- Location: Greece
- Website: https://ai.di.uoa.gr
- Twitter: AITeamUoA
- Repositories: 16
- Profile: https://github.com/AI-team-UoA
We work on various topics in AI. The team has published numerous influential papers and contributed key technologies to the field.
GitHub Events
Total
- Create event: 10
- Issues event: 2
- Release event: 10
- Watch event: 10
- Issue comment event: 4
- Member event: 1
- Push event: 24
- Pull request event: 1
- Fork event: 1
Last Year
- Create event: 10
- Issues event: 2
- Release event: 10
- Watch event: 10
- Issue comment event: 4
- Member event: 1
- Push event: 24
- Pull request event: 1
- Fork event: 1
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 138
- Total Committers: 3
- Avg Commits per committer: 46.0
- Development Distribution Score (DDS): 0.514
Top Committers
| Name | Email | Commits |
|---|---|---|
| Konstantinos Nikoletos | n****9@g****m | 67 |
| Nikoletos Konstantinos | 4****K@u****m | 66 |
| gpapadis | g****s@y****r | 5 |
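The summary figures above can be reproduced from the per-committer counts. This sketch assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits, a definition that matches the reported 0.514; `commit_stats` is a hypothetical helper.

```python
from __future__ import annotations


def commit_stats(commits: dict[str, int]) -> tuple[int, float, float]:
    """Return (total commits, average per committer, DDS = 1 - top share)."""
    total = sum(commits.values())
    average = total / len(commits)
    dds = 1 - max(commits.values()) / total
    return total, average, dds


commits = {"Konstantinos Nikoletos": 67, "Nikoletos Konstantinos": 66, "gpapadis": 5}
total, average, dds = commit_stats(commits)
print(total, average, round(dds, 3))  # 138 46.0 0.514
```

The first two rows appear to be the same person under two identities; if they were merged (133 commits), the same formula would give a DDS of about 0.036.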
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 12
- Total pull requests: 1
- Average time to close issues: 17 days
- Average time to close pull requests: 9 days
- Total issue authors: 7
- Total pull request authors: 1
- Average comments per issue: 2.42
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 1
- Average time to close issues: 10 days
- Average time to close pull requests: 9 days
- Issue authors: 3
- Pull request authors: 1
- Average comments per issue: 3.33
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mrckzgl (4)
- reversingentropy (3)
- zmbc (1)
- Amselco (1)
- jstammers (1)
- NickCrews (1)
Pull Request Authors
- jstammers (2)
Packages
- Total packages: 1
- Total downloads (PyPI): 667 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 31
- Total maintainers: 3
pypi.org: pyjedai
An open-source library that builds powerful end-to-end Entity Resolution workflows.
- Homepage: http://pyjedai.rtfd.io
- Documentation: http://pyjedai.rtfd.io
- License: Apache Software License 2.0
- Latest release: 0.3.3 (published 7 months ago)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- jupyter-book *
- matplotlib *
- numpy *
- sphinx-examples *
- sphinx-hoverxref *
- sphinx-inline-tabs *
- sphinx-proof *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- PyYAML >= 6.0
- faiss-cpu >= 1.7
- gensim >= 4.2.0
- matplotlib >= 3.1.3
- matplotlib-inline >= 0.1.3
- networkx >= 2.3
- nltk >= 3.7
- numpy >= 1.21
- optuna >= 3.0
- ordered-set >= 4.0
- pandas >= 0.25.3
- pandas-profiling >= 3.2
- pandocfilters >= 1.5
- plotly >= 5.16.0
- py-stringmatching >= 0.4
- rdflib >= 6.1.1
- rdfpandas >= 1.1.5
- regex >= 2022.6.2
- scipy >= 1.7
- seaborn >= 0.11
- sentence-transformers >= 2.2
- strsim >= 0.0.3
- strsimpy >= 0.2.1
- tomli python_version < "3.11"
- tqdm >= 4.64
- transformers >= 4.21
- valentine >=0.1; python_version > '3.7'
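Most runtime dependencies above are pinned with a minimum version (`>=`). A rough, stdlib-only sketch can compare such pins against what is installed locally. It deliberately ignores environment markers (such as `python_version < "3.11"`), pre-release ordering, and wildcard entries like `jupyter-book *`; the helper names are hypothetical, and for real checks the `packaging` library is the standard tool.

```python
from __future__ import annotations

import re
from importlib import metadata

PIN = re.compile(r"\s*([A-Za-z0-9_.\-]+)\s*>=\s*([0-9]+(?:\.[0-9]+)*)\s*(?:;.*)?")


def parse_pin(line: str) -> tuple[str, tuple[int, ...]] | None:
    """Parse 'name >= 1.2.3' (optionally followed by '; marker') into ('name', (1, 2, 3))."""
    m = PIN.fullmatch(line)
    if not m:
        return None  # wildcard, marker-only, or non-Python entries are skipped
    return m.group(1), tuple(int(p) for p in m.group(2).split("."))


def numeric_version(text: str) -> tuple[int, ...]:
    """Leading numeric components of a version, e.g. '2.2.1rc1' -> (2, 2, 1)."""
    parts = []
    for piece in text.split("."):
        digits = re.match(r"\d+", piece)
        if not digits:
            break
        parts.append(int(digits.group()))
        if digits.group() != piece:
            break
    return tuple(parts)


def at_least(installed: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so (1, 21) equals (1, 21, 0)."""
    width = max(len(installed), len(minimum))
    return installed + (0,) * (width - len(installed)) >= minimum + (0,) * (width - len(minimum))


def check_pins(lines: list[str]) -> dict[str, str]:
    report = {}
    for line in lines:
        pin = parse_pin(line)
        if pin is None:
            continue
        name, minimum = pin
        try:
            installed = numeric_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else "too old"
    return report


print(check_pins(["numpy >= 1.21", "gensim >= 4.2.0", "valentine >=0.1; python_version > '3.7'"]))
```

The printed report depends on the local environment, so no fixed output is shown.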