Recent Releases of pyjedai
pyjedai - 0.3.1
⚒️ Fixed
- None
➕ Added
- llm_matching.py: Using `ollama`, matching can be done by utilizing LLMs [@Teris45]
- docs/tutorials/LLMsMatching.ipynb: Check this tutorial for a better understanding of the llm_matching process [@Teris45]
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.3.0...0.3.1
Authored by @Teris45
- Python
Published by Teris45 10 months ago
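The idea behind LLM-based matching can be illustrated with a minimal sketch. This is not the code in llm_matching.py: the helper names `build_prompt`, `parse_answer` and `match_pairs` are hypothetical, and the model is abstracted as any callable `ask` that maps a prompt string to a reply string (with a local model this could be, for instance, a thin wrapper around an ollama client).

```python
from typing import Callable, Dict, Iterable, List, Tuple

Entity = Dict[str, str]


def build_prompt(left: Entity, right: Entity) -> str:
    """Serialize two entity profiles into a yes/no matching question."""
    def describe(entity: Entity) -> str:
        # Sort attributes so the prompt is deterministic.
        return ", ".join(f"{key}: {value}" for key, value in sorted(entity.items()))

    return (
        "Do the following two records refer to the same real-world entity? "
        "Answer with 'yes' or 'no' only.\n"
        f"Record A: {describe(left)}\n"
        f"Record B: {describe(right)}"
    )


def parse_answer(reply: str) -> bool:
    """Treat any reply that starts with 'yes' (case-insensitive) as a match."""
    return reply.strip().lower().startswith("yes")


def match_pairs(
    pairs: Iterable[Tuple[Entity, Entity]],
    ask: Callable[[str], str],
) -> List[bool]:
    """Ask the model about every candidate pair and collect the decisions."""
    return [parse_answer(ask(build_prompt(left, right))) for left, right in pairs]
```

Keeping the model behind a plain callable makes the sketch testable without a running LLM; the tutorial notebook listed above remains the reference for the actual workflow.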
pyjedai - 0.3.0
⚒️ Fixed
- schema/matching.py: wrong attributes used in `Coma` [@Teris45]
- schema/schema_model.py: Minor bug when loading Schema [@Teris45]
➕ Added
- None
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.9...0.3.0
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai -
⚒️ Fixed
- schema_model.py: `self` attribute not included in some functions [@Teris45]
➕ Added
- None
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.6...0.2.9
Published by Teris45 about 1 year ago
pyjedai - 0.2.7
⚒️ Fixed
- workflow.py: EmbeddingsNNWorkflow didn't work correctly with the `clustering` method and `export_pairs`. [@Teris45]
➕ Added
- None
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.5...0.2.6
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai - 0.2.6
⚒️ Fixed
- workflow.py: EmbeddingsNNWorkflow didn't work correctly with the `clustering` method and `export_pairs`. [@Teris45]
➕ Added
- None
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.5...0.2.6
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai -
⚒️ Fixed
- None
➕ Added
- vectorbasedblocking.py: The EmbeddingsNNBlockBuilding class now accepts custom word and sentence embedding models, provided the user passes the correct argument to EmbeddingsNNBlockBuilding.build_blocks [@jstammers]
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.4...0.2.5
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai - 0.2.4
⚒️ Fixed
- None
➕ Added
- schema/schema_model.py: Added class to load data for schema-matching [@Teris45]
- schema/utils.py: Functions needed for schema-matching.
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.3...0.2.4
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai - 0.2.3
⚒️ Fixed
- blockbuilding.py: export_to_df bug for Dirty ER fixed [@Teris45]
- clustering.py: export_to_df bug for Dirty ER fixed [@Teris45]
- joins.py: export_to_df bug for Dirty ER fixed [@Teris45]
- matching.py: export_to_df bug for Dirty ER fixed [@Teris45]
- vectorbasedblocking.py: export_to_df bug for Dirty ER fixed [@Teris45]
➕ Added
- None
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.2.2...0.2.3
Authored by @Teris45
Published by Teris45 about 1 year ago
pyjedai - 0.2.1
⚒️ Fixed
- joins.py: Export pairs [@Nikoletos-K]
➕ Added
- A new method that reads datasets from json files [@Nikoletos-K]
- Reproducibility guide for the 11 Clean-Clean ER datasets [@Nikoletos-K]
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.1.9...0.2.0
Authored by @Nikoletos-K
Published by Nikoletos-K over 1 year ago
pyjedai - 0.1.8
⚒️ Fixed
- Issues #22 and #23.
- NNs save/load embeddings issue [ @JacobMaciejewski ].
- NN unused print.
- Matching issues.
➕ Added
- New visualizations (PCA and tSNE)
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.1.7...0.1.8
Authored by @Nikoletos-K
Published by Nikoletos-K almost 2 years ago
pyjedai - 0.1.7
⚒️ Fixed
- Issues #19, #20, #21
- Removed FALCONN and SCANN
- Refined dependencies
- Removed Optuna injection
- Fixed typos
- Reports
➕ Added
- New utilities to docs
⚠️ Issues
- None
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.1.6...0.1.7
Authored by @Nikoletos-K
Published by Nikoletos-K about 2 years ago
pyjedai - 0.1.6
⚒️ Fixed
- Issue #16
- Typos in clustering.py
- Datamodel gt initialization
- Imports in utils
- Bugs in NN-workflow
- Bugs and evaluation of simple Schema Clustering
➕ Added
- Dataframe memory consumption
- New Schema Clustering method for RDF data [Not final implementation - alpha version]
⚠️ Issues
- SCANN and FALCONN produce warnings
Full Changelog: https://github.com/AI-team-UoA/pyJedAI/compare/0.1.5...0.1.6
Authored by @Nikoletos-K
Published by Nikoletos-K about 2 years ago
pyjedai - 0.1.5
⚒️ Fixed
- Schema Matching structure [ @Nikoletos-K ]
➕ Added
- First working version of Schema Clustering [ @Nikoletos-K ]
- vectorbasedblocking component: SCANN/FAISS full functionality on Linux OS only! [ @JacobMaciejewski ]
- RowColumnClustering: new clustering algorithm [ @JacobMaciejewski ]
⚠️ Issues
- Minor changes in ProgressiveWorkFlow(PYJEDAIWorkFlow). [ @JacobMaciejewski ]
Published by Nikoletos-K over 2 years ago
pyjedai - 0.1.4
⚒️ Fixed
- Correlation Clustering method.
- nltk.download('stopwords') download only when needed.
- Schema Matching component to align with the latest version of Valentine.
➕ Added
- datamodel.py: SchemaData for Schema Matching Component
- ‼️ New Component; pyJedAI Spatial, for Interlinking geospatial RDF data. [ @IordanisT ]
- SCANN functionality, only available for Linux OS. [ @JacobMaciejewski ]
⚠️ Issues
- None
Published by Nikoletos-K over 2 years ago
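The "download only when needed" fix for the stopwords corpus follows a common look-up-then-download pattern. The sketch below shows that pattern in a library-agnostic way; `ensure_resource` is a hypothetical helper and not part of pyJedAI, and the NLTK usage in the trailing comment assumes NLTK is installed.

```python
from typing import Callable


def ensure_resource(find: Callable[[], object], download: Callable[[], object]) -> bool:
    """Download a resource only if looking it up fails.

    Returns True when a download was triggered and False when the
    resource was already available.
    """
    try:
        find()
        return False
    except LookupError:
        # The lookup failed, so fetch the resource once.
        download()
        return True


# With NLTK installed, the stopwords corpus could be guarded like this
# (nltk.data.find raises LookupError when the corpus is missing):
#
#   import nltk
#   ensure_resource(lambda: nltk.data.find("corpora/stopwords"),
#                   lambda: nltk.download("stopwords"))
```

Checking first avoids a network round trip and console output on every import when the corpus is already on disk.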
pyjedai - 0.1.3
⚒️ Fixed
- None
➕ Added
Clustering algorithms: [ Author: @JacobMaciejewski 📌 ]
- EquivalenceCluster
- ExtendedSimilarityEdge
- Vertex
- RicochetCluster
- ExactClustering
- CenterClustering
- BestMatchClustering
- MergeCenterClustering
- CorrelationClustering
- CutClustering
- MarkovClustering
- KiralyMSMApproximateClustering
- RicochetSRClustering
Blocking:
- Statistics
- Statistics
⚠️ Issues
- None
Published by Nikoletos-K over 2 years ago
pyjedai - 0.1.2
⚒️ Fixed
- Fixed export methods for the use case where no ground truth is provided
- Restructured and optimized Joins methods by creating a vectorizer module and in-memory transactions
- Reduced vectorization time by saving and retrieving the distance matrix
➕ Added
- 'sqeuclidean' metric in matching step
- Valentine as a Schema Matching plugin
⚠️ Issues
- Vectorizers (tfidf, etc.) don't support Dirty ER. Will be fixed in the next release.
Published by Nikoletos-K over 2 years ago
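For readers unfamiliar with the 'sqeuclidean' metric added in this release: the name matches the metric string used by SciPy for squared Euclidean distance, i.e. the sum of squared coordinate differences without the final square root. A minimal pure-Python sketch, not the implementation used in the matching step:

```python
from typing import Sequence


def sqeuclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared coordinate differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


print(sqeuclidean([0.0, 0.0], [3.0, 4.0]))  # 25.0 (the Euclidean distance would be 5.0)
```

Because squaring is monotonic for non-negative values, ranking pairs by squared distance gives the same order as ranking by Euclidean distance while skipping the square root.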
pyjedai - 0.1.0
⚒️ Fixed
- Restructured Matching Module - vectorizer, tokenizer, and qgrams as arguments (not inferred)
- Clustering step randomization bug
➕ Added
- PER notebook tutorials
- PER grid-search pipeline (config files, search scripts, storage)
- PER workflows visualization and comparison through:
- feature configuration budget-centric metric progress plots
- feature configuration dataset-centric sorting and comparison
⚠️ Issues
- None
Published by Nikoletos-K almost 3 years ago
pyjedai - 0.0.7
Fixed:
- Issues in block filtering
- Issues in vector based blocking
- Data model set types
- EJoin wrong naming
Added:
- Prioritization algorithms
- Tf-Idf functionality
- More metrics on entity matching
- Optional data cleaning functionalities
- New visualizations
- New stats for the blocking workflows
Published by Nikoletos-K almost 3 years ago
pyjedai - v0.0.5
Added:
- New evaluation module
- Matching metrics
- Vector based blocking techniques
- Data process methods
- Entity matching plots
- sphinx website
- New tests
Fixed:
- Architecture, abstract data types
- Data bugs in block building
- Bugs in vector based blocking
- Using workflows without gt
- Code runtime
Published by Nikoletos-K about 3 years ago
pyjedai - v0.0.2
Optimizations, User-friendly Approach Updates
This is the second release. The project is still under development. In this release we:
- Added WorkFlow module: a high-level, user-friendly method that simplifies the whole process.
- Added comments in the basic methods.
- Performed time optimizations by making the most of Python.
- Created automatic tests.
- Created a new Block Building method that uses pre-trained embeddings and Gensim, with similarity search through the FAISS framework.
- Uploaded to PyPI.
- Visualization techniques for performance check.
Published by Nikoletos-K over 3 years ago
pyjedai - v0.0.1
First pyJedAI release: This release brings the basic structure of the well-known JedAI toolkit to the Python environment. Contains:
- Data reading techniques: RDF/OWL, SPARQL, CSV, JSON, DB
- Block building: Standard Blocking, QGrams & Extended, SuffixArray & Extended
- Block cleaning: Block purging, Block filtering
- Comparison cleaning: Weighted edge/node pruning, Cardinality edge/node pruning, BLAST, etc.
- Entity matching: strsimpy
- Entity clustering: Connected component clustering
- Similarity Joins: SchemaAgnosticΕJoin, TopKSchemaAgnosticJoin
- Evaluation through Jupyter notebook
Published by Nikoletos-K almost 4 years ago
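As a rough illustration of the block building step in this first release, the sketch below shows token-based (standard) blocking and q-grams blocking over a small dirty collection. It is a simplified toy, not the pyJedAI implementation: the function names `qgrams` and `toy_blocks` and the use of plain strings as entity profiles are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def qgrams(token: str, q: int = 3) -> Set[str]:
    """Character q-grams of a token (the token itself if it is not longer than q)."""
    if len(token) <= q:
        return {token}
    return {token[i:i + q] for i in range(len(token) - q + 1)}


def toy_blocks(profiles: List[str], q: int = 0) -> Dict[str, Set[int]]:
    """Token blocking (q == 0) or q-grams blocking (q > 0).

    Every key (token or q-gram) defines a block holding the indices of
    all profiles that contain it.
    """
    blocks: Dict[str, Set[int]] = defaultdict(set)
    for idx, text in enumerate(profiles):
        for token in text.lower().split():
            keys = qgrams(token, q) if q > 0 else {token}
            for key in keys:
                blocks[key].add(idx)
    # Blocks with a single entity produce no comparisons, so drop them.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


profiles = ["John Smith", "Jon Smith", "Mary Jones"]
print(toy_blocks(profiles))        # {'smith': {0, 1}}
print(toy_blocks(profiles, q=3))   # also groups profiles 1 and 2 under 'jon'
```

The q-grams variant tolerates small spelling differences at the cost of more and larger blocks, which is why block cleaning and comparison cleaning steps typically follow.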