ember

Code and data for the paper "Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction" (IJCAI 2022)

https://github.com/tshu-w/ember

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

benchmark entity-matching entity-resolution ijcai2022
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
benchmark entity-matching entity-resolution ijcai2022
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Citation

README.md

Bridging the Gap between Reality and Ideality of Entity Matching:
A Revisiting and Benchmark Re-Construction

Description

Code and data for the paper:

Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction

Data

Details of the released data can be found in the README of the data.

How to run

First, install dependencies:

```console
# clone project
git clone https://github.com/tshu-w/EMBer
cd EMBer

# [SUGGESTED] use conda environment
conda env create -n ember -f environment.yaml
conda activate ember

# [ALTERNATIVE] install requirements directly
pip install -r requirements.txt
```
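Both install routes end up depending on the exact version pins in requirements.txt (for example torch ==1.10.2 and pytorch-lightning ==1.6.3, listed under Dependencies). To look up what a package is pinned to before comparing it with what was actually installed, here is a minimal bash sketch. It assumes every pin has the name==version form used in this repo's requirements.txt; pinned_version is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the version pinned for a package in a pip
# requirements file. Assumes exact pins of the form "name==version".
pinned_version() {
  local file=$1 pkg=$2
  # Match the package name (case-insensitively) at the start of a line,
  # followed by "==", so "torch" does not also match "torchmetrics".
  # Keep the first hit and strip everything up to and including "==".
  grep -i "^${pkg}==" "$file" | head -n 1 | sed 's/^[^=]*==//'
}
```

For example, pinned_version requirements.txt torch should print the torch pin, which can then be compared with the Version: line of pip show torch (for instance via pip show torch | sed -n 's/^Version: //p'). A CUDA build of torch may report a local suffix such as +cu113, so an exact string match is not always expected.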

Next, to obtain the main results of the paper:

```console
bash scripts/download_images.sh

python scripts/runali.py --gpus 0 1 2 3
python scripts/testali.py --gpus 0 1 2 3
python scripts/rundmali.py --gpus 0 1 2 3
python scripts/testdmali.py --gpus 0 1 2 3

python scripts/print_results results/test -k test/f1 test/prc test/rec
```
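The results are reported under the keys test/f1, test/prc and test/rec, which read as F1, precision and recall on the test split (an assumption based on the key names). Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), a reported row can be sanity-checked by hand. A minimal bash/awk sketch with a hypothetical f1 helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: F1 as the harmonic mean of precision and recall.
# Usage: f1 PRECISION RECALL   (both given as fractions in [0, 1])
f1() {
  awk -v p="$1" -v r="$2" 'BEGIN {
    # Treat F1 as 0 when precision and recall are both 0, avoiding a division by zero.
    if (p + r == 0) { print "0.0000"; exit }
    printf "%.4f\n", 2 * p * r / (p + r)
  }'
}
```

For instance, f1 0.5 1.0 prints 0.6667: a perfect recall cannot compensate for a low precision, which is why F1 sits well below the arithmetic mean of 0.75.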

You can also run experiments with the run script:

```console
# fit with the TextMatcher config
./run fit --config configs/ali_tm.yaml

# or specific command line arguments
./run fit --model TextMatcher --data AliDataModule --data.batch_size 32 --trainer.gpus 0,

# evaluate with the checkpoint
./run test --config configs/ali_tm.yaml --ckpt_path ckpt_path

# get the script help
./run --help
./run fit --help
```
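The scripts above take GPU indices as a space-separated list (--gpus 0 1 2 3), while the run example passes --trainer.gpus 0, with a trailing comma. Assuming ./run forwards --trainer.gpus to the PyTorch Lightning Trainer (the dotted argument style matches LightningCLI, and pytorch-lightning 1.6.3 is pinned in requirements.txt), that comma matters: Lightning 1.x reads the string "0" as zero GPUs, but "0," as a list containing GPU index 0. A small bash sketch that converts a space-separated list into that form; gpu_list is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: join GPU indices with commas and append a trailing
# comma, so that even a single index (e.g. "0,") is parsed by Lightning as
# a device index rather than as a GPU count.
gpu_list() {
  local IFS=,              # "$*" joins the arguments with the first char of IFS
  printf '%s,\n' "$*"
}
```

With it, a call such as ./run fit --config configs/ali_tm.yaml --trainer.gpus "$(gpu_list 0 1)" would pass 0,1, to the trainer. Whether multi-GPU training needs further settings depends on the config and is not covered here.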

Citation

@inproceedings{ijcai2022p0552,
  title     = {Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction},
  author    = {Wang, Tianshu and Lin, Hongyu and Fu, Cheng and Han, Xianpei and Sun, Le and Xiong, Feiyu and Chen, Hui and Lu, Minlong and Zhu, Xiuwen},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on
               Artificial Intelligence, {IJCAI-22}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Luc De Raedt},
  pages     = {3978--3984},
  year      = {2022},
  month     = {7},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2022/552},
  url       = {https://doi.org/10.24963/ijcai.2022/552},
}

Owner

  • Name: Tianshu Wang
  • Login: tshu-w
  • Kind: user
  • Location: Beijing

Citation (CITATION.bib)

@inproceedings{ijcai2022p552,
  title     = {Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction},
  author    = {Wang, Tianshu and Lin, Hongyu and Fu, Cheng and Han, Xianpei and Sun, Le and Xiong, Feiyu and Chen, Hui and Lu, Minlong and Zhu, Xiuwen},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on
               Artificial Intelligence, {IJCAI-22}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Luc De Raedt},
  pages     = {3978--3984},
  year      = {2022},
  month     = {7},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2022/552},
  url       = {https://doi.org/10.24963/ijcai.2022/552},
}

GitHub Events

Total
  • Watch event: 2
  • Push event: 4
Last Year
  • Watch event: 2
  • Push event: 4

Dependencies

Dockerfile docker
  • nvidia/cuda 11.3.1-cudnn8-devel-ubuntu20.04 build
pyproject.toml pypi
requirements.txt pypi
  • Babel ==2.10.1
  • Cython ==0.29.28
  • GitPython ==3.1.27
  • Jinja2 ==3.1.2
  • Markdown ==3.3.7
  • MarkupSafe ==2.1.1
  • Pillow ==9.1.0
  • PyPrind ==2.11.3
  • PyYAML ==6.0
  • Pygments ==2.12.0
  • Send2Trash ==1.8.0
  • Werkzeug ==2.1.2
  • absl-py ==1.0.0
  • aiohttp ==3.8.1
  • aiosignal ==1.2.0
  • anyio ==3.6.1
  • argon2-cffi ==21.3.0
  • argon2-cffi-bindings ==21.2.0
  • asttokens ==2.0.5
  • async-timeout ==4.0.2
  • attrs ==21.4.0
  • backcall ==0.2.0
  • beautifulsoup4 ==4.11.1
  • bleach ==5.0.0
  • cachetools ==5.1.0
  • certifi ==2021.10.8
  • cffi ==1.15.0
  • charset-normalizer ==2.0.12
  • click ==8.1.3
  • commonmark ==0.9.1
  • datasets ==2.2.1
  • debugpy ==1.6.0
  • decorator ==5.1.1
  • deepmatcher ==0.1.2.post2
  • defusedxml ==0.7.1
  • dill ==0.3.4
  • docker-pycreds ==0.4.0
  • docstring-parser ==0.14.1
  • entrypoints ==0.4
  • executing ==0.8.3
  • fastjsonschema ==2.15.3
  • fasttext ==0.9.2
  • filelock ==3.7.0
  • frozenlist ==1.3.0
  • fsspec ==2022.3.0
  • gitdb ==4.0.9
  • google-auth ==2.6.6
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.46.1
  • huggingface-hub ==0.6.0
  • idna ==3.3
  • importlib-metadata ==4.11.3
  • ipykernel ==6.13.0
  • ipython ==8.3.0
  • ipython-genutils ==0.2.0
  • jedi ==0.18.1
  • jieba ==0.42.1
  • joblib ==1.2.0
  • json5 ==0.9.8
  • jsonargparse ==4.7.3
  • jsonschema ==4.5.1
  • jupyter-client ==7.3.1
  • jupyter-core ==4.10.0
  • jupyter-server ==1.17.0
  • jupyterlab ==3.4.2
  • jupyterlab-pygments ==0.2.2
  • jupyterlab-server ==2.13.0
  • matplotlib-inline ==0.1.3
  • mistune ==0.8.4
  • multidict ==6.0.2
  • multiprocess ==0.70.12.2
  • nbclassic ==0.3.7
  • nbclient ==0.6.3
  • nbconvert ==6.5.0
  • nbformat ==5.4.0
  • nest-asyncio ==1.5.5
  • nltk ==3.7
  • notebook ==6.4.11
  • notebook-shim ==0.1.0
  • numpy ==1.22.3
  • oauthlib ==3.2.0
  • packaging ==21.3
  • pandas ==1.4.2
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pathtools ==0.1.2
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • prometheus-client ==0.14.1
  • promise ==2.3
  • prompt-toolkit ==3.0.29
  • protobuf ==3.20.1
  • psutil ==5.9.0
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pyDeprecate ==0.3.2
  • pyarrow ==8.0.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pybind11 ==2.9.2
  • pycparser ==2.21
  • pyparsing ==3.0.9
  • pyrsistent ==0.18.1
  • python-dateutil ==2.8.2
  • pytorch-lightning ==1.6.3
  • pytz ==2022.1
  • pyzmq ==22.3.0
  • regex ==2022.4.24
  • requests ==2.27.1
  • requests-oauthlib ==1.3.1
  • responses ==0.18.0
  • rich ==12.4.1
  • rsa ==4.8
  • scikit-learn ==1.1.0
  • scipy ==1.8.0
  • sentry-sdk ==1.5.12
  • setproctitle ==1.2.3
  • shortuuid ==1.0.9
  • shtab ==1.5.4
  • six ==1.16.0
  • smmap ==5.0.0
  • sniffio ==1.2.0
  • soupsieve ==2.3.2.post1
  • stack-data ==0.2.0
  • tensorboard ==2.9.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • terminado ==0.13.3
  • threadpoolctl ==3.1.0
  • tinycss2 ==1.1.1
  • tokenizers ==0.12.1
  • torch ==1.10.2
  • torchmetrics ==0.8.2
  • torchtext ==0.11.2
  • torchvision ==0.11.3
  • tornado ==6.1
  • tqdm ==4.64.0
  • traitlets ==5.2.0
  • transformers ==4.19.1
  • typing_extensions ==4.2.0
  • urllib3 ==1.26.9
  • wandb ==0.12.16
  • wcwidth ==0.2.5
  • webencodings ==0.5.1
  • websocket-client ==1.3.2
  • xxhash ==3.0.0
  • yarl ==1.7.2
  • zipp ==3.8.0
environment.yaml conda
  • ca-certificates 2022.3.29.*
  • certifi 2021.10.8.*
  • ncurses 6.3.*
  • openssl 1.1.1n.*
  • pip 21.2.4.*
  • python 3.9.12.*
  • readline 8.1.2.*
  • setuptools 59.5.0.*
  • sqlite 3.38.2.*
  • tk 8.6.11.*
  • tzdata 2022a.*
  • wheel 0.37.1.*
  • xz 5.2.5.*
  • zlib 1.2.12.*