ember
Code and data for the paper "Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction" (IJCAI 2022)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: tshu-w
- Language: Python
- Default Branch: main
- Homepage: https://tshu-w.github.io/ember/
- Size: 29.8 MB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Bridging the Gap between Reality and Ideality of Entity Matching:
A Revisiting and Benchmark Re-Construction
Description
Code and data for the paper:
Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction
Data
Details of the released data can be found in the README of the data.
How to run
First, install dependencies:
```console
# clone project
git clone https://github.com/tshu-w/EMBer
cd EMBer

# [SUGGESTED] use conda environment
conda env create -n ember -f environment.yaml
conda activate ember

# [ALTERNATIVE] install requirements directly
pip install -r requirements.txt
```
Next, to obtain the main results of the paper:
```console
bash scripts/download_images.sh

python scripts/runali.py --gpus 0 1 2 3
python scripts/testali.py --gpus 0 1 2 3
python scripts/rundmali.py --gpus 0 1 2 3
python scripts/testdmali.py --gpus 0 1 2 3

python scripts/print_results results/test -k test/f1 test/prc test/rec
```
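The metric keys `test/f1`, `test/prc` and `test/rec` passed to the results script presumably stand for F1, precision and recall on the positive ("match") class; that reading is an assumption, not something the README states. As a minimal sketch of how these three numbers relate, and not the repository's own implementation, the hypothetical helper `precision_recall_f1` below computes them from binary labels and predictions:

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty denominators (no predicted or no actual matches).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels and predictions, made up purely for illustration.
    labels = [1, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 1, 1, 0]
    prc, rec, f1 = precision_recall_f1(labels, preds)
    print(f"prc={prc:.3f} rec={rec:.3f} f1={f1:.3f}")
```

Entity matching datasets are usually dominated by non-matching pairs, which is why scores on the match class are typically reported rather than plain accuracy.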
You can also run experiments with the run script.
```console
# fit with the TextMatcher config
./run fit --config configs/ali_tm.yaml
# or specific command line arguments
./run fit --model TextMatcher --data AliDataModule --data.batch_size 32 --trainer.gpus 0,

# evaluate with the checkpoint
./run test --config configs/ali_tm.yaml --ckpt_path ckpt_path

# get the script help
./run --help
./run fit --help
```
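When launching several experiments, it can be convenient to assemble the `./run` command line programmatically. The sketch below is a hypothetical helper (`build_run_command` is not part of the repository) that reuses only the flags shown above: a subcommand, an optional `--config` file, and dotted `--key value` overrides.

```python
import shlex
from typing import Dict, List, Optional


def build_run_command(
    subcommand: str,
    config: Optional[str] = None,
    overrides: Optional[Dict[str, object]] = None,
) -> List[str]:
    """Build an argument list for the `./run` script (hypothetical helper)."""
    cmd = ["./run", subcommand]
    if config is not None:
        cmd += ["--config", config]
    # Each override becomes a `--key value` pair, e.g. `--data.batch_size 32`.
    for key, value in (overrides or {}).items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(
        "fit",
        overrides={
            "model": "TextMatcher",
            "data": "AliDataModule",
            "data.batch_size": 32,
            "trainer.gpus": "0,",
        },
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True)
    # from the repository root to actually execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a value contains spaces or special characters.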
Citation
@inproceedings{ijcai2022p0552,
title = {Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction},
author = {Wang, Tianshu and Lin, Hongyu and Fu, Cheng and Han, Xianpei and Sun, Le and Xiong, Feiyu and Chen, Hui and Lu, Minlong and Zhu, Xiuwen},
booktitle = {Proceedings of the Thirty-First International Joint Conference on
Artificial Intelligence, {IJCAI-22}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Luc De Raedt},
pages = {3978--3984},
year = {2022},
month = {7},
note = {Main Track},
doi = {10.24963/ijcai.2022/552},
url = {https://doi.org/10.24963/ijcai.2022/552},
}
Owner
- Name: Tianshu Wang
- Login: tshu-w
- Kind: user
- Location: Beijing
- Twitter: tshu_w
- Repositories: 65
- Profile: https://github.com/tshu-w
Citation (CITATION.bib)
@inproceedings{ijcai2022p552,
title = {Bridging the Gap between Reality and Ideality of Entity Matching: A Revisiting and Benchmark Re-Construction},
author = {Wang, Tianshu and Lin, Hongyu and Fu, Cheng and Han, Xianpei and Sun, Le and Xiong, Feiyu and Chen, Hui and Lu, Minlong and Zhu, Xiuwen},
booktitle = {Proceedings of the Thirty-First International Joint Conference on
Artificial Intelligence, {IJCAI-22}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
editor = {Luc De Raedt},
pages = {3978--3984},
year = {2022},
month = {7},
note = {Main Track},
doi = {10.24963/ijcai.2022/552},
url = {https://doi.org/10.24963/ijcai.2022/552},
}
GitHub Events
Total
- Watch event: 2
- Push event: 4
Last Year
- Watch event: 2
- Push event: 4
Dependencies
- nvidia/cuda 11.3.1-cudnn8-devel-ubuntu20.04 build
- Babel ==2.10.1
- Cython ==0.29.28
- GitPython ==3.1.27
- Jinja2 ==3.1.2
- Markdown ==3.3.7
- MarkupSafe ==2.1.1
- Pillow ==9.1.0
- PyPrind ==2.11.3
- PyYAML ==6.0
- Pygments ==2.12.0
- Send2Trash ==1.8.0
- Werkzeug ==2.1.2
- absl-py ==1.0.0
- aiohttp ==3.8.1
- aiosignal ==1.2.0
- anyio ==3.6.1
- argon2-cffi ==21.3.0
- argon2-cffi-bindings ==21.2.0
- asttokens ==2.0.5
- async-timeout ==4.0.2
- attrs ==21.4.0
- backcall ==0.2.0
- beautifulsoup4 ==4.11.1
- bleach ==5.0.0
- cachetools ==5.1.0
- certifi ==2021.10.8
- cffi ==1.15.0
- charset-normalizer ==2.0.12
- click ==8.1.3
- commonmark ==0.9.1
- datasets ==2.2.1
- debugpy ==1.6.0
- decorator ==5.1.1
- deepmatcher ==0.1.2.post2
- defusedxml ==0.7.1
- dill ==0.3.4
- docker-pycreds ==0.4.0
- docstring-parser ==0.14.1
- entrypoints ==0.4
- executing ==0.8.3
- fastjsonschema ==2.15.3
- fasttext ==0.9.2
- filelock ==3.7.0
- frozenlist ==1.3.0
- fsspec ==2022.3.0
- gitdb ==4.0.9
- google-auth ==2.6.6
- google-auth-oauthlib ==0.4.6
- grpcio ==1.46.1
- huggingface-hub ==0.6.0
- idna ==3.3
- importlib-metadata ==4.11.3
- ipykernel ==6.13.0
- ipython ==8.3.0
- ipython-genutils ==0.2.0
- jedi ==0.18.1
- jieba ==0.42.1
- joblib ==1.2.0
- json5 ==0.9.8
- jsonargparse ==4.7.3
- jsonschema ==4.5.1
- jupyter-client ==7.3.1
- jupyter-core ==4.10.0
- jupyter-server ==1.17.0
- jupyterlab ==3.4.2
- jupyterlab-pygments ==0.2.2
- jupyterlab-server ==2.13.0
- matplotlib-inline ==0.1.3
- mistune ==0.8.4
- multidict ==6.0.2
- multiprocess ==0.70.12.2
- nbclassic ==0.3.7
- nbclient ==0.6.3
- nbconvert ==6.5.0
- nbformat ==5.4.0
- nest-asyncio ==1.5.5
- nltk ==3.7
- notebook ==6.4.11
- notebook-shim ==0.1.0
- numpy ==1.22.3
- oauthlib ==3.2.0
- packaging ==21.3
- pandas ==1.4.2
- pandocfilters ==1.5.0
- parso ==0.8.3
- pathtools ==0.1.2
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prometheus-client ==0.14.1
- promise ==2.3
- prompt-toolkit ==3.0.29
- protobuf ==3.20.1
- psutil ==5.9.0
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyDeprecate ==0.3.2
- pyarrow ==8.0.0
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pybind11 ==2.9.2
- pycparser ==2.21
- pyparsing ==3.0.9
- pyrsistent ==0.18.1
- python-dateutil ==2.8.2
- pytorch-lightning ==1.6.3
- pytz ==2022.1
- pyzmq ==22.3.0
- regex ==2022.4.24
- requests ==2.27.1
- requests-oauthlib ==1.3.1
- responses ==0.18.0
- rich ==12.4.1
- rsa ==4.8
- scikit-learn ==1.1.0
- scipy ==1.8.0
- sentry-sdk ==1.5.12
- setproctitle ==1.2.3
- shortuuid ==1.0.9
- shtab ==1.5.4
- six ==1.16.0
- smmap ==5.0.0
- sniffio ==1.2.0
- soupsieve ==2.3.2.post1
- stack-data ==0.2.0
- tensorboard ==2.9.0
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- terminado ==0.13.3
- threadpoolctl ==3.1.0
- tinycss2 ==1.1.1
- tokenizers ==0.12.1
- torch ==1.10.2
- torchmetrics ==0.8.2
- torchtext ==0.11.2
- torchvision ==0.11.3
- tornado ==6.1
- tqdm ==4.64.0
- traitlets ==5.2.0
- transformers ==4.19.1
- typing_extensions ==4.2.0
- urllib3 ==1.26.9
- wandb ==0.12.16
- wcwidth ==0.2.5
- webencodings ==0.5.1
- websocket-client ==1.3.2
- xxhash ==3.0.0
- yarl ==1.7.2
- zipp ==3.8.0
- ca-certificates 2022.3.29.*
- certifi 2021.10.8.*
- ncurses 6.3.*
- openssl 1.1.1n.*
- pip 21.2.4.*
- python 3.9.12.*
- readline 8.1.2.*
- setuptools 59.5.0.*
- sqlite 3.38.2.*
- tk 8.6.11.*
- tzdata 2022a.*
- wheel 0.37.1.*
- xz 5.2.5.*
- zlib 1.2.12.*
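Because the pip dependencies above are pinned to exact versions, a quick check of the active environment against those pins can help when results fail to reproduce. The following is a minimal sketch using only the standard library; the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical, and the parser only understands `name ==version` lines, so the conda-style entries such as `python 3.9.12.*` are skipped.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Matches lines such as "- torch ==1.10.2" or "torch==1.10.2".
PIN_RE = re.compile(r"^\s*-?\s*([A-Za-z0-9_.\-]+)\s*==\s*(\S+)\s*$")


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract {package: version} from exact `==` pins; skip other lines."""
    pins = {}
    for line in lines:
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of `name`, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatching package to (pinned version, installed version)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # A few pins copied from the list above.
    sample = [
        "- torch ==1.10.2",
        "- pytorch-lightning ==1.6.3",
        "- transformers ==4.19.1",
    ]
    for name, (wanted, found) in find_mismatches(parse_pins(sample)).items():
        print(f"{name}: pinned {wanted}, installed {found}")
```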