https://github.com/awslabs/hypergraph-tabular-lm
Repository
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 447 KB
Statistics
- Stars: 29
- Watchers: 3
- Forks: 3
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
HyTrel
A hypergraph-based tabular language model.
Introduction
This repository contains the official implementation of the paper HyTrel: Hypergraph-enhanced Tabular Data Representation Learning, including code, data, and checkpoints.

Installation
Python 3.9 is recommended.
The following example creates the environment with Anaconda:
- Create the virtual environment using conda create -n hytrel python=3.9
- Install the required packages with the corresponding versions from requirements.txt
Note: If you have difficulty installing torch_geometric, refer to its installation instructions and install the build that matches your environment.
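After installation, a quick check can confirm that the interpreter version and the key packages are in place. The following is a minimal sketch, not part of the repository: the function name check_environment and the module list are illustrative assumptions, and only torch and torch_geometric are named here because the README mentions them.

```python
import importlib.util
import sys

# Assumption: module names inferred from the README; extend this tuple
# with the remaining entries of requirements.txt as needed.
REQUIRED_MODULES = ("torch", "torch_geometric")


def check_environment(modules=REQUIRED_MODULES, expected_python=(3, 9)):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {found} found, {wanted} recommended")
    for name in modules:
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["no problems found"]:
        print(line)
```

The check only tests that the modules can be located; it does not verify that the installed versions match requirements.txt.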
Pretraining
- Pre-process the raw data by slicing the big file into chunks, and put the *.jsonl files into the directory /data/pretrain/chunks/. Sample data is provided and can be used as a reference.
  Note: The pretraining *.jsonl data are acquired and preprocessed using the scripts from TaBERT.
- Run python parallel_clean.py to clean and serialize the tables.
  Note: We serialize the tables as Arrow in consideration of memory usage.
- Run sh pretrain_electra.sh to pretrain HyTrel with the ELECTRA objective.
- Run sh pretrain_contrast.sh to pretrain HyTrel with the contrastive objective.
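The chunking step in the first item can be sketched as follows. This is an illustration under stated assumptions, not the repository's preprocessing code: the function name split_jsonl, the chunk_NNNNN.jsonl naming scheme, and the default of 10000 lines per chunk are all hypothetical, and the demo writes a tiny synthetic file to a temporary directory.

```python
import tempfile
from pathlib import Path


def split_jsonl(source, out_dir, lines_per_chunk=10000):
    """Split one JSONL file into numbered chunk files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    chunk_file = None
    kept = 0
    try:
        with open(source, "r", encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                if kept % lines_per_chunk == 0:
                    # start a new chunk file
                    if chunk_file is not None:
                        chunk_file.close()
                    path = out_dir / f"chunk_{len(chunk_paths):05d}.jsonl"
                    chunk_file = open(path, "w", encoding="utf-8")
                    chunk_paths.append(path)
                chunk_file.write(line if line.endswith("\n") else line + "\n")
                kept += 1
    finally:
        if chunk_file is not None:
            chunk_file.close()
    return chunk_paths


# Demo on synthetic data: 5 records, 2 per chunk.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "tables.jsonl"
    raw.write_text("".join(f'{{"id": {i}}}\n' for i in range(5)), encoding="utf-8")
    chunks = split_jsonl(raw, Path(tmp) / "data" / "pretrain" / "chunks", lines_per_chunk=2)
    sizes = [len(p.read_text(encoding="utf-8").splitlines()) for p in chunks]
    print(sizes)  # [2, 2, 1]
```

Splitting on line boundaries keeps each JSON record intact, so every chunk remains a valid JSONL file.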
Evaluation
First, put the ELECTRA-pretrained checkpoint in /checkpoints/electra/ and the Contrast-pretrained checkpoint in /checkpoints/contrast/.
Column Type Annotation
- Put the data {train, dev, test}.table_col_type.json and type_vocab.txt into the directory /data/col_ann/.
- Run sh evaluate_cta_electra.sh with the ELECTRA-pretrained checkpoint.
- Run sh evaluate_cta_contrast.sh with the Contrast-pretrained checkpoint.
Column Property Annotation
- Put the data {train, dev, test}.table_rel_extraction.json and relation_vocab.txt into the directory /data/col_rel/.
- Run sh evaluate_cpa_electra.sh with the ELECTRA-pretrained checkpoint.
- Run sh evaluate_cpa_contrast.sh with the Contrast-pretrained checkpoint.
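Before launching the column annotation scripts, it can help to confirm that the input files listed above are where the scripts expect them. The sketch below is not part of the repository; the function names are hypothetical, and it assumes that the /data/... paths in this README are relative to the repository root.

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")


def expected_annotation_files(root="."):
    """List the input files this README names for the two column-level tasks."""
    root = Path(root)  # assumption: the README paths are relative to the repo root
    col_ann = root / "data" / "col_ann"
    col_rel = root / "data" / "col_rel"
    files = [col_ann / f"{split}.table_col_type.json" for split in SPLITS]
    files.append(col_ann / "type_vocab.txt")
    files += [col_rel / f"{split}.table_rel_extraction.json" for split in SPLITS]
    files.append(col_rel / "relation_vocab.txt")
    return files


def missing_files(root="."):
    """Return the expected files that do not exist under root."""
    return [path for path in expected_annotation_files(root) if not path.is_file()]


if __name__ == "__main__":
    for path in missing_files("."):
        print(f"missing: {path}")
```

Run it from the repository root; no output means all eight files were found.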
Table Type Annotation
- Extract ttd.tar.gz into train, dev, and test data folders under the directory /data/ttd/.
- Run sh evaluate_ttd_electra.sh with the ELECTRA-pretrained checkpoint.
- Run sh evaluate_ttd_contrast.sh with the Contrast-pretrained checkpoint.
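The extraction step can also be done from Python with the standard tarfile module. This is a minimal sketch under assumptions: the function name extract_ttd is hypothetical, the destination is taken to be relative to the repository root, and the archive is assumed to contain train, dev, and test folders at its top level, which the README implies but does not state.

```python
import tarfile
from pathlib import Path


def extract_ttd(archive="ttd.tar.gz", dest="data/ttd"):
    """Extract the archive into dest and return the split folders still missing."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives from a source you trust.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    # Assumption: train, dev and test sit at the top level of the archive.
    return [name for name in ("train", "dev", "test") if not (dest / name).is_dir()]


if __name__ == "__main__":
    missing = extract_ttd()
    if missing:
        print("missing split folders:", ", ".join(missing))
```

If the returned list is not empty, the archive probably nests the folders one level deeper, and they need to be moved up into /data/ttd/ by hand.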
Reference
Please cite our paper.
@inproceedings{NEURIPS2023_66178bea,
author = {Chen, Pei and Sarkar, Soumajyoti and Lausen, Leonard and Srinivasan, Balasubramaniam and Zha, Sheng and Huang, Ruihong and Karypis, George},
booktitle = {Advances in Neural Information Processing Systems},
editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
pages = {32173--32193},
publisher = {Curran Associates, Inc.},
title = {HyTrel: Hypergraph-enhanced Tabular Data Representation Learning},
url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/66178beae8f12fcd48699de95acc1152-Paper-Conference.pdf},
volume = {36},
year = {2023}
}
Contact
The data and model checkpoints are available in the checkpoints folder.
If you have more questions, please email: chen.pei518@163.com (Pei Chen)
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
GitHub Events
Total
- Watch event: 7
- Fork event: 2
Last Year
- Watch event: 7
- Fork event: 2
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 2
- Total pull requests: 5
- Average time to close issues: 4 days
- Average time to close pull requests: less than a minute
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 2
- Pull requests: 5
- Average time to close issues: 4 days
- Average time to close pull requests: less than a minute
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Issue Authors
- Chao-Ye (1)
Pull Request Authors
- SOUMAJYOTI (3)
- dependabot[bot] (3)