https://github.com/awslabs/hypergraph-tabular-lm

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 447 KB
Statistics
  • Stars: 29
  • Watchers: 3
  • Forks: 3
  • Open Issues: 3
  • Releases: 0
Created almost 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

HyTrel

A hypergraph-based tabular language model.

Introduction

This repository contains the official implementation of the paper "HyTrel: Hypergraph-enhanced Tabular Data Representation Learning", along with code, data, and checkpoints.

Installation

Python 3.9 is recommended.

Here is an example of creating the environment using Anaconda:

  • Create the virtual environment using conda create -n hytrel python=3.9

  • Install the required packages with the corresponding versions from requirements.txt

Note: If you have difficulty installing torch_geometric, refer to its installation instructions and install it according to your environment settings.

Pretraining

  • Pre-process the raw data by slicing the big file into chunks, and put the resulting *.jsonl files into the directory /data/pretrain/chunks/. Sample data files are provided and can be used as a reference. Note: The pretraining *.jsonl data are acquired and preprocessed using the scripts from TaBERT.

  • Run python parallel_clean.py to clean and serialize the tables. Note: The tables are serialized in Arrow format to reduce memory usage.

  • Run sh pretrain_electra.sh to pretrain HyTrel with the ELECTRA objective.

  • Run sh pretrain_contrast.sh to pretrain HyTrel with the Contrastive objective.
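The chunking step in the first bullet could be sketched as follows. This is a minimal illustration, not the repository's own preprocessing: the helper name split_jsonl, the chunk size, and the chunk_00000.jsonl naming scheme are all assumptions, and the actual scripts (derived from TaBERT) may slice the data differently.

```python
from itertools import islice
from pathlib import Path


def split_jsonl(source, out_dir, lines_per_chunk=10_000):
    """Split one large JSONL file into numbered chunk files.

    Hypothetical helper: the name, the default chunk size, and the
    output file naming are illustrative, not taken from the repository.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with source.open("r", encoding="utf-8") as reader:
        index = 0
        while True:
            # Read at most lines_per_chunk lines; the file is streamed,
            # so the whole input never has to fit in memory.
            batch = list(islice(reader, lines_per_chunk))
            if not batch:
                break
            chunk_path = out_dir / f"chunk_{index:05d}.jsonl"
            with chunk_path.open("w", encoding="utf-8") as writer:
                writer.writelines(batch)
            written.append(chunk_path)
            index += 1
    return written
```

For example, split_jsonl("raw/tables.jsonl", "data/pretrain/chunks") would write the chunks into a directory matching the layout above, assuming the paths in this README are relative to the repository root and that raw/tables.jsonl (a made-up name) is the large input file.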

Evaluation

First, put the ELECTRA-pretrained checkpoint into /checkpoints/electra/ and the Contrast-pretrained checkpoint into /checkpoints/contrast/.

Column Type Annotation

  • Put the data files {train, dev, test}.table_col_type.json and type_vocab.txt into the directory /data/col_ann/.

  • Run sh evaluate_cta_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_cta_contrast.sh with the Contrast-pretrained checkpoint.
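Before launching an evaluation script, it can help to confirm that the expected files are in place. The sketch below expands the {train, dev, test} pattern and reports anything missing. The helper missing_files is hypothetical and not part of the repository, and it assumes the data directory is given relative to wherever the script is run.

```python
from pathlib import Path

# The three splits named in the README's {train, dev, test} pattern.
SPLITS = ("train", "dev", "test")


def missing_files(data_dir, suffix, extra_files=()):
    """Return the expected file names that are absent from data_dir.

    Hypothetical helper: expected names are "<split>.<suffix>" for each
    split, followed by any extra files such as a vocabulary file.
    """
    data_dir = Path(data_dir)
    expected = [f"{split}.{suffix}" for split in SPLITS] + list(extra_files)
    return [name for name in expected if not (data_dir / name).is_file()]
```

Calling missing_files("data/col_ann", "table_col_type.json", ["type_vocab.txt"]) returns an empty list when all four files are present. The same call shape would apply to the other annotation tasks by changing the directory, suffix, and vocabulary file.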

Column Property Annotation

  • Put the data files {train, dev, test}.table_rel_extraction.json and relation_vocab.txt into the directory /data/col_rel/.

  • Run sh evaluate_cpa_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_cpa_contrast.sh with the Contrast-pretrained checkpoint.

Table Type Annotation

  • Extract ttd.tar.gz into train, dev, and test data folders under the directory /data/ttd/.

  • Run sh evaluate_ttd_electra.sh with the ELECTRA-pretrained checkpoint.

  • Run sh evaluate_ttd_contrast.sh with the Contrast-pretrained checkpoint.
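Unpacking the archive in the first bullet can be done with Python's standard tarfile module. The sketch below is one possible approach: the helper name extract_archive is made up, and it assumes (without having inspected the archive) that ttd.tar.gz contains the train, dev, and test folders at its top level.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Extract a .tar.gz archive into dest and return its top-level entries.

    Hypothetical helper, not part of the repository.
    """
    archive, dest = Path(archive), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse members that would land outside dest (e.g. "../x").
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the stricter "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return sorted(p.name for p in dest.iterdir())
```

If the archive is laid out as assumed, extract_archive("ttd.tar.gz", "data/ttd") returns the names of the extracted split folders, which gives a quick check that all three are there before running the evaluation scripts.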

Reference

Please cite our paper.

@inproceedings{NEURIPS2023_66178bea,
  author = {Chen, Pei and Sarkar, Soumajyoti and Lausen, Leonard and Srinivasan, Balasubramaniam and Zha, Sheng and Huang, Ruihong and Karypis, George},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Oh and T. Neumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  pages = {32173--32193},
  publisher = {Curran Associates, Inc.},
  title = {HyTrel: Hypergraph-enhanced Tabular Data Representation Learning},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/66178beae8f12fcd48699de95acc1152-Paper-Conference.pdf},
  volume = {36},
  year = {2023}
}

Contact

The data and model checkpoints can be found in the checkpoints folder.

If you have more questions, please email: chen.pei518@163.com (Pei Chen)

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

GitHub Events

Total
  • Watch event: 7
  • Fork event: 2
Last Year
  • Watch event: 7
  • Fork event: 2

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 2
  • Total pull requests: 5
  • Average time to close issues: 4 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 2
  • Pull requests: 5
  • Average time to close issues: 4 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • Chao-Ye (1)
Pull Request Authors
  • SOUMAJYOTI (3)
  • dependabot[bot] (3)
Top Labels
Issue Labels
Pull Request Labels
dependencies (3)