HeXtractor

HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks - Published in JOSS (2025)

https://github.com/maddataanalyst/hextractor

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Mathematics Computer Science - 88% confidence
Artificial Intelligence and Machine Learning Computer Science - 69% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: maddataanalyst
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Size: 2.22 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 3
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License

README.md

Overview

HeXtractor is a tool designed to automatically convert selected data in tabular format into a PyTorch Geometric heterogeneous graph. As research into graph neural networks (GNNs) expands, the importance of heterogeneous graphs grows. However, data often comes in tabular form, and manually transforming this data into graph format can be tedious and error-prone. HeXtractor aims to streamline this process, providing researchers and practitioners with a more efficient workflow.
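A heterogeneous graph has several node types and typed edges between them. The library-free sketch below shows what converting a single table involves. It is not HeXtractor's API: the function `table_to_hetero`, the example columns, and the dictionary layout are illustrative assumptions. The layout is loosely modelled on PyTorch Geometric's `HeteroData`, which keys edges by `(source_type, relation, target_type)` triples.

```python
def table_to_hetero(rows, src_col, dst_col, relation):
    """Hypothetical helper (not HeXtractor's API): turn one table into a
    graph with two node types.

    Each distinct value in `src_col` / `dst_col` becomes a node of that type,
    and each row becomes one edge src -> dst labelled with `relation`.
    """
    # Per-type mapping from the original ID to a contiguous node index.
    node_index = {src_col: {}, dst_col: {}}
    sources, targets = [], []
    for row in rows:
        src_ids, dst_ids = node_index[src_col], node_index[dst_col]
        # setdefault assigns the next free index the first time an ID appears.
        s = src_ids.setdefault(row[src_col], len(src_ids))
        t = dst_ids.setdefault(row[dst_col], len(dst_ids))
        sources.append(s)
        targets.append(t)
    return {
        "nodes": node_index,
        "edges": {(src_col, relation, dst_col): [sources, targets]},
    }


# Example: a single purchases table yields "customer" and "product" nodes.
purchases = [
    {"customer": "alice", "product": "book"},
    {"customer": "bob", "product": "book"},
    {"customer": "alice", "product": "pen"},
]
purchase_graph = table_to_hetero(purchases, "customer", "product", "bought")
print(purchase_graph["edges"][("customer", "bought", "product")])
# → [[0, 1, 0], [0, 0, 1]]
```

In PyTorch Geometric, each `[sources, targets]` pair would correspond to a `[2, num_edges]` `edge_index` tensor stored under the same triple, for example `data["customer", "bought", "product"].edge_index`.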

This package has been reviewed and published in the Journal of Open Source Software (JOSS). The paper can be cited as follows:

Wójcik et al., (2025). HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks. Journal of Open Source Software, 10(110), 8057, https://doi.org/10.21105/joss.08057

Features

  1. Automatic Conversion: Converts tabular data into heterogeneous graphs suitable for GNNs.
  2. Support for Multiple Formats: Handles various tabular data formats with ease.
  3. Integration with PyTorch Geometric: Directly creates graphs that can be used with PyTorch Geometric.
  4. Visualization: Uses NetworkX and PyVis for graph visualization.

Why HeXtractor?

Heterogeneous graphs are crucial in many applications of graph neural networks, yet creating them from tabular data manually is often cumbersome. HeXtractor automates this process, allowing researchers to focus on developing and training their models instead of data preprocessing.

Key Applications:

  1. Transform single tabular datasets into heterogeneous graph structures.
  2. Transform multiple tables into a heterogeneous graph.
  3. Leverage Large Language Models (LLMs) to identify and extract semantic relationships from text, converting them into heterogeneous graph representations.
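For the multi-table case, relationships typically come from foreign keys. The following is a hypothetical sketch, again not HeXtractor's actual interface: `tables_to_hetero`, the `author`/`paper` tables, and the `written_by` relation are illustrative names. It assumes each table has a unique primary-key column, turns every row into one node, and turns every resolvable foreign key into one edge.

```python
def tables_to_hetero(tables, primary_keys, foreign_keys):
    """Hypothetical helper: build a heterogeneous graph from several tables.

    tables:       {node_type: list of row dicts}
    primary_keys: {node_type: name of the (assumed unique) ID column}
    foreign_keys: list of (src_type, fk_column, relation, dst_type)
    """
    # One node per row; map each primary-key value to its row index.
    index = {
        ntype: {row[primary_keys[ntype]]: i for i, row in enumerate(rows)}
        for ntype, rows in tables.items()
    }
    edges = {}
    for src_type, fk_col, relation, dst_type in foreign_keys:
        sources, targets = [], []
        for i, row in enumerate(tables[src_type]):
            # Skip rows whose foreign key does not match any target row.
            target = index[dst_type].get(row.get(fk_col))
            if target is not None:
                sources.append(i)
                targets.append(target)
        edges[(src_type, relation, dst_type)] = [sources, targets]
    return {"nodes": index, "edges": edges}


tables = {
    "author": [{"author_id": "a1", "name": "Ada"}, {"author_id": "a2", "name": "Alan"}],
    "paper": [
        {"paper_id": "p1", "author_id": "a2"},
        {"paper_id": "p2", "author_id": "a1"},
        {"paper_id": "p3", "author_id": "a9"},  # dangling key: no edge
    ],
}
paper_graph = tables_to_hetero(
    tables,
    primary_keys={"author": "author_id", "paper": "paper_id"},
    foreign_keys=[("paper", "author_id", "written_by", "author")],
)
print(paper_graph["edges"][("paper", "written_by", "author")])
# → [[0, 1], [1, 0]]
```

Dangling foreign keys (here `a9`) are skipped rather than raising an error; a real pipeline might prefer to report them.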

Technologies

  1. Python: The primary programming language used for HeXtractor.
  2. pandas: Utilized for data manipulation and handling tabular data.
  3. PyTorch Geometric: Framework for creating and working with graph neural networks.
  4. NetworkX: Used for creating and managing complex graph structures.
  5. PyVis: Enables interactive visualization of graphs.

Installation

HeXtractor can be installed either from PyPI (recommended for most users) or from source code (recommended for developers or if you need the latest features).

From PyPI

To install the latest version from PyPI, run:

```bash
pip install hextractor
```

From Source Code

To install HeXtractor from source, you'll first need to clone the repository:

```bash
git clone https://github.com/maddataanalyst/hextractor.git
cd hextractor
```

You can then install it using either conda or any standard Python virtual environment. We use Poetry as our primary dependency manager because it provides robust dependency resolution, reproducible builds, and better package management.

Option 1: Using Conda

If you prefer Conda for environment management:

```bash
# Create a new conda environment from the provided file
conda env create -f environment.yml

# Activate the environment
conda activate hextractor

# Install poetry inside the conda environment
pip install poetry

# Install the package with all dependencies
poetry install --with dev --with research
```

Option 2: Using Standard Python Virtual Environment

  1. Create and activate a virtual environment using your preferred method:

```bash
# Using venv (Python 3.3+)
python -m venv hextractor-env
source hextractor-env/bin/activate  # On Windows: hextractor-env\Scripts\activate

# Or using virtualenv
virtualenv hextractor-env
source hextractor-env/bin/activate  # On Windows: hextractor-env\Scripts\activate
```

  2. Install Poetry and the package:

```bash
# Install poetry
pip install poetry

# Install the package with all dependencies
poetry install --with dev --with research
```

Remember to activate your environment (conda or virtual environment) whenever you want to use HeXtractor.
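After installing with either method, one quick way to confirm that the package is visible in the active environment is to query its installed version. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). It assumes the PyPI distribution name `hextractor` and does not import or exercise the package itself; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="hextractor"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"hextractor {v}" if v else "hextractor is not installed in this environment")
```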

Documentation

Official, detailed documentation for HeXtractor is available online.

Contributing and help

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

  1. Reporting bugs;
  2. Fixing bugs;
  3. Implementing features;
  4. Writing documentation;
  5. Submitting feedback.

Detailed contribution and community guidelines can be found in the CONTRIBUTING.rst file.

Owner

  • Name: Filip Wójcik, PhD
  • Login: maddataanalyst
  • Kind: user
  • Company: Mad data scientist

I’m a professional data scientist and programmer specializing in artificial intelligence and machine learning. I hold a PhD in Economics and Management.

JOSS Publication

HeXtractor: Extracting Heterogeneous Graphs from Structured and Textual Data for Graph Neural Networks
Published
June 23, 2025
Volume 10, Issue 110, Page 8057
Authors
Filip Wójcik
Wroclaw University of Economics and Business, Wrocław, Poland
Marcin Malczewski
Diveapps, Wrocław, Poland
Editor
Nikoleta Glynatsi
Tags
graph neural networks heterogeneous graphs tabular data knowledge graphs data extraction PyTorch Geometric

GitHub Events

Total
  • Create event: 9
  • Release event: 1
  • Issues event: 10
  • Watch event: 2
  • Delete event: 4
  • Issue comment event: 4
  • Push event: 35
  • Pull request event: 14
  • Pull request review comment event: 9
  • Pull request review event: 17
Last Year
  • Create event: 9
  • Release event: 1
  • Issues event: 10
  • Watch event: 2
  • Delete event: 4
  • Issue comment event: 4
  • Push event: 35
  • Pull request event: 14
  • Pull request review comment event: 9
  • Pull request review event: 17

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 6
  • Total pull requests: 9
  • Average time to close issues: 2 months
  • Average time to close pull requests: 22 days
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 9
  • Average time to close issues: 2 days
  • Average time to close pull requests: 22 days
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.22
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • maddataanalyst (6)
  • jboynyc (2)
Pull Request Authors
  • maddataanalyst (8)
  • mmalczewski (2)
Top Labels
Issue Labels
enhancement (6) bug (1)
Pull Request Labels
documentation (2) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 15 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: hextractor

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 15 last month
Rankings
Dependent packages count: 9.4%
Average: 31.2%
Dependent repos count: 53.0%
Maintainers (1)
Last synced: 4 months ago

Dependencies

poetry.lock pypi
  • 131 dependencies
pyproject.toml pypi
  • jupyterlab ^4.2.1 research
  • python ^3.10
  • torch ^2.3.1
  • torch-geometric ^2.5.3