data2neo

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.

https://github.com/jkminder/data2neo

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

data-cleaning data-conversion data-engineering data2neo database-migrations graphs neo4j relational-databases remodeling
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jkminder
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://data2neo.jkminder.ch
  • Size: 5.59 MB
Statistics
  • Stars: 22
  • Watchers: 1
  • Forks: 0
  • Open Issues: 3
  • Releases: 28
Topics
data-cleaning data-conversion data-engineering data2neo database-migrations graphs neo4j relational-databases remodeling
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database. It relieves you of the cumbersome manual work of writing the conversion code and lets you focus on the conversion schema and data processing.

The library is built specifically for converting data into a neo4j graph (minimum version 5.2). It also supports extensive customization capabilities to clean and remodel data. To connect to the database, it uses the native neo4j Python client.

This library has been developed at the Chair of Systems Design at ETH Zürich. Please check out our accompanying paper: Data2Neo - A Tool for Complex Neo4j Data Integration

Installation

`pip install data2neo`

The Data2Neo library supports Python 3.8+.

Quick Start

A quick example for converting data in a Pandas dataframe into a graph. The full example code can be found under examples. For more details, please check out the full documentation.

We first define a conversion schema in a YAML-style config file. In this config file we specify which entities are converted into which nodes and which relationships.

schema.yaml

```yaml
ENTITY("Flower"):
    NODE("Flower") flower:
        - sepal_length = Flower.sepal_length
        - sepal_width = Flower.sepal_width
        - petal_length = Flower.petal_length
        - petal_width = append(Flower.petal_width, " milimeters")
    NODE("Species", "BioEntity") species:
        + Name = Flower.species
    RELATIONSHIP(flower, "is", species):

ENTITY("Person"):
    NODE("Person") person:
        + ID = Person.ID
        - FirstName = Person.FirstName
        - LastName = Person.LastName
    RELATIONSHIP(person, "likes", MATCH("Species", Name=Person.FavoriteFlower)):
        - Since = "4ever"
```

The library has two basic elements that are required for the conversion: the `Converter`, which handles the conversion itself, and an `Iterator`, which iterates over the relational data. The iterator can be implemented for arbitrary data in relational format. Data2Neo currently has preimplemented iterators under:

- `data2neo.relational_modules.sqlite` for [SQLite](https://www.sqlite.org/index.html) databases
- `data2neo.relational_modules.pandas` for Pandas dataframes
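To make the iterator idea concrete, here is a minimal standard-library sketch. It is not Data2Neo's own implementation. It assumes only that an iterator has to yield each row together with the name of the table it came from, so that the schema can pick the matching `ENTITY` block. The helper `iter_rows` and the in-memory `Person` table are hypothetical and exist purely for illustration.

```python
import sqlite3


def iter_rows(connection, table):
    """Yield (table_name, row_dict) pairs for every row of one table.

    Hypothetical helper for illustration; Data2Neo ships its own iterators.
    """
    connection.row_factory = sqlite3.Row
    # Table names cannot be bound as SQL parameters, so only pass trusted names.
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    for row in cursor:
        yield table, dict(row)


# Build a tiny in-memory database with the columns used by the Person entity.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Person ("
    "ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, FavoriteFlower TEXT)"
)
connection.execute("INSERT INTO Person VALUES (1, 'Ada', 'Lovelace', 'setosa')")

rows = list(iter_rows(connection, "Person"))
print(rows)
```

Each yielded pair carries the table name, which plays the role of the entity type that a schema refers to.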

We will use the `PandasDataFrameIterator` from `data2neo.relational_modules.pandas`. Further, we will use the `IteratorIterator`, which can wrap multiple iterators to handle multiple dataframes. Since a pandas dataframe has no type/table name associated with it, we need to specify the name when creating a `PandasDataFrameIterator`. We also define a custom function `append` that can be referred to in the schema file and that appends a string to the attribute value. For an entity with `Flower["petal_width"] = 5`, the output node will have the attribute `petal_width = "5 milimeters"`.

```python
import neo4j
import pandas as pd

from data2neo.relational_modules.pandas import PandasDataFrameIterator
from data2neo import IteratorIterator, Converter, Attribute, register_attribute_postprocessor
from data2neo.utils import load_file

# Set up the neo4j URI and credentials
uri = "bolt://localhost:7687"
auth = neo4j.basic_auth("neo4j", "password")

people = ...  # a dataframe with people's data (ID, FirstName, LastName, FavoriteFlower)
people_iterator = PandasDataFrameIterator(people, "Person")
iris = ...  # a dataframe with the iris dataset
iris_iterator = PandasDataFrameIterator(iris, "Flower")

# Register a custom data processing function
@register_attribute_postprocessor
def append(attribute, append_string):
    new_attribute = Attribute(attribute.key, str(attribute.value) + append_string)
    return new_attribute

# Create IteratorIterator
iterator = IteratorIterator([people_iterator, iris_iterator])

# Create converter instance with schema, the final iterator and the graph
converter = Converter(load_file("schema.yaml"), iterator, uri, auth)

# Start the conversion
converter()
```
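The effect of the `append` postprocessor can be checked without a running Neo4j instance. The sketch below uses a small stand-in class, `FakeAttribute`, which is hypothetical and assumes only what the snippet above relies on: an attribute has a `key` and a `value`. It does not use or reproduce Data2Neo's real `Attribute` class or its registration decorator.

```python
from dataclasses import dataclass


@dataclass
class FakeAttribute:
    """Hypothetical stand-in exposing only the `key` and `value` fields."""
    key: str
    value: object


def append(attribute, append_string):
    # Same logic as the postprocessor above: stringify the value, then concatenate.
    return FakeAttribute(attribute.key, str(attribute.value) + append_string)


result = append(FakeAttribute("petal_width", 5), " milimeters")
print(result.key, "=", repr(result.value))
```

Because the value is passed through `str()` first, numeric columns such as the petal width become strings in the resulting attribute.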

Known issues

If you encounter a bug or unexplained behavior, please check the known issues list. If your issue is not listed there, submit a new one.

Owner

  • Name: Julian Minder
  • Login: jkminder
  • Kind: user
  • Location: Zürich/Lausanne
  • Company: ETHZ/EPFL

CS @ ETHZ – Student Research Assistant @sg-dev , currently writing my master thesis @epfl-dlab

Citation (CITATION.cff)

@misc{minder2024data2neo,
      title={Data2Neo - A Tool for Complex Neo4j Data Integration}, 
      author={Julian Minder and Laurence Brandenberger and Luis Salamanca and Frank Schweitzer},
      year={2024},
      eprint={2406.04995},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}

GitHub Events

Total
  • Watch event: 8
Last Year
  • Watch event: 8

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 9
  • Average time to close issues: 3 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 0.53
  • Average comments per pull request: 0.0
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 1
  • Average time to close issues: 27 days
  • Average time to close pull requests: 27 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jkminder (15)
  • CodesByChris (1)
  • lusamino (1)
Pull Request Authors
  • jkminder (10)
Top Labels
Issue Labels
enhancement (6) bug (5) config parser (3) wontfix (1) invalid (1) question (1)
Pull Request Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 12 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: data2neo

Library for converting relational data into graph data (neo4j)

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 12 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.9%
Dependent repos count: 61.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/tests_neo4j5.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • neo4j ${{ matrix.neo4j-version }} docker
docs/requirements.txt pypi
  • data2neo *
  • readthedocs-sphinx-search *
  • sphinx *
  • sphinx_autodoc_typehints *
  • sphinx_rtd_theme *
requirements.txt pypi
setup.py pypi