data2neo
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jkminder
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://data2neo.jkminder.ch
- Size: 5.59 MB
Statistics
- Stars: 22
- Watchers: 1
- Forks: 0
- Open Issues: 3
- Releases: 28
Metadata Files
README.md
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database. It relieves you of the cumbersome manual work of writing the conversion code and lets you focus on the conversion schema and data processing.
The library is built specifically for converting data into a Neo4j graph (minimum version 5.2). It further supports extensive customization capabilities to clean and remodel data. To communicate with the database, it uses the native neo4j Python client.
This library has been developed at the Chair of Systems Design at ETH Zürich. Please check out our accompanying paper: Data2Neo - A Tool for Complex Neo4j Data Integration
Installation
pip install data2neo
The Data2Neo library supports Python 3.8+.
Quick Start
A quick example for converting data in a Pandas dataframe into a graph. The full example code can be found under examples. For more details, please check out the full documentation. We first define a conversion schema in a YAML-style config file. In this config file, we specify which entities are converted into which nodes and relationships.
schema.yaml
```yaml
ENTITY("Flower"):
    NODE("Flower") flower:
        - sepal_length = Flower.sepal_length
        - sepal_width = Flower.sepal_width
        - petal_length = Flower.petal_length
        - petal_width = append(Flower.petal_width, " milimeters")
    NODE("Species", "BioEntity") species:
        + Name = Flower.species
    RELATIONSHIP(flower, "is", species):

ENTITY("Person"):
    NODE("Person") person:
        + ID = Person.ID
        - FirstName = Person.FirstName
        - LastName = Person.LastName
    RELATIONSHIP(person, "likes", MATCH("Species", Name=Person.FavoriteFlower)):
        - Since = "4ever"
```
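The `MATCH(...)` in the Person block links each person to a Species node that the Flower block creates. Assuming `MATCH` selects existing nodes by label and property values, as its syntax suggests, the plain-Python sketch below imitates that lookup in memory for a single row. It is an illustration only: `match_nodes`, `graph` and `person_row` are hypothetical names, the data is made up, and in Data2Neo the actual lookup is performed against Neo4j by the `Converter`.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def match_nodes(nodes: List[Node], label: str, **conditions: Any) -> List[Node]:
    """Return every node that carries `label` and all given property values."""
    return [
        node
        for node in nodes
        if label in node["labels"]
        and all(node.get(key) == value for key, value in conditions.items())
    ]


# Hypothetical Species nodes, as the Flower entities might have produced them
graph: List[Node] = [
    {"labels": ["Species", "BioEntity"], "Name": "setosa"},
    {"labels": ["Species", "BioEntity"], "Name": "virginica"},
]

# One hypothetical Person row
person_row = {"ID": 1, "FirstName": "Jane", "LastName": "Doe", "FavoriteFlower": "virginica"}
person = {
    "labels": ["Person"],
    "ID": person_row["ID"],
    "FirstName": person_row["FirstName"],
    "LastName": person_row["LastName"],
}

# RELATIONSHIP(person, "likes", MATCH("Species", Name=Person.FavoriteFlower)):
#     - Since = "4ever"
likes = [
    (person, "likes", species, {"Since": "4ever"})
    for species in match_nodes(graph, "Species", Name=person_row["FavoriteFlower"])
]
print(len(likes), likes[0][2]["Name"])  # 1 virginica
```

If no Species node has a matching `Name`, the list comprehension simply yields no relationship for that person.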
The library itself has two basic elements that are required for the conversion: the `Converter`, which handles the conversion itself, and an `Iterator`, which iterates over the relational data. The iterator can be implemented for arbitrary data in relational format. Data2Neo currently ships with prebuilt iterators in:
- `data2neo.relational_modules.sqlite` for [SQLite](https://www.sqlite.org/index.html) databases
- `data2neo.relational_modules.pandas` for Pandas dataframes
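Conceptually, an iterator hands the converter one row at a time together with a type name that selects the matching `ENTITY(...)` block in the schema. The standard-library sketch below illustrates that idea with `sqlite3` and `itertools.chain`. It is not how Data2Neo's SQLite module or `IteratorIterator` are implemented; `iter_table` and the table contents are hypothetical.

```python
import sqlite3
from itertools import chain
from typing import Any, Dict, Iterator, Tuple

Entity = Tuple[str, Dict[str, Any]]


def iter_table(conn: sqlite3.Connection, table: str) -> Iterator[Entity]:
    """Yield (type name, row as dict) pairs; the table name acts as the type name."""
    # Table names are trusted constants here, so quoting them inline is acceptable
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for values in cursor:
        yield table, dict(zip(columns, values))


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Person" (ID INTEGER, FirstName TEXT, FavoriteFlower TEXT)')
conn.execute('CREATE TABLE "Flower" (species TEXT, petal_width REAL)')
conn.execute("INSERT INTO Person VALUES (1, 'Jane', 'setosa')")
conn.execute("INSERT INTO Flower VALUES ('setosa', 0.2), ('virginica', 2.5)")

# Similar in spirit to IteratorIterator: chain several iterators into one stream
entities = list(chain(iter_table(conn, "Person"), iter_table(conn, "Flower")))
for type_name, row in entities:
    print(type_name, row)
```

A dataframe, unlike an SQL table, carries no table name, which is why the Pandas iterator has to be given one explicitly.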
We will use the `PandasDataFrameIterator` from `data2neo.relational_modules.pandas`. Furthermore, we use the `IteratorIterator`, which wraps multiple iterators so that several dataframes can be converted together. Since a pandas dataframe has no type/table name associated with it, we need to specify the name when creating a `PandasDataFrameIterator`. We also define a custom function `append` that can be referred to in the schema file and that appends a string to the attribute value. For an entity with `Flower["petal_width"] = 5`, the resulting node will have the attribute `petal_width = "5 milimeters"`.
```python
import neo4j
import pandas as pd
from data2neo.relational_modules.pandas import PandasDataFrameIterator
from data2neo import IteratorIterator, Converter, Attribute, register_attribute_postprocessor
from data2neo.utils import load_file

# Set up the neo4j URI and credentials
uri = "bolt://localhost:7687"
auth = neo4j.basic_auth("neo4j", "password")

people = ...  # a dataframe with people's data (ID, FirstName, LastName, FavoriteFlower)
people_iterator = PandasDataFrameIterator(people, "Person")
iris = ...  # a dataframe with the iris dataset
iris_iterator = PandasDataFrameIterator(iris, "Flower")

# Register a custom data processing function that can be used in the schema
@register_attribute_postprocessor
def append(attribute, append_string):
    new_attribute = Attribute(attribute.key, str(attribute.value) + append_string)
    return new_attribute

# Combine both iterators into a single IteratorIterator
iterator = IteratorIterator([people_iterator, iris_iterator])

# Create the converter with the schema, the combined iterator and the connection details
converter = Converter(load_file("schema.yaml"), iterator, uri, auth)

# Start the conversion
converter()
```
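The `append` postprocessor receives an `Attribute` and returns a new one with the same key and a transformed value. The sketch below replays just that transformation without a Neo4j instance. The `Attribute` class here is a hypothetical stand-in that models only the `key` and `value` fields used in the quick start, and the registration decorator is omitted, so this checks the data transformation only, not Data2Neo's registration mechanism.

```python
from typing import Any, NamedTuple


class Attribute(NamedTuple):
    """Hypothetical stand-in for data2neo's Attribute, modelling only key and value."""
    key: str
    value: Any


def append(attribute: Attribute, append_string: str) -> Attribute:
    # Same logic as the quick-start postprocessor: keep the key, stringify the value
    return Attribute(attribute.key, str(attribute.value) + append_string)


result = append(Attribute("petal_width", 5), " milimeters")
print(result)  # Attribute(key='petal_width', value='5 milimeters')
```

Returning a new attribute rather than mutating the input keeps the function free of side effects, so the same postprocessor can safely be applied to many attributes.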
Known issues
If you encounter a bug or unexpected behavior, please check the list of known issues. If your issue is not listed, please submit a new one.
Owner
- Name: Julian Minder
- Login: jkminder
- Kind: user
- Location: Zürich/Lausanne
- Company: ETHZ/EPFL
- Twitter: jkminder
- Repositories: 15
- Profile: https://github.com/jkminder
CS @ ETHZ – Student Research Assistant @sg-dev, currently writing my master thesis @epfl-dlab
Citation (CITATION.cff)
@misc{minder2024data2neo,
title={Data2Neo - A Tool for Complex Neo4j Data Integration},
author={Julian Minder and Laurence Brandenberger and Luis Salamanca and Frank Schweitzer},
year={2024},
eprint={2406.04995},
archivePrefix={arXiv},
primaryClass={cs.DB}
}
GitHub Events
Total
- Watch event: 8
Last Year
- Watch event: 8
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 17
- Total pull requests: 9
- Average time to close issues: 3 months
- Average time to close pull requests: 3 days
- Total issue authors: 3
- Total pull request authors: 1
- Average comments per issue: 0.53
- Average comments per pull request: 0.0
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 1
- Average time to close issues: 27 days
- Average time to close pull requests: 27 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.5
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jkminder (15)
- CodesByChris (1)
- lusamino (1)
Pull Request Authors
- jkminder (10)
Packages
- Total packages: 1
- Total downloads: pypi 12 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
pypi.org: data2neo
Library for converting relational data into graph data (neo4j)
- Homepage: https://github.com/jkminder/data2neo
- Documentation: https://data2neo.readthedocs.io/
- License: apache-2.0
- Latest release: 1.4.3 (published over 1 year ago)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- neo4j ${{ matrix.neo4j-version }} docker
- data2neo *
- readthedocs-sphinx-search *
- sphinx *
- sphinx_autodoc_typehints *
- sphinx_rtd_theme *