https://github.com/ashdehghan/neext

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ashdehghan
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 48.2 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 5
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License

README.md

NEExT: Network Embedding Experimentation Toolkit

NEExT is a powerful Python framework for graph analysis, embedding computation, and machine learning on graph-structured data. It provides a unified interface for working with different graph backends (NetworkX and iGraph), computing node features, generating graph embeddings, and training machine learning models.

📚 Documentation

Detailed documentation is available in the docs directory. Build it locally or visit the online documentation at NEExT Documentation.

🌟 Features

  • Flexible Graph Handling

    • Support for both NetworkX and iGraph backends
    • Automatic graph reindexing and largest component filtering
    • Node sampling capabilities for large graphs
    • Rich attribute support for nodes and edges
  • Comprehensive Node Features

    • PageRank
    • Degree Centrality
    • Closeness Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Clustering Coefficient
    • Local Efficiency
    • LSME (Local Structural Motif Embeddings)
  • Graph Embeddings

    • Approximate Wasserstein
    • Exact Wasserstein
    • Sinkhorn Vectorizer
    • Customizable embedding dimensions
  • Machine Learning Integration

    • Classification and regression support
    • Dataset balancing options
    • Cross-validation with customizable splits
    • Feature importance analysis
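To make the node-level metrics listed above concrete, the sketch below computes a subset of them directly with NetworkX for a single graph and lays them out one row per node, mirroring the node_id/graph_id layout used for custom features. It is an illustration only, not how NEExT computes its built-in features: the function basic_node_features and its column names are hypothetical, and PageRank, eigenvector centrality, local efficiency, and LSME are not reproduced here.

```python
import networkx as nx
import pandas as pd


def basic_node_features(G: nx.Graph, graph_id: int) -> pd.DataFrame:
    """Hypothetical helper: a few standard NetworkX node metrics, one row per node."""
    nodes = list(G.nodes)
    # Each NetworkX call returns a dict mapping node -> value.
    metrics = {
        "degree_centrality": nx.degree_centrality(G),
        "closeness_centrality": nx.closeness_centrality(G),
        "betweenness_centrality": nx.betweenness_centrality(G),
        "clustering_coefficient": nx.clustering(G),
    }
    df = pd.DataFrame({"node_id": nodes, "graph_id": graph_id})
    for name, values in metrics.items():
        df[name] = [values[n] for n in nodes]
    return df


features = basic_node_features(nx.karate_club_graph(), graph_id=0)
print(features.head())
```

All four metrics above are normalized to the range [0, 1] by NetworkX's defaults, which makes them directly comparable across graphs of different sizes.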

Custom Node Feature Functions

NEExT allows you to define and compute your own custom node feature functions alongside the built-in ones. This provides great flexibility for experimenting with novel graph metrics.

Defining a Custom Feature Function:

Your custom feature function must adhere to the following structure:

  1. Input: It must accept a single argument, which will be a graph object. This object provides access to the graph's structure (nodes, edges) and properties, for example graph.nodes, graph.graph_id, and graph.G (the underlying NetworkX or iGraph object).
  2. Output: It must return a pandas.DataFrame with the following specific columns in order:
    • "node_id": Identifiers for the nodes for which features are computed.
    • "graph_id": The identifier of the graph to which these nodes belong.
    • One or more feature columns: These columns contain the computed feature values. Name them following the pattern your_feature_name_0, your_feature_name_1, and so on when the feature has multiple components or is expanded over hops; a single column such as your_feature_name is also acceptable.
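A returned frame can be checked against this contract before it is handed to NEExT. The helper below is a hypothetical convenience, not part of NEExT; it enforces the column order described above and, as an extra assumption of this sketch, that all rows share one graph_id (the function receives a single graph).

```python
import pandas as pd


def check_feature_frame(df: pd.DataFrame) -> list[str]:
    """Hypothetical validator: return the feature column names if df follows
    the node_id, graph_id, feature... layout, else raise ValueError."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError("custom feature functions must return a pandas.DataFrame")
    columns = list(df.columns)
    if columns[:2] != ["node_id", "graph_id"]:
        raise ValueError(f"first two columns must be node_id, graph_id; got {columns[:2]}")
    feature_cols = columns[2:]
    if not feature_cols:
        raise ValueError("at least one feature column is required")
    # Assumption of this sketch: one call covers exactly one graph.
    if df["graph_id"].nunique() > 1:
        raise ValueError("all rows must share the same graph_id")
    return feature_cols


good = pd.DataFrame({"node_id": [0, 1], "graph_id": [7, 7], "my_feature_0": [1.0, 4.0]})
print(check_feature_frame(good))  # ['my_feature_0']
```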

Example:

Here's how you can define a simple custom feature function and use it:

```python
import networkx as nx
import pandas as pd

from NEExT import NEExT


# 1. Define your custom feature function.
# This function must be defined at the top level of your script/module
# if you plan to use multiprocessing (n_jobs != 1).
def my_node_degree_squared(graph):
    nodes = list(graph.nodes)  # or range(graph.G.vcount()) for iGraph if nodes are 0-indexed
    graph_id = graph.graph_id

    if hasattr(graph.G, "degree"):  # Handles both NetworkX and iGraph
        if isinstance(graph.G, nx.Graph):  # NetworkX
            degrees = [graph.G.degree(n) for n in nodes]
        else:  # iGraph
            degrees = graph.G.degree(nodes)
    else:
        raise TypeError("Graph object does not have a degree method.")

    degree_squared_values = [d**2 for d in degrees]

    df = pd.DataFrame({
        "node_id": nodes,
        "graph_id": graph_id,
        "degree_sq_0": degree_squared_values,
    })
    # Ensure the correct column order
    return df[["node_id", "graph_id", "degree_sq_0"]]


# 2. Prepare the list of custom feature methods
my_feature_methods = [
    {"feature_name": "my_degree_squared", "feature_function": my_node_degree_squared}
]

# 3. Pass it to compute_node_features.
# Initialize NEExT and load your graph_collection as shown in the Quick Start:
nxt = NEExT()
# graph_collection = nxt.read_from_csv(...)

features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["pagerank", "my_degree_squared"],  # Include your custom feature name
    feature_vector_length=3,  # Applies to built-in features that use it
    my_feature_methods=my_feature_methods,
)

print(features.features_df.head())
```

When you include "my_degree_squared" in the feature_list and provide my_feature_methods, NEExT will automatically register and compute your custom function. If "all" is in feature_list, your custom registered function will also be included in the computation.
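Because the function only reads graph.nodes, graph.graph_id, and graph.G, it can be exercised without loading a full graph collection. The sketch below assumes those three attributes are all the function needs and wraps a NetworkX graph in a hypothetical StandInGraph class (not a NEExT type); it repeats a NetworkX-only version of the degree-squared function so the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Any

import networkx as nx
import pandas as pd


@dataclass
class StandInGraph:
    """Hypothetical stand-in exposing only the attributes the feature function reads."""
    graph_id: int
    G: Any

    @property
    def nodes(self):
        return self.G.nodes


def my_node_degree_squared(graph):
    # NetworkX-only copy of the custom feature function, for a self-contained test.
    nodes = list(graph.nodes)
    degrees = [graph.G.degree(n) for n in nodes]
    df = pd.DataFrame({
        "node_id": nodes,
        "graph_id": graph.graph_id,
        "degree_sq_0": [d**2 for d in degrees],
    })
    return df[["node_id", "graph_id", "degree_sq_0"]]


star = nx.star_graph(3)  # hub 0 connected to leaves 1, 2, 3
result = my_node_degree_squared(StandInGraph(graph_id=42, G=star))
print(result)  # degree_sq_0 is 9 for the hub and 1 for each leaf
```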

📦 Installation

Basic Installation

```bash
pip install NEExT
```

Development Installation

```bash
# Clone the repository
git clone https://github.com/ashdehghan/NEExT.git
cd NEExT

# Install with development dependencies
pip install -e ".[dev]"
```

Additional Components

```bash
# For running tests
pip install -e ".[test]"

# For building documentation
pip install -e ".[docs]"

# For running experiments
pip install -e ".[experiments]"

# Install all components
pip install -e ".[dev,test,docs,experiments]"
```

🚀 Quick Start

Basic Usage

```python
from NEExT import NEExT

# Initialize the framework
nxt = NEExT()
nxt.set_log_level("INFO")

# Load graph data
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_label_path="graph_labels.csv",
    reindex_nodes=True,
    filter_largest_component=True,
    graph_type="igraph",
)

# Compute node features
features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["all"],
    feature_vector_length=3,
)

# Compute graph embeddings
embeddings = nxt.compute_graph_embeddings(
    graph_collection=graph_collection,
    features=features,
    embedding_algorithm="approx_wasserstein",
    embedding_dimension=3,
)

# Train a classifier
model_results = nxt.train_ml_model(
    graph_collection=graph_collection,
    embeddings=embeddings,
    model_type="classifier",
    sample_size=50,
)
```
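The reindex_nodes=True and filter_largest_component=True options correspond to a common preprocessing step. The sketch below shows one plausible version of it in NetworkX for an undirected graph: keep the largest connected component, then relabel its nodes to consecutive integers starting at 0. The helper name and the choice to order new IDs by sorted old IDs are assumptions; NEExT's own implementation may differ in detail.

```python
import networkx as nx


def largest_component_reindexed(G: nx.Graph) -> nx.Graph:
    """Hypothetical helper: keep the largest connected component of an
    undirected graph and relabel its nodes 0..n-1."""
    largest = max(nx.connected_components(G), key=len)
    H = G.subgraph(largest).copy()
    # Assumption: new IDs follow the sorted order of the old IDs.
    mapping = {old: new for new, old in enumerate(sorted(H.nodes))}
    return nx.relabel_nodes(H, mapping)


G = nx.Graph([(10, 11), (11, 12), (20, 21)])  # components {10, 11, 12} and {20, 21}
H = largest_component_reindexed(G)
edges = sorted(tuple(sorted(e)) for e in H.edges)
print(sorted(H.nodes), edges)  # [0, 1, 2] [(0, 1), (1, 2)]
```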

Working with Large Graphs

NEExT supports node sampling for handling large graphs:

```python
# Load graphs, keeping 70% of the nodes
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    node_sample_rate=0.7,  # Use 70% of nodes
)
```
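Conceptually, node_sample_rate=0.7 keeps about 70% of each graph's nodes. The sketch below assumes uniform random sampling and keeps the subgraph induced by the sampled nodes; both are assumptions made for illustration, and the hypothetical sample_nodes helper is not NEExT's API.

```python
import random
from typing import Optional

import networkx as nx


def sample_nodes(G: nx.Graph, rate: float, seed: Optional[int] = None) -> nx.Graph:
    """Hypothetical helper: subgraph induced by a uniform random fraction of G's nodes."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(rate * G.number_of_nodes()))  # number of nodes to keep
    kept = rng.sample(list(G.nodes), k)
    # Induced subgraph: an edge survives only if both endpoints were kept.
    return G.subgraph(kept).copy()


G = nx.path_graph(10)
H = sample_nodes(G, 0.7, seed=0)
print(H.number_of_nodes())  # 7
```

Because only edges between kept nodes survive, a sampled subgraph of a connected graph can become disconnected, which is one reason sampling rate can affect downstream accuracy.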

Feature Importance Analysis

```python
# Compute feature importance
importance_df = nxt.compute_feature_importance(
    graph_collection=graph_collection,
    features=features,
    feature_importance_algorithm="supervised_fast",
    embedding_algorithm="approx_wasserstein",
)
```

📊 Experiments

NEExT includes several pre-built experiments in the examples/experiments directory:

Node Sampling Experiment

Investigates the effect of node sampling on classifier accuracy:

```bash
cd examples/experiments
python node_sampling_experiments.py
```

📝 Input File Formats

edges.csv

```csv
src_node_id,dest_node_id
0,1
1,2
...
```

node_graph_mapping.csv

```csv
node_id,graph_id
0,1
1,1
2,2
...
```

graph_labels.csv

```csv
graph_id,graph_label
1,0
2,1
...
```
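The three files are linked through node_id and graph_id. The sketch below builds per-graph NetworkX graphs from made-up toy data in this layout, using in-memory CSV text in place of files. It assumes node IDs are unique across the whole dataset, so each edge can be assigned to a graph through its source node; it is not NEExT's loader.

```python
import io

import networkx as nx
import pandas as pd

# Made-up toy data: graph 1 is a 3-node path, graph 2 is a single edge.
edges = pd.read_csv(io.StringIO("src_node_id,dest_node_id\n0,1\n1,2\n3,4\n"))
mapping = pd.read_csv(io.StringIO("node_id,graph_id\n0,1\n1,1\n2,1\n3,2\n4,2\n"))
labels = pd.read_csv(io.StringIO("graph_id,graph_label\n1,0\n2,1\n"))

# Attach each edge to a graph through its source node's graph_id.
edges = edges.merge(mapping, left_on="src_node_id", right_on="node_id").drop(columns="node_id")

graphs = {}
for graph_id, group in mapping.groupby("graph_id"):
    G = nx.Graph()
    G.add_nodes_from(group["node_id"])  # keeps isolated nodes too
    graph_edges = edges[edges["graph_id"] == graph_id]
    G.add_edges_from(zip(graph_edges["src_node_id"], graph_edges["dest_node_id"]))
    graphs[graph_id] = G

label_of = dict(zip(labels["graph_id"], labels["graph_label"]))
for graph_id, G in graphs.items():
    print(graph_id, label_of[graph_id], G.number_of_nodes(), G.number_of_edges())
# prints:
# 1 0 3 2
# 2 1 2 1
```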

🛠️ Development

Running Tests

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=NEExT

# Run specific test file
pytest tests/test_node_sampling.py
```

Building Documentation

```bash
cd docs
make html
```

Code Style

The project uses several tools for code quality:

```bash
# Format code
black .

# Sort imports
isort .

# Check style
flake8 .

# Type checking
mypy .
```

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Authors

🙏 Acknowledgments

  • NetworkX team for the graph algorithms
  • iGraph team for the efficient graph operations
  • Scikit-learn team for machine learning components

📧 Contact

For questions and support:

  • Email: ash@anomalypoint.com
  • GitHub Issues: NEExT Issues

🔄 Version History

  • 0.1.0
    • Initial release
    • Basic graph operations
    • Node feature computation
    • Graph embeddings
    • Machine learning integration

Owner

  • Name: ashdehghan
  • Login: ashdehghan
  • Kind: organization

GitHub Events

Total
  • Create event: 7
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Push event: 65
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 8
  • Fork event: 1
Last Year
  • Create event: 7
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Push event: 65
  • Pull request review event: 1
  • Pull request review comment event: 1
  • Pull request event: 8
  • Fork event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 35 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 11
  • Total maintainers: 3
pypi.org: neext

Network Embedding Experimentation Toolkit - A powerful framework for graph analysis, embedding computation, and machine learning on graph-structured data

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 35 Last month
Rankings
Dependent packages count: 10.1%
Average: 38.2%
Dependent repos count: 66.4%
Maintainers (3)
Last synced: 10 months ago

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
setup.py pypi
  • arrow ==1.2.3
  • igraph ==0.11.3
  • jupyter ==1.0.0
  • karateclub ==1.2.2
  • matplotlib ==3.7.2
  • networkx ==2.8.8
  • node2vec ==0.4.6
  • numpy ==1.25.2
  • pandas ==2.0.3
  • plotly ==5.18.0
  • scikit-learn ==1.3.0
  • scipy ==1.11.2
  • tqdm ==4.65.0
  • umap-learn ==0.5.4
  • vectorizers ==0.2
  • xgboost ==2.0.2