https://github.com/ashdehghan/neext
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ashdehghan
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 48.2 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
NEExT: Network Embedding Experimentation Toolkit
NEExT is a powerful Python framework for graph analysis, embedding computation, and machine learning on graph-structured data. It provides a unified interface for working with different graph backends (NetworkX and iGraph), computing node features, generating graph embeddings, and training machine learning models.
📚 Documentation
Detailed documentation is available in the docs directory. Build it locally or read it online at https://neext.readthedocs.io.
🌟 Features
Flexible Graph Handling
- Support for both NetworkX and iGraph backends
- Automatic graph reindexing and largest component filtering
- Node sampling capabilities for large graphs
- Rich attribute support for nodes and edges
Comprehensive Node Features
- PageRank
- Degree Centrality
- Closeness Centrality
- Betweenness Centrality
- Eigenvector Centrality
- Clustering Coefficient
- Local Efficiency
- LSME (Local Structural Motif Embeddings)
Graph Embeddings
- Approximate Wasserstein
- Exact Wasserstein
- Sinkhorn Vectorizer
- Customizable embedding dimensions
Machine Learning Integration
- Classification and regression support
- Dataset balancing options
- Cross-validation with customizable splits
- Feature importance analysis
Custom Node Feature Functions
NEExT allows you to define and compute your own custom node feature functions alongside the built-in ones. This provides great flexibility for experimenting with novel graph metrics.
Defining a Custom Feature Function:
Your custom feature function must adhere to the following structure:
- Input: It must accept a single argument, a graph object. This object provides access to the graph's structure (nodes, edges) and properties (e.g., graph.nodes, graph.graph_id, and graph.G, which is the underlying NetworkX or iGraph object).
- Output: It must return a pandas.DataFrame with the following columns, in this order:
  - "node_id": Identifiers for the nodes for which features are computed.
  - "graph_id": The identifier of the graph to which these nodes belong.
  - One or more feature columns: These columns contain the computed feature values. Ideally, name them your_feature_name_0, your_feature_name_1, etc., if your feature has multiple components or is expanded over hops (a single feature column such as your_feature_name is also acceptable).
Example:
Here's how you can define a simple custom feature function and use it:
```python
import networkx as nx
import pandas as pd

from NEExT import NEExT


# 1. Define your custom feature function.
# This function must be defined at the top level of your script/module
# if you plan to use multiprocessing (n_jobs != 1).
def my_node_degree_squared(graph):
    nodes = list(graph.nodes)  # or range(graph.G.vcount()) for igraph if nodes are 0-indexed
    graph_id = graph.graph_id

    if hasattr(graph.G, "degree"):  # Handles both NetworkX and iGraph
        if isinstance(graph.G, nx.Graph):  # NetworkX
            degrees = [graph.G.degree(n) for n in nodes]
        else:  # iGraph
            degrees = graph.G.degree(nodes)
    else:
        raise TypeError("Graph object does not have a degree method.")

    degree_squared_values = [d**2 for d in degrees]

    df = pd.DataFrame({
        "node_id": nodes,
        "graph_id": graph_id,
        "degree_sq_0": degree_squared_values,
    })
    # Ensure the correct column order
    return df[["node_id", "graph_id", "degree_sq_0"]]


# 2. Prepare the list of custom feature methods
my_feature_methods = [
    {"feature_name": "my_degree_squared", "feature_function": my_node_degree_squared}
]

# 3. Pass it to compute_node_features.
# Initialize NEExT and load your graph_collection as shown in the Quick Start.
nxt = NEExT()
graph_collection = nxt.read_from_csv(...)

features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["page_rank", "my_degree_squared"],  # Include your custom feature name
    feature_vector_length=3,  # Applies to built-in features that use it
    my_feature_methods=my_feature_methods,
)

print(features.features_df.head())
```
When you include "my_degree_squared" in the feature_list and provide my_feature_methods, NEExT will automatically register and compute your custom function. If "all" is in feature_list, your custom registered function will also be included in the computation.
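Before registering a custom function, it can help to check its output against the column contract described above ("node_id" first, "graph_id" second, then at least one feature column). The helper below is a minimal sketch of such a check; `check_feature_columns` is a hypothetical name and is not part of NEExT.

```python
def check_feature_columns(columns):
    """Compare column names with the documented layout.

    Returns a list of problems; an empty list means the layout looks fine.
    """
    columns = list(columns)
    problems = []
    if columns[:2] != ["node_id", "graph_id"]:
        problems.append("first two columns must be 'node_id' and 'graph_id', in that order")
    if len(columns) < 3:
        problems.append("at least one feature column is required")
    return problems


print(check_feature_columns(["node_id", "graph_id", "degree_sq_0"]))  # []
print(check_feature_columns(["graph_id", "node_id", "degree_sq_0"]))  # one problem reported
```

For a DataFrame returned by your own function, passing its `columns` attribute to the helper runs the same check.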
📦 Installation
Basic Installation
```bash
pip install NEExT
```
Development Installation
```bash
# Clone the repository
git clone https://github.com/ashdehghan/NEExT.git
cd NEExT

# Install with development dependencies
pip install -e ".[dev]"
```
Additional Components
```bash
# For running tests
pip install -e ".[test]"

# For building documentation
pip install -e ".[docs]"

# For running experiments
pip install -e ".[experiments]"

# Install all components
pip install -e ".[dev,test,docs,experiments]"
```
🚀 Quick Start
Basic Usage
```python
from NEExT import NEExT

# Initialize the framework
nxt = NEExT()
nxt.set_log_level("INFO")

# Load graph data
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_label_path="graph_labels.csv",
    reindex_nodes=True,
    filter_largest_component=True,
    graph_type="igraph",
)

# Compute node features
features = nxt.compute_node_features(
    graph_collection=graph_collection,
    feature_list=["all"],
    feature_vector_length=3,
)

# Compute graph embeddings
embeddings = nxt.compute_graph_embeddings(
    graph_collection=graph_collection,
    features=features,
    embedding_algorithm="approx_wasserstein",
    embedding_dimension=3,
)

# Train a classifier
model_results = nxt.train_ml_model(
    graph_collection=graph_collection,
    embeddings=embeddings,
    model_type="classifier",
    sample_size=50,
)
```
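The loading step above turns on node reindexing and largest-component filtering. As a rough illustration of what those two operations mean, here is a standard-library sketch. It is not NEExT's implementation, the names `largest_component` and `filter_and_reindex` are hypothetical, and it assumes an undirected graph given as an edge list (so isolated nodes are not represented).

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the set of nodes in the largest connected component."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen, best = set(), set()
    for start in list(adjacency):
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_and_reindex(edges):
    """Keep edges inside the largest component, then relabel nodes 0..n-1."""
    keep = largest_component(edges)
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    new_id = {old: new for new, old in enumerate(sorted(keep))}
    return [(new_id[s], new_id[d]) for s, d in kept_edges], new_id


edges = [(10, 20), (20, 30), (30, 10), (40, 50)]
new_edges, mapping = filter_and_reindex(edges)
print(new_edges)  # [(0, 1), (1, 2), (2, 0)]
print(mapping)    # {10: 0, 20: 1, 30: 2}
```

The two-node component (40, 50) is dropped, and the surviving nodes receive contiguous IDs starting at 0.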
Working with Large Graphs
NEExT supports node sampling for handling large graphs:
```python
# Load graphs with 70% of nodes
graph_collection = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    node_sample_rate=0.7,  # Use 70% of nodes
)
```
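The README does not describe how the sampled nodes are chosen. To make the idea concrete, the sketch below shows one plausible reading: uniform random node sampling that keeps only the edges between surviving nodes. The function `sample_nodes` and its `seed` parameter are hypothetical and may differ from what NEExT does internally.

```python
import random


def sample_nodes(nodes, edges, rate, seed=None):
    """Keep a uniform random fraction of nodes and the edges among them."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)
    nodes = list(nodes)
    # Keep at least one node, and never more than are available.
    k = min(len(nodes), max(1, round(rate * len(nodes))))
    kept = set(rng.sample(nodes, k))
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, kept_edges


nodes = range(10)
edges = [(i, i + 1) for i in range(9)]  # a path graph 0-1-...-9
kept, kept_edges = sample_nodes(nodes, edges, rate=0.7, seed=0)
print(len(kept))  # 7
```

Removing nodes can disconnect a graph, which is one reason to combine sampling with largest-component filtering.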
Feature Importance Analysis
```python
# Compute feature importance
importance_df = nxt.compute_feature_importance(
    graph_collection=graph_collection,
    features=features,
    feature_importance_algorithm="supervised_fast",
    embedding_algorithm="approx_wasserstein",
)
```
📊 Experiments
NEExT includes several pre-built experiments in the examples/experiments directory:
Node Sampling Experiment
Investigates the effect of node sampling on classifier accuracy:
```bash
cd examples/experiments
python node_sampling_experiments.py
```
📝 Input File Formats
edges.csv
```csv
src_node_id,dest_node_id
0,1
1,2
...
```
node_graph_mapping.csv
```csv
node_id,graph_id
0,1
1,1
2,2
...
```
graph_labels.csv
```csv
graph_id,graph_label
1,0
2,1
...
```
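To try the toolkit without real data, the three files can be generated with Python's standard `csv` module. The sketch below writes a toy dataset using the column names shown above and runs two consistency checks. The helper names and the checks themselves are illustrative assumptions, not requirements stated by NEExT: every node in an edge should have a graph mapping, and every graph should have a label if you plan to train a model.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def check_inputs(edges_path, mapping_path, labels_path):
    """Return a list of consistency problems across the three files."""
    node_to_graph = {r["node_id"]: r["graph_id"] for r in read_rows(mapping_path)}
    labeled = {r["graph_id"] for r in read_rows(labels_path)}

    problems = []
    for row in read_rows(edges_path):
        for node in (row["src_node_id"], row["dest_node_id"]):
            if node not in node_to_graph:
                problems.append(f"node {node} has no graph mapping")
    for graph_id in sorted(set(node_to_graph.values()) - labeled):
        problems.append(f"graph {graph_id} has no label")
    return problems


# Toy data: graph 1 is a path 0-1-2, graph 2 is a single edge 3-4.
workdir = Path(tempfile.mkdtemp())
write_csv(workdir / "edges.csv", ["src_node_id", "dest_node_id"],
          [(0, 1), (1, 2), (3, 4)])
write_csv(workdir / "node_graph_mapping.csv", ["node_id", "graph_id"],
          [(0, 1), (1, 1), (2, 1), (3, 2), (4, 2)])
write_csv(workdir / "graph_labels.csv", ["graph_id", "graph_label"],
          [(1, 0), (2, 1)])

problems = check_inputs(workdir / "edges.csv",
                        workdir / "node_graph_mapping.csv",
                        workdir / "graph_labels.csv")
print(problems)  # []
```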
🛠️ Development
Running Tests
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=NEExT

# Run specific test file
pytest tests/test_node_sampling.py
```
Building Documentation
```bash
cd docs
make html
```
Code Style
The project uses several tools for code quality:
```bash
# Format code
black .

# Sort imports
isort .

# Check style
flake8 .

# Type checking
mypy .
```
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
👥 Authors
- Ash Dehghan - ash.dehghan@gmail.com
🙏 Acknowledgments
- NetworkX team for the graph algorithms
- iGraph team for the efficient graph operations
- Scikit-learn team for machine learning components
📧 Contact
For questions and support:
- Email: ash@anomalypoint.com
- GitHub Issues: NEExT Issues
🔄 Version History
- 0.1.0
  - Initial release
  - Basic graph operations
  - Node feature computation
  - Graph embeddings
  - Machine learning integration
Owner
- Name: ashdehghan
- Login: ashdehghan
- Kind: organization
- Repositories: 1
- Profile: https://github.com/ashdehghan
GitHub Events
Total
- Create event: 7
- Issues event: 1
- Release event: 1
- Watch event: 1
- Delete event: 1
- Push event: 65
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 8
- Fork event: 1
Last Year
- Create event: 7
- Issues event: 1
- Release event: 1
- Watch event: 1
- Delete event: 1
- Push event: 65
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 8
- Fork event: 1
Packages
- Total packages: 1
- Total downloads: pypi 35 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 11
- Total maintainers: 3
pypi.org: neext
Network Embedding Experimentation Toolkit - A powerful framework for graph analysis, embedding computation, and machine learning on graph-structured data
- Homepage: https://github.com/ashdehghan/NEExT
- Documentation: https://neext.readthedocs.io
- License: MIT
- Latest release: 0.2.10, published 12 months ago
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
- arrow ==1.2.3
- igraph ==0.11.3
- jupyter ==1.0.0
- karateclub ==1.2.2
- matplotlib ==3.7.2
- networkx ==2.8.8
- node2vec ==0.4.6
- numpy ==1.25.2
- pandas ==2.0.3
- plotly ==5.18.0
- scikit-learn ==1.3.0
- scipy ==1.11.2
- tqdm ==4.65.0
- umap-learn ==0.5.4
- vectorizers ==0.2
- xgboost ==2.0.2