smn

Deep Overlapping Community Search via Subspace Embeddings

https://github.com/simonqs/smn


Repository

Basic Info

Host: GitHub
Owner: SimonQS
License: MIT
Language: Python
Default Branch: main
Size: 14 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed 10 months ago

Metadata Files

Readme, License, Citation

README.md

This repository contains the implementation for Facebook, MAG, the citation networks (Cora, Citeseer, and Pubmed), and Reddit.

Dependencies

Our implementation works with PyTorch>=1.0.0. Install the other dependencies:

$ pip install -r requirement.txt

Data

We provide the citation network datasets under data/, which correspond to the public data splits. Due to space limits, please download the Reddit dataset from FastGCN and put reddit_adj.npz and reddit.npz under data/.

Usage

```
# Reproduce all results in the main experiment table
$ ./exp.sh

# Training with default hyperparameters (e.g. OCS on Facebook)
$ python main.py --dataset facebook

# Training with default hyperparameters (e.g. OCS on MAG: Computer Science)
$ python main.py --dataset mag_cs

# Training with default hyperparameters (e.g. disjoint datasets, Cora)
$ python citation.py --dataset cora

# Training with default hyperparameters (e.g. disjoint datasets, Reddit)
$ python reddit.py --dataset reddit
```

OCS vs OCIS

--case 1: OCS

--case 2: OCIS

Model and Training Parameters

--no-cuda: Disables CUDA, using CPU instead (default: False).

--seed: Sets a random seed for reproducibility.

--epochs: Specifies the number of training epochs.

--heads: Sets the number of attention heads for multi-head models.

--lr: Initial learning rate.

--weight_decay: Weight decay for L2 regularization.
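
The script further down calls set_seed(args.seed, args.cuda), which suggests that the --no-cuda flag is converted into a derived cuda attribute after parsing. The following is a minimal sketch of that pattern, not the repository's args.py: the function name parse_training_args, the cuda_available parameter, and every default value are assumptions made for illustration.

```python
import argparse


def parse_training_args(argv=None, cuda_available=False):
    """Hypothetical parser for the training flags listed above.

    All defaults are placeholders, not the repository's actual values.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-cuda", action="store_true", default=False,
                        help="Disable CUDA and run on the CPU.")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--heads", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--weight_decay", type=float, default=5e-4)
    args = parser.parse_args(argv)
    # Derived flag: use the GPU only if one exists and it was not disabled.
    args.cuda = (not args.no_cuda) and cuda_available
    return args


args = parse_training_args(["--no-cuda", "--lr", "0.05"], cuda_available=True)
print(args.cuda, args.lr, args.epochs)
```

In a real script, cuda_available would typically come from torch.cuda.is_available(); it is passed in here so the sketch runs without PyTorch.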

Model Design Choices

--hidden: Number of hidden units in the model.

--ssf: Method to apply sparsity, options include hard, soft, or none.

--loss: Loss function to use, with several options including focal and spatial losses.

--ssf_dim: Dimensionality of sparse subspace filters.

--sp_rate: Sparsity rate for sparse subspace filters.

--lammda: Penalty coefficient to control the regularization term.

--gamma, --alpha, --gamma_neg, --gamma_pos: Parameters controlling the loss function's focus and balance aspects.
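
To make the roles of the focusing and balance parameters concrete, here is a minimal sketch of the standard binary focal loss for a single prediction, using gamma as the focusing parameter and alpha as the balance parameter. It is an illustration under stated assumptions, not the repository's loss code: the function name focal_loss_binary is hypothetical, and the defaults 2.0 and 0.25 are taken from the values that appear in citation.py.

```python
import math


def focal_loss_binary(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction; target must be 0 or 1.

    Sketch only: no protection against overflow for extreme logits.
    """
    p = 1.0 / (1.0 + math.exp(-logit))            # predicted probability of class 1
    p_t = p if target == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss_binary(4.0, 1)    # confident and correct
hard = focal_loss_binary(-4.0, 1)   # confident and wrong
print(easy < hard)
```

With gamma set to 0 the focusing factor becomes 1 and the expression reduces to alpha-weighted binary cross-entropy, which is one way to sanity-check an implementation.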

Community and Feature Specifications

--comm_size: Sets the community size for the search.

--cs: Community Search (CS) algorithm to apply; options include sub_cs and sub_topk.

--dropout: Dropout rate to prevent overfitting.

--dataset: Dataset to use for training/testing (default: mag_cs).

--model: Model architecture to use, including options such as SGC, GCN, and SMN.

--feature: Feature type for input processing, with options like mul, cat, and adj.
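
The interaction between --comm_size and a top-k style search can be illustrated with a small sketch. It assumes that such a search ranks nodes by their distance to the query node inside a subspace selected by a per-dimension filter, then keeps the comm_size nearest nodes. This is an assumption for illustration only: the repository's sub_topk is not shown here and may work differently, and the name topk_community and its signature are hypothetical.

```python
import heapq


def topk_community(embeddings, query, subspace_filter, community_size):
    """Return the `community_size` nodes closest to `query` in the filtered subspace.

    embeddings: list of equal-length float lists, one per node.
    subspace_filter: per-dimension weights (0 drops a dimension entirely).
    """
    q = embeddings[query]

    def filtered_distance(node):
        # Squared Euclidean distance, weighted per dimension by the filter.
        return sum(w * (a - b) ** 2
                   for w, a, b in zip(subspace_filter, embeddings[node], q))

    return heapq.nsmallest(community_size, range(len(embeddings)),
                           key=filtered_distance)


emb = [[0.0, 0.0, 5.0], [0.1, 0.0, -3.0], [2.0, 2.0, 5.0], [0.0, 0.2, 9.0]]
in_subspace = topk_community(emb, 0, [1, 1, 0], 3)   # ignore the third dimension
full_space = topk_community(emb, 0, [1, 1, 1], 3)    # use every dimension
print(in_subspace, full_space)
```

The two calls return different communities for the same query, which shows why the choice of subspace matters: node 1 is close to the query only when the third dimension is filtered out.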

Scenarios and Problem Settings

--case: Specifies the task scenario (1 for OCS, 2 for OCIS).

--normalization: Normalization strategy for the adjacency matrix, offering methods like NormLap, Lap, and AugNormAdj.

--hop: Degree of approximation, indicating k-hop adjacency.

--fb_num: Facebook dataset option, specifying a subset (0, 107, 348, 414, or 686).
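
As a rough illustration of how --normalization and --hop fit together, the sketch below builds the augmented normalized adjacency commonly denoted AugNormAdj, that is D^{-1/2} (A + I) D^{-1/2} with D the degree matrix of A + I, and applies it hop times to a feature matrix. It is a dense NumPy sketch under the assumption that the repository follows this common definition; the function names are hypothetical and the actual code operates on sparse PyTorch tensors.

```python
import numpy as np


def aug_normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    adj = np.asarray(adj, dtype=float)
    adj_self = adj + np.eye(adj.shape[0])          # add self-loops
    deg = adj_self.sum(axis=1)                     # >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def k_hop_precompute(features, norm_adj, hop):
    """Propagate features `hop` times, i.e. compute S^hop X."""
    out = np.asarray(features, dtype=float)
    for _ in range(hop):
        out = norm_adj @ out
    return out


# Path graph 0 - 1 - 2 with two-dimensional node features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = aug_normalized_adjacency(A)
X2 = k_hop_precompute(X, S, hop=2)
print(S.round(3))
print(X2.round(3))
```

Because the propagation has no trainable parameters, it can be done once before training, which matches the pre-compute time that citation.py reports separately from the training time.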

Owner

Login: SimonQS
Kind: user

Repositories: 1
Profile: https://github.com/SimonQS

Citation (citation.py)

import time
import random, math
import argparse
import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision.ops import sigmoid_focal_loss
from sklearn.metrics import f1_score
from utils import load_citation, sgc_precompute, set_seed, smn_precompute, centroid_distance, sub_cs, load_facebook, sub_topk
from models import get_model
from metrics import accuracy, f1, cs_accuracy, cs_eval_metrics
import pickle as pkl
import networkx as nx
from args import get_citation_args
from time import perf_counter
import csv

# Arguments
args = get_citation_args()

# setting random seeds
set_seed(args.seed, args.cuda)
    
# Load the graph and build the model on the device selected via --no-cuda.
adj, features, labels, idx_train, idx_val, idx_test = load_citation(args.dataset,
                                                                args.normalization,
                                                                cuda=args.cuda)
print(labels.shape, idx_train.shape, idx_val.shape, idx_test.shape)
model = get_model(args.model, features.size(1), labels.max().item()+1,
                  args.hidden, args.ssf_dim, args.sp_rate,
                  args.hop, args.heads, args.negative_slope,
                  args.dropout, args.dataset, cuda=args.cuda)

# Pre-compute the propagated features once, before training.
if args.model == "SMN":
    features_channel, precompute_time = smn_precompute(features, adj, args.hop)
elif args.model == "GCN":
    features_channel, precompute_time = smn_precompute(features, adj, 0)
elif args.model == "SGC":
    features_channel, precompute_time = sgc_precompute(features, adj, args.hop)
else:
    raise ValueError("Unsupported model: {}".format(args.model))

print("{:.4f}s".format(precompute_time))

def train_regression(model,
                     train_features, train_labels,
                     val_features, val_labels,
                     epochs=args.epochs, weight_decay=args.weight_decay,
                     lr=args.lr, dropout=args.dropout):
    set_seed(args.seed, args.cuda)

    optimizer = optim.Adam(model.parameters(), lr=lr,
                           weight_decay=weight_decay)
    t = perf_counter()
    gamma = 2.0  # Focusing parameter
    alpha = 0.25  # Balance parameter
    gamma_neg=0 # 4
    gamma_pos=4 # 0
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        output, low_passing_filter, emb, spatial_loss, sigma = model(train_features)
        class_loss = F.cross_entropy(output, train_labels)

        spatial_loss = F.cross_entropy(spatial_loss, train_labels)
        if args.ssf:
            total_loss = 0.5 * class_loss / sigma[0]**2 + 0.5 * spatial_loss / sigma[1]**2 + model.l1_penalty()
        else:
            total_loss = class_loss
        total_loss.backward()
        optimizer.step()
        acc_train = accuracy(output, train_labels)
    train_time = perf_counter()-t

    with torch.no_grad():
        model.eval()
        output, low_passing_filter, emb, spatial_loss, sigma = model(val_features)
        acc_val = accuracy(output, val_labels)

    return model, acc_train, acc_val, train_time, low_passing_filter.transpose(0, 1)

def test_regression(model, test_features, test_labels):
    set_seed(args.seed, args.cuda)
    model.eval()
    # One forward pass without gradients; the model returns
    # (output, low-pass filter, embedding, spatial logits, sigma).
    with torch.no_grad():
        output, low_passing_filter, emb, _, sigma = model(test_features)
    return (accuracy(output, test_labels),
            output,
            low_passing_filter.transpose(0, 1),
            emb,
            sigma)

set_seed(args.seed, args.cuda)
if args.model == "SMN" or args.model == "SGC":
    (model, acc_train, acc_val, 
     train_time, low_passing_filter) = train_regression(model, features_channel[idx_train], 
                                                        labels[idx_train], features_channel[idx_val], 
                                                        labels[idx_val], args.epochs, 
                                                        args.weight_decay, args.lr, args.dropout)
    (acc_test, model_output, 
     low_passing_filter_test, emb, sigma) = test_regression(model, 
                                                            features_channel, 
                                                            labels)
    print(low_passing_filter[0])


    #----------------------------- Community Search -----------------------------#
    sample_weight = [1/adj[i]._nnz() for i in range(adj.shape[0])]
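    # Sample 50 distinct query nodes uniformly at random from the test split.
    perm = torch.randperm(idx_test.numel(), device=idx_test.device)[:50]
    query_nodes = idx_test[perm].tolist()

    # The predicted class of each query node selects its subspace filter.
    query_labels = torch.argmax(model_output[query_nodes], dim=1).cpu()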

    if args.cs == 'sub_cs':
        communities, cs_time = sub_cs(adj, emb, query_nodes, 
                                      low_passing_filter_test[query_labels], 
                                      community_size = args.comm_size, 
                                      early_stop = 2, lp_filter = args.ssf)
    if args.cs == 'sub_topk':
        communities, cs_time = sub_topk(adj, emb, query_nodes, 
                                        low_passing_filter_test[query_labels], 
                                        community_size = args.comm_size, 
                                        early_stop = 2, lp_filter = args.ssf)
    cs_acc, cs_f1 = cs_eval_metrics(communities, query_nodes, labels)
    #----------------------------- Community Search -----------------------------#


print("Pre-compute time: {:.4f}s, train time: {:.4f}s, total: {:.4f}s, CS Time: {:.4f}s".format(precompute_time, 
                                                                                                train_time, 
                                                                                                precompute_time+train_time, 
                                                                                                cs_time))
print("Training Time: {}".format(round(precompute_time+train_time, 4)))
print("Query Time: {}".format(round(cs_time, 6)))
print('Test_acc/f1: {}'.format(torch.round(acc_test, decimals = 4)))
print("Accuracy: {}".format(round(cs_acc, 4)))
print("F1 Score: {}".format(round(cs_f1, 4)))
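
The total loss in train_regression divides each task loss by a squared sigma value returned by the model. The sketch below isolates that expression with plain floats so the effect of each sigma is easy to see. The function name weighted_total_loss and the example numbers are illustrative assumptions; only the formula itself is taken from the script.

```python
import math


def weighted_total_loss(class_loss, spatial_loss, sigma, l1_penalty=0.0):
    """Combine two losses as 0.5 * L / sigma**2 each, plus a penalty term."""
    return (0.5 * class_loss / sigma[0] ** 2
            + 0.5 * spatial_loss / sigma[1] ** 2
            + l1_penalty)


# Illustrative values only, not results from the repository.
total = weighted_total_loss(class_loss=0.8, spatial_loss=0.4,
                            sigma=(1.0, 2.0), l1_penalty=0.05)
print(total)
```

A larger sigma down-weights its loss term, so a learned sigma lets the model trade one objective against the other. Formulations of this kind are often paired with a term that penalizes large sigma values; the expression in the script shows no such term, and the sketch does not add one.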

Science Score: 31.0%
