Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.1%) to scientific vocabulary
Repository
Deep Overlapping Community Search via Subspace Embeddings
Basic Info
- Host: GitHub
- Owner: SimonQS
- License: mit
- Language: Python
- Default Branch: main
- Size: 14 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
This home repo contains the implementation for Facebook, MAG, citation networks (Cora, Citeseer, and Pubmed), and Reddit.
Dependencies
Our implementation works with PyTorch>=1.0.0 Install other dependencies: $ pip install -r requirement.txt
Data
We provide the citation network datasets under data/, which corresponds to the public data splits.
Due to space limit, please download reddit dataset from FastGCN and put reddit_adj.npz, reddit.npz under data/.
Usage
```
Reproduce all results in the main experiment table
./exp.sh
training with default hyperparameters (e.g. OCS on FACEBOOK)
$ python main.py --dataset facebook
training with default hyperparameters (e.g. OCS on MAG: Computer Science)
$ python main.py --dataset mag_cs
training with default hyperparameters (e.g. Disjoint datasets, CORA)
$ python citation.py --dataset cora
training with default hyperparameters (e.g. Disjoint datasets, Reddit)
$ python reddit.py --dataset reddit ```
OCS vs OCIS
'--case 1' = OCS,
'--case 2' = OCIS
Model and Training Parameters
--no-cuda: Disables CUDA, using CPU instead (default: False).
--seed: Sets a random seed for reproducibility.
--epochs: Specifies the number of training epochs.
--heads: Sets the number of attention heads for multi-head models.
--lr: Initial learning rate.
--weight_decay: Weight decay for L2 regularization.
Model Design Choices
--hidden: Number of hidden units in the model.
--ssf: Method to apply sparsity, options include hard, soft, or none.
--loss: Loss function to use, with several options including focal and spatial losses.
--ssf_dim: Dimensionality of sparse subspace filters.
--sp_rate: Sparsity rate for sparse subspace filters.
--lammda: Penalty coefficient to control the regularization term.
--gamma, --alpha, --gammaneg, --gammapos: Parameters controlling the loss function's focus and balance aspects.
Community and Feature Specifications
--comm_size: Sets the community size for the search.
--cs: Community Search (CS) algorithm to apply; options include subcs and subtopk.
--dropout: Dropout rate to prevent overfitting.
--dataset: Dataset to use for training/testing (default: mag_cs).
--model: Model architecture to use, including options such as SGC, GCN, and SMN.
--feature: Feature type for input processing, with options like mul, cat, and adj.
Scenarios and Problem Settings
--case: Specifies the task scenario (1 for OCS, 2 for OCIS).
--normalization: Normalization strategy for the adjacency matrix, offering methods like NormLap, Lap, and AugNormAdj.
--hop: Degree of approximation, indicating k-hop adjacency.
--fb_num: Facebook dataset option, specifying a subset (0, 107, 348, 414, or 686).
Owner
- Login: SimonQS
- Kind: user
- Repositories: 1
- Profile: https://github.com/SimonQS
Citation (citation.py)
import time
import random, math
import argparse
import numpy as np
import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision.ops import sigmoid_focal_loss
from sklearn.metrics import f1_score
from utils import load_citation, sgc_precompute, set_seed, smn_precompute, centroid_distance, sub_cs, load_facebook, sub_topk
from models import get_model
from metrics import accuracy, f1, cs_accuracy, cs_eval_metrics
import pickle as pkl
import networkx as nx
from args import get_citation_args
from time import perf_counter
import csv
# Arguments
args = get_citation_args()
# setting random seeds
set_seed(args.seed, args.cuda)
adj, features, labels, idx_train, idx_val, idx_test = load_citation(args.dataset,
args.normalization,
cuda = True)
print(labels.shape, idx_train.shape, idx_val.shape, idx_test.shape)
model = get_model(args.model, features.size(1), labels.max().item()+1,
args.hidden, args.ssf_dim, args.sp_rate,
args.hop, args.heads, args.negative_slope,
args.dropout, args.dataset, cuda = True)
if args.model == "SMN": features_channel, precompute_time = smn_precompute(features, adj, args.hop)
elif args.model == "GCN": features_channel, precompute_time = smn_precompute(features, adj, 0)
elif args.model == "SGC": features_channel, precompute_time = sgc_precompute(features, adj, args.hop)
print("{:.4f}s".format(precompute_time))
def train_regression(model,
train_features, train_labels,
val_features, val_labels,
epochs=args.epochs, weight_decay=args.weight_decay,
lr=args.lr, dropout=args.dropout):
set_seed(args.seed, args.cuda)
optimizer = optim.Adam(model.parameters(), lr=lr,
weight_decay=weight_decay)
t = perf_counter()
gamma = 2.0 # Focusing parameter
alpha = 0.25 # Balance parameter
gamma_neg=0 # 4
gamma_pos=4 # 0
for epoch in range(epochs):
model.train()
optimizer.zero_grad()
output, low_passing_filter, emb, spatial_loss, sigma = model(train_features)
class_loss = F.cross_entropy(output, train_labels)
spatial_loss = F.cross_entropy(spatial_loss, train_labels)
if args.ssf:
total_loss = 0.5 * class_loss / sigma[0]**2 + 0.5 * spatial_loss / sigma[1]**2 + model.l1_penalty()
else:
total_loss = class_loss
total_loss.backward()
optimizer.step()
acc_train = accuracy(output, train_labels)
train_time = perf_counter()-t
with torch.no_grad():
model.eval()
output, low_passing_filter, emb, spatial_loss, sigma = model(val_features)
acc_val = accuracy(output, val_labels)
return model, acc_train, acc_val, train_time, low_passing_filter.transpose(0, 1)
def test_regression(model, test_features, test_labels):
set_seed(args.seed, args.cuda)
model.eval()
return (accuracy(model(test_features)[0], test_labels),
model(test_features)[0],
model(test_features)[1].transpose(0, 1),
model(test_features)[2],
model(test_features)[-1])
set_seed(args.seed, args.cuda)
if args.model == "SMN" or args.model == "SGC":
(model, acc_train, acc_val,
train_time, low_passing_filter) = train_regression(model, features_channel[idx_train],
labels[idx_train], features_channel[idx_val],
labels[idx_val], args.epochs,
args.weight_decay, args.lr, args.dropout)
(acc_test, model_output,
low_passing_filter_test, emb, sigma) = test_regression(model,
features_channel,
labels)
print(low_passing_filter[0])
#----------------------------- Community Search -----------------------------#
sample_weight = [1/adj[i]._nnz() for i in range(adj.shape[0])]
query_nodes = torch.multinomial(idx_test.type(torch.FloatTensor).cuda(),
50, replacement = False).tolist()
query_labels = torch.LongTensor(torch.argmax(model_output[torch.LongTensor(query_nodes)], dim = 1).cpu())
if args.cs == 'sub_cs':
communities, cs_time = sub_cs(adj, emb, query_nodes,
low_passing_filter_test[query_labels],
community_size = args.comm_size,
early_stop = 2, lp_filter = args.ssf)
if args.cs == 'sub_topk':
communities, cs_time = sub_topk(adj, emb, query_nodes,
low_passing_filter_test[query_labels],
community_size = args.comm_size,
early_stop = 2, lp_filter = args.ssf)
cs_acc, cs_f1 = cs_eval_metrics(communities, query_nodes, labels)
#----------------------------- Community Search -----------------------------#
print("Pre-compute time: {:.4f}s, train time: {:.4f}s, total: {:.4f}s, CS Time: {:.4f}s".format(precompute_time,
train_time,
precompute_time+train_time,
cs_time))
print("Training Time: {}".format(round(precompute_time+train_time, 4)))
print("Query Time: {}".format(round(cs_time, 6)))
print('Test_acc/f1: {}'.format(torch.round(acc_test, decimals = 4)))
print("Accuracy: {}".format(round(cs_acc, 4)))
print("F1 Score: {}".format(round(cs_f1, 4)))
GitHub Events
Total
- Public event: 1
- Push event: 6
Last Year
- Public event: 1
- Push event: 6