lgi-ls

[NeurIPS 2023] Latent Graph Inference with Limited Supervision

https://github.com/jianglin954/lgi-ls

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary
Last synced: 10 months ago · JSON representation ·

Repository

[NeurIPS 2023] Latent Graph Inference with Limited Supervision

Basic Info
Statistics
  • Stars: 13
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

LGI-LS (NeurIPS 2023)

Codes for the NeurIPS 2023 paper Latent Graph Inference with Limited Supervision.

Project page

Datasets

The Cora, Citeseer, and Pubmed datasets can be download from here. Please place the downloaded files in the folder data_tf. The ogbn-arxiv dataset will be loaded automatically.

Installation

bash conda create -n LGI python=3.7.2 conda activate LGI pip install torch==1.5.1 torchvision==0.6.1 pip install scipy==1.2.1 pip install scikit-learn==0.21.3 pip install dgl-cu102==0.5.2 pip install ogb==1.2.3 wget https://data.pyg.org/whl/torch-1.5.0%2Bcu102/torch_scatter-2.0.5-cp37-cp37m-linux_x86_64.whl wget https://data.pyg.org/whl/torch-1.5.0%2Bcu102/torch_sparse-0.6.5-cp37-cp37m-linux_x86_64.whl wget https://data.pyg.org/whl/torch-1.5.0%2Bcu102/torch_cluster-1.5.4-cp37-cp37m-linux_x86_64.whl wget https://data.pyg.org/whl/torch-1.5.0%2Bcu102/torch_spline_conv-1.2.0-cp37-cp37m-linux_x86_64.whl pip install torch_scatter-2.0.5-cp37-cp37m-linux_x86_64.whl pip install torch_sparse-0.6.5-cp37-cp37m-linux_x86_64.whl pip install torch_cluster-1.5.4-cp37-cp37m-linux_x86_64.whl pip install torch_spline_conv-1.2.0-cp37-cp37m-linux_x86_64.whl pip install torch-geometric==1.6.1

Usage

We provide GCN+KNN, GCN+KNN_U, and GCN+KNN_R as examples due to their simplicity and effectiveness. To test their performances on the Pubmed dataset, run the following command:

bash bash experiments.sh

The experimental results will be saved in the corresponding *.txt file.

Reference

@inproceedings{Jianglin2023LGI,
  title={Latent Graph Inference with Limited Supervision},
  author={Lu, Jianglin and Xu, Yi and Wang, Huan and Bai, Yue and Fu, Yun},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023}
}

@inproceedings{fatemi2021slaps,
  title={SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks},
  author={Fatemi, Bahare and Asri, Layla El and Kazemi, Seyed Mehran},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021}
}

Acknowledgement

Our codes are mainly based on SLAPS. For other comparison methods, please refer to their publicly available code repositories. We gratefully thank the authors for their contributions.

Owner

  • Name: Jianglin Lu
  • Login: Jianglin954
  • Kind: user
  • Location: Boston
  • Company: Northeastern University

I am a first-year Ph.D. student at Northeastern University, USA. My research interests mainly include computer vision, machine learning, and data mining.

Citation (citation_networks.py)

# The MIT License

# Copyright (c) 2016 Thomas Kipf

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

import pickle as pkl
import sys
import warnings

import numpy as np
import scipy.sparse as sp
import torch

warnings.simplefilter("ignore")


def parse_index_file(filename):
    """Parse index file."""
    index = []
    for line in open(filename):
        index.append(int(line.strip()))
    return index


def sample_mask(idx, l):
    """Create mask."""
    mask = np.zeros(l)
    mask[idx] = 1
    return np.array(mask, dtype=np.bool)


def load_citation_network(dataset_str):
    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
    objects = []
    for i in range(len(names)):
        with open("data_tf/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))
    x, y, tx, ty, allx, ally, graph = tuple(objects)

    test_idx_reorder = parse_index_file("data_tf/ind.{}.test.index".format(dataset_str))
    test_idx_range = np.sort(test_idx_reorder)

    if dataset_str == 'citeseer':
        test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder) + 1)
        tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
        tx_extended[test_idx_range - min(test_idx_range), :] = tx
        tx = tx_extended
        ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
        ty_extended[test_idx_range - min(test_idx_range), :] = ty
        ty = ty_extended


    features = sp.vstack((allx, tx)).tolil()
    features[test_idx_reorder, :] = features[test_idx_range, :]

    labels = np.vstack((ally, ty))
    labels[test_idx_reorder, :] = labels[test_idx_range, :]
    idx_test = test_idx_range.tolist()
    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + 500)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    features = torch.FloatTensor(features.todense())
    labels = torch.LongTensor(labels)
    train_mask = torch.BoolTensor(train_mask)
    val_mask = torch.BoolTensor(val_mask)
    test_mask = torch.BoolTensor(test_mask)

    nfeats = features.shape[1]
    for i in range(labels.shape[0]):
        sum_ = torch.sum(labels[i])
        if sum_ != 1:
            labels[i] = torch.tensor([1, 0, 0, 0, 0, 0])
    labels = (labels == 1).nonzero()[:, 1]
    nclasses = torch.max(labels).item() + 1

    return features, nfeats, labels, nclasses, train_mask, val_mask, test_mask


def load_citation_network_halftrain(dataset_str):
    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
    objects = []
    for i in range(len(names)):
        with open("data_tf/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))
    x, y, tx, ty, allx, ally, graph = tuple(objects)
    test_idx_reorder = parse_index_file("data_tf/ind.{}.test.index".format(dataset_str))
    test_idx_range = np.sort(test_idx_reorder)

    if dataset_str == 'citeseer':
        test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder) + 1)
        tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
        tx_extended[test_idx_range - min(test_idx_range), :] = tx
        tx = tx_extended
        ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
        ty_extended[test_idx_range - min(test_idx_range), :] = ty
        ty = ty_extended

    features = sp.vstack((allx, tx)).tolil()
    features[test_idx_reorder, :] = features[test_idx_range, :]

    labels = np.vstack((ally, ty))
    labels[test_idx_reorder, :] = labels[test_idx_range, :]
    idx_test = test_idx_range.tolist()
    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + 500)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    features = torch.FloatTensor(features.todense())
    labels = torch.LongTensor(labels)
    train_mask = torch.BoolTensor(train_mask)
    val_mask = torch.BoolTensor(val_mask)
    test_mask = torch.BoolTensor(test_mask)

    nfeats = features.shape[1]
    for i in range(labels.shape[0]):
        sum_ = torch.sum(labels[i])
        if sum_ != 1:
            labels[i] = torch.tensor([1, 0, 0, 0, 0, 0])
    labels = (labels == 1).nonzero()[:, 1]
    nclasses = torch.max(labels).item() + 1

    if dataset_str == 'pubmed':
        colum_sum = torch.zeros((1, 3))
    elif dataset_str == 'cora':
        colum_sum = torch.zeros((1, 7))
    elif dataset_str == 'citeseer':
        colum_sum = torch.zeros((1, 6))

    for iii in range(y.shape[0]):
        colum_sum = colum_sum + y[iii, :]
        if colum_sum.max() > 10:
            colum_sum = colum_sum - y[iii, :]
            train_mask[iii] = 0

    return features, nfeats, labels, nclasses, train_mask, val_mask, test_mask





def load_citation_network_calculate_starved_nodes(dataset_str):
    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
    objects = []
    for i in range(len(names)):
        with open("data_tf/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))
    x, y, tx, ty, allx, ally, graph = tuple(objects)

    num_vertices = len(graph)

    adjacency = [[0 for j in range(num_vertices)] for i in range(num_vertices)]

    for i in range(num_vertices):
        for j in graph[i]:
            adjacency[i][j] = 1

    adjacency = np.array(adjacency)



    test_idx_reorder = parse_index_file("data_tf/ind.{}.test.index".format(dataset_str))
    test_idx_range = np.sort(test_idx_reorder)

    if dataset_str == 'citeseer':
        # Fix citeseer dataset (there are some isolated nodes in the graph)
        # Find isolated nodes, add them as zero-vecs into the right position
        test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder) + 1)
        tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
        tx_extended[test_idx_range - min(test_idx_range), :] = tx
        tx = tx_extended
        ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
        ty_extended[test_idx_range - min(test_idx_range), :] = ty
        ty = ty_extended


    features = sp.vstack((allx, tx)).tolil()
    features[test_idx_reorder, :] = features[test_idx_range, :]

    labels = np.vstack((ally, ty))
    labels[test_idx_reorder, :] = labels[test_idx_range, :]
    idx_test = test_idx_range.tolist()
    idx_train = range(len(y))
    idx_val = range(len(y), len(y) + 500)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    features = torch.FloatTensor(features.todense())
    labels = torch.LongTensor(labels)
    train_mask = torch.BoolTensor(train_mask)
    val_mask = torch.BoolTensor(val_mask)
    test_mask = torch.BoolTensor(test_mask)

    nfeats = features.shape[1]
    for i in range(labels.shape[0]):  
        sum_ = torch.sum(labels[i])
        if sum_ != 1:
            labels[i] = torch.tensor([1, 0, 0, 0, 0, 0])
    labels = (labels == 1).nonzero()[:, 1]
    nclasses = torch.max(labels).item() + 1

    return features, nfeats, labels, nclasses, train_mask, val_mask, test_mask, adjacency

GitHub Events

Total
Last Year