data

Data repository for PyGOD

https://github.com/pygod-team/data

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pygod-team
  • License: MIT
  • Default Branch: main
  • Size: 87.2 MB
Statistics
  • Stars: 41
  • Watchers: 2
  • Forks: 3
  • Open Issues: 1
  • Releases: 0
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Data Repository for PyGOD

Statistics of the available datasets are listed below. #Con. is the number of contextual outliers and #Strct. is the number of structural outliers. #Outliers can be slightly less than #Con. + #Strct. because some nodes are both contextual and structural outliers.

| Dataset | Type | #Nodes | #Edges | #Feat | Avg. Degree | #Con. | #Strct. | #Outliers | Outlier Ratio |
| ------------ | --------- | ------ | ------- | ------ | ----------- | ----- | ------- | --------- | ------------- |
| 'weibo' | organic | 8,405 | 407,963 | 400 | 48.5 | - | - | 868 | 10.3% |
| 'reddit' | organic | 10,984 | 168,016 | 64 | 15.3 | - | - | 366 | 3.3% |
| 'disney' | organic | 124 | 335 | 28 | 2.7 | - | - | 6 | 4.8% |
| 'books' | organic | 1,418 | 3,695 | 21 | 2.6 | - | - | 28 | 2.0% |
| 'enron' | organic | 13,533 | 176,987 | 18 | 13.1 | - | - | 5 | 0.04% |
| 'inj_cora' | injected | 2,708 | 11,060 | 1,433 | 4.1 | 70 | 70 | 138 | 5.1% |
| 'inj_amazon' | injected | 13,752 | 515,042 | 767 | 37.2 | 350 | 350 | 694 | 5.0% |
| 'inj_flickr' | injected | 89,250 | 933,804 | 500 | 10.5 | 2,240 | 2,240 | 4,414 | 4.9% |
| 'gen_time' | generated | 1,000 | 5,746 | 64 | 5.7 | 100 | 100 | 189 | 18.9% |
| 'gen_100' | generated | 100 | 618 | 64 | 6.2 | 10 | 10 | 18 | 18.0% |
| 'gen_500' | generated | 500 | 2,662 | 64 | 5.3 | 10 | 10 | 20 | 4.0% |
| 'gen_1000' | generated | 1,000 | 4,936 | 64 | 4.9 | 10 | 10 | 20 | 2.0% |
| 'gen_5000' | generated | 5,000 | 24,938 | 64 | 5.0 | 10 | 10 | 20 | 0.4% |
| 'gen_10000' | generated | 10,000 | 49,614 | 64 | 5.0 | 10 | 10 | 20 | 0.2% |
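The relationship between the outlier columns can be checked with a little arithmetic. The sketch below is only illustrative: the helper names `overlap` and `outlier_ratio` and the `STATS` dictionary are assumptions made for this example and are not part of PyGOD. It takes a few rows from the table and computes how many nodes fall into both outlier types (#Con. + #Strct. − #Outliers) and the outlier ratio (#Outliers / #Nodes).

```python
# (#Con., #Strct., #Outliers, #Nodes) copied from a few rows of the table.
# The dictionary keys are illustrative labels only.
STATS = {
    "inj_cora": (70, 70, 138, 2708),
    "inj_amazon": (350, 350, 694, 13752),
    "inj_flickr": (2240, 2240, 4414, 89250),
    "gen_time": (100, 100, 189, 1000),
}


def overlap(n_con, n_strct, n_outliers):
    """Number of nodes counted as both contextual and structural outliers."""
    return n_con + n_strct - n_outliers


def outlier_ratio(n_outliers, n_nodes):
    """Outlier ratio as a percentage, rounded to one decimal place."""
    return round(100 * n_outliers / n_nodes, 1)


for name, (n_con, n_strct, n_out, n_nodes) in STATS.items():
    print(name, overlap(n_con, n_strct, n_out), outlier_ratio(n_out, n_nodes))
```

For the first of these rows, 70 + 70 − 138 = 2, so two nodes are both contextual and structural outliers, and 138 / 2,708 rounds to the 5.1% shown in the table.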

To use the datasets:

```python
from pygod.utils import load_data

data = load_data('weibo')  # in PyG format
```

Alternative download source on Baidu Disk (Chinese): https://pan.baidu.com/s/1afEZaygCRUYWJPtVbzuRYw (access code: bond)

For injected and generated datasets, the label values have the following meanings:

  • 0: inlier
  • 1: contextual outlier only
  • 2: structural outlier only
  • 3: both contextual outlier and structural outlier

The labels can be converted as follows:

```python
y = data.y.bool()        # binary labels (inlier/outlier)
yc = (data.y >> 0) & 1   # contextual outliers
ys = (data.y >> 1) & 1   # structural outliers
```
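The label encoding is effectively a two-bit field: bit 0 flags a contextual outlier and bit 1 flags a structural outlier. As a minimal sketch in plain Python (the function name `decode_label` is hypothetical and not part of PyGOD), the same shifts and masks can be applied to a single integer label:

```python
def decode_label(label):
    """Split a label in {0, 1, 2, 3} into three booleans:
    (is_outlier, is_contextual, is_structural)."""
    if label not in (0, 1, 2, 3):
        raise ValueError(f"unexpected label: {label!r}")
    is_contextual = bool((label >> 0) & 1)  # bit 0: contextual outlier
    is_structural = bool((label >> 1) & 1)  # bit 1: structural outlier
    is_outlier = label != 0                 # any non-zero label is an outlier
    return is_outlier, is_contextual, is_structural


for label in range(4):
    print(label, decode_label(label))
```

A label of 3 sets both bits, which matches the "both contextual outlier and structural outlier" case in the list above.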

Owner

  • Name: PyGOD Team
  • Login: pygod-team
  • Kind: organization
  • Email: dev@pygod.org

Maintaining A Python Library for Graph Outlier Detection (Anomaly Detection)

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this library, please cite it as below.
title: PyGOD
authors:
  - family-names: PyGOD Team
url: https://pygod.org
preferred-citation:
  type: conference-paper
  title: "BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs"
  authors:
    - family-names: Liu
      given-names: Kay
    - family-names: Dou
      given-names: Yingtong
    - family-names: Zhao
      given-names: Yue
    - family-names: Ding
      given-names: Xueying
    - family-names: Hu
      given-names: Xiyang
    - family-names: Zhang
      given-names: Ruitong
    - family-names: Ding
      given-names: Kaize
    - family-names: Chen
      given-names: Canyu
    - family-names: Peng
      given-names: Hao
    - family-names: Shu
      given-names: Kai
    - family-names: Sun
      given-names: Lichao
    - family-names: Li
      given-names: Jundong
    - family-names: Chen
      given-names: George H
    - family-names: Jia
      given-names: Zhihao
    - family-names: Yu
      given-names: Philip S
  collection-title: Advances in Neural Information Processing Systems 35
  collection-type: proceedings
  editors:
    - family-names: Koyejo
      given-names: S.
    - family-names: Mohamed
      given-names: S.
    - family-names: Agarwal
      given-names: A.
    - family-names: Belgrave
      given-names: D.
    - family-names: Cho
      given-names: K.
    - family-names: Oh
      given-names: A.
  start: 27021
  end: 27035
  year: 2022
  publisher:
    name: Curran Associates, Inc.
  url: https://proceedings.neurips.cc/paper_files/paper/2022/file/acc1ec4a9c780006c9aafd595104816b-Paper-Datasets_and_Benchmarks.pdf

GitHub Events

Total
  • Issues event: 1
  • Watch event: 5
  • Issue comment event: 1
  • Fork event: 1
Last Year
  • Issues event: 1
  • Watch event: 5
  • Issue comment event: 1
  • Fork event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Djerry-h (1)
  • uuice11 (1)