BiRank

BiRank: Fast and Flexible Ranking on Bipartite Networks with R and Python - Published in JOSS (2020)

https://github.com/brianaronson/birankr

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    2 of 6 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Sociology Social Sciences - 40% confidence
Artificial Intelligence and Machine Learning Computer Science - 40% confidence
Last synced: 6 months ago · JSON representation

Repository

CRAN package for estimating various rank (centrality) measures of nodes in bipartite graphs (two-mode networks).

Basic Info
  • Host: GitHub
  • Owner: BrianAronson
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 885 KB
Statistics
  • Stars: 40
  • Watchers: 2
  • Forks: 7
  • Open Issues: 3
  • Releases: 1
Created about 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme Changelog License

README.md

BiRank R and Python package

Bipartite (two-mode) networks are ubiquitous. When calculating node centrality measures in bipartite networks, a common approach is to apply PageRank on the one-mode projection of the network. However, the projection can cause information loss and distort the network topology. For better node ranking on bipartite networks, it is preferable to use a ranking algorithm that fully accounts for the topology of both modes of the network.
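
The information loss can be seen with a toy case. The sketch below is a minimal illustration, not code from the package: `project_top` is a hypothetical helper that builds an unweighted one-mode projection, and the two patient-provider networks are made up. They have different bipartite structure yet collapse to the same projection.

```python
from itertools import combinations


def project_top(edges):
    """Unweighted one-mode projection onto the top nodes.

    Two top nodes are linked if they share at least one bottom node.
    """
    by_bottom = {}
    for top, bottom in edges:
        by_bottom.setdefault(bottom, set()).add(top)
    projected = set()
    for tops in by_bottom.values():
        for a, b in combinations(sorted(tops), 2):
            projected.add((a, b))
    return projected


# Network A: three patients all see the same provider.
network_a = [("p1", "d1"), ("p2", "d1"), ("p3", "d1")]

# Network B: each pair of patients shares a different provider.
network_b = [
    ("p1", "d1"), ("p2", "d1"),
    ("p2", "d2"), ("p3", "d2"),
    ("p1", "d3"), ("p3", "d3"),
]

proj_a = project_top(network_a)
proj_b = project_top(network_b)

# Both projections are the same triangle among p1, p2 and p3.
print(proj_a == proj_b)
```

After projection, the number of providers and who shares which provider can no longer be recovered, which is the kind of detail a bipartite ranking algorithm can still use.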

We present the BiRank package, which implements the bipartite ranking algorithms HITS, CoHITS, BGRM, and BiRank. BiRank provides convenience options for incorporating node-level weights into rank estimations, allowing maximum flexibility for different purposes. It can efficiently handle networks with millions of nodes on a single midrange server. Both R and Python versions are available.
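
To give a feel for how this family of algorithms works, here is a minimal pure-Python sketch of a BiRank-style iteration: scores are passed back and forth between the two modes over symmetrically normalized edges, with a damping term that pulls each side toward a prior vector. It is an illustration under stated assumptions, not the package's implementation: the function name `birank_sketch`, the uniform priors, the damping values of 0.85, and the toy edge list are all choices made for this example.

```python
from math import sqrt


def birank_sketch(edges, alpha=0.85, beta=0.85, max_iter=200, tol=1e-10):
    """Illustrative BiRank-style iteration on an unweighted edge list.

    edges: iterable of (top, bottom) pairs; repeated pairs count as
    repeated ties. Returns (top_scores, bottom_scores) as dicts.
    """
    edges = list(edges)
    tops = sorted({t for t, _ in edges})
    bottoms = sorted({b for _, b in edges})

    # Degree of each node within its own mode.
    deg_t = {t: 0 for t in tops}
    deg_b = {b: 0 for b in bottoms}
    for t, b in edges:
        deg_t[t] += 1
        deg_b[b] += 1

    # Symmetric normalization: each tie is scaled by 1 / sqrt(d_top * d_bottom).
    norm = [(t, b, 1.0 / sqrt(deg_t[t] * deg_b[b])) for t, b in edges]

    # Uniform prior vectors (an assumption of this sketch).
    t0 = {t: 1.0 / len(tops) for t in tops}
    b0 = {b: 1.0 / len(bottoms) for b in bottoms}
    t_score, b_score = dict(t0), dict(b0)

    for _ in range(max_iter):
        # Bottom nodes collect score from the top nodes they are tied to.
        new_b = {b: (1 - alpha) * b0[b] for b in bottoms}
        for t, b, w in norm:
            new_b[b] += alpha * w * t_score[t]
        # Top nodes collect score from the freshly updated bottom nodes.
        new_t = {t: (1 - beta) * t0[t] for t in tops}
        for t, b, w in norm:
            new_t[t] += beta * w * new_b[b]
        change = sum(abs(new_t[t] - t_score[t]) for t in tops)
        change += sum(abs(new_b[b] - b_score[b]) for b in bottoms)
        t_score, b_score = new_t, new_b
        if change < tol:
            break
    return t_score, b_score


# Toy network: p1 is tied to every provider, p2 and p3 to one each.
toy_edges = [("p1", "d1"), ("p1", "d2"), ("p1", "d3"), ("p2", "d1"), ("p3", "d2")]
top_scores, bottom_scores = birank_sketch(toy_edges)
print(sorted(top_scores, key=top_scores.get, reverse=True))
```

In this toy network the best-connected patient, `p1`, receives the highest score, while `p2` and `p3` occupy structurally equivalent positions and tie.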

R version: birankr

Overview

CRAN package with highly efficient functions for estimating various rank (centrality) measures of nodes in bipartite graphs (two-mode networks), including HITS, CoHITS, BGRM, and BiRank. It also provides easy-to-use tools for incorporating or removing edge weights during rank estimation, projecting two-mode graphs to one-mode, efficiently estimating PageRank in one-mode graphs, and converting edgelists and matrices to sparseMatrix format. Best of all, the package's rank estimators can work directly with common formats of network data, including edgelists (class data.frame, data.table, or tbl_df) and adjacency matrices (class matrix or dgCMatrix).

Installation

This package can be installed directly from CRAN with install.packages("birankr"). Alternatively, the newest version of this package can be installed with devtools::install_github("BrianAronson/birankr").

Example

Let's pretend we have a dataset (df) containing patient-provider ties (patient_id and provider_id) among providers that have ever prescribed an opioid:

```r
df <- data.frame(
  patient_id = sample(x = 1:10000, size = 10000, replace = TRUE),
  provider_id = sample(x = 1:5000, size = 10000, replace = TRUE)
)
```

We are interested in identifying patients who are likely doctor shopping. We assume that a highly central patient in the patient-doctor network is likely to be a person who is deliberately seeking out more "generous" opioid prescribers. We therefore estimate each patient's rank in this network with the CoHITS algorithm:

```r
library(birankr)
df.rank <- br_cohits(data = df)
```

Note that rank estimates are scaled according to the size of the network, with more nodes tending to result in smaller ranks. Because of this, it is often advisable to rescale rank estimates to more interpretable numbers. For example, we could rescale so that the mean rank equals 1 with the following data.table syntax:

```r
library(data.table)
df.rank <- data.table(df.rank)
df.rank[, rank := rank / mean(rank)]
```

Finally, we decide to identify the IDs and ranks of the highest ranking patients in df:

```r
head(df.rank[order(rank, decreasing = TRUE), ], 10)
```
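
For readers working outside R, the same two steps (rescaling so the mean rank is 1, then picking the highest-ranked nodes) can be sketched in plain Python. The `ranks` records below are made-up values for illustration, not output from the package.

```python
from heapq import nlargest
from statistics import mean

# Hypothetical rank output: one record per patient.
ranks = [
    {"patient_id": 1, "rank": 0.0004},
    {"patient_id": 2, "rank": 0.0001},
    {"patient_id": 3, "rank": 0.0007},
]

# Divide every rank by the mean so that the rescaled mean equals 1.
avg = mean(r["rank"] for r in ranks)
rescaled = [{**r, "rank": r["rank"] / avg} for r in ranks]

# Keep the two highest-ranked patients, highest first.
top = nlargest(2, rescaled, key=lambda r: r["rank"])
print([r["patient_id"] for r in top])
```

Rescaling changes only the units of the ranks, so the ordering of patients is the same before and after.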

For a more detailed example, check out examples/Marvel_social_network.md, where we use the ranking algorithm to analyze the Marvel comic book social network.

Function overview

Below is a brief outline of each function in this package:

  • bipartite_rank
    • Estimates any type of bipartite rank.
  • br_bgrm
    • Estimates ranks with BGRM algorithm
  • br_birank
    • Estimates ranks with BiRank algorithm
  • br_cohits
    • Estimates ranks with CoHITS algorithm
  • br_hits
    • Estimates ranks with HITS algorithm
  • pagerank
    • Estimates ranks with PageRank algorithm
  • project_to_one_mode
    • Creates a one-mode projection of a sparse matrix
  • sparsematrix_from_edgelist
    • Creates a sparseMatrix from an edgelist
  • sparsematrix_from_matrix
    • Creates a sparseMatrix from a matrix
  • sparsematrix_rm_weights
    • Removes edge weights from a sparseMatrix
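
As a rough picture of what the sparse-matrix helpers do conceptually, the sketch below converts a weighted edgelist into coordinate triplets and then strips the weights. It is not the package's code: `coo_from_edgelist` and `remove_weights` are hypothetical names, and summing repeated ties is an assumption made for this example.

```python
def coo_from_edgelist(edges):
    """Map node labels to integer indices and collect (row, col, value) triplets.

    edges: iterable of (top, bottom, weight). Repeated ties are summed
    (an assumption of this sketch).
    """
    row_index, col_index, cells = {}, {}, {}
    for top, bottom, weight in edges:
        # Assign the next free index the first time a label is seen.
        i = row_index.setdefault(top, len(row_index))
        j = col_index.setdefault(bottom, len(col_index))
        cells[(i, j)] = cells.get((i, j), 0.0) + weight
    triplets = sorted((i, j, v) for (i, j), v in cells.items())
    return row_index, col_index, triplets


def remove_weights(triplets):
    """Set every stored value to 1, keeping only the tie structure."""
    return [(i, j, 1.0) for i, j, _ in triplets]


weighted_edges = [
    ("p1", "d1", 2.0),
    ("p1", "d2", 1.0),
    ("p2", "d1", 3.0),
    ("p1", "d1", 1.0),  # repeated tie, summed with the first one
]
rows, cols, triplets = coo_from_edgelist(weighted_edges)
unweighted = remove_weights(triplets)
print(triplets)
print(unweighted)
```

Only the non-zero cells are stored, which is what keeps memory use manageable when the full adjacency matrix would be mostly zeros.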

Documentation

Full documentation of birankr can be found in birankr.pdf.

Tests

To run the unit tests, install the birankr and devtools packages and run:

devtools::test("birankr")

Python version: birankpy

History

  • Nov. 10, 2021 (v1.0.1): dropped support for Python 3.5; added support for Python 3.9

Overview

birankpy provides functions for estimating various rank measures of nodes in bipartite networks, including HITS, CoHITS, BGRM, and BiRank. It can also project a two-mode network to one mode and estimate PageRank on the projection. birankpy supports user-defined edge weights and is implemented with sparse matrices for efficiency.
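
The project-then-PageRank workflow mentioned above can be sketched in plain Python as follows. This is an illustration only and does not use birankpy: `project_weighted` and `pagerank_sketch` are hypothetical helpers, the projection weight is assumed to be the number of shared bottom nodes, and the damping factor of 0.85 is a conventional choice.

```python
from collections import defaultdict
from itertools import combinations


def project_weighted(edges):
    """Project onto top nodes; weight = number of shared bottom nodes."""
    by_bottom = defaultdict(set)
    for top, bottom in edges:
        by_bottom[bottom].add(top)
    weights = defaultdict(float)
    for tops in by_bottom.values():
        for a, b in combinations(sorted(tops), 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    return dict(weights)


def pagerank_sketch(weights, nodes, damping=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank on a weighted, undirected graph."""
    out = defaultdict(float)
    for (src, _), w in weights.items():
        out[src] += w
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(score[v] for v in nodes if out[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            new[dst] += damping * score[src] * w / out[src]
        done = sum(abs(new[v] - score[v]) for v in nodes) < tol
        score = new
        if done:
            break
    return score


# Toy bipartite network: top nodes a-d, bottom nodes x-z.
bip_edges = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "y"), ("d", "z")]
top_nodes = sorted({t for t, _ in bip_edges})
projection = project_weighted(bip_edges)
scores = pagerank_sketch(projection, top_nodes)
print(max(scores, key=scores.get))
```

Here `b` shares a bottom node with both `a` and `c`, so it ends up with the highest score, while `d` shares nothing with anyone and is isolated in the projection.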

Dependencies

  • networkx
  • pandas
  • numpy
  • scipy

Installation

Install with pip:

```bash
pip install birankpy
```

Example

Let's pretend we have an edge list edgelist_df containing ties between top nodes and bottom nodes:

| top_node | bottom_node |
| -------- | ----------- |
| 1 | a |
| 1 | b |
| 2 | a |
| ... | .. |
| 123 | z |

To run BiRank on this bipartite network:

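```python
import birankpy

# Build the bipartite network from the edge list.
bn = birankpy.BipartiteNetwork()
bn.set_edgelist(edgelist_df, top_col='top_node', bottom_col='bottom_node')

# Rank estimates for the top and bottom nodes, respectively.
top_birank_df, bottom_birank_df = bn.generate_birank()
```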

For a more detailed example, check out examples/Marvel_social_network.ipynb, where we use the ranking algorithm to analyze the Marvel comic book social network.

Documentation

See documentation for birankpy at birankpy doc.

Tests

To run the unit tests, first go to the tests directory and then run:

```bash
python test_birankpy.py
```

Community Guidelines

How to Contribute

In general, you can contribute to this project by creating issues. You are also welcome to contribute to the source code directly by forking the project, modifying the code, and creating pull requests. If you are not familiar with pull requests, check out this post. Please use clear and organized descriptions when creating issues and pull requests.

Bug Report and Support Request

You can use issues to report bugs and seek support. Before creating any new issues, please check for similar ones in the issue list first.

Owner

  • Name: Brian Aronson
  • Login: BrianAronson
  • Kind: user
  • Location: NY
  • Company: The Adecco Group

I am a data scientist at The Adecco Group with a PhD in Sociology

JOSS Publication

BiRank: Fast and Flexible Ranking on Bipartite Networks with R and Python
Published
July 10, 2020
Volume 5, Issue 51, Page 2315
Authors
Kai-Cheng Yang
Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN
Brian Aronson
Department of Sociology, Indiana University, Bloomington, IN
Yong-Yeol Ahn
Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN
Editor
Vincent Knight ORCID
Tags
bipartite network PageRank ranking centrality

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
Last Year
  • Issues event: 1
  • Watch event: 4

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 159
  • Total Committers: 6
  • Avg Commits per committer: 26.5
  • Development Distribution Score (DDS): 0.44
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
yangkc y****c@i****u 89
Brian Aronson b****n@g****m 63
Aronson b****s@i****u 4
YY Ahn y****l@g****m 1
Vince Knight v****t@g****m 1
John Boy 2****c 1
Committer Domains (Top 20 + Academic)
iu.edu: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 11
  • Average time to close issues: 14 days
  • Average time to close pull requests: about 22 hours
  • Total issue authors: 6
  • Total pull request authors: 4
  • Average comments per issue: 1.83
  • Average comments per pull request: 0.18
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yangkcatiu (12)
  • Nikoleta-v3 (5)
  • gvegayon (3)
  • MichaelChirico (1)
  • lenkahas (1)
  • danielw2904 (1)
Pull Request Authors
  • yangkcatiu (8)
  • jboynyc (1)
  • MichaelChirico (1)
  • drvinceknight (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 104 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 4
  • Total maintainers: 1
pypi.org: birankpy

Ranking nodes in bipartite networks with efficiency and flexibility

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 104 Last month
Rankings
Dependent packages count: 9.9%
Stargazers count: 10.9%
Forks count: 12.6%
Average: 16.3%
Dependent repos count: 21.8%
Downloads: 26.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • Matrix * depends
  • R >= 3.4.0 depends
  • data.table * depends
  • testthat * suggests
setup.py pypi
  • networkx >=2.5
  • numpy >=1.16.2
  • pandas >=0.23.4
  • scipy >=1.2.0