MiNAA

MiNAA: Microbiome Network Alignment Algorithm - Published in JOSS (2024)

https://github.com/solislemuslab/minaa

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    3 of 4 committers (75.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

graal graph-algorithm graphlets interaction-network microbiome network-alignment orca
Last synced: 6 months ago

Repository

MiNAA aligns a pair of networks based on their topologies and biologies.

Basic Info
  • Host: GitHub
  • Owner: solislemuslab
  • License: mit
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 3.82 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 5
  • Open Issues: 0
  • Releases: 4
Created almost 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License

README.md

MiNAA: Microbiome Network Alignment Algorithm


Description

MiNAA takes as input a pair of node-edge networks and finds a correspondence between them such that each node in one is mapped to its most similar node in the other. MiNAA can use both topological (structural) information about the network and biological information about the taxa each node represents to produce a good approximation of the optimal alignment. Due to the complexity of this task, an approximation is the best that can be done in an efficient runtime. Network alignment in this setting is done primarily for comparative purposes. For example, an alignment might map clusters of taxa to each other, revealing conserved or analogous functions between microbial communities. See our software note for additional details.

Requirements

This program requires C++20 or higher, and g++.

Compilation

Unix

    make

Windows

    mkdir obj
    make

In addition to C++20 and g++, Windows requires a separate tool to run the provided makefile. The MinGW Package Manager provides a lightweight make utility. We recommend downloading MinGW and following its installation guide; however, any method for compiling C++ with g++ should suffice.

Usage

This utility has the form ./minaa.exe <G> <H> [-B=bio] [-a=alpha] [-b=beta].

Required Arguments (ordered)

  1. G; a network to align.
  2. H; a network to align.
  • Require:
    • The networks are represented by adjacency matrices in CSV format, with labels in both the first column and row.
    • The CSV delimiter must be one of {comma, semicolon, space, tab}, and will be detected automatically.
    • |G| is less than or equal to |H|.
  • Notes:
    • Any nonzero entry is considered an edge.
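The input format above can be illustrated with a short sketch. This is not MiNAA's actual parser: the names `detect_delimiter`, `split`, `Network`, and `parse_network` are hypothetical, and the delimiter heuristic (pick the candidate that occurs most often in the header row) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Guess the delimiter by counting each candidate in the header row.
// Assumption: an illustrative heuristic, not MiNAA's detection logic.
char detect_delimiter(const std::string& header) {
    const std::string candidates = ",; \t";
    char best = ',';
    std::size_t best_count = 0;
    for (char c : candidates) {
        const auto count = static_cast<std::size_t>(
            std::count(header.begin(), header.end(), c));
        if (count > best_count) {
            best = c;
            best_count = count;
        }
    }
    return best;
}

// Split one line into fields at the given delimiter.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, delim)) fields.push_back(field);
    return fields;
}

// Hypothetical in-memory form of a labeled adjacency matrix.
struct Network {
    std::vector<std::string> labels;
    std::vector<std::vector<bool>> adjacent;
};

// Read a labeled adjacency matrix; any nonzero entry counts as an edge.
Network parse_network(const std::vector<std::string>& lines) {
    assert(!lines.empty());
    const char delim = detect_delimiter(lines[0]);
    const std::vector<std::string> header = split(lines[0], delim);
    assert(!header.empty());
    Network net;
    // Skip the top-left corner cell; the rest of the first row holds the labels.
    net.labels.assign(header.begin() + 1, header.end());
    for (std::size_t r = 1; r < lines.size(); ++r) {
        const std::vector<std::string> fields = split(lines[r], delim);
        std::vector<bool> row;
        // Skip the row label in the first column.
        for (std::size_t c = 1; c < fields.size(); ++c) {
            row.push_back(std::stod(fields[c]) != 0.0);
        }
        net.adjacent.push_back(row);
    }
    return net;
}
```

For example, the three lines `;a;b`, `a;0;0.7`, and `b;0.7;0` would be read as a two-node network with a single edge between `a` and `b`.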

Optional Arguments (unordered)

Common

  • -B=: the path to the biological cost matrix file.
    • Require: a CSV adjacency matrix where the first column consists of the labels of G, in order, and first row consists of the labels of H, in order.
    • Default: the algorithm will run using only topological calculations.
    • Notes:
      • The input matrix is normalized by MiNAA such that all entries are in range [0, 1].
      • The input is assumed to be a cost matrix. If it is a similarity matrix, use the -s option detailed below.
  • -a=: alpha; the GDV-edge weight balancer.
    • Require: a real number in range [0, 1].
    • Default: 1 (100% GDV data).
  • -b=: beta; the topological-biological cost matrix balancer.
    • Require: a real number in range [0, 1].
    • Default: 1 (100% topological data).
  • -st=: similarity threshold; the similarity value above which aligned pairs are included in the output.
    • Require: a real number in range [0, 1].
    • Default: 0.
  • -c: conserved subgraphs; whether or not to output a list of the conserved subgraphs in the alignment between G and H.
    • Require: none.
    • Default: this list is not calculated or returned.
    • Note: We define a conserved subgraph as a connected subgraph of G whose nodes are aligned to a connected subgraph of H. See the Examples section for a visual.
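To make the role of beta concrete, here is a minimal sketch of how a balancer in [0, 1] could blend the two cost matrices. It assumes a linear combination, overall = beta * topological + (1 - beta) * biological, which matches the stated default (beta = 1 means 100% topological data) but is not confirmed as MiNAA's exact formula. The names `Matrix` and `combine_costs` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Blend two equally sized cost matrices with weight beta.
// Assumption: a linear blend; beta = 1 keeps only the topological costs.
Matrix combine_costs(const Matrix& topological, const Matrix& biological, double beta) {
    assert(beta >= 0.0 && beta <= 1.0);
    assert(topological.size() == biological.size());
    Matrix overall = topological;
    for (std::size_t i = 0; i < topological.size(); ++i) {
        assert(topological[i].size() == biological[i].size());
        for (std::size_t j = 0; j < topological[i].size(); ++j) {
            overall[i][j] = beta * topological[i][j] + (1.0 - beta) * biological[i][j];
        }
    }
    return overall;
}
```

Under this assumption, running with -b=1 leaves the topological cost matrix unchanged, and -b=0.5 weights both sources equally.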

Uncommon

  • -Galias=: an alias for the G file.
    • Require: a valid file name.
    • Default: the G file keeps its original name.
  • -Halias=: an alias for the H file.
    • Require: a valid file name.
    • Default: the H file keeps its original name.
  • -Balias=: an alias for the B file.
    • Require: a valid file name.
    • Default: the B file keeps its original name.
  • -p: passthrough; whether or not to write the input files into the output folder.
    • Require: none.
    • Default: the files are not passed through to the output folder.
    • Note: the output reflects the input data after processing by the algorithm; it is not a direct copy of the input files.
  • -t: timestamp; the output folder's name includes the date and time of execution.
    • Require: none.
    • Default: the output folder's name does not include date and time.
  • -g: greekstamp; the output folder's name includes the values for alpha and beta.
    • Require: none.
    • Default: the output folder's name does not include the values for alpha and beta.
  • -s: similarity conversion; for each entry in the given biological matrix, the value (post normalization) is replaced with 1 - value.
    • Require: none.
    • Default: the given biological matrix is left as is.
    • Note: use this if and only if the provided biological matrix is a similarity matrix.
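The normalization of the biological matrix and the -s conversion can be sketched as follows. The README states only that entries end up in [0, 1]; the min-max scaling used here is an assumption, and the names `normalize` and `similarity_to_cost` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Rescale all entries into [0, 1].
// Assumption: min-max scaling; MiNAA's actual normalization may differ.
Matrix normalize(const Matrix& input) {
    assert(!input.empty() && !input[0].empty());
    double lo = input[0][0];
    double hi = input[0][0];
    for (const auto& row : input) {
        for (double value : row) {
            lo = std::min(lo, value);
            hi = std::max(hi, value);
        }
    }
    const double range = hi - lo;
    Matrix out = input;
    for (auto& row : out) {
        for (double& value : row) {
            // A constant matrix carries no information; map it to all zeros.
            value = (range > 0.0) ? (value - lo) / range : 0.0;
        }
    }
    return out;
}

// Convert a normalized similarity matrix into a cost matrix (the -s option):
// each entry becomes 1 - value.
Matrix similarity_to_cost(const Matrix& normalized) {
    Matrix out = normalized;
    for (auto& row : out) {
        for (double& value : row) {
            value = 1.0 - value;
        }
    }
    return out;
}
```

Note that the conversion is applied after normalization, so a similarity of 1 (most similar) becomes a cost of 0.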

Outputs

  • G-H/: (where G, H are the input networks) The folder containing the output files specified below.
  • log.txt: record of the important details from the alignment.
  • G_gdvs.csv: (where G is the input network) the Graphlet Degree Vectors for network G.
  • H_gdvs.csv: (where H is the input network) the Graphlet Degree Vectors for network H.
  • top_costs.csv: the topological cost matrix.
  • bio_costs.csv: the biological cost matrix (as input). Not created unless biological input is given.
  • overall_costs.csv: the combination of the topological and biological cost matrix. Not created unless biological input is given.
  • alignment_list.csv: a complete list of all aligned nodes, with rows in the format g_node,h_node,similarity, in descending order of similarity. The first row in this list is the total cost of the alignment, that is, the sum of (1 - similarity) over all aligned pairs.
  • alignment_matrix.csv: a matrix form of the same alignment, where the first column and row are the labels from the two input networks, respectively.
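The structure of alignment_list.csv can be mirrored in a few lines. The sketch below computes the total cost as the sum of (1 - similarity) and orders pairs by descending similarity, as described above. The `AlignedPair` struct and both function names are hypothetical and are not taken from the MiNAA source.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form of one row of alignment_list.csv.
struct AlignedPair {
    std::string g_node;
    std::string h_node;
    double similarity;
};

// Total alignment cost: the sum of (1 - similarity) over all aligned pairs.
double total_cost(const std::vector<AlignedPair>& pairs) {
    double cost = 0.0;
    for (const AlignedPair& pair : pairs) {
        assert(pair.similarity >= 0.0 && pair.similarity <= 1.0);
        cost += 1.0 - pair.similarity;
    }
    return cost;
}

// Order pairs from most to least similar, as in the output file.
void sort_by_similarity(std::vector<AlignedPair>& pairs) {
    std::sort(pairs.begin(), pairs.end(),
              [](const AlignedPair& a, const AlignedPair& b) {
                  return a.similarity > b.similarity;
              });
}
```

For three pairs with similarities 0.5, 0.75, and 0.25, the total cost is 0.5 + 0.25 + 0.75 = 1.5.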

Examples

On the left are adjacency matrices for simple networks G and H, and an imagined alignment matrix as returned by MiNAA. On the right is a visual depiction of G overlaying H; the connected purple graph is what we call a conserved subgraph in the alignment of G and H. We color a node purple if that node in G is aligned to a node in H, and we color an edge purple if two adjacent nodes in G are aligned to nodes that are also adjacent in H. Note that the blue edge, red edge, and red node e are not considered part of this conserved subgraph.
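The edge-coloring rule above can be expressed as a small sketch: an edge of G is conserved when both endpoints are aligned and their images are adjacent in H. The representation is an assumption made for illustration (boolean adjacency matrices, and a vector `aligned_to` in which -1 marks an unaligned node of G), and `conserved_edges` is a hypothetical name. The connected components formed by these edges would then correspond to conserved subgraphs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Adjacency = std::vector<std::vector<bool>>;

// Return the edges (u, v) of G, with u < v, whose endpoints are both aligned
// and whose aligned nodes are adjacent in H.
// Assumption: aligned_to[u] is the index in H of the node aligned to u, or -1.
std::vector<std::pair<std::size_t, std::size_t>> conserved_edges(
    const Adjacency& g, const Adjacency& h, const std::vector<int>& aligned_to) {
    assert(aligned_to.size() == g.size());
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t u = 0; u < g.size(); ++u) {
        for (std::size_t v = u + 1; v < g.size(); ++v) {
            if (!g[u][v]) continue;  // not an edge in G
            if (aligned_to[u] < 0 || aligned_to[v] < 0) continue;  // unaligned endpoint
            const auto hu = static_cast<std::size_t>(aligned_to[u]);
            const auto hv = static_cast<std::size_t>(aligned_to[v]);
            assert(hu < h.size() && hv < h[hu].size());
            if (h[hu][hv]) edges.emplace_back(u, v);
        }
    }
    return edges;
}
```

In a path 0-1-2 in G aligned to nodes 0, 1, and 3 of H, where H has edges 0-1 and 2-3, only the edge (0, 1) is conserved, because nodes 1 and 3 are not adjacent in H.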

Example Execution

Examples of MiNAA's usage with real data and in-depth explanations can be found in the examples/ directory.

Simulations in the Manuscript

All scripts and instructions to reproduce the analyses in the manuscript can be found in the simulations/ directory.

Contributions, Questions, Issues, and Feedback

Users interested in expanding the functionality of MiNAA are welcome to do so. Issue reports are encouraged through GitHub's issue tracker. See details on how to contribute and report issues in CONTRIBUTING.md.

License

MiNAA is licensed under the MIT license. © Solis-Lemus Lab (2024).

Citation

If you use MiNAA in your work, we kindly ask that you cite the following paper:

@article{Nelson2024,
  doi = {10.21105/joss.05448},
  url = {https://doi.org/10.21105/joss.05448},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {96},
  pages = {5448},
  author = {Reed Nelson and Rosa Aghdam and Claudia Solis-Lemus},
  title = {MiNAA: Microbiome Network Alignment Algorithm},
  journal = {Journal of Open Source Software}
}

Owner

  • Name: SolisLemus lab projects
  • Login: solislemuslab
  • Kind: organization

JOSS Publication

MiNAA: Microbiome Network Alignment Algorithm
Published
April 07, 2024
Volume 9, Issue 96, Page 5448
Authors
Reed Nelson
Department of Computer Science, University of Wisconsin-Madison, Madison, WI, United States of America
Rosa Aghdam
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States of America
Claudia Solis-Lemus
Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States of America, Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, United States of America
Editor
Gracielle Higino
Tags
network alignment microbiome graph matching

GitHub Events

Total
  • Watch event: 2
  • Create event: 1
Last Year
  • Watch event: 2
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 111
  • Total Committers: 4
  • Avg Commits per committer: 27.75
  • Development Distribution Score (DDS): 0.559
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
Reed Nelson r****d@g****m 49
Reed Nelson r****6@w****u 45
Claudia Solis-Lemus s****s@w****u 15
Nelson r****n@s****u 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 13
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 5 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 22 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • salix-d (2)
  • Becheler (1)
Pull Request Authors
  • reednel (19)

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite