CGIMP

CGIMP: Real-time exploration and covariate projection for self-organizing map datasets - Published in JOSS (2019)

https://github.com/boyle-lab/cgimp

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    4 of 4 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Clustered Genomic Interval Mapping Platform

Basic Info
  • Host: GitHub
  • Owner: Boyle-Lab
  • License: gpl-3.0
  • Language: JavaScript
  • Default Branch: master
  • Homepage:
  • Size: 93.9 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created almost 7 years ago · Last pushed over 3 years ago
Metadata Files
Readme, Contributing, License, Code of conduct

README.md

CGIMP

Clustered Genomic Interval Mapping Platform

Note:

Full documentation is available at https://cgimp.readthedocs.io, hosted by Read the Docs.

System Requirements

A Unix-like system with the following prerequisites:

  • A functioning Docker instance
  • A running web server (we recommend nginx)

Getting Started

  1. Clone the repository

  2. Pre-process your node and module data into JSON format. Example files are provided in CGIMP/data to show the proper format. Only the "id", "node", and "loc" fields are required, but you can include any additional fields your data require. Example source data (.tsv.gz) and data processing scripts (.py) are also provided. Feel free to replace the files in this directory with your own to simplify configuration in step 3! Example data were prepared as follows:

         $ cd CGIMP/data
         $ gunzip module_classifications.tsv.gz all_modules.tsv.gz
         $ python nodes_to_json.py all_modules.tsv > dataMap.json
         $ python modules_to_nodes.py module_classifications.tsv > nodes.json
         $ cd ..

  3. Navigate to CGIMP/client/src and edit browserconfig.js to reflect your local network configuration, data file locations/names, and map dimensions, following the directions within the file on which fields to edit. Make sure port mappings match the host ports set up in step 5!

     ```
     $ cd CGIMP/client/src
     $ vim browserconfig.js
     ...
     $ cd ../../
     ```

  4. Navigate to the root CGIMP directory and build the Docker container:

         $ docker build -t cgimp .

  5. Run the Docker container with a mount to the working directory and appropriate port mappings. Ports are specified with '-p XXXX:YYYY', where XXXX is the host machine port and YYYY is the port on the Docker container.

         $ docker run -it --name cgimp \
             -v $(pwd):/home/node/$(basename $(pwd)) \
             -p 3000:3000 -p 3001:3001 -p 9200:9200 \
             -e LOCAL_USER_ID=`id -u $USER` \
             -e LOCAL_GROUP_ID=`id -g $USER` \
             -e LOCAL_USER_NAME=`id -un` \
             -e LOCAL_GROUP_NAME=`id -gn` \
             cgimp bash
         root@be51d9bd99b2:/$ exit

  6. Log in to the Docker container with your own user account to install the node.js dependencies.

         $ docker start cgimp
         $ docker exec -it cgimp gosu <your username> bash
         user@be51d9bd99b2:/$ cd home/node/CGIMP
         user@be51d9bd99b2:/$ ./configure.sh
         user@be51d9bd99b2:/$ exit

  7. Log in to the Docker container as root and start the server.

     ```
     $ docker exec -it cgimp bash
     root@be51d9bd99b2:/$ cd home/node/CGIMP
     root@be51d9bd99b2:/$ npm start
     ```

  8. Open a web browser and go to the address:port you configured in step 3. Note that the page will take longer to load the first time it is accessed because the data must be indexed for the search engine. Subsequent loads will be faster.

Note: If you run into browser errors (timeouts, etc.), or if search facets fail to appear, waiting a few minutes and reloading the page usually fixes things. If errors persist, try restarting the server (step 7).
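
Step 2 states that only the "id", "node", and "loc" fields are required in each record. Checking for them before building the container can catch malformed input early. The following is a minimal sketch and not part of CGIMP: the function name `find_incomplete_records` is hypothetical, and it assumes the JSON file holds either a list of records or an object whose values are records. The example files in CGIMP/data remain the authority on the actual layout.

```python
import json

# The three fields that step 2 names as required.
REQUIRED_FIELDS = ("id", "node", "loc")


def find_incomplete_records(data):
    """Return (index_or_key, missing_fields) for each record lacking a required field.

    Assumption: `data` is a list of dict records, or a dict mapping keys to
    dict records. The real CGIMP layout may differ.
    """
    items = data.items() if isinstance(data, dict) else enumerate(data)
    problems = []
    for key, record in items:
        if isinstance(record, dict):
            missing = [field for field in REQUIRED_FIELDS if field not in record]
        else:
            # A non-object entry cannot carry any of the required fields.
            missing = list(REQUIRED_FIELDS)
        if missing:
            problems.append((key, missing))
    return problems


# Placeholder records for illustration only; the real "loc" format is shown
# in the example files shipped with the repository.
records = json.loads(
    '[{"id": "m1", "node": 0, "loc": "placeholder"}, {"id": "m2", "node": 1}]'
)
print(find_incomplete_records(records))  # [(1, ['loc'])]
```

To check a real file, load it with `json.load` and pass the result to the same function; an empty list means every record has the required fields.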

Motivation

Dimensionality-reduction methods are widely used to break down complex datasets into more manageable subunits. For example, self-organizing maps (SOMs), a type of neural network, are capable of projecting high-dimensional data onto a two-dimensional grid topography that facilitates further analysis. In particular, projecting covariates onto these mappings can yield insights into how and why modules cluster together, giving clues to their underlying properties and potential functions within the system from which they were drawn. For example, SOMs have been used in computational genomics to distill co-occurrence data for large sets of DNA binding proteins into common co-binding patterns. Projecting various genomic annotations onto these mappings has yielded insights into the biological processes and mechanisms associated with different co-binding patterns.

However, while multiple tools exist to produce SOMs and graphically render their results, none are designed for real-time data exploration and projection of covariate data, which generally requires additional steps outside the core software package. Furthermore, mapped outputs are static and non-interactive. Drilling down into the dataset generally requires manually obtaining slices of the data frame through a scripting language or API. Finally, making comparisons between maps is cumbersome, requiring preparation of multiple individual images through the same text-based interface.

(C)lustered (G)enomic (I)nterval (M)apping (P)latform (CGIMP) is a web application that addresses these limitations by enabling real-time analysis of self-organizing maps for genomics datasets. CGIMP takes two inputs: a JSON file describing the modules from a genomic dataset that has been classified and labeled by an SOM algorithm, and a separate JSON file with descriptive data for each node in the map grid. Given these inputs, it automatically renders an interactive map image to the screen and provides a set of data-driven search facets that allow direct exploration of the intrinsic properties of the dataset. It can also intersect the underlying data with covariate datasets uploaded to the server as BED files. The intersection is performed through a Python adapter to the popular BEDTools suite.
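
Covariate projection as described above rests on interval intersection: finding which modules have genomic intervals that overlap intervals in an uploaded BED file. CGIMP delegates this work to BEDTools, so the plain-Python sketch below only illustrates the overlap test and is not CGIMP's implementation. The function names, the module ids, and the `(chrom, start, end)` tuple representation are assumptions made for illustration; coordinates follow the BED convention of zero-based, half-open intervals.

```python
from collections import defaultdict


def parse_bed_line(line):
    """Parse the first three BED columns: chrom, start, end (0-based, half-open)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(modules, covariates):
    """Map each module id to the covariate intervals it overlaps.

    `modules` maps hypothetical module ids to (chrom, start, end) tuples.
    This is a simple per-chromosome scan, adequate only for small inputs.
    """
    by_chrom = defaultdict(list)
    for interval in covariates:
        by_chrom[interval[0]].append(interval)
    hits = {}
    for module_id, loc in modules.items():
        matched = [c for c in by_chrom.get(loc[0], []) if overlaps(loc, c)]
        if matched:
            hits[module_id] = matched
    return hits


# Illustrative data only, not taken from the CGIMP example files.
bed_lines = ["chr1\t100\t200\n", "chr1\t500\t600\n", "chr2\t100\t200\n"]
covariates = [parse_bed_line(line) for line in bed_lines]
modules = {
    "m1": ("chr1", 150, 250),  # overlaps chr1:100-200
    "m2": ("chr1", 200, 300),  # abuts chr1:100-200 but shares no base
    "m3": ("chr2", 0, 101),    # overlaps chr2:100-200 by one base
}
print(intersect(modules, covariates))
# {'m1': [('chr1', 100, 200)], 'm3': [('chr2', 100, 200)]}
```

Because the intervals are half-open, `m2` starts exactly where the first covariate ends and is therefore not reported as a hit.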

Citation

If you use CGIMP in your work, please use the following citation:

Diehl et al., (2019). CGIMP: Real-time exploration and covariate projection for self-organizing map datasets. Journal of Open Source Software, 4(39), 1520, https://doi.org/10.21105/joss.01520

Community Guidelines

Bug reports and requests for improvements, optimizations, and additional features are welcome! Please feel free to post to CGIMP's Issue Tracker on GitHub or follow the guidelines in the CONTRIBUTING document.

Owner

  • Name: The Boyle Lab
  • Login: Boyle-Lab
  • Kind: organization
  • Email: apboyle@umich.edu
  • Location: University of Michigan

JOSS Publication

CGIMP: Real-time exploration and covariate projection for self-organizing map datasets
Published
July 10, 2019
Volume 4, Issue 39, Page 1520
Authors
Adam G. Diehl
Department of Computational Medicine and Bioinformatics, University of Michigan
Alan P. Boyle
Department of Computational Medicine and Bioinformatics, University of Michigan; Department of Human Genetics, University of Michigan
Editor
Lorena Pantano
Tags
neural networks, self organizing maps, genomics, data visualization, Javascript

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 158
  • Total Committers: 4
  • Avg Commits per committer: 39.5
  • Development Distribution Score (DDS): 0.297
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name          Email            Commits
adadiehl      a****l@u****u    111
jacklu        j****u@u****u    42
Adam Diehl    a****7@c****u    3
Alan Boyle    a****e@u****u    2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 year
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.25
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (2)
  • JackLu1 (1)
  • adadiehl (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2)

Dependencies

backend/package.json npm
  • @elastic/elasticsearch ^7.0.0-rc.2
  • ajv ^6.10.0
  • axios ^0.18.0
  • body-parser ^1.18.3
  • cors ^2.8.5
  • elasticsearch ^15.4.1
  • express ^4.16.4
  • express-fileupload ^1.1.4
  • formidable ^1.2.1
  • fs-extra ^7.0.1
  • morgan ^1.9.1
  • python-shell ^1.0.7
  • react ^16.8.3
  • react-dom ^16.5.0
  • react-native ^0.59.2
  • react-scripts 1.1.5
client/package.json npm
  • @appbaseio/reactivesearch ^3.0.0-rc.19
  • @elastic/elasticsearch ^7.0.0-rc.2
  • @material-ui/core ^3.9.3
  • @material-ui/icons ^3.0.2
  • ajv ^6.10.0
  • ajv-keywords ^3.4.0
  • axios ^0.18.0
  • classnames ^2.2.6
  • elasticsearch ^15.4.1
  • filepond ^4.3.5
  • jquery ^1.9.1
  • lunr ^2.3.6
  • prop-types ^15.7.2
  • react ^16.8.3
  • react-art ^16.8.6
  • react-color ^2.17.3
  • react-dom ^16.5.0
  • react-filepond ^7.0.1
  • react-native ^0.59.2
  • react-native-web ^0.11.2
  • react-router-dom ^5.0.0
  • react-router-native ^5.0.0
  • react-scripts 1.1.5
package.json npm
  • concurrently ^4.1.0
  • d3 ^5.9.2
  • monk ^6.0.6