ICAT

ICAT: The Interactive Corpus Analysis Tool - Published in JOSS (2025)

https://github.com/ornl/icat

Science Score: 96.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: acm.org, joss.theoj.org
  • Academic email domains
  • Institutional organization owner
    Organization ornl has institutional domain (software.ornl.gov)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence
Engineering Computer Science - 40% confidence
Last synced: 4 months ago

Repository

Interactive machine learning dashboard for textual data exploration

Basic Info
Statistics
  • Stars: 4
  • Watchers: 3
  • Forks: 1
  • Open Issues: 18
  • Releases: 14
Created over 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.md

ICAT logo

Interactive Corpus Analysis Tool


The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning (IML) dashboard for unlabeled text datasets. It lets a user iteratively and visually define features, explore and label instances of the dataset, and train a logistic regression model on the fly. The model in turn assists with filtering, searching, and labeling tasks.
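To make the loop described above concrete, here is a minimal sketch of the underlying idea: user-chosen keyword groups become numeric features, and a logistic regression model is fit on a handful of labeled examples. This is not ICAT's API. The corpus, the keyword groups, and the function names (`featurize`, `train`, `predict_proba`) are all hypothetical, and the model is fit with plain gradient descent in the standard library rather than with scikit-learn, which ICAT lists as a dependency.

```python
import math

# Hypothetical toy corpus and user-chosen keyword features (not ICAT's API).
texts = [
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock prices fell sharply today",
    "the market rallied after the report",
]
labels = [1, 1, 0, 0]  # 1 = "about animals", as a user might label interactively
keywords = [["cat", "cats", "dogs", "pets"], ["stock", "market", "prices"]]


def featurize(text):
    """Count how many words from each keyword group occur in the text."""
    words = text.lower().split()
    return [float(sum(words.count(k) for k in group)) for group in keywords]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(text, w, b):
    """Probability that an unlabeled text belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, featurize(text))) + b)


X = [featurize(t) for t in texts]
w, b = train(X, labels)
```

In an interactive setting, each new label or feature edit would trigger a refit like `train(X, labels)`, and the resulting probabilities could be used to sort or filter the instances that are still unlabeled.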

ICAT Screenshot

ICAT is implemented using HoloViz's Panel library, so it can either be rendered directly as a widget in a JupyterLab instance or incorporated into a standalone Panel website.

Installation

ICAT can be installed via pip with:

pip install icat-iml

Documentation

The user guide and API documentation can be found at https://ornl.github.io/icat.

Visualization

The primary ring visualization is called AnchorViz, a technique from the IML literature (see Chen, Nan-Chen, et al., "AnchorViz: Facilitating classifier error discovery through interactive semantic data exploration").

We implemented an ipywidget version of AnchorViz and use it in this project. It is available separately at https://github.com/ORNL/ipyanchorviz
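As a rough illustration of how a ring layout of this kind can work, the sketch below places anchors evenly on a unit circle and positions each data point at the weighted average of the anchor coordinates, using the point's feature values as weights. This RadViz-style rule is an assumption made for illustration. The actual layout used by ipyanchorviz may differ, and the function names `anchor_positions` and `place_point` are hypothetical.

```python
import math


def anchor_positions(n):
    """Place n anchors evenly around the unit circle."""
    return [
        (math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def place_point(weights, anchors):
    """Weighted average of anchor positions (weights assumed non-negative).

    A point with zero total weight is pulled by no anchor and sits at the center.
    """
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)
    x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
    return (x, y)


anchors = anchor_positions(4)
only_first = place_point([1.0, 0.0, 0.0, 0.0], anchors)   # sits on the first anchor
between = place_point([1.0, 1.0, 0.0, 0.0], anchors)      # midway between two anchors
unattracted = place_point([0.0, 0.0, 0.0, 0.0], anchors)  # stays at the center
```

Under this rule, a point that responds strongly to a single feature sits close to that feature's anchor on the ring, while points that respond to several features land between the corresponding anchors.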

Contributing

Contributions for improving ICAT are welcome! If you run into any problems, find bugs, or think of useful improvements and enhancements, feel free to open an issue.

If you add a feature or fix a bug yourself and want it considered for integration, feel free to open a pull request with the changes. Please describe in detail what the pull request does and briefly list any significant changes. If it relates to a specific issue, please include or link the issue number.

Citation

To cite usage of ICAT, please use the following BibTeX entry:

@misc{doecode_105653,
  title = {Interactive Corpus Analysis Tool},
  author = {Martindale, Nathan and Stewart, Scott},
  abstractNote = {The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning dashboard for unlabeled text/natural language processing datasets that allows a user to iteratively and visually define features, explore and label instances of their dataset, and simultaneously train a logistic regression model. ICAT was created to allow subject matter experts in a specific domain to directly train their own models for unlabeled datasets visually, without needing to be a machine learning expert or needing to know how to code the models themselves. This approach allows users to directly leverage the power of machine learning, but critically, also involves the user in the development of the machine learning model.},
  year = {2023},
  month = {apr}
}

Owner

  • Name: Oak Ridge National Laboratory
  • Login: ORNL
  • Kind: organization
  • Email: software@ornl.gov
  • Location: Oak Ridge TN

Software repositories from Oak Ridge National Laboratory

JOSS Publication

ICAT: The Interactive Corpus Analysis Tool
Published
June 10, 2025
Volume 10, Issue 110, Page 6873
Authors
Nathan Martindale ORCID
Oak Ridge National Laboratory
Scott L. Stewart ORCID
Oak Ridge National Laboratory
Editor
Josh Borrow ORCID
Tags
Machine Learning HCI Visual Analytics

GitHub Events

Total
  • Create event: 2
  • Release event: 2
  • Issues event: 4
  • Issue comment event: 5
  • Push event: 15
Last Year
  • Create event: 2
  • Release event: 2
  • Issues event: 4
  • Issue comment event: 5
  • Push event: 15

Issues and Pull Requests


All Time
  • Total issues: 48
  • Total pull requests: 1
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 1 day
  • Total issue authors: 5
  • Total pull request authors: 1
  • Average comments per issue: 0.65
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 21 days
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.8
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • WildfireXIII (39)
  • WarmCyan (4)
  • JBorrow (2)
  • jhagerer (1)
  • SamHames (1)
Pull Request Authors
  • WildfireXIII (1)
Top Labels
Issue Labels
enhancement (27) bug (10) documentation (2) tests (1) cleaning (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 40 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 15
  • Total maintainers: 2
pypi.org: icat-iml

Interactive Corpus Analysis Tool

  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 40 Last month
Rankings
Dependent packages count: 7.1%
Average: 27.1%
Forks count: 30.6%
Dependent repos count: 31.3%
Stargazers count: 39.4%
Maintainers (2)

Dependencies

.github/workflows/pages.yml actions
  • actions/checkout v3 composite
  • actions/configure-pages v3 composite
  • actions/deploy-pages v2 composite
  • actions/upload-pages-artifact v2 composite
.github/workflows/pre-commit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • pre-commit/action v3.0.0 composite
.github/workflows/tests.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
requirements.txt pypi
  • altair *
  • build *
  • ipyanchorviz *
  • ipyvuetify *
  • ipywidgets *
  • numpy *
  • pandas *
  • panel *
  • pre-commit *
  • pydata-sphinx-theme *
  • pytest *
  • pytest-mock *
  • scikit-learn *
  • sphinx *
  • sphinx-favicon *
  • twine *
setup.py pypi
  • altair *
  • ipyanchorviz *
  • ipyvuetify *
  • ipywidgets *
  • numpy *
  • pandas *
  • panel *
  • scikit-learn *
environment.yml conda
  • ipyvuetify
  • ipywidgets
  • jupyter
  • jupyterlab
  • nodejs
  • numpy
  • pandas
  • panel
  • pytest
  • pytest-mock
  • python 3.10.*
  • scikit-learn