i-analyzer

The great text mining tool that obviates all others

https://github.com/centrefordigitalhumanities/i-analyzer

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.5%) to scientific vocabulary

Keywords

corpus-linguistics corpus-search digital-history digital-humanities elasticsearch literary-studies text-analysis
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CentreForDigitalHumanities
  • License: MIT
  • Language: Python
  • Default Branch: develop
  • Homepage: https://ianalyzer.hum.uu.nl
  • Size: 61.3 MB
Statistics
  • Stars: 9
  • Watchers: 5
  • Forks: 3
  • Open Issues: 168
  • Releases: 56
Created almost 9 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

I-analyzer

"The great text mining tool that obviates all others." — Julian Gonggrijp

I-analyzer is a web application for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels.

I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields.

Contents

This repository contains the source code for the I-analyzer web application, which consists of a Django backend and Angular frontend.

For corpora included in I-analyzer, the backend includes a definition file that specifies how to read the source files, and how this data should be structured and presented in I-analyzer. This repository does not include the source data itself, beyond a few sample files for testing.
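
As a rough illustration of what such a definition does, the sketch below pairs each field with a rule for reading it from a source record and a label for presenting it. Every name here (`FieldDefinition`, `CorpusDefinition`, `source_to_document`, the example corpus and its fields) is hypothetical and chosen for this example; it is not I-analyzer's actual API, which is described in the developer documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical classes for illustration only; not I-analyzer's real API.
@dataclass
class FieldDefinition:
    name: str          # key used in the structured document
    display_name: str  # label a user interface could show for this field
    extract: Callable[[Dict[str, str]], object]  # how to read the value from a source record


@dataclass
class CorpusDefinition:
    title: str
    fields: List[FieldDefinition] = field(default_factory=list)

    def source_to_document(self, record: Dict[str, str]) -> Dict[str, object]:
        """Apply every field's extractor to one raw source record."""
        return {f.name: f.extract(record) for f in self.fields}


# An invented corpus with three fields, each with its own extraction rule.
speeches = CorpusDefinition(
    title="Example speeches",
    fields=[
        FieldDefinition("date", "Date", lambda r: r["date"]),
        FieldDefinition("speaker", "Speaker", lambda r: r["speaker"].strip().title()),
        FieldDefinition("content", "Text", lambda r: r["text"].strip()),
    ],
)

# One invented source record, as it might come out of a raw file.
raw = {"date": "1900-01-01", "speaker": " jane doe ", "text": " Hello. "}
document = speeches.source_to_document(raw)
print(document)
```

Keeping the reading rule and the presentation label together per field is one possible design; it means a new corpus can be described in one place without touching the code that indexes or displays documents.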

Usage

If you are interested in using I-analyzer, the most straightforward way to get started is to visit ianalyzer.hum.uu.nl. This server is maintained by the Research Software Lab and contains corpora focused on a variety of fields. We also maintain more specialised collections at PEACE portal and People & Parliament.

I-analyzer does not have an "upload data" option (yet!). If you are interested in using I-analyzer to publish your dataset, or to make it easier to search and analyse, you can go about this in two ways:

  • Contact us (see below for details) about hosting your dataset on one of our existing servers, or hosting a new server for your project.
  • Self-host I-analyzer. This allows you to keep full control over the data and who can access it. I-analyzer is open source software, so you are free to host it yourself, either as-is or with your own modifications. If you choose this route, feel free to contact us with any questions or issues.

Development

The documentation directory contains documentation for developers. This includes installation instructions to set up an I-analyzer server.

Licence

The source code of I-analyzer is shared under an MIT licence. See LICENSE for the full licence statement.

Images

This licence does not cover the images used for corpora, which are licensed individually. These images are located in the corpora directory, in the "images" folder for each corpus.

Each image is accompanied by a *.license file that provides licensing information for that image. If you wish to reuse or distribute this repository including these images, you will have to ensure that you also comply with the licence terms of each image.

Some images currently lack a licence file. We are working on providing clear copyright information for all images; until then, assume that these images are protected under copyright.

Citation

If you wish to cite this repository, please use the metadata provided in our CITATION.cff file.

If you wish to cite material that you accessed through I-analyzer, or you are not sure if you should also be citing this repository, please refer to the citation instructions in the user manual.

Contact

For questions, small feature suggestions, and bug reports, feel free to create an issue. If you don't have a GitHub account, you can also contact the Centre for Digital Humanities.

If you want to add a new corpus to I-analyzer, or have an idea for a project, please contact the Centre for Digital Humanities rather than making an issue, so we can discuss the possibilities with you.

Owner

  • Name: Centre for Digital Humanities
  • Login: CentreForDigitalHumanities
  • Kind: organization
  • Email: cdh@uu.nl
  • Location: Netherlands

Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: I-Analyzer
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: 'Research Software Lab, Centre for Digital Humanities, Utrecht University'
    website: 'https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/'
    city: Utrecht
    country: NL
identifiers:
  - type: doi
    value: 10.5281/zenodo.8064133
repository-code: 'https://github.com/CentreForDigitalHumanities/I-analyzer'
url: 'https://ianalyzer.hum.uu.nl'
abstract: >-
  I-analyzer is a tool for exploring corpora (large
  collections of texts). You can use I-analyzer to find
  relevant documents, or to make visualisations to
  understand broader trends in the corpus. The interface is
  designed to be accessible for users of all skill levels.

  I-analyzer is primarily intended for academic research and
  higher education. We focus on data that is relevant for
  the humanities, but we are open to datasets that are
  relevant for other fields.
keywords:
  - text-mining
  - corpus research
  - data visualization
  - elasticsearch
  - natural language processing
license: MIT
version: 5.22.1
date-released: '2025-09-03'
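
The metadata above can be turned into a plain-text reference. The sketch below is one minimal way to do that: the dictionary copies a subset of values from the CITATION.cff file shown here, while the `format_citation` helper and the output layout are assumptions made for this example, not a format prescribed by the project or by the CFF specification.

```python
# Values copied from the CITATION.cff above. The helper name and the
# citation layout are assumptions made for this illustration.
cff = {
    "title": "I-Analyzer",
    "authors": [
        {"name": "Research Software Lab, Centre for Digital Humanities, Utrecht University"}
    ],
    "version": "5.22.1",
    "date-released": "2025-09-03",
    "identifiers": [{"type": "doi", "value": "10.5281/zenodo.8064133"}],
}


def format_citation(meta: dict) -> str:
    """Build a one-line software citation from CFF-style metadata."""
    authors = "; ".join(author["name"] for author in meta["authors"])
    year = meta["date-released"][:4]  # ISO date, so the year is the first four characters
    # Use the first identifier of type "doi", if there is one.
    doi = next(
        (item["value"] for item in meta.get("identifiers", []) if item["type"] == "doi"),
        None,
    )
    parts = [
        f"{authors} ({year}).",
        f"{meta['title']} (version {meta['version']}) [software].",
    ]
    if doi:
        parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


citation = format_citation(cff)
print(citation)
```

For real use, reading the file with a YAML parser instead of copying values by hand would keep the citation in sync with each release.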

GitHub Events

Total
  • Create event: 119
  • Release event: 8
  • Issues event: 98
  • Watch event: 1
  • Delete event: 111
  • Member event: 1
  • Issue comment event: 110
  • Push event: 380
  • Pull request review event: 159
  • Pull request event: 187
  • Pull request review comment event: 119
  • Fork event: 1
Last Year
  • Create event: 119
  • Release event: 8
  • Issues event: 98
  • Watch event: 1
  • Delete event: 111
  • Member event: 1
  • Issue comment event: 110
  • Push event: 380
  • Pull request review event: 159
  • Pull request event: 187
  • Pull request review comment event: 119
  • Fork event: 1

Issues and Pull Requests

All Time
  • Total issues: 159
  • Total pull requests: 155
  • Average time to close issues: 12 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 8
  • Total pull request authors: 9
  • Average comments per issue: 1.38
  • Average comments per pull request: 0.5
  • Merged pull requests: 105
  • Bot issues: 0
  • Bot pull requests: 20
Past Year
  • Issues: 59
  • Pull requests: 133
  • Average time to close issues: 28 days
  • Average time to close pull requests: 13 days
  • Issue authors: 4
  • Pull request authors: 7
  • Average comments per issue: 0.32
  • Average comments per pull request: 0.42
  • Merged pull requests: 86
  • Bot issues: 0
  • Bot pull requests: 20
Top Authors
Issue Authors
  • BeritJanssen (59)
  • lukavdplas (55)
  • jgonggrijp (23)
  • oktaal (14)
  • JeltevanBoheemen (2)
  • Meesch (2)
  • JosedeKruif (2)
  • JelmerVNuss (2)
Pull Request Authors
  • lukavdplas (68)
  • BeritJanssen (42)
  • dependabot[bot] (20)
  • oktaal (7)
  • JeltevanBoheemen (6)
  • Meesch (5)
  • bbonf (3)
  • ar-jan (3)
  • jgonggrijp (1)
Top Labels
Issue Labels
enhancement (35) frontend (33) bug (31) backend (29) code quality (15) corpus (12) question (9) needs-mockup (3) good first issue (3) affects-elasticsearch-index (3) new feature (3) on hold (2) visualisation (2) accessibility (2) wontfix (1) help wanted (1) dependencies (1) documentation (1) major (1)
Pull Request Labels
dependencies (20) backend (18) frontend (13) python (11) enhancement (10) javascript (9) code quality (9) affects-deployment (5) bug (5) affects-elasticsearch-index (4) on hold (2) corpus (2) documentation (2) major (1) new feature (1) visualisation (1)

Dependencies

.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v1 composite
  • actions/setup-python v1 composite
  • postgres * docker
backend/Dockerfile docker
  • python 3.8-buster build
frontend/Dockerfile docker
  • node 14-alpine build
frontend/package.json npm
  • @angular-devkit/build-angular ~13.3.11 development
  • @angular-eslint/builder 13.5.0 development
  • @angular-eslint/eslint-plugin 13.5.0 development
  • @angular-eslint/eslint-plugin-template 13.5.0 development
  • @angular-eslint/schematics 13.5.0 development
  • @angular-eslint/template-parser 13.5.0 development
  • @angular/compiler-cli ^13.2.2 development
  • @angular/language-service ^13.2.2 development
  • @types/chart.js ^2.9.35 development
  • @types/d3 ^7.4.0 development
  • @types/jasmine ~3.6.0 development
  • @types/jasminewd2 ^2.0.8 development
  • @types/node ^6.14.10 development
  • @typescript-eslint/eslint-plugin 5.27.1 development
  • @typescript-eslint/parser 5.27.1 development
  • eslint ^8.17.0 development
  • eslint-plugin-import latest development
  • eslint-plugin-jsdoc latest development
  • eslint-plugin-prefer-arrow latest development
  • jasmine-core ^3.7.0 development
  • jasmine-spec-reporter ^5.0.0 development
  • karma ^6.3.16 development
  • karma-chrome-launcher ^3.1.0 development
  • karma-cli ^1.0.1 development
  • karma-coverage-istanbul-reporter ^3.0.2 development
  • karma-jasmine ^4.0.0 development
  • karma-jasmine-html-reporter ^1.5.0 development
  • pre-commit ^1.2.2 development
  • ts-node ~3.2.0 development
  • @angular/animations ^13.2.2
  • @angular/cdk 13
  • @angular/cli ^13.2.3
  • @angular/common ^13.2.2
  • @angular/compiler ^13.2.2
  • @angular/core ^13.2.2
  • @angular/forms ^13.2.2
  • @angular/localize ^13.2.2
  • @angular/platform-browser ^13.2.2
  • @angular/platform-browser-dynamic ^13.2.2
  • @angular/platform-server ^13.2.2
  • @angular/router ^13.2.2
  • @fortawesome/angular-fontawesome ^0.10.2
  • @fortawesome/fontawesome-svg-core ^6.1.1
  • @fortawesome/free-solid-svg-icons ^6.1.1
  • @types/file-saver ^1.3.1
  • @types/lodash ^4.14.149
  • balloon-css ^0.5.2
  • bulma ^0.5.1
  • bulma-switch ^2.0.0
  • chart.js ^3.7.1
  • chartjs-adapter-moment ^1.0.0
  • chartjs-plugin-zoom ^1.2.0
  • core-js ^2.6.11
  • d3 ^7.8.5
  • d3-cloud ^1.2.7
  • file-saver ^2.0.5
  • font-awesome ^4.7.0
  • html-to-image ^1.9.0
  • iv-viewer ^2.0.1
  • lodash ^4.17.21
  • marked ^4.1.1
  • moment ^2.29.4
  • ng2-pdf-viewer 6.3.2
  • ngx-cookie-service ^2.4.0
  • primeicons ^5.0.0
  • primeng 13.1.1
  • process ^0.11.10
  • rxjs ^6.5.5
  • rxjs-compat ^6.5.5
  • smoothscroll-polyfill ^0.3.6
  • terser-webpack-plugin ^5.3.0
  • tslib ^2.0.0
  • typescript 4.5.5
  • zone.js ~0.11.4
frontend/yarn.lock npm
  • 1242 dependencies
package.json npm
backend/requirements.in pypi
  • Django >=4.0.1,<5
  • Redis *
  • beautifulsoup4 *
  • celery *
  • dj-rest-auth *
  • django-livereload-server *
  • djangorestframework *
  • djangosaml2 *
  • elasticsearch *
  • gensim *
  • langcodes *
  • language_data *
  • lxml *
  • nltk *
  • numpy *
  • openpyxl *
  • pandas *
  • psycopg2 *
  • pypdf2 *
  • pytest *
  • pytest-celery *
  • pytest-django *
  • pytest-xdist *
  • scikit-learn *
  • textdistance *
  • tqdm *
backend/requirements.txt pypi
  • amqp ==5.1.1
  • asgiref ==3.6.0
  • async-timeout ==4.0.2
  • attrs ==22.2.0
  • beautifulsoup4 ==4.11.1
  • billiard ==3.6.4.0
  • celery ==5.2.7
  • certifi ==2023.7.22
  • cffi ==1.15.1
  • charset-normalizer ==3.0.1
  • click ==8.1.3
  • click-didyoumean ==0.3.0
  • click-plugins ==1.1.1
  • click-repl ==0.2.0
  • cryptography ==39.0.1
  • defusedxml ==0.7.1
  • dj-rest-auth ==2.2.7
  • django ==4.1.10
  • django-allauth ==0.52.0
  • django-livereload-server ==0.4
  • djangorestframework ==3.14.0
  • djangosaml2 ==1.5.6
  • elastic-transport ==8.4.0
  • elasticsearch ==8.6.0
  • elementpath ==4.1.1
  • et-xmlfile ==1.1.0
  • execnet ==1.9.0
  • fst-pso ==1.8.1
  • fuzzytm ==2.0.5
  • gensim ==4.3.0
  • idna ==3.4
  • iniconfig ==2.0.0
  • joblib ==1.2.0
  • kombu ==5.2.4
  • langcodes ==3.3.0
  • language-data ==1.1
  • lxml ==4.9.1
  • marisa-trie ==0.7.8
  • miniful ==0.0.6
  • nltk ==3.8.1
  • numpy ==1.24.1
  • oauthlib ==3.2.2
  • openpyxl ==3.1.2
  • packaging ==23.0
  • pandas ==1.5.3
  • pluggy ==1.0.0
  • prompt-toolkit ==3.0.36
  • psycopg2 ==2.9.5
  • pycparser ==2.21
  • pyfume ==0.2.25
  • pyjwt ==2.6.0
  • pyopenssl ==23.1.1
  • pypdf2 ==3.0.1
  • pysaml2 ==7.3.1
  • pytest ==7.2.0
  • pytest-celery ==0.0.0
  • pytest-django ==4.5.2
  • pytest-xdist ==3.1.0
  • python-dateutil ==2.8.2
  • python3-openid ==3.2.0
  • pytz ==2022.7
  • redis ==4.4.4
  • regex ==2022.10.31
  • requests ==2.31.0
  • requests-oauthlib ==1.3.1
  • scikit-learn ==1.2.1
  • scipy ==1.10.0
  • simpful ==2.9.0
  • six ==1.16.0
  • smart-open ==6.3.0
  • soupsieve ==2.3.2.post1
  • sqlparse ==0.4.4
  • textdistance ==4.5.0
  • threadpoolctl ==3.1.0
  • tornado ==6.3.3
  • tqdm ==4.64.1
  • urllib3 ==1.26.13
  • vine ==5.0.0
  • wcwidth ==0.2.6
  • xmlschema ==2.2.3