annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.

https://github.com/natlibfi/annif

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Committers with academic emails
    1 of 19 committers (5.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.8%) to scientific vocabulary

Keywords

annif annotation-tool classification code4lib connexion flask-application glam machine-learning multilabel-classification python rest-api subject-indexing text-classification

Keywords from Contributors

conjugation transformation interactive lemmatization network-simulation tokenizer hacking embedded optim projection
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: NatLibFi
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage: https://annif.org
  • Size: 9.41 MB
Statistics
  • Stars: 238
  • Watchers: 13
  • Forks: 43
  • Open Issues: 72
  • Releases: 0
Created over 8 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation Security

README.md

Annif is an automated subject indexing toolkit. It was originally created as a statistical automated indexing tool that used metadata from the Finna.fi discovery interface as a training corpus.

Annif provides CLI commands for administration, and a REST API and web UI for end-users.
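As a rough illustration of the REST API mentioned above, the sketch below builds a request that asks an Annif project for subject suggestions. The endpoint path `/v1/projects/<project_id>/suggest`, the form fields `text` and `limit`, the base URL `http://localhost:5000` and the project id `my-project` are assumptions that this README does not state; check the API documentation of your own Annif instance before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_suggest_request(base_url, project_id, text, limit=10):
    """Build a POST request for subject suggestions (assumed endpoint layout)."""
    # Assumed path: /v1/projects/<project_id>/suggest
    url = f"{base_url.rstrip('/')}/v1/projects/{quote(project_id, safe='')}/suggest"
    # Assumed form fields: 'text' (document to index) and 'limit' (max results)
    body = urlencode({"text": text, "limit": limit}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def suggest(base_url, project_id, text, limit=10):
    """Send the request and return the decoded JSON response."""
    with urlopen(build_suggest_request(base_url, project_id, text, limit)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local instance and project id; the request is only built, not sent.
    req = build_suggest_request(
        "http://localhost:5000", "my-project", "Libraries and archives"
    )
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without a running Annif server; `suggest()` needs a reachable instance with at least one trained project.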

Finto AI is a service based on Annif; see a 🤗 Hugging Face Hub collection of the models that Finto AI uses.

This repository contains a rewritten production version of Annif based on the prototype.

Basic install

Annif is developed and tested on Linux. To run Annif on Windows or macOS, the recommended way is to use Docker (see below) or a Linux virtual machine.

You will need Python 3.10-3.13 to install Annif.
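A minimal sketch for checking whether an interpreter falls inside the range stated above before installing. The helper name `is_supported` is hypothetical and not part of Annif; the range bounds are taken from the sentence above.

```python
import sys

# Supported range as stated above: Python 3.10 through 3.13 (inclusive)
MIN_SUPPORTED = (3, 10)
MAX_SUPPORTED = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version lies within the stated range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    verdict = "supported" if is_supported() else "outside the stated range"
    print(f"Python {current}: {verdict}")
```

Run it with the same `python3` that will create the virtual environment, since the interpreter version is inherited by the venv.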

The recommended way is to install Annif from PyPI into a virtual environment.

python3 -m venv annif-venv
source annif-venv/bin/activate
pip install annif

Start up the application:

annif

See Getting Started for basic usage instructions. See Optional features and dependencies for instructions on installing optional components, e.g. the fastText and Omikuji backends and the Voikko and spaCy analyzers.

Shell completions

Annif supports tab-key completion in the bash, zsh and fish shells for commands, options, and project id, vocabulary id and path parameters. Completion is not enabled automatically when Annif is installed; for instructions on enabling it, run

annif completion --help

or see this wiki page.

Docker install

Annif is available as a pre-built Docker container image from the quay.io/natlibfi/annif repository. Please see the wiki documentation for details.

Demo install in Codespaces

Annif can be tried out in GitHub Codespaces. Open the page for configuring a new codespace via the badge below and start the codespace with the green "Create codespace" button; a terminal session will then start in your browser. The environment will have Annif installed and the contents of the Annif-tutorial repository available.

Open in GitHub Codespaces

Development install

A development version of Annif can be installed by cloning the GitHub repository. Poetry is used for managing the dependencies and the virtual environment of the development version; Poetry 2.0+ is required.

See CONTRIBUTING.md for information on unit tests, code style, development flow and other details that are useful when participating in Annif development.

Installation and setup

Clone the repository.

Switch into the repository directory.

Install pipx and Poetry if you don't have them. First pipx:

python3 -m pip install --user pipx
python3 -m pipx ensurepath

Open a new shell, and then install Poetry:

pipx install "poetry==2.*"

Poetry can also be installed without pipx; see the Poetry documentation.

Create a virtual environment and install dependencies:

poetry install

By default, development dependencies are included. Use the -E option to install dependencies for selected optional features (-E "extra1 extra2" for multiple extras), or install all of them with --all-extras. By default, the virtual environment directory is not under the project directory, but there is a setting for changing this.

Enter the virtual environment:

eval $(poetry env activate)

Start up the application:

annif

Getting help

Many resources are available:

Publications / How to cite

See below for some articles about Annif in peer-reviewed Open Access journals. The software itself is also archived on Zenodo and has a citable DOI.

Citing the software itself

See "Cite this repository" in the details of the repository.

Annif articles

  • Suominen, O.; Inkinen, J.; Lehtinen, M., 2025. Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs, pre-print. https://arxiv.org/abs/2508.15877
    See BibTeX:
    @misc{suominen2025annifgermeval2025,
      title={Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs},
      author={Osma Suominen and Juho Inkinen and Mona Lehtinen},
      year={2025},
      eprint={2508.15877},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.15877},
    }
  • Suominen, O.; Inkinen, J.; Lehtinen, M., 2025. Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pp. 2424–2431, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.semeval-1.315/ https://arxiv.org/abs/2504.19675
    See BibTeX:
    @inproceedings{suominen2025annifsemeval2025task5,
      title = "Annif at {S}em{E}val-2025 Task 5: Traditional {XMTC} augmented by {LLM}s",
      author = "Suominen, Osma and Inkinen, Juho and Lehtinen, Mona",
      editor = "Rosenthal, Sara and Ros{\'a}, Aiala and Ghosh, Debanjan and Zampieri, Marcos",
      booktitle = "Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)",
      month = jul,
      year = "2025",
      address = "Vienna, Austria",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/2025.semeval-1.315/",
      pages = "2424--2431",
      ISBN = "979-8-89176-273-2",
      eprint = "2504.19675",
      archivePrefix = "arXiv",
      primaryClass = "cs.CL",
    }
  • Inkinen, J.; Lehtinen, M.; Suominen, O., 2025. Annif Users Survey: Understanding Usage and Challenges. URL: https://urn.fi/URN:ISBN:978-952-84-1301-1
    See BibTeX:
    @misc{inkinen2025,
      title={Annif Users Survey: Understanding Usage and Challenges},
      author={Inkinen, Juho and Lehtinen, Mona and Suominen, Osma},
      series={The National Library of Finland. Reports and Studies},
      issn={2242-8119},
      isbn={978-952-84-1301-1},
      year={2025},
      url={https://urn.fi/URN:ISBN:978-952-84-1301-1},
    }
  • Golub, K.; Suominen, O.; Mohammed, A.; Aagaard, H.; Osterman, O., 2024. Automated Dewey Decimal Classification of Swedish library metadata using Annif software. Journal of Documentation, 80(5), pp. 1057-1079. URL: https://doi.org/10.1108/JD-01-2022-0026
    See BibTeX:
    @article{golub2024annif,
      title={Automated Dewey Decimal Classification of Swedish library metadata using Annif software},
      author={Golub, Koraljka and Suominen, Osma and Mohammed, Ahmed Taiye and Aagaard, Harriet and Osterman, Olof},
      journal={J. Doc.},
      volume={80},
      number={5},
      pages={1057--1079},
      year={2024},
      doi={10.1108/JD-01-2022-0026},
      url={https://www.emerald.com/insight/content/doi/10.1108/JD-01-2022-0026},
    }
  • Suominen, O.; Inkinen, J.; Lehtinen, M., 2022. Annif and Finto AI: Developing and Implementing Automated Subject Indexing. JLIS.It, 13(1), pp. 265–282. URL: https://www.jlis.it/index.php/jlis/article/view/437
    See BibTeX:
    @article{suominen2022annif,
      title={Annif and Finto AI: Developing and Implementing Automated Subject Indexing},
      author={Suominen, Osma and Inkinen, Juho and Lehtinen, Mona},
      journal={JLIS.it},
      volume={13},
      number={1},
      pages={265--282},
      year={2022},
      doi={10.4403/jlis.it-12740},
      url={https://www.jlis.it/index.php/jlis/article/view/437},
    }
  • Suominen, O.; Koskenniemi, I., 2022. Annif Analyzer Shootout: Comparing text lemmatization methods for automated subject indexing. Code4Lib Journal, (54). URL: https://journal.code4lib.org/articles/16719
    See BibTeX:
    @article{suominen2022analyzer,
      title={Annif Analyzer Shootout: Comparing text lemmatization methods for automated subject indexing},
      author={Suominen, Osma and Koskenniemi, Ilkka},
      journal={Code4Lib J.},
      number={54},
      year={2022},
      url={https://journal.code4lib.org/articles/16719},
    }
  • Suominen, O., 2019. Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly, 29(1), pp. 1–25. DOI: https://doi.org/10.18352/lq.10285
    See BibTeX:
    @article{suominen2019annif,
      title={Annif: DIY automated subject indexing using multiple algorithms},
      author={Suominen, Osma},
      journal={{LIBER} Quarterly},
      volume={29},
      number={1},
      pages={1--25},
      year={2019},
      doi={10.18352/lq.10285},
      url={https://doi.org/10.18352/lq.10285},
    }

License

The code in this repository is licensed under Apache License 2.0, except for the dependencies included under annif/static/css and annif/static/js, which have their own licenses; see the file headers for details.

Please note that the YAKE library is licensed under GPLv3, while Annif itself is licensed under the Apache License 2.0. It is commonly accepted that the GPLv3 and Apache 2.0 licenses are compatible at least in one direction (GPLv3 is more restrictive than the Apache License); obviously it also depends on the legal environment. The Annif developers make no legal claims - we simply provide the software and allow the user to install optional extensions if they consider it appropriate. Depending on legal interpretation, the terms of the GPL (for example the requirement to publish corresponding source code when publishing an executable application) may be considered to apply to the whole of Annif+extensions if you decide to install the optional YAKE dependency.

Owner

  • Name: National Library of Finland
  • Login: NatLibFi
  • Kind: organization
  • Email: kk-it-tukipalvelut@helsinki.fi
  • Location: Helsinki, Finland

Repositories of the National Library of Finland (Reporting security issues: https://www.helsinki.fi/en/it/information-security-contact-details)

GitHub Events

Total
  • Fork event: 2
  • Create event: 58
  • Release event: 4
  • Issues event: 28
  • Watch event: 36
  • Delete event: 46
  • Member event: 1
  • Issue comment event: 261
  • Push event: 174
  • Pull request review event: 41
  • Pull request review comment event: 24
  • Pull request event: 87
  • Gollum event: 25
Last Year
  • Fork event: 2
  • Create event: 58
  • Release event: 4
  • Issues event: 28
  • Watch event: 36
  • Delete event: 46
  • Member event: 1
  • Issue comment event: 261
  • Push event: 174
  • Pull request review event: 41
  • Pull request review comment event: 24
  • Pull request event: 87
  • Gollum event: 25

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 2,705
  • Total Committers: 19
  • Avg Commits per committer: 142.368
  • Development Distribution Score (DDS): 0.525
Past Year
  • Commits: 153
  • Committers: 5
  • Avg Commits per committer: 30.6
  • Development Distribution Score (DDS): 0.405
Top Committers
Name Email Commits
Juho Inkinen j****n@h****i 1,285
Osma Suominen o****n@h****i 1,221
Tuomo Virolainen t****n@h****i 63
Moritz Fuerneisen m****n@z****u 30
Sara Veldhoen s****n@k****l 18
dependabot[bot] 4****] 17
Unni Kohonen u****n@a****i 16
Bruno P. Kinoshita k****w 11
Inkinen Juho M M j****n@l****i 11
Bruno P. Kinoshita b****a@n****z 10
Mats Sjöberg m****g@h****i 7
monalehtinen 5****n 4
Christopher Bartz c****z@z****u 4
Sara Veldhoen S****0@w****l 3
Donny Winston d****n@a****u 1
LGTM Migrator l****r 1
Philipp Zumstein z****p@g****m 1
Robin Neatherway r****y@g****m 1
StepSecurity Bot b****t@s****o 1
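The "All Time" summary figures can be reproduced from the totals and the top row of the table above. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; this page does not state the definition, so treat it as an assumption that happens to match the reported value. The function names are illustrative only.

```python
# Figures copied from the "All Time" list and the first row of the table above
total_commits = 2705
total_committers = 19
top_committer_commits = 1285


def average_commits(commits, committers):
    """Mean number of commits per committer."""
    return commits / committers


def development_distribution_score(top_commits, commits):
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return 1 - top_commits / commits


avg = round(average_commits(total_commits, total_committers), 3)
dds = round(development_distribution_score(top_committer_commits, total_commits), 3)
print(avg, dds)  # prints: 142.368 0.525
```

A score near 0 would mean a single committer wrote almost everything; 0.525 reflects that the two most active committers have contributed nearly equal shares. The "Past Year" score cannot be checked the same way because per-committer counts for that period are not listed here.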

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 89
  • Total pull requests: 272
  • Average time to close issues: 10 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 17
  • Total pull request authors: 12
  • Average comments per issue: 2.02
  • Average comments per pull request: 3.15
  • Merged pull requests: 201
  • Bot issues: 0
  • Bot pull requests: 72
Past Year
  • Issues: 23
  • Pull requests: 108
  • Average time to close issues: 17 days
  • Average time to close pull requests: 12 days
  • Issue authors: 5
  • Pull request authors: 5
  • Average comments per issue: 0.83
  • Average comments per pull request: 2.33
  • Merged pull requests: 71
  • Bot issues: 0
  • Bot pull requests: 31
Top Authors
Issue Authors
  • osma (37)
  • juhoinkinen (33)
  • mo-fu (5)
  • kinow (1)
  • CrazyCrud (1)
  • AhtiAhde (1)
  • lunactic (1)
  • annakasprzik (1)
  • s0rin (1)
  • kdw2060 (1)
  • hekl (1)
  • RietdorfC (1)
  • psmukhopadhyay (1)
  • nwagner84 (1)
  • mfakaehler (1)
Pull Request Authors
  • juhoinkinen (124)
  • dependabot[bot] (71)
  • osma (55)
  • UnniKohonen (9)
  • monalehtinen (4)
  • dwinston (2)
  • tavallaie (2)
  • lgtm-com[bot] (1)
  • cbartz (1)
  • Lakshmi-bashyam (1)
  • step-security-bot (1)
  • mo-fu (1)
Top Labels
Issue Labels
enhancement (56) maintenance (14) bug (8) help wanted (4) docker (3) question (2) documentation (2) dependencies (2) website (1) webUI (1)
Pull Request Labels
dependencies (91) enhancement (74) maintenance (68) bug (46) github_actions (20) docker (13) python (11) documentation (6)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 586 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 51
  • Total maintainers: 2
pypi.org: annif

Automated subject indexing and classification tool

  • Versions: 51
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 586 Last month
Rankings
Stargazers count: 5.3%
Forks count: 6.4%
Dependent packages count: 10.1%
Average: 11.5%
Downloads: 14.0%
Dependent repos count: 21.5%
Maintainers (2)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • docutils <0.18
  • sphinx ==4.5.
  • sphinx-rtd-theme *
  • sphinxcontrib-apidoc ==0.3.0
.github/workflows/cicd.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python 5ccb29d8773c3f3f653e1705f474dfaa8a06a912 composite
  • codecov/codecov-action 81cd2dc8148241f03f5839d295e000b8f761e378 composite
  • docker/build-push-action c56af957549030174b10d6867f20e78cfd7debc5 composite
  • docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite
  • docker/metadata-action 57396166ad8aefe6098280995947635806a0e6ea composite
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
Dockerfile docker
  • python 3.8-slim-bullseye build
docker-compose.yml docker
  • nginx latest
  • quay.io/natlibfi/annif latest
pyproject.toml pypi
  • black 23.* develop
  • bumpversion * develop
  • codecov * develop
  • flake8 * develop
  • isort * develop
  • py * develop
  • pytest * develop
  • pytest-cov * develop
  • pytest-flask * develop
  • pytest-watch * develop
  • requests * develop
  • click 8.1.*
  • click-log *
  • connexion 2.14.*
  • fasttext-wheel 0.9.2
  • flask >=1.0.4,<3
  • flask-cors *
  • gensim 4.3.*
  • gunicorn *
  • joblib 1.2.*
  • lmdb 1.4.0
  • nltk *
  • numpy 1.24.*
  • omikuji 0.5.*
  • optuna 2.10.*
  • python >=3.8,<3.11
  • python-dateutil *
  • rdflib >=4.2,<7.0
  • scikit-learn 1.2.0
  • scipy 1.10.*
  • simplemma 0.9.*
  • spacy 3.4.*
  • stwfsapy 0.3.*
  • swagger_ui_bundle *
  • tensorflow-cpu 2.11.*
  • tomli 2.0.*
  • voikko *
  • yake 0.4.5