conc

A Python library for efficient corpus analysis, enabling corpus linguistic analysis in Jupyter notebooks.

https://github.com/polsci/conc

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: polsci
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage: https://geoffford.nz/conc/
  • Size: 2.71 MB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 4
Created about 1 year ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Citation

README.md

Conc

Introduction to Conc

Conc is a Python library that brings tools for corpus linguistic analysis to Jupyter notebooks. Conc aims to allow researchers to analyse large corpora in efficient ways using standard hardware, with the ability to produce clear, publication-ready reports and extend analysis where required using standard Python libraries.

Example Concordance
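For readers new to the term, a concordance lists every occurrence of a search word together with the words around it (often called keyword in context, or KWIC). The sketch below is plain Python written for illustration only. It is not Conc's API, the `kwic` function name is hypothetical, and it uses naive whitespace splitting where Conc uses spaCy for tokenisation.

```python
def kwic(tokens, node, window=3):
    """Return (left, match, right) tuples for each occurrence of `node`.

    Hypothetical helper for illustration; not part of Conc.
    """
    node = node.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            # Up to `window` tokens on each side of the matched token.
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


# Naive whitespace tokenisation, for illustration only.
text = "the cat sat on the mat and the dog sat on the rug"
for left, match, right in kwic(text.split(), "sat", window=3):
    # Right-align the left context so the matched word lines up in a column.
    print(f"{left:>20} | {match} | {right}")
```

Aligning the matched word in a central column is what makes patterns of usage easy to scan by eye.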

A staple of data science, Jupyter notebooks allow researchers to present their analysis in an interactive form that combines code, reporting and discussion. They are an ideal format for collaborating with other researchers during research, or for sharing analysis in a way others can reproduce and interact with.

Conc uses spaCy for tokenising texts. Support for spaCy’s text annotation functionality is coming soon.

Conc uses well-supported Python libraries for processing data and prioritises fast code libraries and data structures. By default, the library produces clear reports that include the information needed to interpret results. Conc makes it easy to extend analysis using other libraries or software. Conc’s corpus format is well-documented, and there are code examples to help you work with Conc results and data structures outside of Conc if you want to extend your analysis.

Conc’s documentation site has more information on Conc, why it was developed and the principles guiding Conc’s development.

Acknowledgements

Conc is developed by Dr Geoff Ford.

Work to create this Python library has been made possible by funding/support from:

  • “Mapping LAWS: Issue Mapping and Analyzing the Lethal Autonomous Weapons Debate” (Royal Society of New Zealand’s Marsden Fund Grant 19-UOC-068)
  • “Into the Deep: Analysing the Actors and Controversies Driving the Adoption of the World’s First Deep Sea Mining Governance” (Royal Society of New Zealand’s Marsden Fund Grant 22-UOC-059)
  • Sabbatical, University of Canterbury, Semester 1 2025.

Thanks to Jeremy Moses and Sian Troath from the Mapping LAWS project team for their support and feedback as first users of ConText (a web-based application built on an earlier version of Conc that I’ve also recently released).

Dr Ford is a researcher with Te Pokapū Aronui ā-Matihiko | UC Arts Digital Lab (ADL). Thanks to the ADL team and the ongoing support of the University of Canterbury’s Faculty of Arts who make work like this possible.

Thanks to Dr Chris Thomson and Karin Stahel for their feedback on early versions of Conc.

Above all, thanks to my family for their love, patience and kindness.

Development Status

Conc is in active development. It is currently released for beta testing. See the CHANGELOG for notes on releases and the Roadmap for planned updates.

Although this is a beta release, I’m currently using Conc for research and postgraduate teaching. I’m keen to support new users. If you have any questions, encounter hurdles using Conc or have feature requests, create an issue.

Installation

Installing Conc is simple. The essential steps are below; the installation page has more information. You can also install the development version of Conc, which may include new functionality and bug fixes. If you want to download sample corpora, you will need to install optional dependencies. If you have an older computer with a pre-2013 CPU, you will probably need to install a version of Polars compiled for older machines; see the install page for details.

1. Install via pip

Conc is tested with Python 3.10+. You can install Conc from PyPI using this command:

    pip install conc

Add the -U flag to upgrade if you are already running Conc.

2. Install a spaCy model for tokenization

Conc uses a spaCy language model for tokenization. After installing Conc, install a model. If you are working with English-language texts, install spaCy’s small English model (which is Conc’s default) like this:

    python -m spacy download en_core_web_sm

If you are working with a different language or want to use a different ‘en’ model, check the spaCy models documentation for the relevant model name.
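After both installation steps, a quick sanity check can confirm the environment is ready. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the import names match the package names used above (`conc`, `spacy` and `en_core_web_sm`). It only checks that the packages can be found and does not load the model.

```python
import sys
from importlib.util import find_spec


def meets_minimum_python(minimum=(3, 10), version=None):
    """Return True if the interpreter version is at least `minimum`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def is_installed(package_name):
    """Return True if a top-level package can be found, without importing it."""
    return find_spec(package_name) is not None


print("Python 3.10+:", meets_minimum_python())
# Import names below are assumed to match the names used in the install commands.
for name in ("conc", "spacy", "en_core_web_sm"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If any line reports `missing`, rerun the matching install command in the same environment that runs your notebook kernel.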

Using Conc

Getting started

A good place to start is the Get started with Conc tutorial, which demonstrates how to build a corpus and output Conc reports. There are also simple code recipes for common Conc tasks.

Conc Documentation

There is a dedicated Conc documentation site. This includes tutorials, examples demonstrating how to create reports for analysis, explanation of Conc functionality and its Corpus format, and a reference to Conc’s classes and methods. Here are links to the documentation site sections:

  • Tutorials to get you started with Conc
  • The Explanations section includes information on how Conc works and how to use the Conc corpus format and Conc results with other Python libraries
  • The Conc API Reference provides detailed documentation of Conc classes and functions
  • The Development section gives information on Conc development, including a Roadmap and Developer’s Guide

Owner

  • Name: Geoff Ford
  • Login: polsci
  • Kind: user
  • Location: Ōtautahi, NZ
  • Company: University of Canterbury Arts Digital Lab

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in your research, please cite it as below."
authors:
- family-names: "Ford"
  given-names: "Geoffrey"
  orcid: "https://orcid.org/0000-0001-7088-4073"
title: "Conc: a Python library for efficient corpus analysis"
version: 0.1.13
doi: 10.5281/zenodo.16358752
date-released: 2025-07-23
url: "https://github.com/polsci/conc"
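If your reference manager expects BibTeX rather than CFF, the fields above can be converted by hand. The sketch below is one possible rendering, not an official citation format from the author. The `to_bibtex` helper and the `ford_conc_2025` citation key are hypothetical, and the `@software` entry type assumes a BibLaTeX-style toolchain.

```python
# Values copied from the CITATION.cff fields above.
citation = {
    "family_names": "Ford",
    "given_names": "Geoffrey",
    "title": "Conc: a Python library for efficient corpus analysis",
    "version": "0.1.13",
    "doi": "10.5281/zenodo.16358752",
    "date_released": "2025-07-23",
    "url": "https://github.com/polsci/conc",
}


def to_bibtex(meta, key="ford_conc_2025"):
    """Render citation metadata as a BibLaTeX-style @software entry."""
    fields = {
        "author": f"{meta['family_names']}, {meta['given_names']}",
        "title": meta["title"],
        "version": meta["version"],
        "year": meta["date_released"][:4],  # first four characters of YYYY-MM-DD
        "doi": meta["doi"],
        "url": meta["url"],
    }
    lines = [f"@software{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


print(to_bibtex(citation))
```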

GitHub Events

Total
  • Create event: 5
  • Issues event: 1
  • Release event: 1
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 102
  • Public event: 1
Last Year
  • Create event: 5
  • Issues event: 1
  • Release event: 1
  • Watch event: 6
  • Issue comment event: 1
  • Push event: 102
  • Public event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 370
  • Total Committers: 1
  • Avg Commits per committer: 370.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 370
  • Committers: 1
  • Avg Commits per committer: 370.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
polsci p****i 370
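The DDS of 0.0 follows directly from having a single committer. A minimal sketch, assuming the commonly used definition of DDS as one minus the top committer's share of all commits (the page itself does not state the formula, and the function name is hypothetical):

```python
def development_distribution_score(commits_per_committer):
    """Return 1 - (top committer's commits / total commits).

    Assumes the common DDS definition; returns 0.0 when there are no commits.
    """
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# One committer with 370 commits, as in the table above.
print(development_distribution_score([370]))  # 0.0
```

Under this definition, a score near 0 means one person wrote almost all commits, while higher scores indicate work spread across more contributors.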

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 869 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 14
  • Total maintainers: 1
pypi.org: conc

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 869 Last month
Rankings
Dependent packages count: 9.0%
Forks count: 31.3%
Average: 33.1%
Stargazers count: 41.3%
Dependent repos count: 50.8%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/deploy.yaml actions
  • fastai/workflows/quarto-ghp master composite
.github/workflows/test.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • fastai/workflows/nbdev-ci master composite
pyproject.toml pypi
setup.py pypi