https://github.com/ausgerechnet/cwb-cads

CWB-based API for Corpus-Assisted Discourse Studies

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ausgerechnet
License: gpl-3.0
Language: TypeScript
Default Branch: main
Size: 11.4 MB

Statistics

Stars: 3
Watchers: 2
Forks: 0
Open Issues: 4
Releases: 0

Created almost 3 years ago · Last pushed 10 months ago

Metadata Files

Readme, License, Roadmap

cwb-cads: CWB-based API for Corpus-Assisted Discourse Studies

- implemented in Python/APIFlask
  - JWT authorisation
  - interactive OpenAPI documentation at cwb-cads/docs
- uses cwb-ccc for connecting to CWB
  - CWB must be installed and corpora must be encoded via cwb-encode
  - meta data can be stored separately or be parsed from s-attributes
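
The JWT authorisation mentioned above means that a client presents a token with each request. The following is a minimal sketch of how such a request might be assembled, not the project's documented client code. It assumes the token travels in an `Authorization: Bearer ...` header (the default of Flask-JWT-Extended, which appears among the pinned dependencies). The base URL, the `/corpus/` path, and the helper name `authorised_request` are hypothetical placeholders; the interactive OpenAPI documentation is the place to look up the real endpoints.

```python
import json
import urllib.request

# Assumption: a local development server; adjust to your deployment.
API_BASE = "http://localhost:5000"


def authorised_request(path, token, payload=None):
    """Build (but do not send) a request that carries a bearer token.

    `path` and `payload` are illustrative; the real endpoint names and
    request bodies are defined by the API's OpenAPI documentation.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(API_BASE + path, data=data)
    # Assumed header scheme: "Authorization: Bearer <token>"
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# Hypothetical endpoint and placeholder token, for illustration only.
req = authorised_request("/corpus/", token="<access token>")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without a server.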

Reference

The functionality is explained in detail in Heinrich & Evert (2024).

Abstract: We propose a framework for quantitative-qualitative research in corpus-assisted discourse studies (CADS), which operationalises the central process of manually forming groups of related words and phrases in terms of “discoursemes” and their constellations. We introduce an open-source implementation of this framework in the form of a REST API based on Corpus Workbench. Going through the workflow of a collocation analysis for fleeing and related terms in the German Federal Parliament, the paper gives details about the underlying algorithms, with available parameters and further possible choices. We also address multi-word units (which are often disregarded by CADS tools), a semantic map visualisation of collocations, and how to compute associations between discoursemes.

    @InProceedings{HeinrichEvert2024,
      author    = {Heinrich, Philipp and Evert, Stephanie},
      title     = {Operationalising the Hermeneutic Grouping Process in Corpus-assisted Discourse Studies},
      booktitle = {Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers},
      year      = {2024},
      editor    = {Klamm, Christopher and Lapesa, Gabriella and Ponzetto, Simone Paolo and Rehbein, Ines and Sen, Indira},
      pages     = {33--44},
      address   = {Vienna, Austria},
      month     = sep,
      publisher = {Association for Computational Linguistics},
      url       = {https://aclanthology.org/2024.cpss-1.3}
    }

Manual

We provide detailed information regarding general CADS functionality in the manual and details on MMDA functionality in the MMDA manual.

Installation and Configuration

We recommend installing all dependencies of the API in a virtual environment:

    python3 -m venv venv
    . venv/bin/activate
    pip3 install -r requirements.txt

The API is configured using cfg.py in the top-level directory. Use the example config as a starting point. It provides stage-specific configs, which are activated using the CWB_CADS_CONFIG environment variable, e.g.:

    export CWB_CADS_CONFIG=cfg.DevConfig

Initialise the database:

    flask --app cads database init

Import corpus settings from a JSON file:

    flask --app cads corpus import ${corpora.json}

Meta data can be imported from separate files or from within the XML data stored in structural attributes of indexed corpora:

    flask --app cads corpus read-meta ${cwb_id} --level "text"

You can also import pre-defined subcorpora using a TSV file:

    flask --app cads corpus subcorpora ${cwb_id} ${subcorpora.tsv}

Discoursemes can be imported using a TSV file:

    flask --app cads discourseme import --path_in ${discoursemes.tsv}

and can similarly be exported:

    flask --app cads discourseme export --path_out ${discoursemes.tsv}

Start the development server:

    flask --app cads --debug run
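
To illustrate the configuration step above, here is a minimal sketch of how a value such as `cfg.DevConfig` in `CWB_CADS_CONFIG` could be resolved to a class of settings. It is not the project's actual loading code. Only the names `cfg`, `DevConfig`, and `CWB_CADS_CONFIG` come from the text above; the base class `Config`, the keys `DEBUG` and `DB_NAME`, and the helper `resolve_config` are hypothetical and stand in for whatever the example config really defines.

```python
import importlib
import os
import sys
import types


class Config:
    """Hypothetical base settings shared by all stages."""
    DEBUG = False
    DB_NAME = "cads.sqlite"  # assumed key, not taken from the repository


class DevConfig(Config):
    """Hypothetical development stage: overrides only what differs."""
    DEBUG = True
    DB_NAME = "cads-dev.sqlite"


def resolve_config(dotted_path):
    """Turn a 'module.ClassName' string into the class it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for a real cfg.py so that this sketch runs on its own.
cfg = types.ModuleType("cfg")
cfg.Config, cfg.DevConfig = Config, DevConfig
sys.modules["cfg"] = cfg

# Fall back to the development stage if the variable is not set.
selected = resolve_config(os.environ.get("CWB_CADS_CONFIG", "cfg.DevConfig"))
print(selected.__name__, selected.DEBUG)
```

In a Flask application the same dotted string can be handed to `app.config.from_object`, which performs this lookup itself and copies the upper-case attributes into the app configuration.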

Frontend

The repository contains a beta version of a frontend supporting MMDA functionality.

Requirements:
- node.js
- nvm (node version manager) is recommended

Setup:
- Navigate to frontend/
- Install the correct node version. If you have nvm installed, you can just run:

      nvm install

  And to use it:

      nvm use

  Otherwise, install the correct node version manually as specified in .nvmrc
- Install node dependencies:

      npm install

- Specify the API in vite.config.ts. By default, this points to our development server.
- Run the development build of the frontend:

      npm run dev

Production

- set target in frontend/mmda/vite.config.ts
- set frontend URL VITE_ROUTER_BASEPATH in frontend/mmda/.env.production
- set backend URL VITE_API_URL in frontend/mmda/.env.production
- run npm run build and deploy mmda/dist/

Owner

Name: Philipp Heinrich
Login: ausgerechnet
Kind: user
Location: Erlangen
Company: @fau-klue

Website: https://philipp-heinrich.eu
Repositories: 2
Profile: https://github.com/ausgerechnet

GitHub Events

Total

Issues event: 1
Watch event: 1
Delete event: 2
Issue comment event: 1
Push event: 148
Pull request event: 1
Create event: 1

Last Year

Issues event: 1
Watch event: 1
Delete event: 2
Issue comment event: 1
Push event: 148
Pull request event: 1
Create event: 1

Dependencies

requirements.txt pypi

APIFlask ==1.3.1
Flask ==2.3.2
Flask-CORS ==4.0.0
Flask-HTTPAuth ==4.8.0
Flask-JWT-Extended ==4.5.2
Flask-Login ==0.6.2
Flask-SQLAlchemy ==3.0.5
SQLAlchemy ==1.4.49
pytest ==7.4.0
