sasta

Annotates speech transcripts and scores them using diagnostic metrics

https://github.com/centrefordigitalhumanities/sasta

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: CentreForDigitalHumanities
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: develop
  • Size: 12.4 MB
Statistics
  • Stars: 1
  • Watchers: 5
  • Forks: 0
  • Open Issues: 39
  • Releases: 11
Created over 6 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog License Citation

README.md

SASTA: Semi-Automatische Spontane Taal Analyse

SASTA is a tool for the analysis of spontaneous language transcripts, to aid clinical linguists and research into language development and language disorders. SASTA analyzes a transcript grammatically using Alpino, an automatic utterance parser for Dutch, and can recognize a significant number of forms of deviant language use and analyze them correctly, following multiple assessment methods available for Dutch (TARSP, STAP and ASTA).

Overview

  • SASTA can analyze transcripts following multiple assessment methods available for Dutch:
    • TARSP (Schlichting 2005, 2017) for young children (1–4 years), inspired by LARSP for English (Crystal et al. 1989);
    • STAP (Verbeek et al. 2007, van Ierland et al. 2008) for older children (4–8 years);
    • ASTA (Boxum et al. 2013) for adults suffering from aphasia.
  • SASTA generates as output a method-specific form and an annotated transcript. The generated transcript can be corrected by a linguist, if needed, and re-uploaded into SASTA, after which SASTA generates an adapted method-specific form. Overall, SASTA achieves an accuracy between 88 and 95% on training data for TARSP and STAP.
  • SASTA accepts as input transcripts in MS Word or plain text (given some SASTA-specific requirements), as well as CHAT (MacWhinney 2000), and uses AuCHAnn to generate valid CHAT files for transcripts accompanied by an interpretation, which significantly improves results.
  • SASTA analyzes a transcript grammatically using Alpino. It then uses specially constructed (XPath) queries for all measures defined within the assessment method to count the frequencies of linguistic phenomena in the spontaneous language sample. As such, SASTA may be considered a spin-off of GrETEL, which can be used to investigate syntactic phenomena using query-by-example.
  • Further development of SASTA is ongoing, in close collaboration with researchers in language development and with linguists in clinics.
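
The query-based counting described in the list above can be illustrated with a minimal sketch. It is not SASTA's actual implementation: the parse below is a hand-written, simplified tree in the style of Alpino's XML output, and the measure names and XPath expressions in `QUERIES` are hypothetical. The sketch uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath; the real queries are likely more elaborate than this subset allows.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified, hand-written parse in the style of Alpino's XML output.
# Real parses carry many more attributes; this tree is an assumption
# made for illustration only.
PARSE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" word="ik" lemma="ik"/>
      <node rel="hd" pt="ww" word="zie" lemma="zien"/>
      <node cat="np" rel="obj1">
        <node rel="det" pt="lid" word="de" lemma="de"/>
        <node rel="hd" pt="n" word="poes" lemma="poes"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""

# Hypothetical measures, NOT the queries SASTA actually uses.
QUERIES = {
    "verbs": ".//node[@pt='ww']",
    "noun_phrases": ".//node[@cat='np']",
    "subjects": ".//node[@rel='su']",
}


def count_measures(parses, queries):
    """Count, per measure, how many nodes match its query across all parses."""
    counts = Counter()
    for xml_text in parses:
        root = ET.fromstring(xml_text.strip())
        for name, xpath in queries.items():
            counts[name] += len(root.findall(xpath))
    return counts


# Treat the same utterance as a two-utterance sample.
counts = count_measures([PARSE, PARSE], QUERIES)
print(dict(counts))
```

Keeping each measure as a named query means that supporting another assessment method amounts to supplying a different query table, while the counting loop stays the same.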

Contents

This repository contains the source code for the SASTA web application, which consists of a Django backend and Angular frontend.

This repository does not include input data, as these can be privacy-sensitive. Refer to the documentation for instructions on constructing your own input data.

Sastadev

SASTA relies on a Python package called sastadev in the backend. This package is freely available on GitHub, with documentation available on Read the Docs.

Usage

If you are interested in using SASTA, the most straightforward way to get started is to make an account at sasta.hum.uu.nl. This server is maintained by the Research Software Lab and runs the most current release.

Consult the user documentation for all information on using the application, input formats, and output formats.

Self-hosting is an option, though support by the Research Software Lab is not provided.

Development

The documentation directory contains documentation for developers. This includes running the application through Docker.

License

SASTA is shared under a BSD 3-Clause licence. See LICENSE for more information.

Citation

If you wish to cite this repository, please use the metadata provided in our CITATION.cff file.

Contact

For questions, small feature suggestions, and bug reports, feel free to create an issue. You can also contact the Centre for Digital Humanities.

Publications on SASTA

  • Odijk, J. (2021). Towards Semi-Automatic Analysis of Spontaneous Language for Dutch. In Selected papers from the CLARIN Annual Conference 2020 (Vol. 180, pp. 165-175). (Linköping Electronic Conference Proceedings). Linköping University Press. https://doi.org/10.3384/ecp18018
  • Renckens, E., & Odijk, J. (2021). Online tool SASTA analyseert taal. eData & Research, 15(2), 7-7. https://edata.nl/2021/02/10/online-tool-sasta-analyseert-taal/

Other relevant publications

  • Boxum, E., van der Scheer, F. and Zwaga, M. (2013). ASTA: Analyse voor Spontane Taal bij Afasie (4th ed.). Vereniging voor Klinische Linguïstiek.
  • Crystal, D., Fletcher, P. and Garman, M. (1989). Grammatical Analysis of Language Disability (2nd ed.). London: Cole and Whurr. https://hdl.handle.net/10092/17651
  • van Ierland, M., Verbeek, J. and van den Dungen, L. (2008). Spontane Taal Analyse Procedure: Handleiding van het STAP-instrument. Universiteit van Amsterdam.
  • MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk: Transcription format and programs (3rd ed.). Lawrence Erlbaum Associates Publishers.
  • Odijk, J. (2023, 30 Jan.). Taaltechnologie voor taalkundig onderzoek. Valedictory speech, Utrecht University. https://surfdrive.surf.nl/files/index.php/s/pzNHSgd6t8L0Wnk
  • Schlichting, L. (2005). TARSP: Taal Analyse Remediëring en Screening Procedure: Taalontwikkelingsschaal van Nederlandse kinderen van 1–4 jaar (7th ed.). Amsterdam: Pearson. ISBN 978 90 265 1355 8.
  • Schlichting, L. (2017). TARSP: Taal analyse remediëring en screening procedure: Taalontwikkelingsschaal Van Nederlandse Kinderen van 1–4 Jaar met Aanvullende Structuren tot 6 jaar (8th ed.). Amsterdam: Pearson. ISBN 978 90 430 3561 3.
  • Verbeek, J., van Ierland, M. and van den Dungen, L. (2007). Spontane Taal Analyse Procedure: Verantwoording van het STAP-instrument. Universiteit van Amsterdam.

Owner

  • Name: Centre for Digital Humanities
  • Login: CentreForDigitalHumanities
  • Kind: organization
  • Email: cdh@uu.nl
  • Location: Netherlands

Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: SASTA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jelte
    family-names: van Boheemen
    email: j.vanboheemen@uu.nl
    affiliation: 'Centre for Digital Humanities, Utrecht University'
    orcid: 'https://orcid.org/0000-0002-2802-3242'
  - name: >-
      Research Software Lab, Centre for Digital Humanities,
      Utrecht University
    city: Utrecht
    country: NL
    email: cdh@uu.nl
    website: >-
      https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/
  - given-names: Jan
    family-names: Odijk
    email: j.odijk@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3331-1182'
  - given-names: Martin S.
    family-names: Kroon
    email: m.s.kroon@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3059-6872'
identifiers:
  - type: doi
    value: 10.5281/zenodo.10600256
repository-code: 'https://github.com/CentreForDigitalHumanities/sasta'
url: 'https://sasta.hum.uu.nl'
abstract: >-
  SASTA is a tool for the analysis of spontaneous language
  transcripts, to aid clinical linguists and research into
  language development and language disorders.
license: BSD-3-Clause
version: 0.9.8
date-released: '2025-02-19'

GitHub Events

Total
  • Issues event: 4
  • Member event: 1
  • Issue comment event: 2
  • Push event: 9
  • Pull request event: 2
  • Create event: 1
Last Year
  • Issues event: 4
  • Member event: 1
  • Issue comment event: 2
  • Push event: 9
  • Pull request event: 2
  • Create event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 729
  • Total Committers: 6
  • Avg Commits per committer: 121.5
  • Development Distribution Score (DDS): 0.173
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Jelte van Boheemen j****n@u****l 603
Jelte van Boheemen j****n@g****m 50
dependabot[bot] 4****] 46
Luka van der Plas 4****s 17
Sheean Spoel s****l@u****l 12
Mees m****p@g****m 1
Committer Domains (Top 20 + Academic)
uu.nl: 2

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 36
  • Total pull requests: 64
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 0.47
  • Average comments per pull request: 0.39
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 58
Past Year
  • Issues: 11
  • Pull requests: 1
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 1 hour
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.09
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • JeltevanBoheemen (4)
Pull Request Authors
  • JeltevanBoheemen (1)
Top Labels
Issue Labels
bug (1) good first issue (1)
Pull Request Labels

Dependencies

backend/Dockerfile docker
  • python 3.7 build
docker-compose.yml docker
frontend/Dockerfile docker
  • node 14 build
frontend/package.json npm
  • @angular-devkit/build-angular ~12.0.5 development
  • @angular-eslint/builder 12.7.0 development
  • @angular-eslint/eslint-plugin 12.7.0 development
  • @angular-eslint/eslint-plugin-template 12.7.0 development
  • @angular-eslint/schematics 12.7.0 development
  • @angular-eslint/template-parser 12.7.0 development
  • @angular/cli ~12.0.5 development
  • @angular/compiler-cli ~12.0.5 development
  • @angular/language-service ~12.0.5 development
  • @types/jasmine ^3.7.1 development
  • @types/jasminewd2 ~2.0.3 development
  • @types/node ^12.11.1 development
  • @typescript-eslint/eslint-plugin 4.28.2 development
  • @typescript-eslint/parser 4.28.2 development
  • eslint ^7.26.0 development
  • eslint-config-prettier ^8.5.0 development
  • eslint-plugin-import ^2.26.0 development
  • eslint-plugin-jsdoc ^39.6.4 development
  • eslint-plugin-prefer-arrow ^1.2.3 development
  • jasmine-core ^3.7.1 development
  • jasmine-spec-reporter ~5.0.0 development
  • karma ~6.3.16 development
  • karma-chrome-launcher ~3.1.0 development
  • karma-coverage-istanbul-reporter ~3.0.2 development
  • karma-jasmine ~4.0.0 development
  • karma-jasmine-html-reporter ^1.5.0 development
  • protractor ~7.0.0 development
  • ts-node ~7.0.0 development
  • typescript ~4.2.4 development
  • @angular/animations ~12.0.5
  • @angular/cdk ~12.0.5
  • @angular/common ~12.0.5
  • @angular/compiler ~12.0.5
  • @angular/core ~12.0.5
  • @angular/forms ~12.0.5
  • @angular/localize ~12.0.5
  • @angular/platform-browser ~12.0.5
  • @angular/platform-browser-dynamic ~12.0.5
  • @angular/router ~12.0.5
  • @creativebulma/bulma-divider ^1.1.0
  • @fortawesome/angular-fontawesome ~0.9.0
  • @fortawesome/fontawesome-svg-core ^1.2.24
  • @fortawesome/free-solid-svg-icons ^5.11.1
  • @types/file-saver ^2.0.1
  • brace ^0.11.1
  • bulma >0.9.0
  • fast-xml-parser ^3.15.1
  • file-saver ^2.0.2
  • jquery ^3.6.0
  • lassy-xpath ^0.12.0
  • ngx-json-viewer ^2.4.0
  • primeicons ^2.0.0
  • primeng ~12.0.0
  • rxjs ~6.6.7
  • ts-xpath ^0.3.1
  • tslib ^2.0.0
  • zone.js ~0.11.4
frontend/yarn.lock npm
  • 1321 dependencies
package.json npm
yarn.lock npm
backend/requirements.in pypi
  • celery *
  • chamd >=0.5.4
  • corpus2alpino *
  • django >=3.1.12,<3.2
  • django-livereload-server *
  • django-rest-auth *
  • django-revproxy >=0.9.16
  • django_celery_results *
  • djangorestframework *
  • lxml ==4.9.1
  • numpy <1.22
  • openpyxl *
  • pandas ==1.3.
  • psycopg2 *
  • pytest *
  • pytest-django *
  • pytest-lazy-fixture *
  • pytest-xdist *
  • python-docx *
  • sastadev *
  • xlrd >=1.2,<2
  • xlsxwriter *
backend/requirements.txt pypi
  • amqp ==5.1.1
  • argparse ==1.4.0
  • asgiref ==3.5.2
  • attrs ==21.4.0
  • auchann ==0.1.1
  • beautifulsoup4 ==4.11.1
  • billiard ==3.6.4.0
  • blis ==0.7.8
  • catalogue ==2.0.7
  • celery ==5.2.7
  • certifi ==2022.12.7
  • cffi ==1.15.1
  • chamd ==0.5.8
  • charset-normalizer ==2.1.0
  • click ==8.1.3
  • click-didyoumean ==0.3.0
  • click-plugins ==1.1.1
  • click-repl ==0.2.0
  • corpus2alpino ==0.3.10
  • cryptography ==39.0.1
  • cymem ==2.0.6
  • defusedxml ==0.7.1
  • django ==3.1.14
  • django-allauth ==0.51.0
  • django-celery-results ==2.4.0
  • django-livereload-server ==0.4
  • django-rest-auth ==0.9.5
  • django-revproxy ==0.10.0
  • djangorestframework ==3.13.1
  • editdistance ==0.6.2
  • et-xmlfile ==1.1.0
  • execnet ==1.9.0
  • folia ==2.5.8
  • idna ==3.3
  • iniconfig ==1.1.1
  • isodate ==0.6.1
  • jinja2 ==3.1.2
  • kombu ==5.2.4
  • langcodes ==3.3.0
  • lxml ==4.9.1
  • markupsafe ==2.1.1
  • murmurhash ==1.0.7
  • numpy ==1.21.6
  • oauthlib ==3.2.2
  • openpyxl ==3.0.10
  • packaging ==21.3
  • pandas ==1.3.5
  • pathy ==0.6.2
  • pluggy ==1.0.0
  • preshed ==3.0.6
  • prompt-toolkit ==3.0.30
  • psycopg2 ==2.9.3
  • py ==1.11.0
  • pycparser ==2.21
  • pydantic ==1.9.1
  • pyjwt ==2.4.0
  • pyparsing ==3.0.9
  • pytest ==7.1.2
  • pytest-django ==4.5.2
  • pytest-forked ==1.4.0
  • pytest-lazy-fixture ==0.6.3
  • pytest-xdist ==2.5.0
  • python-dateutil ==2.8.2
  • python-docx ==0.8.11
  • python3-openid ==3.2.0
  • pytz ==2022.1
  • pyyaml ==6.0.1
  • pyyaml-include ==1.3.1
  • rdflib ==6.1.1
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • sastadev ==0.1.1
  • six ==1.16.0
  • smart-open ==5.2.1
  • soupsieve ==2.3.2.post1
  • spacy ==3.4.0
  • spacy-legacy ==3.0.9
  • spacy-loggers ==1.0.3
  • sqlparse ==0.4.2
  • srsly ==2.4.3
  • tei-reader ==0.0.17
  • thinc ==8.1.0
  • tomli ==2.0.1
  • tornado ==6.2
  • tqdm ==4.64.0
  • typer ==0.4.2
  • typing-extensions ==4.1.1
  • urllib3 ==1.26.10
  • vine ==5.0.0
  • wasabi ==0.9.1
  • wcwidth ==0.2.5
  • xlrd ==1.2.0
  • xlsxwriter ==3.0.3
functional-tests/requirements.in pypi
  • pytest * test
  • selenium * test
functional-tests/requirements.txt pypi
  • atomicwrites ==1.3.0 test
  • attrs ==19.1.0 test
  • importlib-metadata ==0.23 test
  • more-itertools ==5.0.0 test
  • packaging ==19.2 test
  • pluggy ==0.13.0 test
  • py ==1.10.0 test
  • pyparsing ==2.4.2 test
  • pytest ==4.6.5 test
  • selenium ==3.141.0 test
  • six ==1.12.0 test
  • urllib3 ==1.26.5 test
  • wcwidth ==0.1.7 test
  • zipp ==0.6.0 test
.github/workflows/test-backend.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • postgres * docker
.github/workflows/test-frontend.yml actions
  • actions/checkout v4 composite
  • actions/setup-node v4 composite