sasta
Annotates speech transcripts and scores them using diagnostic metrics
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CentreForDigitalHumanities
- License: bsd-3-clause
- Language: Python
- Default Branch: develop
- Size: 12.4 MB
Statistics
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 39
- Releases: 11
Metadata Files
README.md
SASTA: Semi-Automatische Spontane Taal Analyse
SASTA is a tool for the analysis of spontaneous language transcripts, to aid clinical linguists and research into language development and language disorders. SASTA analyzes a transcript grammatically using Alpino, an automatic utterance parser for Dutch, and can recognize a significant number of forms of deviant language use and analyze them correctly, following multiple assessment methods available for Dutch (TARSP, STAP and ASTA).
Overview
- SASTA can analyze transcripts following multiple assessment methods available for Dutch: TARSP, STAP and ASTA.
- SASTA generates as output a method-specific form and an annotated transcript. The generated transcript can be corrected by a linguist, if needed, and re-uploaded into SASTA, after which SASTA generates an adapted method-specific form. Overall, SASTA achieves an accuracy between 88 and 95% on training data for TARSP and STAP.
- SASTA accepts as input transcripts in MS Word or plain text (given some SASTA-specific requirements), as well as CHAT (MacWhinney 2000), and uses AuCHAnn to generate valid CHAT files for transcripts accompanied by an interpretation, which significantly improves results.
- SASTA analyzes a transcript grammatically using Alpino. It then runs specially constructed XPath queries, one for each measure defined within the assessment method, to count the frequencies of linguistic phenomena in the spontaneous language sample. As such, SASTA may be considered a spin-off of GrETEL, which can be used to investigate syntactic phenomena using query-by-example.
- Further development of SASTA is ongoing, in close collaboration with researchers in language development and with linguists in clinics.
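The query step described in the list above can be illustrated with a minimal sketch. It is not SASTA's actual implementation: the parse trees are hand-written, heavily simplified stand-ins for Alpino output, the measure names and XPath expressions are hypothetical, and Python's standard-library `xml.etree.ElementTree` (which supports only a subset of XPath) is used to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: hand-written, simplified stand-ins for Alpino parses.
# Real Alpino output has many more attributes and deeper structure.
PARSES = [
    # "ik loop" (I walk)
    """<alpino_ds><node cat="top" rel="top">
         <node cat="smain" rel="--">
           <node rel="su" pt="vnw" word="ik"/>
           <node rel="hd" pt="ww" word="loop"/>
         </node>
       </node></alpino_ds>""",
    # "de hond slaapt" (the dog sleeps)
    """<alpino_ds><node cat="top" rel="top">
         <node cat="smain" rel="--">
           <node cat="np" rel="su">
             <node rel="det" pt="lid" word="de"/>
             <node rel="hd" pt="n" word="hond"/>
           </node>
           <node rel="hd" pt="ww" word="slaapt"/>
         </node>
       </node></alpino_ds>""",
    # "poes" (cat), a single-word utterance
    """<alpino_ds><node cat="top" rel="top">
         <node rel="--" pt="n" word="poes"/>
       </node></alpino_ds>""",
]

# Hypothetical measures: each name maps to one XPath expression.
# These are illustrative only and do not reproduce TARSP, STAP or ASTA.
QUERIES = {
    "subject": ".//node[@rel='su']",
    "main_clause": ".//node[@cat='smain']",
    "noun": ".//node[@pt='n']",
}


def count_measures(parses, queries):
    """Count, per measure, how many nodes match its query across all parses."""
    counts = Counter()
    for xml_text in parses:
        tree = ET.fromstring(xml_text)
        for name, xpath in queries.items():
            counts[name] += len(tree.findall(xpath))
    return counts


if __name__ == "__main__":
    print(dict(count_measures(PARSES, QUERIES)))
```

Keeping the queries in a plain mapping from measure name to expression means that adding a measure only requires adding one entry. A real application would need a full XPath engine for more elaborate queries.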
Contents
This repository contains the source code for the SASTA web application, which consists of a Django backend and Angular frontend.
This repository does not include input data, as these can be privacy sensitive. Refer to the documentation for instructions on constructing your own input data.
Sastadev
SASTA relies on a Python package called sastadev in the backend. This package is freely available on GitHub, with documentation available on Read the Docs.
Usage
If you are interested in using SASTA, the most straightforward way to get started is to make an account at sasta.hum.uu.nl. This server is maintained by the Research Software Lab and runs the most current release.
Consult the user documentation for all information on using the application, input formats, and output formats.
Self-hosting is an option, though support by the Research Software Lab is not provided.
Development
The documentation directory contains documentation for developers, including instructions for running the application through Docker.
License
SASTA is shared under a BSD 3-Clause licence. See LICENSE for more information.
Citation
If you wish to cite this repository, please use the metadata provided in our CITATION.cff file.
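As a rough illustration of how that metadata can be turned into a reference, the sketch below builds one plausible plain-text citation. The field values are copied by hand from the CITATION.cff shown in this document. The dictionary layout, the function names, and the output style are assumptions made for this example, not an official citation format.

```python
# Metadata copied by hand from the CITATION.cff in this document.
# Assumption: the dictionary keys below are chosen for this sketch only.
CITATION = {
    "title": "SASTA",
    "version": "0.9.8",
    "date_released": "2025-02-19",
    "doi": "10.5281/zenodo.10600256",
    "authors": [
        {"family": "van Boheemen", "given": "Jelte"},
        {"name": "Research Software Lab, Centre for Digital Humanities, "
                 "Utrecht University"},
        {"family": "Odijk", "given": "Jan"},
        {"family": "Kroon", "given": "Martin S."},
    ],
}


def format_author(author):
    """Render a person as 'Family, I.' and an organisation by its full name."""
    if "name" in author:
        return author["name"]
    initials = " ".join(part[0] + "." for part in author["given"].split())
    return f"{author['family']}, {initials}"


def format_citation(meta):
    """Build a single-line software citation from the metadata dictionary."""
    authors = "; ".join(format_author(a) for a in meta["authors"])
    year = meta["date_released"][:4]
    return (
        f"{authors} ({year}). {meta['title']} (version {meta['version']}) "
        f"[Computer software]. https://doi.org/{meta['doi']}"
    )


if __name__ == "__main__":
    print(format_citation(CITATION))
```

Tools that read CITATION.cff directly are preferable when a specific citation style is required. This sketch only shows which fields typically end up in a software reference.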
Contact
For questions, small feature suggestions, and bug reports, feel free to create an issue. You can also contact the Centre for Digital Humanities.
Publications on SASTA
- Odijk, J. (2021). Towards Semi-Automatic Analysis of Spontaneous Language for Dutch. In Selected papers from the CLARIN Annual Conference 2020 (Vol. 180, pp. 165-175). (Linköping Electronic Conference Proceedings). Linköping University Press. https://doi.org/10.3384/ecp18018
- Renckens, E., & Odijk, J. (2021). Online tool SASTA analyseert taal. eData & Research, 15(2), 7-7. https://edata.nl/2021/02/10/online-tool-sasta-analyseert-taal/
Other relevant publications
- Boxum, E., van der Scheer, F. and Zwaga, M. (2013). ASTA: Analyse voor Spontane Taal bij Afasie (4th ed.). Vereniging voor Klinische Linguïstiek.
- Crystal, D., Fletcher, P. and Garman, M. (1989). Grammatical Analysis of Language Disability (2nd ed.). London: Cole and Whurr. https://hdl.handle.net/10092/17651
- van Ierland, M., Verbeek, J. and van den Dungen, L. (2008). Spontane Taal Analyse Procedure: Handleiding van het STAP-instrument. Universiteit van Amsterdam.
- MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk: Transcription format and programs (3rd ed.). Lawrence Erlbaum Associates Publishers.
- Odijk, J. (2023, 30 Jan.). Taaltechnologie voor taalkundig onderzoek. Valedictory speech, Utrecht University. https://surfdrive.surf.nl/files/index.php/s/pzNHSgd6t8L0Wnk
- Schlichting, L. (2005). TARSP: Taal Analyse Remediëring en Screening Procedure: Taalontwikkelingsschaal van Nederlandse kinderen van 1–4 jaar (7th ed.). Amsterdam: Pearson. ISBN 978 90 265 1355 8.
- Schlichting, L. (2017). TARSP: Taal analyse remediëring en screening procedure: Taalontwikkelingsschaal Van Nederlandse Kinderen van 1–4 Jaar met Aanvullende Structuren tot 6 jaar (8th ed.). Amsterdam: Pearson. ISBN 978 90 430 3561 3.
- Verbeek, J., van Ierland, M. and van den Dungen, L. (2007). Spontane Taal Analyse Procedure: Verantwoording van het STAP-instrument. Universiteit van Amsterdam.
Owner
- Name: Centre for Digital Humanities
- Login: CentreForDigitalHumanities
- Kind: organization
- Email: cdh@uu.nl
- Location: Netherlands
- Website: https://cdh.uu.nl/
- Repositories: 39
- Profile: https://github.com/CentreForDigitalHumanities
Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: SASTA
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jelte
    family-names: van Boheemen
    email: j.vanboheemen@uu.nl
    affiliation: 'Centre for Digital Humanities, Utrecht University'
    orcid: 'https://orcid.org/0000-0002-2802-3242'
  - name: >-
      Research Software Lab, Centre for Digital Humanities,
      Utrecht University
    city: Utrecht
    country: NL
    email: cdh@uu.nl
    website: >-
      https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/
  - given-names: Jan
    family-names: Odijk
    email: j.odijk@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3331-1182'
  - given-names: Martin S.
    family-names: Kroon
    email: m.s.kroon@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3059-6872'
identifiers:
  - type: doi
    value: 10.5281/zenodo.10600256
repository-code: 'https://github.com/CentreForDigitalHumanities/sasta'
url: 'https://sasta.hum.uu.nl'
abstract: >-
  SASTA is a tool for the analysis of spontaneous language
  transcripts, to aid clinical linguists and research into
  language development and language disorders.
license: BSD-3-Clause
version: 0.9.8
date-released: '2025-02-19'
GitHub Events
Total
- Issues event: 4
- Member event: 1
- Issue comment event: 2
- Push event: 9
- Pull request event: 2
- Create event: 1
Last Year
- Issues event: 4
- Member event: 1
- Issue comment event: 2
- Push event: 9
- Pull request event: 2
- Create event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jelte van Boheemen | j****n@u****l | 603 |
| Jelte van Boheemen | j****n@g****m | 50 |
| dependabot[bot] | 4****] | 46 |
| Luka van der Plas | 4****s | 17 |
| Sheean Spoel | s****l@u****l | 12 |
| Mees | m****p@g****m | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 36
- Total pull requests: 64
- Average time to close issues: about 2 months
- Average time to close pull requests: about 2 months
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.47
- Average comments per pull request: 0.39
- Merged pull requests: 33
- Bot issues: 0
- Bot pull requests: 58
Past Year
- Issues: 11
- Pull requests: 1
- Average time to close issues: 3 months
- Average time to close pull requests: about 1 hour
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.09
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JeltevanBoheemen (4)
Pull Request Authors
- JeltevanBoheemen (1)
Dependencies
- python 3.7 build
- node 14 build
- @angular-devkit/build-angular ~12.0.5 development
- @angular-eslint/builder 12.7.0 development
- @angular-eslint/eslint-plugin 12.7.0 development
- @angular-eslint/eslint-plugin-template 12.7.0 development
- @angular-eslint/schematics 12.7.0 development
- @angular-eslint/template-parser 12.7.0 development
- @angular/cli ~12.0.5 development
- @angular/compiler-cli ~12.0.5 development
- @angular/language-service ~12.0.5 development
- @types/jasmine ^3.7.1 development
- @types/jasminewd2 ~2.0.3 development
- @types/node ^12.11.1 development
- @typescript-eslint/eslint-plugin 4.28.2 development
- @typescript-eslint/parser 4.28.2 development
- eslint ^7.26.0 development
- eslint-config-prettier ^8.5.0 development
- eslint-plugin-import ^2.26.0 development
- eslint-plugin-jsdoc ^39.6.4 development
- eslint-plugin-prefer-arrow ^1.2.3 development
- jasmine-core ^3.7.1 development
- jasmine-spec-reporter ~5.0.0 development
- karma ~6.3.16 development
- karma-chrome-launcher ~3.1.0 development
- karma-coverage-istanbul-reporter ~3.0.2 development
- karma-jasmine ~4.0.0 development
- karma-jasmine-html-reporter ^1.5.0 development
- protractor ~7.0.0 development
- ts-node ~7.0.0 development
- typescript ~4.2.4 development
- @angular/animations ~12.0.5
- @angular/cdk ~12.0.5
- @angular/common ~12.0.5
- @angular/compiler ~12.0.5
- @angular/core ~12.0.5
- @angular/forms ~12.0.5
- @angular/localize ~12.0.5
- @angular/platform-browser ~12.0.5
- @angular/platform-browser-dynamic ~12.0.5
- @angular/router ~12.0.5
- @creativebulma/bulma-divider ^1.1.0
- @fortawesome/angular-fontawesome ~0.9.0
- @fortawesome/fontawesome-svg-core ^1.2.24
- @fortawesome/free-solid-svg-icons ^5.11.1
- @types/file-saver ^2.0.1
- brace ^0.11.1
- bulma >0.9.0
- fast-xml-parser ^3.15.1
- file-saver ^2.0.2
- jquery ^3.6.0
- lassy-xpath ^0.12.0
- ngx-json-viewer ^2.4.0
- primeicons ^2.0.0
- primeng ~12.0.0
- rxjs ~6.6.7
- ts-xpath ^0.3.1
- tslib ^2.0.0
- zone.js ~0.11.4
- 1321 dependencies
- celery *
- chamd >=0.5.4
- corpus2alpino *
- django >=3.1.12,<3.2
- django-livereload-server *
- django-rest-auth *
- django-revproxy >=0.9.16
- django_celery_results *
- djangorestframework *
- lxml ==4.9.1
- numpy <1.22
- openpyxl *
- pandas ==1.3.
- psycopg2 *
- pytest *
- pytest-django *
- pytest-lazy-fixture *
- pytest-xdist *
- python-docx *
- sastadev *
- xlrd >=1.2,<2
- xlsxwriter *
- amqp ==5.1.1
- argparse ==1.4.0
- asgiref ==3.5.2
- attrs ==21.4.0
- auchann ==0.1.1
- beautifulsoup4 ==4.11.1
- billiard ==3.6.4.0
- blis ==0.7.8
- catalogue ==2.0.7
- celery ==5.2.7
- certifi ==2022.12.7
- cffi ==1.15.1
- chamd ==0.5.8
- charset-normalizer ==2.1.0
- click ==8.1.3
- click-didyoumean ==0.3.0
- click-plugins ==1.1.1
- click-repl ==0.2.0
- corpus2alpino ==0.3.10
- cryptography ==39.0.1
- cymem ==2.0.6
- defusedxml ==0.7.1
- django ==3.1.14
- django-allauth ==0.51.0
- django-celery-results ==2.4.0
- django-livereload-server ==0.4
- django-rest-auth ==0.9.5
- django-revproxy ==0.10.0
- djangorestframework ==3.13.1
- editdistance ==0.6.2
- et-xmlfile ==1.1.0
- execnet ==1.9.0
- folia ==2.5.8
- idna ==3.3
- iniconfig ==1.1.1
- isodate ==0.6.1
- jinja2 ==3.1.2
- kombu ==5.2.4
- langcodes ==3.3.0
- lxml ==4.9.1
- markupsafe ==2.1.1
- murmurhash ==1.0.7
- numpy ==1.21.6
- oauthlib ==3.2.2
- openpyxl ==3.0.10
- packaging ==21.3
- pandas ==1.3.5
- pathy ==0.6.2
- pluggy ==1.0.0
- preshed ==3.0.6
- prompt-toolkit ==3.0.30
- psycopg2 ==2.9.3
- py ==1.11.0
- pycparser ==2.21
- pydantic ==1.9.1
- pyjwt ==2.4.0
- pyparsing ==3.0.9
- pytest ==7.1.2
- pytest-django ==4.5.2
- pytest-forked ==1.4.0
- pytest-lazy-fixture ==0.6.3
- pytest-xdist ==2.5.0
- python-dateutil ==2.8.2
- python-docx ==0.8.11
- python3-openid ==3.2.0
- pytz ==2022.1
- pyyaml ==6.0.1
- pyyaml-include ==1.3.1
- rdflib ==6.1.1
- requests ==2.28.1
- requests-oauthlib ==1.3.1
- sastadev ==0.1.1
- six ==1.16.0
- smart-open ==5.2.1
- soupsieve ==2.3.2.post1
- spacy ==3.4.0
- spacy-legacy ==3.0.9
- spacy-loggers ==1.0.3
- sqlparse ==0.4.2
- srsly ==2.4.3
- tei-reader ==0.0.17
- thinc ==8.1.0
- tomli ==2.0.1
- tornado ==6.2
- tqdm ==4.64.0
- typer ==0.4.2
- typing-extensions ==4.1.1
- urllib3 ==1.26.10
- vine ==5.0.0
- wasabi ==0.9.1
- wcwidth ==0.2.5
- xlrd ==1.2.0
- xlsxwriter ==3.0.3
- pytest * test
- selenium * test
- atomicwrites ==1.3.0 test
- attrs ==19.1.0 test
- importlib-metadata ==0.23 test
- more-itertools ==5.0.0 test
- packaging ==19.2 test
- pluggy ==0.13.0 test
- py ==1.10.0 test
- pyparsing ==2.4.2 test
- pytest ==4.6.5 test
- selenium ==3.141.0 test
- six ==1.12.0 test
- urllib3 ==1.26.5 test
- wcwidth ==0.1.7 test
- zipp ==0.6.0 test
- actions/checkout v4 composite
- actions/setup-python v5 composite
- postgres * docker
- actions/checkout v4 composite
- actions/setup-node v4 composite