udstyle

Compute complexity metrics from Universal Dependencies

https://github.com/andreasvc/udstyle

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: andreasvc
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 33.2 KB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

udstyle

Compute complexity metrics from Universal Dependencies.

Input can be a .conllu file, or a plain text file that will be parsed by Stanza if it is installed and a language is specified.

Usage: python3 udstyle.py [OPTIONS] FILE...

    --parse=LANG        parse texts with Stanza; provide a 2-letter language code
    --output=FILENAME   write the result to a tab-separated file
    --persentence       report per-sentence results instead of the mean per document

Reported metrics:

  - LEN: mean sentence length in words (excluding punctuation).
  - MDD: mean dependency distance (Gibson, 1998).
  - NDD: normalized dependency distance (Lei & Jockers, 2018).
  - ADJD: proportion of adjacent dependencies.
  - LEFT: dependency direction: proportion of left dependents.
  - MOD: nominal modifiers (Biber & Gray, 2010).
  - CLS: number of clauses per sentence.
  - CLL: average clause length (words per clause).
  - LXD: lexical density: ratio of content words to the total number of words.
  - POS/DEP tag frequencies (only with --output).

Example:

    $ python3 udstyle.py UD_Dutch-LassySmall/*.conllu
                     LEN     MDD     NDD    ADJD    LEFT     MOD     CLS     CLL     LXD
    dev.conllu    14.182   2.461   0.926   0.500   0.459   0.052   2.223   9.190   0.603
    test.conllu   11.434   2.192   0.807   0.547   0.412   0.074   1.771   9.013   0.657
    train.conllu  11.027   2.172   0.775   0.564   0.391   0.072   1.863   8.107   0.645
    $ python3 udstyle.py --parse=nl troonrede.txt
    [...]
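To make the dependency-based metrics above more concrete, here is a minimal sketch of how LEN, MDD, ADJD and LEFT could be computed for a single CoNLL-U sentence. It is not taken from udstyle.py and may differ from it in detail. The function names `parse_conllu` and `sentence_metrics` are hypothetical, and the definitions used are assumptions: punctuation tokens and the root relation are skipped, distance is the absolute difference between token positions, and a "left dependent" is taken to be a dependent that precedes its head.

```python
def parse_conllu(text):
    """Yield sentences as lists of (id, upos, head) tuples (hypothetical helper)."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith('#'):  # comment / metadata line
            continue
        fields = line.split('\t')
        # Skip multiword token ranges (e.g. 1-2), empty nodes (e.g. 1.1),
        # and tokens without an integer head.
        if not fields[0].isdigit() or not fields[6].isdigit():
            continue
        sentence.append((int(fields[0]), fields[3], int(fields[6])))
    if sentence:
        yield sentence


def sentence_metrics(sentence):
    """Return LEN, MDD, ADJD and LEFT for one sentence (assumed definitions)."""
    # Assumption: punctuation does not count as a word.
    words = [(i, head) for i, upos, head in sentence if upos != 'PUNCT']
    # Assumption: the root (head == 0) has no dependency distance.
    deps = [(i, head) for i, head in words if head != 0]
    dists = [abs(i - head) for i, head in deps]
    n = len(deps)
    return {
        'LEN': len(words),
        'MDD': sum(dists) / n if n else 0.0,
        'ADJD': sum(d == 1 for d in dists) / n if n else 0.0,
        'LEFT': sum(i < head for i, head in deps) / n if n else 0.0,
    }


# A tiny hand-written sentence in CoNLL-U format (10 tab-separated columns).
SAMPLE = """\
# text = The cat sleeps .
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

for sent in parse_conllu(SAMPLE):
    print(sentence_metrics(sent))
```

In this toy sentence both non-root words directly precede their heads, so MDD, ADJD and LEFT all come out as 1.0 and LEN is 3. Document-level figures like those in the example table would then be means over all sentences, although the exact averaging used by udstyle is not specified here.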

References

Simple readability metrics: https://github.com/andreasvc/readability/

If you use this code for research, please cite this repository.

Owner

  • Name: Andreas van Cranenburgh
  • Login: andreasvc
  • Kind: user
  • Location: Groningen

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: udstyle
message: >-
  Please cite this software using the metadata from
  'preferred-citation'.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0002-4545-1548'
    given-names: Andreas
    name-particle: van
    family-names: Cranenburgh
    email: a.w.van.cranenburgh@rug.nl
    affiliation: University of Groningen
identifiers:
  - type: url
    value: 'https://github.com/andreasvc/udstyle'

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2