greek-complexity
Query XML treebank to explore syntactic complexity in Ancient Greek
Repository
Basic Info
- Host: GitHub
- Owner: nevenjovanovic
- License: cc-by-4.0
- Language: XQuery
- Default Branch: main
- Size: 9.58 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Linguistic complexity in ancient Greek - Sentence complexity and grammar
We query a set of Greek texts, hand-encoded for morphology and syntax (as treebanks) by Vanessa Gorman, to explore the complexity of Greek sentences. The treebanks and queries in this repository are published under a CC-BY license.
Contents
- The encoded texts (Alpheios dependency scheme), cloned from the Greek-Dependency-Trees repository, are in the data directory
- Various XQuery scripts to transform and analyze the files are in the scripts directory
- Reports produced by the scripts are in the info directory
How to use
Download the files or clone the repository. Install the BaseX XML database.
In BaseX, run the script create-grccomp-db.xq to create the grc-com database. Query the database by running other scripts in the scripts/xq directory. Adapt the scripts to query as needed.
A list of queries (from simple to complex)
Create DB, get some statistics
- Create the grc-com database: create-grccomp-db.xq
- Get basic information about the database (number of words, sentences, documents): db-basic-info.xq
- Get stats on sentence length: db-stats-sentence.xq
- Get stats on relations: db-stats-relations
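The repository's own statistics scripts are XQuery run inside BaseX. As a rough illustration of what such statistics involve, here is a minimal Python sketch that counts sentence lengths and relation frequencies. It assumes the usual AGDT layout of `sentence` elements containing `word` elements; the sample data and the function names `sentence_lengths` and `relation_counts` are made up for this example and do not come from the repository.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical two-sentence sample with placeholder forms; the real
# treebank files are in the data directory.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="A" lemma="a" postag="l-s---mn-" relation="ATR" head="2"/>
    <word id="2" form="B" lemma="b" postag="n-s---mn-" relation="SBJ" head="3"/>
    <word id="3" form="C" lemma="c" postag="v3spia---" relation="PRED" head="0"/>
    <word id="4" form="." lemma="." postag="u--------" relation="AuxK" head="0"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="C" lemma="c" postag="v3spia---" relation="PRED" head="0"/>
    <word id="2" form="." lemma="." postag="u--------" relation="AuxK" head="0"/>
  </sentence>
</treebank>"""


def sentence_lengths(root):
    """Return the number of word elements in each sentence (punctuation included)."""
    return [len(s.findall("word")) for s in root.iter("sentence")]


def relation_counts(root):
    """Count how often each value of the relation attribute occurs."""
    return Counter(w.get("relation") for w in root.iter("word"))


root = ET.fromstring(SAMPLE)
lengths = sentence_lengths(root)
print("sentences:", len(lengths))
print("mean length:", statistics.mean(lengths))
print("relations:", relation_counts(root).most_common())
```

Whether punctuation tokens should count towards sentence length is a design choice; this sketch counts every `word` element.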
Statistics on syntactic relations
- Which POS have role of PRED (and similar): list-pred-types.xq
- Which POS have role of COORD (and similar): list-coord-types.xq
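The idea behind these two listings can be sketched in Python: take every word whose relation starts with a given label and tally the first character of its postag. This is only an approximation under stated assumptions. The prefix match is one possible reading of "and similar" (for instance, it treats PRED_CO as a kind of PRED), and the sample data and the function name `pos_for_relation` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# First postag character -> part of speech, as listed in the AGDT data format.
POS_NAMES = {
    "n": "noun", "v": "verb", "t": "participle", "a": "adjective",
    "d": "adverb", "l": "article", "g": "particle", "c": "conjunction",
    "r": "preposition", "p": "pronoun", "m": "numeral",
    "i": "interjection", "e": "exclamation", "u": "punctuation",
}

# Hypothetical sample with placeholder forms and lemmata.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="A" lemma="a" postag="v3spia---" relation="PRED_CO" head="2"/>
    <word id="2" form="B" lemma="b" postag="c--------" relation="COORD" head="0"/>
    <word id="3" form="C" lemma="c" postag="v3spia---" relation="PRED_CO" head="2"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="D" lemma="d" postag="a-s---mn-" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def pos_for_relation(root, prefix):
    """Count parts of speech of words whose relation starts with prefix."""
    counts = Counter()
    for word in root.iter("word"):
        relation = word.get("relation", "")
        postag = word.get("postag", "")
        if relation.startswith(prefix) and postag:
            # Fall back to the raw character if it is not in the table.
            counts[POS_NAMES.get(postag[0], postag[0])] += 1
    return counts


root = ET.fromstring(SAMPLE)
print(pos_for_relation(root, "PRED"))
print(pos_for_relation(root, "COORD"))
```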
Analyse lemmata and their functions
- For a subset of sentences (based on number of elements, words etc), list lemmata: list-lemmata.xq
- For a lemma in a subset of sentences (based on number of elements), list its syntactic relations: lemma-list-functions.xq
- For a specific syntactic relation of lemma in a subset, list all sentences: relation-lemma-12-18-words.xq
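The subset-then-lemma pattern used by these queries can be sketched in Python as follows. The sketch assumes that "number of elements" means the number of `word` elements in a sentence; the function name `lemma_relations`, the sample data, and the small bounds in the demo call are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: one three-word sentence and one one-word sentence.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="A" lemma="a" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
    <word id="2" form="καὶ" lemma="καί" postag="c--------" relation="COORD" head="0"/>
    <word id="3" form="B" lemma="b" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="καὶ" lemma="καί" postag="d--------" relation="AuxY" head="0"/>
  </sentence>
</treebank>"""


def lemma_relations(root, lemma, min_words=12, max_words=18):
    """Count the relations of a lemma in sentences of min_words..max_words words."""
    counts = Counter()
    for sentence in root.iter("sentence"):
        words = sentence.findall("word")
        if min_words <= len(words) <= max_words:
            counts.update(
                w.get("relation") for w in words if w.get("lemma") == lemma
            )
    return counts


root = ET.fromstring(SAMPLE)
# The sample sentences are short, so the demo narrows the bounds to 2..4.
print(lemma_relations(root, "καί", min_words=2, max_words=4))
```

With the default bounds of 12 to 18 words the sample yields an empty counter, because neither sample sentence is long enough.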
Retrieve specific syntactic features
- Find sentences with all basic roles (PRED, SBJ, OBJ, ADV): find-sentences-all-basic-roles.xq
- Find sentences with ellipsis (a role is missing and is artificially added during annotation), exactly 6 sentence elements: find-ellipsis.xq
- Find sentences with 12 words or less where PRED is adjective: find-sentences-with-pred-adj.xq
- Find sentences with 12 words or less where PRED is conjunction: find-sentences-with-pred-conj.xq
- Find sentences with 15 words or less without PRED: find-sentences-no-pred.xq
- Sentences with PRED and COORD dependent on sentence root: find-pred-coord-0.xq
- Find sentences with 12 words or less where the article is not ATR (or its variations): find-article-not-atr.xq
- Find sentences with COORD by asyndeton (u): find-coord-sentences-asyndeton.xq
- Find sentences with PRED_CO: find-coord-pred-co.xq
- Find sentences with some number of words where some word has some _CO function: find-suffix-co.xq
- Find infinitive used as PRED: find-pred-inf.xq
- Find sentences without AuxY: find-sentences-no-auxy.xq
- Find sentences with many AuxY: find-sentences-with-many-auxy.xq
- Find sentences without OBJ, PNOM, SBJ (and combinations): find-no-sbj-obj-pnom.xq
- Find sentences without nouns or adjectives: find-no-nouns.xq
- List syntactic roles of participles with frequencies of occurrences: find-participles-roles.xq
- Find substantivated participles: find-participles-substantivated.xq
- Find substantivated infinitives: find-infinitives-substantivated.xq
- Find sentences where article is head: find-sentences-with-subst-expr.xq
- Find sentences with transitive verbs as PRED without OBJ: find-sentences-no-obj.xq; the list of transitive verbs was compiled with find-verbs-obj.xq
- Find verbs ruling PNOM which appear without PNOM as well: find-sentences-no-pnom.xq; the list of verbs ruling PNOM was compiled with find-pnom-pred.xq
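Most of the queries above share one shape: restrict sentences by word count, then test a condition on the relations they contain. A minimal Python sketch of that shape, for two of the listed features, might look like this. It assumes that "without PRED" also excludes variants such as PRED_CO and that a head of 0 marks dependence on the sentence root (as the AGDT data format states); all function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with placeholder forms and lemmata.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="A" lemma="a" postag="v3spia---" relation="PRED" head="0"/>
    <word id="2" form="B" lemma="b" postag="c--------" relation="COORD" head="0"/>
    <word id="3" form="C" lemma="c" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="D" lemma="d" postag="n-s---mn-" relation="ExD" head="0"/>
    <word id="2" form="E" lemma="e" postag="a-s---mn-" relation="ATR" head="1"/>
  </sentence>
  <sentence id="3">
    <word id="1" form="F" lemma="f" postag="v3spia---" relation="PRED" head="0"/>
    <word id="2" form="G" lemma="g" postag="n-s---mn-" relation="SBJ" head="1"/>
  </sentence>
</treebank>"""


def lacks_pred(sentence):
    """True if no word has a relation beginning with PRED."""
    return not any(
        w.get("relation", "").startswith("PRED") for w in sentence.findall("word")
    )


def pred_and_coord_at_root(sentence):
    """True if both a PRED and a COORD depend directly on the root (head 0)."""
    at_root = {
        w.get("relation") for w in sentence.findall("word") if w.get("head") == "0"
    }
    return {"PRED", "COORD"} <= at_root


def find_sentences(root, predicate, max_words=None):
    """Return ids of sentences within the word limit that satisfy predicate."""
    hits = []
    for sentence in root.iter("sentence"):
        size = len(sentence.findall("word"))
        if (max_words is None or size <= max_words) and predicate(sentence):
            hits.append(sentence.get("id"))
    return hits


root = ET.fromstring(SAMPLE)
print(find_sentences(root, lacks_pred, max_words=15))
print(find_sentences(root, pred_and_coord_at_root))
```

Keeping the condition as a separate predicate function means a new feature only needs a new predicate, which mirrors the advice to adapt the scripts as needed.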
Results
- Database: grc-com
- Date: 2022-06-02+02:00
- Documents: 153
- Sentences: 26781
- Words: 633763
- Stats on relations: relations-stats.md
- Stats on PRED: pred-stats.md
- Stats on COORD: coord-stats.md
- Sentences with all basic roles (PRED, SBJ, OBJ, ADV) expressed: sentences-basic-roles.md
- Sentences with ellipsis (artificially added elements), 6 sentence elements: sentences-ellipsis-6.md
- Sentences with PRED adjective: sentences-pred-adj.md
- Sentences with PRED conjunction: sentences-pred-c.md
- Sentences without PRED relation: sentences-no-pred.md
- Sentences where the article is not ATR: sentences-article-not-atr.md
- Sentences with COORD performed by punctuation (asyndeton): sentences-coord-asyndeton.md
- Sentences with PRED_CO: sentences-pred-co.md
- Sentences with infinitives used as PRED: sentences-inf-pred.md
- Sentences without AuxY (particles): sentences-no-auxy.md
- Sentences with many AuxY: sentences-many-auxy.md
- Sentences without OBJ, PNOM, SBJ (and combinations): no-sbj-obj-pnom.md
- Sentences without nouns or adjectives: no-nouns-adj.md
- Sentences with transitive verbs (active) as PRED, no OBJ: sentences-trans-no-obj.md
- Syntactic roles of participles: roles-participles.md
- Sentences with substantivated participles: subst-participles.md
- Sentences with substantivated infinitives: subst-inf.md
- Sentences where article is head: article-head.md
- Sentences with verbs taking PNOM in which the verbs are PRED but have no PNOM: pnom-no-pnom.md
On a server
- Landing page with list of functions
- Basic information on treebanks
- Retrieve a subset of sentences based on word count (default: 12 to 18 elements)
- List lemmata in a subset of sentences (default: 12 to 18 elements)
- List relations (sentence functions) for a lemma (default: καί, 12 to 18 elements)
- For relation of lemma, list sentences in subset (default: καί as PRED, 12 to 18 elements)
- Retrieve a subset of sentences without participles
- Retrieve a subset of sentences without participles and subordinate conjunctions
- Retrieve a subset of sentences without participles, infinitives, and subordinate conjunctions
- Retrieve a subset based on number of words, with PRED and COORD dependent on sentence root
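The subsets without participles or infinitives can be approximated from the postag alone. The following Python sketch is an assumption-laden illustration, not the web application's code: it treats a word as a participle if its part of speech is `t` or its mood is `p`, and as an infinitive if its mood is `n`, following the AGDT postag table. Subordinate conjunctions are not handled here, and the function names and sample data are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a finite clause, one with a participle, one with an infinitive.
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="A" lemma="a" postag="n-s---mn-" relation="SBJ" head="2"/>
    <word id="2" form="B" lemma="b" postag="v3spia---" relation="PRED" head="0"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="C" lemma="c" postag="v-sppamn-" relation="ADV" head="2"/>
    <word id="2" form="B" lemma="b" postag="v3spia---" relation="PRED" head="0"/>
  </sentence>
  <sentence id="3">
    <word id="1" form="D" lemma="d" postag="v--pna---" relation="OBJ" head="2"/>
    <word id="2" form="B" lemma="b" postag="v3spia---" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def is_participle(postag):
    """Participle: part of speech 't' (position 1) or mood 'p' (position 5)."""
    return len(postag) == 9 and (postag[0] == "t" or postag[4] == "p")


def is_infinitive(postag):
    """Infinitive: mood 'n' (position 5)."""
    return len(postag) == 9 and postag[4] == "n"


def finite_only(root, min_words=12, max_words=18, allow_infinitives=True):
    """Ids of sentences in the size range with no participles (and, optionally, no infinitives)."""
    hits = []
    for sentence in root.iter("sentence"):
        postags = [w.get("postag", "") for w in sentence.findall("word")]
        if not min_words <= len(postags) <= max_words:
            continue
        if any(is_participle(p) for p in postags):
            continue
        if not allow_infinitives and any(is_infinitive(p) for p in postags):
            continue
        hits.append(sentence.get("id"))
    return hits


root = ET.fromstring(SAMPLE)
# The sample sentences are short, so the demo uses bounds of 1..5 words.
print(finite_only(root, 1, 5))
print(finite_only(root, 1, 5, allow_infinitives=False))
```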
Modules and functions for web application (RESTXQ)
- Modules (xqm, directory /scripts/webapp/repo/)
  - Functions for analysing treebanks (in general): grccom-analysis.xqm
  - Functions for displaying HTML (in general): grccom.xqm
- Functions for individual pages (xq, directory /scripts/webapp/app/grccom)
  - Landing page: grccom-home.xq
  - Basic information on database: grccom-basic-ana.xq
AGDT data format
For syntactic roles, see the description by Giuseppe G. A. Celano, Guidelines for the Ancient Greek Dependency Treebank 2.0.
```
Data Format
The data given in this treebank is provided as an XML document. Each
word contains six required attributes:
id: This is a unique identifier, and corresponds to the word's linear
position in the sentence. The first word in a sentence is given
id 1.
cid: This is a canonical identifier for the word within the larger corpus.
form: The token form of the word.
lemma: The base lemma from which the word is derived, in Beta Code.
head: The id of the word's parent. If a word depends on the sentence
root, its head is 0.
relation: The syntactic relation between the word and its parent. A
catalogue of syntactic tags can be found in the syntactic guidelines
described below.
postag: The morphological analysis for the word. This field is 9
characters long, and corresponds to the following morphological
features:
1: part of speech
n noun
v verb
t participle
a adjective
d adverb
l article
g particle
c conjunction
r preposition
p pronoun
m numeral
i interjection
e exclamation
u punctuation
2: person
1 first person
2 second person
3 third person
3: number
s singular
p plural
d dual
4: tense
p present
i imperfect
r perfect
l pluperfect
t future perfect
f future
a aorist
5: mood
i indicative
s subjunctive
o optative
n infinitive
m imperative
p participle
6: voice
a active
p passive
m middle
e medio-passive
7: gender
m masculine
f feminine
n neuter
8: case
n nominative
g genitive
d dative
a accusative
v vocative
l locative
9: degree
c comparative
s superlative
---
For example, the postag for the noun "a)/ndra" is "n-s---ma-",
which corresponds to the following features:
1: n noun
2: -
3: s singular
4: -
5: -
6: -
7: m masculine
8: a accusative
9: -
```
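The positional postag scheme described above is straightforward to decode programmatically. The Python sketch below transcribes the nine feature tables from the format description and reproduces the worked example for "n-s---ma-". The function name `decode_postag` and the handling of unknown characters are choices made for this illustration.

```python
# One (feature name, value table) pair per postag position, in order.
FEATURES = [
    ("part of speech", {
        "n": "noun", "v": "verb", "t": "participle", "a": "adjective",
        "d": "adverb", "l": "article", "g": "particle", "c": "conjunction",
        "r": "preposition", "p": "pronoun", "m": "numeral",
        "i": "interjection", "e": "exclamation", "u": "punctuation",
    }),
    ("person", {"1": "first person", "2": "second person", "3": "third person"}),
    ("number", {"s": "singular", "p": "plural", "d": "dual"}),
    ("tense", {
        "p": "present", "i": "imperfect", "r": "perfect", "l": "pluperfect",
        "t": "future perfect", "f": "future", "a": "aorist",
    }),
    ("mood", {
        "i": "indicative", "s": "subjunctive", "o": "optative",
        "n": "infinitive", "m": "imperative", "p": "participle",
    }),
    ("voice", {"a": "active", "p": "passive", "m": "middle", "e": "medio-passive"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "v": "vocative", "l": "locative",
    }),
    ("degree", {"c": "comparative", "s": "superlative"}),
]


def decode_postag(postag):
    """Map a 9-character postag to a dict of the features that are set."""
    if len(postag) != len(FEATURES):
        raise ValueError(f"postag must have {len(FEATURES)} characters: {postag!r}")
    decoded = {}
    for (name, values), char in zip(FEATURES, postag):
        if char != "-":  # "-" means the feature does not apply
            decoded[name] = values.get(char, f"unknown ({char})")
    return decoded


print(decode_postag("n-s---ma-"))
print(decode_postag("v3spia---"))
```

Unknown characters are reported rather than rejected, since individual treebank files may contain values outside the documented tables.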
Editor of this repository
- Neven Jovanović (nevenjovanovic), Department of Classical Philology, Faculty of Humanities and Social Sciences, University of Zagreb; orcid.org/0000-0002-9119-399X
Owner
- Name: Neven Jovanović
- Login: nevenjovanovic
- Kind: user
- Location: Zagreb, Croatia
- Company: University of Zagreb, Faculty of Humanities and Social Sciences
- Website: https://orcid.org/my-orcid?orcid=0000-0002-9119-399X
- Repositories: 9
- Profile: https://github.com/nevenjovanovic
Classical philologist, university teacher of Greek and Latin. Digital humanities, textual editing. University of Zagreb, Croatia.
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this collection of scripts and data, please cite it as below."
authors:
  - family-names: Jovanović
    given-names: Neven
    orcid: https://orcid.org/0000-0002-9119-399X
title: "nevenjovanovic/greek-complexity: Complexity in Ancient Greek Treebanks, First Set of Queries (number of elements, lemmata, relations)"
version: v1.0.0
date-released: 2023-01-14