nuuchahnulth

Linguistic data on the Nuuchahnulth (Wakashan) language

https://github.com/dwhieb/nuuchahnulth

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: 3 found in README
✓ Academic publication links: zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low (10.9%)

Keywords

corpora corpus corpus-linguistics documentary-linguistics language-documentation linguistics nuuchahnulth wakashan

Keywords from Contributors

mesh sequences interactive hacking network-simulation

Last synced: 9 months ago

Repository

Linguistic data on the Nuuchahnulth (Wakashan) language

Basic Info

Host: GitHub
Owner: dwhieb
License: cc-by-sa-4.0
Language: JavaScript
Default Branch: main
Homepage:
Size: 7.06 MB

Statistics

Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 7

Topics

corpora corpus corpus-linguistics documentary-linguistics language-documentation linguistics nuuchahnulth wakashan

Created about 7 years ago · Last pushed over 4 years ago

Metadata Files

Readme License Citation

Nuuchahnulth

This repository contains linguistic texts in Nuuchahnulth, a language of the Wakashan language family spoken in the Pacific Northwest. These texts are digitally searchable versions of those prepared by Toshihide Nakayama (Tokyo University of Foreign Studies) and published as volumes A2-027 and A2-028 of the series Endangered Languages of the Pacific Rim. George Louie and Caroline Little dictated the texts to Toshihide Nakayama, who then transcribed, analyzed, and prepared the edited versions.

Attribution
Reporting Typos & Issues
Corpus Statistics
Text Formats
Sounds of Nuuchahnulth
Abbreviations
Converting the Corpus

Attribution

If you would like to use the data in the repository for research, please cite the following sources, depending on the text:

Nakayama, Toshihide (ed.). 2003. Caroline Little's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis (Endangered Languages of the Pacific Rim A2-027). Kyoto: Nakanishi Printing Co.
Nakayama, Toshihide (ed.). 2003. George Louie's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis (Endangered Languages of the Pacific Rim A2-028). Kyoto: Nakanishi Printing Co.

You may also use the stable DOI made available through Zenodo to cite this online version of the corpus:

DOI:10.5281/zenodo.3931864

For other uses of this data, please contact Toshihide Nakayama.

Reporting Typos & Issues

To report a typo or other problem, open an issue on GitHub.

Corpus Statistics

Statistic  | Value
-----------|------
Speakers   | 2
Texts      | 24
Utterances | 2,081
Tokens     | 8,366
Wordforms  | 4,216
Stems      | 2,547
Roots      | 1,313

Text Formats

The texts are available in three formats:

The "raw" versions of the texts, written in a practical writing system designed for typing in the data quickly. The other versions of the texts are generated from these files, which are located in the folder texts/raw.
An interlinear gloss format (IGL), which linguists use to present language data so that readers who do not speak the language can still follow the form and meaning of each word. Each document follows a format called scription. Scription enforces a consistent structure across texts, which makes them computationally parseable. These versions are located in the folder texts/interlinear.

At the top of each text is a header (between two lines of dashes, ---). It gives the title of the text in English (and sometimes in Nuuchahnulth), its abbreviation, and its unique ID.

Beneath the header are utterances (sentences) in the text. Each utterance is separated from the next by a blank line.

Each utterance contains the following kinds of information:

Utterance Number: The number of the utterance within the text.
Transcript: A transcription of each utterance using the Nuuchahnulth writing system, along with punctuation.
Morphemes: The morphemes (meaningful parts) of each word, separated by hyphens.
Glosses: A short gloss (abbreviation) indicating the meaning of each morpheme in the word, separated by hyphens. See the Abbreviations section below.
Literal Translations: Literal translations of each word.
Free Translations: A free (loose) translation for the utterance.

For more information about the scription format, visit https://scription.digitallinguistics.io.

A JSON version, formatted according to the Data Format for Digital Linguistics (DaFoDiL). This version of the corpus is most useful for programmatically interacting with the texts. See the DaFoDiL page for more information about how this data is formatted.
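The following minimal JavaScript sketch makes the interlinear layout described above more concrete. It is not part of this repository. It splits a scription-style text into its header and utterances. The sketch makes several assumptions:

* The header consists of simple key: value lines between the two --- lines.
* The utterance lines appear in the order listed above.
* The field names (number, transcript, morphemes, glosses, literal, free) are hypothetical.

The repository's own conversion relies on the @digitallinguistics/scription2dlx package, which handles the full scription format and should be preferred for real work.

```javascript
// Hypothetical sketch of reading a scription-style interlinear text.
// Field names and line order are assumptions based on the description above;
// real conversion should use @digitallinguistics/scription2dlx.
const FIELDS = ["number", "transcript", "morphemes", "glosses", "literal", "free"];

function parseScription(source) {
  const text = source.replace(/\r\n/g, "\n");
  const header = {};
  let body = text;

  // The header sits between two lines of three dashes.
  const match = text.match(/^---\n([\s\S]*?)\n---(?:\n|$)/);
  if (match) {
    for (const line of match[1].split("\n")) {
      const colon = line.indexOf(":");
      if (colon > 0) header[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
    body = text.slice(match[0].length);
  }

  // Utterances are separated by blank lines; each line fills the next field.
  const utterances = body
    .split(/\n\s*\n/)
    .map((block) => block.split("\n").map((line) => line.trim()).filter(Boolean))
    .filter((lines) => lines.length > 0)
    .map((lines) => {
      const utterance = {};
      FIELDS.forEach((field, i) => {
        if (i < lines.length) utterance[field] = lines[i];
      });
      return utterance;
    });

  return { header, utterances };
}

// Split a morpheme or gloss line into words, then each word at its hyphens.
const splitMorphemes = (line) => line.split(/\s+/).map((word) => word.split("-"));
```

For example, parseScription(source).utterances[0].glosses would return the gloss line of the first utterance. Passing that line to splitMorphemes yields one array of glosses per word, which lines up with the morpheme line split the same way.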

Sounds of Nuuchahnulth

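The following table shows the consonant sounds of Nuuchahnulth, arranged by place and manner of articulation following the layout of the International Phonetic Alphabet (IPA) consonant chart. The symbols are those of the Nuuchahnulth writing system used in the texts, not IPA symbols.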

Manner            | Labial | Apical | Alveolar | Lateral | Palatal | Velar | Labio-Velar | Uvular | Labio-Uvular | Pharyngeal | Glottal
------------------|:------:|:------:|:--------:|:-------:|:-------:|:-----:|:-----------:|:------:|:------------:|:----------:|:------:
Stops             | p      | t      | c        | ƛ       | č       | k     | kʷ          | q      | qʷ           | ʕ          | ʔ
Ejectives         | p̓      | t̓      | c̓        | ƛ̓       | č̓       | k̓     | k̓ʷ          |        | (q̓ʷ)         |            |
Fricatives        |        |        | s        | ɬ       | š       | x     | xʷ          | x̣      |              | ḥ          | h
Resonants         | m      | n      |          |         | y       |       | w           |        |              |            |
Glottal Resonants | m̓      | n̓      |          |         | y̓       |       | w̓           |        |              |            |

Ahousaht Nuuchahnulth has three vowels: /i, a, u/, each of which may be long (/Vː/), short, or variable-length (/V·/).

Certain suffixes in Nuuchahnulth change the sounds that precede them:

Hardening suffixes change a preceding stop, affricate, or resonant into its glottalized counterpart. They change a preceding fricative into /w̓/ if the fricative is rounded or /y̓/ if it is not. Hardening suffixes are indicated by ⟨ʼ⟩.
Softening suffixes change a preceding fricative into /w/ if the fricative is rounded or /y/ if it is not. Softening suffixes are indicated by ⟨ʽ⟩.
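The hardening and softening rules above can be sketched as a function that rewrites the final consonant of a stem. This is a hypothetical illustration, not code from the corpus scripts. It assumes the following:

* Glottalization is written with U+0313 (combining comma above).
* Only the stem-final consonant changes.
* The fricatives are exactly those in the table, with xʷ as the only rounded one.
* Consonants without a glottalized counterpart in the table (such as q) are left unchanged.

```javascript
// Hypothetical sketch of the hardening and softening rules described above.
// Assumptions: glottalization is written with U+0313 COMBINING COMMA ABOVE,
// only the stem-final consonant changes, and consonants without a
// glottalized counterpart in the consonant table are left unchanged.
const GL = "\u0313";

// Stops, affricates, and resonants with glottalized counterparts in the table.
const GLOTTALIZABLE = ["p", "t", "c", "ƛ", "č", "k", "kʷ", "qʷ", "m", "n", "y", "w"];

// Fricatives from the table, grouped by rounding.
const ROUNDED_FRICATIVES = ["xʷ"];
const UNROUNDED_FRICATIVES = ["s", "ɬ", "š", "x", "x\u0323", "\u1E25", "h"];

// Put the glottalization mark right after the base letter ("kʷ" -> "k" + GL + "ʷ").
const glottalize = (seg) => seg[0] + GL + seg.slice(1);

const HARDENING = [
  ...GLOTTALIZABLE.map((seg) => [seg, glottalize(seg)]),
  ...ROUNDED_FRICATIVES.map((seg) => [seg, "w" + GL]),
  ...UNROUNDED_FRICATIVES.map((seg) => [seg, "y" + GL]),
];

const SOFTENING = [
  ...ROUNDED_FRICATIVES.map((seg) => [seg, "w"]),
  ...UNROUNDED_FRICATIVES.map((seg) => [seg, "y"]),
];

// Replace the stem-final segment using the first rule whose segment matches.
function replaceFinal(stem, rules) {
  const s = stem.normalize("NFC");
  for (const [from, to] of rules) {
    const f = from.normalize("NFC");
    if (s.endsWith(f)) return s.slice(0, s.length - f.length) + to;
  }
  return s; // no rule applies, so the (normalized) stem is returned unchanged
}

const harden = (stem) => replaceFinal(stem, HARDENING);
const soften = (stem) => replaceFinal(stem, SOFTENING);
```

Input is normalized to NFC first, so a stem typed with a decomposed ḥ (h plus a combining dot below) still matches the precomposed rule. Because ʷ and the combining marks are separate characters, endsWith("k") does not match a stem ending in kʷ. As a result, the order of the rules does not matter here.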

Abbreviations

The following abbreviations are used in the texts.

Abbreviation | Meaning
-------------|-------------------------------------
CAUS         | causative
COND         | conditional mood
CONT         | continuative aspect
DEF          | definite
DIM          | diminutive
DISTR        | distributive
DUB          | dubitative mood
DUP          | CV reduplication
DUP#         | syllable reduplication
DUPCV        | CV reduplication
DUR          | durative aspect
EXP          | expression that cannot be translated
FIN(ITE)     | finite event
FUT          | future
FUT.IMP      | future imperative
GRAD         | graduative aspect
IMP          | imperative
INC          | inceptive aspect
INC.CAUS     | inceptive causative
IND          | indicative mood
INDF         | indefinite mood
INF          | inferential mood
INTER        | interrogative
INTJ         | interjection
IT           | iterative aspect
IT.INC       | iterative inceptive aspect
IT.PL        | iterative plural
LOC          | location
MOM          | momentaneous
MOMCAUS      | momentaneous causative
PL           | plural
POSS         | possessive
PURP         | purposive
QUOT         | quotative
REL          | relative mood
REL.DUB      | relative dubitative mood
REP          | repetitive aspect
SG           | singular
SHIFT        | perspective shifting
SIM          | simultaneous (‘while doing…’)
SPOR         | sporadic aspect
SUB          | subordinate mood

Converting the Corpus

To run the scripts that convert the corpus yourself, you will need to:

1) install Node.js;
2) clone this repository to your computer;
3) install the dependencies by running npm install from the command line in the repository folder;
4) run the command npm run build from the command line in the same folder.

You can also run just the transliteration step (npm run transliterate) or the conversion step (npm run convert).
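The transliteration step presumably converts the raw practical spelling into the Nuuchahnulth writing system. The repository does this with the @digitallinguistics/transliterate package. The sketch below does not use that package's API. It only illustrates the general technique of longest-match substitution, and its rules (HYPOTHETICAL_RULES) are invented placeholders, not the actual conventions of the raw texts.

```javascript
// Minimal sketch of longest-match transliteration. It does not use the API of
// @digitallinguistics/transliterate, and the rules below are hypothetical
// placeholders rather than the raw texts' actual conventions.
const HYPOTHETICAL_RULES = {
  "tl'": "ƛ\u0313", // hypothetical: glottalized lateral affricate
  "tl": "ƛ",
  "c'": "c\u0313",
  "7": "ʔ",
};

function transliterate(input, rules) {
  // Try longer keys first so that "tl'" is matched before "tl".
  const keys = Object.keys(rules).sort((a, b) => b.length - a.length);
  let output = "";
  let i = 0;
  while (i < input.length) {
    const key = keys.find((k) => input.startsWith(k, i));
    if (key !== undefined) {
      output += rules[key];
      i += key.length;
    } else {
      output += input[i]; // characters without a rule are copied unchanged
      i += 1;
    }
  }
  return output;
}
```

Sorting the keys by length ensures that a multi-character sequence is never split by a shorter rule that happens to match its first characters.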

Find & Replace

I've also written a find-and-replace script (scripts/findAndReplace.js), which lets you run searches on the corpus or update its JSON files. Documentation on how to use this function is in findAndReplace.js itself. For an example of it in use, see scripts/getCorpusStats.js, which calculates the statistics for the corpus.
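To clarify the difference between tokens and wordforms in the Corpus Statistics table, here is a hypothetical sketch that counts both in a list of utterance transcripts. It is not the logic of scripts/getCorpusStats.js. It assumes the following:

* Words are separated by whitespace.
* Leading and trailing Unicode punctuation is not part of a word.
* Wordforms are compared case-insensitively after NFC normalization.

```javascript
// Hypothetical sketch (not the logic of scripts/getCorpusStats.js): count
// tokens (every word occurrence) and wordforms (distinct spellings) in a
// list of utterance transcripts.
function countWords(transcripts) {
  const wordforms = new Set();
  let tokens = 0;
  for (const line of transcripts) {
    // Assumption: words are separated by whitespace.
    for (const raw of line.split(/\s+/)) {
      // Strip leading/trailing punctuation. Letters such as ʔ and combining
      // marks are not Unicode punctuation (\p{P}), so they are kept.
      const word = raw.normalize("NFC").replace(/^\p{P}+|\p{P}+$/gu, "").toLowerCase();
      if (word === "") continue;
      tokens += 1;
      wordforms.add(word);
    }
  }
  return { tokens, wordforms: wordforms.size };
}
```

The corpus scripts may treat case and punctuation differently, so figures from this sketch need not match the table above.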

Owner

Name: Daniel W. Hieber
Login: dwhieb
Kind: user
Location: Edmonton, AB

Website: https://danielhieber.info
Twitter: dwhieb
Repositories: 8
Profile: https://github.com/dwhieb

I'm a diversity linguist documenting the world's endangered languages.

Citation (CITATION.cff)

authors:
  - family-names: Hieber
    given-names:  Daniel W.
    orcid:        https://orcid.org/0000-0002-1411-3773
cff-version:   1.2.0
date-released: '2021-09-03'
doi:           10.5281/zenodo.3931864
license:       CC-BY-SA-4.0
message:       Please cite this corpus using the following references, depending on the text.
title:         A corpus of Nuuchahnulth
type:          dataset
version:       1.3.1
references:
  - editors:
      - family-names: Nakayama
        given-names:  Toshihide
    publisher:
      name: Nakanishi Printing Co.
    title:  Caroline Little's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis
    type:   book
    volume: A2-027
    year:   2003
  - editors:
      - family-names: Nakayama
        given-names:  Toshihide
    publisher:
      name: Nakanishi Printing Co.
    title:  George Louie's Nuu-chah-nulth (Ahousaht) texts with grammatical analysis
    type:   book
    volume: A2-028
    year:   2003

Committers

Last synced: over 1 year ago

All Time

Total Commits: 417
Total Committers: 2
Avg Commits per committer: 208.5
Development Distribution Score (DDS): 0.002

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Danny Hieber	d**b@g**m	416
dependabot[bot]	4****]	1

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 12
Total pull requests: 8
Average time to close issues: 4 months
Average time to close pull requests: about 2 months
Total issue authors: 1
Total pull request authors: 2
Average comments per issue: 0.58
Average comments per pull request: 0.75
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 5

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

dwhieb (9)

Pull Request Authors

dependabot[bot] (3)
dwhieb (3)

Top Labels

Issue Labels

Pull Request Labels

dependencies (3)

Dependencies

package-lock.json npm

146 dependencies

package.json npm

@digitallinguistics/javascript ^0.5.0 development
@digitallinguistics/scription2dlx ^0.12.0 development
@digitallinguistics/transliterate ^0.2.2 development
@digitallinguistics/word-aligner ^0.3.1 development
eslint ^7.7.0 development
fs-extra ^10.0.0 development
ora ^6.0.0 development
smartquotes ^2.3.1 development
yamljs ^0.3.0 development
