https://github.com/cldf/linglit
Programmatic access to linguistic literature
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cldf
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 618 KB
Statistics
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
linglit
Programmatic access to linguistic literature
Overview
linglit provides programmatic access to data buried in linguistic literature. Currently, this means
extracting
- bibliographies
- IGT (interlinear glossed text) examples
from
- books published with Language Science Press (if LaTeX sources are publicly available)
- papers published in Glossa (if XML downloads are publicly available)
linglit does not come with any data (except some configuration), but it provides functionality to create and
curate repositories with the "raw" data per publication provider (see CLI). For Language Science Press such a repository
is publicly available at https://github.com/langsci/raw_texfiles .
Install
Install from PyPI with pip:
```shell
pip install linglit
```
Some linglit functionality depends on other programs that need to be installed separately:
- Extracting data from Language Science Press (LSP) books requires bibtool.
- Creating a local repository of LSP data requires gh. This is only necessary if the content of
https://github.com/langsci/raw_texfiles, which can simply be cloned or downloaded from a release,
does not meet your needs.
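Whether the external programs are available can be checked before running linglit. The following is a minimal sketch using only the Python standard library; the `REQUIRED_TOOLS` mapping and the `missing_tools` helper are hypothetical names introduced for illustration and are not part of linglit.

```python
import shutil

# Tool names as listed in this README; the mapping itself is illustrative only.
REQUIRED_TOOLS = {
    "bibtool": "extracting data from LSP books",
    "gh": "creating a local repository of LSP data",
}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


for name in missing_tools():
    print(f"{name} not found (needed for {REQUIRED_TOOLS[name]})")
```

If nothing is printed, both programs were found on the PATH.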
CLI
Installing the linglit Python package also installs a command-line tool, linglit. All functionality is
provided by subcommands. To see a list of available subcommands, run
```shell
$ linglit -h
usage: linglit [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level ERROR|WARN|INFO|DEBUG

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    bib                 Show the bibliography of a publication
    igt                 Show the IGT examples of a publication
    update              Update a linglit data repository
```
Downloading "raw" data
Running
```shell
linglit update <PROVIDER> <DIRECTORY>
```
will download the raw data for a provider into the existing directory <DIRECTORY>.
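Because the target directory has to exist before the command runs, a script that automates updates may want to create it first. The following sketch wraps the CLI call with the standard library; `build_update_command` and `update_repository` are hypothetical helper names, and the sketch assumes that the `linglit` executable is on the PATH.

```python
import subprocess
from pathlib import Path


def build_update_command(provider, directory):
    """Build the argument list for `linglit update <PROVIDER> <DIRECTORY>`."""
    return ["linglit", "update", str(provider), str(directory)]


def update_repository(provider, directory):
    """Create the target directory if needed, then run `linglit update`."""
    target = Path(directory)
    # The command expects an existing directory, so create it first.
    target.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if linglit exits with an error.
    return subprocess.run(build_update_command(provider, target), check=True)
```

A call such as `update_repository("glossa", "raw/glossa")` would then run the update for the `glossa` provider; the directory name here is only an example.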
Extracting bibliographies
Running
```shell
linglit bib <PROVIDER> <DIRECTORY> <PUBID>
```
will print the bibliography of a publication in a serialization format roughly following the Unified Stylesheet
for Linguistics.
```shell
$ linglit bib glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371
Aissen, Judith. 2003. Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory 21. 435-483.

Ameka, Felix and de Witte, Carlien and Wilkins, David and Wilkins, David. 1999. Picture series for positional verbs: Eliciting the verbal component in locative descriptions. In Manual for the 1999 field season, 48-54. Nijmegen: Max Planck Institute for Psycholinguistics.
...
```
Using the --bibtex option will print out the bibliography formatted in BibTeX:
```shell
$ linglit bib glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371 --bibtex
@article{glossa6371:B1,
author = {Aissen, Judith},
year = {2003},
pages = {435-483},
doi = {10.1023/A:1024109008573},
title = {Differential object marking: Iconicity vs. economy},
journal = {Natural Language and Linguistic Theory},
volume = {21}
}
@incollection{glossa6371:B2,
author = {Ameka, Felix and de Witte, Carlien and Wilkins, David and Wilkins, David},
year = {1999},
pages = {48-54},
title = {Picture series for positional verbs: Eliciting the verbal component in locative descriptions},
booktitle = {Manual for the 1999 field season},
address = {Nijmegen},
publisher = {Max Planck Institute for Psycholinguistics}
}
...
```
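The BibTeX output can be post-processed with ordinary text tools. As a rough sketch, the following lists the entry types and citation keys in a BibTeX string, using the first entry of the output above as sample data. The `entry_keys` helper is a hypothetical name, and the regular expression is a simplification, not a full BibTeX parser.

```python
import re

# Sample data: the first entry of the `--bibtex` output shown above.
BIBTEX = """\
@article{glossa6371:B1,
author = {Aissen, Judith},
year = {2003},
pages = {435-483},
doi = {10.1023/A:1024109008573},
title = {Differential object marking: Iconicity vs. economy},
journal = {Natural Language and Linguistic Theory},
volume = {21}
}
"""

# Matches the start of an entry, e.g. "@article{glossa6371:B1,".
ENTRY_START = re.compile(r"^@(\w+)\{([^,\s]+),", re.MULTILINE)


def entry_keys(bibtex):
    """Return (entry type, citation key) pairs found in a BibTeX string."""
    return ENTRY_START.findall(bibtex)


print(entry_keys(BIBTEX))  # [('article', 'glossa6371:B1')]
```

For anything beyond listing keys, a dedicated BibTeX library is likely the better choice.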
Extracting IGT examples
Running
```shell
linglit igt <PROVIDER> <DIRECTORY> <PUBID>
```
will print the IGT examples from a publication.
```shell
$ linglit igt glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371
(1) daww1239 (glossa6371: 1)
tir ka’ mãr [yeg ked/rid/∅)]
tir ka’ mãr [yeg ked/rid/∅)]
3SG lie.in.hammock REP [hammock in/LOC/∅]
‘He was lying in the hammock [inanimate noun], they say.’ (MS, ailla:254700, 20130724historia_McS.wav, 4:30–4:46)’

(2) daww1239 (glossa6371: 2)
‘aa’ nẽed dôo’ [baal’ rid/ked/ *∅)]
‘aa’ nẽed dôo’ [baal’ rid/ked/ ∅)]
ANPH come AUX:source [Manaus LOC/IN/*∅]
‘He came yesterday from Manaus [place name].’ (MFM, ailla:254700, 20130723historiaMFM.wav, 6:50–7:30)’

...
```
Python API
linglit provides a Python API to access the content of different publication providers in a unified way. The
main point of access for data is a Repository. Each provider is implemented as a subclass of linglit.base.Repository,
which can be retrieved by provider ID:
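```python
from linglit import PROVIDERS

# Look up the Repository subclass registered under the provider ID 'langsci'.
repocls = PROVIDERS['langsci']
langsci = repocls('langsci')
# Publications are accessed by their ID.
print(langsci['17'])
# Output: Wilbur, Joshua 2014. A grammar of Pite Saami
```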
IGT examples
Examples are modeled as instances of linglit.base.Example. These can be accessed as follows:
```python
ex = langsci['17'].examples[10]
print(ex.as_igt())
# Output:
# dä virtiv válldet giehpajd ja ribbrev ja dagarijd ulgos
# dä virti-v vállde-t giehpa-jd ja ribbre-v ja dagari-jd ulgos
# then must-1SG.PRS take-INF lung-ACC.PL and liver-ACC.SG and such-ACC.PL out
# ‘Then I have to take out the lungs, the liver and such things. 080909103’
```
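In IGT, each word of the segmented text corresponds to one gloss, so the two tiers are usually displayed in aligned columns. The following sketch shows one way to pad the tiers accordingly, using the segmented and gloss lines from the example above. `align_igt` is a hypothetical helper written for illustration; it is not how linglit renders examples.

```python
def align_igt(*tiers):
    """Pad whitespace-separated tiers so that corresponding words line up."""
    rows = [tier.split() for tier in tiers]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all tiers must have the same number of words")
    # Column width is the longest word in that column (counted in code points).
    widths = [max(len(word) for word in column) for column in zip(*rows)]
    return [
        "  ".join(word.ljust(width) for word, width in zip(row, widths)).rstrip()
        for row in rows
    ]


segmented = "dä virti-v vállde-t giehpa-jd ja ribbre-v ja dagari-jd ulgos"
glosses = "then must-1SG.PRS take-INF lung-ACC.PL and liver-ACC.SG and such-ACC.PL out"
for line in align_igt(segmented, glosses):
    print(line)
```

The length check reflects the one-gloss-per-word convention: if the tiers differ in word count, the example cannot be aligned and the helper raises an error instead of guessing.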
References
References are modeled as pycldf.sources.Source instances.
```python
src = langsci['17'].cited_references[5]
print(src)
# Output:
# Grundström, Harald and Väisänen, A. O. 1958. Lapska sånger: Texter och melodier från svenska Lappland (Jonas Eriksson Steggos sånger). (Skrifter utgivna genom Landsmåls- och Folkminnesarkivet i Uppsala, 1.) Uppsala: Lundequistska bokhandeln.
print(src.bibtex())
# Output:
# @book{langsci17:grundstroem1958a,
#   address = {Uppsala},
#   keywords = {Pite, Jojk, Musicology},
#   language = {Swedish and German and Pite Saami},
#   number = {1},
#   publisher = {Lundequistska bokhandeln},
#   series = {Skrifter utgivna genom Landsmåls- och Folkminne
```
Owner
- Name: Cross-Linguistic Data Formats
- Login: cldf
- Kind: organization
- Website: https://cldf.clld.org
- Repositories: 15
- Profile: https://github.com/cldf
GitHub Events
Total
- Push event: 11
- Create event: 1
Last Year
- Push event: 11
- Create event: 1
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite