https://github.com/cldf/linglit

Programmatic access to linguistic literature

Repository

Basic Info
  • Host: GitHub
  • Owner: cldf
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 618 KB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License

README.md

linglit

Programmatic access to linguistic literature

Overview

linglit provides programmatic access to data buried in linguistic literature. Currently, this means extracting

- bibliographies
- interlinear glossed text (IGT) examples

from

- books published with Language Science Press (if LaTeX sources are publicly available)
- papers published in Glossa (if XML downloads are publicly available)

linglit does not come with any data (except some configuration), but it provides functionality to create and curate repositories with the "raw" data per publication provider (see CLI). For Language Science Press such a repository is publicly available at https://github.com/langsci/raw_texfiles .

Install

Install from PyPI with pip:

```shell
pip install linglit
```

Some linglit functionality depends on other programs that need to be installed separately:

- Extracting data from Language Science Press (LSP) books requires bibtool.
- Creating a local repository of LSP data requires gh, the GitHub CLI (but this is only necessary if you are not happy with the content in https://github.com/langsci/raw_texfiles, which can simply be cloned or downloaded from a release).

CLI

Installing the linglit Python package will also install a command-line tool, linglit. All functionality is provided by subcommands. To see a list of available subcommands, run

```shell
$ linglit -h
usage: linglit [-h] [--log-level LOG_LEVEL] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level ERROR|WARN|INFO|DEBUG

available commands:
  Run "COMAMND -h" to get help for a specific command.

  COMMAND
    bib                 Show the bibliography of a publication
    igt                 Show the IGT examples of a publication
    update              Update a linglit data repository
```

Downloading "raw" data

Running

```shell
linglit update <PROVIDER> <DIRECTORY>
```

will load the raw data for a provider into the existing directory <DIRECTORY>.

Extracting bibliographies

Running

```shell
linglit bib <PROVIDER> <DIRECTORY> <PUBID>
```

will print the bibliography of a publication in a serialization format roughly following the Unified Stylesheet for Linguistics.

```shell
$ linglit bib glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371
Aissen, Judith. 2003. Differential object marking: Iconicity vs. economy. Natural Language and Linguistic Theory 21. 435-483.

Ameka, Felix and de Witte, Carlien and Wilkins, David and Wilkins, David. 1999. Picture series for positional verbs: Eliciting the verbal component in locative descriptions. In Manual for the 1999 field season, 48-54. Nijmegen: Max Planck Institute for Psycholinguistics.
...
```

Using the --bibtex option will print out the bibliography formatted in BibTeX:

```shell
$ linglit bib glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371 --bibtex
@article{glossa6371:B1,
  author = {Aissen, Judith},
  year = {2003},
  pages = {435-483},
  doi = {10.1023/A:1024109008573},
  title = {Differential object marking: Iconicity vs. economy},
  journal = {Natural Language and Linguistic Theory},
  volume = {21}
}

@incollection{glossa6371:B2,
  author = {Ameka, Felix and de Witte, Carlien and Wilkins, David and Wilkins, David},
  year = {1999},
  pages = {48-54},
  title = {Picture series for positional verbs: Eliciting the verbal component in locative descriptions},
  booktitle = {Manual for the 1999 field season},
  address = {Nijmegen},
  publisher = {Max Planck Institute for Psycholinguistics}
}
...
```
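The BibTeX output can be post-processed with ordinary text tools. The following is a minimal sketch, not part of linglit: the helper name `entry_keys` is hypothetical, and the regular expression only recognizes the start of each record, so it is no substitute for a full BibTeX parser. The sample text is an abbreviated copy of the output shown above.

```python
import re

# Matches the start of a BibTeX record, e.g. "@article{glossa6371:B1,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def entry_keys(bibtex_text):
    """Return (entry type, citation key) pairs in order of appearance.

    Hypothetical helper; it does not inspect the fields of a record.
    """
    return [(m.group(1).lower(), m.group(2)) for m in ENTRY_START.finditer(bibtex_text)]


# Abbreviated copy of the `linglit bib ... --bibtex` output shown above.
sample = """@article{glossa6371:B1,
  author = {Aissen, Judith},
  year = {2003},
  pages = {435-483}
}

@incollection{glossa6371:B2,
  year = {1999},
  pages = {48-54}
}
"""

keys = entry_keys(sample)
print(keys)
# [('article', 'glossa6371:B1'), ('incollection', 'glossa6371:B2')]
```

The citation keys combine the provider, the publication ID and a per-publication reference ID, so a list like this is a quick way to count the references of a publication.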

Extracting IGT examples

Running

```shell
linglit igt <PROVIDER> <DIRECTORY> <PUBID>
```

will print the IGT examples from a publication.

```shell
$ linglit igt glossa ../../cldf_datasets/imtvault/raw/glossa/ 6371
(1) daww1239 (glossa6371: 1)
tir ka’ mãr [yeg ked/rid/∅)]
tir ka’ mãr [yeg ked/rid/∅)]
3SG lie.in.hammock REP [hammock in/LOC/∅]
‘He was lying in the hammock [inanimate noun], they say.’ (MS, ailla:254700, 20130724historia_McS.wav, 4:30–4:46)’

(2) daww1239 (glossa6371: 2)
‘aa’ nẽed dôo’ [baal’ rid/ked/ *∅)]
‘aa’ nẽed dôo’ [baal’ rid/ked/ ∅)]
ANPH come AUX:source [Manaus LOC/IN/*∅]
‘He came yesterday from Manaus [place name].’ (MFM, ailla:254700, 20130723historiaMFM.wav, 6:50–7:30)’

...
```
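The three subcommands share one call shape, which makes them easy to drive from a script. The sketch below assembles the argument lists documented above and hands them to `subprocess.run`. The helper names `linglit_command` and `run_linglit` are hypothetical and not part of linglit; only the subcommands, positional arguments and the --bibtex option shown in this README are assumed to exist.

```python
import shlex
import subprocess

# Subcommands listed by `linglit -h`.
SUBCOMMANDS = {"bib", "igt", "update"}


def linglit_command(subcommand, provider, directory, pubid=None, bibtex=False):
    """Build the argument list for one linglit call (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    # `update` takes <PROVIDER> <DIRECTORY>; `bib` and `igt` also take <PUBID>.
    if (subcommand == "update") != (pubid is None):
        raise ValueError("bib and igt need a PUBID; update takes none")
    if bibtex and subcommand != "bib":
        raise ValueError("--bibtex is only documented for the bib subcommand")
    cmd = ["linglit", subcommand, provider, str(directory)]
    if pubid is not None:
        cmd.append(str(pubid))
    if bibtex:
        cmd.append("--bibtex")
    return cmd


def run_linglit(*args, **kwargs):
    """Run the command and return its standard output as text.

    Requires the linglit command-line tool to be installed and on PATH.
    """
    result = subprocess.run(
        linglit_command(*args, **kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command line without executing it.
print(shlex.join(linglit_command("bib", "glossa", "raw/glossa", 6371, bibtex=True)))
```

Building the command as a list rather than a shell string avoids quoting problems when the data directory contains spaces.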

Python API

linglit provides a Python API to access the content of different publication providers in a unified way. The main point of access for data is a Repository. Each provider is implemented as a subclass of linglit.base.Repository, which can be retrieved by provider ID:

```python
>>> from linglit import PROVIDERS
>>> repocls = PROVIDERS['langsci']
>>> langsci = repocls('langsci')
>>> print(langsci['17'])
Wilbur, Joshua 2014. A grammar of Pite Saami
```

IGT examples

Examples are modeled as instances of linglit.base.Example. These can be accessed as follows:

```python
>>> ex = langsci['17'].examples[10]
>>> print(ex.as_igt())
dä virtiv válldet giehpajd ja ribbrev ja dagarijd ulgos
dä virti-v vállde-t giehpa-jd ja ribbre-v ja dagari-jd ulgos
then must-1SG.PRS take-INF lung-ACC.PL and liver-ACC.SG and such-ACC.PL out
‘Then I have to take out the lungs, the liver and such things. 080909103’
```
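In an IGT example, the segmented line and the gloss line correspond word by word, and within a word morpheme by morpheme. The sketch below illustrates that alignment on the two middle lines of the example above. It works on plain strings and does not use linglit's API; the function names are hypothetical, and splitting on `-` only is a simplifying assumption (clitic boundaries marked with `=` are ignored).

```python
def align_words(segmented_line, gloss_line):
    """Pair each segmented word with its gloss (hypothetical helper)."""
    words = segmented_line.split()
    glosses = gloss_line.split()
    if len(words) != len(glosses):
        raise ValueError(f"{len(words)} words but {len(glosses)} glosses")
    return list(zip(words, glosses))


def align_morphemes(word, gloss):
    """Pair the morphemes of one word with their glosses, splitting on '-'."""
    morphemes = word.split("-")
    morpheme_glosses = gloss.split("-")
    if len(morphemes) != len(morpheme_glosses):
        raise ValueError(f"cannot align {word!r} with {gloss!r}")
    return list(zip(morphemes, morpheme_glosses))


# Segmented line and gloss line copied from the example above.
segmented = "dä virti-v vállde-t giehpa-jd ja ribbre-v ja dagari-jd ulgos"
glossed = "then must-1SG.PRS take-INF lung-ACC.PL and liver-ACC.SG and such-ACC.PL out"

pairs = align_words(segmented, glossed)
for word, gloss in pairs:
    print(align_morphemes(word, gloss))
```

A length mismatch raises an error instead of silently truncating, because misaligned glosses usually point to an extraction problem in the source.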

References

References are modeled as pycldf.sources.Source instances.

```python
>>> src = langsci['17'].cited_references[5]
>>> print(src)
Grundström, Harald and Väisänen, A. O. 1958. Lapska sånger: Texter och melodier från svenska Lappland (Jonas Eriksson Steggos sånger). (Skrifter utgivna genom Landsmåls- och Folkminnesarkivet i Uppsala, 1.) Uppsala: Lundequistska bokhandeln.
>>> print(src.bibtex())
@book{langsci17:grundstroem1958a,
  address = {Uppsala},
  keywords = {Pite, Jojk, Musicology},
  language = {Swedish and German and Pite Saami},
  number = {1},
  publisher = {Lundequistska bokhandeln},
  series = {Skrifter utgivna genom Landsmåls- och Folkminne
```
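Because each reference exposes a `bibtex()` method, the references of a publication can be collected into one BibTeX string. The sketch below relies only on that method. The function name `bibtex_dump` is hypothetical, `StubSource` is a stand-in so the example runs without linglit or any data, and it is an assumption that `cited_references` can be iterated (the example above only shows indexing).

```python
def bibtex_dump(sources):
    """Join the BibTeX records of all sources, separated by blank lines.

    Hypothetical helper; `sources` is any iterable of objects with bibtex().
    """
    return "\n\n".join(src.bibtex().strip() for src in sources) + "\n"


class StubSource:
    """Stand-in for a reference object; it only mimics the bibtex() method."""

    def __init__(self, key, year):
        self.key = key
        self.year = year

    def bibtex(self):
        return "@misc{%s,\n  year = {%s}\n}" % (self.key, self.year)


demo = bibtex_dump([StubSource("a", 2003), StubSource("b", 1999)])
print(demo)
```

With linglit installed and a local data repository available, the same function could presumably be called as `bibtex_dump(langsci['17'].cited_references)`, under the iteration assumption stated above.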

Owner

  • Name: Cross-Linguistic Data Formats
  • Login: cldf
  • Kind: organization

GitHub Events

Total
  • Push event: 11
  • Create event: 1
Last Year
  • Push event: 11
  • Create event: 1

Dependencies

.github/workflows/python-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
pyproject.toml pypi
requirements.txt pypi
setup.py pypi