qlcdata

Data management for Quantitative Language Comparison

https://github.com/cysouw/qlcdata

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to: zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

Data management for Quantitative Language Comparison

Basic Info

Host: GitHub
Owner: cysouw
Language: R
Default Branch: master
Size: 25 MB

Statistics

Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 5

Created over 10 years ago · Last pushed about 2 years ago

Metadata Files

Readme Changelog Citation

qlcData

Functions for data management in Quantitative Language Comparison

The package combines various methods to deal with data in language comparison, and it is intended to grow in the future to allow different datasets to be used and compared.

It consists of various read and write functions to import and produce different kinds of data.

When using external data, there are often various tweaks that one would like to perform before using the data for further research. The function recode assists with some common recoding problems that occur with nominal data. Please see the vignette for a detailed explanation of the intended usage.

To process strings, it is often very useful to tokenize them into graphemes (i.e. functional units of the orthography), and possibly to replace those graphemes with other symbols in order to harmonize different orthographic representations ('transcription'). As a quick and easy way to specify, save, and document the decisions taken for the tokenization, we propose using an orthography profile. Functions to write and read orthography profiles are provided in this package. The main function tokenize can check orthography profiles against data, and tokenize data into (tailored) graphemes according to orthography profiles.
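To make the idea of profile-based tokenization concrete, here is a minimal sketch in base R. It is not the qlcData implementation: the helper tokenize_sketch, its arguments, and the toy profile are hypothetical names introduced only for illustration. The sketch assumes that a profile is simply a vector of graphemes and that longer graphemes take precedence over shorter ones.

```r
# Hypothetical helper, NOT the qlcData implementation: greedy longest-match
# tokenization of a single string against a vector of graphemes.
tokenize_sketch <- function(string, graphemes, sep = " ", missing = "?") {
  # Try longer graphemes first, so that "sch" wins over "s" followed by "ch".
  graphemes <- graphemes[order(nchar(graphemes), decreasing = TRUE)]
  tokens <- character(0)
  while (nchar(string) > 0) {
    # All graphemes that match at the start of the remaining string.
    hit <- graphemes[startsWith(string, graphemes)]
    if (length(hit) > 0) {
      tokens <- c(tokens, hit[1])
      string <- substring(string, nchar(hit[1]) + 1)
    } else {
      # Character not covered by the profile: mark it instead of guessing.
      tokens <- c(tokens, missing)
      string <- substring(string, 2)
    }
  }
  paste(tokens, collapse = sep)
}

# Toy profile (an assumption for this example, not shipped with the package).
profile <- c("sch", "ch", "s", "c", "h", "a", "e", "n", "t")

tokenize_sketch("schachten", profile)
# returns "sch a ch t e n"
```

Characters that are missing from the profile are replaced by a placeholder rather than silently dropped, which is one simple way to check a profile against data. For real work, use the tokenize function of the package and consult its help file for the supported arguments.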

This is an early alpha version, but it should function. You can download the package directly from CRAN. Have a look at the examples in the help files and at the vignettes to get an idea of how to use the package:

install.packages("qlcData")

If you want the latest changes, it is easy to install this package directly from GitHub into R by using:

install.packages("devtools")
devtools::install_github("cysouw/qlcData")

There are vignettes that try to explain the intended usage of this package. Unfortunately, the vignettes will not be built by default when you install this package from GitHub. You can try the following, but it might throw an error:

devtools::install_github("cysouw/qlcData", build_vignettes = TRUE)
vignette("orthography_processing")
vignette("recoding_nominal_data")

A few functions are available through a bash terminal. You will have to manually symlink these interfaces into a directory on your PATH. For example, to link the function tokenize to /usr/local/bin/, use something like:

ln -is `Rscript -e 'cat(file.path(find.package("qlcData"), "exec", "tokenize"))'` /usr/local/bin

The available executables are tokenize, writeprofile, and pass_align.

Michael Cysouw cysouw@mac.com

Owner

Name: Michael Cysouw
Login: cysouw
Kind: user
Location: Marburg, Germany
Company: Philipps-Universität Marburg

Website: cysouw.de/home
Repositories: 39
Profile: https://github.com/cysouw

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: qlcData. Processing Data for Quantitative Language Comparison in R.
message: >-
  If you use this software, please cite it using the
  metadata from the CITATION file.
type: software
authors:
  - given-names: Michael
    family-names: Cysouw
    email: cysouw@mac.com
    orcid: 'https://orcid.org/0000-0003-3168-4946'
    affiliation: Philipps Universität Marburg
repository-code: 'https://github.com/cysouw/qlcData'
abstract: >-
  Functionality to read, recode, and transcode data as used in 
  quantitative language comparison, specifically to deal with multilingual 
  orthographic variation and with the recoding of nominal data.
keywords:
  - linguistics
  - language comparison
  - data quality
license: GPL-3.0
version: v0.3
date-released: '2024-06-09'

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Dependencies

DESCRIPTION cran

R >= 2.10 depends
ape * imports
data.tree * imports
docopt * imports
phytools * imports
shiny * imports
stringi >= 0.2 imports
yaml >= 2.1.11 imports
knitr * suggests
rmarkdown * suggests

.github/workflows/rhub.yaml actions

r-hub/actions/checkout v1 composite
r-hub/actions/platform-info v1 composite
r-hub/actions/run-check v1 composite
r-hub/actions/setup v1 composite
r-hub/actions/setup-deps v1 composite
r-hub/actions/setup-r v1 composite

revdep/library.noindex/qlcData/new/qlcData/DESCRIPTION cran

R >= 3.5.0 depends
ape * imports
data.tree * imports
docopt * imports
phytools * imports
shiny * imports
stringi >= 0.2 imports
yaml >= 2.1.11 imports
knitr * suggests
rmarkdown * suggests
