https://github.com/concepticon/norare-data

Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts

https://github.com/concepticon/norare-data

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts

Basic Info
  • Host: GitHub
  • Owner: concepticon
  • License: other
  • Language: Python
  • Default Branch: master
  • Size: 45.1 MB
Statistics
  • Stars: 16
  • Watchers: 5
  • Forks: 3
  • Open Issues: 59
  • Releases: 5
Created about 6 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

Database of Cross-Linguistic Norms, Ratings, and Relations for Words and Concepts

Installation of the database

Get started

We recommend using your terminal to carry out the following instructions. We (strongly) advise you to install a virtual environment. In addition, you need GIT and Python (version 3) on your system.

Install pynorare

The NoRaRe database uses a Python package called pynorare for data curation (as a dependency pyconcepticon will be installed automatically). The Python package can be installed from PyPi.

$ pip install pynorare

Clone norare-data GIT repository

To access the NoRaRe data sets, you need to clone the norare-data GIT repository into a folder of your choice. Navigate or create a folder where you want to store the norare-data repository.

$ mkdir PATH/TO/NAME-NEW-FOLDER or $ cd PATH/TO/FOLDER Clone the repository by typing:

$ git clone https://github.com/concepticon/norare-data.git

Clone concepticon-data GIT repository

The NoRaRe database is linked to Concepticon. To access all data sets and perform the mapping of your own data, you need to download the concepticon-data GIT repository.

$ git clone https://github.com/concepticon/concepticon-data.git

Commands

To check if the installation worked and see the available commands of the pynorare package, type:

$ norare

Show NoRaRe statistics

For example, you can get statistic of the distribution of the Concepticon identifiers by typing the following command. Note that you should add the path to a specific clone of concepticon-data, because you are not within the root of the concepticon-data. The same is possible for the norare-data repository if you are working with multiple clones. You can also use the catalog.ini file to specify the path (see 'Define repository paths' section).

$ norare --repos=/PATH/TO/concepticon-data --norarepo=/PATH/TO/norare-data stats

List NoRaRe data sets

To see all available data sets, navigate into the norare-data folder and type:

$ norare --repos=/PATH/TO/concepticon-data ls

Download a NoRaRe data set

By typing the following command, the original data set will be downloaded into the raw folder. It is only locally stored.

$ norare --repos=/PATH/TO/concepticon-data download Abdaoui-2017-EmoLex

Define repository paths

To make your live easier, you can define the default path to concepticon-data in a catalog.ini file (Linux users can follow the desciption in this blog post).

For Mac users: Open the catalog.ini file in a text editor of your choice and add the following lines.

[clones] concepticon = /PATH/TO/concepticon-data

If you can't find the catalog.ini file, create a directory mkdir /Users/YOURNAME/Library/Application\ Support/cldf/ and add it to the cldf folder.

Mapping procedure

"Mapping a dataset" refers to the process of mapping the words or concepts described in a dataset to Concepticon's concept sets and extracting associated norms, ratings or relations from a dataset's "raw" data.

This is facilitated with small Python modules (one per dataset), located at datasets/<dataset-ID>/norare.py. These python modules may provide two functions, download and map with signatures as shown below, which will be called when the respective command is run with norare.

``python def download(dataset: pynorare.api.Dataset): """use methods ofdataset` to download or manipulate data"""

def map(dataset: pynorare.api.Dataset, concepticon: pyconcepticon.Concepticon, mappings: dict): """eventually call dataset.extract_data to write the extracted data to files.""" ```

Create a metadata.json file

The metadata file is meant to provide additional information to your data set. It is also used to define the column names. For a comprehensive description of the possible namespace terms see the CSVW documentation. You can also follow the standards of the metadata.json files for other data sets. Note that you can specify the old "titles" and new column names with "name" as follows:

json { "name": "ENGLISH", "datatype": "string", "titles": "word" }

Indicate the data "datatype" of each column: integer, float, string.

Don't forget to add the Concepticon columns:

json { "name": "CONCEPTICON_ID", "datatype": "integer" }, { "name": "CONCEPTICON_GLOSS", "datatype": "string" }

Download and map

Make sure that you have stored the norare.py and <YOUR-DATASET-ID>.tsv-metadata.json files and created a raw folder in your data set folder. If you are ready, navigate to the norare-data folder and type the following commands into your terminal:

$ norare download YOUR-DATASET-ID $ norare map YOUR-DATASET-ID

The raw file should be stored in the raw folder and a new <YOUR-DATASET-ID>.tsv should occur in your data set folder.

Validate

You can validate your data set by typing:

$ norare validate YOUR-DATASET-ID

References

Tjuka, Annika, Robert Forkel, and Johann-Mattis List. 2022. Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties. Behavior Research Methods 54. 864–884. https://doi.org/10.3758/s13428-021-01650-1.

Tjuka, Annika. “Adding Data Sets to NoRaRe: A Guide for Beginners,” in Computer-Assisted Language Comparison in Practice, 11/08/2021, https://calc.hypotheses.org/2890.

Tjuka, Annika. “Comparing NoRaRe Data Sets: Calculation of Correlations and Creation of Plots in R,” in Computer-Assisted Language Comparison in Practice, 24/11/2021, https://calc.hypotheses.org/3109.

Owner

  • Name: Concepticon
  • Login: concepticon
  • Kind: organization
  • Email: concepticon@eva.mpg.de

A Resource for the Linking of Concept Lists

GitHub Events

Total
  • Create event: 42
  • Release event: 1
  • Issues event: 53
  • Watch event: 1
  • Delete event: 39
  • Issue comment event: 60
  • Push event: 125
  • Pull request review event: 96
  • Pull request review comment event: 81
  • Pull request event: 49
Last Year
  • Create event: 42
  • Release event: 1
  • Issues event: 53
  • Watch event: 1
  • Delete event: 39
  • Issue comment event: 60
  • Push event: 125
  • Pull request review event: 96
  • Pull request review comment event: 81
  • Pull request event: 49

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 43
  • Total pull requests: 26
  • Average time to close issues: over 1 year
  • Average time to close pull requests: 5 days
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 0.51
  • Average comments per pull request: 0.88
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 27
  • Pull requests: 26
  • Average time to close issues: 17 days
  • Average time to close pull requests: 5 days
  • Issue authors: 3
  • Pull request authors: 3
  • Average comments per issue: 0.37
  • Average comments per pull request: 0.88
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MiraAhmedovic (19)
  • AnnikaTjuka (14)
  • LinguList (9)
  • SimonGreenhill (1)
Pull Request Authors
  • MiraAhmedovic (24)
  • arubehn (1)
  • johenglisch (1)
Top Labels
Issue Labels
new dataset (13) nlp (2) network (1) ProduSemy (1)
Pull Request Labels

Dependencies

examples/requirements.txt pypi
  • matplotlib *
  • numpy *
  • scipy *
datasets/Wikidata/raw/requirements.txt pypi
  • qwikidata *