https://github.com/ben-aaron188/r_helper_functions
... some helper functions for text preprocessing and statistical analysis in R
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference in README
- ✓ Academic publication links: links to ncbi.nlm.nih.gov
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.8%) to scientific vocabulary
Last synced: 10 months ago
Repository
Statistics
- Stars: 2
- Watchers: 1
- Forks: 5
- Open Issues: 0
- Releases: 0
- Created almost 9 years ago
- Last pushed about 5 years ago
Metadata Files
- Readme
README.md
r_helper_functions
These functions might be useful to others who work on computational linguistics problems with R.
Current functions (9 Mar. 2018)
Computational linguistics functions
- process the texts in a folder into an R data frame: `txt_df_from_dir.R`
  - includes recursive file retrieval
- extract part-of-speech frequencies and named entities in R with spacyr, the Python-to-R bridge for spaCy: https://github.com/kbenoit/spacyr
- get vectorized readability indices (using Tyler Rinker's readability package)
- calculate Linguistic Category Model (LCM) scores as proposed by Seih et al. (2017)
- calculate a linguistic concreteness score per text
  - this function uses the 40k+ human-annotated concreteness ratings by Brysbaert et al. (2014)
  - `calculate_concreteness.R`
- extract the narrative structure of texts
  - this is based on the syuzhet and sentimentr packages
  - includes a minimal version for faster processing of massive text datasets
  - allows for multidimensional narrative structure modelling (currently sentiment and concreteness)
  - Note: the function below is built for data without punctuation and performs token-based sentiment extraction
    - if you have data with sentence boundaries, use the `get_narrative_dim_min(...)` function
  - `get_narrative_dim.R`
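The folder-to-data-frame item (with recursive file retrieval) can be illustrated with a minimal base-R sketch. It reads every `.txt` file under a folder, including subfolders, into one row per file. The function name `texts_from_dir()`, the `.txt` pattern, and the `id`/`text` column names are assumptions made for illustration. This is not the code in `txt_df_from_dir.R`.

```r
# Hypothetical helper (not the repo's txt_df_from_dir.R): one row per text file
texts_from_dir <- function(dir_path, pattern = "\\.txt$") {
  # recursive = TRUE also descends into subfolders; paths are relative to dir_path
  rel_paths <- list.files(dir_path, pattern = pattern, recursive = TRUE)
  full_paths <- file.path(dir_path, rel_paths)
  # collapse each file's lines into a single string
  texts <- vapply(full_paths, function(f) {
    paste(readLines(f, warn = FALSE, encoding = "UTF-8"), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  # relative paths as ids stay unique even if subfolders reuse file names
  data.frame(id = rel_paths, text = texts, stringsAsFactors = FALSE)
}

# Usage: build a tiny corpus in a temporary folder
corpus_dir <- file.path(tempdir(), "corpus")
dir.create(file.path(corpus_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("first text", file.path(corpus_dir, "a.txt"))
writeLines(c("second", "text"), file.path(corpus_dir, "sub", "b.txt"))
df <- texts_from_dir(corpus_dir)
```

Using relative paths as ids is a design choice. `basename()` would be shorter, but two subfolders can contain files with the same name.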
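The part-of-speech item relies on spacyr, which needs a working Python/spaCy installation. To keep the example self-contained, the sketch below starts from a small hand-made table. That table has the `doc_id` and `pos` columns that `spacyr::spacy_parse()` returns, and the sketch turns it into relative POS frequencies per document. The helper name `pos_frequencies()`, the toy tokens, and the choice to drop punctuation are all assumptions.

```r
# Toy table shaped like spacyr::spacy_parse() output (only the columns used here).
# With spacyr set up: spacyr::spacy_initialize(); parsed <- spacyr::spacy_parse(texts)
parsed <- data.frame(
  doc_id = c("d1", "d1", "d1", "d1", "d2", "d2"),
  token  = c("Anna", "runs", "fast", ".", "Dogs", "bark"),
  pos    = c("PROPN", "VERB", "ADV", "PUNCT", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: relative POS frequencies per document
pos_frequencies <- function(parsed, drop = "PUNCT") {
  # remove tags we do not want to count (punctuation by default)
  kept <- parsed[!parsed$pos %in% drop, ]
  counts <- table(kept$doc_id, kept$pos)
  # divide each row by its total so every document's frequencies sum to 1
  prop.table(counts, margin = 1)
}

freq <- pos_frequencies(parsed)
```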
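For the LCM item, the abstraction score is commonly computed as a weighted average of word-category counts. The sketch below assumes that the following counts are already available per text: descriptive action verbs (DAV), interpretive action verbs (IAV), state verbs (SV), adjectives, and nouns. It also assumes weights 1 to 5 in that order. Check the exact categories and weights against Seih et al. (2017) before relying on it. `lcm_score()` is a hypothetical name.

```r
# Hypothetical helper: LCM abstraction score from per-text category counts.
# Assumed weights: DAV = 1, IAV = 2, SV = 3, adjectives = 4, nouns = 5
lcm_score <- function(dav, iav, sv, adj, nn) {
  total <- dav + iav + sv + adj + nn
  weighted <- 1 * dav + 2 * iav + 3 * sv + 4 * adj + 5 * nn
  # vectorized over texts; texts without any counted words get NA
  ifelse(total > 0, weighted / total, NA_real_)
}

# Usage: two texts, the second has no LCM-relevant words
scores <- lcm_score(dav = c(2, 0), iav = c(1, 0), sv = c(1, 0),
                    adj = c(0, 0), nn = c(0, 0))
# scores[1] is (2*1 + 1*2 + 1*3) / 4 = 1.75; scores[2] is NA
```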
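One way to read the per-text concreteness score is as an average of word-level ratings over the words of a text that appear in the norms. Here is a minimal sketch, assuming the ratings come as a named numeric vector. The three values in `toy_norms` are made up for illustration and are not the Brysbaert et al. (2014) ratings. `concreteness_score()` is a hypothetical name, not the function in `calculate_concreteness.R`.

```r
# Toy norms: made-up values, NOT the Brysbaert et al. (2014) ratings
toy_norms <- c(apple = 5.0, table = 4.9, idea = 1.6)

# Hypothetical helper: mean rating of the matched words, one score per text
concreteness_score <- function(texts, norms) {
  vapply(texts, function(txt) {
    # lower-case and split on anything that is not an ASCII letter
    tokens <- strsplit(tolower(txt), "[^a-z]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    # words missing from the norms are ignored rather than scored as 0
    matched <- norms[tokens[tokens %in% names(norms)]]
    if (length(matched) == 0) NA_real_ else mean(matched)
  }, numeric(1), USE.NAMES = FALSE)
}

s <- concreteness_score(c("An apple on the table", "A vague idea", "nothing here"),
                        toy_norms)
# first text: mean(5.0, 4.9) = 4.95; second: 1.6; third: NA (no matches)
```

Ignoring unmatched words keeps function words from pulling every score toward 0. The trade-off is that short texts may rest on very few rated words.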
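The token-based narrative-structure item (for text without punctuation) can be sketched in three steps:
1. Score every token with a lexicon.
2. Cut the token sequence into a fixed number of equally sized segments.
3. Average the scores within each segment.

This way, texts of different lengths map onto trajectories of the same length. The sketch swaps in a toy lexicon and plain segment means for the syuzhet/sentimentr machinery the repo builds on. `narrative_trajectory()`, `toy_sentiment`, and `n_bins = 5` are assumptions, and the lexicon values are made up.

```r
# Toy sentiment lexicon (made-up values, not from syuzhet or sentimentr)
toy_sentiment <- c(happy = 1, love = 1, good = 0.5, sad = -1, bad = -0.5, terrible = -1)

# Hypothetical helper: token-based trajectory with n_bins segments
narrative_trajectory <- function(text, lexicon, n_bins = 5) {
  # no sentence boundaries needed: split into lower-case word tokens
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) == 0) return(rep(NA_real_, n_bins))
  # token-level scores; words missing from the lexicon count as neutral (0)
  scores <- unname(lexicon[tokens])
  scores[is.na(scores)] <- 0
  # segment index 1..n_bins by relative position in the text
  bins <- ceiling(seq_along(scores) * n_bins / length(scores))
  # mean per segment; segments with no tokens (very short texts) become NA
  as.numeric(tapply(scores, factor(bins, levels = seq_len(n_bins)), mean))
}

traj <- narrative_trajectory("happy happy good day then sad bad terrible night end",
                             toy_sentiment)
# traj is c(1, 0.25, -0.5, -0.75, 0)
```

Passing a concreteness lexicon instead of `toy_sentiment` yields a second trajectory of the same length. Binding the two side by side is one way to picture the multidimensional modelling mentioned above.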
Effect size calculations
Some miscellaneous functions that still need refinement:
- coerced multiclass classification in supervised machine learning with variable thresholds: `cmc.R`
Let me know if there are any bugs.
Owner
- Name: BKleinberg
- Login: ben-aaron188
- Kind: user
- Website: https://bkleinberg.net/
- Repositories: 18
- Profile: https://github.com/ben-aaron188