Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov
- ✓ Committers with academic emails: 3 of 21 committers (14.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (20.1%) to scientific vocabulary
Keywords
deep-learning
machine-learning
nlp
transformers
Last synced: 6 months ago
Repository
Using Transformers from HuggingFace in R
Basic Info
- Host: GitHub
- Owner: OscarKjell
- Language: R
- Default Branch: master
- Homepage: https://r-text.org
- Size: 37.8 MB
Statistics
- Stars: 153
- Watchers: 9
- Forks: 31
- Open Issues: 7
- Releases: 0
Topics
deep-learning
machine-learning
nlp
transformers
Created about 6 years ago · Last pushed 6 months ago
Metadata Files
Readme
Changelog
README.Rmd
---
output: github_document
---
```{r}
#| echo: false
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
# text
[CRAN status](https://CRAN.R-project.org/package=text)
[GitHub Actions](https://github.com/oscarkjell/text/actions)
[Project status: Active](https://www.repostatus.org/#active)
[Lifecycle: maturing](https://lifecycle.r-lib.org/articles/stages.html#maturing-1)
[CRAN downloads](https://CRAN.R-project.org/package=text)
[Codecov test coverage](https://app.codecov.io/gh/oscarkjell/text)
## Overview
An R-package for analyzing natural language with transformer-based large language models. The `text` package is part of the *R Language Analysis Suite*, which includes `talk`, `text` and `topics`.
+ [`talk`](https://www.r-talk.org/) transforms voice recordings into text, audio features, or embeddings.
+ [`text`](https://www.r-text.org/) provides many language tasks, such as converting digital text into word embeddings.
+ [`topics`](https://www.r-topics.org/) visualizes language patterns as topics to generate psychological insights.

`talk` and `text` offer access to Large Language Models from Hugging Face.
The *R Language Analysis Suite* was developed through a collaboration between psychology and computer science to address research needs with state-of-the-art techniques. The suite is continuously tested on Ubuntu, macOS and Windows using the latest stable R version.
The *text*-package has two main objectives:
* First, to serve R-users as a *point solution* for transforming text to state-of-the-art word embeddings that are ready to be used for downstream tasks. The package provides a user-friendly link to language models based on transformers from [Hugging Face](https://huggingface.co/).
* Second, to serve as an *end-to-end solution* that provides state-of-the-art AI techniques tailored for social and behavioral scientists.
Please reference our tutorial article when using the `text` package: [The text-package: An R-package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning](https://pubmed.ncbi.nlm.nih.gov/37126041/).
### Short installation guide
Most users only need to run the installation code below.
If you experience problems or want more options, please see the [Extended Installation Guide](https://www.r-text.org/articles/ext_install_guide.html).
For the text-package to work, you first install the text-package in R and then install the Python packages that text requires.
1. Install the text-package (at the moment, the second step only works with the development version of text from GitHub).
[GitHub](https://github.com/) development version:
``` r
# install.packages("devtools")
devtools::install_github("oscarkjell/text")
```
[CRAN](https://CRAN.R-project.org/package=text) version:
``` r
install.packages("text")
```
2. Install and initialize the Python packages required by text:
``` r
library(text)
# Install the Python packages required by text in a conda environment (using the defaults).
textrpp_install()
# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R.
textrpp_initialize(save_profile = TRUE)
```
### Point solution for transforming text to embeddings
Recent advances in NLP research have resulted in improved representations of human language (i.e., language models). These language models have produced large performance gains in tasks related to understanding human language. The text-package makes these state-of-the-art (SOTA) models easily accessible through an interface to [HuggingFace](https://huggingface.co/docs/transformers/index) in Python.
*Text* provides access to many contemporary state-of-the-art language models that use deep learning to model word order and context. Multilingual language models can represent several languages; for example, multilingual BERT covers *104 languages*.
*Table 1. Some of the available language models*
``` {r HuggingFace_table_short, echo=FALSE, results='asis'}
# Model names as they are passed to the text functions
Models <- c("'bert-base-uncased'",
"'roberta-base'",
"'distilbert-base-cased'",
"'bert-base-multilingual-cased'",
"'xlm-roberta-large'"
)
References <- c("[Devlin et al. 2019](https://aclanthology.org/N19-1423/)",
"[Liu et al. 2019](https://arxiv.org/abs/1907.11692)",
"[Sanh et al. 2019](https://arxiv.org/abs/1910.01108)",
"[Devlin et al. 2019](https://aclanthology.org/N19-1423/)",
"[Conneau et al. 2020](https://arxiv.org/abs/1911.02116)"
)
# Number of transformer layers and size of each layer's hidden state
Layers <- c(12, 12, 6, 12, 24)
Dimensions <- c(768, 768, 768, 768, 1024)
Language <- c("English",
"English",
"English",
"[104 top languages at Wikipedia](https://meta.wikimedia.org/wiki/List_of_Wikipedias)",
"[100 languages](https://huggingface.co/docs/transformers/multilingual)")
Tables_short <- tibble::tibble(Models, References, Layers, Dimensions, Language)
knitr::kable(Tables_short)
```
```
See [HuggingFace](https://huggingface.co/models/) for a more comprehensive list of models.
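The *Layers* and *Dimensions* columns describe each model's architecture: the number of transformer layers and the size of each layer's hidden-state vector. When an embedding combines hidden states from several layers, its width depends on how they are combined. The minimal sketch below uses a hypothetical helper, `embedding_width()` (not part of text), and assumes that concatenation stacks the layer vectors end to end while averaging keeps the hidden size unchanged:

```r
# Hypothetical helper (not part of the text package): width of a token
# embedding when hidden states from several layers are combined.
embedding_width <- function(hidden_size, n_layers,
                            aggregation = c("concatenate", "mean")) {
  aggregation <- match.arg(aggregation)
  if (aggregation == "concatenate") {
    # Concatenation stacks the layer vectors end to end
    hidden_size * n_layers
  } else {
    # An element-wise mean over layers keeps the original hidden size
    hidden_size
  }
}

embedding_width(768, 2, "concatenate")  # two layers of bert-base-uncased: 1536
embedding_width(1024, 2, "mean")        # two layers of xlm-roberta-large: 1024
```

Concatenation preserves each layer's information separately at the cost of a wider vector, whereas averaging keeps vectors compact; which combination textEmbed() uses by default is described in its reference documentation.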
The `textEmbed()` function is the main embedding function in text. It can output contextualized embeddings for tokens (i.e., an embedding for each single word instance of each text) and for texts (i.e., a single embedding per text, obtained by aggregating all token embeddings of the text).
```{r short_word_embedding_example, eval = FALSE, warning=FALSE, message=FALSE}
library(text)
# Transform the text data to BERT word embeddings
# Example text
texts <- c("I feel great!")
# Embed the text using the default model and settings
embeddings <- textEmbed(texts)
embeddings
```
See [Get Started](https://www.r-text.org/articles/text.html) for more information.
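To make the token-to-text aggregation concrete, here is a minimal base-R sketch with made-up 4-dimensional token embeddings. It assumes mean aggregation (averaging each dimension over all tokens); real models produce much larger vectors (e.g., 768 dimensions for `bert-base-uncased`), and `textEmbed()` lets you choose the aggregation method (see its documentation).

```r
# Toy token embeddings for the text "I feel great!"
# (rows = tokens, columns = embedding dimensions; values are made up)
token_embeddings <- matrix(
  c(0.1, 0.4, -0.2,  0.3,
    0.5, 0.0,  0.2,  0.1,
    0.3, 0.2,  0.6, -0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("i", "feel", "great"), paste0("Dim", 1:4))
)

# Mean aggregation: average each dimension over all tokens, giving one
# vector per text with the same number of dimensions as each token
text_embedding <- colMeans(token_embeddings)
text_embedding  # approximately 0.3, 0.2, 0.2, 0.1
```

The text-level vector keeps the column names of the token-level matrix, so each dimension can still be matched to its token-level counterpart.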
### Language Analysis Tasks
The text-package also gives access to many language analysis tasks, such as `textClassify()`, `textGeneration()`, and `textTranslate()`.
```{r language_analysis_task_examples, eval = FALSE, warning=FALSE, message=FALSE}
library(text)
# Generate text from the prompt "I am happy to"
generated_text <- textGeneration("I am happy to",
model = "gpt2")
generated_text
```
For a full list of the language analysis tasks supported in text, see the [References](https://www.r-text.org/reference/index.html).
### An end-to-end package
*Text* also provides functions to analyze the word embeddings with well-tested machine learning algorithms and statistics. The focus is on analyzing and visualizing texts and their relation to other texts or numerical variables. For example, the `textTrain()` function examines how well the word embeddings from a text can predict a numeric or categorical variable. Other functions plot statistically significant words in the word embedding space.
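The idea behind `textTrain()` can be illustrated with a simplified stand-in: predict a numeric variable from embeddings using cross-validation, so that every text is predicted by a model that never saw it, and then correlate the predicted with the observed values. The sketch below uses simulated data and closed-form ridge regression in base R; the helpers `fit_ridge()` and `predict_ridge()` are hypothetical, ridge regression is an assumed choice of learner, and this is not the package's implementation, which builds on tidymodels packages such as `rsample`, `parsnip` and `tune` (see the DESCRIPTION dependencies).

```r
set.seed(42)

# Simulated data: 100 texts with 20-dimensional "embeddings" and a numeric
# outcome that depends linearly on the first two dimensions plus noise
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n, sd = 0.5)

# Hypothetical helper: closed-form ridge regression (intercept via centering)
fit_ridge <- function(x, y, lambda) {
  x_means <- colMeans(x)
  y_mean <- mean(y)
  xc <- sweep(x, 2, x_means)
  beta <- solve(crossprod(xc) + lambda * diag(ncol(x)), crossprod(xc, y - y_mean))
  list(beta = beta, x_means = x_means, y_mean = y_mean)
}

# Hypothetical helper: predictions for new rows using the training means
predict_ridge <- function(model, x_new) {
  as.vector(sweep(x_new, 2, model$x_means) %*% model$beta + model$y_mean)
}

# 10-fold cross-validation: each text is predicted by a model trained without it
k <- 10
folds <- sample(rep(seq_len(k), length.out = n))
predicted <- numeric(n)
for (fold in seq_len(k)) {
  test_idx <- which(folds == fold)
  model <- fit_ridge(x[-test_idx, , drop = FALSE], y[-test_idx], lambda = 1)
  predicted[test_idx] <- predict_ridge(model, x[test_idx, , drop = FALSE])
}

# Pearson correlation (and p-value) between predicted and observed values
cv_result <- cor.test(predicted, y)
cv_result$estimate
cv_result$p.value
```

Evaluating only out-of-fold predictions is what keeps the correlation an honest estimate of how well the embeddings predict the variable, rather than a measure of how well the model memorized its training data.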
```{r DPP_plot, message=FALSE, warning=FALSE}
library(text)
# DP_projections_HILS_SWLS_100 is test data included in the package;
# it has been pre-processed with the textProjectionData function
plot_projection <- textProjectionPlot(
word_data = DP_projections_HILS_SWLS_100,
y_axes = TRUE,
title_top = "Supervised Bicentroid Projection of Harmony in life words",
x_axes_label = "Low vs. High HILS score",
y_axes_label = "Low vs. High SWLS score",
position_jitter_hight = 0.5,
position_jitter_width = 0.8
)
plot_projection$final_plot
```
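The plot above positions words along axes that contrast low and high scores. A minimal sketch of the underlying projection idea, assuming toy 3-dimensional embeddings: compute the centroid (mean embedding) of each group, take the unit vector from the low-group centroid to the high-group centroid, and project each word's embedding onto it relative to the midpoint between the centroids. The package's own procedure differs in detail and also tests which words are statistically significant; that step is omitted here, and all values are made up.

```r
# Toy 3-dimensional embeddings (made-up values)
low_group  <- rbind(c(-1.0, 0.2,  0.0), c(-0.8, 0.0,  0.1))  # low-score texts
high_group <- rbind(c( 1.0, 0.1,  0.0), c( 0.9, 0.3, -0.1))  # high-score texts
words <- rbind(
  calm    = c( 0.7, 0.2, 0.0),
  worried = c(-0.6, 0.1, 0.2),
  chair   = c( 0.0, 0.9, 0.0)
)

# Centroid (mean embedding) of each group and the midpoint between them
low_centroid  <- colMeans(low_group)
high_centroid <- colMeans(high_group)
midpoint <- (low_centroid + high_centroid) / 2

# Unit vector pointing from the low centroid towards the high centroid
direction <- high_centroid - low_centroid
direction <- direction / sqrt(sum(direction^2))

# Scalar projection of each word onto that direction, relative to the midpoint:
# positive = closer to the high group, negative = closer to the low group
projection <- as.vector(sweep(words, 2, midpoint) %*% direction)
names(projection) <- rownames(words)
projection  # calm > 0, worried < 0, chair close to 0
```

Using the midpoint as the origin makes zero a natural "neutral" position, so words that are unrelated to the contrast (like `chair` here) end up near the center of the axis.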
### Featured Bluesky Post
```{r, echo = FALSE, results = 'asis'}
cat('
Version 1.3 of the #r-text package is now available from #CRAN.
This new version makes it easier to apply pre-trained language assessments from the #LBAM-library (r-text.org/articles/LBA...).
#mlsky #PsychSciSky #Statistics #PsychSciSky #StatsSky #NLP
— Oscar Kjell (@oscarkjell.bsky.social) Dec 22, 2024 at 9:48
')
```
Owner
- Name: Oscar Kjell
- Login: OscarKjell
- Kind: user
- Location: Sweden
- Website: https://oscarkjell.se
- Twitter: OscarKjell
- Repositories: 1
- Profile: https://github.com/OscarKjell
GitHub Events
Total
- Issues event: 24
- Watch event: 21
- Delete event: 9
- Issue comment event: 36
- Push event: 236
- Pull request review comment event: 3
- Pull request review event: 4
- Pull request event: 23
- Fork event: 1
- Create event: 11
Last Year
- Issues event: 24
- Watch event: 21
- Delete event: 9
- Issue comment event: 36
- Push event: 236
- Pull request review comment event: 3
- Pull request review event: 4
- Pull request event: 23
- Fork event: 1
- Create event: 11
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Oscar Kjell | o****l@h****m | 896 |
| Oscar Kjell | o****l@O****l | 194 |
| moomoofarm1 | 4****1 | 149 |
| CarlViggo | c****o@i****m | 77 |
| Oscar Kjell | o****l@O****l | 45 |
| Salvatore Giorgi | s****i@g****m | 19 |
| Salvatore Giorgi | s****i@g****m | 19 |
| LeonAckermann | l****n@g****m | 10 |
| Adithya V Ganesan | v****n@g****m | 9 |
| Mingcen Wei (sAy) | 4****i | 4 |
| Matt Cowgill | m****l@g****m | 3 |
| andy | h****s@c****u | 3 |
| AugustNilsson | 6****n | 3 |
| Daniel Hamngren | d****l@w****m | 2 |
| Humbert Costas | h****s@g****m | 2 |
| Andrej Pawluczenko | a****o@g****m | 2 |
| George Ostrouchov | g****c@u****u | 1 |
| Teun van den Brand | t****d@g****m | 1 |
| oskarbang | 7****g | 1 |
| Dustin Stoltz | 6****z | 1 |
| Vasudha Varadarajan | v****n@c****u | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 71
- Total pull requests: 160
- Average time to close issues: 4 months
- Average time to close pull requests: 2 days
- Total issue authors: 46
- Total pull request authors: 21
- Average comments per issue: 2.83
- Average comments per pull request: 0.11
- Merged pull requests: 127
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 12
- Pull requests: 27
- Average time to close issues: 21 days
- Average time to close pull requests: about 14 hours
- Issue authors: 11
- Pull request authors: 6
- Average comments per issue: 1.92
- Average comments per pull request: 0.07
- Merged pull requests: 20
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- moomoofarm1 (6)
- LuigiC72 (5)
- tressoldi (4)
- sebsilas (4)
- scm1210 (3)
- lilchow (3)
- massimoaria (2)
- lingdoc (2)
- NewUser36 (2)
- adamramey (2)
- cenotechnology (1)
- nelsonlrdsantos (1)
- MattCowgill (1)
- maria-pro (1)
- promothesh (1)
Pull Request Authors
- CarlViggo (86)
- OscarKjell (32)
- moomoofarm1 (27)
- adithya8 (19)
- LeonAckermann (19)
- soni-n (13)
- Marwolaeth (3)
- dustinstoltz (2)
- MattCowgill (2)
- AugustNilsson (2)
- teunbrand (2)
- mingcenwei (2)
- vasevarad (2)
- sjgiorgi (2)
- michaelgrund (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (1)
Packages
- Total packages: 3
- Total downloads:
  - cran: 1,893 last month
- Total dependent packages: 2 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 36
- Total maintainers: 1
proxy.golang.org: github.com/oscarkjell/text
- Documentation: https://pkg.go.dev/github.com/oscarkjell/text#section-documentation
- Latest release: v1.4.0 (published 11 months ago)
- Rankings:
  - Dependent packages count: 5.5%
  - Average: 5.6%
  - Dependent repos count: 5.8%
- Last synced: 6 months ago

proxy.golang.org: github.com/OscarKjell/text
- Documentation: https://pkg.go.dev/github.com/OscarKjell/text#section-documentation
- Latest release: v1.4.0 (published 11 months ago)
- Rankings:
  - Dependent packages count: 5.5%
  - Average: 5.6%
  - Dependent repos count: 5.8%
- Last synced: 6 months ago

cran.r-project.org: text
Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
- Homepage: https://r-text.org/
- Documentation: http://cran.r-project.org/web/packages/text/text.pdf
- License: GPL-3
- Latest release: 1.7.0 (published 6 months ago)
- Rankings:
  - Stargazers count: 3.6%
  - Forks count: 3.8%
  - Average: 11.2%
  - Dependent packages count: 13.6%
  - Dependent repos count: 23.8%
- Maintainers: 1
- Last synced: 6 months ago
Dependencies
.github/workflows/System specific installation WithPy.yaml
actions
- actions/cache v1 composite
- actions/checkout v2 composite
- goanpeca/setup-miniconda v1 composite
- r-lib/actions/setup-pandoc v2-branch composite
- r-lib/actions/setup-r v2-branch composite
.github/workflows/Virtual-Environment-Test.yaml
actions
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- r-lib/actions/setup-pandoc v2-branch composite
- r-lib/actions/setup-r v2-branch composite
.github/workflows/dont run/not now in use/New.yaml
actions
- actions/checkout v2 composite
- goanpeca/setup-miniconda v1 composite
- r-lib/actions/setup-pandoc master composite
- r-lib/actions/setup-r master composite
.github/workflows/dont run/not now in use/System specific installation NoPy.yaml
actions
- actions/cache v1 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc master composite
- r-lib/actions/setup-r master composite
.github/workflows/test-coverage-RCMD.yaml
actions
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- goanpeca/setup-miniconda v1 composite
- r-lib/actions/setup-pandoc v2-branch composite
- r-lib/actions/setup-r v2-branch composite
.github/workflows/test-coverage.yaml
actions
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- goanpeca/setup-miniconda v1 composite
- r-lib/actions/setup-pandoc v2-branch composite
- r-lib/actions/setup-r v2-branch composite
DESCRIPTION
cran
- R >= 4.00 depends
- cowplot * imports
- dplyr * imports
- furrr * imports
- future * imports
- ggplot2 * imports
- ggrepel * imports
- magrittr * imports
- overlapping * imports
- parsnip * imports
- purrr * imports
- recipes * imports
- reticulate * imports
- rlang * imports
- rsample * imports
- stringi * imports
- tibble * imports
- tidyr * imports
- tune * imports
- workflows * imports
- yardstick * imports
- covr * suggests
- glmnet * suggests
- knitr * suggests
- randomForest * suggests
- ranger * suggests
- rio * suggests
- rmarkdown * suggests
- testthat * suggests
- utils * suggests
- xml2 * suggests