FMAT

😷 The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language.

https://github.com/psychbruce/fmat

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file: Found codemeta.json file
  • ✓ .zenodo.json file: Found .zenodo.json file
  • ✓ DOI references: Found 14 DOI reference(s) in README
  • ✓ Academic publication links: Links to arxiv.org
  • ○ Academic email domains
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary

Keywords

ai artificial-intelligence bert bert-model bert-models contextualized-representation fill-in-the-blank fill-mask huggingface language-model language-models large-language-models masked-language-models natural-language-processing natural-language-understanding nlp pretrained-models transformer transformers
Last synced: 6 months ago

Repository

😷 The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language.

Basic Info
Statistics
  • Stars: 15
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
ai artificial-intelligence bert bert-model bert-models contextualized-representation fill-in-the-blank fill-mask huggingface language-model language-models large-language-models masked-language-models natural-language-processing natural-language-understanding nlp pretrained-models transformer transformers
Created about 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.md

FMAT

😷 The Fill-Mask Association Test (掩码填空联系测验).

The Fill-Mask Association Test (FMAT) is an integrative and probability-based method that uses BERT models to measure conceptual associations (e.g., attitudes, biases, stereotypes, social norms, cultural values) as propositions in natural language (Bao, 2024, JPSP).

โš ๏ธ Please update this package to version โ‰ฅ 2025.4 for faster and more robust functionality.


Author

Bruce H. W. S. Bao 包寒吴霜

📬 baohws@foxmail.com

📋 psychbruce.github.io

Citation

(1) FMAT Package

(2) FMAT Research Articles - Methodology

  • Bao, H. W. S. (2024). The Fill-Mask Association Test (FMAT): Measuring propositions in natural language. Journal of Personality and Social Psychology, 127(3), 537โ€“561. https://doi.org/10.1037/pspa0000396

(3) FMAT Research Articles - Application

  • Bao, H. W. S., & Gries, P. (2024). Intersectional raceโ€“gender stereotypes in natural language. British Journal of Social Psychology, 63(4), 1771โ€“1786. https://doi.org/10.1111/bjso.12748
  • Bao, H. W. S., & Gries, P. (2025). Biases about Chinese people in English language use: Stereotypes, prejudice and discrimination. China Quarterly. https://doi.org/10.1017/S0305741025100532
  • Wang, Z., Xia, H., Bao, H. W. S., Jing, Y., & Gu, R. (2025). Artificial intelligence is stereotypically linked more with socially dominant groups in natural language. Advanced Science. https://doi.org/10.1002/advs.202508623

Installation

The R package FMAT and three Python packages (transformers, torch, huggingface-hub) all need to be installed.

(1) R Package

``` r
## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force=TRUE)
```

(2) Python Environment and Packages

Install Anaconda (a recommended package manager that automatically installs Python, its IDEs like Spyder, and a large list of common Python packages).

Specify Anaconda's Python interpreter in RStudio:

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose ".../Anaconda3/python.exe"

Install specific versions of the Python packages "transformers", "torch", and "huggingface-hub" (RStudio Terminal / Anaconda Prompt / Windows Command).

For CPU users:

pip install transformers==4.40.2 torch==2.2.1 huggingface-hub==0.20.3

For GPU (CUDA) users:

pip install transformers==4.40.2 huggingface-hub==0.20.3
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121

To use some models (e.g., microsoft/deberta-v3-base), "You need to have sentencepiece installed to convert a slow tokenizer to a fast one":

pip install sentencepiece

  • See the "Guidance for GPU Acceleration" section below for installation guidance if you have an NVIDIA GPU device on your PC and want to use GPU to accelerate the pipeline.
  • According to the May 2024 releases, "transformers" ≥ 4.41 depends on "huggingface-hub" ≥ 0.23. The suggested versions of "transformers" (4.40.2) and "huggingface-hub" (0.20.3) ensure the console display of progress bars when downloading BERT models while keeping these packages as new as possible.
  • Proxy users may use the "global mode" (全局模式) to download models.
  • If you encounter the error HTTPSConnectionPool(host='huggingface.co', port=443), try (1) reinstalling Anaconda, which may fix some unknown issues, or (2) downgrading the "urllib3" package to version ≤ 1.25.11 (pip install urllib3==1.25.11) so that it uses HTTP proxies (rather than HTTPS proxies, as in later versions) to connect to Hugging Face.
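After installing the packages above, the Python environment can be sanity-checked from R before running FMAT. This is a minimal sketch using reticulate (the R–Python bridge FMAT relies on); py_config() and py_module_available() are standard reticulate functions. Note that the import name of "huggingface-hub" is huggingface_hub (underscore).

``` r
library(reticulate)

# Show which Python interpreter reticulate is bound to
# (it should point to the Anaconda installation selected in RStudio)
py_config()

# Verify that the required Python packages are importable
for (pkg in c("transformers", "torch", "huggingface_hub")) {
  message(pkg, ": ", if (py_module_available(pkg)) "OK" else "MISSING")
}
```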

Guidance for FMAT

Step 1: Download BERT Models

Use BERT_download() to download BERT models. Model files are saved in your local cache folder "%USERPROFILE%/.cache/huggingface". A full list of BERT models is available at Hugging Face.

Use BERT_info() and BERT_vocab() to obtain detailed information about BERT models.

Step 2: Design FMAT Queries

Design queries that conceptually represent the constructs you want to measure (see Bao, 2024, JPSP for guidance on designing queries).

Use FMAT_query() and/or FMAT_query_bind() to prepare a data.table of queries.
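As a rough illustration of preparing queries (the argument names follow the package documentation, but the MASK/TARGET labels and words here are hypothetical examples; check ?FMAT_query before use):

``` r
library(FMAT)

# A query with a [MASK] slot and a {TARGET} slot:
# MASK words are the candidate fillers whose probabilities are compared;
# TARGET words are substituted into the query text, expanding it into
# a data.table with one row per combination.
query = FMAT_query(
  "[MASK] is a {TARGET}.",
  MASK = .(Male = "He", Female = "She"),
  TARGET = .(Occupation = c("doctor", "nurse", "artist"))
)
```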

Step 3: Run FMAT

Use FMAT_run() to get raw data (probability estimates) for further analysis.

Several steps of preprocessing have been included in the function for easier use (see FMAT_run() for details).

  • For BERT variants using <mask> rather than [MASK] as the mask token, the input query will be automatically modified so that users can always use [MASK] in query design.
  • For some BERT variants, special prefix characters such as \u0120 and \u2581 will be automatically added to match the whole words (rather than subwords) for [MASK].
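A minimal end-to-end sketch of Steps 1–3, assuming the FMAT_query()/FMAT_run() signatures shown in the package documentation (the query and MASK words are hypothetical; see ?FMAT_run for the full argument list):

``` r
library(FMAT)

# Hypothetical example: compare P([MASK]="He") vs. P([MASK]="She")
# across two models
models = c("bert-base-uncased", "bert-base-cased")
query = FMAT_query(
  "[MASK] is a nurse.",
  MASK = .(Male = "He", Female = "She")
)
data = FMAT_run(models, query)  # raw probability estimates for analysis
```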

Notes

  • Improvements are ongoing, especially for adaptation to more diverse (less popular) BERT models.
  • If you find bugs or have problems using the functions, please report them at GitHub Issues or send me an email.

Guidance for GPU Acceleration

By default, the FMAT package uses the CPU so that the functionality is available to all users. For advanced users who want to accelerate the pipeline, FMAT_run() now supports using a GPU device, which is about 3× faster than the CPU.

Test results (on the developer's computer, depending on BERT model size):

  • CPU (Intel 13th-Gen i7-1355U): 500~1000 queries/min
  • GPU (NVIDIA GeForce RTX 2050): 1500~3000 queries/min

Checklist:

  1. Ensure that you have an NVIDIA GPU device (e.g., GeForce RTX Series) and an NVIDIA GPU driver installed on your system.
  2. Install PyTorch (Python torch package) with CUDA support.
    • Find guidance for installation command at https://pytorch.org/get-started/locally/.
    • CUDA is available only on Windows and Linux, not on macOS.
    • If you have installed a version of torch without CUDA support, please first uninstall it (command: pip uninstall torch) and then install the suggested one.
    • You may also install the corresponding version of the CUDA Toolkit (e.g., for a torch version supporting CUDA 12.1, install CUDA Toolkit 12.1).

Example code for installing PyTorch with CUDA support (RStudio Terminal / Anaconda Prompt / Windows Command):

pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121
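To confirm from R that the installed torch build actually sees the GPU, you can query PyTorch through reticulate. This is a sketch using the standard torch.cuda API (is_available(), get_device_name()); it is not an FMAT function.

``` r
library(reticulate)

torch = import("torch")
if (torch$cuda$is_available()) {
  message("CUDA available: ", torch$cuda$get_device_name(0L))
} else {
  message("CUDA not available; FMAT will fall back to CPU")
}
```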

BERT Models

The reliability and validity of the following 12 BERT models in the FMAT have been established in our research, but future work is needed to examine the performance of other models.

(model name on Hugging Face - model file size)

  1. bert-base-uncased (420 MB)
  2. bert-base-cased (416 MB)
  3. bert-large-uncased (1283 MB)
  4. bert-large-cased (1277 MB)
  5. distilbert-base-uncased (256 MB)
  6. distilbert-base-cased (251 MB)
  7. albert-base-v1 (45 MB)
  8. albert-base-v2 (45 MB)
  9. roberta-base (476 MB)
  10. distilroberta-base (316 MB)
  11. vinai/bertweet-base (517 MB)
  12. vinai/bertweet-large (1356 MB)

For details about BERT, see:

``` r
library(FMAT)
models = c(
  "bert-base-uncased",
  "bert-base-cased",
  "bert-large-uncased",
  "bert-large-cased",
  "distilbert-base-uncased",
  "distilbert-base-cased",
  "albert-base-v1",
  "albert-base-v2",
  "roberta-base",
  "distilroberta-base",
  "vinai/bertweet-base",
  "vinai/bertweet-large"
)
BERT_download(models)
```

``` {style="height: 500px"}
ℹ Device Info:

R Packages:
FMAT 2024.5
reticulate 1.36.1

Python Packages:
transformers 4.40.2
torch 2.2.1+cu121

NVIDIA GPU CUDA Support:
CUDA Enabled: TRUE
CUDA Version: 12.1
GPU (Device): NVIDIA GeForce RTX 2050

── Downloading model "bert-base-uncased" ──────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 114kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 23.9kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.98MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 440M/440M [00:36<00:00, 12.1MB/s]
✔ Successfully downloaded model "bert-base-uncased"

── Downloading model "bert-base-cased" ────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 570/570 [00:00<00:00, 63.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 8.66kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 10.1MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 436M/436M [00:37<00:00, 11.6MB/s]
✔ Successfully downloaded model "bert-base-cased"

── Downloading model "bert-large-uncased" ─────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 571/571 [00:00<00:00, 268kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 12.0kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.50MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 1.99MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:36<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-uncased"

── Downloading model "bert-large-cased" ───────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 762/762 [00:00<00:00, 125kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 12.3kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.41MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 5.39MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 1.34G/1.34G [01:35<00:00, 14.0MB/s]
✔ Successfully downloaded model "bert-large-cased"

── Downloading model "distilbert-base-uncased" ────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 483/483 [00:00<00:00, 161kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 9.46kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 16.5MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 14.8MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 268M/268M [00:19<00:00, 13.5MB/s]
✔ Successfully downloaded model "distilbert-base-uncased"

── Downloading model "distilbert-base-cased" ──────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 465/465 [00:00<00:00, 233kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 49.0/49.0 [00:00<00:00, 9.80kB/s]
vocab.txt: 100%|██████████| 213k/213k [00:00<00:00, 1.39MB/s]
tokenizer.json: 100%|██████████| 436k/436k [00:00<00:00, 8.70MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 263M/263M [00:24<00:00, 10.9MB/s]
✔ Successfully downloaded model "distilbert-base-cased"

── Downloading model "albert-base-v1" ─────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 3.57kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 4.93MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 13.4MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.4MB/s]
✔ Successfully downloaded model "albert-base-v1"

── Downloading model "albert-base-v2" ─────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 684/684 [00:00<00:00, 137kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 4.17kB/s]
spiece.model: 100%|██████████| 760k/760k [00:00<00:00, 5.10MB/s]
tokenizer.json: 100%|██████████| 1.31M/1.31M [00:00<00:00, 6.93MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 47.4M/47.4M [00:03<00:00, 13.8MB/s]
✔ Successfully downloaded model "albert-base-v2"

── Downloading model "roberta-base" ───────────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 481/481 [00:00<00:00, 80.3kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 6.25kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 2.72MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 8.22MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.56MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 499M/499M [00:38<00:00, 12.9MB/s]
✔ Successfully downloaded model "roberta-base"

── Downloading model "distilroberta-base" ─────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 480/480 [00:00<00:00, 96.4kB/s]
→ (2) Downloading tokenizer...
tokenizer_config.json: 100%|██████████| 25.0/25.0 [00:00<00:00, 12.0kB/s]
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 6.59MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 9.46MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 11.5MB/s]
→ (3) Downloading model...
model.safetensors: 100%|██████████| 331M/331M [00:25<00:00, 13.0MB/s]
✔ Successfully downloaded model "distilroberta-base"

── Downloading model "vinai/bertweet-base" ────────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 558/558 [00:00<00:00, 187kB/s]
→ (2) Downloading tokenizer...
vocab.txt: 100%|██████████| 843k/843k [00:00<00:00, 7.44MB/s]
bpe.codes: 100%|██████████| 1.08M/1.08M [00:00<00:00, 7.01MB/s]
tokenizer.json: 100%|██████████| 2.91M/2.91M [00:00<00:00, 9.10MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 543M/543M [00:48<00:00, 11.1MB/s]
✔ Successfully downloaded model "vinai/bertweet-base"

── Downloading model "vinai/bertweet-large" ───────────────────────────
→ (1) Downloading configuration...
config.json: 100%|██████████| 614/614 [00:00<00:00, 120kB/s]
→ (2) Downloading tokenizer...
vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 5.90MB/s]
merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 7.30MB/s]
tokenizer.json: 100%|██████████| 1.36M/1.36M [00:00<00:00, 8.31MB/s]
→ (3) Downloading model...
pytorch_model.bin: 100%|██████████| 1.42G/1.42G [02:29<00:00, 9.53MB/s]
✔ Successfully downloaded model "vinai/bertweet-large"

── Downloaded models: ──

                             size
albert-base-v1              45 MB
albert-base-v2              45 MB
bert-base-cased            416 MB
bert-base-uncased          420 MB
bert-large-cased          1277 MB
bert-large-uncased        1283 MB
distilbert-base-cased      251 MB
distilbert-base-uncased    256 MB
distilroberta-base         316 MB
roberta-base               476 MB
vinai/bertweet-base        517 MB
vinai/bertweet-large      1356 MB

✔ Downloaded models saved at C:/Users/Bruce/.cache/huggingface/hub (6.52 GB)
```

``` r
BERT_info(models)
```

```
                      model    size vocab  dims   mask
                     <fctr>  <char> <int> <int> <char>
 1:       bert-base-uncased   420MB 30522   768 [MASK]
 2:         bert-base-cased   416MB 28996   768 [MASK]
 3:      bert-large-uncased  1283MB 30522  1024 [MASK]
 4:        bert-large-cased  1277MB 28996  1024 [MASK]
 5: distilbert-base-uncased   256MB 30522   768 [MASK]
 6:   distilbert-base-cased   251MB 28996   768 [MASK]
 7:          albert-base-v1    45MB 30000   128 [MASK]
 8:          albert-base-v2    45MB 30000   128 [MASK]
 9:            roberta-base   476MB 50265   768 <mask>
10:      distilroberta-base   316MB 50265   768 <mask>
11:     vinai/bertweet-base   517MB 64001   768 <mask>
12:    vinai/bertweet-large  1356MB 50265  1024 <mask>
```

(Tested 2024-05-16 on the developer's computer: HP Probook 450 G10 Notebook PC)

Related Packages

While the FMAT is an innovative method for the computational intelligent analysis of psychology and society, you may also seek an integrative toolbox for other text-analytic methods. Another R package I developed, PsychWordVec, is useful and user-friendly for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.

Owner

  • Name: Bruce H.-W.-S. Bao
  • Login: psychbruce
  • Kind: user
  • Location: Shanghai, China
  • Company: East China Normal University

🔮 Computational Intelligent Social Psychology | Assistant Professor @ ECNU

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Issue comment event: 8
  • Push event: 23
Last Year
  • Issues event: 1
  • Watch event: 4
  • Issue comment event: 8
  • Push event: 23

Packages

  • Total packages: 1
  • Total downloads:
    • cran 322 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 9
  • Total maintainers: 1
cran.r-project.org: FMAT

The Fill-Mask Association Test

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 322 Last month
Rankings
Stargazers count: 28.3%
Forks count: 28.4%
Dependent packages count: 28.7%
Dependent repos count: 35.1%
Average: 41.8%
Downloads: 88.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • R >= 4.0.0 depends
  • PsychWordVec * imports
  • cli * imports
  • data.table * imports
  • forcats * imports
  • glue * imports
  • parallel * imports
  • plyr * imports
  • purrr * imports
  • reticulate * imports
  • stringr * imports
  • text * imports
  • bruceR * suggests
  • car * suggests
  • knitr * suggests
  • nlme * suggests
  • rmarkdown * suggests