conversationalign

An R package for analyzing linguistic alignment between partners in conversation transcripts

https://github.com/reilly-conceptscognitionlab/conversationalign

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary

Keywords

communication conversation dyadic-data language natural-language-processing psycholinguistics
Last synced: 6 months ago

Repository

An R package for analyzing linguistic alignment between partners in conversation transcripts

Basic Info
Statistics
  • Stars: 13
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 2
Topics
communication conversation dyadic-data language natural-language-processing psycholinguistics
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/",
  out.width = "100%",
  fig.align='left'
)
```



# ConversationAlign

Open-source software for computing main effects and indices of alignment across conversation partners in dyadic conversation transcripts.

ConversationAlign website

```{r echo=FALSE, results='asis'}
# GitHub release badge (Shields.io auto-detects the version)
cat('[![GitHub release](https://img.shields.io/github/v/release/Reilly-ConceptsCognitionLab/ConversationAlign?color=blue&include_prereleases&label=Release)](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/releases) ')

# JOSS submission badge
cat('[![status](https://joss.theoj.org/papers/839ab720504a5c966b4b2893f78ec2b2/status.svg)](https://joss.theoj.org/papers/839ab720504a5c966b4b2893f78ec2b2) ')

# GitHub stars badge
cat('[![GitHub stars](https://img.shields.io/github/stars/Reilly-ConceptsCognitionLab/ConversationAlign?style=social)](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/stargazers) ')

# Other badges
cat('[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/graphs/commit-activity) [![License: GPL v3+](https://img.shields.io/badge/License-GPL%20v3+-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) [![R-CMD-check](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/actions/workflows/R-CMD-check.yaml) ')
```

*(Figure: illustration of the processing pipeline used by ConversationAlign)*

# Overview

`ConversationAlign` analyzes alignment and computes main effects across more than 40 unique dimensions between interlocutors (conversation partners) engaged in two-person conversations. It transforms raw language data into simultaneous time series objects across >40 possible dimensions via an embedded lookup database. There are a number of issues you should consider and steps you should take to prepare your data.
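Conceptually, norm-yoking is a word-level join. Below is a minimal base-R sketch of the idea; the `norms` table, its `anger` values, and the column names are invented for illustration and are not the package's actual lookup database:

```r
# Hypothetical mini lookup table of word-level norms (values invented for illustration)
norms <- data.frame(word  = c("dog", "storm", "happy"),
                    anger = c(0.12, 0.55, 0.03))

# A one-word-per-row transcript for two speakers
transcript <- data.frame(speaker = c("A", "B", "A"),
                         word    = c("dog", "storm", "happy"))

# Joining yokes a norm value to each content word,
# yielding parallel word-level time series per speaker
yoked <- merge(transcript, norms, by = "word", sort = FALSE)
yoked
```

The real package performs this join against much larger external databases covering many lexical, affective, and semantic dimensions.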
# License

`ConversationAlign` is licensed under the [GNU LGPL v3.0](https://www.gnu.org/licenses/lgpl-3.0).
# Installation and Technical Considerations

One of the main features of the `ConversationAlign` algorithm involves yoking norms for many different lexical, affective, and semantic dimensions to each content word in your conversation transcripts of interest. We accomplish this by joining your data to several large lookup databases. These databases are too large to embed within `ConversationAlign`. When you load `ConversationAlign`, all of these databases should automatically download and load from an external companion repository, `ConversationAlign_Data`. `ConversationAlign` needs these data, so you will need a decent internet connection to load the package. The download might take a second or two if GitHub is slow.

Install the development version of ConversationAlign from [GitHub](https://github.com/) using the `devtools` package.

```{r, message=FALSE, warning=FALSE}
# Check if devtools is installed; if not, install it
if (!require("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Load devtools
library(devtools)

# Check if ConversationAlign is installed; if not, install from GitHub
if (!require("ConversationAlign", quietly = TRUE)) {
  devtools::install_github("Reilly-ConceptsCognitionLab/ConversationAlign")
}
# Load ConversationAlign
library(ConversationAlign)
```

# Step 1: Read and Format Transcript Options

## `read_dyads()`

- Reads transcripts from a local drive or directory of your choice.
- Store each of the individual conversation transcripts (`.csv`, `.txt`, `.ai`) you wish to concatenate into a corpus in a single folder. `ConversationAlign` will search for a folder called `my_transcripts` in the same directory as your script; however, feel free to name your folder anything you like and specify a custom path as an argument to `read_dyads()`.
- Each transcript must nominally contain two columns of data (Participant and Text). All other columns (e.g., metadata) will be retained.

### Arguments to `read_dyads()`:
- `my_path` default is 'my_transcripts', change path to your folder name
```{r, eval=FALSE, message=FALSE, warning=FALSE}
# Will search for folder 'my_transcripts' in your current directory
MyConvos <- read_dyads()

# Will scan a custom folder called 'MyStuff' in your current directory,
# concatenating all files in that folder into a single dataframe
MyConvos2 <- read_dyads(my_path = '/MyStuff')
```
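To make the required input format concrete, here is a base-R sketch that writes a conforming transcript file into the default `my_transcripts` folder; the file name and the dialogue contents are invented for illustration:

```r
# A minimal transcript with the two required columns, Participant and Text
# (speaker labels and dialogue invented for illustration)
toy <- data.frame(
  Participant = c("Maron", "Gross", "Maron"),
  Text = c("Thanks for having me.",
           "My pleasure. Let's talk about the show.",
           "Sure, where should we start?")
)

# Write it where read_dyads() looks by default: a 'my_transcripts' folder
dir.create("my_transcripts", showWarnings = FALSE)
write.csv(toy, file.path("my_transcripts", "maron_gross_demo.csv"), row.names = FALSE)
```

Any extra columns (e.g., timestamps or other metadata) would be carried through by `read_dyads()`.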
## `read_1file()`

- Reads a single transcript already in your R environment. We will use `read_1file()` to prep the Marc Maron and Terry Gross transcript. Note how the column headers have changed and the object name (`MaronGross_2013`) is now the `Event_ID` (a document identifier).
### Arguments to `read_1file`:
- `my_dat` an object already in your R environment containing text and speaker information.

```{r, eval=T, message=F, warning=F}
MaronGross_Prepped <- read_1file(MaronGross_2013)
# Print the first ten rows
knitr::kable(head(MaronGross_Prepped, 10), format = "pipe")
```

# Step 2: Clean, Format, Align Norms

## `prep_dyads()`

- Cleans, formats, and vectorizes conversation transcripts to a one-word-per-row format
- Yokes psycholinguistic norms for up to three dimensions at a time (from >40 possible dimensions) to each content word
- Retains metadata

### Arguments to `prep_dyads()`:

- `dat_read` name of the dataframe created by `read_dyads()`
- `omit_stops` TRUE/FALSE (default = TRUE) option to remove stopwords
- `lemmatize` TRUE/FALSE (default = TRUE) lemmatize strings, converting each entry to its dictionary form
- `which_stoplist` quoted argument specifying the stopword list to apply; options are `none`, `MIT_stops`, `SMART_stops`, `CA_OriginalStops`, or `Temple_stops25` (default)

```{r, eval=FALSE, message=FALSE, warning=FALSE}
NurseryRhymes_Prepped <- prep_dyads(dat_read = NurseryRhymes, lemmatize = TRUE,
                                    omit_stops = TRUE, which_stoplist = "Temple_stops25")
```

Example of a prepped dataset embedded as external data in the package, with 'anger' values yoked to each word:

```{r}
knitr::kable(head(NurseryRhymes_Prepped, 10), format = "simple", digits = 2)
```

# Step 3: Summarize Data, Alignment Stats

## `summarize_dyads()`

This is the computational stage where the package generates a dataframe boiled down to two rows per conversation, with summary data appended to each level of `Participant_ID`. It returns the difference time series AUC (dAUC) for every variable of interest you specified and the turn-by-turn correlation at lags -2, 0, and 2. You decide whether you want a Pearson or Spearman lagged correlation.
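To make the lag idea concrete, here is a base-R sketch of a lagged correlation between two interlocutors' turn-level series. The series values are invented for illustration, and `lagged_cor()` is a hypothetical helper, not the package's internal code:

```r
# Turn-level scores on one dimension for speakers A and B (values invented)
a <- c(0.2, 0.5, 0.4, 0.7, 0.6, 0.8, 0.5, 0.9)
b <- c(0.1, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.8)

# Correlate A's turn t with B's turn t + k; k in {-2, 0, 2} mirrors the
# default lead of 2 turns, immediate response, and lag of 2 turns
lagged_cor <- function(x, y, k, method = "pearson") {
  n <- length(x)
  if (k >= 0) cor(x[1:(n - k)], y[(1 + k):n], method = method)
  else        cor(x[(1 - k):n], y[1:(n + k)], method = method)
}

sapply(c(-2, 0, 2), function(k) lagged_cor(a, b, k))
```

Passing `method = "spearman"` to `cor()` is the base-R analogue of choosing the Spearman option.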
### Arguments to `summarize_dyads()`:
- `df_prep` dataframe created by the `prep_dyads()` function
- `custom_lags` user-specified set of turn lags; the default (NULL) produces correlations at a lead of 2 turns, the immediate response, and a lag of 2 turns for each dimension of interest
- `sumdat_only` default TRUE produces a grouped summary dataframe with averages by conversation and participant for each alignment dimension; FALSE retains all of the original rows, filling down summary statistics for the conversation (e.g., AUC)
- `corr_type` specifies the correlation model (parametric default = 'Pearson'); the other option, 'Spearman', computes turn-by-turn correlations across interlocutors for each dimension of interest

```{r, eval=TRUE, warning=FALSE, message=FALSE}
MarySumDat <- summarize_dyads(df_prep = NurseryRhymes_Prepped, custom_lags = NULL,
                              sumdat_only = TRUE, corr_type = 'Pearson')
colnames(MarySumDat)
knitr::kable(head(MarySumDat, 10), format = "simple", digits = 3)
```

# Optional: Generate Corpus Analytics

## `corpus_analytics()`

It is often critical to produce descriptive/summary statistics to characterize your language sample, which is typically a laborious process. `corpus_analytics()` will do it for you, generating a near publication-ready table of analytics that you can easily export to the specific journal format of your choice using any number of packages, such as `flextable` or `tinytable`.

### Arguments to `corpus_analytics()`:

- `dat_prep` dataframe created by the `prep_dyads()` function
```{r, eval=TRUE, warning=FALSE, message=FALSE}
NurseryRhymes_Analytics <- corpus_analytics(dat_prep = NurseryRhymes_Prepped)
knitr::kable(head(NurseryRhymes_Analytics, 10), format = "simple", digits = 2)
```

# News and Getting Help

For bugs, feature requests, and general questions, reach out via one of the following options:
  • Bugs/Features:
    [Open an Issue](https://github.com/Reilly-ConceptsCognitionLab/ConversationAlign/issues)
  • Questions:
    [Join Discussion](https://github.com/Reilly-ConceptsCognitionLab/SemanticDistance/discussions)
  • Urgent:
    Email jamie.reilly@temple.edu

Owner

  • Name: Concepts Cognition Lab (ReillyLab) @ Temple
  • Login: Reilly-ConceptsCognitionLab
  • Kind: organization
  • Email: reillyj@temple.edu

GitHub Events

Total
  • Release event: 2
  • Watch event: 5
  • Delete event: 1
  • Push event: 138
  • Fork event: 1
  • Create event: 6
Last Year
  • Release event: 2
  • Watch event: 5
  • Delete event: 1
  • Push event: 138
  • Fork event: 1
  • Create event: 6

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 207
  • Total Committers: 3
  • Avg Commits per committer: 69.0
  • Development Distribution Score (DDS): 0.425
Past Year
  • Commits: 42
  • Committers: 1
  • Avg Commits per committer: 42.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Ben-Sacks t****4@t****u 119
reilly-lab j****y@g****m 64
ginny-ulichneyl v****y@m****m 24
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 376 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: ConversationAlign

Process Text and Compute Linguistic Alignment in Conversation Transcripts

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 376 Last month
Rankings
Dependent packages count: 25.8%
Dependent repos count: 31.7%
Average: 47.7%
Downloads: 85.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/static.yml actions
  • actions/checkout v3 composite
  • actions/configure-pages v3 composite
  • actions/deploy-pages v2 composite
  • actions/upload-pages-artifact v2 composite
DESCRIPTION cran
  • R >= 2.10 depends
  • dplyr >= 0.4.3 depends
  • here * depends
  • knitr * depends
  • magrittr * depends
  • stringi * depends
  • stringr * depends
  • textclean * depends
  • textstem * depends
  • tidytable * depends
  • tidyverse * depends
  • tm * depends
  • knitr * suggests
  • rmarkdown * suggests