align
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 12 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 3 of 12 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Statistics
- Stars: 52
- Watchers: 3
- Forks: 16
- Open Issues: 18
- Releases: 8
Metadata Files
README.md
ALIGN, a computational tool for multi-level language analysis (optimized for Python 3.10)
align is a Python library for extracting quantitative, reproducible
metrics of multi-level alignment between two speakers in naturalistic
language corpora. The method was introduced in "ALIGN: Analyzing
Linguistic Interactions with Generalizable techNiques" (Duran, Paxton, &
Fusaroli, 2019; Psychological Methods).
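To make "alignment between two speakers" concrete, the sketch below scores each pair of adjacent turns by the cosine similarity of their word n-gram counts. It is a simplified, hypothetical illustration that assumes turns are already tokenized; the helper names (`lexical_alignment`, `ngrams`, `cosine`) are not part of align's API, and the measures align actually computes are described in Duran, Paxton, and Fusaroli (2019) and the module documentation.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ngrams(tokens, n):
    """Return the n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_alignment(turns, n=1):
    """Hypothetical helper: score each adjacent pair of turns by n-gram overlap.

    `turns` is a list of (speaker, tokens) tuples in conversational order.
    Returns one cosine score per adjacent pair of turns.
    """
    scores = []
    for (_, prev_tokens), (_, next_tokens) in zip(turns, turns[1:]):
        scores.append(cosine(Counter(ngrams(prev_tokens, n)),
                             Counter(ngrams(next_tokens, n))))
    return scores


conversation = [
    ("A", ["i", "like", "the", "red", "one"]),
    ("B", ["i", "like", "the", "blue", "one"]),
    ("A", ["blue", "is", "nice"]),
]
print(lexical_alignment(conversation))       # unigram overlap per adjacent pair
print(lexical_alignment(conversation, n=2))  # bigram overlap per adjacent pair
```

Raising `n` makes the score sensitive to repeated word sequences rather than just shared words, which is why the two calls give different values for the same conversation.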
Examples of papers relying on the ALIGN library:
- Duran, N. D., Paige, A., & D'Mello, S. K. (2024). Multi‐Level Linguistic Alignment in a Dynamic Collaborative Problem‐Solving Task. Cognitive Science, 48(1), e13398. https://doi.org/10.1111/cogs.13398
- Dideriksen, C., Christiansen, M. H., Tylén, K., Dingemanse, M., & Fusaroli, R. (2023). Quantifying the interplay of conversational devices in building mutual understanding. Journal of Experimental Psychology: General, 152(3), 864. Pre-print: https://doi.org/10.31234/osf.io/a5r74
- Dideriksen, C., Christiansen, M. H., Dingemanse, M., Højmark‐Bertelsen, M., Johansson, C., Tylén, K., & Fusaroli, R. (2023). Language‐Specific Constraints on Conversation: Evidence from Danish and Norwegian. Cognitive Science, 47(11), e13387. Pre-print: https://doi.org/10.31234/osf.io/t3s6c.
- Fusaroli, R., Weed, E., Rocca, R., Fein, D., & Naigles, L. (2023). Caregiver linguistic alignment to autistic and typically developing children: A natural language processing approach illuminates the interactive components of language development. Cognition, 236, 105422. Pre-print: https://doi.org/10.31234/osf.io/ysjec
- Fusaroli, R., Weed, E., Rocca, R., Fein, D., & Naigles, L. (2023). Repeat After Me? Both Children With and Without Autism Commonly Align Their Language With That of Their Caregivers. Cognitive Science, 47(11), e13369. Pre-print: https://doi.org/10.31234/osf.io/m8fhk
- Tylén, K., Fusaroli, R., Østergaard, S. M., Smith, P., & Arnoldi, J. (2023). The Social Route to Abstraction: Interaction and Diversity Enhance Performance and Transfer in a Rule‐Based Categorization Task. Cognitive Science, 47(9), e13338.
- Trujillo, J. P., Dideriksen, C., Tylén, K., Christiansen, M. H., & Fusaroli, R. (2023). The dynamic interplay of kinetic and linguistic coordination in Danish and Norwegian conversation. Cognitive Science, 47(6), e13298.
Installation
align can be installed directly with pip.
To install the stable version released on PyPI:
pip install align
Or to update:
pip install align --upgrade
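After installing or upgrading, one way to confirm which version of align is present in the active environment is a short check with the standard library's `importlib.metadata` (Python 3.8+). This is a generic sketch, not a feature of align itself; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "align" is the PyPI distribution name used in the pip commands above.
    print(installed_version("align") or "align is not installed in this environment")
```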
It is always good practice to install a package like align, which has several dependencies (see requirements.txt), in a virtual environment.
Anaconda users: The above should work in the vast majority of cases. However, if you prefer an easy way to install align within a virtual environment in one go, or you are experiencing problems when trying to update align, a YAML file has been provided to streamline things. Just follow these simple steps:
- Download the environment.yml file and navigate to the folder where it has been downloaded.
- Run the following command in Terminal:
conda env create -f environment.yml
- Be sure to activate the new environment (i.e., conda activate align0.1.1) before running any align analyses (such as the tutorials; see below).
If you experience any problems, please report them in the Issues section of this repository.
Quick documentation
ALIGN consists of two primary modules for conducting analyses, prepare_transcripts and calculate_alignment. For a quick overview of the functions contained in each module, please see:
- prepare_transcripts: https://nickduran.github.io/align-linguistic-alignment/prepare_transcripts.html
- calculate_alignment: https://nickduran.github.io/align-linguistic-alignment/calculate_alignment.html
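As a conceptual illustration of why preparation comes before scoring, the sketch below shows a hypothetical `prepare` step that normalizes text, tokenizes it, and merges consecutive turns by the same speaker so that speakers alternate. This is not align's code, and the cleaning rules (lowercasing, a simple regular-expression tokenizer, dropping empty turns) are assumptions; the prepare_transcripts documentation linked above describes what the module actually does.

```python
import re


def prepare(raw_turns):
    """Hypothetical preprocessing: lowercase, tokenize, merge same-speaker runs.

    `raw_turns` is a list of (speaker, utterance_text) tuples.
    Returns a list of (speaker, tokens) tuples in which speakers alternate.
    """
    prepared = []
    for speaker, text in raw_turns:
        # Keep lowercase alphabetic tokens, including simple contractions like "what's".
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        if not tokens:
            continue  # drop turns that are empty after cleaning
        if prepared and prepared[-1][0] == speaker:
            # Same speaker twice in a row: fold into the previous turn.
            prepared[-1] = (speaker, prepared[-1][1] + tokens)
        else:
            prepared.append((speaker, tokens))
    return prepared


raw = [
    ("Mother", "What's that?"),
    ("Child", "A doggie!"),
    ("Child", "Big doggie."),
    ("Mother", "Yes, a big doggie."),
]
for speaker, tokens in prepare(raw):
    print(speaker, tokens)
```

Merging same-speaker runs matters because turn-by-turn alignment compares each turn with the next one; without merging, a speaker could end up "aligning" with themselves.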
Additional tools required for some align options
The Google News pre-trained word2vec vectors (GoogleNews-vectors-negative300.bin)
and the Stanford part-of-speech tagger (stanford-postagger-full-2020-11-17)
are required for some optional align parameters but must be downloaded
separately. Please see the tutorials for more information.
Google News: https://code.google.com/archive/p/word2vec/ (page) or https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing (direct download)
Stanford POS tagger: https://nlp.stanford.edu/software/tagger.shtml#Download (page) or https://nlp.stanford.edu/software/stanford-tagger-4.2.0.zip (direct download)
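The README does not spell out how these two resources are used internally. As an assumption, the sketch below shows one common way each could turn a pair of turns into a similarity score: semantic similarity as the cosine between the turns' summed word vectors, and syntactic similarity as the cosine between their part-of-speech bigram counts. The helper names (`semantic_alignment`, `syntactic_alignment`) are hypothetical and not align's API; the tutorials describe what the library actually computes.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_alignment(tokens_a, tokens_b, vectors, dim=300):
    """Compare two turns by the sum of their word vectors.

    `vectors` is any mapping from word to a length-`dim` sequence of floats;
    words missing from it are skipped.
    """
    def turn_vector(tokens):
        total = [0.0] * dim
        for token in tokens:
            if token in vectors:
                for i, value in enumerate(vectors[token]):
                    total[i] += float(value)
        return total
    return cosine(turn_vector(tokens_a), turn_vector(tokens_b))


def syntactic_alignment(tagged_a, tagged_b):
    """Compare two turns by their part-of-speech bigram counts.

    Each turn is a list of (word, tag) pairs; only the tags are used.
    """
    def tag_bigrams(tagged):
        tags = [tag for _, tag in tagged]
        return Counter(zip(tags, tags[1:]))
    a, b = tag_bigrams(tagged_a), tag_bigrams(tagged_b)
    keys = sorted(set(a) | set(b))
    return cosine([a[k] for k in keys], [b[k] for k in keys])


# Toy 2-dimensional vectors standing in for the 300-dimensional GoogleNews ones.
toy_vectors = {"dog": [1.0, 0.0], "puppy": [0.9, 0.1], "car": [0.0, 1.0]}
print(semantic_alignment(["the", "dog"], ["a", "puppy"], toy_vectors, dim=2))

# Hand-written Penn Treebank tags: different words, identical structure.
turn_a = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")]
turn_b = [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]
print(syntactic_alignment(turn_a, turn_b))
```

With the real vectors, gensim (listed in the dependencies) loads the binary file via `KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)`, and the resulting object supports `in` and indexing, so it could be passed as `vectors`. Tags in the `(word, tag)` format can come from `nltk.pos_tag`; the pure-Python loops here favor readability over speed.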
Tutorials
We created Jupyter Notebook tutorials to provide an easily accessible
step-by-step walkthrough on how to use align. Below are descriptions of the
current tutorials that can be found in the examples directory within this
repository. If you are unfamiliar with Jupyter Notebooks, instructions for installing
and running them can be found at http://jupyter.org/install. We recommend installing
Jupyter using Anaconda. Anaconda is a widely-used Python data science platform
that helps streamline workflows.
Jupyter Notebook 1: CHILDES
- This tutorial walks users through an analysis of conversations from a single English corpus in the CHILDES database (MacWhinney, 2000), namely Kuczaj's Abe corpus (Kuczaj, 1976). We analyze the last 20 conversations in the corpus to explore how ALIGN can be used to track multi-level linguistic alignment between a parent and a child over time, which may be of interest to developmental language researchers. In particular, we examine how alignment between parent and child changes over a brief span of the child's developmental trajectory.
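The tutorial notebook contains the actual analysis. As a hypothetical illustration of what "tracking alignment over time" can mean, the sketch below summarizes each conversation by its mean turn-level alignment score and fits an ordinary least-squares slope across the chronologically ordered sessions. The helper `alignment_trend` and the scores are invented for the example.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def alignment_trend(scores_by_session):
    """Ordinary least-squares slope of mean alignment across ordered sessions.

    `scores_by_session` is a chronologically ordered list of lists of
    turn-level alignment scores, one inner list per conversation.
    A positive slope means average alignment rises from session to session.
    """
    y = [mean(scores) for scores in scores_by_session]
    if len(y) < 2:
        raise ValueError("need at least two sessions to estimate a trend")
    x = list(range(len(y)))  # session index 0, 1, 2, ...
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented turn-level scores for three consecutive conversations.
sessions = [[0.1, 0.3], [0.3, 0.3], [0.4, 0.6]]
print(alignment_trend(sessions))
```

A single slope is only a descriptive summary; with 20 conversations, a researcher would more likely model turn-level scores directly so that within-session variability is not discarded.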
Jupyter Notebook 2: Devil's Advocate
- This tutorial walks users through the analysis reported in Duran, Paxton, and Fusaroli (2019). The corpus consists of 94 written transcripts of conversations, each lasting eight minutes, collected from an experimental study of truthful and deceptive communication. The goal of the study was to examine interpersonal linguistic alignment in dyads across two conversations in which participants either agreed or disagreed with each other (a randomly assigned between-dyads condition) and in which one conversation involved the truth and the other deception (a within-subjects condition).
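To show how the 2 × 2 design described above maps onto alignment output, the sketch below averages a per-conversation alignment score within each agreement × veracity cell. The record layout, field names, and numbers are hypothetical; cell means are only descriptive, and the inferential models used in the paper are in the tutorial notebook.

```python
from collections import defaultdict


def condition_means(rows):
    """Average alignment per (agreement, veracity) cell of a 2 x 2 design.

    `rows` is an iterable of dicts with keys "agreement" ("agree"/"disagree"),
    "veracity" ("truth"/"deception"), and "alignment" (a float score).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = (row["agreement"], row["veracity"])
        sums[cell] += row["alignment"]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Invented scores: each dyad contributes one truthful and one deceptive conversation.
rows = [
    {"dyad": 1, "agreement": "agree", "veracity": "truth", "alignment": 0.30},
    {"dyad": 1, "agreement": "agree", "veracity": "deception", "alignment": 0.20},
    {"dyad": 2, "agreement": "disagree", "veracity": "truth", "alignment": 0.40},
    {"dyad": 2, "agreement": "disagree", "veracity": "deception", "alignment": 0.50},
    {"dyad": 3, "agreement": "agree", "veracity": "truth", "alignment": 0.10},
]
for cell, value in sorted(condition_means(rows).items()):
    print(cell, round(value, 3))
```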
We are in the process of adding more tutorials and would welcome additional tutorials by interested contributors.
Attribution
If you find the package useful, please cite our manuscript:
Duran, N., Paxton, A., & Fusaroli, R. (2019). ALIGN: Analyzing Linguistic Interactions with Generalizable techNiques. Psychological Methods. http://dynamicog.org/papers/
Licensing of example data
CHILDES
- Example corpus "Kuczaj Corpus" by Stan Kuczaj is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (https://childes.talkbank.org/access/Eng-NA/Kuczaj.html):
Kuczaj, S. (1977). The acquisition of regular and irregular past tense forms. Journal of Verbal Learning and Verbal Behavior, 16, 589–600.
Devil's Advocate
- The complete de-identified dataset of raw conversational transcripts is hosted on a secure protected-access repository provided by the Inter-university Consortium for Political and Social Research (ICPSR). Please click on the link to access: http://dx.doi.org/10.3886/ICPSR37124.v1. Due to the requirements of our IRB, please note that users interested in obtaining these data must complete a Restricted Data Use Agreement, specify the reason for the request, and obtain IRB approval or notice of exemption for their research.
Duran, Nicholas, Alexandra Paxton, and Riccardo Fusaroli. Conversational Transcripts of Truthful and Deceptive Speech Involving Controversial Topics, Central California, 2012. ICPSR37124-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-08-29.
Owner
- Name: Nicholas Duran
- Login: nickduran
- Kind: user
- Location: Glendale, AZ
- Company: Arizona State University
- Website: dynamicog.org
- Repositories: 5
- Profile: https://github.com/nickduran
Nicholas Duran is an associate professor in the Social and Behavioral Sciences division of the New College of Interdisciplinary Arts and Sciences at ASU.
GitHub Events
Total
- Issues event: 2
- Watch event: 12
- Issue comment event: 2
- Member event: 1
- Push event: 9
- Pull request event: 2
- Fork event: 2
- Create event: 1
Last Year
- Issues event: 2
- Watch event: 12
- Issue comment event: 2
- Member event: 1
- Push event: 9
- Pull request event: 2
- Fork event: 2
- Create event: 1
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alexandra Paxton | p****a@g****m | 171 |
| nickduran | n****1@g****m | 132 |
| Nick Duran | n****4@a****u | 60 |
| Nicholas Duran | n****n@a****u | 44 |
| Nick Duran | n****n@n****u | 13 |
| Alexandra Paxton | a****n | 10 |
| yuvipanda | y****a@g****m | 8 |
| Nick Duran | n****n@N****l | 3 |
| Saul Kohn | s****n@S****l | 2 |
| Ludvig Renbo Olsen | m****l@l****k | 1 |
| dependabot[bot] | 4****] | 1 |
| Riccardo Fusaroli | f****i@g****m | 1 |
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 23
- Total pull requests: 40
- Average time to close issues: 3 months
- Average time to close pull requests: 20 days
- Total issue authors: 9
- Total pull request authors: 6
- Average comments per issue: 0.91
- Average comments per pull request: 0.63
- Merged pull requests: 33
- Bot issues: 0
- Bot pull requests: 6
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- fusaroli (9)
- nickduran (4)
- AdrianaChieng (3)
- a-paxton (2)
- douggetty (1)
- jseale-asapp (1)
- pavelgold (1)
- akhilraheja (1)
- katrinsgr (1)
- LudvigOlsen (1)
Pull Request Authors
- nickduran (24)
- a-paxton (7)
- dependabot[bot] (6)
- yuvipanda (2)
- LudvigOlsen (1)
- SaulAryehKohn (1)
Packages
- Total packages: 1
Total downloads:
- pypi 562 last-month
- Total dependent packages: 0
- Total dependent repositories: 8
- Total versions: 10
- Total maintainers: 2
pypi.org: align
Analyzing Linguistic Interaction with Generalizable techNiques. Read the latest ALIGN tutorials.
- Homepage: https://github.com/nickduran/align-linguistic-alignment
- Documentation: https://align.readthedocs.io/
- License: LICENSE
Latest release: 0.1.1
published over 3 years ago
Dependencies
- alabaster =0.7.12=pyhd3eb1b0_0
- align =0.1.0=dev_0
- appnope =0.1.2=py310hecd8cb5_1001
- asttokens =2.0.5=pyhd3eb1b0_0
- babel =2.9.1=pyhd3eb1b0_0
- backcall =0.2.0=pyhd3eb1b0_0
- blas =1.0=mkl
- bleach =4.1.0=pyhd3eb1b0_0
- bottleneck =1.3.4=py310h4e76f89_0
- brotlipy =0.7.0=py310hca72f7f_1002
- build =0.7.0=pyhd8ed1ab_0
- bzip2 =1.0.8=h1de35cc_0
- ca-certificates =2022.6.15=h033912b_0
- certifi =2022.6.15=py310h2ec42d9_0
- cffi =1.15.0=py310hc55c11b_1
- charset-normalizer =2.0.4=pyhd3eb1b0_0
- check-manifest =0.48=pyhd8ed1ab_0
- click =8.0.4=py310hecd8cb5_0
- cmarkgfm =0.8.0=py310h1961e1f_1
- colorama =0.4.4=pyhd3eb1b0_0
- commonmark =0.9.1=pyhd3eb1b0_0
- cryptography =37.0.1=py310hf6deb26_0
- cython =0.29.28=py310he9d5cce_0
- dataclasses =0.8=pyh6d0b6a4_7
- decorator =5.1.1=pyhd3eb1b0_0
- docutils =0.17.1=pypi_0
- executing =0.8.3=pyhd3eb1b0_0
- future =0.18.2=py310hecd8cb5_1
- gensim =4.1.2=py310he9d5cce_0
- idna =3.3=pyhd3eb1b0_0
- imagesize =1.3.0=pyhd3eb1b0_0
- importlib-metadata =4.11.3=py310hecd8cb5_0
- importlib_metadata =4.11.3=hd3eb1b0_0
- intel-openmp =2021.4.0=hecd8cb5_3538
- ipython =8.3.0=py310hecd8cb5_0
- jedi =0.18.1=py310hecd8cb5_1
- jinja2 =3.0.3=pyhd3eb1b0_0
- joblib =1.1.0=pyhd3eb1b0_0
- keyring =23.4.0=py310hecd8cb5_0
- libcxx =12.0.0=h2f01273_0
- libffi =3.3=hb1e8313_2
- libgfortran =3.0.1=h93005f0_2
- markupsafe =2.1.1=py310hca72f7f_0
- matplotlib-inline =0.1.2=pyhd3eb1b0_2
- mkl =2021.4.0=hecd8cb5_637
- mkl-service =2.4.0=py310hca72f7f_0
- mkl_fft =1.3.1=py310hf879493_0
- mkl_random =1.2.2=py310hc081a56_0
- ncurses =6.3=hca72f7f_2
- nltk =3.7=pyhd3eb1b0_0
- numexpr =2.8.1=py310hdcd3fac_2
- numpy =1.22.3=py310hdcd3fac_0
- numpy-base =1.22.3=py310hfd2de13_0
- openssl =1.1.1p=hfe4f2af_0
- packaging =21.3=pyhd3eb1b0_0
- pandas =1.4.2=py310he9d5cce_0
- parso =0.8.3=pyhd3eb1b0_0
- pep517 =0.12.0=py310hecd8cb5_0
- pexpect =4.8.0=pyhd3eb1b0_3
- pickleshare =0.7.5=pyhd3eb1b0_1003
- pip =21.2.4=py310hecd8cb5_0
- pkginfo =1.8.2=pyhd3eb1b0_0
- prompt-toolkit =3.0.20=pyhd3eb1b0_0
- ptyprocess =0.7.0=pyhd3eb1b0_2
- pure_eval =0.2.2=pyhd3eb1b0_0
- pycparser =2.21=pyhd3eb1b0_0
- pygments =2.11.2=pyhd3eb1b0_0
- pyopenssl =22.0.0=pyhd3eb1b0_0
- pyparsing =3.0.4=pyhd3eb1b0_0
- pysocks =1.7.1=py310hecd8cb5_0
- python =3.10.4=hdfd78df_0
- python-build =0.8.0=pyhd8ed1ab_0
- python-dateutil =2.8.2=pyhd3eb1b0_0
- python_abi =3.10=2_cp310
- pytz =2022.1=py310hecd8cb5_0
- readline =8.1.2=hca72f7f_1
- readme_renderer =35.0=pyhd8ed1ab_0
- regex =2022.3.15=py310hca72f7f_0
- requests =2.28.0=py310hecd8cb5_0
- requests-toolbelt =0.9.1=pyhd3eb1b0_0
- rfc3986 =1.4.0=pyhd3eb1b0_0
- rich =12.4.4=pyhd8ed1ab_0
- scipy =1.7.3=py310h3dd3380_0
- setuptools =61.2.0=py310hecd8cb5_0
- six =1.16.0=pyhd3eb1b0_1
- smart_open =5.2.1=py310hecd8cb5_0
- snowballstemmer =2.2.0=pyhd3eb1b0_0
- sphinx =3.5.3=pyhd3eb1b0_0
- sphinx-rtd-theme =1.0.0=pypi_0
- sphinxcontrib-applehelp =1.0.2=pyhd3eb1b0_0
- sphinxcontrib-devhelp =1.0.2=pyhd3eb1b0_0
- sphinxcontrib-htmlhelp =2.0.0=pyhd3eb1b0_0
- sphinxcontrib-jsmath =1.0.1=pyhd3eb1b0_0
- sphinxcontrib-qthelp =1.0.3=pyhd3eb1b0_0
- sphinxcontrib-serializinghtml =1.1.5=pyhd3eb1b0_0
- sqlite =3.38.5=h707629a_0
- stack_data =0.2.0=pyhd3eb1b0_0
- tk =8.6.12=h5d9f67b_0
- toml =0.10.2=pyhd8ed1ab_0
- tomli =1.2.2=pyhd3eb1b0_0
- tqdm =4.64.0=py310hecd8cb5_0
- traitlets =5.1.1=pyhd3eb1b0_0
- twine =4.0.1=pyhd8ed1ab_1
- typing_extensions =4.1.1=pyh06a4308_0
- tzdata =2022a=hda174b7_0
- urllib3 =1.26.9=py310hecd8cb5_0
- wcwidth =0.2.5=pyhd3eb1b0_0
- webencodings =0.5.1=py310hecd8cb5_1
- wheel =0.37.1=pyhd3eb1b0_0
- xz =5.2.5=hca72f7f_1
- zipp =3.8.0=py310hecd8cb5_0
- zlib =1.2.12=h4dc903c_2
- align ==0.1.1
- pip ==21.2.4