https://github.com/chennesy/dfr_lq
topic modeling library quarterly
Repository
Basic Info
- Host: GitHub
- Owner: chennesy
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 56.2 MB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 0
Created over 6 years ago
· Last pushed almost 4 years ago
https://github.com/chennesy/dfr_lq/blob/main/
# dfr_lq

Code used to generate the LDA topic models analyzed in the article, Computational Topic Models of the *Library Quarterly* (1931-2015).

## Data access

Metadata and unigrams for 8,808 items from the *Library Quarterly* (LQ) were retrieved from JSTOR's Data for Research (DfR) platform using the query *jcode:libraryq*. Each article from the journal is represented in the DfR download by:

1. an XML file with metadata for the article, and
2. a tab-delimited TXT file listing the ngrams for the article.

As of Fall 2020, JSTOR plans to sunset the DfR platform, but its functionality has already been migrated to the [Digital Scholar Workbench website](https://tdm-pilot.org/). Researchers seeking to replicate the topic models can reach out to the authors for access to the underlying data.

## Using this code

The code used to import and analyze the LQ data is available in four IPython notebooks in this repository, each of which is explained in greater depth below: 1_import_r.ipynb, 2_clean.ipynb, 3_lda_models.ipynb, and 4_tm_analysis.ipynb.

### Set-up

To execute the code in the Jupyter notebooks in this repository on your own, we recommend first reading and following our [setup instructions](setup.md), especially if your goal is to replicate our results.

### Data import, cleaning and analysis

1. [1_import_r.ipynb](https://github.com/chennesy/dfr_lq/blob/master/1_import_r.ipynb) - Thomas Klebel's [jstor package](https://docs.ropensci.org/jstor/) for the programming language R (2018) was used to reformat the metadata XML files into a single CSV file containing key metadata for the entire corpus.
2. [2_clean.ipynb](https://github.com/chennesy/dfr_lq/blob/master/2_clean.ipynb) - The metadata was then imported into Python and combined with the ngrams for each article. To prepare the corpus for topic modeling, the words in the ngram files were stemmed using the Natural Language Toolkit's Snowball Stemmer (Bird, Loper, and Klein 2009).
3. [3_lda_models.ipynb](https://github.com/chennesy/dfr_lq/blob/master/3_lda_models.ipynb) - This notebook may be skipped when replicating the results. It used scikit-learn's GridSearchCV from the model_selection module to find the best-performing model and parameters, including the number of topics (40), that were ultimately applied to the LQ corpus.
4. [4_tm_analysis.ipynb](https://github.com/chennesy/dfr_lq/blob/master/4_tm_analysis.ipynb) - Used to generate and analyze the topic model. See the article and supplemental appendix for more information.

## Suggested citation

Hennesy, C. & Naughton, D. (2022). Computational Topic Models of the *Library Quarterly*. *portal: Libraries & the Academy* 22(3). https://doi.org/10.1353/pla.2022.0030.
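For readers who obtain the DfR data, the tab-delimited ngram files described under Data access could be read along the following lines. This is a minimal sketch, not the code from the notebooks: it assumes each row has the form `term<TAB>count`, and the function name `read_ngram_counts` and the sample rows are hypothetical, introduced only for illustration.

```python
import csv
import io
from collections import Counter


def read_ngram_counts(lines):
    """Parse tab-delimited rows into a Counter of term frequencies.

    Assumes each row is `term<TAB>count`; this layout is an assumption
    about the DfR ngram files, not something taken from the notebooks.
    """
    counts = Counter()
    # QUOTE_NONE keeps quotation marks inside terms from being treated as CSV quoting.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        term, count = row
        try:
            counts[term] += int(count)
        except ValueError:
            continue  # skip rows whose count is not an integer, such as a header
    return counts


# Hypothetical sample standing in for one article's ngram file.
sample = io.StringIO("library\t12\nlibrarian\t5\nreading\t3\nlibrary\t1\n")
article_counts = read_ngram_counts(sample)
print(article_counts.most_common(2))
```

To read a real file, pass an open file handle instead of the sample, for example `open(path, encoding="utf-8", newline="")`, where `path` points to one article's ngram TXT file. The resulting counts could then be stemmed and combined with the article metadata, as the second notebook describes.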
Owner
- Name: Cody Hennesy
- Login: chennesy
- Kind: user
- Repositories: 17
- Profile: https://github.com/chennesy
Univ of Minnesota. Librarian.