universitatespodcastdata
An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
README.md
Universitates Podcast Data
An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.
✍️ Author
- Russian CV: https://nevmenandr.github.io/
- English CV: https://nevmenandr.github.io/homepage/
- Full Russian lists and descriptions: http://nevmenandr.net/bo.php
📥 Installation
To install the package from CRAN (once submitted and approved):
```r
install.packages("UniversitatesPodcastData")
```
To install the development version from GitHub:
```r
# install.packages("devtools")  # Uncomment if devtools is not installed
devtools::install_github("nevmenandr/UniversitatesPodcastData")
```
🔧 Functions
Downloading Webpages
download_podcast_pages(ids)
Downloads interview pages and extracts structured data.
- Parameters: ids – a vector of page numbers.
- Returns: A list containing extracted data.
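The download step could be sketched roughly as follows. This is not the package's actual implementation: the URL template, the `sketch_download_pages` name, and the `fetch` and `delay` arguments are all assumptions made for illustration, since the README does not say how page numbers map to addresses.

```r
# Hypothetical URL template; the real site layout is not given in this README.
page_url <- function(id) sprintf("https://example.org/universitates/%d", id)

# Fetch each page in turn, keeping NULL for pages that fail to load.
# `fetch` is injectable so the loop can be exercised without network access.
sketch_download_pages <- function(ids,
                                  fetch = function(url) rvest::read_html(url),
                                  delay = 1) {
  results <- vector("list", length(ids))
  names(results) <- as.character(ids)
  for (i in seq_along(ids)) {
    page <- tryCatch(fetch(page_url(ids[i])), error = function(e) NULL)
    # Assign via list() so a NULL result does not drop the element
    results[i] <- list(page)
    # Pause between requests to avoid hammering the server
    if (i < length(ids)) Sys.sleep(delay)
  }
  results
}

# Offline demonstration with a stub fetcher that just echoes the URL
demo <- sketch_download_pages(c(1, 2), fetch = function(url) url, delay = 0)
```

Passing the fetcher as an argument is a design choice for this sketch only; it keeps the retry and pacing logic testable without contacting any server.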
📊 Extracting Data
extract_podcast_date(page)
Extracts the publication date from a webpage.
- Parameters: page – parsed HTML content.
- Returns: A date string.
extract_interlocutor_name(page)
Extracts the interviewee's name.
- Returns: A string containing the name.
🧑‍🔬 extract_interlocutor_specialty(page)
Extracts the interviewee's specialty.
- Returns: A string with the specialty.
🏛️ extract_interlocutor_universities(page)
Extracts the universities associated with the interviewee.
- Returns: A list of university names.
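To make the extractor contract concrete, here is a minimal sketch using rvest (one of the package's declared imports) against an inline HTML fragment. The markup, the CSS selectors, and the `sketch_` function names are invented for illustration; the real pages and the package's own selectors may look quite different.

```r
library(rvest)

# Hypothetical markup for one interview page; the real pages may differ.
sample_page <- minimal_html('
  <article>
    <time class="date">2025-03-21</time>
    <h1 class="guest">Example Guest</h1>
    <p class="specialty">Historian</p>
    <ul class="universities"><li>MSU</li><li>HSE</li></ul>
  </article>')

# Return the text of the first node matching a CSS selector.
first_text <- function(page, css) html_text2(html_element(page, css))

sketch_extract_date      <- function(page) first_text(page, "time.date")
sketch_extract_name      <- function(page) first_text(page, "h1.guest")
sketch_extract_specialty <- function(page) first_text(page, "p.specialty")

# Several universities can be listed, so this one returns a list.
sketch_extract_universities <- function(page) {
  as.list(html_text2(html_elements(page, "ul.universities li")))
}
```

Each function takes the parsed page and returns plain R values, which matches the parameter and return descriptions above.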
💾 Saving Data
save_data_to_json(data, file)
Saves extracted data to a JSON file.
- Parameters:
  - data – structured data to save.
  - file – output file path.
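A saving step like this can be sketched with jsonlite, which the package lists among its imports. The record layout and the `sketch_save_json` name below are assumptions, not the package's actual data format or code.

```r
library(jsonlite)

# Hypothetical record layout; the package's real output may be shaped differently.
podcast_data <- list(
  list(page = 1, name = "Example Guest", specialty = "Historian",
       universities = list("MSU", "HSE"))
)

sketch_save_json <- function(data, file) {
  # auto_unbox writes length-one vectors as scalars instead of one-element arrays
  write_json(data, file, pretty = TRUE, auto_unbox = TRUE)
  invisible(file)
}

out_file <- tempfile(fileext = ".json")
sketch_save_json(podcast_data, out_file)

# Read back without simplification to recover the nested list structure
restored <- fromJSON(out_file, simplifyVector = FALSE)
```

Reading with `simplifyVector = FALSE` keeps each record as a nested list rather than collapsing the data into a data frame, which makes the round trip easier to compare with the input.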
🔍 Searching and Filtering
find_pages_by_specialty(specialty, data)
Finds interview pages by specialty.
- Returns: A vector of page numbers.
find_pages_by_university(university, data)
Finds interview pages by university.
- Returns: A vector of page numbers.
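The two lookups can be illustrated in base R. The sketch below assumes each record is a list with `page`, `specialty`, and `universities` fields and that matching is exact; both the layout and the `sketch_` names are hypothetical, and the package may match differently (for example, by substring).

```r
# Hypothetical shape of the extracted data: one record per interview page.
podcast_data <- list(
  list(page = 1, specialty = "Кандидат исторических наук",
       universities = list("МГУ")),
  list(page = 2, specialty = "Доктор филологических наук",
       universities = list("ВШЭ", "МГУ")),
  list(page = 3, specialty = "Кандидат исторических наук",
       universities = list("ВШЭ"))
)

# Collect the page numbers of all records for which `keep` returns TRUE.
pages_where <- function(data, keep) {
  hits <- vapply(data, keep, logical(1))
  vapply(data[hits], function(rec) rec$page, numeric(1))
}

sketch_find_pages_by_specialty <- function(specialty, data) {
  pages_where(data, function(rec) identical(rec$specialty, specialty))
}

sketch_find_pages_by_university <- function(university, data) {
  pages_where(data, function(rec) university %in% unlist(rec$universities))
}

history_pages <- sketch_find_pages_by_specialty("Кандидат исторических наук", podcast_data)
msu_pages <- sketch_find_pages_by_university("МГУ", podcast_data)
```

With the sample records above, the specialty lookup selects pages 1 and 3 and the university lookup selects pages 1 and 2.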
Usage Example
```r
library(UniversitatesPodcastData)

# Download and extract data for pages 1 and 2
data <- download_podcast_pages(c(1, 2))

# Save to JSON
save_data_to_json(data, "podcastdata.json")

# Find pages by specialty ("Кандидат исторических наук" = Candidate of Historical Sciences)
history_pages <- find_pages_by_specialty("Кандидат исторических наук", data)

# Find pages by university ("МГУ" = Moscow State University)
msu_pages <- find_pages_by_university("МГУ", data)
```
License
This package is licensed under the GPL-3 license.
Citation info
Chicago Style (17th edition, Author-Date)
Reference list entry:
Orekhov, Boris. 2025. Universitates Podcast Data. Version 1.0.0. https://github.com/nevmenandr/UniversitatesPodcastData.
In-text citation:
(Orekhov 2025)
GOST R 7.0.5–2008 (author-date, for online resources)
Орехов, Б. Universitates Podcast Data [Электронный ресурс] / Б. Орехов. – Версия 1.0.0. – 2025. – URL: https://github.com/nevmenandr/UniversitatesPodcastData (дата обращения: 21.03.2025).
BibTeX entry for LaTeX users
```latex
@software{Orekhov_UniversitatesPodcastData_2025,
  author = {Boris Orekhov},
  title = {Universitates Podcast Data},
  year = {2025},
  version = {1.0.0},
  url = {https://github.com/nevmenandr/UniversitatesPodcastData},
  orcid = {0000-0002-9099-0436},
  license = {GPL-3.0},
  abstract = {An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.}
}
```
Owner
- Name: Boris Orekhov
- Login: nevmenandr
- Kind: user
- Location: Moscow
- Website: https://nevmenandr.github.io
- Twitter: nevmenandr
- Repositories: 42
- Profile: https://github.com/nevmenandr
Digital humanities researcher
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this package, please cite it as follows:"
title: "Universitates Podcast Data"
authors:
  - family-names: "Orekhov"
    given-names: "Boris"
    orcid: "0000-0002-9099-0436"
repository-code: "https://github.com/nevmenandr/UniversitatesPodcastData"
version: "1.0.0"
date-released: "2025-03-22"
license: "GPL-3.0"
doi: "10.5281/zenodo.XXXXXXX"
abstract: "An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization."
keywords:
- R
- data collection
- r package
- data processing
- web scraping
- text analysis
- podcast
- russian language
GitHub Events
Total
- Push event: 19
- Create event: 2
Last Year
- Push event: 19
- Create event: 2
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Boris Orekhov | n****r@g****m | 22 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- dplyr * imports
- ggplot2 * imports
- httr * imports
- jsonlite * imports
- rvest * imports
- stringr * imports
- tidyr * imports
