universitatespodcastdata

An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.

https://github.com/nevmenandr/universitatespodcastdata

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Keywords

data-collection data-processing podcast r r-package russian-language text-analysis web-scraping

Repository

Basic Info
  • Host: GitHub
  • Owner: nevmenandr
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 55.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Universitates Podcast Data

An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.

✍️ Author

  • Russian CV: https://nevmenandr.github.io/
  • English CV: https://nevmenandr.github.io/homepage/
  • Full Russian lists and descriptions: http://nevmenandr.net/bo.php

📥 Installation

To install the package from CRAN (once it has been submitted and accepted):

```r
install.packages("UniversitatesPodcastData")
```

To install the development version from GitHub:

```r
# install.packages("devtools")  # Uncomment if devtools is not installed
devtools::install_github("nevmenandr/UniversitatesPodcastData")
```

🔧 Functions

Downloading Webpages

download_podcast_pages(ids)

Downloads interview pages and extracts structured data.

- Parameters: ids – a vector of page numbers.
- Returns: A list containing extracted data.
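The README does not describe the layout of the returned list. The sketch below assumes, purely for illustration, that it holds one record per page, named by page number, with fields that mirror the extractor functions (`date`, `name`, `specialty`, `universities`). The field names and all sample values are hypothetical and are not taken from the package.

```r
# Hypothetical stand-in for the result of download_podcast_pages(c(1, 2)).
# Field names and values are assumptions made for this sketch only.
podcast_data <- list(
  "1" = list(
    date = "2024-01-15",
    name = "Interviewee A",
    specialty = "Кандидат исторических наук",
    universities = c("МГУ", "НИУ ВШЭ")
  ),
  "2" = list(
    date = "2024-02-01",
    name = "Interviewee B",
    specialty = "Доктор филологических наук",
    universities = "СПбГУ"
  )
)

# In this sketch the page numbers are stored as the names of the list.
page_ids <- as.integer(names(podcast_data))

# Collect one scalar field across all records.
interviewee_names <- vapply(podcast_data, function(rec) rec$name, character(1))

# Count how many universities each record lists.
university_counts <- lengths(lapply(podcast_data, function(rec) rec$universities))
```

If the real return value is shaped differently, `str(data, max.level = 2)` is a quick way to inspect it before writing any downstream code.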

📊 Extracting Data

extract_podcast_date(page)

Extracts the publication date from a webpage.

- Parameters: page – parsed HTML content.
- Returns: A date string.

extract_interlocutor_name(page)

Extracts the interviewee's name.

- Returns: A string containing the name.

🧑‍🔬 extract_interlocutor_specialty(page)

Extracts the interviewee's specialty.

- Returns: A string with the specialty.

🏛️ extract_interlocutor_universities(page)

Extracts the universities associated with the interviewee.

- Returns: A list of university names.

💾 Saving Data

save_data_to_json(data, file)

Saves extracted data to a JSON file.

- Parameters:
  - data – structured data to save.
  - file – output file path.
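For readers who want to load a saved file back into R, the following is a minimal sketch of a JSON round trip using `jsonlite`, which the package lists among its imports. It does not call `save_data_to_json()`, whose exact output format is not documented here; the record layout and values are hypothetical.

```r
# Sketch only: write a small hypothetical record set with jsonlite and read it
# back. The package's own save_data_to_json() may format its output differently.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  podcast_data <- list(
    "1" = list(name = "Interviewee A",
               universities = c("University X", "University Y")),
    "2" = list(name = "Interviewee B",
               universities = "University Z")
  )

  json_path <- tempfile(fileext = ".json")

  # auto_unbox = TRUE stores length-one vectors as JSON scalars, not arrays.
  jsonlite::write_json(podcast_data, json_path, auto_unbox = TRUE, pretty = TRUE)

  # simplifyVector = FALSE keeps the nested list structure when reading back.
  restored <- jsonlite::fromJSON(json_path, simplifyVector = FALSE)
}
```

With these settings the round trip is not perfectly symmetric: a record with several universities comes back as a list of strings, while a record with one university comes back as a single string, so downstream code should handle both cases.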

🔍 Searching and Filtering

find_pages_by_specialty(specialty, data)

Finds interview pages by specialty.

- Returns: A vector of page numbers.

find_pages_by_university(university, data)

Finds interview pages by university.

- Returns: A vector of page numbers.
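The two search functions can be pictured as filters over the extracted records. The sketch below is a base R illustration of that idea and not the package's implementation. It assumes a hypothetical list of records keyed by page number, and `pages_where()` is a helper invented for this example. Whether the package matches exactly or partially is not stated in the README; exact matching is assumed here.

```r
# Hypothetical records keyed by page number; the real structure returned by
# download_podcast_pages() may differ.
podcast_data <- list(
  "1" = list(specialty = "Кандидат исторических наук",
             universities = c("МГУ", "НИУ ВШЭ")),
  "2" = list(specialty = "Доктор филологических наук",
             universities = "СПбГУ"),
  "3" = list(specialty = "Кандидат исторических наук",
             universities = "СПбГУ")
)

# Hypothetical helper: page numbers of the records for which `predicate`
# returns TRUE.
pages_where <- function(data, predicate) {
  keep <- vapply(data, predicate, logical(1))
  as.integer(names(data)[keep])
}

# Exact match on the specialty string.
history_pages <- pages_where(
  podcast_data,
  function(rec) identical(rec$specialty, "Кандидат исторических наук")
)

# Membership test, since one interviewee can list several universities.
spbgu_pages <- pages_where(
  podcast_data,
  function(rec) "СПбГУ" %in% rec$universities
)
```

Passing the test as a function keeps the two searches symmetric: only the predicate changes, while the bookkeeping that maps matches back to page numbers is shared.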

Usage Example

```r
library(UniversitatesPodcastData)

# Download and extract data for pages 1 and 2
data <- download_podcast_pages(c(1, 2))

# Save to JSON
save_data_to_json(data, "podcastdata.json")

# Find pages by specialty
# ("Кандидат исторических наук" = Candidate of Historical Sciences)
history_pages <- find_pages_by_specialty("Кандидат исторических наук", data)

# Find pages by university ("МГУ" = Moscow State University)
msu_pages <- find_pages_by_university("МГУ", data)
```

License

This package is licensed under the GPL-3 license.

Citation info

Chicago Style (17th edition, Author-Date)

Reference list entry:

Orekhov, Boris. 2025. Universitates Podcast Data. Version 1.0.0. https://github.com/nevmenandr/UniversitatesPodcastData.

In-text citation:

(Orekhov 2025)

GOST R 7.0.5–2008 (author-date, for online resources)

Орехов, Б. Universitates Podcast Data [Электронный ресурс] / Б. Орехов. – Версия 1.0.0. – 2025. – URL: https://github.com/nevmenandr/UniversitatesPodcastData (дата обращения: 21.03.2025).

BibTeX entry for LaTeX users

    @software{Orekhov_UniversitatesPodcastData_2025,
      author = {Boris Orekhov},
      title = {Universitates Podcast Data},
      year = {2025},
      version = {1.0.0},
      url = {https://github.com/nevmenandr/UniversitatesPodcastData},
      orcid = {0000-0002-9099-0436},
      license = {GPL-3.0},
      abstract = {An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.}
    }

Owner

  • Name: Boris Orekhov
  • Login: nevmenandr
  • Kind: user
  • Location: Moscow

Digital humanities researcher

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this package, please cite it as follows:"
title: "Universitates Podcast Data"
authors:
  - family-names: "Orekhov"
    given-names: "Boris"
    orcid: "0000-0002-9099-0436"
repository-code: "https://github.com/nevmenandr/UniversitatesPodcastData"
version: "1.0.0"
date-released: "2025-03-22"
license: "GPL-3.0"
doi: "10.5281/zenodo.XXXXXXX" 
abstract: "An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization."
keywords:
  - R
  - data collection
  - r package
  - data processing
  - web scraping
  - text analysis
  - podcast
  - russian language

GitHub Events

Total
  • Push event: 19
  • Create event: 2
Last Year
  • Push event: 19
  • Create event: 2

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 22
  • Total Committers: 1
  • Avg Commits per committer: 22.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 22
  • Committers: 1
  • Avg Commits per committer: 22.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Boris Orekhov n****r@g****m 22

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

DESCRIPTION cran
  • dplyr * imports
  • ggplot2 * imports
  • httr * imports
  • jsonlite * imports
  • rvest * imports
  • stringr * imports
  • tidyr * imports