universitatespodcastdata

An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.

https://github.com/nevmenandr/universitatespodcastdata

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: found 3 DOI reference(s) in README
✓ Academic publication links: links to zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary

Keywords

data-collection data-processing podcast r r-package russian-language text-analysis web-scraping

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: nevmenandr
License: gpl-3.0
Language: R
Default Branch: main
Homepage:
Size: 55.7 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

data-collection data-processing podcast r r-package russian-language text-analysis web-scraping

Created 11 months ago · Last pushed 11 months ago

Metadata Files

Readme License Citation

Universitates Podcast Data

An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.

✍️ Author

Russian CV: https://nevmenandr.github.io/
English CV: https://nevmenandr.github.io/homepage/
Full lists and descriptions (in Russian): http://nevmenandr.net/bo.php

📥 Installation

To install the package from CRAN (once submitted and approved):

```r
install.packages("UniversitatesPodcastData")
```

To install the development version from GitHub:

```r
# install.packages("devtools")  # uncomment if devtools is not installed
devtools::install_github("nevmenandr/UniversitatesPodcastData")
```

🔧 Functions

Downloading Webpages

`download_podcast_pages(ids)`

Downloads interview pages and extracts structured data.
- Parameters: `ids` – a vector of page numbers.
- Returns: a list containing the extracted data.
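A batch downloader like `download_podcast_pages()` has to cope with pages that fail to load. The base R sketch below is a hypothetical illustration, not the package's implementation: `download_pages_sketch()` and the stub `fake_fetch()` are invented for this example, and the fetcher is passed in as an argument so the code runs offline. A failed page is recorded as `NULL` with a warning instead of aborting the whole batch.

```r
# Hypothetical sketch, not the package's code: fetch a batch of pages by ID.
# `fetch` takes one ID and returns the page content; a real version might
# build the page URL and call rvest::read_html() here.
download_pages_sketch <- function(ids, fetch) {
  pages <- lapply(ids, function(id) {
    tryCatch(
      fetch(id),
      error = function(e) {
        warning(sprintf("page %s failed: %s", id, conditionMessage(e)))
        NULL  # record the failure and move on to the next ID
      }
    )
  })
  names(pages) <- as.character(ids)
  pages
}

# Offline stub standing in for an HTTP request (hypothetical).
fake_fetch <- function(id) {
  if (id == 3) stop("HTTP 404")
  sprintf("<html><body>Interview %d</body></html>", id)
}

pages <- suppressWarnings(download_pages_sketch(c(1, 2, 3), fake_fetch))
str(pages)
```

The podcast's page URL pattern is not given in this document, so the real request is left to whatever `fetch` function is supplied.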

📊 Extracting Data

`extract_podcast_date(page)`

Extracts the publication date from a webpage.
- Parameters: `page` – parsed HTML content.
- Returns: a date string.
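The date format used on the podcast pages is not documented here. Assuming dates appear as `DD.MM.YYYY` in the page text (an assumption), a minimal base R sketch of the extraction step could look like this. `extract_date_sketch()` is a hypothetical name, and it works on plain text rather than parsed HTML.

```r
# Hypothetical sketch: find the first DD.MM.YYYY date in the page text
# and return it as an ISO "YYYY-MM-DD" string (the input format is assumed).
extract_date_sketch <- function(text) {
  found <- regmatches(text, regexpr("[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}", text))
  if (length(found) == 0) return(NA_character_)  # no date in the text
  format(as.Date(found, format = "%d.%m.%Y"), "%Y-%m-%d")
}

extract_date_sketch("Episode published on 21.03.2025.")  # returns "2025-03-21"
extract_date_sketch("no date here")                      # returns NA
```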

`extract_interlocutor_name(page)`

Extracts the interviewee's name.
- Parameters: `page` – parsed HTML content.
- Returns: a string containing the name.

🧑‍🔬 `extract_interlocutor_specialty(page)`

Extracts the interviewee's specialty.
- Parameters: `page` – parsed HTML content.
- Returns: a string with the specialty.
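Text scraped from HTML may carry line breaks and indentation from the page markup. The hypothetical helper `clean_field_sketch()` shows one way to normalize a name or specialty field before returning it; whether the package performs such cleanup is not stated.

```r
# Hypothetical helper: collapse runs of whitespace (spaces, tabs, newlines)
# into single spaces and trim the ends. Returns NA for missing or blank input.
clean_field_sketch <- function(x) {
  if (length(x) == 0 || is.na(x[1])) return(NA_character_)
  out <- trimws(gsub("[[:space:]]+", " ", x[1]))
  if (nzchar(out)) out else NA_character_
}

clean_field_sketch("  Jane\n   Doe \t")  # returns "Jane Doe"
```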

🏛️ `extract_interlocutor_universities(page)`

Extracts the universities associated with the interviewee.
- Parameters: `page` – parsed HTML content.
- Returns: a list of university names.
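If the universities appear in a single delimited string (an assumption; the page layout is not described), they can be split into a clean, de-duplicated character vector as sketched below. `split_universities_sketch()` is hypothetical; wrap its result in `as.list()` if a list is needed, as the documented return type suggests.

```r
# Hypothetical sketch: split an affiliation string on commas or semicolons,
# trim whitespace, and drop empty entries and duplicates.
split_universities_sketch <- function(raw) {
  parts <- trimws(unlist(strsplit(raw, "[,;]")))
  unique(parts[nzchar(parts)])
}

split_universities_sketch("MSU; HSE,  SPbU ;MSU")  # returns c("MSU", "HSE", "SPbU")
```

Splitting on commas would break names that themselves contain commas, so a real scraper would more likely read each university from its own HTML element.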

💾 Saving Data

`save_data_to_json(data, file)`

Saves extracted data to a JSON file.
- Parameters:
  - `data` – structured data to save.
  - `file` – output file path.

🔍 Searching and Filtering

`find_pages_by_specialty(specialty, data)`

Finds interview pages by specialty.
- Parameters: `specialty` – the specialty to search for; `data` – the extracted interview data.
- Returns: a vector of page numbers.

`find_pages_by_university(university, data)`

Finds interview pages by university.
- Parameters: `university` – the university to search for; `data` – the extracted interview data.
- Returns: a vector of page numbers.
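Both search functions map a query to page numbers. Assuming each interview is stored as a list with `id`, `specialty`, and `universities` fields (the structure actually returned by `download_podcast_pages()` is not documented here), the lookups can be sketched in base R. The functions and sample records below are hypothetical.

```r
# Hypothetical records: the field names id, specialty and universities
# are assumptions, not the package's documented data format.
interviews <- list(
  list(id = 1, specialty = "Candidate of Historical Sciences",
       universities = c("MSU")),
  list(id = 2, specialty = "Doctor of Physical and Mathematical Sciences",
       universities = c("MSU", "HSE")),
  list(id = 3, specialty = "Candidate of Historical Sciences",
       universities = c("SPbU"))
)

# Ids of interviews whose specialty contains the query (case-insensitive).
find_by_specialty_sketch <- function(specialty, data) {
  hits <- vapply(data, function(rec) {
    grepl(tolower(specialty), tolower(rec$specialty), fixed = TRUE)
  }, logical(1))
  vapply(data[hits], function(rec) rec$id, numeric(1))
}

# Ids of interviews listing exactly this university among their affiliations.
find_by_university_sketch <- function(university, data) {
  hits <- vapply(data, function(rec) university %in% rec$universities, logical(1))
  vapply(data[hits], function(rec) rec$id, numeric(1))
}

find_by_specialty_sketch("historical", interviews)  # returns c(1, 3)
find_by_university_sketch("MSU", interviews)        # returns c(1, 2)
```

The specialty lookup uses a substring match so a partial query finds full degree titles, while the university lookup uses exact matching, which avoids "MSU" accidentally matching longer names that contain it.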

Usage Example

```r
library(UniversitatesPodcastData)

# Download and extract data for pages 1 and 2
data <- download_podcast_pages(c(1, 2))

# Save to JSON
save_data_to_json(data, "podcast_data.json")

# Find pages by specialty ("Кандидат исторических наук" = Candidate of Historical Sciences)
history_pages <- find_pages_by_specialty("Кандидат исторических наук", data)

# Find pages by university ("МГУ" = Moscow State University)
msu_pages <- find_pages_by_university("МГУ", data)
```

Usage in Russian

In Russian

License

This package is licensed under the GPL-3 license.

Citation info

Chicago Style (17th edition, Author-Date)

Reference list entry:

Orekhov, Boris. 2025. Universitates Podcast Data. Version 1.0.0. https://github.com/nevmenandr/UniversitatesPodcastData.

In-text citation:

(Orekhov 2025)

GOST R 7.0.5–2008 (author-date, for online resources)

Орехов, Б. Universitates Podcast Data [Электронный ресурс] / Б. Орехов. – Версия 1.0.0. – 2025. – URL: https://github.com/nevmenandr/UniversitatesPodcastData (дата обращения: 21.03.2025).

BibTeX entry for LaTeX users

```latex
@software{Orekhov_UniversitatesPodcastData_2025,
  author   = {Boris Orekhov},
  title    = {Universitates Podcast Data},
  year     = {2025},
  version  = {1.0.0},
  url      = {https://github.com/nevmenandr/UniversitatesPodcastData},
  orcid    = {0000-0002-9099-0436},
  license  = {GPL-3.0},
  abstract = {An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.}
}
```

Owner

Name: Boris Orekhov
Login: nevmenandr
Kind: user
Location: Moscow

Website: https://nevmenandr.github.io
Twitter: nevmenandr
Repositories: 42
Profile: https://github.com/nevmenandr

Digital humanities researcher

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this package, please cite it as follows:"
title: "Universitates Podcast Data"
authors:
  - family-names: "Orekhov"
    given-names: "Boris"
    orcid: "0000-0002-9099-0436"
repository-code: "https://github.com/nevmenandr/UniversitatesPodcastData"
version: "1.0.0"
date-released: "2025-03-22"
license: "GPL-3.0"
doi: "10.5281/zenodo.XXXXXXX" 
abstract: "An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization."
keywords:
  - R
  - data collection
  - r package
  - data processing
  - web scraping
  - text analysis
  - podcast
  - russian language

GitHub Events

Total

Push event: 19
Create event: 2

Last Year

Push event: 19
Create event: 2

Committers

Last synced: 8 months ago

All Time

Total Commits: 22
Total Committers: 1
Avg Commits per committer: 22.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 22
Committers: 1
Avg Commits per committer: 22.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Boris Orekhov	n**r@g**m	22

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Dependencies

DESCRIPTION (CRAN)

Package	Version	Type
dplyr	*	imports
ggplot2	*	imports
httr	*	imports
jsonlite	*	imports
rvest	*	imports
stringr	*	imports
tidyr	*	imports
