Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.8%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: wragge
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 1.54 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Trove newspaper front pages

This repository demonstrates how to harvest information about the contents of newspaper front pages from Trove. It then uses the harvested data to explore how the contents of front pages have changed over time.

Notebooks

Datasets

front_pages.parquet

Contains summary information about articles published on the front pages of newspapers. It has 16,398,514 rows (274.4 MB) and was created on 2 August 2023. It includes the following columns:

| Column | Description |
|--------|-------------|
| article_id | Trove numeric identifier for the article |
| title | title of the article |
| newspaper_id | Trove numeric identifier for the newspaper in which the article was published |
| date | date the article was published |
| category | category of the article, e.g. 'Article', 'Advertising' |
| word_count | number of words in the article |
| page_id | Trove numeric identifier for the page on which the article was published |
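As a rough illustration of working with these columns, the sketch below counts front-page articles by year and category. It assumes each row is available as a Python mapping keyed by the column names above, and that `date` is either an ISO `YYYY-MM-DD` string or a date/datetime value. `articles_per_year` is a hypothetical helper, not part of this repository, and the sample rows are made-up values rather than data from the dataset.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Mapping


def articles_per_year(rows: Iterable[Mapping]) -> Counter:
    """Count front-page articles by (year, category).

    Hypothetical helper: assumes each row is a mapping with the `date` and
    `category` columns described above, where `date` is an ISO YYYY-MM-DD
    string or a date/datetime value.
    """
    counts: Counter = Counter()
    for row in rows:
        # str() of a date or datetime begins with YYYY-MM-DD, as does an ISO string
        year = date.fromisoformat(str(row["date"])[:10]).year
        counts[(year, row["category"])] += 1
    return counts


# Made-up sample rows for illustration only (not taken from the dataset)
sample = [
    {"article_id": 1, "date": "1901-01-01", "category": "Article", "word_count": 250},
    {"article_id": 2, "date": "1901-01-01", "category": "Advertising", "word_count": 80},
    {"article_id": 3, "date": "1901-06-15", "category": "Advertising", "word_count": 40},
]
print(articles_per_year(sample))
# Counter({(1901, 'Advertising'): 2, (1901, 'Article'): 1})
```

For all 16 million rows, a vectorised approach in pandas or DuckDB (both listed in the dependencies) would be much faster than a Python loop; this version just makes the counting logic explicit.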

front_pages_totals.parquet

Derived from front_pages.parquet by summing the word counts of the articles in each category, giving the total number of words per category on each front page. It has 4,351,009 rows (35.1 MB) and was created on 2 August 2023. It includes the following columns:

| Column | Description |
|--------|-------------|
| date | date the page was published |
| page_id | Trove numeric identifier for the page |
| newspaper_id | Trove numeric identifier for the newspaper |
| category | article category, e.g. 'Article', 'Advertising' |
| total | number of words in this category on this page |
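The derivation described above amounts to a group-and-sum over the front-page articles. Here is a minimal pure-Python sketch, assuming rows shaped like those in front_pages.parquet; `totals_per_page` is a hypothetical name and this is not necessarily how the repository's notebooks build the file. In pandas, the same aggregation could be written as `df.groupby(["date", "page_id", "newspaper_id", "category"])["word_count"].sum()`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Mapping


def totals_per_page(rows: Iterable[Mapping]) -> list[dict]:
    """Sum article word counts for each (date, page_id, newspaper_id, category).

    Hypothetical helper: produces one output row per category per front page,
    with the summed word count in a `total` column.
    """
    totals: dict[tuple, int] = defaultdict(int)
    for row in rows:
        key = (row["date"], row["page_id"], row["newspaper_id"], row["category"])
        totals[key] += row["word_count"]
    # Rebuild rows in the column layout described in the table above
    return [
        {"date": d, "page_id": p, "newspaper_id": n, "category": c, "total": t}
        for (d, p, n, c), t in totals.items()
    ]


# Made-up sample rows for illustration only (not taken from the dataset)
sample = [
    {"date": "1901-01-01", "page_id": 10, "newspaper_id": 5, "category": "Article", "word_count": 250},
    {"date": "1901-01-01", "page_id": 10, "newspaper_id": 5, "category": "Article", "word_count": 100},
    {"date": "1901-01-01", "page_id": 10, "newspaper_id": 5, "category": "Advertising", "word_count": 80},
]
for row in totals_per_page(sample):
    print(row)
# {'date': '1901-01-01', 'page_id': 10, 'newspaper_id': 5, 'category': 'Article', 'total': 350}
# {'date': '1901-01-01', 'page_id': 10, 'newspaper_id': 5, 'category': 'Advertising', 'total': 80}
```

Grouping on `date` and `newspaper_id` as well as `page_id` keeps those columns in the output, matching the layout of front_pages_totals.parquet.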


Created by Tim Sherratt, August 2023

Owner

  • Name: Tim Sherratt
  • Login: wragge
  • Kind: user

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.in pypi
  • altair *
  • black *
  • duckdb *
  • ipywidgets *
  • isort *
  • jupyter-archive *
  • jupyterlab *
  • jupyterlab-code-formatter *
  • pandas *
  • pyarrow *
  • python-dotenv *
  • requests *
  • tqdm *
  • trove-newspaper-harvester *
  • voila *
requirements.txt pypi
  • 120 dependencies