trove-newspaper-harvester

https://github.com/wragge/trove-newspaper-harvester

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: wragge
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 572 KB

Statistics

Stars: 2
Watchers: 3
Forks: 0
Open Issues: 4
Releases: 6

Created over 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme License Citation

trove-newspaper-harvester

[![](https://zenodo.org/badge/DOI/10.5281/zenodo.7103174.svg)](https://doi.org/10.5281/zenodo.7103174)

View the full documentation

The Trove Newspaper (& Gazette) Harvester makes it easy to download large quantities of digitised articles from Trove’s newspapers and gazettes. Just give it a search from the Trove web interface, and the harvester will save the metadata of all the articles in a CSV (spreadsheet) file for further analysis. You can also save the full text of every article, as well as copies of the articles as JPG images, and even PDFs. While the web interface will only show you the first 2,000 results matching your search, the Newspaper Harvester will get everything.
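Because the metadata ends up in a CSV file, the Python standard library is enough for a first look at a harvest. The sketch below counts rows by the value of one column. The `count_by_column` helper, the file path, and the column name used in the usage note are assumptions made for illustration, so check them against the header row of your own harvest.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Count how many rows in a CSV file share each value of `column`.

    Raises KeyError if the column is not present in the file's header.
    """
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        return Counter(row[column] for row in reader)
```

Assuming a harvest produced a file called `results.csv` with a `newspaper_title` column (both hypothetical here), `count_by_column("results.csv", "newspaper_title").most_common(10)` would list the ten titles with the most matching articles.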

No installation required!

If you want to use the harvester without installing anything, just head over to the Trove Newspaper Harvester section in my GLAM Workbench.

Installation

    pip install trove-newspaper-harvester

Before you do any harvesting you need to get yourself a Trove API key.
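One way to keep the key out of notebooks and scripts is to read it from an environment variable. This is a minimal sketch, not something the harvester does for you: the variable name `TROVE_API_KEY` and the `get_api_key` helper are both assumptions made for this example.

```python
import os


def get_api_key(env_var="TROVE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example;
    the harvester itself does not look for it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Trove API key"
        )
    return key
```

The returned string can then be passed wherever the examples in this README use a literal key.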

Use as a library

    from trove_newspaper_harvester.core import prepare_query, Harvester

Generate a set of query parameters using prepare_query.

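```python
# A search url copied from the Trove web interface
my_query = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
# Replace with your own Trove API key
my_api_key = "mYSecREtkEy"

# Convert the search url into a set of query parameters
my_query_params = prepare_query(query=my_query)
```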
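Before handing a url to prepare_query, it can help to check which search parameters it actually carries. The sketch below uses only the standard library and is not a description of how prepare_query works internally; `inspect_search_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def inspect_search_url(url):
    """Return the query-string parameters of a search url as a dict of lists."""
    return parse_qs(urlparse(url).query)


params = inspect_search_url(
    "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
)
print(params)  # {'keyword': ['wragge']}
```

Each value is a list because a parameter can appear more than once in a url.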

Initialise the Harvester with your query parameters and api key.

    harvester = Harvester(query_params=my_query_params, key=my_api_key)

Start the harvest!

    harvester.harvest()

If the harvest fails, just run harvester.harvest() again.
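Re-running by hand can be wrapped in a small retry loop. The sketch below is generic: `harvest_with_retries` is a hypothetical helper, and it assumes that a failed harvest surfaces as a Python exception, which this README does not confirm.

```python
import time


def harvest_with_retries(run_harvest, max_attempts=3, wait_seconds=5.0):
    """Call run_harvest() until it succeeds or max_attempts is reached.

    run_harvest is any zero-argument callable, for example harvester.harvest.
    Returns the number of attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            run_harvest()
            return attempt
        except Exception as error:  # broad on purpose: failure types are an unknown here
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            print(f"Attempt {attempt} failed ({error}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
```

With a harvester initialised as above, the call would be `harvest_with_retries(harvester.harvest)`.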

See the core module documentation for more options and examples.

Use as a command-line tool

There are three basic commands:

start – start a new harvest
restart – restart a stalled harvest
report – view harvest details

Start a harvest

To start a new harvest you can just do:

    troveharvester start "[Trove query]" [Trove API key]

The Trove query can be either a URL copied and pasted from a search in the Trove web interface or a Trove API query URL constructed using something like the Trove API Console. Enclose the URL in double quotes.
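The quoting matters because search urls usually contain characters such as `?` and `&` that a shell would otherwise interpret. When the command is assembled from a script, the standard library can add the quoting. This is a sketch only: `build_start_command` is a hypothetical helper and the `l-state=Victoria` filter in the sample url is illustrative. Note that `shlex` quotes with single quotes rather than double quotes, which protects the url from the shell in the same way.

```python
import shlex


def build_start_command(query_url, api_key):
    """Return a shell-safe `troveharvester start` command line."""
    return shlex.join(["troveharvester", "start", query_url, api_key])


print(
    build_start_command(
        "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge&l-state=Victoria",
        "mYSecREtkEy",
    )
)
```

The printed line can be pasted into a terminal as is.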

See the CLI module documentation for more details.

Created by Tim Sherratt for the GLAM Workbench. Support this project by becoming a GitHub sponsor.

Owner

Name: Tim Sherratt
Login: wragge
Kind: user

Website: https://timsherratt.org
Repositories: 209
Profile: https://github.com/wragge

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Sherratt"
  given-names: "Tim"
  orcid: "https://orcid.org/0000-0001-7956-4498"
title: "trove-newspaper-harvester"
version: 0.6.4
doi: 10.5281/zenodo.7103174
date-released: 2022-10-11
url: "https://github.com/wragge/trove-newspaper-harvester"

GitHub Events

Total

Issues event: 1

Last Year

Issues event: 1

Committers

Last synced: 12 months ago

All Time

Total Commits: 21
Total Committers: 1
Avg Commits per committer: 21.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Tim Sherratt	t**m@d**u	21

Committer Domains (Top 20 + Academic)

discontents.com.au: 1

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 9
Total pull requests: 0
Average time to close issues: 2 months
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 0.22
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

wragge (8)
5p4r74cu5 (1)

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 23 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 10
Total maintainers: 1

pypi.org: trove-newspaper-harvester

Tool for bulk harvests of digitised newspaper articles from Trove

Homepage: https://github.com/wragge/trove-newspaper-harvester
Documentation: https://trove-newspaper-harvester.readthedocs.io/
License: MIT License
Latest release: 0.7.2
published over 2 years ago

Versions: 10
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 23 Last month

Rankings

Dependent packages count: 10.0%

Dependent repos count: 21.7%

Average: 24.6%

Stargazers count: 27.8%

Forks count: 29.8%

Downloads: 33.9%

Maintainers (1)

wragge

Last synced: 10 months ago

Dependencies

.github/workflows/deploy.yaml actions

fastai/workflows/quarto-ghp master composite

setup.py pypi
