finna-crawler

Crawler for downloading records and their metadata from the OAI-PMH API of the Finnish cultural heritage aggregator Finna.

https://github.com/hsci-r/finna-crawler

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary
Last synced: 8 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: hsci-r
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 65.4 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 2
  • Releases: 1
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

Finna-crawler

Crawler for downloading records and their metadata from the OAI-PMH API of the Finnish cultural heritage aggregator Finna.

Installation

Install the script (and Python module) with `pip install finna-crawler`. The script should then be usable from the command line, and its functionality importable from Python. Alternatively, if you have pipx and only want the command-line script, use `pipx install finna-crawler` instead.

Usage

```
Usage: finna-crawler [OPTIONS]

  Download metadata and records from Finna from the desired metadata prefix

Options:
  -p, --metadata-prefix TEXT      metadata prefix to query
  -s, --set TEXT                  set to query
  -sf, --status-file TEXT         status file for recovering an aborted crawl
                                  [required]
  -sx, --strip-xml / -nsx, --no-strip-xml
                                  whether to strip XML namespaces from XML
                                  output (default is to strip)
  -fr, --full-record / -nfr, --no-full-record
                                  whether to output the record in full or only
                                  the main content of it without the OAI/PMH
                                  metadata (default is to output only the main
                                  content)
  -mo, --metadata-output TEXT     output TSV (gz/bz2/xz/zst) file in which to
                                  write metadata
  -ro, --record-output TEXT       output (gz/bz2/xz/zst) file in which to
                                  write records
  --help                          Show this message and exit.
```
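As an illustration of how these options combine, here is a minimal sketch that assembles a `finna-crawler` command line from Python and can hand it to the standard library's `subprocess` module. Only the flag names come from the help text; the helper names `build_crawl_command` and `run_crawl`, the file names, and the `oai_dc` prefix are assumptions chosen for the example.

```python
import shlex
import shutil
import subprocess


def build_crawl_command(status_file, metadata_prefix=None, set_spec=None,
                        metadata_output=None, record_output=None):
    """Assemble a finna-crawler argument list (hypothetical helper).

    Only the flag names are taken from the tool's help text.
    """
    # --status-file is the only option the help text marks as required.
    cmd = ["finna-crawler", "--status-file", status_file]
    optional = [
        ("--metadata-prefix", metadata_prefix),
        ("--set", set_spec),
        ("--metadata-output", metadata_output),
        ("--record-output", record_output),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, value]
    return cmd


def run_crawl(cmd):
    """Run the crawl, failing early if the script is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # File names and the prefix below are placeholders, not project defaults.
    command = build_crawl_command(
        "crawl.status",
        metadata_prefix="oai_dc",
        metadata_output="metadata.tsv.gz",
        record_output="records.gz",
    )
    print(shlex.join(command))
```

Because the status file is described as a way to recover an aborted crawl, re-running the same command with the same status file is presumably how a crawl is resumed; the exact behaviour should be confirmed by testing.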

For information on what the different available metadata sets and versions mean and contain, please consult the Finna OAI-PMH API documentation.
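For orientation, any OAI-PMH repository can be asked for its sets and metadata formats through the protocol's standard `ListSets` and `ListMetadataFormats` verbs. The sketch below only builds such request URLs with the standard library and sends nothing. The base URL `https://api.finna.fi/OAI/Server` is an assumption to verify against the Finna documentation, and the function name `oai_request_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Finna OAI-PMH API documentation.
FINNA_OAI_BASE_URL = "https://api.finna.fi/OAI/Server"


def oai_request_url(verb, base_url=FINNA_OAI_BASE_URL, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    # Skip arguments left as None so optional ones (e.g. set) can be omitted.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Discover what can be passed to --set and --metadata-prefix.
    print(oai_request_url("ListSets"))
    print(oai_request_url("ListMetadataFormats"))
    # A harvesting request for one metadata prefix.
    print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

Opening the first two URLs in a browser should return XML listing the values that can then be passed to `--set` and `--metadata-prefix`.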

Owner

  • Name: Human Sciences – Computing Interaction Research Group
  • Login: hsci-r
  • Kind: organization
  • Location: University of Helsinki

Citation (CITATION.cff)

cff-version: 1.2.0
title: Finna-crawler
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
license: MIT
doi: 10.5281/zenodo.7805498
authors:
  - given-names: Eetu
    family-names: Mäkelä
    email: eetu.makela@helsinki.fi
    affiliation: University of Helsinki
    orcid: 'https://orcid.org/0000-0002-8366-8414'
abstract: >+
  Crawler meant for downloading records and their metadata
  from OAI-PMH API of the Finnish cultural heritage
  aggregator Finna.
repository-code: 'https://github.com/hsci-r/finna-crawler/'

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 7
  • Total Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Eetu Mäkelä e****a@h****i 7

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (1)

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
poetry.lock pypi
  • certifi 2022.12.7
  • cffi 1.15.1
  • charset-normalizer 3.1.0
  • click 8.1.3
  • colorama 0.4.6
  • hereutil 0.1.3
  • idna 3.4
  • isal 1.1.0
  • lxml 4.9.2
  • pycparser 2.21
  • pyprojroot 0.2.0
  • requests 2.28.2
  • sickle 0.7.0
  • tqdm 4.65.0
  • urllib3 1.26.15
  • xopen 1.7.0
  • zstandard 0.20.0
pyproject.toml pypi
  • Sickle ^0.7.0
  • click ^8.1.3
  • hereutil ^0.1.1
  • python ^3.10
  • tqdm ^4.64.1
  • xopen ^1.7.0