finna-crawler
Crawler for downloading records and their metadata from the OAI-PMH API of the Finnish cultural heritage aggregator Finna.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: hsci-r
- License: mit
- Language: Python
- Default Branch: main
- Size: 65.4 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 2
- Releases: 1
Metadata Files
README.md
Finna-crawler
Crawler for downloading records and their metadata from the OAI-PMH API of the Finnish cultural heritage aggregator Finna.
Installation
Install the script (and Python module) with `pip install finna-crawler`. The script is then usable from the command line, and its functionality is importable from Python. If you have pipx and only want the command-line script, use `pipx install finna-crawler` instead.
Usage
```
Usage: finna-crawler [OPTIONS]

  Download metadata and records from Finna from the desired metadata prefix

Options:
  -p, --metadata-prefix TEXT      metadata prefix to query
  -s, --set TEXT                  set to query
  -sf, --status-file TEXT         status file for recovering an aborted crawl
                                  [required]
  -sx, --strip-xml / -nsx, --no-strip-xml
                                  whether to strip XML namespaces from XML
                                  output (default is to strip)
  -fr, --full-record / -nfr, --no-full-record
                                  whether to output the record in full or only
                                  the main content of it without the OAI/PMH
                                  metadata (default is to output only the main
                                  content)
  -mo, --metadata-output TEXT     output TSV (gz/bz2/xz/zst) file in which to
                                  write metadata
  -ro, --record-output TEXT       output (gz/bz2/xz/zst) file in which to write
                                  records
  --help                          Show this message and exit.
```
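As a rough illustration of how the options above combine, the following Python sketch assembles a command line from the documented flags. The helper name `build_finna_crawler_command`, the file names, and the choice of `oai_dc` as the metadata prefix are assumptions made for this example and are not part of finna-crawler itself.

```python
import shlex


def build_finna_crawler_command(status_file, metadata_prefix=None, set_name=None,
                                strip_xml=True, full_record=False,
                                metadata_output=None, record_output=None):
    """Assemble an argument list for the finna-crawler CLI (hypothetical helper)."""
    # --status-file is the only option marked [required] in the help text.
    cmd = ["finna-crawler", "--status-file", status_file]
    if metadata_prefix is not None:
        cmd += ["--metadata-prefix", metadata_prefix]
    if set_name is not None:
        cmd += ["--set", set_name]
    # The boolean options are on/off flag pairs; the defaults mirror the help text.
    cmd.append("--strip-xml" if strip_xml else "--no-strip-xml")
    cmd.append("--full-record" if full_record else "--no-full-record")
    if metadata_output is not None:
        cmd += ["--metadata-output", metadata_output]
    if record_output is not None:
        cmd += ["--record-output", record_output]
    return cmd


# Illustrative file names; pick whatever suits your setup.
command = build_finna_crawler_command(
    status_file="crawl.status",
    metadata_prefix="oai_dc",
    metadata_output="metadata.tsv.gz",
    record_output="records.xml.zst",
)
print(shlex.join(command))
```

With finna-crawler installed, the resulting list could be passed to `subprocess.run(command, check=True)`. Reusing the same status file on a later run is what the help text describes as recovering an aborted crawl.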
For information on what the different available metadata sets and versions mean and contain, please consult the Finna OAI-PMH API documentation.
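The OAI-PMH protocol itself also lets you ask an endpoint which metadata formats and sets it offers, through the `ListMetadataFormats` and `ListSets` verbs. The sketch below only builds the request URLs for those verbs and for a `ListRecords` query. The base URL is a placeholder, not the actual Finna endpoint (take that from the Finna documentation), and `oai_request_url` is a hypothetical helper rather than anything finna-crawler provides.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the real OAI-PMH base URL.
BASE_URL = "https://example.org/OAI/Server"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb, skipping unset parameters."""
    query = {"verb": verb}
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


# Discover the available metadata prefixes and sets.
formats_url = oai_request_url(BASE_URL, "ListMetadataFormats")
sets_url = oai_request_url(BASE_URL, "ListSets")

# Request records in a given format; "oai_dc" is the prefix that the OAI-PMH
# specification requires every repository to support.
records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set=None)

for url in (formats_url, sets_url, records_url):
    print(url)
```

The values reported by such requests are what you would pass to `--metadata-prefix` and `--set`.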
Owner
- Name: Human Sciences – Computing Interaction Research Group
- Login: hsci-r
- Kind: organization
- Location: University of Helsinki
- Website: http://heldig.fi/hsci/
- Twitter: hsci_research
- Repositories: 61
- Profile: https://github.com/hsci-r
Citation (CITATION.cff)
cff-version: 1.2.0
title: Finna-crawler
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
license: MIT
doi: 10.5281/zenodo.7805498
authors:
  - given-names: Eetu
    family-names: Mäkelä
    email: eetu.makela@helsinki.fi
    affiliation: University of Helsinki
    orcid: 'https://orcid.org/0000-0002-8366-8414'
abstract: >+
  Crawler meant for downloading records and their metadata
  from OAI-PMH API of the Finnish cultural heritage
  aggregator Finna.
repository-code: 'https://github.com/hsci-r/finna-crawler/'
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Eetu Mäkelä | e****a@h****i | 7 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- dependabot[bot] (1)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
- certifi 2022.12.7
- cffi 1.15.1
- charset-normalizer 3.1.0
- click 8.1.3
- colorama 0.4.6
- hereutil 0.1.3
- idna 3.4
- isal 1.1.0
- lxml 4.9.2
- pycparser 2.21
- pyprojroot 0.2.0
- requests 2.28.2
- sickle 0.7.0
- tqdm 4.65.0
- urllib3 1.26.15
- xopen 1.7.0
- zstandard 0.20.0
- Sickle ^0.7.0
- click ^8.1.3
- hereutil ^0.1.1
- python ^3.10
- tqdm ^4.64.1
- xopen ^1.7.0