ckanext-attribution
A CKAN extension that adds support for complex attribution.
Overview
This extension standardises author/contributor attribution for datasets, enabling enhanced metadata and greater linkage between datasets. It currently integrates with the ORCID and ROR APIs; contributors ('agents') can be added directly from these databases, or manually.
Contributors can be added and edited via actions or via a Vue app that can be inserted into
the package_metadata_fields.html template snippet.

Schema
The schema is (partially) based on
the RDA/TDWG recommendations. Three new models are
added: Agent (contributors), ContributionActivity, and Affiliation (plus small linking models
between these and Package records).
Agent
Defines one agent.
| Field | Type | Values | Notes |
|----------------------|--------|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| agent_type | string | 'person', 'org', 'other' | |
| family_name | string | | only used for 'person' records |
| given_names | string | | only used for 'person' records |
| given_names_first | bool | True, False | only used for 'person' records; if the given names should be displayed first according to the person's culture/language (default True) |
| name | string | | used for non-'person' records |
| location | string | | used for non-person records, optional; a location to display for the organisation to help differentiate between similar names (e.g. 'Natural History Museum (London)' and 'Natural History Museum (Dublin)') |
| external_id | string | | an identifier from an external service like ORCID or ROR |
| external_id_scheme | string | 'orcid', 'ror', other | the scheme for the external_id; currently only 'orcid' and 'ror' are fully supported, though basic support for others can be implemented by adding to the attribution_controlled_lists action |
| user_id | string | User.id foreign key | link to a user account on the CKAN instance |
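The name fields above combine differently depending on `agent_type`. The following is a minimal sketch of how a display name might be assembled, assuming an agent record is available as a plain dict with the fields from the table. `display_name` is a hypothetical helper written for illustration and is not part of the extension's API.

```python
def display_name(agent: dict) -> str:
    """Build a human-readable name from an agent dict (hypothetical helper)."""
    if agent.get('agent_type') == 'person':
        given = agent.get('given_names', '')
        family = agent.get('family_name', '')
        # given_names_first defaults to True, as in the schema table
        if agent.get('given_names_first', True):
            parts = [given, family]
        else:
            parts = [family, given]
        return ' '.join(p for p in parts if p)
    # non-person records use 'name', optionally disambiguated by 'location'
    name = agent.get('name', '')
    location = agent.get('location')
    return f'{name} ({location})' if location else name


person = {'agent_type': 'person', 'family_name': 'Lovelace', 'given_names': 'Ada'}
org = {'agent_type': 'org', 'name': 'Natural History Museum', 'location': 'London'}
print(display_name(person))  # Ada Lovelace
print(display_name(org))     # Natural History Museum (London)
```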
ContributionActivity
Defines one activity performed by one agent on one specific dataset.
| Field | Type | Values | Notes |
|------------|----------|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| activity | string | [controlled vocabulary] | the activity/role the agent is associated with, e.g. 'Editor', 'Methodology'; roles are defined in the attribution_controlled_lists action, which currently lists the Datacite and CRediT role taxonomies (but can be expanded) |
| scheme | string | [controlled vocabulary] | name of the defined scheme from attribution_controlled_lists |
| level | string | 'Lead', 'Equal', 'Supporting' | optional degree of contribution (from CRediT) |
| time | datetime | | optional date/time of the activity |
| order | integer | | order of the agent within all who are associated with the same activity, e.g. 1st Editor, 3rd DataCollector (optional) |
A specialised ContributionActivity entry with a '[citation]' activity is used to define the order
in which contributors should be cited (and/or if they should be cited at all).
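As an illustration of how the '[citation]' entries could drive citation order, here is a small sketch. It assumes each activity is a plain dict carrying an `agent_id` key alongside the fields in the table; that key is an assumption standing in for the linking model, and `citation_order` is a hypothetical helper.

```python
def citation_order(activities: list) -> list:
    """Return agent IDs in citation order (hypothetical helper)."""
    # agents without a '[citation]' entry are treated as not cited
    cited = [a for a in activities if a['activity'] == '[citation]']
    cited.sort(key=lambda a: a['order'])
    return [a['agent_id'] for a in cited]


activities = [
    {'agent_id': 'agent-b', 'activity': '[citation]', 'order': 2},
    {'agent_id': 'agent-a', 'activity': 'Editor', 'order': 1},
    {'agent_id': 'agent-c', 'activity': '[citation]', 'order': 1},
]
print(citation_order(activities))  # ['agent-c', 'agent-b']
```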
Affiliation
Defines a relationship between two agents, either as a 'universal' (persistent) affiliation or for a single package (e.g. a project affiliation).
| Field | Type | Values | Notes |
|--------------------|--------|--------------------------|------------------------------------------------------------------------------------|
| agent_a_id | string | Agent.id foreign key | one of the two agents (a/b order does not matter) |
| agent_b_id | string | Agent.id foreign key | one of the two agents (a/b order does not matter) |
| affiliation_type | string | | very short description (1 or 2 words) of affiliation, e.g. 'employment' (optional) |
| description | string | | longer description of affiliation (optional) |
| start_date | date | | date at which the relationship began, e.g. employment start date (optional) |
| end_date | date | | date at which the relationship ended (optional) |
| package_id | string | Package.id foreign key | links affiliation to a specific package/dataset (optional) |
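Because the a/b order does not matter, code that reads affiliations has to check both columns to find the "other" agent. The sketch below shows one way to do that on plain dicts, keeping universal affiliations (no `package_id`) plus those for a requested package. `affiliations_for` is a hypothetical helper, not the extension's own implementation.

```python
def affiliations_for(agent_id, affiliations, package_id=None):
    """Return (affiliation, other_agent_id) pairs for one agent (hypothetical helper)."""
    results = []
    for aff in affiliations:
        if agent_id not in (aff['agent_a_id'], aff['agent_b_id']):
            continue
        # keep 'universal' affiliations plus those tied to the requested package
        if aff.get('package_id') not in (None, package_id):
            continue
        if aff['agent_a_id'] == agent_id:
            other = aff['agent_b_id']
        else:
            other = aff['agent_a_id']
        results.append((aff, other))
    return results


affiliations = [
    {'agent_a_id': 'p1', 'agent_b_id': 'org1', 'affiliation_type': 'employment'},
    {'agent_a_id': 'org2', 'agent_b_id': 'p1', 'package_id': 'pkg1'},
    {'agent_a_id': 'p2', 'agent_b_id': 'org1'},
]
others = [other for _, other in affiliations_for('p1', affiliations, package_id='pkg1')]
print(others)  # ['org1', 'org2']
```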
Installation
Path variables used below:
- $INSTALL_FOLDER (i.e. where CKAN is installed), e.g. /usr/lib/ckan/default
- $CONFIG_FILE, e.g. /etc/ckan/default/development.ini
Installing from PyPI
```shell
pip install ckanext-attribution

# to use the CLI as well:
pip install ckanext-attribution[cli]
```
Installing from source
Clone the repository into the src folder:
```shell
cd $INSTALL_FOLDER/src
git clone https://github.com/NaturalHistoryMuseum/ckanext-attribution.git
```
Activate the virtual env:
```shell
. $INSTALL_FOLDER/bin/activate
```
Install via pip:
```shell
pip install $INSTALL_FOLDER/src/ckanext-attribution

# to use the CLI as well:
pip install $INSTALL_FOLDER/src/ckanext-attribution[cli]
```
Installing in editable mode
Installing from a pyproject.toml in editable mode (i.e. pip install -e) requires setuptools>=64; however, CKAN 2.9 requires setuptools==44.1.0. See our CKAN fork for a version of v2.9 that uses an updated setuptools if this functionality is something you need.
Post-install setup
Add 'attribution' to the list of plugins in your $CONFIG_FILE:
```ini
ckan.plugins = ... attribution
```
Install lessc globally:
```shell
npm install -g "less@~4.1"
```
Add this block to package_metadata_fields.html to show the Vue app:
```jinja2
{% block package_custom_fields_agent %}
  {{ super() }}
{% endblock %}
```
Change the author field in your SOLR schema.xml to set up faceting:
```xml
<schema>
  <fields>
    <...>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <...>
  </fields>
  <...>
  <copyField source="author" dest="text"/>
</schema>
```
After making the changes, restart SOLR and reindex (ckan -c $CONFIG_FILE search-index rebuild).
You will also have to enable the ckanext.attribution.enable_faceting config option (see Configuration below) to see this in the UI.
Configuration
These are the options that can be specified in your .ini config file. NB:
setting ckanext.attribution.debug to True means that the API
accesses sandbox.orcid.org instead of orcid.org.
Although both are run by the ORCID organisation, these are different websites and you will need a
separate account/set of credentials for each. Note also that the sandbox does not give you access
to the full set of authors.
API credentials [REQUIRED]
| Name | Description | Options |
|------------------------------------|------------------------------|---------|
| ckanext.attribution.orcid_key | Your ORCID API client ID/key | |
| ckanext.attribution.orcid_secret | Your ORCID API client secret | |
Optional
| Name | Description | Options | Default |
|---------------------------------------|-----------------------------------------------------------------------|------------|---------|
| ckanext.attribution.debug | If true, use sandbox.orcid.org (for testing) | True/False | True |
| ckanext.attribution.enable_faceting | Enable filtering by contributor name (requires change to SOLR schema) | True/False | False |
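To make the effect of the debug option concrete, here is a minimal sketch of how the ORCID host could be chosen from a config mapping. It is an illustration only: `orcid_host` is a hypothetical helper, and the set of strings treated as true is an assumption (values read from an .ini file arrive as strings; inside CKAN, `toolkit.asbool` is the usual way to convert them).

```python
def orcid_host(config: dict) -> str:
    """Pick the ORCID host from the debug option (hypothetical helper)."""
    # the documented default for ckanext.attribution.debug is True
    raw = config.get('ckanext.attribution.debug', True)
    # assumption: these strings count as true; .ini values are plain strings
    debug = str(raw).strip().lower() in ('true', '1', 'yes', 'on')
    return 'sandbox.orcid.org' if debug else 'orcid.org'


print(orcid_host({}))                                      # sandbox.orcid.org
print(orcid_host({'ckanext.attribution.debug': 'False'}))  # orcid.org
```

Since the default is True, a production instance needs the option set to False explicitly, together with credentials registered on orcid.org rather than the sandbox.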
Usage
Actions
This extension adds numerous new actions. Most are CRUD actions for managing the models, with inline documentation and predictable interactions, so only the more "unusual" actions are described here.
agent_list
Search for agents by name or external ID, or just list all agents.
```python
from ckan.plugins import toolkit

data_dict = {
    'q': 'QUERY',  # optional; searches in name, family_name, given_names, and external_id
}

toolkit.get_action('agent_list')({}, data_dict)
```
package_contributions_show
Show all contribution records for a package, grouped by agent. Optionally provide a limit and offset for pagination.
```python
from ckan.plugins import toolkit

data_dict = {
    'id': 'PACKAGE_ID',
    'limit': 'PAGE_SIZE',  # optional
    'offset': 'OFFSET',  # optional
}

toolkit.get_action('package_contributions_show')({}, data_dict)
```
Returns a dict:
```python
{
    'contributions': [
        {
            'agent': {
                # Agent.as_dict()
            },
            'activities': [
                # list of Activity.as_dict()
            ],
            'affiliations': [
                {
                    'affiliation': {
                        # Affiliation.as_dict()
                    },
                    'other_agent': {
                        # Agent.as_dict()
                    }
                },
                # ...
            ]
        },
        # ...
    ],
    'total': total,
    'offset': offset,
    'page_size': limit or total
}
```
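The `total`, `offset`, and `page_size` keys make it possible to walk through a large package page by page. The sketch below shows one way to do that. The `show` argument is any callable taking `(context, data_dict)`, such as the result of `toolkit.get_action('package_contributions_show')`; `iter_contributions` and `fake_show` are hypothetical names, and the fake stands in for the real action so the example runs without CKAN. It assumes `total` counts the grouped contribution records.

```python
def iter_contributions(show, package_id, page_size=50):
    """Yield every contribution record, fetching one page at a time."""
    offset = 0
    while True:
        page = show({}, {'id': package_id, 'limit': page_size, 'offset': offset})
        yield from page['contributions']
        offset += page_size
        # stop once the last page has been read (or nothing came back)
        if offset >= page['total'] or not page['contributions']:
            break


def fake_show(context, data_dict):
    # stand-in for the real action, returning the documented shape
    records = [{'agent': {'id': n}} for n in range(5)]
    start, size = data_dict['offset'], data_dict['limit']
    return {
        'contributions': records[start:start + size],
        'total': len(records),
        'offset': start,
        'page_size': size,
    }


all_records = list(iter_contributions(fake_show, 'PACKAGE_ID', page_size=2))
print(len(all_records))  # 5
```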
agent_affiliations
Show all affiliations for a given agent, optionally limited to a specific dataset/package (plus 'global' affiliations).
```python
from ckan.plugins import toolkit

data_dict = {
    'agent_id': 'AGENT_ID',
    'package_id': 'PACKAGE_ID',  # optional
}

toolkit.get_action('agent_affiliations')({}, data_dict)
```
Returns a list of records formatted as such:
```python
{
    'affiliation': {
        # Affiliation.as_dict()
    },
    'other_agent': {
        # Agent.as_dict()
    }
}
```
attribution_controlled_lists
Returns collections of defined values (which can be modified by using @toolkit.chained_action).
```python
from ckan.plugins import toolkit

data_dict = {
    'lists': ['NAME1', 'NAME2'],  # optional; only return these lists
}

toolkit.get_action('attribution_controlled_lists')({}, data_dict)
```
There are four collections:
- agent_types describes valid types for agents and adds additional detail;
- contribution_activity_types contains role/activity taxonomies (i.e. Datacite and CRediT) and lists the available activity values;
- contribution_activity_levels is a list of contribution levels (i.e. 'lead', 'equal', and 'supporting', from CRediT);
- agent_external_id_schemes describes valid schemes for external IDs (currently, ORCID and ROR).
These collections are useful for validation and frontend connectivity/standardisation. They are contained within an action to (a) enable frontend access via AJAX requests, and (b) allow users to override values as needed.
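As a rough sketch of the override mechanism, the function below has the shape of a chained action: it receives the original action first, calls it, then adds an extra external ID scheme. Several things here are assumptions: that the action returns a dict keyed by list name, that `agent_external_id_schemes` is itself a dict keyed by scheme name, and the `'isni'` entry and its `label` key are invented for illustration. In a real plugin the function would be named `attribution_controlled_lists`, decorated with `@toolkit.chained_action`, and registered via `IActions`; a fake original action is used here so the example runs without CKAN.

```python
def with_extra_scheme(next_action, context, data_dict):
    """Call the original action, then add a hypothetical 'isni' scheme."""
    lists = next_action(context, data_dict)
    schemes = lists.get('agent_external_id_schemes')
    if schemes is not None:  # may be absent if filtered out via 'lists'
        schemes = dict(schemes)  # copy rather than mutate the original
        schemes['isni'] = {'label': 'ISNI'}  # hypothetical entry and structure
        lists = {**lists, 'agent_external_id_schemes': schemes}
    return lists


def fake_original(context, data_dict):
    # stand-in for the extension's action; the return structure is an assumption
    return {'agent_external_id_schemes': {'orcid': {}, 'ror': {}}}


result = with_extra_scheme(fake_original, {}, {})
print(sorted(result['agent_external_id_schemes']))  # ['isni', 'orcid', 'ror']
```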
agent_external_search
Search external sources (ORCID and ROR) for agent data. Ignores records that already exist in the database.
```python
from ckan.plugins import toolkit

data_dict = {
    'q': 'QUERY_STRING',
    'sources': ['SOURCE1', 'SOURCE2'],  # optional; only search these sources
}

toolkit.get_action('agent_external_search')({}, data_dict)
```
Results are returned formatted as such:
```python
{
    'SCHEME_NAME': {
        'records': [
            # list of agent dicts
        ],
        'remaining': 10000  # number of other records found
    }
}
```
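Because results are grouped by scheme, a caller usually needs to loop over the top-level keys. The following sketch summarises a result dict of the documented shape; `summarise_external_results` is a hypothetical helper and the sample data is made up.

```python
def summarise_external_results(results: dict) -> dict:
    """Map each scheme to (records returned, records remaining)."""
    return {
        scheme: (len(data.get('records', [])), data.get('remaining', 0))
        for scheme, data in results.items()
    }


results = {
    'orcid': {'records': [{'family_name': 'Example'}], 'remaining': 12},
    'ror': {'records': [], 'remaining': 0},
}
summary = summarise_external_results(results)
print(summary)  # {'orcid': (1, 12), 'ror': (0, 0)}
```

A large `remaining` value suggests the query is too broad to pick a match from the first batch of records.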
agent_external_read
Read data from an external source like ORCID or ROR, either from an existing record or a new external ID.
```python
from ckan.plugins import toolkit

# EITHER
data_dict_existing = {
    'agent_id': 'AGENT_ID',
    'diff': False,  # optional; only show values that differ from the record's current values (default False)
}
# OR
data_dict_new = {
    'external_id': 'EXTERNAL_ID',
    'external_id_scheme': 'orcid',  # or 'ror', etc.
}

toolkit.get_action('agent_external_read')({}, data_dict_existing)  # or data_dict_new
```
Commands
NB: you will have to install the optional [cli] packages to use several of these commands.
initdb
```shell
ckan -c $CONFIG_FILE attribution initdb
```
Initialise database tables.
sync
```shell
ckan -c $CONFIG_FILE attribution sync $OPTIONAL_ID $ANOTHER_OPTIONAL_ID
```
Retrieve up-to-date information from external APIs for contributors with an external ID set.
refresh-packages
```shell
ckan -c $CONFIG_FILE attribution refresh-packages $OPTIONAL_ID $ANOTHER_OPTIONAL_ID
```
Update the author string for all (or the specified) packages.
agent-external-search
```shell
ckan -c $CONFIG_FILE attribution agent-external-search --limit 10 $OPTIONAL_ID $ANOTHER_OPTIONAL_ID
```
Search external APIs for contributors without an external ID set. Run refresh-packages and rebuild the search index after this command.
merge-agents
```shell
ckan -c $CONFIG_FILE attribution merge-agents --q $SEARCH_QUERY --match-threshold 75
```
Find agents with similar names (optionally matching the search query) and merge them. Run refresh-packages and rebuild the search index after this command.
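To give a feel for what a match threshold does, here is a small sketch of name matching on a 0-100 scale. It is not the extension's algorithm: it swaps in `difflib.SequenceMatcher` from the Python standard library, and it assumes the threshold is a percentage-style similarity score. `similar_names` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names, threshold=75):
    """Return pairs of names whose similarity score (0-100) meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() is in [0, 1]; scale it to match a percentage-style threshold
        score = 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b))
    return pairs


names = ['Natural History Museum', 'Natural History Musuem', 'Kew Gardens']
print(similar_names(names))
```

Raising the threshold reduces false merges at the cost of missing more distant spelling variants.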
migratedb
```shell
ckan -c $CONFIG_FILE attribution migratedb --limit 10 --dry-run --no-search-api
```
Attempt to extract names of contributors from author fields and convert them to the new format.
- --limit will only convert a certain number of packages at a time.
- --dry-run prevents saving to the database.
- --no-search-api just extracts the names, without searching external APIs for contributors after.
It is recommended to run merge-agents, refresh-packages, and rebuild the search index after running this command.
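The recommended follow-up steps can be scripted. The sketch below only builds the command lines, in the order given above, using the commands documented in this section. `post_migration_commands` is a hypothetical helper, and merge-agents is shown without options, so add `--q` or `--match-threshold` as needed.

```python
def post_migration_commands(config_file: str) -> list:
    """Return the recommended follow-up commands as argument lists."""
    base = ['ckan', '-c', config_file]
    return [
        base + ['attribution', 'merge-agents'],
        base + ['attribution', 'refresh-packages'],
        base + ['search-index', 'rebuild'],
    ]


commands = post_migration_commands('/etc/ckan/default/development.ini')
for cmd in commands:
    print(' '.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the sequence.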
Testing
There is a Docker compose configuration available in this repository to make it easier to run tests. The ckan image uses the Dockerfile in the docker/ folder.
To run the tests against ckan 2.9.x on Python3:
Build the required images:
```shell
docker compose build
```
Then run the tests. The root of the repository is mounted into the ckan container as a volume by the Docker compose configuration, so you should only need to rebuild the ckan image if you change the extension's dependencies.
```shell
docker compose run ckan
```
Owner
- Name: Natural History Museum
- Login: NaturalHistoryMuseum
- Kind: organization
- Location: London
- Website: https://www.nhm.ac.uk
- Repositories: 171
- Profile: https://github.com/NaturalHistoryMuseum
Citation (CITATION.cff)
cff-version: 1.2.0
title: CKAN Attribution extension
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: Natural History Museum
    city: London
    country: GB
    alias: NHM
    email: data@nhm.ac.uk
repository-code: 'https://github.com/NaturalHistoryMuseum/ckanext-attribution'
abstract: A CKAN extension that adds support for complex attribution.
keywords:
  - ckan
  - ckanext
  - attribution
license: GPL-3.0-or-later
version: 1.2.14
GitHub Events
Total
- Create event: 6
- Issues event: 1
- Release event: 3
- Delete event: 6
- Issue comment event: 1
- Push event: 31
- Pull request event: 15
Last Year
- Create event: 6
- Issues event: 1
- Release event: 3
- Delete event: 6
- Issue comment event: 1
- Push event: 31
- Pull request event: 15
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 169
- Total Committers: 3
- Avg Commits per committer: 56.333
- Development Distribution Score (DDS): 0.154
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ginger Butcher | a****i@g****m | 143 |
| Josh Humphries | j****s@n****k | 18 |
| github-actions[bot] | g****]@u****m | 8 |
Packages
- Total packages: 1
-
Total downloads:
- pypi 385 last-month
- Total dependent packages: 1
- Total dependent repositories: 0
- Total versions: 17
- Total maintainers: 1
pypi.org: ckanext-attribution
A CKAN extension that adds support for complex attribution.
- Documentation: https://ckanext-attribution.readthedocs.io/
- License: GPL-3.0-or-later
-
Latest release: 1.2.14
published 9 months ago
Rankings
Maintainers (1)
Dependencies
- 681 dependencies
- @babel/core ^7.17.9 development
- @babel/plugin-proposal-class-properties ^7.16.7 development
- @babel/plugin-syntax-dynamic-import ^7.8.3 development
- @babel/plugin-transform-runtime ^7.17.0 development
- @babel/preset-env ^7.16.11 development
- @babel/runtime ^7.17.9 development
- babel-loader ^8.2.4 development
- babel-preset-minify ^0.5.1 development
- css-loader ^6.7.1 development
- css-minimizer-webpack-plugin ^3.4.1 development
- style-loader ^3.3.1 development
- terser-webpack-plugin ^5.3.1 development
- vue-loader ^15.9.8 development
- vue-style-loader ^4.1.3 development
- vue-template-compiler ^2.6.14 development
- webpack ^5.72.0 development
- webpack-cli ^4.9.2 development
- webpack-dev-server ^4.8.1 development
- webpack-merge ^5.8.0 development
- @citation-js/core ^0.5.4
- @citation-js/plugin-csl ^0.5.5
- @vuex-orm/core ^0.36.4
- axios ^0.26.0
- axios-cancel ^0.2.2
- d3-collection ^1.0.7
- lodash.clonedeep ^4.5.0
- lodash.debounce ^4.0.8
- nanoid ^3.3.2
- node-polyfill-webpack-plugin ^1.1.4
- vue legacy
- vuedraggable ^2.24.3
- vuex ^3.6.2
- beautifulsoup4 >=4.4.0
- orcid *
- requests *
- spacy *
- sqlalchemy *
- actions/checkout v3 composite
- commitizen-tools/commitizen-action master composite
- softprops/action-gh-release v1 composite
- actions/checkout v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- naturalhistorymuseum/ckantest latest build
- ckan/ckan-solr 2.9
- redis latest
- package-edit file:assets/scripts/apps/package-edit
- mkdocs *
- mkdocs-gen-files *
- mkdocs-include-markdown-plugin *
- mkdocs-material *
- mkdocs-section-index *
- mkdocstrings *
- ckantools >=0.3.0
- fuzzywuzzy [speedup]
- nameparser *
- orcid *
- prompt_toolkit *
- requests *
- spacy [transformers]
- sqlalchemy *
- unidecode *
- actions/checkout v3 composite
- connor-baer/action-sync-branch main composite