dataverse - v6.7.1

Dataverse 6.7.1

This is a bug fix release for Dataverse 6.7 that fixes a performance problem when loading the "create dataset" and "edit dataset" pages. For details see #11700.

Complete List of Changes

For the complete list of code changes in this release, see the 6.7.1 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Upgrade Instructions

You only need to follow the redeployment instructions below if you had deployed the originally released dataverse-6.7.war in the few days before it was removed from the release page. If you followed the 6.7 instructions in their current form, you should already be running dataverse-6.7.1 and do not need to do anything else.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.7.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

    export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

    $PAYARA/bin/asadmin list-applications

2. Undeploy the previous version (should match "list-applications" above)

    $PAYARA/bin/asadmin undeploy dataverse-6.7

3. Download and deploy this version

    wget https://github.com/IQSS/dataverse/releases/download/v6.7/dataverse-6.7.1.war
    $PAYARA/bin/asadmin deploy dataverse-6.7.1.war

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

    sudo service payara stop
    sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
    sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
    sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
    sudo service payara start

Published by ofahimIQSS 11 months ago

dataverse - v6.7

Dataverse 6.7

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.7 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.7 include:

Keeping S3 storage working after December 2025
Limiting files per dataset
Curation status label enhancements
Configurable search services
Linking drafts
Dataset metadata exports from drafts
API for switching Datasets to DOIs, for example
TK Labels
Model Context Protocol (MCP) server
A new AI Guide
OJS 3 is now a supported integration
Tagged Docker images
Rate limiting statistics API
Infrastructure: Payara upgraded to 6.2025.3
Security fixes

Features Added

Keep S3 Storage Working After December 2025

To support S3 storage, Dataverse uses the AWS SDK. We have upgraded to v2 of this SDK because v1 reaches End Of Life (EOL) in December 2025.

As part of the upgrade, the payload-signing setting for S3 stores (dataverse.files.<id>.payload-signing) has been removed because it is no longer necessary. With the updated SDK, a payload signature will automatically be sent when required (and not sent when not required).

Dataverse developers should note that LocalStack is used to test S3 and older versions appear to be incompatible. The development environment has been upgraded from LocalStack v2.3.2 to v4.2.0, which seems to work fine.

Limiting Files Per Dataset

It's now possible to set a limit on the number of files that can be uploaded to a dataset. Limits can be set globally, per collection, or per dataset.

See also the guides, #11275, and #11359.

Curation Status Label Enhancements

The External/Curation Status Label mechanism has been enhanced:

adding tracking of who creates the status label and when
keeping a history of past statuses
updating the CSV report to include the creation time and assigner of a status
updating the getCurationStatus API call to return a JSON object for the status with label, assigner, and create time
adding an includeHistory query param for these API calls to allow seeing prior statuses
adding a facet to allow filtering by curation status (for users able to set them)
adding the creation time to Solr as a pdate to support search by time period, e.g. current status set prior to a given date
standardizing the language around "curation status" vs "external status"
adding a "curation-status" class to displayed labels to allow styling
adding a dataverse.ui.show-curation-status-to-all feature flag that allows users who can see a draft but not publish it to also view the curation status

Due to changes in the Solr schema (the addition of fields "curationStatus" and "curationStatusCreateTime"), updating the Solr schema and reindexing is required as described below in upgrade instructions. Background reindexing should be OK. See also #9247 and #11268.

Configurable Search Services

Dataverse now has an experimental capability to dynamically add and configure new search engines. The current Dataverse user interface can be configured to use a specified search engine instead of the built-in Solr search. The Search API now supports an optional searchService query parameter that allows using any configured search engine. An additional /api/search/services endpoint allows discovery of the services installed.

In addition to two trivial example services designed for testing, Dataverse ships with two search engine classes that support calling an externally hosted search service (via HTTP GET or POST). These classes rely on the internal Solr search to perform access control and to format the final results, simplifying development of such an external engine.

Details about the new functionality are described in the guides. See also #11281.
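As a minimal sketch of how the searchService parameter and the /api/search/services endpoint might be called, the helpers below only build the request URLs. The SERVER_URL default, the function names, and the service name myExternalService are assumptions for illustration; the query is assumed to be URL-safe already.

```shell
# Assumed base URL for illustration; adjust to your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Print the URL that lists the installed search services.
search_services_url() {
  printf '%s/api/search/services\n' "$SERVER_URL"
}

# Print a Search API URL for query $1, optionally routed to search service $2.
# No URL encoding is performed here.
search_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/api/search?q=%s&searchService=%s\n' "$SERVER_URL" "$1" "$2"
  else
    printf '%s/api/search?q=%s\n' "$SERVER_URL" "$1"
  fi
}

# Example calls (commented out so that sourcing this file has no side effects):
# curl -s "$(search_services_url)"
# curl -s "$(search_url climate myExternalService)"   # hypothetical service name
```

Omitting the second argument leaves the request on whichever search service the installation uses by default.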

Linking Drafts

It is now possible to link draft datasets to other Dataverse collections. As usual, the datasets will only become publicly visible in the linked collection(s) after they have been published. To publish a linked dataset, your account must have the "Publish Dataset" permission for the Dataverse collection in which the dataset was originally created. Permissions in the linked Dataverse collections do not apply. See also #10134.

Dataset Metadata Can Be Exported From Draft Datasets (via API)

In previous versions of Dataverse, it was only possible to export metadata from published datasets. It is now possible to export metadata from draft datasets via API as long as you supply an API token that has access to the draft. As before, when exporting metadata from published datasets, only the latest published version is supported. Internal exporters have been updated to work with drafts but external exporters might need to be updated (Croissant definitely does). See "upgrade instructions" below for details. See the guides, #11305, and #11398.

API for Switching Datasets to DOIs, for Example

In some cases, you might want draft datasets to begin their life with zero-cost PIDs such as Permalinks and later decide to give certain datasets a DOI. To support use cases like this, a new API for persistent identifier reconciliation has been added.

Here's how it works. An unpublished dataset can be updated with a new pidProvider. If a persistent identifier was already registered when the dataset was registered, this is undone and the new provider (if changed in the meantime) is used. Note that this change does not affect the storage repository where the old identifier is still used. See the guides, #10501, and #10567.

TK Labels

New API calls to find projects at https://localcontextshub.org associated with a dataset have been added. This supports integration via an external vocabulary script that allows users to associate such a project with their dataset and display the associated Notices and Traditional Knowledge (TK) Labels.

Connecting to LocalContexts requires a LocalContexts API Key. Both the production and sandbox (test) LocalContexts servers are supported.

Model Context Protocol (MCP) Server for Dataverse

Model Context Protocol (MCP) is a standard for AI Agents to communicate with tools and services, announced in November 2024.

An MCP server for Dataverse has been deployed to https://mcp.dataverse.org, powered by the code at https://github.com/gdcc/mcp-dataverse written by Vyacheslav Tykhonov.

All are welcome to experiment with the MCP Server and give feedback in the thread on Google Group and Zulip. See also #11474.

AI Guide

Information about various Dataverse-related AI efforts has been documented in a new AI Guide. See also #11540 and #11541.

OJS 3 is Supported

OJS 3 (version 3.3 and higher) is now supported as an integration with Dataverse. See the guides and #11518 for details.

Tagged Docker Images

Container image management has been enhanced to provide better support for multiple Dataverse releases and improved maintenance workflows.

Versioned Image Tags: Application ("dataverse") and Config Baker images on Docker Hub now have versioned tags, supporting the latest three Dataverse software releases. This enables users to pin to specific versions (e.g. 6.7), providing better stability for production deployments. Previously, the "alpha" tag could be used, but it was always overwritten by the latest release. Now, you can choose the 6.7 tag, for example, to stay on that version. Please note that the "alpha" tag should no longer be used and will likely be deleted. The equivalent is the new "latest" tag.

Backport Support: Application and Config Baker image builds now support including code backports for past releases, enabling the delivery of security fixes and critical updates to older (supported) versions.

Enhanced Documentation: Container image documentation has been updated to reflect the new versioning scheme and maintenance processes.

Config Baker Base Image Change: The Config Baker image has been migrated from Alpine to Ubuntu as its base operating system, aligning with other container images in the project for consistency and better compatibility. Past releases have not been migrated; only this and future releases (6.7+) use Ubuntu.

Workflow Responsibility Split: GitHub Actions workflows for containers have been reorganized with a clear separation of concerns:

container_maintenance.yml handles all release-time and maintenance activities
Other workflows focus solely on preview images for development merges and pull requests

These improvements provide more robust container image lifecycle management, better security update delivery, and clearer operational procedures for both development and production environments. See also the Container Guide, #10618, and #11477.

Rate Limiting Statistics API

A new Rate Limiting Statistics API gives insight into the current state of rate limiting such as the number of users being limited and the number of available bucket tokens for a command.

See also the guides, #11413, and #11424.

Payara 6.2025.3

The recommended Payara version has been updated to Payara-6.2025.3. See the upgrade instructions below and #11357.

Rclone Support

Rclone ("rsync for cloud storage") is a command-line program to sync files and directories to and from different cloud storage providers. As of version 1.70 Rclone supports Dataverse. See the announcement, the guides, #11608, and #11609.

Unique Filenames for Zip Downloads

The Data Access APIs that generate multi-file zipped bundles will offer file name suggestions based on the persistent identifiers (for example, doi-10.70122-fk2-xxyyzz.zip), instead of the fixed filename dataverse_files.zip as in prior versions. This means you'll see unique names in your "downloads" folder. See the guides, #9620, and #11466.
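The example name above suggests how a persistent identifier maps to a file name. The sketch below reproduces that example under an assumed rule (replace ":" and "/" with "-" and lowercase the result); the function name is hypothetical and the server's actual rule may differ in edge cases.

```shell
# Derive a zip file name from a persistent identifier.
# Assumed rule, inferred from the example in the text: ':' and '/' become '-'
# and letters are lowercased.
pid_to_zip_name() {
  name="$(printf '%s' "$1" | sed 's|[:/]|-|g' | tr '[:upper:]' '[:lower:]')"
  printf '%s.zip\n' "$name"
}

# Example:
# pid_to_zip_name "doi:10.70122/FK2/XXYYZZ"
```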

New Metadata Field Type: String

The "string" type has been added as a new field type for metadata fields.

In contrast to "text" fields, "string" fields are stored and indexed exactly as provided, without any text analysis or transformations.

This field type is suitable for fields like IDs (e.g. ORCIDs) or enums, where exact matches are required when searching.
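To illustrate the difference, here is a toy model in shell. It is not how Solr analyzes text; it merely assumes that a "text" field is lowercased and split into alphanumeric tokens, while a "string" field is compared as a whole. Both function names are hypothetical.

```shell
# Toy model of an analyzed "text" field: lowercase the value, split it into
# alphanumeric tokens, and succeed if any token equals the lowercased query.
text_field_matches() {
  query="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | grep -qxF -- "$query"
}

# Toy model of a "string" field: succeed only if the whole value is identical.
string_field_matches() {
  [ "$1" = "$2" ]
}

# Example: a fragment of an ORCID matches the tokenized model
# but not the exact-match model.
# text_field_matches   "0000-0002-1825-0097" "0002" && echo "text: match"
# string_field_matches "0000-0002-1825-0097" "0002" || echo "string: no match"
```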

Tabular Tags Can Now Be Replaced

Previously the API POST /files/{id}/metadata/tabularTags could only add new tags to the tabular tags list. Now with the query parameter ?replace=true the list of tags will be replaced.

See also the guides, #11292, and #11359.
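A possible way to call the endpoint with ?replace=true is sketched below. The helper only builds the request body; the "tabularTags" JSON key, the /api prefix, and the X-Dataverse-key header follow the usual Native API conventions and should be checked against the guides. The function name and tag values are illustrative.

```shell
# Build a JSON body such as {"tabularTags":["Survey","Genomics"]} from the
# arguments. Tag names are assumed not to contain quotes or backslashes.
tabular_tags_payload() {
  json=''
  for tag in "$@"; do
    json="${json:+$json,}\"$tag\""
  done
  printf '{"tabularTags":[%s]}\n' "$json"
}

# Example call (commented out; needs a running server, API token, and file id):
# curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
#   -H "Content-type:application/json" \
#   -d "$(tabular_tags_payload Survey Genomics)" \
#   "$SERVER_URL/api/files/$ID/metadata/tabularTags?replace=true"
```

Without ?replace=true the same request adds the tags to the existing list, as in earlier versions.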

Make Data Count Improvements

Counter Processor, used to power Make Data Count metrics in Dataverse, is now maintained in the https://github.com/gdcc/counter-processor repository. Multiple improvements to efficiency and scalability have been made. The example counter_daily.sh and counter_weekly.sh scripts that automate using Counter Processor, available from the MDC section of the Dataverse Guides, have been updated to work with the latest Counter Processor release and also have minor improvements. See also #11489.

Improved Navigation for Guides

Navigation across the guides has been improved. You can now click in the upper left to go "home". The navbar has been simplified with fewer links. The bottom of every page now has "Next" and "Previous" links. A "Source" link at the bottom has also been added. See also #10942.

Video Subtitles (vtt Files)

Video subtitles (vtt files) are now supported and indexed using full text indexing, if configured.

All new files uploaded with a .vtt extension will be assigned the content type "text/vtt" and shown as "Web Video Text Tracks". See upgrade instructions below to convert existing files.

The upgrade instructions below also explain how to upgrade to v1.5 of the Dataverse Previewers, which includes an updated video previewer that supports subtitles. The new previewer version presents vtt files as subtitles for videos, and the naming convention is <video-basename>.<language-tag>.vtt. The previewer does not rely on the content type. However, a proper content type may prompt users to request access to the subtitles together with the video.
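The naming convention can be sketched as a small helper that derives the expected subtitle file name from a video file name. The function name is hypothetical, and the helper assumes the video's base name is everything before its last extension.

```shell
# Print the subtitle file name expected for video file $1 and language tag $2,
# following <video-basename>.<language-tag>.vtt.
subtitle_name() {
  base="${1%.*}"   # strip the last extension, e.g. ".mp4"
  printf '%s.%s.vtt\n' "$base" "$2"
}

# Example:
# subtitle_name lecture.mp4 en
```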

Dataset Types Can Set Available Licenses

Licenses (e.g. "MIT") can now be linked to dataset types (e.g. "software") using new superuser APIs. The create Dataset Type APIs have been extended to allow you to set metadata blocks and/or licenses on the creation of a Dataset Type. (You can change both later.)

If a license is not available for a given Dataset Type then the Create Dataset API will prevent that license from being applied to the dataset. Also, the UI will only show those licenses that are available to the Dataset Type.

For more information, see the guides (overview, new APIs), #10520, and #11385.

Loading Metadata Blocks in Docker

The tutorial on running Dataverse in Docker has been updated to include how to load a metadata block and then update Solr to know about the new fields. See also #11004 and #11204.

Solr Indexing Speed Improved

The performance of Solr indexing has been significantly improved, particularly for datasets with many files.

A new dataverse.solr.min-files-to-use-proxy microprofile setting can be used to further improve performance/lower memory requirements for datasets with many files (e.g. 500+) (defaults to Integer.MAX, disabling use of the new functionality). See also #11374.

Improved Efficiency for Per-Request Filters

This release improves the performance of Dataverse's per-request handling of CORS Headers and API calls.

It adds new JVM options/Microprofile settings (starting with dataverse.cors and dataverse.api) replacing the now deprecated database settings (starting with :BlockedApi and :AllowCors). (See "new settings" and "deprecated settings" below for a full list.)

Additional changes:

CORS headers can now be configured with a list of desired origins, methods, and allowed and exposed headers.
An X-Dataverse-unblock-key header has been added that can be used instead of the less secure unblock-key query parameter when the :BlockedApiPolicy is set to unblock-key.
Warnings have been added to the log if the Blocked API settings are misconfigured or if the key is weak (when the unblock-key policy is used).
The new dataverse.api.blocked.key can be configured using Payara password aliases or other secure storage options.

New OIDC Feature Flag

A new feature flag called API_BEARER_AUTH_USE_BUILTIN_USER_ON_ID_MATCH has been introduced, which allows the use of a built-in user account when an identity match is found during OIDC API bearer token authentication.

This feature enables automatic association of an incoming Identity Provider (IdP) identity with an existing built-in user account, bypassing the need for additional user registration steps.

See the guides, #11193, #11197, and #11314.

dataverse-metadata-crawler

The dataverse-metadata-crawler was added to the guides. See #11581.

Bugs Fixed

Reduced Chance of Losing Metadata Edits

Changes were made to the "edit dataset metadata" page to reduce the chance of losing metadata edits.

The remedy for the problem consists of two parts:

- Do not show the "Host Dataverse" field when there is nothing to choose. This mimics the behaviour for templates.
- If you accidentally started typing in the "Host Dataverse" field, undid the change with backspace, filled in the other metadata fields, and saved the draft, the page used to get blocked due to an exception. Reloading the page would erase all your input. The exception (caused by an invalid argument) is now avoided by returning the currently selected value.

Improved "Role Has Already Been Granted" Message

A simple "role has already been granted" message is now given, fixing a bug where "dataset" was incorrectly indicated instead of "collection". See also #11191 and #11362.

NcML Previewer Bug Fix

Dataverse Previewers v1.4 contained a bug in the NcML previewer that prevented it from working with signed URLs. See #11252 for screenshots.

This has been fixed in the "betatest" and v1.5 versions of the previewer. See also #11252 and #11311. Upgrading to v1.5 of all previewers is recommended in the upgrade instructions below.

Search API Bug Fix

The Search API now returns all type totals (Dataverses, Datasets, and Files) regardless of the list of types requested. Previously, types that were not requested were returned with a total count of 0. For example, &type=dataverse&type=dataset would result in "Files": 0 since type=file was not requested. Now all counts show the correct totals. See also #11280.

Other Bug Fixes

Deeply nested compound fields are not (yet) supported by Dataverse but the Search API now properly avoids returning duplicate values for them. See #11172.
An issue causing more than one edit of a versionNote to fail, when done without a page refresh, has been fixed. See #11394.
The deaccessionedReason, previously missing from the fileDifferenceSummary JSON object returned by API GET "$SERVER_URL/api/files/{ID}/versionDifferences", is now included. See #11438.
Memory usage has been reduced and potential memory leaks closed in the metadata exporters. See #11417.

API Updates

Update File Metadata API

A new API endpoint has been added to allow updating file metadata for one or more files in a dataset. See the Native API documentation for details on usage and #11271.

Extend Restrict API to Include New Attributes

The "restrict" API only allowed for a boolean to update the restricted attribute of a file. For backward compatibility, this is still supported, but now a richer JSON object can be passed instead. The JSON object allows for the required restrict flag as well as optional attributes: enableAccessRequest and termsOfAccess. If enableAccessRequest is false then the termsOfAccess text must also be included.

See the guides, #11299, and #11349.
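The rule for the richer JSON object can be sketched as a helper that builds the body and refuses the one combination described above as invalid. The attribute names come from the text; the function name, the PUT method, and the /api/files/$ID/restrict path in the commented call are assumptions to verify against the guides.

```shell
# Build the JSON body for the "restrict" API.
# $1 = restrict (true|false), $2 = enableAccessRequest (true|false, optional),
# $3 = termsOfAccess text (optional; assumed free of quotes and backslashes).
restrict_payload() {
  restrict="$1"; access_request="${2:-}"; terms="${3:-}"
  # termsOfAccess must be supplied when access requests are disabled.
  if [ "$access_request" = "false" ] && [ -z "$terms" ]; then
    echo "termsOfAccess is required when enableAccessRequest is false" >&2
    return 1
  fi
  json="\"restrict\":$restrict"
  if [ -n "$access_request" ]; then
    json="$json,\"enableAccessRequest\":$access_request"
  fi
  if [ -n "$terms" ]; then
    json="$json,\"termsOfAccess\":\"$terms\""
  fi
  printf '{%s}\n' "$json"
}

# Example call (commented out; needs a running server, API token, and file id):
# curl -H "X-Dataverse-key:$API_TOKEN" -X PUT \
#   -d "$(restrict_payload true false 'Contact the author for access')" \
#   "$SERVER_URL/api/files/$ID/restrict"
```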

Categories Can Now Be Replaced

Previously the API POST /files/{id}/metadata/categories could only add new categories to the categories list. Now with the query parameter ?replace=true the list of categories will be replaced.

See also the guides, #11401, and #11359.

Application Terms of Use Available via API

It's now possible to retrieve the Application Terms of Use (called General Terms of Use in the UI) via API. These are the terms users agree to when creating an account. See the guides, #11415 and #11422.

dvObject and type Fields Added to Featured Items

Dataverse Featured Items can now be linked to Dataverses, Datasets, or Datafiles.

Pre-existing featured items as well as new items without dvObjects will be defaulted to type=custom. See also #11414.

Edit Dataset Metadata: Removing Fields

The "edit dataset metadata" endpoint now allows removing fields (by sending empty values) as long as they are not required by the dataset. See also #11243.

Edit Dataset Metadata: Prevent Inconsistencies

A new optional query parameter, sourceInternalVersionNumber, has been added. It prevents inconsistencies by managing updates that other users may make while a dataset is being edited. See also #11243.

api/roles/userSelectable

A new endpoint (api/roles/userSelectable) has been implemented, which returns the appropriate roles that the calling user can use as filters when searching within their data. See #11434.

Security Updates

This release contains important security updates. If you are not receiving security notices, please sign up by following the steps in the guides.

Updates for Documentation Writers

Sphinx Upgraded

Sphinx has been upgraded to 7.4.0 and new dependencies have been added, including semver. Please re-run the pip install -r requirements.txt setup step to upgrade your environment. Otherwise you might see an error like ModuleNotFoundError: No module named 'semver'.

Updates for Developers

Development of Dataverse on Windows

Development of Dataverse on Windows has been confirmed to work as long as you use WSL rather than cmd.exe. See the updated quickstart, the rewritten page on Windows, #10606, and #11583.

Keycloak SPI for Built-In Users

A Keycloak SPI, builtin-users-spi, has been implemented that allows the use of Keycloak on instances with built-in accounts for OIDC authentication, enabling the use of the SPA on those instances.

Looking ahead, this authenticator SPI could also support mapping Shibboleth users coming in through Keycloak to existing Shib users without changing the provider in the Dataverse database. However, this would require changes to the storage provider to support more than just built-in users.

The SPI code is available in the Dataverse code repository (conf/keycloak/builtin-users-spi).

File Previews Available in Dev Environment, More Docs

In Dataverse 6.5 File Previewers were enabled in the "demo or eval" containerized (Dockerized) environment (#11025). These previewers are now available in the development environment as well and documentation has been added explaining how to configure them. See also #10506 and #11181.

XML Parsers

The configuration of XML parsers used in Dataverse has been centralized and unused functionality has been turned off to enhance security. See #11619.

End-Of-Life (EOL) Announcements

Whole Tale EOL

Unfortunately, the Whole Tale project is no longer active and has been removed from the list of integrations in the Admin Guide. See #11497.

New Settings

The following settings have been added:

dataverse.api.blocked.policy: Policy for blocking API endpoints
dataverse.api.blocked.endpoints: List of API endpoints to be blocked (comma-separated)
dataverse.api.blocked.key: Key for unblocking API endpoints
dataverse.bagit.sourceorg.name
dataverse.cors.origin: Allowed origins for CORS requests
dataverse.cors.methods: Allowed HTTP methods for CORS requests
dataverse.cors.headers.allow: Allowed headers for CORS requests
dataverse.cors.headers.expose: Headers to expose in CORS responses
dataverse.files.hide-schema-dot-org-download-urls: now configurable via MicroProfile Config, see #11482
dataverse.localcontexts.url
dataverse.localcontexts.api-key
dataverse.search.services.directory
dataverse.search.default-service
dataverse.solr.min-files-to-use-proxy
dataverse.ui.show-curation-status-to-all
:GetExternalSearchUrl
:GetExternalSearchName
:PostExternalSearchUrl
:PostExternalSearchName

Deprecated Settings

bagit.SourceOrganization entry in Bundle.properties
:AllowCors
:BlockedApiPolicy
:BlockedApiEndpoints
:BlockedApiKey

Removed Settings

dataverse.files.<id>.payload-signing: See #11360

Backward Incompatible Changes

Generally speaking, see the API Changelog for a list of backward-incompatible API changes.

showmydata removed from Search API

An undocumented Search API parameter called "showmydata" has been removed. It was never exercised by tests and is believed to be unused. API users should use the MyData API instead. See #11287 and #11375.

curationStatus API

/api/datasets/{id}/curationStatus API now includes a JSON object with curation label, createtime, and assigner rather than a string label and it supports a new boolean includeHistory parameter (default false) that returns a JSON array of statuses. See #11268.

listCurationStates API

/api/datasets/{id}/listCurationStates includes new "Status Set Time" and "Status Set By" columns listing the time the current status was applied and by whom. It also supports the boolean includeHistory parameter. See #11268.

XML serialization of empty elements

Due to updates in libraries used by Dataverse, XML serialization may have changed slightly with respect to whether self-closing tags are used for empty elements. This primarily affects XML-based metadata exports. The XML structure of the export itself has not changed, so this is only an incompatibility if you are not using an XML parser. See #11360.

Complete List of Changes

For the complete list of code changes in this release, see the 6.7 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.6.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

    export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

    $PAYARA/bin/asadmin list-applications

2. Undeploy the previous version (should match "list-applications" above)

    $PAYARA/bin/asadmin undeploy dataverse-6.6

3. Stop Payara

    sudo service payara stop

4. Upgrade to Payara-6.2025.3

The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.6.)

Move the current Payara directory out of the way:

    mv $PAYARA $PAYARA.6.2025.2

Download the new Payara version 6.2025.3 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.3/payara-6.2025.3.zip), and unzip it in its place:

    cd /usr/local
    unzip payara-6.2025.3.zip

Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1:

    mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
    mv payara6.6.2025.2/glassfish/domains/domain1 payara6/glassfish/domains/

5. Download and deploy this version

    wget https://github.com/IQSS/dataverse/releases/download/v6.7/dataverse-6.7.war
    $PAYARA/bin/asadmin deploy dataverse-6.7.war

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

    sudo service payara stop
    sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
    sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
    sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
    sudo service payara start

6. For installations with internationalization or text customizations:

Please remember to update translations via Dataverse language packs.

If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.7/src/main/java/propertyFiles.

7. Restart Payara

    sudo service payara stop
    sudo service payara start

8. If you have enabled the Croissant exporter, update it and run reExportAll to update dataset metadata exports

Since Dataverse 6.6 was released on 2025-03-18, two versions of the Croissant exporter have been released. You are encouraged to upgrade to the latest version, which is 0.1.5.

Under "installation" at the README at https://github.com/gdcc/exporter-croissant you'll find instructions about upgrading the Croissant exporter. In the same repo you can find a changelog if you are curious about what has changed.

Afterwards, we recommend reexporting all dataset metadata. (Reexporting just a single export format, like Croissant, is not supported.) Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.

    curl http://localhost:8080/api/admin/metadata/reExportAll

9. Archival bags

If you are using archival bags, be sure that the dataverse.bagit.sourceorg.name JVM option is set.

Archival Bags now use the JVM option dataverse.bagit.sourceorg.name in generating the bag-info.txt file's "Internal-Sender-Identifier" (in addition to its use for "Source-Organization") rather than pulling the value from a deprecated bagit.SourceOrganization entry in Bundle.properties ("Internal-Sender-Identifier" is generated by appending " Catalog" in both cases). Sites using archival bags would not see a change if these settings were already using the same value. See #10680 and #11416.

10. API Filters

Per-request filtering has been improved. Migrate to the new settings as explained below as the old settings have been deprecated.

The deprecated database settings will continue to work in this version. To use the new settings (which are more efficient), follow the steps below.

If :AllowCors is not set or is true:

    bin/asadmin create-jvm-options -Ddataverse.cors.origin=*

Optionally, set the origin to a list of hosts and/or set other CORS JvmSettings.

Your currently blocked API endpoints can be found at http://localhost:8080/api/admin/settings/:BlockedApiEndpoints

Copy them into the new setting with the following command. As with the deprecated setting, the endpoints should be comma-separated.

shell bin/asadmin create-jvm-options '-Ddataverse.api.blocked.endpoints=<current :BlockedApiEndpoints>'

If :BlockedApiPolicy is set and is not 'drop':

shell bin/asadmin create-jvm-options '-Ddataverse.api.blocked.policy=<current :BlockedApiPolicy>'

If :BlockedApiPolicy is 'unblock-key' and :BlockedApiKey is set:

shell echo "API_BLOCKED_KEY_ALIAS=<value of :BlockedApiKey>" > /tmp/dataverse.api.blocked.key.txt

shell sudo -u dataverse /usr/local/payara6/bin/asadmin create-password-alias --passwordfile /tmp/dataverse.api.blocked.key.txt

When you are prompted "Enter the value for the aliasname operand", enter api_blocked_key_alias

You should see "Command create-password-alias executed successfully."

shell bin/asadmin create-jvm-options '-Ddataverse.api.blocked.key=${ALIAS=api_blocked_key_alias}'

Restart Payara:

shell service payara restart

Check server.log to verify that your new settings are in effect.

Cleanup: delete deprecated settings:

shell curl -X DELETE http://localhost:8080/api/admin/settings/:AllowCors
shell curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiEndpoints
shell curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiPolicy
shell curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiKey
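The conditions in this step can be summarized in a small dry-run helper. This is only a sketch: the function name `print_migration_commands` is made up, it prints the asadmin commands rather than running them, it assumes an empty argument means the setting is unset, and it leaves out the 'unblock-key' password alias steps.

```shell
# Hypothetical helper: print (do not run) the asadmin commands implied by the
# deprecated settings. Arguments are the current values of :AllowCors,
# :BlockedApiEndpoints, and :BlockedApiPolicy; pass "" for an unset setting.
print_migration_commands() (
  allow_cors=$1 endpoints=$2 policy=$3

  # :AllowCors not set, or set to true: enable CORS for all origins.
  if [ -z "$allow_cors" ] || [ "$allow_cors" = "true" ]; then
    echo "bin/asadmin create-jvm-options '-Ddataverse.cors.origin=*'"
  fi

  # Assumption: only copy the blocked endpoints when the old setting has a value.
  if [ -n "$endpoints" ]; then
    echo "bin/asadmin create-jvm-options '-Ddataverse.api.blocked.endpoints=$endpoints'"
  fi

  # :BlockedApiPolicy set and not 'drop': carry the policy over.
  if [ -n "$policy" ] && [ "$policy" != "drop" ]; then
    echo "bin/asadmin create-jvm-options '-Ddataverse.api.blocked.policy=$policy'"
  fi
)

# Example values only; substitute what your installation reports.
print_migration_commands "" "admin,builtin-users" "localhost-only"
```

Review the printed commands before running them, then restart Payara as described above.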

11. Upgrade to Dataverse Previewers v1.5

Dataverse Previewers has been upgraded to v1.5. See the announcement for upgrade instructions.

12. Re-detect video subtitle (vtt) files

Existing files with extension ".vtt" will keep the content type application/octet-stream presented as "Unknown". The following query shows the number of files per extension with an "Unknown" content type:

sql SELECT substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(*) as count FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id WHERE f.contenttype = 'application/octet-stream' GROUP BY extension;

If vtt does not appear in the result, you are done. Otherwise, you may want to update the content type for existing files and reindex those datasets.

First figure out which datasets would need reindexing:

sql select distinct o.protocol, o.authority, o.identifier, v.versionnumber, v.minorversionnumber, v.versionstate from datafile f left join filemetadata m on f.id = m.datafile_id left join datasetversion v on v.id = m.datasetversion_id left join dvobject o on o.id = v.dataset_id WHERE contenttype = 'application/octet-stream' AND 'vtt' = substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2)) ;

Then update the content type for the files:

sql UPDATE datafile SET contenttype = 'text/vtt' WHERE id IN ( SELECT datafile_id FROM filemetadata m WHERE contenttype = 'application/octet-stream' AND 'vtt' = substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2)) );

The vtt files will be reindexed in a step below.
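The extension expression used in the queries in this step, `substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2))`, can be hard to read. As a rough illustration (not part of the upgrade procedure), the shell function below mirrors what it computes: everything after the last dot in the file label, or an empty string when the label contains no dot. The name `label_extension` and the sample labels are made up for this sketch.

```shell
# Hypothetical helper: mirror the SQL extension expression in plain shell.
label_extension() {
  case "$1" in
    *.*) printf '%s\n' "${1##*.}" ;;  # text after the last dot
    *)   printf '\n' ;;               # no dot: the SQL yields an empty string
  esac
}

label_extension "interview.en.vtt"   # prints: vtt
label_extension "README"             # prints an empty line
```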

13. Update Solr schema and reindex

Due to changes in the Solr schema (the addition of fields "curationStatus" and "curationStatusCreateTime"), updating the Solr schema and reindexing is required.

Download the updated schema.xml file:

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.7/conf/solr/schema.xml
shell cp schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf

13a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.7/conf/solr/update-fields.sh
shell chmod +x update-fields.sh
shell curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml

Note that Docker-based installations use a different directory: solr/data/data/collection1/conf/schema.xml.

Start Solr instance (usually service solr start depending on Solr/OS).

14. Reindex Solr

shell curl http://localhost:8080/api/admin/index

Published by ofahimIQSS 11 months ago

dataverse - v6.6

Dataverse 6.6

Please note: Dataverse 6.6 was released in March 2025 but GitHub shows a newer date because we had to update the tag and the master branch in git. (For the gory details, please see the doc about it.) The war file and dvinstall.zip are original, as released in March 2025.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.6 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.6 include:

metadata fields can be "display on create" per collection
ORCIDs linked to accounts
version notes
harvesting from DataCite
citations using Citation Style Language (CSL)
license metadata enhancements
metadata fields now support range searches (dates, integers, etc.)
more accurate search highlighting
collections can be moved by using the superuser dashboard
new 3D Objects metadata block
new Archival metadata block (experimental)
optionally prevent publishing of datasets without files
Signposting output now contains links to all dataset metadata export formats
infrastructure updates (Payara and Solr)

In a recent community call, we talked about many of these highlights if you'd like to watch the video (around 22:30).

Features Added

Metadata Fields Can Be "Display on Create" Per Collection

Collection administrators can now configure which metadata fields appear during dataset creation through the displayOnCreate property, even when fields are not required. This provides greater control over metadata visibility and can help improve metadata completeness.

Currently this feature can only be configured via API, but a UI implementation is planned in #11221. See #10476, #11224, and #11312.

ORCIDs Linked to Accounts

Dataverse now includes improved integration with ORCID, supported through a grant to GDCC from the ORCID Global Participation Fund.

Specifically, Dataverse users can now link their Dataverse account with their ORCID profile. Previously, this was only available to users who logged in with ORCID. Once an account is linked, Dataverse automatically prepopulates the author metadata with the user's ORCID when they create a dataset.

This functionality leverages Dataverse's existing support for login via ORCID, but can be turned on independently of it. If ORCID login is enabled, the user's ORCID will automatically be added to their profile. If the user has logged in via some other mechanism, they can click a button to initiate a similar authentication process in which they must log in to their ORCID account and approve the connection.

Feedback from installations that enable this functionality is requested and we expect that updates can be made in the next Dataverse release.

See the User Guide, Installation Guide, #7284, and #11222.

Version Notes

Dataverse now supports the option of adding a version note before or during the publication of a dataset. These notes can be used, for example, to indicate why a version was created or how it differs from the prior version. Whether this feature is enabled is controlled by the flag dataverse.feature.enable-version-note. Version notes are shown in the user interface (in the dataset page version table), indexed (as versionNote), available via the API, and have been added to the JSON, DDI, DataCite, and OAI-ORE exports.

With the addition of this feature, work has been done to clean up and rename fields that have been used for specifying the reason for deaccessioning a dataset and providing an optional link to a non-Dataverse location where the dataset still can be found. The former was listed in some JSON-based API calls and exports as "versionNote" and is now "deaccessionNote", while the latter was referred to as "archiveNote" and is now "deaccessionLink".

Further, some database consolidation has been done to combine the deaccessionlink and archivenote fields, which appear to have both been used for the same purpose. The deaccessionlink database field is older and also was not displayed in the current UI. Going forward, only the deaccessionlink column exists.

See the User Guide, API Guide, #8431, and #11068.

OAI-PMH Harvesting from DataCite

DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There has been a lot of interest in the community in being able to harvest from it. This makes it possible to harvest metadata from institution X even if institution X does not maintain an OAI server of its own, as long as it registers its DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the predefined set of all the records registered by institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set. The feature is already in use at Harvard Dataverse, as a beta version patch.

For various reasons, in order to take advantage of this feature, harvesting clients must be created using the /api/harvest/clients API. Once configured, however, harvests can be run from the Harvesting Clients control panel in the UI.

DataCite-harvesting clients must be configured with 2 new feature flags, useListRecords and useOaiIdentifiersAsPids (added in Dataverse 6.5). Note that these features may be of use when harvesting from other sources, not just from DataCite.

See the Admin Guide, API Guide, #10909, and #11011.

Citations Using Citation Style Language (CSL)

This release adds support for generating citations in any of the standard independent formats specified using the Citation Style Language.

The CSL formats are available to copy/paste if you click "Cite Dataset" and then "View Styled Citations" on the dataset page. An API call to retrieve a dataset citation in EndNote, RIS, BibTeX, and CSLJson format has also been added. The first three have been available as downloads from the UI (CSLJson is not) but have not been directly accessible via API until now. The CSLJson format is new to Dataverse and can be used with open source libraries to generate all of the other CSL-style citations.

Admins can use a new dataverse.csl.common-styles setting to highlight commonly used styles. Common styles are listed in the pop-up, while others can be found by type-ahead search in a list of 1000+ options.

See the User Guide, Settings, API Guide, and #11163.

License Metadata Enhancements

Added new fields to licenses: rightsIdentifier, rightsIdentifierScheme, schemeUri, languageCode. See JSON files under Adding Licenses in the guides.
Updated DataCite metadata export to include rightsIdentifier, rightsIdentifierScheme, and schemeUri consistent with the DataCite 4.5 schema and examples
Enhanced metadata exports to include all new license fields
Existing licenses from the example set included with Dataverse will be automatically updated with new fields
Existing API calls support the new optional fields

See below for upgrade instructions. See also #10883 and #11232.

Range Search

This release enhances how numerical and date fields are indexed in Solr. Previously, all fields were indexed as English text (text_en), but with this update:

Integer fields are indexed as plong
Float fields are indexed as pdouble
Date fields are indexed as date_range (solr.DateRangeField)

This change enables range queries when searching from both the UI and the API, such as dateOfDeposit:[2000-01-01 TO 2014-12-31] or targetSampleActualSize:[25 TO 50]. See below for a full list of fields that now support range search.

Additionally, search result highlighting is now more accurate, ensuring that only fields relevant to the query are highlighted in search results. If the query is specifically limited to certain fields, the highlighting is now limited to those fields as well. See #10887.

Specifically, the following fields were updated:

coverage.Depth
coverage.ObjectCount
coverage.ObjectDensity
coverage.Redshift.MaximumValue
coverage.Redshift.MinimumValue
coverage.RedshiftValue
coverage.SkyFraction
coverage.Spectral.CentralWavelength
coverage.Spectral.MaximumWavelength
coverage.Spectral.MinimumWavelength
coverage.Temporal.StartTime
coverage.Temporal.StopTime
dateOfCollectionEnd
dateOfCollectionStart
dateOfDeposit
distributionDate
dsDescriptionDate
journalPubDate
productionDate
resolution.Redshift
targetSampleActualSize
timePeriodCoveredEnd
timePeriodCoveredStart
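As a rough illustration of a range query against the Search API, the sketch below builds the range clause in one function and sends it with curl in another. The helper names `range_query` and `search_range`, the `SERVER_URL` variable, and the localhost default are assumptions made for this example; only the query syntax comes from the text above.

```shell
# Hypothetical helper: build a range clause such as dateOfDeposit:[A TO B].
range_query() {
  printf '%s:[%s TO %s]\n' "$1" "$2" "$3"
}

# Hypothetical helper: send the clause to the Search API. With -G, curl appends
# the URL-encoded data to the query string, so the brackets and spaces in the
# clause do not need to be escaped by hand.
search_range() {
  curl -s -G "${SERVER_URL:-http://localhost:8080}/api/search" \
    --data-urlencode "q=$(range_query "$1" "$2" "$3")"
}

# Print the clause without contacting any server.
range_query dateOfDeposit 2000-01-01 2014-12-31

# Example call (needs a running installation):
#   search_range targetSampleActualSize 25 50
```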

New 3D Objects Metadata Block

A new metadata block has been added for describing 3D object data. You can download it from the guides. See also #11120 and #11167.

All new Dataverse installations will receive this metadata block by default. We recommend adding it by following the upgrade instructions below.

New Archival Metadata Block (Experimental)

An experimental "Archival" metadata block has been added, downloadable from the User Guide. The purpose of the metadata block is to enable repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that is your own institutional archive or an external archive, e.g. a historical archive. Feedback is welcome! See also #10626.

Prevent Publishing of Datasets Without Files

Datasets without files can be optionally prevented from being published through a new "requireFilesToPublishDataset" boolean defined at the collection level. This boolean can be set only via API and only by a superuser. See Change Collection Attributes. If the boolean is not set, the parent collection is consulted. If you do not set the boolean, the existing behavior of datasets being able to be published without files will continue. Superusers can still publish datasets whether or not the boolean is set. See #10981 and #10994.

Metadata Source Facet Can Now Differentiate Between Harvested Sources

The behavior of the feature flag index-harvested-metadata-source and the "Metadata Source" facet, which were added and updated, respectively, in Dataverse 6.3 (through pull requests #10464 and #10651), has been updated. A new field called "Source Name" has been added to harvesting clients.

Before Dataverse 6.3, all harvested content (datasets and files) appeared together under "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the index-harvested-metadata-source feature flag (and reindexing) resulted in harvested content appearing under the nickname for whatever harvesting client was used to bring in the content. This meant that instead of having all harvested content lumped together under "Harvested", content would appear under "client1", "client2", etc.

With this release, enabling the index-harvested-metadata-source feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the API), and reindexing (see upgrade instructions below) results in the source name appearing under the "Metadata Source" facet rather than the harvesting client nickname. This gives you more control over the name that appears under the "Metadata Source" facet and allows you to reuse the same source name to group harvested content from various harvesting clients under the same name if you wish.

Previously, index-harvested-metadata-source was not documented in the guides, but now you can find information about it under Feature Flags. See also #10217 and #11217.

Globus Framework Improvements

The improvements and optimizations in this release build on top of the earlier work (such as #10781). They are based on the experience gained at IQSS as part of the production rollout of the Large Data Storage services that utilize Globus.

The changes in this release focus on improving Globus downloads, i.e., transfers from Dataverse-linked Globus volumes to users' Globus collections. Most importantly, the mechanism of "Asynchronous Task Monitoring", first introduced in #10781 for uploads, has been extended to handle downloads as well. This generally makes downloads more reliable, specifically in how Dataverse manages temporary access rules granted to users, minimizing the risk of subsequent downloads failing because of stale access rules left in place.

Multiple other improvements have been made making the underlying Globus framework more reliable and robust.

See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide, #11057, and #11125.

OIDC Bearer Tokens

The release extends the OIDC API auth mechanism, available through feature flag api-bearer-auth, to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token.

To register a new user who has authenticated via an OIDC provider, a new endpoint has been implemented (/users/register). A feature flag named api-bearer-auth-provide-missing-claims has been implemented to allow sending missing user claims in the request JSON. This is useful when the identity provider does not supply the necessary claims. However, this flag will only be considered if the api-bearer-auth feature flag is enabled. If the latter is not enabled, the api-bearer-auth-provide-missing-claims flag will be ignored.

A feature flag named api-bearer-auth-handle-tos-acceptance-in-idp has been implemented. When enabled, it specifies that Terms of Service acceptance is managed by the identity provider, eliminating the need to explicitly include the acceptance in the user registration request JSON.

See the guides, #10959, and #10972.

Signposting Output Now Contains Links to All Dataset Metadata Export Formats

When Signposting was added in Dataverse 5.14 (#8981), it provided links only for the schema.org metadata export format.

The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats, including any external exporters, such as Croissant, that have been enabled.

This provides a lightweight machine-readable way to first retrieve a list of links, such as via an HTTP HEAD request, to each available metadata export format and then follow up with a request for the export format of interest.

In addition, the content type for the schema.org dataset metadata export format has been corrected. It was application/json and now it is application/ld+json.
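To illustrate the "retrieve the links, then follow up" pattern, the sketch below pulls the metadata export links out of a Link header value. It assumes the exports are advertised with rel="describedby" and a type parameter, as is conventional for Signposting; the helper name `describedby_links` and the sample header are made up, and the real header from your installation may be shaped differently.

```shell
# Hypothetical helper: print "type url" for every describedby link in a Link
# header value. Splitting on commas is a simplification and would break on
# URLs that contain a comma.
describedby_links() {
  printf '%s\n' "$1" | tr ',' '\n' | grep 'rel="describedby"' |
    sed -E 's/^[[:space:]]*<([^>]*)>.*type="([^"]*)".*/\2 \1/'
}

# Made-up sample header value, for illustration only.
sample='<https://example.org/export/a>;rel="describedby";type="application/ld+json", <https://example.org/export/b>;rel="describedby";type="application/xml", <https://example.org/license>;rel="license"'

describedby_links "$sample"
```

Against a running installation, the header value would come from a HEAD request to the dataset page (for example with `curl -s -I`) instead of the sample string.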

Dataset Types Can Be Linked to Metadata Blocks

Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.

This will have the following effects for the APIs used by the new Dataverse UI:

The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.

Mostly in order to write automated tests for the above, a displayOnCreate API endpoint has been added.

For more information, see the guides (overview, new APIs), #10519 and #11001.

Other Features

In addition to the API Move a Dataverse Collection, it is now possible for a Dataverse administrator to move a collection using the Dataverse dashboard. See #10304 and #11150.
The Preview URL popup and related documentation have been updated to give more information about anonymous access, including the names of the dataset fields that will be withheld from the Anonymous Preview URL user and to suggest how to review the URL before releasing it. See also #11159 and #11164.
ROR (Research Organization Registry) has been added as an Author Identifier Type for when the author is an organization rather than a person. Like ORCID, ROR will appear in the "DataCite" metadata export format. See #11075 and #11118.
The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. This improves the citation associated with these datasets, but the change affects only newly harvested datasets. See "Upgrade Instructions" below on how to re-harvest. For more information, see the guides, #8739, and #9013.
A new harvest status differentiates between a complete harvest with errors ("completed with failures") and without errors ("completed"). Also, harvest status labels are now internationalized. See #9294 and #11017.
The OAI-ORE exporter can now export metadata containing nested compound fields or compound fields within compound fields. See #10809 and #11190.
It is now possible to edit a custom role with the same alias. See #8808 and #10612.
The Metadata Customization documentation has been updated to explain how to implement a boolean fieldtype (look for "boolean"). See #7961 and #11064.
The version of Stata files is now detected during S3 direct upload (as it was for normal uploads), allowing ingest of Stata 14 and 15 files that have been uploaded directly. See the guides, #10108, and #11054.
It is now possible to populate the "Keyword" metadata field from an OntoPortal service. The code has been shared to the GDCC dataverse-external-vocab-support GitHub repository. See #11258.
Support for legacy configuration of a PermaLink PID provider, such as using the :Protocol, :Authority, and :Shoulder settings, has been fixed. See #10516 and #10521.
On the home page for each guide (User Guide, etc.) there was an overwhelming amount of information in the form of a deeply nested table of contents. The depth of the table of contents has been reduced to two levels, making the home page for each guide more readable. Compare the User Guide for 6.5 vs. 6.6 and see #11166.
For compliance with GDPR and other privacy regulations, advice on adding a cookie consent popup has been added to the guides. See the new cookie consent section and #10320.
A new file has been added to import the French Open License to Dataverse: licenseEtalab-2.0.json. You can download it from the guides. This license, which is compatible with the Creative Commons license, is recommended by the French government for open documents. See #9301, #9302, and #11302.
The API that lists versions of a dataset now features an optional excludeMetadataBlocks parameter, which defaults to "false" for backward compatibility. For a dataset with a large number of versions and/or metadataBlocks, having the metadata blocks included can dramatically increase the volume of the output. See also the guides, #10171, and #10778.
Deeply nested metadata fields are not supported but the code used to generate the Solr schema has been adjusted to support them. See #11136.
The tutorial on running Dataverse in Docker has been updated to explain how to configure the root collection using a JSON file (#10541 and #11201) and now uses the Permalink PID provider instead of the FAKE DOI Provider (#11107 and #11108).
Payara application server has been upgraded to version 6.2025.2. See #11126 and #11128.
Solr has been upgraded to version 9.8.0. See #10713.
For testing purposes, the FAKE PID provider can now be used with file PIDs enabled. (The FAKE provider is not recommended for any production use.) See #10979.

Bugs Fixed

A bug that caused users of the Anonymous Review URL to have some metadata of published datasets withheld has been fixed. See #11202 and #11164.
A bug that caused ORCIDs starting with "https://orcid.org/" entered as author identifier to be ignored when creating the DataCite metadata has been fixed. This primarily affected users of the ORCID external vocabulary script; for the manual entry form, we used to recommend not using the URL form. The display of authorIdentifier, when not using any external vocabulary scripts, has been improved so that either the plain identifier (e.g. "0000-0002-1825-0097") or its URL form (e.g. "https://orcid.org/0000-0002-1825-0097") will result in valid links in the display (for identifier types that have a URL form). The URL form is now recommended when doing manual entry. See #11242.
Multiple small issues with the formatting of PIDs in the DDI exporters, and EndNote and BibTeX citation formats have been addressed. These should improve the ability to import Dataverse citations into reference managers and fix potential issues harvesting datasets using PermaLinks. See #10768, #10769, #11165, and #10790.
On the Advanced Search page, the metadata fields are now displayed in the correct order as defined in the TSV file via the displayOrder value, making the order the same as when you view or edit metadata. Note that fields that are not defined in the TSV file, like the "Persistent ID" and "Publication Date", will be displayed at the end. See #11272 and #11279.
Bugs that caused 1) guestbook questions to appear along with terms of use/terms of access in the request access dialog when no guestbook was configured, and 2) terms of access to not be shown when using the per-file request access/download menu items have been fixed. Text related to configuring the choice to have guestbooks appear when file access is requested or when files are downloaded has been updated to make it clearer that this affects only datasets where guestbooks have been configured. See #11203.
The file page version table now shows whether a file has been replaced. See #11142 and #11145.
We fixed an issue where draft versions of datasets were sorted using the release timestamp of their most recent major version. This caused newer drafts to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". Sorting now uses the last update timestamp when sorting draft datasets. The sorting behavior of published major and minor dataset versions is unchanged. There is no need to reindex datasets because Solr is being upgraded (see "Upgrade Instructions"), which will result in an empty database that will be reindexed. See #11178.
Some external controlled vocabulary scripts/configurations, when used on a metadata field that is single-valued, could result in indexing failure for the dataset, e.g. when the script tried to index both the identifier and name of the identified entity for indexing. Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema. Configuring the Solr schema and running the update-fields.sh script as usually recommended when using custom metadata blocks (see "Upgrade Instructions") will resolve the issue. See the guides, #11095, and #11096.
The OpenAIRE metadata export format can now correctly process one or multiple productionPlaces as geolocation. See #9546 and #11194.
We fixed a bug that caused adding free-form provenance to a file to fail. See #11145.
A bug has been fixed which could cause publication of datasets to fail in cases where they were not assigned a DOI at creation. See #11234 and #11236.
When users request access to files, the people who have permission to grant access received an email with a link that didn't work due to a trailing period (full stop) right next to the link, e.g. https://demo.dataverse.org/permissions-manage-files.xhtml?id=9. A space has been added to fix this. See #10384 and #11115.
Harvesting clients now use the correct granularity while re-running a partial harvest, using the from parameter. The correct granularity comes from the Identify verb request. See #11020 and #11038.
Access requests were missing on the File Permission page after upgrading from Dataverse 6.0. This has been corrected with a database update script. See #10714 and #11061.
When a dataset has a long running lock, including when it is "in review", Dataverse will now slow the page refresh rate over time. See #11264 and #11269.
The /api/info/metrics/files/monthly API call had a bug that resulted in files being counted each time they were published in a new version if those publication events occurred in different months. This resulted in an over-count. The /api/info/metrics/files and /api/info/metrics/files/toMonth API calls had a bug that resulted in files that were published but no longer in the latest published version as of the specified date (now, or the date entered in the /toMonth variant) not being counted. This resulted in an under-count. See #11189.
DatasetFieldTypes in MetadataBlock response that are also a child of another DatasetFieldType were being returned twice. The child DatasetFieldType was included in the "fields" object as well as in the "childFields" of its parent DatasetFieldType. This fix suppresses the standalone object so only one instance of the DatasetFieldType is returned (in the "childFields" of its parent). This fix changes the JSON output of the API /api/dataverses/{dataverseAlias}/metadatablocks (see "Backward Incompatible Changes", below). See #10472 and #11066.
A bug that caused replacing files via API when file PIDs were enabled to fail has been fixed. See #10975 and #10979.
The :CustomDatasetSummaryFields setting now allows spaces along with a comma separating field names. In addition, a bug that caused license information to be hidden if there are no values for any of the custom fields specified has been fixed. See #11228 and #11229.
Dataverse 6.5 introduced a bug which causes search to fail for non-superusers in multiple groups when the AVOID_EXPENSIVE_SOLR_JOIN feature flag is set to true. This release fixes the bug. See #11133 and #11134.
We fixed a bug with My Data where listing collections for a user with only rights on harvested collections would result in a server error response. See #11083.
Minor styling fixes for the Related Publication field and fields using ORCID or ROR have been made. See #11053, #10964, and #11106.
In the Search API, files were displaying DRAFT version instead of latest released version under dataset_citation. See #10735 and #11051.
Unnecessary Solr documents were being created when a file was added or deleted from a draft dataset. These documents could accumulate and potentially impact performance. There is no action to take because this release includes a new Solr version, which will start with an empty database. See #11113 and #11114.
When using the API to update a collection, omitting optional fields such as inputLevels, facetIds, or metadataBlockNames caused data to be deleted. The fix no longer deletes data for these fields. Two new flags have been added to the metadataBlocks JSON object to signal the deletion of the data: inheritMetadataBlocksFromParent: true and inheritFacetsFromParent: true. See the guides, #11130, and #11144.

API Updates

Search API Returns Additional Fields for Files

Added new fields to search results type=files

For Files:

restricted: boolean
canDownloadFile: boolean (from file user permission)
categories: array of strings, similar to "categories" in the metadata API

For tabular files:

tabularTags: array of strings, for example {"tabularTags" : ["Event", "Genomics", "Geospatial"]}
variables: number/int, the number of variables in the tabular file
observations: number/int, the number of observations in the tabular file

See #11027 and #11097.

Backend Support for Collection Featured Items

CRUD endpoints for Collection Featured Items have been implemented. In particular, the following endpoints have been implemented:

Create a featured item (POST /api/dataverses/<dataverse_id>/featuredItems)
Update a featured item (PUT /api/dataverseFeaturedItems/<item_id>)
Delete a featured item (DELETE /api/dataverseFeaturedItems/<item_id>)
List all featured items in a collection (GET /api/dataverses/<dataverse_id>/featuredItems)
Delete all featured items in a collection (DELETE /api/dataverses/<dataverse_id>/featuredItems)
Update all featured items in a collection (PUT /api/dataverses/<dataverse_id>/featuredItems)

See also the "Settings Added" section, #10943 and #11124.
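
The collection-level endpoints listed above can be exercised with curl roughly as follows. This is a sketch: the server URL and the "root" alias in the usage note are assumptions, and API_TOKEN is expected to be exported by the caller.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Build the featured items URL for a collection alias or id ($1).
featured_items_url() {
  echo "$SERVER_URL/api/dataverses/$1/featuredItems"
}

# List all featured items in a collection (defined, not called here).
list_featured_items() {
  curl -s -H "X-Dataverse-key:$API_TOKEN" "$(featured_items_url "$1")"
}

# Delete all featured items in a collection (defined, not called here).
delete_all_featured_items() {
  curl -s -X DELETE -H "X-Dataverse-key:$API_TOKEN" "$(featured_items_url "$1")"
}
```

For example, list_featured_items root would query the collection with the alias "root".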

Other API Updates

Multiple files can be deleted from a dataset at once. See the guides and #11230.
An API has been added to get the "classic" download count from a dataset with an optional includeMDC parameter (for Make Data Count). See the guides, #11244 and #11282.
An API has been added that lists the collections that the user has access to via the permission passed. See the guides, #6467, and #10906.
An API has been added to get dataset versions including a summary of differences between consecutive versions where available. See the docs, #10888, and #10945.
An API has been added to list the versions of a data file, showing any changes that affected the file with each version. See the guides, #11198 and #11237.
The Search API has a new parameter called show_type_counts. If you set it to true, it will return total_count_per_object_type for the types dataverse, dataset, and files (#11065 and #11082) even if the search result for any given type is 0 (#11127 and #11138).
CRUD operations for external tools are now available for superusers from non-localhost. See the guides, #10930 and #11079.
A new API endpoint has been added that allows a global role to be updated. See the guides and #10612.
An API has been added to send feedback to the collection, dataset, or data file's contacts. If necessary, you can rate limit the CheckRateLimitForDatasetFeedbackCommand and configure the new :ContactFeedbackMessageSizeLimit database setting. See the guides, #11129, and #11162.
/api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See "Backward Incompatible Changes" below and #10764.
A new query param, returnChildCount, has been added to the getDataverse endpoint (/api/dataverses/{id}) for optionally retrieving the child count, which represents the number of collections, datasets, or files within the collection (direct children only). See also #11255 and #11259.
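
Two of the parameters above, show_type_counts and returnChildCount, are plain query parameters. The sketch below only builds the URLs, assuming a local installation. The "root" alias in the usage note is a placeholder.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Search with per-type totals included in the response.
TYPE_COUNTS_URL="$SERVER_URL/api/search?q=*&show_type_counts=true"

# Collection lookup that also returns the direct child count ($1: alias or id).
child_count_url() {
  echo "$SERVER_URL/api/dataverses/$1?returnChildCount=true"
}

echo "$TYPE_COUNTS_URL"
child_count_url root
```

Either URL can be passed to curl -s, with an X-Dataverse-key header added if the content is not public.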

End-Of-Life (EOL) Announcements

PostgreSQL 13 reaches EOL on 13 November 2025

Per https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. Our first step toward moving off version 13 was to switch our testing to version 16, as we've noted in the guides. You are encouraged to start planning your upgrade and may want to review the Dataverse 5.4 release notes as the upgrade process (e.g. pg_dumpall, etc.) will likely be similar. If you notice any bumps along the way, please let us know!

Dataverse developers using Docker have been using PostgreSQL 17 since Dataverse 6.5 (#10912). (Developers not using Docker who are still on PostgreSQL 13 are encouraged to upgrade.) Older or newer versions should work, within reason.

Security

SameSite Cookie Attribute

The SameSite cookie attribute is defined in an upcoming revision to RFC 6265 (HTTP State Management Mechanism) called 6265bis ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict".

"If no SameSite attribute is set, the cookie is treated as Lax by default" by browsers according to MDN. This was the previous behavior of Dataverse, to not set the SameSite attribute.

New Dataverse installations now explicitly set the SameSite cookie attribute to "Lax" out of the box through the installer (in the case of a "classic" installation) or through an updated base image (in the case of a Docker installation). Classic installations should follow the upgrade instructions below to bring their installation up to date with the behavior for new installations. Docker installations will automatically get the updated base image.

While you are welcome to experiment with "Strict", which is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP cheatsheet, our testing so far indicates that some functionality, such as OIDC login, seems to be incompatible with "Strict".

You should avoid the use of "None" as it is less secure than "Lax". See also the guides, https://github.com/IQSS/dataverse-security/issues/27, #11210, and the upgrade instructions below.

Settings Added

dataverse.feature.enable-version-note
dataverse.csl.common-styles
dataverse.files.featured-items.image-maxsize - It sets the maximum allowed size of the image that can be added to a featured item.
dataverse.files.featured-items.image-uploads - It specifies the name of the subdirectory for saving featured item images within the docroot directory.
dataverse.feature.api-bearer-auth-provide-missing-claims
dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp
:ContactFeedbackMessageSizeLimit

Backward Incompatible Changes

Generally speaking, see the API Changelog for a list of backward-incompatible API changes.

/api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See #10764.
The JSON response of API call /api/dataverses/{dataverseAlias}/metadatablocks will no longer include the DatasetFieldTypes in "fields" if they are children of another DatasetFieldType. The child DatasetFieldType will only be included in the "childFields" of its parent DatasetFieldType. See #10472 and #11066.
versionNote has been renamed to deaccessionNote. archiveNote has been renamed to deaccessionLink. See #11068.
The Show Role API endpoint was returning 401 Unauthorized when a permission check failed. This has been corrected to return 403 Forbidden instead. That is, the API token is known to be good (401 otherwise) but the user lacks permission (403 is now sent). See also the API Changelog, #10340, and #11116.
Changes to PID formatting occur in the DDI/DDI Html export formats and the EndNote and BibTex citation formats. These changes correct errors and improve conformance with best practices but could break parsing of these formats. See #10768, #10769, #11165, and #10790.
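
Scripts that call the Show Role API may need to tell the two status codes apart after this change. The sketch below assumes the endpoint is reachable at /api/roles/<id> on a local installation. The helper names are made up for illustration.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Map an HTTP status code to the meaning described above.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "API token missing or not recognized" ;;
    403) echo "API token is valid but the user lacks permission" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Defined but not called here: fetch only the status code for a role id ($1).
show_role_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/roles/$1"
}
```

A caller could combine the two as explain_status "$(show_role_status 42)", where 42 is a placeholder role id.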

Complete List of Changes

For the complete list of code changes in this release, see the 6.6 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.5.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

shell export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

shell $PAYARA/bin/asadmin list-applications

2. Undeploy the previous version (should match "list-applications" above)

shell $PAYARA/bin/asadmin undeploy dataverse-6.5

3. Stop Payara

shell sudo service payara stop

4. Upgrade to Payara 6.2025.2

The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.3.)

Move the current Payara directory out of the way:

shell mv $PAYARA $PAYARA.6.2024.6

Download the new Payara version 6.2025.2 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.2/payara-6.2025.2.zip), and unzip it in its place:

```shell
cd /usr/local
unzip payara-6.2025.2.zip
```

Replace the brand new payara6/glassfish/domains/domain1 with your old, preserved domain1:

```shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.6.2024.6/glassfish/domains/domain1 payara6/glassfish/domains/
```

5. Download and deploy this version

```shell
wget https://github.com/IQSS/dataverse/releases/download/v6.6/dataverse-6.6.war
$PAYARA/bin/asadmin deploy dataverse-6.6.war
```

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

```shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
```

6. For installations with internationalization or text customizations:

Please remember to update translations via Dataverse language packs.

If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.6/src/main/java/propertyFiles.

7. Decide to enable (or not) the index-harvested-metadata-source feature flag

Decide whether or not to enable the dataverse.feature.index-harvested-metadata-source feature flag described above, in the guides, #10217 and #11217. The reason to decide now is that reindexing is required and the next steps involve restarting Payara and upgrading Solr, which will result in a fresh index.
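
If you decide to enable the flag, one common way to set a Dataverse feature flag is a JVM option. The sketch below assumes the PAYARA variable from the earlier steps and only defines the command rather than running it.

```shell
PAYARA="${PAYARA:-/usr/local/payara6}"   # falls back to the path assumed above

FLAG="dataverse.feature.index-harvested-metadata-source"
JVM_OPTION="-D${FLAG}=true"

# Defined but not called here; call it before restarting Payara.
enable_flag() {
  "$PAYARA/bin/asadmin" create-jvm-options "$JVM_OPTION"
}

echo "$JVM_OPTION"
```

The flag takes effect after the Payara restart in the following steps, and the later reindex picks it up.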

8. Configure SameSite

To bring your Dataverse installation in line with new installations, as described above and in the guides, we recommend running the following commands:

```
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-value=Lax

./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-enabled=true
```

Please note that "None" is less secure than "Lax" and should be avoided. You can test the setting by inspecting headers with curl, looking at the JSESSIONID cookie for "SameSite=Lax" (yes, it's expected to be repeated, probably due to a bug in Payara) like this:

```
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/;SameSite=Lax;SameSite=Lax
```

Before making the changes above, the SameSite attribute should be absent, like this:

```
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/
```
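
The manual check above can be scripted. In this sketch the function names are made up for illustration, and the server is assumed to listen on http://localhost:8080.

```shell
# Succeeds if the given header line sets JSESSIONID with SameSite=Lax.
has_samesite_lax() {
  printf '%s\n' "$1" | grep -i 'JSESSIONID' | grep -qi 'SameSite=Lax'
}

# Defined but not called here: check a running installation.
check_installation() {
  header=$(curl -s -I http://localhost:8080 | grep JSESSIONID)
  if has_samesite_lax "$header"; then
    echo "SameSite=Lax is set"
  else
    echo "SameSite=Lax is NOT set"
  fi
}
```

Running check_installation before and after the asadmin commands should show the attribute change from absent to present.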

8. Restart Payara

```shell
sudo service payara stop
sudo service payara start
```

9. Update metadata blocks

These changes reflect incremental improvements made to the handling of core metadata fields.

Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```

The 3D Objects metadata block is included in all new installations of Dataverse so we recommend adding it.

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/3d_objects.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file 3d_objects.tsv
```

10. Upgrade Solr

Solr 9.8.0 is now the version recommended in our Installation Guide and used with automated testing. Additionally, due to the new range search support feature and the addition of fields (e.g. versionNote, fileRestricted, canDownloadFile, variableCount, and observations), the default schema.xml file has changed so you must upgrade.

Install Solr 9.8.0 following the instructions from the Installation Guide.

The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
```

10a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
```

Start Solr instance (usually service solr start depending on Solr/OS).

11. Reindex Solr

shell curl http://localhost:8080/api/admin/index

12. Run reExportAll to update dataset metadata exports

For existing published datasets, additional license metadata will not be available from DataCite or in metadata exports until

the dataset is republished or
the /api/admin/metadata/{id}/reExportDataset is run for the dataset or
the /api/datasets/{id}/modifyRegistrationMetadata API is run for the dataset or
the global versions of these API calls (/api/admin/metadata/reExportAll, /api/datasets/modifyRegistrationPIDMetadataAll) are used.

For this reason, we recommend reexporting all dataset metadata. For more advanced usage, please see the guides.

shell curl http://localhost:8080/api/admin/metadata/reExportAll

13. (Optional) Re-harvest datasets

The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. For more information, see the guides, #8739, and #9013.

This improves the citation associated with these datasets, but the change only affects newly harvested datasets.

If you would like to pick up this change for existing harvested datasets, you should re-harvest them. This can be accomplished by deleting and re-adding each harvesting client, followed by a harvesting run. You may want to use harvesting client APIs to save (serialize), add, and remove clients.
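
The save, remove, add, and run cycle described above could be scripted along these lines. This is a sketch under assumptions: the client nickname in the usage note is a placeholder, API_TOKEN must be exported by the caller, and the saved response wraps the client definition in a status envelope, so it has to be trimmed to the client JSON before it is posted back.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# URL of a harvesting client by nickname ($1).
client_url() {
  echo "$SERVER_URL/api/harvest/clients/$1"
}

# The functions below are defined but not called here.
save_client() {
  curl -s -H "X-Dataverse-key:$API_TOKEN" "$(client_url "$1")" > "$1.json"
}

delete_client() {
  curl -s -X DELETE -H "X-Dataverse-key:$API_TOKEN" "$(client_url "$1")"
}

# $1: nickname, $2: file holding the client definition to re-create.
add_client() {
  curl -s -X POST -H "X-Dataverse-key:$API_TOKEN" \
    -H "Content-Type: application/json" \
    "$(client_url "$1")" --upload-file "$2"
}

run_client() {
  curl -s -X POST -H "X-Dataverse-key:$API_TOKEN" "$(client_url "$1")/run"
}
```

A typical sequence for a client nicknamed "myclient" would be save_client, delete_client, add_client with the edited file, then run_client.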

Published by pdurbin 11 months ago

dataverse - v6.6

Dataverse 6.6

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.6 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.6 include:

metadata fields can be "display on create" per collection
ORCIDs linked to accounts
version notes
harvesting from DataCite
citations using Citation Style Language (CSL)
license metadata enhancements
metadata fields now support range searches (dates, integers, etc.)
more accurate search highlighting
collections can be moved by using the superuser dashboard
new 3D Objects metadata block
new Archival metadata block (experimental)
optionally prevent publishing of datasets without files
Signposting output now contains links to all dataset metadata export formats
infrastructure updates (Payara and Solr)

In a recent community call, we talked about many of these highlights if you'd like to watch the video (around 22:30).

Features Added

Metadata Fields Can Be "Display on Create" Per Collection

Collection administrators can now configure which metadata fields appear during dataset creation through the displayOnCreate property, even when fields are not required. This provides greater control over metadata visibility and can help improve metadata completeness.

Currently this feature can only be configured via API, but a UI implementation is planned in #11221. See #10476, #11224, and #11312.

ORCIDs Linked to Accounts

Dataverse now includes improved integration with ORCID, supported through a grant to GDCC from the ORCID Global Participation Fund.

Specifically, Dataverse users can now link their Dataverse account with their ORCID profile. Previously, this was only available to users who logged in with ORCID. Once linked, Dataverse will automatically prepopulate the author metadata with their ORCID when they create a dataset.

This functionality leverages Dataverse's existing support for login via ORCID, but can be turned on independently of it. If ORCID login is enabled, the user's ORCID will automatically be added to their profile. If the user has logged in via some other mechanism, they are able to click a button to initiate a similar authentication process in which the user must log in to their ORCID account and approve the connection.

Feedback from installations that enable this functionality is requested and we expect that updates can be made in the next Dataverse release.

See the User Guide, Installation Guide, #7284, and #11222.

Version Notes

Dataverse now supports the option of adding a version note before or during the publication of a dataset. These notes can be used, for example, to indicate why a version was created or how it differs from the prior version. Whether this feature is enabled is controlled by the flag dataverse.feature.enable-version-note. Version notes are shown in the user interface (in the dataset page version table), indexed (as versionNote), available via the API, and have been added to the JSON, DDI, DataCite, and OAI-ORE exports.

With the addition of this feature, work has been done to clean up and rename fields that have been used for specifying the reason for deaccessioning a dataset and providing an optional link to a non-Dataverse location where the dataset still can be found. The former was listed in some JSON-based API calls and exports as "versionNote" and is now "deaccessionNote", while the latter was referred to as "archiveNote" and is now "deaccessionLink".

Further, some database consolidation has been done to combine the deaccessionlink and archivenote fields, which appear to have both been used for the same purpose. The deaccessionlink database field is older and also was not displayed in the current UI. Going forward, only the deaccessionlink column exists.

See the User Guide, API Guide, #8431, and #11068.

OAI-PMH Harvesting from DataCite

DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There has been a lot of interest in the community in being able to harvest from it. This makes it possible to harvest metadata from institution X even if institution X does not maintain an OAI server of its own, as long as it registers its DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the pre-defined set of ALL the records registered by institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set. The feature is already in use at Harvard Dataverse, as a beta version patch.

For various reasons, in order to take advantage of this feature harvesting clients must be created using the /api/harvest/clients API. Once configured however, harvests can be run from the Harvesting Clients control panel in the UI.

DataCite-harvesting clients must be configured with 2 new feature flags, useListRecords and useOaiIdentifiersAsPids (added in Dataverse 6.5). Note that these features may be of use when harvesting from other sources, not just from DataCite.

See the Admin Guide, API Guide, #10909, and #11011.
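
A rough sketch of creating such a client through the API is shown below. Only the harvest URL and the two flag names come from the text above. The nickname, target collection alias, archive URL, and metadata format are assumptions to adapt, and the API Guide remains the authority on the accepted fields.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation
NICKNAME="datacite"                  # hypothetical client nickname

# Client definition; values other than harvestUrl and the two flags
# are placeholders for illustration.
CLIENT_JSON=$(cat <<'EOF'
{
  "dataverseAlias": "harvested",
  "harvestUrl": "https://oai.datacite.org/oai",
  "archiveUrl": "https://datacite.org",
  "metadataFormat": "oai_dc",
  "useListRecords": true,
  "useOaiIdentifiersAsPids": true
}
EOF
)

# Defined but not called here; requires API_TOKEN to be exported.
create_datacite_client() {
  curl -s -X POST -H "X-Dataverse-key:$API_TOKEN" \
    -H "Content-Type: application/json" \
    "$SERVER_URL/api/harvest/clients/$NICKNAME" -d "$CLIENT_JSON"
}
```

Once the client exists, harvests can be started from the Harvesting Clients control panel as described above.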

Citations Using Citation Style Language (CSL)

This release adds support for generating citations in any of the standard independent formats specified using the Citation Style Language.

The CSL formats are available to copy/paste if you click "Cite Dataset" and then "View Styled Citations" on the dataset page. An API call to retrieve a dataset citation in EndNote, RIS, BibTeX, and CSLJson format has also been added. The first three have been available as downloads from the UI (CSLJson is not) but have not been directly accessible via API until now. The CSLJson format is new to Dataverse and can be used with open source libraries to generate all of the other CSL-style citations.

Admins can use a new dataverse.csl.common-styles setting to highlight commonly used styles. Common styles are listed in the pop-up, while others can be found by type-ahead search in a list of 1000+ options.

See the User Guide, Settings, API Guide, and #11163.

License Metadata Enhancements

Added new fields to licenses: rightsIdentifier, rightsIdentifierScheme, schemeUri, languageCode. See JSON files under Adding Licenses in the guides.
Updated DataCite metadata export to include rightsIdentifier, rightsIdentifierScheme, and schemeUri consistent with the DataCite 4.5 schema and examples
Enhanced metadata exports to include all new license fields
Existing licenses from the example set included with Dataverse will be automatically updated with new fields
Existing API calls support the new optional fields

See below for upgrade instructions. See also #10883 and #11232.

Range Search

This release enhances how numerical and date fields are indexed in Solr. Previously, all fields were indexed as English text (text_en), but with this update:

Integer fields are indexed as plong
Float fields are indexed as pdouble
Date fields are indexed as date_range (solr.DateRangeField)

This change enables range queries when searching from both the UI and the API, such as dateOfDeposit:[2000-01-01 TO 2014-12-31] or targetSampleActualSize:[25 TO 50]. See below for a full list of fields that now support range search.
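
From the API, a range query needs its brackets and spaces URL-encoded. The sketch below lets curl do the encoding, assuming a local installation. The helper names are made up for illustration.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Build a range expression: field ($1), lower bound ($2), upper bound ($3).
range_query() {
  echo "$1:[$2 TO $3]"
}

# Defined but not called here. -G sends the data as a URL query string and
# --data-urlencode escapes the brackets and spaces in the expression.
range_search() {
  curl -s -G "$SERVER_URL/api/search" --data-urlencode "q=$1"
}

range_query dateOfDeposit 2000-01-01 2014-12-31
```

For example, range_search "$(range_query targetSampleActualSize 25 50)" would search for sample sizes between 25 and 50.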

Additionally, search result highlighting is now more accurate, ensuring that only fields relevant to the query are highlighted in search results. If the query is specifically limited to certain fields, the highlighting is now limited to those fields as well. See #10887.

Specifically, the following fields were updated:

coverage.Depth
coverage.ObjectCount
coverage.ObjectDensity
coverage.Redshift.MaximumValue
coverage.Redshift.MinimumValue
coverage.RedshiftValue
coverage.SkyFraction
coverage.Spectral.CentralWavelength
coverage.Spectral.MaximumWavelength
coverage.Spectral.MinimumWavelength
coverage.Temporal.StartTime
coverage.Temporal.StopTime
dateOfCollectionEnd
dateOfCollectionStart
dateOfDeposit
distributionDate
dsDescriptionDate
journalPubDate
productionDate
resolution.Redshift
targetSampleActualSize
timePeriodCoveredEnd
timePeriodCoveredStart

New 3D Objects Metadata Block

A new metadata block has been added for describing 3D object data. You can download it from the guides. See also #11120 and #11167.

All new Dataverse installations will receive this metadata block by default. We recommend adding it by following the upgrade instructions below.

New Archival Metadata Block (Experimental)

An experimental "Archival" metadata block has been added, downloadable from the User Guide. The purpose of the metadata block is to enable repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that is your own institutional archive or an external archive, e.g. a historical archive. Feedback is welcome! See also #10626.

Prevent Publishing of Datasets Without Files

Datasets without files can be optionally prevented from being published through a new "requireFilesToPublishDataset" boolean defined at the collection level. This boolean can be set only via API and only by a superuser. See Change Collection Attributes. If the boolean is not set, the parent collection is consulted. If you do not set the boolean, the existing behavior of datasets being able to be published without files will continue. Superusers can still publish datasets whether or not the boolean is set. See #10981 and #10994.
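
A sketch of setting the boolean through the Change Collection Attributes API, assuming a local installation and a superuser API token exported as API_TOKEN. The "science" alias in the usage note is a placeholder.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Build the attribute URL: collection ($1), attribute ($2), value ($3).
attribute_url() {
  echo "$SERVER_URL/api/dataverses/$1/attribute/$2?value=$3"
}

# Defined but not called here; requires a superuser token.
require_files_to_publish() {
  curl -s -X PUT -H "X-Dataverse-key:$API_TOKEN" \
    "$(attribute_url "$1" requireFilesToPublishDataset true)"
}
```

Calling require_files_to_publish science would turn the requirement on for that collection.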

Metadata Source Facet Can Now Differentiate Between Harvested Sources

The behavior of the feature flag index-harvested-metadata-source and the "Metadata Source" facet, which were added and updated, respectively, in Dataverse 6.3 (through pull requests #10464 and #10651), have been updated. A new field called "Source Name" has been added to harvesting clients.

Before Dataverse 6.3, all harvested content (datasets and files) appeared together under "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the index-harvested-metadata-source feature flag (and reindexing) resulted in harvested content appearing under the nickname for whatever harvesting client was used to bring in the content. This meant that instead of having all harvested content lumped together under "Harvested", content would appear under "client1", "client2", etc.

With this release, enabling the index-harvested-metadata-source feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the API), and reindexing (see upgrade instructions below) results in the source name appearing under the "Metadata Source" facet rather than the harvesting client nickname. This gives you more control over the name that appears under the "Metadata Source" facet and allows you to reuse the same source name to group harvested content from various harvesting clients under the same name if you wish.

Previously, index-harvested-metadata-source was not documented in the guides, but now you can find information about it under Feature Flags. See also #10217 and #11217.

Globus Framework Improvements

The improvements and optimizations in this release build on top of the earlier work (such as #10781). They are based on the experience gained at IQSS as part of the production rollout of the Large Data Storage services that utilize Globus.

The changes in this release focus on improving Globus downloads, i.e., transfers from Dataverse-linked Globus volumes to users' Globus collections. Most importantly, the mechanism of "Asynchronous Task Monitoring", first introduced in #10781 for uploads, has been extended to handle downloads as well. This generally makes downloads more reliable, specifically in how Dataverse manages temporary access rules granted to users, minimizing the risk of subsequent downloads failing because of stale access rules left in place.

Multiple other improvements have been made making the underlying Globus framework more reliable and robust.

See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide, #11057, and #11125.

OIDC Bearer Tokens

The release extends the OIDC API auth mechanism, available through feature flag api-bearer-auth, to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token.

To register a new user who has authenticated via an OIDC provider, a new endpoint has been implemented (/users/register). A feature flag named api-bearer-auth-provide-missing-claims has been implemented to allow sending missing user claims in the request JSON. This is useful when the identity provider does not supply the necessary claims. However, this flag will only be considered if the api-bearer-auth feature flag is enabled. If the latter is not enabled, the api-bearer-auth-provide-missing-claims flag will be ignored.

A feature flag named api-bearer-auth-handle-tos-acceptance-in-idp has been implemented. When enabled, it specifies that Terms of Service acceptance is managed by the identity provider, eliminating the need to explicitly include the acceptance in the user registration request JSON.

See the guides, #10959, and #10972.

Signposting Output Now Contains Links to All Dataset Metadata Export Formats

When Signposting was added in Dataverse 5.14 (#8981), it provided links only for the schema.org metadata export format.

The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats, including any external exporters, such as Croissant, that have been enabled.

This provides a lightweight machine-readable way to first retrieve a list of links, such as via an HTTP HEAD request, to each available metadata export format and then follow up with a request for the export format of interest.

In addition, the content type for the schema.org dataset metadata export format has been corrected. It was application/json and now it is application/ld+json.
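
The "retrieve links first, then follow up" flow could look like the sketch below. It assumes the links arrive in a single Link response header and that the target URLs contain no commas. The function names are made up for illustration.

```shell
# Print each target URL found in a Link header value ($1), one per line.
link_targets() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/.*<\([^>]*\)>.*/\1/p'
}

# Defined but not called here: fetch the Link header of a dataset page ($1).
fetch_link_header() {
  curl -s -I "$1" | grep -i '^link:' | sed 's/^[Ll]ink: *//'
}
```

For a dataset landing page URL in DATASET_URL, link_targets "$(fetch_link_header "$DATASET_URL")" would list the linked resources, including the metadata export formats.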

Dataset Types Can Be Linked to Metadata Blocks

Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.

This will have the following effects for the APIs used by the new Dataverse UI:

The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.

Mostly in order to write automated tests for the above, a displayOnCreate API endpoint has been added.

For more information, see the guides (overview, new APIs), #10519 and #11001.

Other Features

In addition to the API Move a Dataverse Collection, it is now possible for a Dataverse administrator to move a collection using the Dataverse dashboard. See #10304 and #11150.
The Preview URL popup and related documentation have been updated to give more information about anonymous access, including the names of the dataset fields that will be withheld from the Anonymous Preview URL user and to suggest how to review the URL before releasing it. See also #11159 and #11164.
ROR (Research Organization Registry) has been added as an Author Identifier Type for when the author is an organization rather than a person. Like ORCID, ROR will appear in the "Datacite" metadata export format. See #11075 and #11118.
The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. This improves the citation associated with these datasets, but the change affects only newly harvested datasets. See "Upgrade Instructions" below on how to re-harvest. For more information, see the guides, #8739, and #9013.
A new harvest status differentiates between a complete harvest with errors ("completed with failures") and without errors ("completed"). Also, harvest status labels are now internationalized. See #9294 and #11017.
The OAI-ORE exporter can now export metadata containing nested compound fields or compound fields within compound fields. See #10809 and #11190.
It is now possible to edit a custom role with the same alias. See #8808 and #10612.
The Metadata Customization documentation has been updated to explain how to implement a boolean fieldtype (look for "boolean"). See #7961 and #11064.
The version of Stata files is now detected during S3 direct upload (as it was for normal uploads), allowing ingest of Stata 14 and 15 files that have been uploaded directly. See the guides, #10108, and #11054.
It is now possible to populate the "Keyword" metadata field from an OntoPortal service. The code has been shared to the GDCC dataverse-external-vocab-support GitHub repository. See #11258.
Support for legacy configuration of a PermaLink PID provider, such as using the :Protocol, :Authority, and :Shoulder settings, has been fixed. See #10516 and #10521.
On the home page for each guide (User Guide, etc.) there was an overwhelming amount of information in the form of a deeply nested table of contents. The depth of the table of contents has been reduced to two levels, making the home page for each guide more readable. Compare the User Guide for 6.5 vs. 6.6 and see #11166.
For compliance with GDPR and other privacy regulations, advice on adding a cookie consent popup has been added to the guides. See the new cookie consent section and #10320.
A new file has been added to import the French Open License to Dataverse: licenseEtalab-2.0.json. You can download it from the guides. This license, which is compatible with the Creative Commons license, is recommended by the French government for open documents. See #9301, #9302, and #11302.
The API that lists versions of a dataset now features an optional excludeMetadataBlocks parameter, which defaults to "false" for backward compatibility. For a dataset with a large number of versions and/or metadataBlocks, having the metadata blocks included can dramatically increase the volume of the output. See also the guides, #10171, and #10778.
Deeply nested metadata fields are not supported but the code used to generate the Solr schema has been adjusted to support them. See #11136.
The tutorial on running Dataverse in Docker has been updated to explain how to configure the root collection using a JSON file (#10541 and #11201) and now uses the Permalink PID provider instead of the FAKE DOI Provider (#11107 and #11108).
Payara application server has been upgraded to version 6.2025.2. See #11126 and #11128.
Solr has been upgraded to version 9.8.0. See #10713.
For testing purposes, the FAKE PID provider can now be used with file PIDs enabled. (The FAKE provider is not recommended for any production use.) See #10979.

Bugs Fixed

A bug that caused users of the Anonymous Review URL to have some metadata of published datasets withheld has been fixed. See #11202 and #11164.
A bug that caused ORCIDs starting with "https://orcid.org/" entered as author identifier to be ignored when creating the DataCite metadata has been fixed. This primarily affected users of the ORCID external vocabulary script; for the manual entry form, we used to recommend not using the URL form. The display of authorIdentifier, when not using any external vocabulary scripts, has been improved so that either the plain identifier (e.g. "0000-0002-1825-0097") or its URL form (e.g. "https://orcid.org/0000-0002-1825-0097") will result in valid links in the display (for identifier types that have a URL form). The URL form is now recommended when doing manual entry. See #11242 and #11242.
Multiple small issues with the formatting of PIDs in the DDI exporters, and EndNote and BibTeX citation formats have been addressed. These should improve the ability to import Dataverse citations into reference managers and fix potential issues harvesting datasets using PermaLinks. See #10768, #10769, #11165, and #10790.
On the Advanced Search page, the metadata fields are now displayed in the correct order as defined in the TSV file via the displayOrder value, making the order the same as when you view or edit metadata. Note that fields that are not defined in the TSV file, like the "Persistent ID" and "Publication Date", will be displayed at the end. See #11272 and #11279.
Bugs that caused 1) guestbook questions to appear along with terms of use/terms of access in the request access dialog when no guestbook was configured, and 2) terms of access to not be shown when using the per-file request access/download menu items have been fixed. Text related to configuring the choice to have guestbooks appear when file access is requested or when files are downloaded has been updated to make it clearer that this affects only datasets where guestbooks have been configured. See #11203.
The file page version table now shows whether a file has been replaced. See #11142 and #11145.
We fixed an issue where draft versions of datasets were sorted using the release timestamp of their most recent major version. This caused newer drafts to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". Sorting now uses the last update timestamp when sorting draft datasets. The sorting behavior of published major and minor dataset versions is unchanged. There is no need to reindex datasets because Solr is being upgraded (see "Upgrade Instructions"), which will result in an empty database that will be reindexed. See #11178.
Some external controlled vocabulary scripts/configurations, when used on a metadata field that is single-valued, could result in indexing failure for the dataset, e.g. when the script tried to index both the identifier and name of the identified entity for indexing. Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema. Configuring the Solr schema and running the update-fields.sh script as usually recommended when using custom metadata blocks (see "Upgrade Instructions") will resolve the issue. See the guides, #11095, and #11096.
The OpenAIRE metadata export format can now correctly process one or multiple productionPlaces as geolocation. See #9546 and #11194.
We fixed a bug that caused adding free-form provenance to a file to fail. See #11145.
A bug has been fixed which could cause publication of datasets to fail in cases where they were not assigned a DOI at creation. See #11234 and #11236.
When users request access to files, the people who have permission to grant access received an email with a link that didn't work due to a trailing period (full stop) right next to the link, e.g. https://demo.dataverse.org/permissions-manage-files.xhtml?id=9. A space has been added to fix this. See #10384 and #11115.
Harvesting clients now use the correct granularity while re-running a partial harvest, using the from parameter. The correct granularity comes from the Identify verb request. See #11020 and #11038.
Access requests were missing on the File Permission page after upgrading from Dataverse 6.0. This has been corrected with a database update script. See #10714 and #11061.
When a dataset has a long running lock, including when it is "in review", Dataverse will now slow the page refresh rate over time. See #11264 and #11269.
The /api/info/metrics/files/monthly API call had a bug that resulted in files being counted each time they were published in a new version if those publication events occurred in different months. This resulted in an over-count. The /api/info/metrics/files and /api/info/metrics/files/toMonth API calls had a bug that resulted in files that were published but no longer in the latest published version as of the specified date (now, or the date entered in the /toMonth variant) not being counted. This resulted in an under-count. See #11189.
DatasetFieldTypes in MetadataBlock response that are also a child of another DatasetFieldType were being returned twice. The child DatasetFieldType was included in the "fields" object as well as in the "childFields" of its parent DatasetFieldType. This fix suppresses the standalone object so only one instance of the DatasetFieldType is returned (in the "childFields" of its parent). This fix changes the JSON output of the API /api/dataverses/{dataverseAlias}/metadatablocks (see "Backward Incompatible Changes", below). See #10472 and #11066.
A bug that caused replacing files via API when file PIDs were enabled to fail has been fixed. See #10975 and #10979.
The :CustomDatasetSummaryFields setting now allows spaces along with a comma separating field names. In addition, a bug that caused license information to be hidden if there are no values for any of the custom fields specified has been fixed. See #11228 and #11229.
Dataverse 6.5 introduced a bug which causes search to fail for non-superusers in multiple groups when the AVOID_EXPENSIVE_SOLR_JOIN feature flag is set to true. This release fixes the bug. See #11133 and #11134.
We fixed a bug with My Data where listing collections for a user with only rights on harvested collections would result in a server error response. See #11083.
Minor styling fixes for the Related Publication field and fields using ORCID or ROR have been made. See #11053, #10964, and #11106.
In the Search API, files were displaying the DRAFT version instead of the latest released version under dataset_citation. See #10735 and #11051.
Unnecessary Solr documents were being created when a file was added or deleted from a draft dataset. These documents could accumulate and potentially impact performance. There is no action to take because this release includes a new Solr version, which will start with an empty database. See #11113 and #11114.
When using the API to update a collection, omitting optional fields such as inputLevels, facetIds, or metadataBlockNames caused data to be deleted. The fix no longer deletes data for these fields. Two new flags have been added to the metadataBlocks JSON object to signal the deletion of the data: inheritMetadataBlocksFromParent: true and inheritFacetsFromParent: true. See the guides, #11130, and #11144.

API Updates

Search API Returns Additional Fields for Files

New fields have been added to search results of type=files.

For Files:

restricted: boolean
canDownloadFile: boolean (from file user permission)
categories: array of strings, similar to "categories" in the metadata API

For tabular files:

tabularTags: array of strings, for example {"tabularTags" : ["Event", "Genomics", "Geospatial"]}
variables: number/int shows how many variables we have for the tabular file
observations: number/int shows how many observations for the tabular file

See #11027 and #11097.

Backend Support for Collection Featured Items

CRUD endpoints for Collection Featured Items have been implemented. In particular, the following endpoints have been implemented:

Create a featured item (POST /api/dataverses/<dataverse_id>/featuredItems)
Update a featured item (PUT /api/dataverseFeaturedItems/<item_id>)
Delete a featured item (DELETE /api/dataverseFeaturedItems/<item_id>)
List all featured items in a collection (GET /api/dataverses/<dataverse_id>/featuredItems)
Delete all featured items in a collection (DELETE /api/dataverses/<dataverse_id>/featuredItems)
Update all featured items in a collection (PUT /api/dataverses/<dataverse_id>/featuredItems)

See also the "Settings Added" section, #10943 and #11124.
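As a rough orientation, the endpoints listed above can be captured in a small shell helper that prints the HTTP method and path for each operation. The function name and the operation keywords are illustrative and not part of Dataverse; only the methods and paths come from the list above.

```shell
# Print the HTTP method and path for a Collection Featured Items operation.
# $1 = operation keyword (hypothetical names chosen for this sketch)
# $2 = collection id/alias, or featured item id for "update" and "delete"
featured_items_request() {
  case "$1" in
    create)     echo "POST /api/dataverses/$2/featuredItems" ;;
    list)       echo "GET /api/dataverses/$2/featuredItems" ;;
    update-all) echo "PUT /api/dataverses/$2/featuredItems" ;;
    delete-all) echo "DELETE /api/dataverses/$2/featuredItems" ;;
    update)     echo "PUT /api/dataverseFeaturedItems/$2" ;;
    delete)     echo "DELETE /api/dataverseFeaturedItems/$2" ;;
    *)          echo "unknown operation: $1" >&2; return 1 ;;
  esac
}

# Show the request for listing items in the "root" collection
# and for deleting the item with the made-up id 42.
featured_items_request list root
featured_items_request delete 42
```

The printed method and path could then be passed to curl along with an API token, for example `curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/dataverses/root/featuredItems"`, where `$API_TOKEN` and `$SERVER_URL` are placeholders for your own values.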

Other API Updates

Multiple files can be deleted from a dataset at once. See the guides and #11230.
An API has been added to get the "classic" download count from a dataset with an optional includeMDC parameter (for Make Data Count). See the guides, #11244 and #11282.
An API has been added that lists the collections that the user has access to via the permission passed. See the guides, #6467, and #10906.
An API has been added to get dataset versions including a summary of differences between consecutive versions where available. See the docs, #10888, and #10945.
An API has been added to list the versions of a data file, showing any changes that affected the file with each version. See the guides, #11198 and #11237.
The Search API has a new parameter called show_type_counts. If you set it to true, it will return total_count_per_object_type for the types dataverse, dataset, and files (#11065 and #11082) even if the search result for any given type is 0 (#11127 and #11138).
CRUD operations for external tools are now available for superusers from non-localhost. See the guides, #10930 and #11079.
A new API endpoint has been added that allows a global role to be updated. See the guides and #10612.
An API has been added to send feedback to the collection, dataset, or data file's contacts. If necessary, you can rate limit the CheckRateLimitForDatasetFeedbackCommand and configure the new :ContactFeedbackMessageSizeLimit database setting. See the guides, #11129, and #11162.
/api/metadatablocks no longer returns duplicated metadata properties and no longer omits metadata properties when called. See "Backward Incompatible Changes" below and #10764.
A new query param, returnChildCount, has been added to the getDataverse endpoint (/api/dataverses/{id}) for optionally retrieving the child count, which represents the number of collections, datasets, or files within the collection (direct children only). See also #11255 and #11259.
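Two of the query parameters described in this list, show_type_counts and returnChildCount, might be used as sketched below. The helper names are made up for this example, the value "true" for returnChildCount is an assumption based on its description as an optional flag, and SERVER_URL defaults to the localhost address used elsewhere in these notes.

```shell
# Base URL of the installation (placeholder; adjust to your server).
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a Search API URL that asks for total_count_per_object_type.
# $1 = search query, which must already be URL-encoded.
search_with_type_counts_url() {
  echo "$SERVER_URL/api/search?q=$1&show_type_counts=true"
}

# Build a getDataverse URL that also requests the direct child count.
# $1 = collection id or alias.
collection_with_child_count_url() {
  echo "$SERVER_URL/api/dataverses/$1?returnChildCount=true"
}

# Print both URLs; pass them to curl to perform the actual requests.
search_with_type_counts_url '*'
collection_with_child_count_url root
```

A call such as `curl -s "$(collection_with_child_count_url root)"` would then fetch the collection; the exact shape of the JSON response is not shown here because it is not described in these notes.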

End-Of-Life (EOL) Announcements

PostgreSQL 13 reaches EOL on 13 November 2025

Per https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. Our first step toward moving off version 13 was to switch our testing to version 16, as we've noted in the guides. You are encouraged to start planning your upgrade and may want to review the Dataverse 5.4 release notes as the upgrade process (e.g. pg_dumpall, etc.) will likely be similar. If you notice any bumps along the way, please let us know!

Dataverse developers using Docker have been using PostgreSQL 17 since Dataverse 6.5 (#10912). (Developers not using Docker who are still on PostgreSQL 13 are encouraged to upgrade.) Older or newer versions should work, within reason.

Security

SameSite Cookie Attribute

The SameSite cookie attribute is defined in an upcoming revision to RFC 6265 (HTTP State Management Mechanism) called 6265bis ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict".

"If no SameSite attribute is set, the cookie is treated as Lax by default" by browsers, according to MDN. Previously, Dataverse did not set the SameSite attribute.

New Dataverse installations now explicitly set the SameSite cookie attribute to "Lax" out of the box through the installer (in the case of a "classic" installation) or through an updated base image (in the case of a Docker installation). Classic installations should follow the upgrade instructions below to bring their installation up to date with the behavior for new installations. Docker installations will automatically get the updated base image.

While you are welcome to experiment with "Strict", which is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP cheatsheet, our testing so far indicates that some functionality, such as OIDC login, seems to be incompatible with "Strict".

You should avoid the use of "None" as it is less secure than "Lax". See also the guides, https://github.com/IQSS/dataverse-security/issues/27, #11210, and the upgrade instructions below.

Settings Added

dataverse.feature.enable-version-note
dataverse.csl.common-styles
dataverse.files.featured-items.image-maxsize - It sets the maximum allowed size of the image that can be added to a featured item.
dataverse.files.featured-items.image-uploads - It specifies the name of the subdirectory for saving featured item images within the docroot directory.
dataverse.feature.api-bearer-auth-provide-missing-claims
dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp
:ContactFeedbackMessageSizeLimit
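For the dataverse.* settings above, one common question is what the matching environment variable would be called. The sketch below applies the MicroProfile Config naming rule (non-alphanumeric characters become underscores, then the name is uppercased), assuming these settings are read through MicroProfile Config like other dataverse.* options. The function name is invented for this example, and :ContactFeedbackMessageSizeLimit is a database setting, so the rule does not apply to it.

```shell
# Convert a MicroProfile Config property name to its environment-variable form:
# every character that is not a letter, digit, or underscore becomes "_",
# and the result is uppercased.
mpconfig_env_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | tr 'a-z' 'A-Z'
}

# Print the mapping for two of the settings listed above.
for setting in dataverse.feature.enable-version-note dataverse.csl.common-styles; do
  printf '%s -> %s\n' "$setting" "$(mpconfig_env_name "$setting")"
done
```

Whether an environment variable or a JVM option is the better fit depends on how your installation is deployed; container setups tend to favor environment variables.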

Backward Incompatible Changes

Generally speaking, see the API Changelog for a list of backward-incompatible API changes.

/api/metadatablocks no longer returns duplicated metadata properties and no longer omits metadata properties when called. See #10764.
The JSON response of API call /api/dataverses/{dataverseAlias}/metadatablocks will no longer include the DatasetFieldTypes in "fields" if they are children of another DatasetFieldType. The child DatasetFieldType will only be included in the "childFields" of its parent DatasetFieldType. See #10472 and #11066.
versionNote has been renamed to deaccessionNote. archiveNote has been renamed to deaccessionLink. See #11068.
The Show Role API endpoint was returning 401 Unauthorized when a permission check failed. This has been corrected to return 403 Forbidden instead. That is, the API token is known to be good (401 otherwise) but the user lacks permission (403 is now sent). See also the API Changelog, #10340, and #11116.
Changes to PID formatting occur in the DDI/DDI HTML export formats and the EndNote and BibTeX citation formats. These changes correct errors and improve conformance with best practices but could break parsing of these formats. See #10768, #10769, #11165, and #10790.
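Scripts that call the Show Role API and previously treated 401 as "no permission" may need a small adjustment. The following is a minimal sketch of how a client might distinguish the two cases; the function name and the message wording are illustrative only.

```shell
# Map an HTTP status code returned by the Show Role API to a short explanation.
# $1 = numeric HTTP status code.
explain_role_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "unauthorized: API token missing or not recognized" ;;
    403) echo "forbidden: token is valid but the user lacks permission" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Print the explanation for the status code that is now returned
# when the permission check fails.
explain_role_status 403
```

The status code itself could be obtained with `curl -s -o /dev/null -w '%{http_code}' -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/roles/$ID"`, where the token, server URL, and role id are placeholders.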

Complete List of Changes

For the complete list of code changes in this release, see the 6.6 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.5.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

```shell
export PAYARA=/usr/local/payara6
```

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

```shell
$PAYARA/bin/asadmin list-applications
```

2. Undeploy the previous version (should match "list-applications" above)

```shell
$PAYARA/bin/asadmin undeploy dataverse-6.5
```
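To confirm the name to undeploy, the output of list-applications can be filtered for entries starting with "dataverse". This is a sketch under the assumption that the application name is the first whitespace-separated column of each line; the sample output below is typed by hand for illustration, not captured from a real server.

```shell
# Print the name of each deployed application whose name starts with "dataverse".
# Reads `asadmin list-applications` output on stdin.
deployed_dataverse_apps() {
  awk '$1 ~ /^dataverse/ { print $1 }'
}

# Hand-written sample of what list-applications might print (an assumption).
sample_output='dataverse-6.5  <ejb, web>
Command list-applications executed successfully.'

# Filter the sample; only the application name is printed.
printf '%s\n' "$sample_output" | deployed_dataverse_apps
```

On a real server the function would be fed directly, as in `$PAYARA/bin/asadmin list-applications | deployed_dataverse_apps`, and the printed name is what to pass to undeploy.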

3. Stop Payara

```shell
sudo service payara stop
```

4. Upgrade to Payara 6.2025.2

The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions, as they could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.3.)

Move the current Payara directory out of the way:

```shell
mv $PAYARA $PAYARA.6.2024.6
```

Download the new Payara version 6.2025.2 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.2/payara-6.2025.2.zip), and unzip it in its place:

```shell
cd /usr/local
unzip payara-6.2025.2.zip
```

Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1:

```shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.6.2024.6/glassfish/domains/domain1 payara6/glassfish/domains/
```

5. Download and deploy this version

```shell
wget https://github.com/IQSS/dataverse/releases/download/v6.6/dataverse-6.6.war
$PAYARA/bin/asadmin deploy dataverse-6.6.war
```

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

```shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
```

6. For installations with internationalization or text customizations:

Please remember to update translations via Dataverse language packs.

If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.6/src/main/java/propertyFiles.

7. Decide to enable (or not) the index-harvested-metadata-source feature flag

Decide whether or not to enable the dataverse.feature.index-harvested-metadata-source feature flag described above, in the guides, #10217 and #11217. The reason to decide now is that reindexing is required and the next steps involve restarting Payara and upgrading Solr, which will result in a fresh index.

8. Configure SameSite

To bring your Dataverse installation in line with new installations, as described above and in the guides, we recommend running the following commands:

```
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-value=Lax

./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-enabled=true
```

Please note that "None" is less secure than "Lax" and should be avoided. You can test the setting by inspecting headers with curl, looking at the JSESSIONID cookie for "SameSite=Lax" (yes, it's expected to be repeated, probably due to a bug in Payara) like this:

```
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/;SameSite=Lax;SameSite=Lax
```

Before making the changes above, the SameSite attribute should be absent, like this:

```
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/
```
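The manual inspection above could be turned into a small check, sketched below. The function name is invented for this example and the match is case-sensitive, which is a simplification; the two header lines are the ones shown above.

```shell
# Report the SameSite value carried by a Set-Cookie header line, or "absent".
# $1 = one header line, e.g. the output of the curl | grep command above.
samesite_value() {
  case "$1" in
    *SameSite=Strict*) echo "Strict" ;;
    *SameSite=Lax*)    echo "Lax" ;;
    *SameSite=None*)   echo "None" ;;
    *)                 echo "absent" ;;
  esac
}

# After the change (the attribute may appear twice; one match is enough).
samesite_value 'Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/;SameSite=Lax;SameSite=Lax'
# Before the change.
samesite_value 'Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/'
```

Against a live server, the header could be supplied with `samesite_value "$(curl -s -I http://localhost:8080 | grep JSESSIONID)"`.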

8. Restart Payara

```shell
sudo service payara stop
sudo service payara start
```

9. Update metadata blocks

These changes reflect incremental improvements made to the handling of core metadata fields.

Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```

The 3D Objects metadata block is included in all new installations of Dataverse so we recommend adding it.

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/3d_objects.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file 3d_objects.tsv
```

10. Upgrade Solr

Solr 9.8.0 is now the version recommended in our Installation Guide and used with automated testing. Additionally, due to the new range search support feature and the addition of fields (e.g. versionNote, fileRestricted, canDownloadFile, variableCount, and observations), the default schema.xml file has changed so you must upgrade.

Install Solr 9.8.0 following the instructions from the Installation Guide.

The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
```

10a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
```

Start Solr instance (usually service solr start depending on Solr/OS).

11. Reindex Solr

```shell
curl http://localhost:8080/api/admin/index
```

12. Run reExportAll to update dataset metadata exports

For existing published datasets, additional license metadata will not be available from DataCite or in metadata exports until

the dataset is republished or
the /api/admin/metadata/{id}/reExportDataset is run for the dataset or
the /api/datasets/{id}/modifyRegistrationMetadata API is run for the dataset or
the global versions of these API calls (/api/admin/metadata/reExportAll, /api/datasets/modifyRegistrationPIDMetadataAll) are used.

For this reason, we recommend reexporting all dataset metadata. For more advanced usage, please see the guides.

```shell
curl http://localhost:8080/api/admin/metadata/reExportAll
```

13. (Optional) Re-harvest datasets

The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. For more information, see the guides, #8739, and #9013.

This improves the citation associated with these datasets, but the change only affects newly harvested datasets.

If you would like to pick up this change for existing harvested datasets, you should re-harvest them. This can be accomplished by deleting and re-adding each harvesting client, followed by a harvesting run. You may want to use harvesting client APIs to save (serialize), add, and remove clients.
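The sequence described above can be laid out as a dry-run plan for one harvesting client. This sketch only prints the requests and sends nothing; the function name and the nickname "myClient" are hypothetical, and the endpoint paths are those of the harvesting client APIs as documented in the guides, so please verify them against your version. The JSON body needed to re-create the client is not shown because it depends on your configuration.

```shell
# Print, in order, the requests needed to re-harvest one client. Nothing is sent.
# $1 = harvesting client nickname.
reharvest_plan() {
  local base="/api/harvest/clients/$1"
  echo "GET $base"        # save (serialize) the current client configuration
  echo "DELETE $base"     # remove the client
  echo "POST $base"       # re-add the client from the saved configuration
  echo "POST $base/run"   # start a harvesting run
}

# Print the plan for a client with the made-up nickname "myClient".
reharvest_plan myClient
```

Reviewing the printed plan before running the real requests is a cheap safeguard, since deleting a client is not something you want to do by accident.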

Published by stevenwinship over 1 year ago

dataverse - v6.5

Dataverse 6.5

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.5 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights

Highlights for Dataverse 6.5 include:

new API endpoints, including editing of collections, Search API file counts, listing of exporters, comparing dataset versions, and auditing data files
UX improvements, especially Preview URLs
increased harvesting flexibility
performance gains
a security vulnerability addressed
many bug fixes
and more! Please see below.

Features Added

Private URL Renamed to Preview URL and Improved

The name of the URL that may be used by dataset administrators to share a draft version of a dataset has been changed from Private URL to Preview URL.

Also, additional information about the creation of Preview URLs has been added to the popup accessed via the edit menu of the Dataset Page.

Users of the Anonymous Preview URL will no longer be able to see the name of the Dataverse that the dataset is in but will be able to see the name of the repository.

Any Private URLs created in previous versions of Dataverse will continue to work.

The old "privateUrl" API endpoints for the creation and deletion of Preview (formerly Private) URLs have been deprecated. They will continue to work but please switch to the "previewUrl" equivalents that have been documented in the API Guide.

See also #8184, #8185, #10950, #10961, and #11085.
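For scripts that still call the deprecated "privateUrl" endpoints, the switch can be sketched as a simple text rewrite. This assumes the "previewUrl" equivalents differ from the old paths only in that one path segment, which should be checked against the API Guide; the function name is invented for this example.

```shell
# Rewrite deprecated "privateUrl" endpoint paths to their "previewUrl" equivalents.
# Reads text on stdin and writes the rewritten text to stdout.
to_preview_url() {
  sed 's#/privateUrl#/previewUrl#g'
}

# Rewrite one example command line (single quotes keep the variables unexpanded).
echo 'curl -X POST "$SERVER_URL/api/datasets/$ID/privateUrl"' | to_preview_url
```

Applied to a whole script, this would look like `to_preview_url < old-script.sh > new-script.sh`, with both file names being placeholders.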

Showing Differences Between Dataset Versions is More Scalable

Showing differences between dataset versions, which is done during dataset edit operations and to populate the dataset page versions table, has been made significantly more scalable. See #10814 and #10818.

Version Differences Details Sorting Added

In order to facilitate the comparison between the draft version and the published version of a dataset, a sort on subfields has been added. See #10969.

Reindexing After a Role Assignment is Less Memory Intensive

Adding or removing a user from a role on a collection, particularly the root collection, could lead to a significant increase in memory use, resulting in Dataverse itself failing with an out-of-memory condition. Such changes now consume much less memory. A Solr reindexing step is included in the upgrade instructions below. See also #10697 and #10698.

Longer Custom Questions in Guestbooks

Custom questions in Guestbooks can now be more than 255 characters and the bug causing a silent failure when questions were longer than this limit has been fixed. See also #9492, #10117, #10118.

PostgreSQL and Flyway Updates

This release bumps the version of PostgreSQL and Flyway used in containers as well as the PostgreSQL JDBC driver used in all installations, including classic (non-Docker) installations. PostgreSQL and its driver have been bumped to version 17. Flyway has been bumped to version 10.

PostgreSQL 13 remains the version used with automated testing, leading us to continue to recommend that version for classic installations.

As of Flyway 10, supporting older versions of PostgreSQL no longer requires a paid subscription. While we don't encourage the use of older PostgreSQL versions, this flexibility may benefit some of our long-standing installations in their upgrade paths.

As part of this update, the containerized development environment now uses Postgres 17 instead of 16. Developers must delete their data (rm -rf docker-dev-volumes) and start with an empty database (rerun the quickstart in the dev guide), as explained on the dev mailing list.

The Docker compose file used for evaluations or demos has been upgraded from Postgres 13 to 17.

Harvesting "oai_dc" Metadata Prefix When Extended With Specific Namespaces

Some data repositories extend the "oai_dc" metadata prefix with specific namespaces. In this case, harvesting of these datasets into Dataverse was not possible because an XML parsing error was raised.

Harvesting of these datasets has been fixed by excluding tags with namespaces that are not "dc:". That is, only metadata in the "dc" namespace is harvested. See #10837.
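The idea of keeping only "dc" elements can be illustrated with a line-based filter. This is not how Dataverse implements the fix; it is a toy sketch that assumes one element per line, which real OAI-PMH responses do not guarantee. The "ex" namespace and the sample record are made up.

```shell
# Keep only lines holding elements in the "dc" namespace, plus the oai_dc wrapper.
keep_dc_elements() {
  grep -E '^[[:space:]]*</?(dc:|oai_dc:dc)'
}

# Made-up record that mixes "dc" elements with one from another namespace.
sample='<oai_dc:dc>
  <dc:title>Example dataset</dc:title>
  <ex:custom>ignored</ex:custom>
  <dc:creator>Doe, Jane</dc:creator>
</oai_dc:dc>'

# Print the record without the <ex:custom> line.
printf '%s\n' "$sample" | keep_dc_elements
```

A real implementation would use a namespace-aware XML parser rather than pattern matching on lines.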

Harvested Dataset PID from Record Header

When harvesting, Dataverse can now use the identifier from the OAI-PMH record header as the persistent id for the harvested dataset.

This will allow harvesting from sources that do not include a persistent id in their oai_dc metadata records, but use valid DOIs or handles as the OAI-PMH record header identifiers.

It is also possible to optionally configure a harvesting client to use this OAI-PMH identifier as the preferred choice for the persistent id. See the Harvesting Clients API section of the Guides, #11049 and #10982 for more information.

Harvested Datasets Can Have Multiple "otherId" Values

When harvesting using the DDI format, datasets can now have multiple "otherId" values. See #10772.

Multiple Languages in Docker

Documentation has been added to explain how to set up multiple languages (e.g. English and French) in the tutorial for setting up Dataverse in Docker.

See the tutorial, #10939, and #10940.

GlobusBatchLookupSize

An optimization has been added for the Globus upload workflow, with a corresponding new database setting: :GlobusBatchLookupSize

See the Database Settings section of the guides, #10977, and #11040 for more information.

Bugs Fixed

Relation Type (Related Publication) and DataCite

The subfield "Relation Type" was added to the field "Related Publication" in Dataverse 6.4 (#10632) but couldn't be used without workarounds described in an announcement about the problem. The bug has been fixed and workarounds are no longer required. See #10926 and the announcement above.

Sort Order for Files

"Newest" and "Oldest" were reversed when sorting files on the dataset landing page. This has been fixed. See #10742 and #11000.

Guestbook Email Validation

In the Guestbook UI form, the email address is now checked for validity. See #10661 and #11022.

Updating Files Now Possible When Latest and Only Dataset Version is Deaccessioned

When the latest and only version of a dataset had been deaccessioned, trying to update its files caused an error. This has been fixed. See #9351 and #10901.

My Data Filter by Username Feature Restored

The superuser-only feature of filtering by a username on the My Data page was not working. Entering a username in the "Results for Username" field now returns data for the desired user. See also #7239 and #10980.

Better Handling of Parallel Edit/Publish Errors

Improvements have been made in handling the errors when a dataset has been edited in one browser window and an attempt is made to edit or publish it in another. (This practice is discouraged, by the way.) See #10793 and #10794.

Facets Filter Labels Now Translated Above Search Results

On the main page, it's possible to filter results using search facets. If internationalization (i18n) has been enabled in the Dataverse installation, allowing pages to be displayed in several languages, the facets were correctly translated in the filter column at the left. However, they were not being translated above the search results, remaining in the default language, English. This has been fixed. See #9408 and #10158.

Unpublished File Bug Fix Related to Deaccessioning

A bug fix was made related to retrieval of the major version of a Dataset when all major versions were deaccessioned. This fixes files being incorrectly shown as "Unpublished" in the search list even when they are published. In the upgrade instructions below, there is a step to reindex Solr. See also #10947 and #10974.

Minor DataCiteXML Fix (Useless Null)

A minor bug fix was made to avoid sending a useless ", null" in the DataCiteXML sent to DataCite and in the DataCite export when a dataset has a metadata entry for "Software Name" and no entry for "Software Version". The bug fix will update datasets upon publication. Existing published datasets with this problem can be fixed by pushing updated metadata to DataCite for the affected datasets and re-exporting the dataset metadata. See "Pushing updated metadata to DataCite" in the upgrade instructions below. See also #10919.

PIDs and Make Data Count Citation Retrieval

Make Data Count (MDC) citation retrieval with the PID settings has been fixed. PID parsing in Dataverse is now case insensitive, improving interaction with services that may change the case of PIDs. Warnings related to managed/excluded PID lists for PID providers have been reduced. See #10708.
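To illustrate what case-insensitive PID handling means in practice, the sketch below lowercases two spellings of an identifier before comparing them. The PIDs are made up and normalize_pid is a hypothetical helper; this is only an illustration, not how Dataverse parses PIDs internally.

```shell
#!/bin/sh
# Lowercase a PID so that two spellings of the same identifier compare equal.
normalize_pid() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical PIDs that differ only in letter case.
a='doi:10.5072/FK2/ABC123'
b='DOI:10.5072/fk2/abc123'

if [ "$(normalize_pid "$a")" = "$(normalize_pid "$b")" ]; then
  echo "same PID"
else
  echo "different PIDs"
fi
```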

Quirk in Overview Display When Using External Controlled Variables

This bug fix corrects an issue where duplicated entries appeared on the metadata page. It was fixed by correcting an IF-clause in metadataFragment.xhtml. See #11005 and #11034.

Globus "missing properties" Logging Fixed

In previous releases, logging would show Globus-related strings were missing from properties files. This has been fixed. See #11030.

API Updates

Editing Collections

A new endpoint (PUT /api/dataverses/<identifier>) for updating an existing collection (dataverse) has been added. It uses the same JSON structure as the one used for collection creation. See also the docs, #10904, and #10925.
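As a rough sketch of how the new endpoint might be called, the example below defines a small helper around curl. The server URL, the API token, the "science" alias, and the payload values are all placeholders, and the helper names are hypothetical. The payload reuses fields from the collection creation JSON; please check the docs for the full list of supported fields.

```shell
#!/bin/sh
# Placeholder values; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"

# Build the URL of the update endpoint for a collection alias or id.
collection_update_url() {
  printf '%s/api/dataverses/%s\n' "$SERVER_URL" "$1"
}

# Send a JSON file (same structure as for collection creation) with PUT.
update_collection() {
  curl -X PUT -H "X-Dataverse-key:$API_TOKEN" \
    "$(collection_update_url "$1")" --upload-file "$2"
}

# Example payload with made-up values.
cat > updated-collection.json <<'EOF'
{
  "name": "Scientific Research",
  "alias": "science",
  "dataverseContacts": [
    { "contactEmail": "pi@example.edu" }
  ],
  "affiliation": "Scientific Research University",
  "description": "We do all the science.",
  "dataverseType": "LABORATORY"
}
EOF

# Show the URL that would be used; nothing is sent here.
collection_update_url science
```

Against a running server, the request itself would be sent with update_collection science updated-collection.json.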

fileCount Added to Search API

A new search field called fileCount can be searched to discover the number of files per dataset. The upgrade instructions below explain how to update your Solr schema.xml file to add the new field and reindex Solr. See also #8941 and #10598.
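Once the schema is updated and Solr is reindexed, a query on the new field might look like the sketch below. It assumes that fileCount accepts a standard Solr range query, which is an assumption on our part; the server URL and helper names are placeholders.

```shell
#!/bin/sh
# Placeholder server URL; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a Solr-style range query on fileCount; pass '*' for an open bound.
file_count_query() {
  printf 'fileCount:[%s TO %s]\n' "$1" "$2"
}

# Search for datasets in the given range. -G makes curl send the
# urlencoded values as a query string on a GET request.
search_by_file_count() {
  curl -G "$SERVER_URL/api/search" \
    --data-urlencode "q=$(file_count_query "$1" "$2")" \
    --data-urlencode "type=dataset"
}

# Show the query only; nothing is sent here.
file_count_query 100 '*'
```

Against a running server, search_by_file_count 100 '*' would look for datasets with at least 100 files.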

List Dataset Metadata Exporters

A list of available dataset metadata exporters can now be retrieved programmatically via API. See the docs and #10739.

Comparing Dataset Versions

An API has been added to compare dataset versions. See the docs, #10888, and #10945.

Audit Data Files

A superuser-only API endpoint has been added to audit datasets with data files where the physical files are missing or the file metadata is missing. See the docs, #11016, and #220.

Update Collection API Inheritance

The update collection (dataverse) API endpoint has been updated to support an "inherit from parent" configuration for metadata blocks, facets, and input levels.

Previously, not setting these fields meant using a copy of the settings from the parent collection, which could get out of sync. See also the docs, #11018, and #11026.

isMetadataBlockRoot and isFacetRoot

The JSON payload of the "get collection" endpoint has been extended to include properties isMetadataBlockRoot and isFacetRoot. See also the docs, #11012, and #11013.

Whitespace Trimming When Loading Metadata Block TSV Files

When loading custom metadata blocks using the api/admin/datasetfield/load API endpoint, whitespace can be introduced into field names. Whitespace is now trimmed from the beginning and end of all values read into the API before persisting them. See #10688 and #10696.
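If you would like to see what such trimming does to your own TSV file before loading it, a small awk filter can approximate it. This is only a sketch of the idea, not the Dataverse implementation; trim_tsv is a hypothetical helper, and it strips spaces only, since tabs are the field separators.

```shell
#!/bin/sh
# Trim leading and trailing spaces from every tab-separated value on stdin.
trim_tsv() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
       {
         for (i = 1; i <= NF; i++) {
           gsub(/^ +| +$/, "", $i)
         }
         print
       }'
}

# Made-up row with stray spaces around two of its values.
printf 'datasetField\t title \tTitle \n' | trim_tsv
```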

Image URLs from the Search API

As of 6.4 (#10855) image_url is being returned from the Search API. The logic has been updated to only show the image if all of the following are true:

The data file is not harvested
A thumbnail is available for the data file
If the data file is restricted, then the caller must have DownloadFile permission for the data file
The data file is NOT actively embargoed
The data file's retention period has NOT expired

Metrics API Bug Fixes

Two bugs in the Metrics API have been fixed:

The /datasets and /datasets/byMonth endpoints could report incorrect values when called with the "dataLocation" parameter (which allows getting metrics for local, remote (harvested), or all datasets), because the metrics cache was not storing different values for these cases.
Metrics endpoints whose calculation relied on finding the latest published dataset version returned incorrect values when the minor version number was greater than 9.

The upgrade instructions below include a step for clearing the metrics cache.
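These notes do not spell out the cause of the second bug. One common way for a symptom like this to arise is ordering version numbers as text rather than as numbers, and the sketch below shows that difference on a made-up list of versions. It is an illustration only, not the Dataverse code.

```shell
#!/bin/sh
# Hypothetical version list for one dataset, as major.minor.
versions='1.2
1.9
1.10'

# Text ordering compares character by character, so "1.10" sorts before "1.9".
latest_as_text=$(printf '%s\n' "$versions" | LC_ALL=C sort | tail -n 1)

# Numeric ordering on each dot-separated field treats 10 as greater than 9.
latest_as_numbers=$(printf '%s\n' "$versions" | LC_ALL=C sort -t . -k 1,1n -k 2,2n | tail -n 1)

echo "text ordering picks:    $latest_as_text"
echo "numeric ordering picks: $latest_as_numbers"
```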

API Tokens

An optional query parameter called "returnExpiration" has been added to the /api/users/token/recreate endpoint, which, if set to true, returns the expiration time in the response. See the docs, #10857 and #10858.

The /api/users/token endpoint has been extended to support any auth mechanism for retrieving the token information. Previously this endpoint only accepted an API token to retrieve its information. Now it accepts any authentication mechanism and returns the associated API token information. See #10914 and #10924.
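The two token endpoints above could be exercised roughly as follows. The server URL and token are placeholders and the helper names are hypothetical. The helpers are only defined, not called, because recreating a token replaces the existing one.

```shell
#!/bin/sh
# Placeholder values; replace with your own server and token.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"

# Build the recreate URL, optionally asking for the expiration time.
token_recreate_url() {
  if [ "${1:-false}" = "true" ]; then
    printf '%s/api/users/token/recreate?returnExpiration=true\n' "$SERVER_URL"
  else
    printf '%s/api/users/token/recreate\n' "$SERVER_URL"
  fi
}

# Recreate the token and include its expiration time in the response.
recreate_token_with_expiration() {
  curl -X POST -H "X-Dataverse-key:$API_TOKEN" "$(token_recreate_url true)"
}

# Retrieve information about the current token.
show_token_info() {
  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/users/token"
}

# Show the URL only; nothing is sent here.
token_recreate_url true
```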

Settings Added

:GlobusBatchLookupSize

Backward Incompatible Changes

Generally speaking, see the API Changelog for a list of backward-incompatible API changes.

List Collections Linked to a Dataset

The API endpoint that returns a list of collections that a dataset has been linked to has been improved to provide a more structured JSON response. See the docs, #9650, and #9665.

Complete List of Changes

For the complete list of code changes in this release, see the 6.5 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.4.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.

Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

shell export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. List deployed applications

shell $PAYARA/bin/asadmin list-applications

2. Undeploy the previous version (should match "list-applications" above)

shell $PAYARA/bin/asadmin undeploy dataverse-6.4

3. Stop and start Payara

shell sudo service payara stop
shell sudo service payara start

4. Download and deploy this version

shell wget https://github.com/IQSS/dataverse/releases/download/v6.5/dataverse-6.5.war
shell $PAYARA/bin/asadmin deploy dataverse-6.5.war

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

shell sudo service payara stop
shell sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
shell sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
shell sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases

5. For installations with internationalization:

Please remember to update translations via Dataverse language packs.

6. Restart Payara

shell sudo service payara stop
shell sudo service payara start

7. Update Solr schema.xml file

Start with the standard v6.5 schema.xml, then, if your installation uses any custom or experimental metadata blocks, update it to include the extra fields (step 7a).

Run the commands below as a non-root user.

Stop Solr (usually sudo service solr stop, depending on Solr installation/OS, see the Installation Guide).

shell sudo service solr stop

Replace schema.xml

Please note that the path to Solr may differ from the example below.

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.5/conf/solr/schema.xml
shell sudo cp schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf

Start Solr (but if you use any custom metadata blocks, perform the next step, 7a first).

shell sudo service solr start

7a. For installations with custom or experimental metadata blocks:

Before starting Solr, update the schema.xml file to include all the extra metadata fields that your installation uses.

We do this by collecting the output of Dataverse's Solr schema API endpoint (/api/admin/index/solr/schema) and piping it to the update-fields.sh script which updates the schema.xml file supplied as an argument.

The example below assumes the default installation location of Solr, but you can modify the commands as needed.

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.5/conf/solr/update-fields.sh
shell chmod +x update-fields.sh
shell curl "http://localhost:8080/api/admin/index/solr/schema" | sudo ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml

Now start Solr.

shell sudo service solr start

8. Reindex Solr

Below is the simplest way to reindex Solr:

shell curl http://localhost:8080/api/admin/index

The API above rebuilds the existing index. If you want to be absolutely sure that your index is up-to-date and consistent, you may consider wiping it clean and reindexing everything from scratch (see the guides). Just note that, depending on the size of your database, a full reindex may take a while and the users will be seeing incomplete search results during that window.
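For reference, the wipe-and-reindex approach could be sketched as below. The endpoints are taken from our recollection of the Solr Search Index section of the Admin Guide, so please confirm them against the guides for your version before running anything. The function is only defined here, not called, since search results disappear as soon as the index is cleared.

```shell
#!/bin/sh
# Placeholder server URL; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

CLEAR_URL="$SERVER_URL/api/admin/index/clear"
TIMESTAMPS_URL="$SERVER_URL/api/admin/index/timestamps"
REINDEX_URL="$SERVER_URL/api/admin/index"

# Clear the Solr index, reset the index timestamps in the database,
# then start an asynchronous reindex. Stops at the first failing call.
full_reindex() {
  curl "$CLEAR_URL" &&
    curl -X DELETE "$TIMESTAMPS_URL" &&
    curl "$REINDEX_URL"
}

echo "full_reindex would call, in order:"
printf '  %s\n' "GET $CLEAR_URL" "DELETE $TIMESTAMPS_URL" "GET $REINDEX_URL"
```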

9. Run reExportAll to update dataset metadata exports

Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.

shell curl http://localhost:8080/api/admin/metadata/reExportAll

10. Clear metrics cache

Run the clearMetricsCache API endpoint to remove old cached values that may be incorrect.

shell curl -X DELETE http://localhost:8080/api/admin/clearMetricsCache

11. Pushing updated metadata to DataCite

(If you don't use DataCite, you can skip this. Also, if you aren't affected by the "useless null" bug described above, you can skip this.)

Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):

curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll

This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using the following command:

grep 'Failure for id' server.log

Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the Reserve a PID API call and the newly documented /api/datasets/<id>/modifyRegistration call respectively. See https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.

PIDs can also be updated by a superuser on a per-dataset basis using

curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata

Published by ofahimIQSS over 1 year ago

dataverse - v6.4

Dataverse 6.4

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.4 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

New features in Dataverse 6.4:

Enhanced DataCite Metadata, including "Relation Type"
All ISO 639-3 languages are now supported
There is now a button for "Unlink Dataset"
Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time
Datasets can now have types such as "software" or "workflow"
Croissant support
RO-Crate support
and more! Please see below.

New client library:

Rust

This release also fixes two important bugs described below and in a post on the mailing list:

"Update Current Version" can cause metadata loss
Publishing breaks designated dataset thumbnail, messes up collection page

Additional details on the above as well as many more features and bug fixes included in the release are described below. Read on!

Features Added

Enhanced DataCite Metadata, Including "Relation Type"

Within the "Related Publication" field, a new subfield has been added called "Relation Type" that allows for the most common values recommended by DataCite: IsCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed.

Dataverse now supports the DataCite v4.5 schema. Additional metadata is now being sent to DataCite including metadata about related publications and files in the dataset. Improved metadata is being sent including how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata are represented. The enhanced metadata will automatically be sent to DataCite when datasets are created and published. Additionally, after publication, you can inspect what was sent by looking at the DataCite XML export.

The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see #10632, #10615 and the design document referenced there.

Multiple backward incompatible changes and bug fixes have been made to API calls (three of four of which were not documented) related to updating PID target URLs and metadata at the provider service:

Update Target URL for a Published Dataset at the PID provider
Update Target URL for all Published Datasets at the PID provider
Update Metadata for a Published Dataset at the PID provider
Update Metadata for all Published Datasets at the PID provider

Full List of ISO 639-3 Languages Now Supported

The controlled vocabulary values list for the metadata field "Language" in the citation block has now been extended to include roughly 7920 ISO 639-3 values.

Some of the language entries in the pre-6.4 list correspond to "macro languages" in ISO-639-3 and admins/users may wish to update to use the corresponding individual language entries from ISO-639-3. As these cases are expected to be rare (they do not involve major world languages), finding them is not covered in the release notes. Anyone who desires help in this area is encouraged to reach out to the Dataverse community via any of the standard communication channels.

ISO 639-3 codes were downloaded from sil.org and the file used for merging with the existing citation.tsv was "iso-639-3.tab". See also #8578 and #10762.

Unlink Dataset Button

A new "Unlink Dataset" button has been added to the dataset page to allow a user to unlink a dataset from a collection. To unlink a dataset the user must have permission to link the dataset. Additionally, the existing API for unlinking datasets has been updated to no longer require superuser access as the "Publish Dataset" permission is now enough. See also #10583 and #10689.

Pre-Publish File DOI Reservation

Dataverse installations using DataCite as a persistent identifier (PID) provider (or other providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded (rather than at publication time). Note that reserving file DOIs can slow uploads with large numbers of files so administrators may need to adjust timeouts (specifically any Apache "ProxyPass / ajp://localhost:8009/ timeout=" setting in the recommended Dataverse configuration). See also #7334.

Initial Support for Dataset Types

Out of the box, all datasets now have the type "dataset" but superusers can add additional types. At this time the type of a dataset can only be set at creation time via API. The types "dataset", "software", and "workflow" (just those three, for now) will be sent to DataCite (as resourceTypeGeneral) when the dataset is published.

For details see the guides, #10517 and #10694. Please note that this feature is highly experimental and is expected to evolve.

Croissant Support (Metadata Export)

A new metadata export format called Croissant is now available as an external metadata exporter. It is oriented toward making datasets consumable by machine learning.

For more about the Croissant exporter, including installation instructions, see https://github.com/gdcc/exporter-croissant. See also #10341, #10533, and discussion on the mailing list.

Please note: the Croissant exporter works best with Dataverse 6.2 and higher (where it updates the content of <head> as described in the guides) but can be used with 6.0 and higher (to get the export functionality).

RO-Crate Support (Metadata Export)

Dataverse now supports RO-Crate as a metadata export format. This functionality is not available out of the box, but you can enable one or more RO-Crate exporters from the list of external exporters. See also #10744 and #10796.

Rust API Client Library

A Dataverse API client library for the Rust programming language is now available at https://github.com/gdcc/rust-dataverse and has been added to the list of client libraries in the API Guide. See also #10758.

Collection Thumbnail Logo for Featured Collections

Collections can now have a thumbnail logo that is displayed when the collection is configured as a featured collection. If present, this thumbnail logo is shown. Otherwise, the collection logo is shown. Configuration is done under the "Theme" for a collection as explained in the guides. See also #10291 and #10433.

Saved Searches Can Be Deleted

Saved searches can now be deleted via API. See the Saved Search section of the API Guide, #9317 and #10198.

Notification Email Improvement

When notification emails are sent the part of the closing that says "contact us for support at" will now show the support email address (dataverse.mail.support-email), when configured, instead of the default system email address. Using the system email address here was particularly problematic when it was a "noreply" address. See also #10287 and #10504.

Ability to Disable Automatic Thumbnail Selection

It is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the feature flag dataverse.feature.disable-dataset-thumbnail-autoselect. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. See also #10820.

More Flexible PermaLinks

The configuration setting dataverse.pid.*.permalink.base-url, which is used for PermaLinks, has been updated to support greater flexibility. Previously, the string /citation?persistentId= was automatically appended to the configured base URL. With this update, the base URL will now be used exactly as configured, without any automatic additions. See also #10775.

Globus Async Framework

A new alternative implementation of Globus polling during upload data transfers has been added in this release. This experimental framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide. See also #10623 and #10781.

CVoc (Controlled Vocabulary): Allow ORCID and ROR to Be Used Together in Author Field

Changes in Dataverse and updates to the ORCID and ROR external vocabulary scripts support deploying these for the citation block author field (and others). See also #10711, #10712, and https://github.com/gdcc/dataverse-external-vocab-support/pull/22.

Development on Windows

New instructions have been added for developers on Windows trying to run a Dataverse development environment using Windows Subsystem for Linux (WSL). See the guides, #10606, and #10608.

Experimental Crossref PID (DOI) Provider

Crossref can now be used as a PID (DOI) provider, but this feature is experimental. Please provide feedback through the usual channels. See also the guides, #8581, and #10806.

Improved JSON Schema Validation for Datasets

JSON Schema validation has been enhanced with checks for required and allowed child objects as well as type checking for field types including primitive, compound and controlledVocabulary. More user-friendly error messages help pinpoint the issues in the dataset JSON. See Retrieve a Dataset JSON Schema for a Collection in the API Guide, #10169, and #10543.

Counter Processor 1.05 Support (Make Data Count)

Counter Processor 1.05 is now supported for use with Make Data Count. If you are running Counter Processor, you should reinstall/reconfigure it as described in the latest guides. Note that Counter Processor 1.05 requires Python 3, so you will need to follow the full Counter Processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse. See also #10479.

Version Tags for Container Base Images

With this release we introduce a detailed maintenance workflow for our container images. As output of the Containerization Working Group, the community takes another step towards production ready containers available directly from the core project.

The maintenance workflow regularly updates the Container Base Image, which contains the operating system, Java, Payara, and tools and libraries required by the Dataverse application. Shipping these rolling releases as well as immutable revisions is the foundation for secure and reliable Dataverse Application Container images. See also #10478 and #10827.

Bugs Fixed

Update Current Version

A significant bug in the superuser-only Update Current Version publication option was fixed. If the "Update Current Version" option was used when changes were made to the dataset terms (rather than to dataset metadata) or if the PID provider service was down or returned an error, the update would fail, rendering the dataset unusable and requiring restoration from a backup. The fix in this release allows the update to succeed in both of these cases and redesigns the functionality such that any unknown issues should not make the dataset unusable (i.e. the error would be reported and the dataset would remain in its current state, with the last-published version as it was and changes still in the draft version).

If you do not plan to upgrade to Dataverse 6.4 right away, you are encouraged to alert your superusers to this issue (see this post). Here are some workarounds for pre-6.4 versions:

Change the "dataset.updateRelease" entry in the Bundle.properties file (or local language version) to "Do Not Use" or similar (this doesn't disable the button but alerts superusers to the issue), or
Edit the dataset.xhtml file to remove the lines below, delete the contents of the generated and osgi-cache directories in the Dataverse Payara domain, and restart the Payara server. This will remove the "Update Current Version" from the UI.

<c:if test="#{dataverseSession.user.isSuperuser()}"> <f:selectItem rendered="#" itemLabel="#{bundle['dataset.updateRelease']}" itemValue="3" /> </c:if>

Again, the workarounds above are only for pre-6.4 versions. The bug has been fixed in Dataverse 6.4. See also #10797.

Broken Thumbnails

Dataverse 6.3 introduced a bug where publishing would break the dataset thumbnail, which in turn broke the rendering of the parent collection (dataverse) page.

This bug has been fixed but any existing broken thumbnails must be fixed manually. See "clearThumbnailFailureFlag" in the upgrade instructions below.

Additionally, it is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the feature flag <jvm-options>-Ddataverse.feature.disable-dataset-thumbnail-autoselect=true</jvm-options>. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image.

See also #10819, #10820, and the post on the mailing list.

No License, No Terms of Use

When datasets have neither a license nor custom terms of use, the dataset page will now indicate this. Also, these datasets will no longer be indexed as having custom terms. See also #8796, #10513, and #10614.

CC0 License Bug Fix

At a high level, some datasets have been mislabeled as "Custom License" when they should have been "CC0 1.0". This has been corrected.

In Dataverse 5.10, datasets with only "CC0 Waiver" in the "termsofuse" field were converted to "Custom License" (instead of the CC0 1.0 license) through a SQL migration script (see #10634). On deployment of Dataverse 6.4, a new SQL migration script will be run automatically to correct this, changing these datasets to CC0. You can review the script in #10634; it only affects datasets that meet all of the following conditions:

The existing "Terms of Use" must be equal to "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions: CC0 Waiver" (this was set in #10634).
The following terms fields must be empty: Confidentiality Declaration, Special Permissions, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer.
The license ID must not be assigned.

The script will set the license ID to that of the CC0 1.0 license and remove the contents of "termsofuse" field. See also #9081 and #10634.

Remap oai_dc Export and Harvesting Format Fields: dc:type and dc:date

The oai_dc export and harvesting format has had the following fields remapped:

dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped to the field "Publication Date" or the field used for the citation date, if set (see Set Citation Date Field Type for a Dataset).

In order for these changes to be reflected in existing datasets, a reexport all should be run (mentioned below). See #8129 and #10737.

Zip File No Longer Misdetected as Shapefile (Hidden Directories)

When detecting file types, Dataverse would previously detect a zip file as a shapefile if it contained markers of a shapefile in hidden directories. These hidden directories are now ignored when deciding if a zip file is a shapefile or not. See also #8945 and #10627.

External Controlled Vocabulary

This release fixes a bug (introduced in v6.3) in the external controlled vocabulary mechanism that could cause indexing to fail (with a NullPointerException) when a script is configured for one child field and no other child fields were managed. See also #10869 and #10870.

Valid JSON in Error Response

When any ApiBlockingFilter policy applies to a request, the JSON in the body of the error response is now valid JSON. See also #10085.

Docker Container Base Image Security and Compatibility

Switch "wait-for" to "wait4x", aligned with the Configbaker Image
Update "jattach" to v2.2
Install AMD64 / ARM64 versions of tools as necessary
Run base image as unprivileged user by default instead of root - this was an oversight from OpenShift changes
Linux User, Payara Admin and Domain Master passwords:
- Print hints about default, public knowledge passwords in place for
- Enable replacing these passwords at container boot time
Enable building with updated Temurin JRE image based on Ubuntu 24.04 LTS
Fix entrypoint script troubles with pre- and postboot script files
Unify location of files at CONFIG_DIR=/opt/payara/config, avoid writing to other places

Cleanup of Temp Directories

In this release we addressed an issue where copies of files uploaded via the UI were left in one specific temp directory (.../domain1/uploads by default). We would like to remind all installation admins that it is strongly recommended to have some automated (and aggressive) cleanup mechanisms in place for all the temp directories used by Dataverse. Note that, even with this fix in place, PrimeFaces will still leave a large number of small log files in that location. For example, at Harvard/IQSS we have the following configuration for the PrimeFaces uploads directory above:

Instead of the default location (.../domain1/uploads) we use a directory on a dedicated partition, outside of the filesystem where Dataverse is installed, via the following JVM option:

<jvm-options>-Ddataverse.files.uploads=/uploads/web</jvm-options>

and we have a dedicated cronjob that runs every 30 minutes and deletes everything older than 2 hours in that directory:

15,45 * * * * /bin/find /uploads/web/ -mmin +119 -type f -name "upload*" -exec rm -f {} \; > /dev/null 2>&1
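Before installing a cron entry like the one above, it can help to preview what it would delete. The following is a minimal sketch that uses the same find expression with -print instead of -exec rm. The function name and the UPLOAD_DIR and MAX_AGE_MIN variables are illustrative and not part of Dataverse.

```shell
#!/bin/sh
# List (without deleting) the files that the cron entry above would remove.
# UPLOAD_DIR and MAX_AGE_MIN are illustrative defaults; adjust to your setup.
UPLOAD_DIR="${UPLOAD_DIR:-/uploads/web}"
MAX_AGE_MIN="${MAX_AGE_MIN:-119}"

list_stale_uploads() {
    # $1 = directory, $2 = age threshold in minutes.
    # -mmin +N matches files last modified more than N minutes ago.
    find "$1" -type f -name 'upload*' -mmin +"$2" -print
}

# Only scan when the directory exists, so the script is safe to run anywhere.
if [ -d "$UPLOAD_DIR" ]; then
    list_stale_uploads "$UPLOAD_DIR" "$MAX_AGE_MIN"
fi
```

Once the listing looks right, the same expression can be used with the delete action as in the cron entry.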

Trailing Commas in Author Name Now Permitted

When an author name ended in a comma (e.g. "Smith," or "Smith, " with a trailing space), the dataset page broke after publishing (users saw a "500" error page). The underlying issue, which broke the JSON-LD Schema.org output on the page, has been fixed. See #10343 and #10776.

API Updates

Search API: affiliation, parentDataverseName, image_url, etc.

The Search API (/api/search) response now includes additional fields, depending on the type.

For collections (dataverses):

"affiliation"
"parentDataverseName"
"parentDataverseIdentifier"
"image_url" (optional)

javascript "items": [ { "name": "Darwin's Finches", ... "affiliation": "Dataverse.org", "parentDataverseName": "Root", "parentDataverseIdentifier": "root", "image_url":"/api/access/dvCardImage/{identifier}" (etc, etc)

For datasets:

"image_url" (optional)

javascript "items": [ { ... "image_url": "http://localhost:8080/api/datasets/2/logo" ... (etc, etc)

For files:

"releaseOrCreateDate"
"image_url" (optional)

javascript "items": [ { "name": "test.png", ... "releaseOrCreateDate": "2016-05-10T12:53:39Z", "image_url":"/api/access/datafile/42?imageThumb=true" (etc, etc)

These examples are also shown in the Search API section of the API Guide.

The image_url field was already part of the SolrSearchResult JSON (and incorrectly appeared in Search API documentation), but it wasn't returned by the API because it was appended only after the Solr query was executed in the SearchIncludeFragment of JSF (the old/current UI framework). Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available.

The Solr schema.xml file has been updated to include a new field called "dvParentAlias" for supporting the new response field "parentDataverseIdentifier". See upgrade instructions below.
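As a rough illustration of how to see the new fields, the sketch below builds a Search API URL using the standard q and type query parameters. The server URL and the example query are assumptions, and the helper function names are hypothetical.

```shell
#!/bin/sh
# Assumed base URL; replace with your installation's URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a Search API URL.
# $1 = query (URL-encode it first if it contains special characters)
# $2 = result type: dataverse, dataset, or file
search_url() {
    printf '%s/api/search?q=%s&type=%s\n' "$SERVER_URL" "$1" "$2"
}

# Fetch results. Defined but not invoked here, since it needs a running server.
search() {
    curl -s "$(search_url "$1" "$2")"
}

# Print the URL for a collection search.
search_url finches dataverse
```

Running `search finches dataverse` against a 6.4 installation should return items that carry the collection fields listed above, provided matching collections exist.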

Search API: publicationStatuses

The Search API (/api/search) response will now include publicationStatuses in the JSON response as long as the list is not empty.

Example:

javascript "items": [ { "name": "Darwin's Finches", ... "publicationStatuses": [ "Unpublished", "Draft" ], (etc, etc)

Search Facet Information Exposed

A new endpoint /api/datasetfields/facetables lists all facetable dataset fields defined in the installation, as described in the guides.

A new optional query parameter "returnDetails" has been added to the /api/dataverses/{identifier}/facets/ endpoint to include detailed information about each DataverseFacet, as described in the guides. See also #10726 and #10727.

User Permissions on Collections

A new endpoint at /api/dataverses/{identifier}/userPermissions for obtaining the user permissions on a collection (dataverse) has been added. See also the guides, #10749 and #10751.
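A minimal sketch of calling this endpoint with curl follows. The server URL, the collection alias "root", and the API_TOKEN environment variable are placeholders; the X-Dataverse-key header is the same one used for other authenticated calls in these notes. The function names are hypothetical.

```shell
#!/bin/sh
# Assumed base URL; replace with your installation's URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# URL of the userPermissions endpoint for a collection alias ($1).
user_permissions_url() {
    printf '%s/api/dataverses/%s/userPermissions\n' "$SERVER_URL" "$1"
}

# Call the endpoint as the user who owns API_TOKEN. Defined but not invoked
# here, since it needs a running server and a valid token.
get_user_permissions() {
    curl -s -H "X-Dataverse-key:$API_TOKEN" "$(user_permissions_url "$1")"
}

# Print the URL for the example collection.
user_permissions_url root
```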

addDataverse Extended

The addDataverse (/api/dataverses/{identifier}) API endpoint has been extended to allow adding metadata blocks, input levels and facet IDs at creation time, as the Dataverse page in create mode does in JSF. See also the guides, #10633 and #10644.

Metadata Blocks and Display on Create

The /api/dataverses/{identifier}/metadatablocks endpoint has been fixed to not return fields marked as displayOnCreate=true if there is an input level with include=false, when query parameters returnDatasetFieldTypes=true and onlyDisplayedOnCreate=true are set. See also #10741 and #10767.

The fields "depositor" and "dateOfDeposit" in the citation.tsv metadata block file have been updated to have the property "displayOnCreate" set to TRUE. In practice, only the API is affected because the UI has special logic that already shows these fields when datasets are created. See also #10850 and #10884.

Feature Flags Can Be Listed

It is now possible to list all feature flags and see if they are enabled or not. See also the guides and #10732.

Settings Added

The following settings have been added:

dataverse.feature.disable-dataset-thumbnail-autoselect
dataverse.feature.globus-use-experimental-async-framework
dataverse.files.globus-monitoring-server
dataverse.pid.*.crossref.url
dataverse.pid.*.crossref.rest-api-url
dataverse.pid.*.crossref.username
dataverse.pid.*.crossref.password
dataverse.pid.*.crossref.depositor
dataverse.pid.*.crossref.depositor-email

Backward Incompatible Changes

The oaidc export format has changed. See the "Remap oaidc" section above.
Several APIs related to DataCite have changed. See "More and Better Data Sent to DataCite" above.

Complete List of Changes

For the complete list of code changes in this release, see the 6.4 milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.3.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

shell export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. Undeploy the previous version

shell $PAYARA/bin/asadmin undeploy dataverse-6.3

2. Stop and start Payara

shell service payara stop
sudo service payara start

3. Deploy this version

shell $PAYARA/bin/asadmin deploy dataverse-6.4.war

Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.

shell service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases

4. For installations with internationalization:

Please remember to update translations via Dataverse language packs.

5. Restart Payara

shell service payara stop
service payara start

6. Update metadata blocks

These changes reflect incremental improvements made to the handling of core metadata fields.

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/scripts/api/data/metadatablocks/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```

7. Update Solr schema.xml file. Start with the standard v6.4 schema.xml, then, if your installation uses any custom or experimental metadata blocks, update it to include the extra fields (step 7a).

Stop Solr (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).

shell service solr stop

Replace schema.xml

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/schema.xml
cp schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf

Start Solr (but if you use any custom metadata blocks, perform the next step, 7a first).

shell service solr start

7a. For installations with custom or experimental metadata blocks:

Before starting Solr, update the schema to include all the extra metadata fields that your installation uses. We do this by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the names of the directories, if different):

shell wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml

Now start Solr.

8. Reindex Solr

Below is the simplest way to reindex Solr:

shell curl http://localhost:8080/api/admin/index

The API above rebuilds the existing index "in place". If you want to be absolutely sure that your index is up-to-date and consistent, you may consider wiping it clean and reindexing everything from scratch (see the guides). Just note that, depending on the size of your database, a full reindex may take a while and the users will be seeing incomplete search results during that window.

9. Run reExportAll to update dataset metadata exports

This step is necessary because of changes described above for the DataCite and oai_dc export formats.

Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.

shell curl http://localhost:8080/api/admin/metadata/reExportAll

10. Pushing updated metadata to DataCite

(If you don't use DataCite, you can skip this.)

Above you updated the citation metadata block and Solr with the new "relationType" field. With these two changes, the "Relation Type" fields will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite. You've also already run "reExportAll" to update the DataCite metadata export format.

Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):

curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll

This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using the following command:

grep 'Failure for id' server.log
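To get a quick count rather than the raw lines, the same search string can be tallied. This is a sketch only: the log path assumes the default Payara domain layout, and since the exact format of the log lines is not given in these notes, nothing beyond the "Failure for id" marker is relied on. The function name is hypothetical.

```shell
#!/bin/sh
# Assumed default log location; adjust if your domain lives elsewhere.
PAYARA="${PAYARA:-/usr/local/payara6}"
SERVER_LOG="${SERVER_LOG:-$PAYARA/glassfish/domains/domain1/logs/server.log}"

# Count the log lines that report a failed PID metadata update.
count_pid_failures() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are no matches, so "|| true" keeps this usable under "set -e".
    grep -c 'Failure for id' "$1" || true
}

# Only read the log when it is present and readable.
if [ -r "$SERVER_LOG" ]; then
    echo "Failed PID updates: $(count_pid_failures "$SERVER_LOG")"
fi
```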

Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the Reserve a PID API call and the newly documented /api/datasets/<id>/modifyRegistration call respectively. See https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.

PIDs can also be updated by a superuser on a per-dataset basis using

curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata

Additional Upgrade Steps

11. If there are broken thumbnails

To restore any broken thumbnails caused by the bug described above, you can call the http://localhost:8080/api/admin/clearThumbnailFailureFlag API, which will attempt to clear the flag on all files (regardless of whether it was caused by this bug or some other problem with the file), or the http://localhost:8080/api/admin/clearThumbnailFailureFlag/$FILE_ID API to clear the flag for an individual file. Calling the former, batch API is recommended.

12. PermaLinks with custom base-url

If you currently use PermaLinks with a custom base-url: You must manually append /citation?persistentId= to the base URL to maintain functionality.

If you use PermaLinks without a configured base-url, no changes are required.
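As an illustration of the change to a custom base-url, the sketch below appends the required suffix unless it is already present. The example base URL and the function name are hypothetical; the resulting value still has to be stored in your installation's base-url setting by whatever means you normally use.

```shell
#!/bin/sh
# Suffix that must now be part of a custom PermaLink base-url.
SUFFIX='/citation?persistentId='

# Print the updated base URL for the old value given as $1.
updated_base_url() {
    case "$1" in
        *"$SUFFIX") printf '%s\n' "$1" ;;            # already updated, keep as-is
        *)          printf '%s%s\n' "$1" "$SUFFIX" ;; # append the suffix
    esac
}

# Hypothetical example value.
updated_base_url 'https://example.org'
```

The check for an existing suffix makes the helper safe to run twice on the same value.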

Published by ofahimIQSS over 1 year ago

dataverse - v6.3

Dataverse 6.3

Summary

New Contributor Guide. The UX Working Group released a new Dataverse Contributor Guide.
Search Performance Improvements. Solr indexing and searching were improved, speeding up performance. Larger installations take note.
Dataverse Now Supports File-level Retention Periods. See the Retention Periods section of the guide for details.
API Optimizations for Large Datasets. Search API and permission checking have been improved for datasets with thousands of files.
Improved Controlled Vocabulary Support. Improvements include updates to the citation metadata block's Language field and multiple extensions added to the external vocabulary mechanism.
Improved Detection of RO-Crate Files. Dataverse now detects mime-types based on filename extensions and detects RO-Crate metadata files.
Sitemap Now Supports More Than 50K Items. Dataverse can now handle more than 50,000 items when generating sitemap files. For details, see the sitemap section of the Installation Guide.
Infrastructure Updates. Payara and Solr have been updated.

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.3 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Two experimental feature flags called "add-publicobject-solr-field" and "avoid-expensive-solr-join" have been added to change how Solr documents are indexed for public objects and how Solr queries are constructed to accommodate access to restricted content (drafts, etc.). It is hoped that they will help with performance, especially on large instances and under load.
Before the search feature flag ("avoid-expensive...") can be turned on, the indexing flag must be enabled, and a full reindex performed. Otherwise publicly available objects are NOT going to be shown in search results.
A feature flag called "reduce-solr-deletes" has been added to improve how datafiles are indexed. When the flag is enabled, Dataverse will avoid pre-emptively deleting existing Solr documents for the files prior to sending updated information. This should improve performance and will allow additional optimizations going forward.
The /api/admin/index/status and /api/admin/index/clear-orphans calls (see https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency) will now find and remove (respectively) additional permissions related Solr documents that were not being detected before. Reducing the overall number of documents will improve Solr performance and large sites may wish to periodically call the "clear-orphans" API.
Dataverse now relies on the autoCommit and autoSoftCommit settings in the Solr configuration instead of explicitly committing documents to the Solr index. This improves indexing speed.
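The ordering requirement for the two experimental flags can be sketched as a short checklist. The function below only prints the commands so they can be reviewed first. Setting the flags as JVM options through asadmin is an assumption here, based on how these notes describe the index-harvested-metadata-source flag; the reindex call is the simple one used in the upgrade instructions, and the function name is hypothetical.

```shell
#!/bin/sh
# Assumed Payara location, as elsewhere in these notes.
PAYARA="${PAYARA:-/usr/local/payara6}"

# Print the steps in the required order instead of executing them.
print_solr_flag_steps() {
    # 1. Enable the indexing flag first.
    echo "$PAYARA/bin/asadmin create-jvm-options '-Ddataverse.feature.add-publicobject-solr-field=true'"
    echo "# restart Payara, then reindex and wait for indexing to finish:"
    echo "curl http://localhost:8080/api/admin/index"
    # 2. Only then enable the search flag.
    echo "$PAYARA/bin/asadmin create-jvm-options '-Ddataverse.feature.avoid-expensive-solr-join=true'"
    echo "# restart Payara again"
}

print_solr_flag_steps
```

Enabling the search flag before the reindex has completed would hide public objects from search results, which is why the steps are kept strictly sequential.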

File Retention Period

Dataverse now supports file-level retention periods. The ability to set retention periods, with a minimum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Retention Periods section of the User Guide.

Users can configure a specific retention period, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the "Retention Period" menu item and entering information in a popup dialog. Retention periods can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
After the retention period expires, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). The file (landing) page and all the metadata remain available.

Features

Large Datasets Improvements

For scenarios involving API calls related to large datasets (numerous files, for example ~10k), the following have been optimized:

The Search API endpoint.
The permission checking logic present in PermissionServiceBean.

Improved Controlled Vocabulary for Citation Block

The Controlled Vocabulary Values list for the "Language" metadata field in the citation block has been improved, with some missing two- and three-letter ISO 639 codes added, as well as more alternative names for some of the languages, making all these extra language identifiers importable. See also #8243.

Updates on Support for External Vocabulary Services

Multiple extensions of the external vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.

These changes include:

Improved Indexing with Compound Fields: When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the indexIn in retrieval-filtering. See also #10505 and GDCC/dataverse-external-vocab-support documentation.
Broader Support for Indexing Service Responses: Indexing of the results from retrieval-filtering responses can now handle additional formats including JSON arrays of strings and values from arbitrary keys within a JSON Object. See #10505.
HTTP Headers: You are now able to add HTTP request headers required by the service you are implementing. See #10331.
Flexible params in retrievalUri: You can now use managed-fields field names as well as the term-uri-field field name as parameters in the retrieval-uri when configuring an external vocabulary service. {0} as an alternative to using the term-uri-field name is still supported for backward compatibility. Also you can specify if the value must be url encoded with encodeUrl:. See #10404.

For example: "retrieval-uri": "https://data.agroportal.lirmm.fr/ontologies/{keywordVocabulary}/classes/{encodeUrl:keywordTermURL}"

Hidden HTML Fields: External controlled vocabulary scripts, configured via :CVocConf, can now access the values of managed fields as well as the term-uri-field for use in constructing the metadata view for a dataset. These values are now added as hidden elements in the HTML and can be found with the HTML attribute data-cvoc-metadata-name. See also #10503.

A Contributor Guide is now available

A new Contributor Guide has been added by the UX Working Group (#10531 and #10532).

URL Validation Is More Permissive

URL validation now allows two slashes in the path component of the URL. Among other things, this allows metadata fields of url type to be filled with more complex url such as https://archive.softwareheritage.org/browse/directory/561bfe6698ca9e58b552b4eb4e56132cac41c6f9/?origin_url=https://github.com/gem-pasteur/macsyfinder&revision=868637fce184865d8e0436338af66a2648e8f6e1&snapshot=1bde3cb370766b10132c4e004c7cb377979928d1

Improved Detection of RO-Crate Files

Dataverse now detects mime-types based on a filename with extension, and detects RO-Crate metadata files.

From now on, filenames with extensions can be added into the MimeTypeDetectionByFileName.properties file. Filenames added there will take precedence over simply recognizing files by extensions. For example, two new filenames are added into that file:

ro-crate-metadata.json=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
ro-crate-metadata.jsonld=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"

Therefore, files named ro-crate-metadata.json will from now on be detected as RO-Crate metadata files instead of generic JSON files. For more information on the RO-Crate specifications, see https://www.researchobject.org/ro-crate

New S3 Tagging Configuration Option

If your S3 store does not support tagging and gives an error if you configure direct upload, you can disable the tagging by using the dataverse.files.<id>.disable-tagging JVM option. For more details, see the section on S3 tags in the guides, #10022 and #10029.

Feature Flag To Remove the Required "Reason" Field in the "Return to Author" Dialog

A reason field, that is required to not be empty, was added to the "Return to Author" dialog in v6.2. Installations that handle author communications through email or another system may prefer to not be required to use this new field. v6.3 includes a new disable-return-to-author-reason feature flag that can be enabled to drop the reason field from the dialog and make sending a reason optional in the api/datasets/{id}/returnToAuthor call. See also #10655.

Improved Use of Dataverse Thumbnail

Dataverse will use the dataset thumbnail, if one is defined, rather than the generic Dataverse logo in the Open Graph metadata header. This means the image will be seen when, for example, the dataset is referenced on Facebook. See also #5621.

Improved Email Notifications When Guestbook is Used for File Access Requests

Multiple improvements have been made to guestbook response emails, making them easier to organize and process. The subject line of the notification email now includes the name and user identifier of the requestor. Additionally, the body of the email now includes the user id of the requestor. Finally, the guestbook responses have been sorted and spaced to improve readability. See also #10581.

New keywordTermURI Metadata Field in the Citation Metadata Block

A new metadata field, keywordTermURI, has been added to the citation metadata block (as a fourth child field under the keyword parent field). This has been done to improve usability and to facilitate the integration of controlled vocabulary services, adding the possibility of saving the "term" and/or its associated URI. For more information, see #10288 and PR #10371.

Updated Computational Workflow Metadata Block

The optional computational workflow metadata block has been updated to present a clickable link for the External Code Repository URL field. See also #10339.

Metadata Source Facet Added

An option has been added to index the name of the harvesting client as the "Metadata Source" of harvested datasets and files; if enabled, the Metadata Source facet will show separate entries for the content harvested from different sources, instead of the current, default behavior where there is one "Harvested" facet for all such content.

To enable this feature, set the optional feature flag (JVM option) dataverse.feature.index-harvested-metadata-source=true before reindexing.

Additional Facet Settings

Extra settings have been added giving an instance admin more choices in selectively limiting the availability of search facets on the collection and dataset pages.

See Disable Solr Facets under the configuration section of the Installation Guide for more info as well as #10570.

Sitemap Now Supports More Than 50k Items

Dataverse can now handle more than 50,000 items when generating sitemap files, splitting the content across multiple files to comply with the Sitemap protocol. For details, see the sitemap section of the Installation Guide. See also #8936 and #10321.

MIT and Apache 2.0 Licenses Added

New files have been added to import the MIT and Apache 2.0 Licenses to Dataverse:

licenseMIT.json
licenseApache-2.0.json

Guidance has been added to the guides to explain the procedure for adding new licenses to Dataverse.

3D Viewer by Open Forest Data

3DViewer by openforestdata.pl has been added to the list of external tools. See also #10561.

Datalad Integration With Dataverse

DataLad has been integrated with Dataverse. For more information, see the integrations section of the guides. See also #10468.

Rsync Support Has Been Deprecated

Support for rsync has been deprecated. Information has been removed from the guides for rsync and related software such as Data Capture Module (DCM) and Repository Storage Abstraction Layer (RSAL). You can still find this information in older versions of the guides. See Settings, below, for deprecated settings. See also #8985.

Bug Fixes

OpenAPI Re-Enabled

In Dataverse 6.0, the Payara update caused the /openapi URL to stop working:

https://github.com/IQSS/dataverse/issues/9981
https://github.com/payara/Payara/issues/6369

In addition to fixing the /openapi URL, we are also making some changes on how we provide the OpenAPI document:

When it worked in Dataverse 5.x, the /openapi output was generated automatically by Payara, but in this release we have switched to OpenAPI output produced by the SmallRye OpenAPI plugin. This gives us finer control over the output.

For more information, see the section on OpenAPI in the API Guide and #10328.

Re-Addition of "Cell Counting" to Life Sciences Block

In the Life Sciences metadata block under the "Measurement Type" field the value cell counting was accidentally removed in v5.1. It has been restored. See also #8655 and #9735.

Math Challenge Fixed on 403 Error Page

On the "forbidden" (403) error page, the math challenge now correctly displays so that the contact form can be submitted. See also #10466.

Ingest Option Bug Fixed

A bug that prevented the "Ingest" option in the file page "Edit File" menu from working has been fixed. See also #10568.

Incomplete Metadata Bug Fix

A bug was fixed where the incomplete metadata label was being shown for published datasets with incomplete metadata in certain scenarios. This label will now be shown for draft versions of such datasets and published datasets that the user can edit. This label can also be made invisible for published datasets (regardless of edit rights) with the new option dataverse.ui.show-validity-label-when-published set to false. See also #10116.

Identical Role Error Message

An error is now correctly reported when an attempt is made to assign an identical role to the same collection, dataset, or file. See also #9729 and #10465.

API

Superuser Endpoint

The existing API endpoint for toggling the superuser status of a user has been deprecated in favor of a new API endpoint that allows you to explicitly and idempotently set the status as true or false. For details, see the API Guide, #9887 and #10440.

New Featured Collections Endpoints

New API endpoints have been added to allow you to add or remove featured collections from a collection.

See also the sections on listing, setting, and removing featured collections in the API Guide, #10242 and #10459.

Dataset Version Endpoint Extended

The API endpoint for getting the Dataset version has been extended to include latestVersionPublishingStatus. See also #10330.

New Optional Query Parameters for Metadatablocks Endpoints

New optional query parameters have been added to api/metadatablocks and api/dataverses/{id}/metadatablocks endpoints:

returnDatasetFieldTypes: Whether or not to return the dataset field types present in each metadata block. If not set, the default value is false.
Setting the query parameter onlyDisplayedOnCreate=true also returns metadata blocks with dataset field type input levels configured as required on the General Information page of the collection, in addition to the metadata blocks and their fields with the property displayOnCreate=true.
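A short sketch of how both parameters might be combined in one request follows. The server URL and the collection alias "root" are placeholders, and the helper function name is hypothetical.

```shell
#!/bin/sh
# Assumed base URL; replace with your installation's URL.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build the metadatablocks URL for a collection alias ($1) with both
# optional query parameters enabled.
metadatablocks_url() {
    printf '%s/api/dataverses/%s/metadatablocks?returnDatasetFieldTypes=true&onlyDisplayedOnCreate=true\n' \
        "$SERVER_URL" "$1"
}

# Print the URL; to fetch the blocks, pass it to curl:
#   curl -s "$(metadatablocks_url root)"
metadatablocks_url root
```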

Dataverse Payload Includes Release Status

The Dataverse object returned by /api/dataverses has been extended to include "isReleased": {boolean}. See also #10491.

New Field Type Input Level Endpoint

A new endpoint api/dataverses/{id}/inputLevels has been created for updating the dataset field type input levels of a collection via API. See also #10477.

Banner Message Endpoint Extended

The endpoint api/admin/bannerMessage has been extended so the ID is returned when created. See also #10565.

Settings

Database Settings:

New:

:DisableSolrFacets

Deprecated (used with rsync):

:DataCaptureModuleUrl
:DownloadMethods
:LocalDataAccessPath
:RepositoryStorageAbstractionLayerUrl

New Configuration Options

dataverse.files.<id>.disable-tagging
dataverse.feature.add-publicobject-solr-field
dataverse.feature.avoid-expensive-solr-join
dataverse.feature.reduce-solr-deletes
dataverse.feature.disable-return-to-author-reason
dataverse.feature.index-harvested-metadata-source
dataverse.ui.show-validity-label-when-published

Complete List of Changes

For the complete list of code changes in this release, see the 6.3 Milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.2.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin undeploy dataverse-6.2

2. Stop Payara and remove the following directories:

shell service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases

3. Upgrade Payara to v6.2024.6

With this version of Dataverse, we encourage you to upgrade to version 6.2024.6. This will address security issues accumulated since the release of 6.2023.8.

Note that if you are using GDCC containers, this upgrade is included when pulling new release images. No manual intervention is necessary.

The steps below are a simple matter of reusing your existing domain directory with the new distribution. But we recommend that you review the Payara upgrade instructions, as they could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The latest Payara update was for v6.0.)

Move the current Payara directory out of the way:

shell mv $PAYARA $PAYARA.2023.8

Download the new Payara version 6.2024.6 (from https://www.payara.fish/downloads/payara-platform-community-edition/), and unzip it in its place:

shell wget https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2024.6/payara-6.2024.6.zip

shell cd /usr/local
unzip payara-6.2024.6.zip

Replace the brand new payara6/glassfish/domains/domain1 with your old, preserved domain1:

shell mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.2023.8/glassfish/domains/domain1 payara6/glassfish/domains/

Make sure that you have the following --add-opens options in your payara6/glassfish/domains/domain1/config/domain.xml. If not present, add them:

<jvm-options>--add-opens=java.management/javax.management=ALL-UNNAMED</jvm-options>
<jvm-options>--add-opens=java.management/javax.management.openmbean=ALL-UNNAMED</jvm-options>
<jvm-options>[17|]--add-opens=java.base/java.io=ALL-UNNAMED</jvm-options>
<jvm-options>[21|]--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED</jvm-options>

(Note that you likely already have the java.base/java.io option there, but without the [17|] prefix. Make sure to replace it with the version above)
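One way to check for these options is a fixed-string search of domain.xml. The sketch below reports which of the four entries are missing; it assumes each option sits on its own line with no extra whitespace inside the jvm-options element, and the function name is hypothetical. Because the match includes the surrounding tags, a java.base/java.io entry without the [17|] prefix is reported as missing.

```shell
#!/bin/sh
# Assumed default locations, as elsewhere in these notes.
PAYARA="${PAYARA:-/usr/local/payara6}"
DOMAIN_XML="${DOMAIN_XML:-$PAYARA/glassfish/domains/domain1/config/domain.xml}"

# Report which of the four required options are missing from the file in $1.
missing_add_opens() {
    for opt in \
        '--add-opens=java.management/javax.management=ALL-UNNAMED' \
        '--add-opens=java.management/javax.management.openmbean=ALL-UNNAMED' \
        '[17|]--add-opens=java.base/java.io=ALL-UNNAMED' \
        '[21|]--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED'
    do
        # -F treats the pattern as a fixed string, so "[17|]" is matched literally.
        grep -qF -- "<jvm-options>$opt</jvm-options>" "$1" || echo "missing: $opt"
    done
}

# Only check when the file is present and readable.
if [ -r "$DOMAIN_XML" ]; then
    missing_add_opens "$DOMAIN_XML"
fi
```

No output means all four options were found.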

Start Payara:

shell sudo service payara start

4. Deploy this version.

shell $PAYARA/bin/asadmin deploy dataverse-6.3.war

5. For installations with internationalization:

Please remember to update translations via Dataverse language packs.

6. Restart Payara

service payara stop
service payara start

7. Update the following metadata blocks to reflect the incremental improvements made to the handling of core metadata fields:

```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/biomedical.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv
```

7a. If you are using the optional computational workflow metadata block, update it:

```shell

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/computational_workflow.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file computational_workflow.tsv

```

8. Upgrade Solr

Solr 9.4.1 is now the version recommended in our Installation Guide and used with automated testing. There is a known security issue in the previously recommended version 9.3.0: https://nvd.nist.gov/vuln/detail/CVE-2023-36478. While the risk of an exploit should not be significant unless the Solr instance is accessible from outside networks (which we have always recommended against), we recommend upgrading.

Install Solr 9.4.1 following the instructions from the Installation Guide.

The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf

8a. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):

wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml

Start Solr instance (usually service solr start depending on Solr/OS).

9. Enable the Metadata Source facet for harvested content (Optional):

If you choose to enable this new feature, set the optional feature flag (jvm option) dataverse.feature.index-harvested-metadata-source=true before reindexing.
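One common way to set a JVM option on Payara is `asadmin create-jvm-options`. The sketch below only prints the command so it can be reviewed before being run as the Payara service user. The helper name `jvm_option_cmd` is hypothetical, and the PAYARA default assumes the install location used earlier in these instructions.

```shell
# Assumption: Payara lives where the earlier steps put it.
PAYARA="${PAYARA:-/usr/local/payara6}"

# Hypothetical helper: print (not run) the asadmin command that sets a JVM option.
# $1 = option name, $2 = value
jvm_option_cmd() {
  printf '%s/bin/asadmin create-jvm-options "-D%s=%s"\n' "$PAYARA" "$1" "$2"
}

# Print the command for the optional Metadata Source facet flag.
jvm_option_cmd dataverse.feature.index-harvested-metadata-source true
```

Printing first rather than executing is a deliberate choice here: it keeps the sketch safe to run on any machine and leaves the decision to apply the option to the administrator.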

10. Reindex Solr, if you upgraded Solr (recommended), or chose to enable any options that require a reindex:

shell curl http://localhost:8080/api/admin/index

Note: if you choose to perform a migration of your keywordValue metadata fields (section below), that will require a reindex as well, so do that first.

Notes for Dataverse Installation Administrators

Data migration to the new `keywordTermURI` field

You can migrate your keywordValue data containing URIs to the new keywordTermURI field. Before migrating, you can view the affected data with the following database query:

sql SELECT value FROM datasetfieldvalue dfv INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue') AND value ILIKE 'http%';

If you wish to migrate your data, a database update is then necessary:

sql UPDATE datasetfield df SET datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI') FROM datasetfieldvalue dfv WHERE dfv.datasetfield_id = df.id AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue') AND dfv.value ILIKE 'http%';

A reindex in place will be required. ReExportAll will need to be run to update the metadata exports of the dataset. Follow the directions in the Admin Guide.

↑ Table of Contents

- Java
Published by pdurbin almost 2 years ago

dataverse - v6.2

Dataverse 6.2

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that this note is mandatory, but that you can still type a creative and meaningful comment such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.

For more information, see #10137.

Support for Using Multiple PID Providers

This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts (managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections, and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.

These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended and will be required in a future version.

For more information check the PID settings on this link.

New microprofile settings

Rate Limiting

The option to rate limit has been added to prevent users from overtaxing the system, either deliberately or through runaway automated processes. Rate limiting can be configured per tier, with tier 0 reserved for guest users and tiers 1 and higher for authenticated users. Superuser accounts are exempt from rate limiting.

Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database. Two database settings configure rate limiting: :RateLimitingDefaultCapacityTiers and :RateLimitingCapacityByTierAndAction. If either of these settings exists in the database, rate limiting is enabled; if neither exists, rate limiting is disabled.

For more details check the detailed guide on this link.
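As an illustration of the two settings, here is a hedged sketch that uses the admin settings API. The helper `put_setting` is hypothetical and prints the curl command unless RUN=1 is set. The value formats (a comma-separated list of hourly limits per tier, and a JSON array of tier/limitPerHour/actions objects) and the numbers are assumptions based on the Installation Guide and should be checked against it before use.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: print the curl command for a database setting.
# Set RUN=1 to send the request instead of printing it.
# $1 = setting name (with leading colon), $2 = value
put_setting() {
  if [ "${RUN:-0}" = 1 ]; then
    curl -X PUT -d "$2" "$SERVER_URL/api/admin/settings/$1"
  else
    printf "curl -X PUT -d '%s' %s/api/admin/settings/%s\n" "$2" "$SERVER_URL" "$1"
  fi
}

# Default hourly limits for tier 0 (guests) and tier 1 (example values).
put_setting :RateLimitingDefaultCapacityTiers '10000,20000'

# Stricter hourly limit for one command in tier 0 (example values).
put_setting :RateLimitingCapacityByTierAndAction \
  '[{"tier": 0, "limitPerHour": 10, "actions": ["GetDatasetCommand"]}]'
```

Since the presence of either setting enables rate limiting, deleting both settings is the way to switch it off again.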

Simplified SMTP Configuration

With this release, we deprecate the usage of asadmin create-javamail-resource to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.

At this point, no action is required if you want to keep your current configuration. Warnings will show in your server logs to inform and remind you about the deprecation. A future major release of Dataverse may remove this way of configuration.

Please do take the opportunity to update your SMTP configuration. Details can be found in the SMTP/Email Configuration section of the Installation Guide.

Once reconfiguration is complete, you should remove legacy, unused config. First, run asadmin delete-javamail-resource mail/notifyMailSession as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail as this database setting has been replaced with dataverse.mail.system-email as described below.

Please note: as there have been problems with email delivered to SPAM folders when the "From" within the mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email once you migrate to the new way of configuration.

New SMTP settings:

Binder Redirect

If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect

For more information, see #10360.

Optional Croissant 🥐 Exporter Support

When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.

For more information, see #10382.

Harvesting Handle Missing Controlled Values

Allows datasets to be harvested with Controlled Vocabulary Values that existed in the originating Dataverse installation but are not in the harvesting Dataverse installation. For more information, view the changes to the endpoint here.

Add .QPJ and .QMD Extensions to Shapefile Handling

Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.

For more information, see #10305.

Ingested Tabular Data Files Can Be Stored With the Variable Name Header

Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.

Access API will be able to take advantage of Direct Download for .tab files saved with these headers on S3 - since they no longer have to be generated and added to the streamed content on the fly.

This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse will be able to handle both the newly ingested files and any already-existing legacy files stored without these headers, transparently to the user. For example, the access API will continue delivering tab-delimited files with this header line, whether it needs to add it dynamically for the legacy files or can read complete files directly from storage for the ones stored with it.

We are planning to add an API for converting existing legacy tabular files in a future release.

For more information, see #10282.

Uningest/Reingest Options Available in the File Page Edit Menu

New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can publish the associated dataset and by superusers, allowing for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).

The /api/files//uningest API also now allows users who can publish the dataset to undo an ingest failure.

For more information, see #10319.

Sphinx Guides Now Support Markdown Format and Tabs

Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:
pip install -r requirements.txt

For more information, see #10111.

Number of Concurrent Indexing Operations Now Configurable

A new MicroProfile setting called dataverse.solr.concurrency.max-async-indexes has been added that controls the maximum number of simultaneously running asynchronous dataset index operations (defaults to 4).

For more information, see #10388.

⬆️

🪲 Bug fixes

Publication Status Facet Restored

In version 6.1, the publication status facet location was unintentionally moved to the bottom. In this version, we have restored the original order.

Assign a Role With Higher Permissions Than Its Own Role Has Been Fixed

The permissions required to assign a role have been fixed. It is no longer possible to assign a role that includes permissions that the assigning user doesn't have.

Geospatial Metadata Block Fields for North and South Renamed

The Geospatial metadata block fields for north and south were labeled incorrectly as longitudes, as reported in #5645. After updating to this version of Dataverse, users will need to update any API client code that uses "northLongitude" and "southLongitude" to "northLatitude" and "southLatitude", respectively, as mentioned on the mailing list. Also, we have updated the tooltips in the Geospatial metadata block, where the use of commas instead of dots in coordinate values was incorrectly suggested.

OAI-PMH Error Handling Has Been Improved

OAI-PMH error handling has been improved to display a machine-readable error in XML rather than a 500 error with no further information.

/oai?foo=bar will show "No argument 'verb' found"
/oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"

Granting File Access Without Access Request

A bug introduced with the guestbook-at-request feature has been fixed. Requests are not deleted when granted; they are now given the state "granted".

Harvesting redirects fixed

Redirects from search cards back to the original source for datasets harvested from "Generic OAI Archives", i.e. non-Dataverse OAI servers, have been fixed.

⬆️

💾 Persistence

Missing Database Constraints

This release adds two missing database constraints that will ensure that the externalvocabularyvalue table only has one entry for each uri and that the oaiset table only has one set for each spec. In the very unlikely case that your existing database has duplicate entries now, the install would fail. This can be checked by running the following commands:

SELECT uri, count(*) FROM externalvocabularyvalue group by uri;

And:

SELECT spec, count(*) FROM oaiset group by spec;

Then remove any duplicate rows (where count>1).
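The two checks list every value with its count, which can be long on a large database. A variant with a HAVING clause returns only the duplicated values. The sketch below builds those queries; the helper name `dup_query` and the `DB_NAME` variable in the usage comment are hypothetical, so adjust them to your installation.

```shell
# Hypothetical helper: build a query that lists only duplicated values.
# $1 = table name, $2 = column name
dup_query() {
  printf 'SELECT %s, count(*) FROM %s GROUP BY %s HAVING count(*) > 1;\n' "$2" "$1" "$2"
}

# Print the two queries for the tables that receive new constraints.
dup_query externalvocabularyvalue uri
dup_query oaiset spec

# To run one against the database (assumes psql access and a DB_NAME variable):
# psql -d "$DB_NAME" -c "$(dup_query oaiset spec)"
```

An empty result from both queries means the new constraints can be applied without any cleanup.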

Universe Field in Variablemetadata Table Changed

Universe field in variablemetadata table was changed from varchar(255) to text. The change was made to support longer strings in "universe" metadata field, similar to the rest of text fields in variablemetadata table.

PostgreSQL Versions

This release adds install script support for the new permissions model in PostgreSQL versions 15+, and bumps Flyway to support PostgreSQL 16.

PostgreSQL 13 remains the version used with automated testing.

⬆️

🌐 API

Listing Collection/Dataverse API

Listing collection/dataverse role assignments via API still requires ManageDataversePermissions, but listing dataset role assignments via API now requires only ManageDatasetPermissions.

New API Endpoint for Clearing an Individual Dataset From Solr

A new Index API endpoint has been added allowing an admin to clear an individual dataset from Solr.

For more information visit the documentation on this link

New Accounts Metrics API

Users can retrieve new types of metrics related to user accounts. The new capabilities are described in the guides.

New canDownloadAtLeastOneFile Endpoint

The /api/datasets/{id}/versions/{versionId}/canDownloadAtLeastOneFile endpoint has been created.

This API endpoint indicates if the calling user can download at least one file from a dataset version. Note that Shibboleth group permissions are not considered.

Harvesting Client Endpoint Extended

The API endpoint api/harvest/clients/{harvestingClientNickname} has been extended to include the following fields:

allowHarvestingMissingCVV: enable/disable allowing datasets to be harvested with controlled vocabulary values that exist in the originating Dataverse server but are not present in the harvesting Dataverse server. The default is false.

Note: This setting is only available to the API and not currently accessible/settable via the UI.

Version Files Endpoint Extended

The response for getVersionFiles /api/datasets/{id}/versions/{versionId}/files endpoint has been modified to include a total count of records available totalCount:x. This will aid in pagination by allowing the caller to know how many pages can be iterated through. The existing API (getVersionFileCounts) to return the count will still be available.
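To show how totalCount helps with pagination, the sketch below computes the number of pages and the offset of each page. The function names are hypothetical, and the use of limit and offset query parameters on this endpoint is an assumption to verify against the API Guide.

```shell
# Ceiling division: pages needed to list total_count files, page_size per request.
# $1 = totalCount from the response, $2 = page size
page_count() {
  total_count=$1
  page_size=$2
  echo $(( (total_count + page_size - 1) / page_size ))
}

# Print the offset of each page, one per line.
page_offsets() {
  total_count=$1
  page_size=$2
  offset=0
  while [ "$offset" -lt "$total_count" ]; do
    echo "$offset"
    offset=$((offset + page_size))
  done
}

page_count 45 20    # prints 3
page_offsets 45 20  # prints 0, 20 and 40 on separate lines
```

Each offset can then be combined with the page size in a request such as /api/datasets/{id}/versions/{versionId}/files?limit=20&offset=40, under the assumption stated above.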

Metadata Blocks Endpoint Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

isRequired: Whether or not this field is required
displayOrder: The display order of the field in create/edit forms
typeClass: The type class of this field ("controlledVocabulary", "compound", or "primitive")

Get File Citation as JSON

It is now possible to retrieve via API the file citation as it appears on the file landing page. It is formatted in HTML and encoded in JSON.

This API is not for downloading various citation formats such as EndNote XML, RIS, or BibTeX.

For more information check the documentation on this link

Files Endpoint Extended

The API endpoint api/files/{id} has been extended to support the following optional query parameters:

includeDeaccessioned: Indicates whether or not to consider deaccessioned dataset versions in the latest file search. (Default: false).
returnDatasetVersion: Indicates whether or not to include the dataset version of the file in the response. (Default: false).

A new endpoint api/files/{id}/versions/{datasetVersionId} has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use :latest-published, :latest, :draft or 1.0 or any other available version identifier.

The endpoint supports the includeDeaccessioned and returnDatasetVersion optional query parameters, as does the api/files/{id} endpoint.

api/files/{id}/draft endpoint is no longer available in favor of the new endpoint api/files/{id}/versions/{datasetVersionId}, which can use the version identifier :draft (api/files/{id}/versions/:draft) to obtain the same result.
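For client code that still calls the removed draft endpoint, the replacement URL can be built as in this sketch. The helper name `file_version_url`, the file id 42, and the `API_TOKEN` variable are hypothetical; the query parameters are the two optional ones described above.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the URL for a file's metadata in a given dataset version.
# $1 = file id
# $2 = dataset version identifier (:latest-published, :latest, :draft, 1.0, ...)
# $3 = includeDeaccessioned (default false), $4 = returnDatasetVersion (default false)
file_version_url() {
  printf '%s/api/files/%s/versions/%s?includeDeaccessioned=%s&returnDatasetVersion=%s\n' \
    "$SERVER_URL" "$1" "$2" "${3:-false}" "${4:-false}"
}

# Replacement for the old api/files/42/draft call.
file_version_url 42 :draft

# To call it against a running installation (API token in the X-Dataverse-key header):
# curl -H "X-Dataverse-key: $API_TOKEN" "$(file_version_url 42 :draft)"
```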

Datasets, Dataverse Collections, and Datafiles Endpoints Extended

The API endpoints for getting datasets, Dataverse collections, and datafiles have been extended to support the following optional 'returnOwners' query parameter.

Including the parameter and setting it to true will add a hierarchy showing which dataset and dataverse collection(s) the object is part of to the json object returned.

For more information visit the full native API guide on this link

Endpoint Fixed: Datasets Metadata

The API endpoint api/datasets/{id}/metadata has been changed to default to the latest version of the dataset to which the user has access.

Experimental Make Data Count processingState API

An experimental Make Data Count processingState API has been added. For now it has been documented in the developer guide: https://guides.dataverse.org/en/6.2/developers/make-data-count.html#processing-archived-logs

⬆️

⚠️ Backward Incompatibilities

To view a list of changes that can be impactful to your implementation please visit our detailed list of changes to the API.

⬆️

📖 Guides

Container Guide, Documentation for Faster Redeploy

In the Container Guide, documentation for developers on how to quickly redeploy code has been added for Netbeans and improved for IntelliJ.

Also in the context of containers, a new option to skip deployment has been added and the war file is now consistently named "dataverse.war" rather than having a version in the filename, such as "dataverse-6.1.war". This predictability makes tooling easier.

Evaluation Version Tutorial on the Containers Guide

The Container Guide now contains a tutorial for running Dataverse in containers for demo or evaluation purposes: https://guides.dataverse.org/en/6.2/container/running/demo.html

New QA Guide

A new QA Guide is intended mostly for the core development team but may be of interest to contributors: https://guides.dataverse.org/en/6.2/develop/qa

⬆️

⚙️ New Settings

MicroProfile Settings

The * stands for a provider id, indicating which provider the setting is for.

dataverse.pid.providers
dataverse.pid.default-provider
dataverse.pid.*.type
dataverse.pid.*.label
dataverse.pid.*.authority
dataverse.pid.*.shoulder
dataverse.pid.*.identifier-generation-style
dataverse.pid.*.datafile-pid-format
dataverse.pid.*.managed-list
dataverse.pid.*.excluded-list
dataverse.pid.*.datacite.mds-api-url
dataverse.pid.*.datacite.rest-api-url
dataverse.pid.*.datacite.username
dataverse.pid.*.datacite.password
dataverse.pid.*.ezid.api-url
dataverse.pid.*.ezid.username
dataverse.pid.*.ezid.password
dataverse.pid.*.permalink.base-url
dataverse.pid.*.permalink.separator
dataverse.pid.*.handlenet.index
dataverse.pid.*.handlenet.independent-service
dataverse.pid.*.handlenet.auth-handle
dataverse.pid.*.handlenet.key.path
dataverse.pid.*.handlenet.key.passphrase
dataverse.spi.pidproviders.directory
dataverse.solr.concurrency.max-async-indexes

SMTP Settings:

dataverse.mail.system-email
dataverse.mail.mta.host
dataverse.mail.mta.port
dataverse.mail.mta.ssl.enable
dataverse.mail.mta.auth
dataverse.mail.mta.user
dataverse.mail.mta.password
dataverse.mail.mta.allow-utf8-addresses
Plus many more for advanced usage and special provider requirements. See configuration guide for a full list.
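Because these are MicroProfile Config settings, they can also be supplied as environment variables. One of the names MicroProfile Config looks up is the property name with every character that is not a letter, digit, or underscore replaced by an underscore, then uppercased. The sketch below applies that rule to a few of the settings above; the helper name `mpconfig_env_name` is hypothetical.

```shell
# Hypothetical helper: derive the environment variable name for a MicroProfile Config property.
# Rule: replace every character that is not a letter, digit, or "_" with "_", then uppercase.
mpconfig_env_name() {
  printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_]/_/g' | tr '[:lower:]' '[:upper:]'
}

for setting in \
  dataverse.mail.system-email \
  dataverse.mail.mta.host \
  dataverse.mail.mta.allow-utf8-addresses
do
  mpconfig_env_name "$setting"
done
# prints:
# DATAVERSE_MAIL_SYSTEM_EMAIL
# DATAVERSE_MAIL_MTA_HOST
# DATAVERSE_MAIL_MTA_ALLOW_UTF8_ADDRESSES
```

Environment variables tend to be the more convenient route in container deployments, while JVM options fit classic Payara installations.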

Database Settings:

:RateLimitingDefaultCapacityTiers
:RateLimitingCapacityByTierAndAction
:StoreIngestedTabularFilesWithVarHeaders

📋 Complete List of Changes

For the complete list of code changes in this release, see the 6.2 Milestone in GitHub.

⬆️

🛟 Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

⬆️

💻 Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.1.

0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. Usually, when a Solr schema update is released, we recommend deploying the new version of Dataverse, then updating the schema.xml on the Solr side. With 6.2, we recommend installing the base schema first. Without it, Dataverse 6.2 will not be able to show any results after the initial deployment. If your instance is using any custom metadata blocks, you will need to further modify the schema; see step 8 of these instructions.

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
- wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/schema.xml
- cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
Start Solr instance (usually service solr start, depending on Solr/OS)

2. Undeploy the previous version.

$PAYARA/bin/asadmin undeploy dataverse-6.1

3. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

4. Start Payara

service payara start

5. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-6.2.war

The deployment of the war file may take some time on a large production database due to the database migration scripts that are part of the release.

6. Restart Payara

service payara stop
service payara start

7. Update the following Metadata Blocks to reflect the incremental improvements made to the handling of core metadata fields:

```
wget https://github.com/IQSS/dataverse/releases/download/v6.2/geospatial.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/citation.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/astrophysics.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file astrophysics.tsv

wget https://github.com/IQSS/dataverse/releases/download/v6.2/biomedical.tsv

curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv
```

8. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
- wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/update-fields.sh
- chmod +x update-fields.sh
- curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml
Restart Solr instance (usually service solr restart depending on solr/OS)

9. Reindex Solr:

For details, see https://guides.dataverse.org/en/6.2/admin/solr-search-index.html but here is the reindex command:

curl http://localhost:8080/api/admin/index

- Java
Published by landreev about 2 years ago

dataverse - v6.1

Dataverse 6.1

Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.1 rather than the list of releases, which will cut them off.

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release highlights

Guestbook at request

Dataverse can now be configured (via the dataverse.files.guestbook-at-request option) to display any configured guestbook to users when they request restricted files (new functionality) or when they download files (previous behavior).

The global default defined by this setting can be overridden at the collection level on the collection page and at the individual dataset level by a superuser using the API. The default, showing guestbooks when files are downloaded, remains as it was in prior Dataverse versions.

For details, see dataverse.files.guestbook-at-request and PR #9599.

Collection-level storage quotas

This release adds support for defining storage size quotas for collections. Please see the API guide for details. This is an experimental feature that has not yet been used in production on any real life Dataverse instance, but we are planning to try it out at Harvard/IQSS.

Please note that this release includes a database update (via a Flyway script) that will calculate the storage sizes of all the existing datasets and collections on the first deployment. On a large production database with tens of thousands of datasets this may add a couple of extra minutes to the first, initial deployment of Dataverse 6.1.

For details, see Storage Quotas for Collections in the Admin Guide.

Globus support (experimental), continued

Globus support in Dataverse has been expanded to include support for using file-based Globus endpoints, including the case where files are stored on tape and are not immediately accessible and for the case of referencing files stored on remote Globus endpoints. Support for using the Globus S3 Connector with an S3 store has been retained but requires changes to the Dataverse configuration. Please note:

Globus functionality remains experimental/advanced in that it requires significant setup, differs in multiple ways from other file storage mechanisms, and may continue to evolve with the potential for backward incompatibilities.
The functionality is configured per store and replaces the previous single-S3-Connector-per-Dataverse-instance model.
Adding files to a dataset, and accessing files is supported via the Dataverse user interface through a separate dataverse-globus app.
The functionality is also accessible via APIs (combining calls to the Dataverse and Globus APIs)

Backward incompatibilities:
- The configuration for use of a Globus S3 Connector has changed and is aligned with the standard store configuration mechanism.
- The new functionality is incompatible with older versions of the dataverse-globus app, and the Globus-related functionality in the UI will only function correctly if a Dataverse 6.1 compatible version of the dataverse-globus app is configured.

New JVM options:
- A new "globus" store type and associated store-related options have been added. These are described in the File Storage section of the Installation Guide.
- dataverse.files.globus-cache-maxage - specifies the number of minutes Dataverse will wait between when an initial request for a file transfer occurs and when that transfer must begin.

Obsolete Settings: the :GlobusBasicToken, :GlobusEndpoint, and :GlobusStores settings are no longer used

Further details can be found in the Big Data Support section of the Developer Guide.

Alternative Title now allows multiple values

Alternative Title now allows multiple values. Note that JSON used to create a dataset with an Alternative Title must be changed. See "Backward incompatibilities" below and PR #9440 for details.

External tools: configure tools now available at the dataset level

Read/write "configure" tools (a type of external tool) are now available at the dataset level. They appear under the "Edit Dataset" menu. See External Tools in the Admin Guide and PR #9925.

S3 out-of-band upload

In some situations, direct upload might not work from the UI, e.g., when S3 storage is not accessible from the internet. This release adds an option to allow direct uploads via API only. This way, a third-party application can use direct upload from within the internal network, while there is no direct download available to the users via UI. By default, Dataverse supports uploading files via the add a file to a dataset API. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server). With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the Adding the Uploaded file to the Dataset API call (described in the Direct DataFile Upload/Replace API page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.

JSON Schema for datasets

Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes in a collection alias and returns a custom dataset schema based on the required fields of the collection. The second takes in a collection alias and a dataset JSON file and does an automated validation of the JSON file against the custom schema for the collection. In this release functionality is limited to JSON format validation and validating required elements. Future releases will address field types, controlled vocabulary, etc. See Retrieve a Dataset JSON Schema for a Collection in the API Guide and PR #10109.

OpenID Connect (OIDC) improvements

Using MicroProfile Config for provisioning

With this release it is possible to provision a single OIDC-based authentication provider by using MicroProfile Config instead of or in addition to the classic Admin API provisioning.

If you are using an external OIDC provider component as an identity management system and/or broker to other authentication providers such as Google, eduGain SAML and so on, this might make your life easier during instance setups and reconfiguration. You no longer need to generate the necessary JSON file.

Adding PKCE Support

Some OIDC providers require using PKCE as an additional security layer. As of this version, you can enable support for this on any OIDC provider you configure. (Note that OAuth2 providers have not been upgraded.)

For both features, see the OIDC section of the Installation Guide and PR #9273.
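MicroProfile Config can usually read settings from environment variables, with the property name upper-cased and every character that is not alphanumeric or an underscore replaced by an underscore. The sketch below applies that mapping to a few of the OIDC options listed under "New configuration options". It assumes your deployment passes environment variables through to Payara; all values and both function names are placeholders.

```shell
#!/bin/sh
# Map a MicroProfile Config property name to its environment variable form:
# every character that is not alphanumeric or "_" becomes "_", then upper-case.
mpconfig_env_name() {
  printf '%s' "$1" | tr -c '[:alnum:]_' '_' | tr '[:lower:]' '[:upper:]'
}

# Print export lines for a hypothetical OIDC provider (placeholder values).
print_oidc_env() {
  for pair in \
    "dataverse.auth.oidc.enabled=true" \
    "dataverse.auth.oidc.client-id=my-client" \
    "dataverse.auth.oidc.client-secret=change-me" \
    "dataverse.auth.oidc.auth-server-url=https://idp.example.org/realms/demo" \
    "dataverse.auth.oidc.pkce.enabled=true"
  do
    name=${pair%%=*}    # text before the first "="
    value=${pair#*=}    # text after the first "="
    printf "export %s='%s'\n" "$(mpconfig_env_name "$name")" "$value"
  done
}

print_oidc_env
```

The same options can also be set as JVM options; see the OIDC section of the Installation Guide for the authoritative list and accepted values.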

Solr improvements

As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr that makes it drop requests more gracefully when the search engine is experiencing load issues.

Please see the Installing Solr section of the Installation Guide.

New release of Dataverse Previewers (including a Markdown previewer)

Version 1.4 of the standard Dataverse Previewers from https://github.com/gdcc/dataverse-previewers is available. The new version supports the use of signedUrls rather than API keys when previewing restricted files (including files in draft dataset versions). Upgrading is highly recommended. Please note:

SignedUrls can now be used with PrivateUrl access tokens, which allows PrivateUrl users to view previewers that are configured to use SignedUrls. See #10093.
Launching a dataset-level configuration tool will automatically generate an API token when needed. This is consistent with how other types of tools work. See #10045.
There is now a Markdown (.md) previewer.

New or improved APIs

The development of a new UI for Dataverse is driving the addition or improvement of many APIs.

New API endpoints

deaccessionDataset (/api/datasets/{id}/versions/{versionId}/deaccession): Deaccessions a given version of a dataset through the API.
/api/files/{id}/downloadCount
/api/files/{id}/dataTables
/api/files/{id}/metadata/tabularTags New endpoint to set tabular file tags.
canManageFilePermissions (/api/access/datafile/{id}/userPermissions): Added for getting user permissions on a file.
getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (Total count, per content type, per access status and per category name).
setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
userFileAccessRequested (/api/access/datafile/{id}/userFileAccessRequested): Returns true or false depending on whether or not the calling user has requested access to a particular file.
hasBeenDeleted (/api/files/{id}/hasBeenDeleted): Indicates whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.
getZipDownloadLimit (/api/info/zipDownloadLimit): Get the configured zip file download limit. The response contains the long value of the limit in bytes.
getMaxEmbargoDurationInMonths (/api/info/settings/:MaxEmbargoDurationInMonths): Get the maximum embargo duration in months, if available, configured through the database setting :MaxEmbargoDurationInMonths.
getDatasetJsonSchema (/api/dataverses/{id}/datasetSchema): Get a dataset schema with the fields required by a given dataverse collection.
validateDatasetJsonSchema (/api/dataverses/{id}/validateDatasetJson): Validate that a dataset JSON file is in proper format and contains the required elements and fields for a given dataverse collection.
downloadTmpFile (/api/admin/downloadTmpFile): For testing purposes, allows files to be downloaded from /tmp.

Pagination of files in dataset versions

optional pagination has been added to /api/datasets/{id}/versions that may be useful in datasets with a large number of versions
a new flag includeFiles is added to both /api/datasets/{id}/versions and /api/datasets/{id}/versions/{vid} (true by default), providing an option to drop the file information from the output
when files are requested to be included, some database lookup optimizations have been added to improve the performance on datasets with large numbers of files.

This is reflected in the Dataset Versions API section of the Guide.

DataFile API payload has been extended to include the following fields

tabularData: Boolean field to know if the DataFile is of tabular type
fileAccessRequest: Boolean field to know if the file access requests are enabled on the Dataset (DataFile owner)
friendlyType: String

The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support pagination, ordering, and optional filtering

Access status: through the accessStatus query parameter, which supports the following values:
- Public
- Restricted
- EmbargoedThenRestricted
- EmbargoedThenPublic
Category name: through the categoryName query parameter. To return files to which the particular category has been added.
Content type: through the contentType query parameter. To return files matching the requested content type. For example: "image/png".
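The filters above are ordinary query parameters, so a request URL can be assembled as in this sketch. The function name, the dataset id 42, and the version id 1.0 are hypothetical; only accessStatus and contentType are used here because those names are given above.

```shell
#!/bin/sh
# Build a getVersionFiles URL with optional accessStatus and contentType filters.
# Arguments: server URL, dataset id, version id, access status, content type.
version_files_url() {
  url="$1/api/datasets/$2/versions/$3/files"
  sep='?'
  if [ -n "$4" ]; then url="$url${sep}accessStatus=$4"; sep='&'; fi
  if [ -n "$5" ]; then url="$url${sep}contentType=$5"; sep='&'; fi
  printf '%s\n' "$url"
}

version_files_url "http://localhost:8080" 42 1.0 Restricted image/png
```

Values containing spaces or reserved characters, such as some category names, would need URL encoding; with curl, `-G --data-urlencode "categoryName=..."` is one way to do that.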

Additional improvements to existing API endpoints

getVersionFiles (/api/datasets/{id}/versions/{versionId}/files): Extended to support optional filtering by search text through the searchText query parameter. The search will be applied to the labels and descriptions of the dataset files. Added tabularTagName to return files to which the particular tabular tag has been added. Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain files.
getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain file counts. Added support for filtering by optional criteria query parameter:
- contentType
- accessStatus
- categoryName
- tabularTagName
- searchText
getDownloadSize (/api/datasets/{identifier}/versions/{versionId}/downloadsize): Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain files. Added a new optional query parameter "mode". This parameter applies a filter criterion to the operation and supports the following values:
- All (Default): Includes both archival and original sizes for tabular files
- Archival: Includes only the archival size for tabular files
- Original: Includes only the original size for tabular files.
/api/datasets/{id}/versions/{versionId} New query parameter includeDeaccessioned added to consider deaccessioned versions when searching for versions.
/api/datasets/{id}/userPermissions: Gets the user permissions on a dataset. The permissions that this API call checks, returned as booleans, are the following:
- Can view the unpublished dataset
- Can edit the dataset
- Can publish the dataset
- Can manage the dataset permissions
- Can delete the dataset draft
getDatasetVersionCitation (/api/datasets/{id}/versions/{versionId}/citation) endpoint now accepts a new boolean optional query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain the citation.
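As an illustration of the "mode" parameter of getDownloadSize described above, the sketch below builds a request URL and rejects any value other than the three documented ones. The function name, dataset id, and version id are hypothetical.

```shell
#!/bin/sh
# Build a getDownloadSize URL, accepting only the documented mode values.
# Arguments: server URL, dataset id, version id, mode (defaults to All).
download_size_url() {
  mode="${4:-All}"
  case "$mode" in
    All|Archival|Original) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  printf '%s/api/datasets/%s/versions/%s/downloadsize?mode=%s\n' "$1" "$2" "$3" "$mode"
}

download_size_url "http://localhost:8080" 42 1.0 Original
```

The resulting URL can be passed to curl; appending `&includeDeaccessioned=true` would additionally consider deaccessioned versions, as described above.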

Improvements for developers

Developers can enjoy a dramatically faster feedback loop when iterating on code if they are using Netbeans or IntelliJ IDEA Ultimate (with the Payara Platform Tools plugin). For details, see https://guides.dataverse.org/en/6.1/container/dev-usage.html#intellij-idea-ultimate-and-payara-platform-tools and the thread on the mailing list.
Developers can now test S3 locally by using the Dockerized development environment, which now includes both LocalStack and MinIO. API (end to end) tests are in S3AccessIT.
In addition, a new integration test class (not an API test, the new Testcontainers-based test launched with mvn verify) has been added at S3AccessIOLocalstackIT. It uses Testcontainers to spin up Localstack for S3 testing and does not require Dataverse to be running.
With this release, we add a new type of testing to Dataverse: integration tests which are not end-to-end tests (like our API tests). Starting with OIDC authentication support, we regularly test on CI that both OIDC login options, in the UI and in the API, are working.
The testing and development Keycloak realm has been updated with more users and compatibility with Keycloak 21.
The support for setting JVM options during testing has been improved for developers. You now may add the @JvmSetting annotation to classes (also inner classes) and reference factory methods for values. This improvement is also paving the way to enable manipulating JVM options during end-to-end tests on remote ends.
As part of these testing improvements, the code coverage report file for unit tests has moved from target/jacoco.exec to target/coverage-reports/jacoco-unit.exec.

Major use cases and infrastructure enhancements

Changes and fixes in this release not already mentioned above include:

Validation has been added for the Geographic Bounding Box values in the Geospatial metadata block. This will prevent improperly defined bounding boxes from being created via the edit page or metadata imports. This also fixes the issue where existing datasets with invalid geoboxes were quietly failing to get reindexed. See PR #10142.
Dataverse's OAIORE Metadata Export format and archival BagIT exports (which include the OAI-ORE metadata export file) have been updated to include information about the dataset version state, e.g. RELEASED or DEACCESSIONED and to indicate which version of Dataverse was used to create the archival Bag. As part of the latter, the current OAIORE Metadata format has been given a 1.0.0 version designation and it is expected that any future changes to the OAIORE export format will result in a version change and that tools such as DVUploader that can recreate datasets from archival Bags will start indicating which version(s) of the OAIORE format they can read. Dataverse installations that have been using archival Bags may wish to update any existing archival Bags they have, e.g. by deleting existing Bags and using the Dataverse archival Bag export API to generate updated versions.
For BagIT export, it is now possible to configure the following information in bag-info.txt. (Previously, customization was possible by editing Bundle.properties but this is no longer supported.) For details, see https://guides.dataverse.org/en/6.1/installation/config.html#bag-info-txt
- Source-Organization from dataverse.bagit.sourceorg.name.
- Organization-Address from dataverse.bagit.sourceorg.address.
- Organization-Email from dataverse.bagit.sourceorg.email.
This release fixes several issues (#9952, #9953, #9957) where the Signposting output did not match the Signposting specification. These changes introduce backward-incompatibility, but since Signposting support was added recently (in Dataverse 5.14 in PR #8981), we feel it's best to do this clean up and not support the old implementation that was not fully compliant with the spec.
- To fix #9952, we surround the license info with < and >.
- To fix #9953, we no longer wrap the response in a {"status":"OK","data":{ JSON object. This has also been noted in the guides at https://dataverse-guide--9955.org.readthedocs.build/en/9955/api/native-api.html#retrieve-signposting-information
- To fix #9957, we corrected the mime/content type, changing it from json+ld to ld+json. For backward compatibility, we are still supporting the old one, for now.
It's now possible to configure the docroot, which holds collection logos and more. See dataverse.files.docroot in the Installation Guide and PR #9819.
We have started maintaining an API changelog of breaking changes: https://guides.dataverse.org/en/6.1/api/changelog.html See also #10060.

New configuration options

dataverse.auth.oidc.auth-server-url
dataverse.auth.oidc.client-id
dataverse.auth.oidc.client-secret
dataverse.auth.oidc.enabled
dataverse.auth.oidc.pkce.enabled
dataverse.auth.oidc.pkce.max-cache-age
dataverse.auth.oidc.pkce.max-cache-size
dataverse.auth.oidc.pkce.method
dataverse.auth.oidc.subtitle
dataverse.auth.oidc.title
dataverse.bagit.sourceorg.address
dataverse.bagit.sourceorg.email
dataverse.bagit.sourceorg.name
dataverse.files.docroot
dataverse.files.globus-cache-maxage
dataverse.files.guestbook-at-request
dataverse.files.{driverId}.upload-out-of-band

Backward incompatibilities

Since Alternative Title is now repeatable, the JSON you send to create or edit a dataset must be an array rather than a simple string. For example, instead of "value": "Alternative Title", you must send "value": ["Alternative Title1", "Alternative Title2"]
The fixes for several issues (#9952, #9953, #9957) where the Signposting output did not match the Signposting specification introduce backward incompatibility. See above for details.
For BagIT export, if you were configuring values in bag-info.txt using Bundle.properties, you must switch to the new dataverse.bagit JVM options mentioned above. For details, see https://guides.dataverse.org/en/6.1/installation/config.html#bag-info-txt
See "Globus support" above for backward incompatibilities specific to Globus.
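To make the Alternative Title change concrete, the sketch below prints what the field entry might look like in its new, repeatable form. The surrounding keys (typeName, multiple, typeClass) follow the usual layout of Dataverse dataset JSON and should be checked against the API Guide; the titles and the function name are placeholders.

```shell
#!/bin/sh
# Print the citation-block field entry for Alternative Title in its new,
# repeatable form. Previously "value" was a single string and "multiple" false.
alternative_title_field() {
  cat <<'EOF'
{
  "typeName": "alternativeTitle",
  "multiple": true,
  "typeClass": "primitive",
  "value": ["Alternative Title1", "Alternative Title2"]
}
EOF
}

alternative_title_field
```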

Complete list of changes

For the complete list of code changes in this release, see the 6.1 Milestone in GitHub.

Getting help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.0.

0. These instructions assume that you are upgrading from 6.0. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 6.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.

export PAYARA=/usr/local/payara6

(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin undeploy dataverse-6.0

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-6.1.war

As noted above, deployment of the war file might take several minutes due to a database migration script required for the new storage quotas feature.

5. Restart Payara

service payara stop
service payara start

6. Update Geospatial Metadata Block (to improve validation of bounding box values)

wget https://github.com/IQSS/dataverse/releases/download/v6.1/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv

6a. Update Citation Metadata Block (to make Alternative Title repeatable)

wget https://github.com/IQSS/dataverse/releases/download/v6.1/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv

7. Update Solr schema.xml to allow multiple Alternative Titles to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

7a. For installations without custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
- wget https://github.com/IQSS/dataverse/releases/download/v6.1/schema.xml
- cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
There are 2 ways to regenerate the schema:
Either by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):
- wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/9.3.0/update-fields.sh
- chmod +x update-fields.sh
- curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml
OR, alternatively, you can edit the following line in your schema.xml by hand as follows (to indicate that alternative title is now multiValued="true"):
- <field name="alternativeTitle" type="text_en" multiValued="true" stored="true" indexed="true"/>
Restart Solr instance (usually service solr restart depending on solr/OS)
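Whichever of the two approaches is used, a quick check of the resulting schema.xml can confirm that the field is now multi-valued. This is a minimal sketch that assumes the field definition sits on a single line, as in the example above; the function name is hypothetical.

```shell
#!/bin/sh
# Succeed if the <field> definition named $2 in schema file $1 is multiValued.
field_is_multivalued() {
  grep "<field name=\"$2\"" "$1" | grep -q 'multiValued="true"'
}

# Usage (path as used in these instructions; adjust as needed):
#   field_is_multivalued /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml alternativeTitle \
#     && echo "alternativeTitle is multiValued"
```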

8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.

Published by landreev over 2 years ago

dataverse - v6.0

Dataverse 6.0

This is a platform upgrade release. Payara, Solr, and Java have been upgraded. No features have been added to the Dataverse software itself. Only a handful of bugs were fixed.

Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!

Release Highlights (Major Upgrades, Breaking Changes)

This release contains major upgrades to core components. Detailed upgrade instructions can be found below.

Runtime

The required Java version has been increased from version 11 to 17.
- See PR #9764 for details.
Payara application server has been upgraded to version 6.2023.8.
- This is a required update.
- Please note that Payara Community 5 has reached end of life.
- See PR #9685 and PR #9795 for details.
Solr has been upgraded to version 9.3.0.
- See PR #9787 for details.
PostgreSQL 13 remains the tested and supported version.
- See the PostgreSQL section of the Installation Guide for details.

Development

Removal of Vagrant and Docker All In One (docker-aio), deprecated in Dataverse v5.14. See PR #9838 and PR #9685 for details.
All tests have been migrated to use JUnit 5 exclusively from now on. See PR #9796 for details.

Installation

If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!

Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!

You are also very welcome to join the Global Dataverse Community Consortium (GDCC).

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 5.14.

Upgrade from Java 11 to Java 17

Java 17 is now required for Dataverse. Solr can run under Java 11 or Java 17 but the latter is recommended. In preparation for the Java upgrade, stop both Dataverse/Payara and Solr.

Undeploy Dataverse, if deployed, using the unprivileged service account.

sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

Stop Payara 5.

sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

Stop Solr 8.

sudo systemctl stop solr.service

Install Java 17.

Assuming you are using RHEL or a derivative such as Rocky Linux:

sudo yum install java-17-openjdk

Set Java 17 as the default.

Assuming you are using RHEL or a derivative such as Rocky Linux:

sudo alternatives --config java

Test that Java 17 is the default.

java -version

Upgrade from Payara 5 to Payara 6

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

Download Payara 6.2023.8.

curl -L -O https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2023.8/payara-6.2023.8.zip

Unzip it to /usr/local (or your preferred location).

sudo unzip payara-6.2023.8.zip -d /usr/local/

Change ownership of the unzipped Payara to your "service" user ("dataverse" by default).

sudo chown -R dataverse /usr/local/payara6

Undeploy Dataverse, if deployed, using the unprivileged service account.

sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications

sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14

Stop Payara 5, if running.

sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain

Copy Dataverse-related lines from Payara 5 to Payara 6 domain.xml.

sudo -u dataverse cp /usr/local/payara6/glassfish/domains/domain1/config/domain.xml /usr/local/payara6/glassfish/domains/domain1/config/domain.xml.orig

sudo egrep 'dataverse|doi' /usr/local/payara5/glassfish/domains/domain1/config/domain.xml > lines.txt

sudo vi /usr/local/payara6/glassfish/domains/domain1/config/domain.xml

If any JVM options reference the old payara5 path (/usr/local/payara5) be sure to change it to payara6.
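One way to make that path substitution on the copied lines before pasting them is sketched below. It assumes the default /usr/local/payara5 and /usr/local/payara6 locations used in these instructions; the function name and the example JVM option are hypothetical.

```shell
#!/bin/sh
# Read domain.xml lines on stdin and point any /usr/local/payara5 path at payara6.
rewrite_payara_path() {
  sed 's#/usr/local/payara5#/usr/local/payara6#g'
}

# Example with a hypothetical JVM option:
echo '<jvm-options>-Ddataverse.files.directory=/usr/local/payara5/data</jvm-options>' | rewrite_payara_path
```

Applied to the file created above, this could be run as `rewrite_payara_path < lines.txt > lines-payara6.txt`, where the output file name is a placeholder. Review the result by hand before editing domain.xml.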

The lines will appear in two sections, examples shown below (but your content will vary).

Section 1: system properties (under <server name="server" config-ref="server-config">)

<system-property name="dataverse.db.user" value="dvnuser"></system-property>
<system-property name="dataverse.db.host" value="localhost"></system-property>
<system-property name="dataverse.db.port" value="5432"></system-property>
<system-property name="dataverse.db.name" value="dvndb"></system-property>
<system-property name="dataverse.db.password" value="dvnsecret"></system-property>

Note: if you used the Dataverse installer, you won't have a dataverse.db.password property. See "Create password aliases" below.

Section 2: JVM options (under <java-config classpath-suffix="" debug-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9009" system-classpath="">, the one under <config name="server-config">, not under <config name="default-config">)

<jvm-options>-Ddataverse.files.directory=/usr/local/dvn/data</jvm-options>
<jvm-options>-Ddataverse.files.file.type=file</jvm-options>
<jvm-options>-Ddataverse.files.file.label=file</jvm-options>
<jvm-options>-Ddataverse.files.file.directory=/usr/local/dvn/data</jvm-options>
<jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
<jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
<jvm-options>-Ddataverse.rserve.user=rserve</jvm-options>
<jvm-options>-Ddataverse.rserve.password=rserve</jvm-options>
<jvm-options>-Ddataverse.auth.password-reset-timeout-in-minutes=60</jvm-options>
<jvm-options>-Ddataverse.timerServer=true</jvm-options>
<jvm-options>-Ddataverse.fqdn=dev1.dataverse.org</jvm-options>
<jvm-options>-Ddataverse.siteUrl=https://dev1.dataverse.org</jvm-options>
<jvm-options>-Ddataverse.files.storage-driver-id=file</jvm-options>
<jvm-options>-Ddoi.username=testaccount</jvm-options>
<jvm-options>-Ddoi.password=notmypassword</jvm-options>
<jvm-options>-Ddoi.baseurlstring=https://mds.test.datacite.org/</jvm-options>
<jvm-options>-Ddoi.dataciterestapiurlstring=https://api.test.datacite.org</jvm-options>

Check the Xmx setting in domain.xml.

Under /usr/local/payara6/glassfish/domains/domain1/config/domain.xml, check the Xmx setting under <config name="server-config">, where you put the JVM options, not the one under <config name="default-config">. Note that there are two such settings, and you want to adjust the one in the stanza with Dataverse options. This sets the JVM heap size; a good rule of thumb is half of your system's total RAM. You may specify the value in MB (8192m) or GB (8g).
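The "half of total RAM" rule of thumb can be worked out as in the sketch below. It assumes a Linux host where /proc/meminfo reports MemTotal in kB; both function names are hypothetical, and the result is only a starting point for the Xmx value.

```shell
#!/bin/sh
# Convert a memory size in kB to half of it in MB (integer arithmetic).
half_of_kb_in_mb() {
  echo $(( $1 / 2 / 1024 ))
}

# Suggest an -Xmx value from MemTotal in /proc/meminfo (Linux only).
suggest_xmx() {
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "-Xmx$(half_of_kb_in_mb "$total_kb")m"
}

# A machine with 16 GiB of RAM reports 16777216 kB; half of that is 8192 MB.
half_of_kb_in_mb 16777216
```

On the server itself, `suggest_xmx` prints a value in the same form as the 8192m example above.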

Copy jhove.conf and jhoveConfig.xsd from Payara 5, edit and change payara5 to payara6.

sudo cp /usr/local/payara5/glassfish/domains/domain1/config/jhove* /usr/local/payara6/glassfish/domains/domain1/config/

sudo chown dataverse /usr/local/payara6/glassfish/domains/domain1/config/jhove*

sudo -u dataverse vi /usr/local/payara6/glassfish/domains/domain1/config/jhove.conf

Copy logos from Payara 5 to Payara 6.

These logos are for collections (dataverses).

sudo -u dataverse cp -r /usr/local/payara5/glassfish/domains/domain1/docroot/logos /usr/local/payara6/glassfish/domains/domain1/docroot

If you are using Make Data Count (MDC), edit :MDCLogPath.

Your :MDCLogPath database setting might be pointing to a Payara 5 directory such as /usr/local/payara5/glassfish/domains/domain1/logs. If so, edit this to be Payara 6. You'll probably want to copy your logs over as well.

Update systemd unit file (or other init system) from /usr/local/payara5 to /usr/local/payara6, if applicable.
Start Payara.

sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain

Create a Java mail resource, replacing "localhost" for mailhost with your mail relay server, and replacing "localhost" for fromaddress with the FQDN of your Dataverse server.

sudo -u dataverse /usr/local/payara6/bin/asadmin create-javamail-resource --mailhost "localhost" --mailuser "dataversenotify" --fromaddress "do-not-reply@localhost" mail/notifyMailSession

Create password aliases for your database, rserve and datacite jvm-options, if you're using them.

echo "AS_ADMIN_ALIASPASSWORD=yourDBpassword" > /tmp/dataverse.db.password.txt

sudo -u dataverse /usr/local/payara6/bin/asadmin create-password-alias --passwordfile /tmp/dataverse.db.password.txt

When you are prompted "Enter the value for the aliasname operand", enter dataverse.db.password

You should see "Command create-password-alias executed successfully."

You'll want to perform similar commands for rserve_password_alias and doi_password_alias if you're using Rserve and/or DataCite.
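The alias steps can also be scripted so that the password file is readable only by its owner and is removed afterwards. This is a sketch under the assumption that it is run as the dataverse service user (for example after sudo -i -u dataverse) and that asadmin accepts the alias name as its operand instead of prompting for it; the function names are hypothetical.

```shell
#!/bin/sh
# Write an asadmin password file for create-password-alias, readable only by
# the current user, and print its path.
write_alias_passwordfile() {
  file=$(umask 077; mktemp)
  printf 'AS_ADMIN_ALIASPASSWORD=%s\n' "$1" > "$file"
  printf '%s\n' "$file"
}

# Create one alias without the interactive prompt, then remove the password file.
# Arguments: alias name, password. Run as the dataverse service user.
create_alias() {
  pwfile=$(write_alias_passwordfile "$2")
  /usr/local/payara6/bin/asadmin create-password-alias --passwordfile "$pwfile" "$1"
  rm -f "$pwfile"
}

# Usage (placeholder password):
#   create_alias rserve_password_alias 'yourRservePassword'
```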

Enable workaround for FISH-7722.

The following workaround is for https://github.com/payara/Payara/issues/6337

sudo -u dataverse /usr/local/payara6/bin/asadmin create-jvm-options --add-opens=java.base/java.io=ALL-UNNAMED

Create the network listener on port 8009.

sudo -u dataverse /usr/local/payara6/bin/asadmin create-network-listener --protocol http-listener-1 --listenerport 8009 --jkenabled true jk-connector

Deploy the Dataverse 6.0 war file.

sudo -u dataverse /usr/local/payara6/bin/asadmin deploy /path/to/dataverse-6.0.war

Check that you get a version number from Dataverse.

This is just a sanity check that Dataverse has been deployed properly.

curl http://localhost:8080/api/info/version
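Because deployment can take a while, the check can be wrapped in a small polling loop, as in this sketch. It assumes the endpoint answers with compact JSON of the form shown in the example call; the function names, the 30 attempts, and the 10 second pause are arbitrary choices made here.

```shell
#!/bin/sh
# Extract the "version" value from the JSON returned by /api/info/version.
extract_version() {
  printf '%s\n' "$1" | sed -n 's/.*"version":"\([^"]*\)".*/\1/p'
}

# Poll the endpoint until it answers or the attempts run out.
wait_for_version() {
  attempts=30
  while [ "$attempts" -gt 0 ]; do
    body=$(curl -s -f http://localhost:8080/api/info/version) && {
      extract_version "$body"
      return 0
    }
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}

# Example with an assumed response body:
extract_version '{"status":"OK","data":{"version":"6.0","build":"example"}}'
```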

Perform one final Payara restart to ensure that timers are initialized properly.

sudo -u dataverse /usr/local/payara6/bin/asadmin stop-domain

sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain

Upgrade from Solr 8 to 9

Solr has been upgraded to Solr 9. You must install Solr fresh and reindex. You cannot use your old schema.xml because the format has changed.

The instructions below are copied from https://guides.dataverse.org/en/6.0/installation/prerequisites.html#installing-solr and tweaked a bit for an upgrade scenario.

We assume that you already have a user called "solr" (from the instructions above), added during your initial installation of Solr. We also assume that you have already stopped Solr 8 as explained in the instructions above about upgrading Java.

Become the "solr" user and then download and configure Solr.

su - solr

cd /usr/local/solr

wget https://archive.apache.org/dist/solr/solr/9.3.0/solr-9.3.0.tgz

tar xvzf solr-9.3.0.tgz

cd solr-9.3.0

cp -r server/solr/configsets/_default server/solr/collection1

Unzip "dvinstall.zip" from this release. Unzip it into /tmp. Then copy the following files into place.

cp /tmp/dvinstall/schema*.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf

cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf

A Dataverse installation requires a change to the jetty.xml file that ships with Solr.

Edit /usr/local/solr/solr-9.3.0/server/etc/jetty.xml, increasing requestHeaderSize from 8192 to 102400
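If you prefer to script this edit, a sed substitution along the following lines may work. It assumes the relevant line in jetty.xml contains both the text requestHeaderSize and default="8192", as in the sample line used below; check your copy of the file first. The function name is hypothetical.

```shell
#!/bin/sh
# Read jetty.xml on stdin and raise the default on the requestHeaderSize line only.
raise_request_header_size() {
  sed '/requestHeaderSize/ s/default="8192"/default="102400"/'
}

# Example with a sample line (assumed format):
echo '<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>' \
  | raise_request_header_size
```

In practice, keep a backup and write to a new file, for example `cp jetty.xml jetty.xml.orig` followed by `raise_request_header_size < jetty.xml.orig > jetty.xml`.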

Tell Solr to create the core "collection1" on startup.

echo "name=collection1" > /usr/local/solr/solr-9.3.0/server/solr/collection1/core.properties

Update your init script.

Your init script may be located at /etc/systemd/system/solr.service, for example. Update the path to Solr to be /usr/local/solr/solr-9.3.0.

Start Solr using your init script and check collection1.

The collection1 check below should print out fields Dataverse uses like "dsDescription".

systemctl start solr.service

curl http://localhost:8983/solr/collection1/schema/fields

If you have custom metadata blocks installed, you must update your Solr schema.xml to include your custom fields.

For details, please see https://guides.dataverse.org/en/6.0/admin/metadatacustomization.html#updating-the-solr-schema

At a high level you will be copying custom fields from the output of http://localhost:8080/api/admin/index/solr/schema or using a script to automate this.

Reindex Solr.

For details, see https://guides.dataverse.org/en/6.0/admin/solr-search-index.html but here is the reindex command:

curl http://localhost:8080/api/admin/index

Potential Archiver Incompatibilities with Payara 6

The Google Cloud and DuraCloud archivers may not work in Dataverse 6.0.

This is due to the archivers' dependence on libraries that include classes in javax.* packages that are no longer available. If these classes are actually used when the archivers run, the archivers would fail. As these two archivers require additional setup, they have not been tested in 6.0. Community members using these archivers or considering their use are encouraged to test them with 6.0 and report any errors and/or provide fixes for them that can be included in future releases.

Bug Fix for Dataset Templates with Custom Terms of Use

A bug was fixed for the following scenario:

Create a template with custom terms.
Set that template as the default.
Try to create a dataset.
A 500 error appears before the form to create dataset is even shown.

For more details, see issue #9825 and PR #9892.

Complete List of Changes

For the complete list of code changes in this release, see the 6.0 Milestone in GitHub.

Getting Help

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Published by kcondon almost 3 years ago

dataverse - v5.14

Dataverse Software 5.14

(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin undeploy dataverse-5.13

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.14.war

5. Restart Payara

service payara stop
service payara start

6. Update the Citation metadata block: (the update makes the field Series repeatable)

wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the directory configured as dataverse.lang.directory (/home/dataverse/langBundles in the example below).

wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
cp citation.properties /home/dataverse/langBundles

7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

7a. For installations without custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
- cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
There are 2 ways to regenerate the schema. Either collect the output of the Dataverse schema API and feed it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):

wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml

OR, alternatively, edit the following lines in your schema.xml by hand (to indicate that series and its components are now multiValued="true"):

<field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
Restart Solr instance (usually service solr restart depending on solr/OS)
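
For the hand-edit route in 7b, the change can be sketched with sed as below. This is only an illustration: the function name make_series_multivalued is hypothetical and not part of the Dataverse distribution, and it assumes that each of the three field definitions sits on a single line. Run it on a copy of schema.xml and compare the result before putting it in place.

```shell
# Hypothetical helper (not shipped with Dataverse): set multiValued="true"
# on the series, seriesName and seriesInformation fields of a schema.xml.
# Assumes each <field .../> definition is on one line.
make_series_multivalued() {
  # $1: path to schema.xml; edited in place, a backup is kept as "$1.bak"
  sed -i.bak -E '/<field name="(series|seriesName|seriesInformation)" /{
/multiValued=/!s#[[:space:]]*/># multiValued="true"/>#
s#multiValued="false"#multiValued="true"#
}' "$1"
}
```

A line that has no multiValued attribute gets one appended, a line with multiValued="false" is flipped, and all other fields are left untouched.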

8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.

9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

  curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.

Generally, Handles is known to work reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and upgrading to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of this writing). An older version may seem to be running just fine, but keep in mind that it may stop working unexpectedly at any moment because of some incompatibility introduced in a Java rpm upgrade, or anything similarly unpredictable.

Handles is also very good about backward compatibility. In most cases you can simply stop the old version, unpack the new version from the distribution and start it on the existing config and database files, and it'll just keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid surprises, should they finally decide to drop the old database format, for example. The two specific things we recommend: 1) Make sure your service is using a json version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory) and 2) Make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format). Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Make sure to stop the service before converting the files and make sure to have a full backup of the existing server directory, just in case. Do not hesitate to contact Handles support with any questions you may have, as they are very responsive and helpful.

New JVM Options and MicroProfile Config Options

The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.

dataverse.pid.datacite.mds-api-url
dataverse.pid.datacite.rest-api-url
dataverse.pid.datacite.username
dataverse.pid.datacite.password
dataverse.pid.handlenet.key.path
dataverse.pid.handlenet.key.passphrase
dataverse.pid.handlenet.index
dataverse.pid.permalink.base-url
dataverse.pid.ezid.api-url
dataverse.pid.ezid.username
dataverse.pid.ezid.password

The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.

dataverse.signposting.level1-author-limit
dataverse.signposting.level1-item-limit

The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.

dataverse.api.allow-incomplete-metadata
dataverse.ui.show-validity-filter
dataverse.ui.allow-review-for-incomplete

The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.

dataverse.spi.export.directory

The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.

dataverse.mail.support-email
dataverse.mail.cc-support-on-contact-emails

The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.

dataverse.netcdf.geo-extract-s3-direct-upload
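
Options backed by MicroProfile Config can generally be supplied as environment variables as well as JVM options. The sketch below applies the standard MicroProfile Config name mapping (every character that is not a letter or digit becomes an underscore, then the name is upper-cased). The function name mp_env_name is hypothetical, and whether a particular option is picked up from the environment in your installation is an assumption to verify against the Configuration guide.

```shell
# Hypothetical helper: derive the environment variable name that
# MicroProfile Config would look up for a given property name.
mp_env_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9' '_' | tr 'a-z' 'A-Z'
  echo
}

# Example: print the variable names for the two Signposting options.
mp_env_name dataverse.signposting.level1-author-limit
mp_env_name dataverse.signposting.level1-item-limit
```

The JVM-option route would typically go through asadmin instead, for example $PAYARA/bin/asadmin create-jvm-options "-Ddataverse.api.allow-incomplete-metadata=true".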

Backward Incompatibilities

The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.

Using the new External Exporters framework

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.

See "Mechanism Added for Adding External Exporters".

Publishing via API

When publishing a dataset via API, it now mirrors the UI behavior by requiring that the dataset has either a standard license configured, or has valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message.

See "Handling of license information fixed in the API" for guidance on how to ensure that datasets created or updated via native API have a license configured.

Detailed Release Highlights, New Features and Use Case Scenarios

For Dataverse developers, support for running Dataverse in Docker (experimental)

Developers can experiment with running Dataverse in Docker: (PR #9439)

This is an image developers build locally (or can pull from Docker Hub). It is not meant for production use!

To provide a complete container-based local development environment, developers can deploy a Dataverse container from the new image in addition to other containers for necessary dependencies: https://guides.dataverse.org/en/5.14/container/dev-usage.html

Please note that with this emerging solution we will sunset older tooling like docker-aio and docker-dcm. We envision more testing possibilities in the future, to be discussed as part of the Dataverse Containerization Working Group. There is no sunsetting roadmap yet, but you have been warned. If there is some specific feature of these tools you would like to be kept, please reach out.

Indexing performance improved

This release brings noticeable performance improvements, especially for large datasets containing thousands of files. Uploading files one by one to a dataset is much faster now, making it possible to upload thousands of files in an acceptable timeframe. Not only file uploads but all edit operations on datasets containing many files are faster. Performance tweaks include indexing datasets in the background and reducing the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingest to finish. Ingest was already running in the background, but it took a lock that prevented updates to the dataset and degraded performance for datasets containing many files. (PR #9558)

For installations using MDC (Make Data Count), it is now possible to display both the MDC metrics and the legacy access counts, generated before MDC was enabled.

This is enabled via the new setting :MDCStartDate that specifies the cutoff date. If a dataset has any legacy access counts collected prior to that date, those numbers will be displayed in addition to any MDC numbers recorded since then. (PR #6543)

Changes to PID Provider JVM Settings

In preparation for a future feature to use multiple PID providers at the same time, all JVM settings for PID providers can now be configured using MicroProfile Config. At the same time, they were renamed to match the name of the provider being configured.

Please watch your log files for deprecation warnings. Your old settings will still be picked up, but you should migrate to the new names to avoid unnecessary log clutter and to prepare for future changes. An example message looks like this:

[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;| Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]

Here is a list of the new settings:

dataverse.pid.datacite.mds-api-url
dataverse.pid.datacite.rest-api-url
dataverse.pid.datacite.username
dataverse.pid.datacite.password
dataverse.pid.handlenet.key.path
dataverse.pid.handlenet.key.passphrase
dataverse.pid.handlenet.index
dataverse.pid.permalink.base-url
dataverse.pid.ezid.api-url
dataverse.pid.ezid.username
dataverse.pid.ezid.password

See also https://guides.dataverse.org/en/5.14/installation/config.html#persistent-identifiers-and-publishing-datasets (multiple PRs: #8823 #8828)
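
To find out which old option names are still in use, the deprecation warnings can be pulled out of the server log. The sketch below relies only on the message wording shown in the example above; the function name list_deprecated_options is hypothetical, and the log location you pass in depends on your installation.

```shell
# Hypothetical helper: list "old -> new" option names taken from the
# deprecation warnings in a Payara server log.
list_deprecated_options() {
  # $1: path to the log file
  sed -n -E 's/.*Detected deprecated config option ([^ ]+) in use\. Please update your config to use ([^ ]+)\..*/\1 -> \2/p' "$1" | sort -u
}
```

For example, list_deprecated_options "$PAYARA/glassfish/domains/domain1/logs/server.log" (the path is an assumption based on a default domain1 layout) prints one line per deprecated option that was detected.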

Signposting for Dataverse

This release adds Signposting support to Dataverse to improve machine discoverability of datasets and files. (PR #8424)

The following MicroProfile Config options are now available (these can be treated as JVM options):

dataverse.signposting.level1-author-limit
dataverse.signposting.level1-item-limit

Signposting is described in more detail in a new page in the Admin Guide on discoverability: https://guides.dataverse.org/en/5.14/admin/discoverability.html

Permalinks support

Dataverse now optionally supports PermaLinks, a type of persistent identifier that does not involve a global registry service. PermaLinks are appropriate for Intranet deployment and catalog use cases. (PR #8674)

Creating datasets with incomplete metadata through API

It is now possible to create a dataset with some nominally mandatory metadata fields left unpopulated. For details on the use case that led to this feature see issue #8822 and PR #8940.

The create dataset API call (POST to /api/dataverses/#dataverseId/datasets) is extended with the "doNotValidate" parameter. However, in order to be able to create a dataset with incomplete metadata, the Solr configuration must be updated first with the new "schema.xml" file (do not forget to run the metadata fields update script when you use custom metadata). Reindexing is optional, but recommended. Also, even when this feature is not used, it is recommended to update the Solr configuration and reindex the metadata. Finally, this new feature can be activated with the "dataverse.api.allow-incomplete-metadata" JVM option.

You can also enable a valid/incomplete metadata filter in the "My Data" page using the "dataverse.ui.show-validity-filter" JVM option. By default, this filter is not shown. When you wish to use this filter, you must reindex the datasets first, otherwise datasets with valid metadata will not be shown in the results.

It is not possible to publish datasets with incomplete or invalid metadata. By default, you also cannot send such datasets for review. If you wish to enable sending for review of datasets with incomplete metadata, turn on the "dataverse.ui.allow-review-for-incomplete" JVM option.

In order to customize the wording and add translations to the UI sections extended by this feature, you can edit the "Bundle.properties" file and the localized versions of that file. The property keys used by this feature are:

incomplete
valid
dataset.message.incomplete.warning
mydataFragment.validity
dataverses.api.create.dataset.error.mustIncludeAuthorName
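
As a rough illustration of the create dataset call described in this section, the sketch below builds the request URL with the doNotValidate parameter. It assumes the parameter is passed as a query parameter with the value true. SERVER_URL, API_TOKEN, the collection alias and the JSON file are placeholders you supply, and both function names are hypothetical. Check the Native API Guide for the authoritative form.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: URL for creating a dataset in collection $1
# without metadata validation (assumes doNotValidate is a query parameter).
create_dataset_url() {
  printf '%s/api/dataverses/%s/datasets?doNotValidate=true\n' "$SERVER_URL" "$1"
}

# Hypothetical wrapper around curl; $1 = collection alias, $2 = JSON file.
# Expects the API token in the API_TOKEN environment variable.
create_incomplete_dataset() {
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(create_dataset_url "$1")" \
       -H 'Content-type:application/json' --upload-file "$2"
}
```

The call only succeeds once the "dataverse.api.allow-incomplete-metadata" option has been turned on and the Solr schema has been updated as described above.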

Registering PIDs (DOIs or Handles) for files in select collections

It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the :FilePIDsEnabled section of the Configuration guide for details. (PR #9614)

Mechanism Added for Adding External Exporters

It is now possible for third parties to develop and share code to provide new metadata export formats for Dataverse. Export formats can be made available via the Dataverse UI and API or configured for use in Harvesting. Dataverse now provides developers with a separate dataverse-spi JAR file that contains the Java interfaces and classes required to create a new metadata Exporter. Once a new Exporter has been created and packaged as a JAR file, administrators can use it by specifying a local directory for third party Exporters, dropping the Exporter JAR there, and restarting Payara. This mechanism also allows new Exporters to replace any of Dataverse's existing metadata export formats. (PR #9175) See also https://guides.dataverse.org/en/5.14/developers/metadataexport.html

Backward Incompatibilities

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.

New JVM/MicroProfile Settings

dataverse.spi.export.directory - specifies a directory, readable by the Dataverse server. Any Exporter JAR files placed in this directory will be read by Dataverse and used to add/replace the specified metadata format.
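
The administrator side of this mechanism (create the directory, drop the JAR there) can be sketched as follows. The function name install_exporter and any paths you pass to it are hypothetical; setting the option and restarting Payara remain separate steps.

```shell
# Hypothetical helper: copy an Exporter JAR into the directory that
# dataverse.spi.export.directory points to, creating it if needed.
install_exporter() {
  # $1: path to the Exporter JAR, $2: exporter directory
  mkdir -p "$2" && cp "$1" "$2/"
}
```

For example, after install_exporter my-exporter.jar /usr/local/dataverse/exporters (both names are placeholders), point dataverse.spi.export.directory at that directory and restart Payara so the new Exporter is loaded.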

Contact Email Improvements

Email sent from the contact forms to the contact(s) for a collection, dataset, or datafile can now optionally be cc'd to a support email address. The support email address can be changed from the default :SystemEmail address to a separate :SupportEmail address. When multiple contacts are listed, the system will now send one email to all contacts (with the optional cc if configured) instead of separate emails to each contact. Contact names with a comma that refer to Organizations will no longer have the name parts reversed in the email greeting. A new protected/admin feedback API has been added. (PR #9186) See https://guides.dataverse.org/en/5.14/api/native-api.html#send-feedback-to-contact-s

New JVM/MicroProfile Settings

dataverse.mail.support-email - allows a separate email address, distinct from :SystemEmail, to be used as the "to" address in emails from the contact form/feedback API.
dataverse.mail.cc-support-on-contact-emails - include the support email address as a CC: entry when contact/feedback emails are sent to the contacts for a collection, dataset, or datafile.

Support for Grouping Dataset Files by Folder and Category Tag

Dataverse now supports grouping dataset files by folder and/or optionally by Tag/Category. The default for whether to order by folder can be changed via :OrderByFolder. Ordering by category must be enabled by an administrator via the :CategoryOrder parameter, which is used to specify which tags appear first (e.g. to put Documentation files before Data or Code files). These Group-By options work with the existing sort options, i.e. sorting alphabetically means that files within each folder or tag group will be sorted alphabetically. :AllowUserManagementOfOrder can be set to true to allow users to turn folder ordering and category ordering (if enabled) on or off in the current dataset view. (PR #9204)

New Settings

:CategoryOrder - a comma-separated list of Category/Tag names defining the order in which files with those tags should be displayed. The setting can include custom tag names along with the pre-defined defaults (Documentation, Data, and Code, which can be overridden by the :FileCategories setting).
:OrderByFolder - defaults to true - whether to group files in the same folder together
:AllowUserManagementOfOrder - defaults to false - allow users to toggle ordering on/off in the dataset display
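
These are database settings, so they can be changed through the admin settings API in the same way as :FilePIDsEnabled in the upgrade instructions. The sketch below wraps that call. The function names are hypothetical and the tag order in the example is only an illustration.

```shell
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# URL of a database setting in the admin settings API.
settings_url() {
  printf '%s/api/admin/settings/%s\n' "$SERVER_URL" "$1"
}

# Hypothetical wrapper: put_setting NAME VALUE
put_setting() {
  curl -X PUT -d "$2" "$(settings_url "$1")"
}

# Example calls (run on the Dataverse host; the tag order is illustrative):
#   put_setting :CategoryOrder 'Documentation,Data,Code'
#   put_setting :OrderByFolder true
#   put_setting :AllowUserManagementOfOrder true
```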

Metadata field Series now repeatable

This enhancement allows depositors to define multiple instances of the metadata field Series in the Citation Metadata block.

Data contained in a dataset may belong to multiple series. Making the field repeatable makes it possible to reflect this fact in the dataset metadata. (PR #9256)

Guides in PDF Format

An experimental version of the guides in PDF format is available at http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/ (PR #9474)

Advice for anyone who wants to help improve the PDF is available at https://guides.dataverse.org/en/5.14/developers/documentation.html#pdf-version-of-the-guides

Datasets API extended

The following APIs have been added: (PR #9592)

/api/datasets/summaryFieldNames
/api/datasets/privateUrlDatasetVersion/{privateUrlToken}
/api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
/api/datasets/{datasetId}/versions/{version}/citation

Extra fields included in the JSON metadata

The following fields are now available in the native JSON output:

alternativePersistentId
publicationDate
citationDate

(PR #9657)

Files downloaded from Binder are now in their original format.

For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest). (PR #9483)

This should make it easier to write code to reproduce results as the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.

For details, see #9374, https://github.com/jupyterhub/repo2docker/issues/1242, and https://github.com/jupyterhub/repo2docker/pull/1253.

Handling of license information fixed in the API

(PR #9568)

When publishing a dataset via API, it now requires the dataset to either have a standard license configured, or have valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message. This introduces a backward incompatibility: if you have scripts that automatically create, update and publish datasets, this last step may start failing. Unfortunately, there were some problems with the datasets APIs that made it difficult to manage licenses, so an API user was likely to end up with a dataset missing either of the above. In this release we have addressed this with the following fixes:

We fixed the incompatibility between the format in which license information was exported in json and the format the create and update APIs expected on import (https://github.com/IQSS/dataverse/issues/9155). This means that the following json format can now be imported:

"license": { "name": "CC0 1.0", "uri": "http://creativecommons.org/publicdomain/zero/1.0" }

However, for the sake of backward compatibility the old format

"license" : "CC0 1.0"

will be accepted as well.

We have added the default license (CC0) to the model json file that we provide, and recommend using as the model, in the Native API Guide (https://github.com/IQSS/dataverse/issues/9364).

And we have corrected the misleading language in the same guide where we used to recommend to users that they select, edit and re-import only the .metadataBlocks fragment of the json metadata representing the latest version. There are in fact other useful pieces of information that need to be preserved in the update (such as the "license" section above). So the recommended way of creating base json for updates via the API is to select everything but the "files" section, with (for example) the following jq command:

jq '.data | del(.files)'

Please see the Update Metadata For a Dataset section of our Native API Guide for more information.
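
To see what the jq filter above keeps, the sketch below runs it on a tiny hand-made document rather than on real API output. The sample JSON is an assumption that only mimics the overall shape (a data object containing license, metadataBlocks and files); real responses contain many more fields. It requires jq to be installed.

```shell
# Made-up, heavily abbreviated stand-in for the JSON of a dataset version.
sample='{"status":"OK","data":{"license":{"name":"CC0 1.0","uri":"http://creativecommons.org/publicdomain/zero/1.0"},"metadataBlocks":{},"files":[{"label":"data.tab"}]}}'

# Keep everything under .data except the "files" section.
base_json=$(printf '%s' "$sample" | jq -c '.data | del(.files)')
printf '%s\n' "$base_json"
```

The result still carries the "license" section alongside "metadataBlocks", which is the point of selecting everything but "files" instead of only the .metadataBlocks fragment.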

New External Tool Type and Implementation

With this release a new experimental external tool type has been added to the Dataverse Software. The tool type is "query" and its first implementation is an experimental tool named Ask the Data which allows users to ask natural language queries of tabular files in Dataverse. More information is available in the External Tools section of the guides. (PR #9737) See https://guides.dataverse.org/en/5.14/admin/external-tools.html#file-level-query-tools

Default Value for File PIDs registration has changed

The default for whether PIDs are registered for files or not is now false.

Installations where file PIDs were enabled by default will have to add the :FilePIDsEnabled = true setting to maintain the existing functionality.

See Step 9 of the upgrade instructions:

If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

It is now possible to allow File PIDs to be enabled/disabled per collection. See the :AllowEnablingFilePIDsPerCollection section of the Configuration guide for details.

For example, registration of PIDs for files can now be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default.

Changes and fixes in this release not already mentioned above include:

An endpoint for deleting a file has been added to the native API: https://guides.dataverse.org/en/5.14/api/native-api.html#deleting-files (PR #9383)
A date column has been added to the restricted file access request overview, indicating when the earliest request by that user was made. An issue was fixed where the request list was not updated when a request was approved or rejected. (PR #9257)
Changes made in v5.13 and v5.14 in multiple PRs to improve the embedded Schema.org metadata in dataset pages will only be propagated to the Schema.Org JSON-LD metadata export if a reExportAll() is done. (PR #9102)
It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation field and from the CrossRef Funding Registry (Fundreg) for the Funding Information/Agency field, both in the standard Citation metadata block. Application of these scripts to other fields and the development of other scripts targeting child fields are now possible. (PR #9402)
Dataverse now supports requiring a secret key to add or edit metadata in specified "system" metadata blocks. Changing the metadata in such system metadata blocks is not allowed without the key and is currently only allowed via API. (PR #9388)
An attempt will be made to extract a geospatial bounding box (west, south, east, north) from NetCDF and HDF5 files and then insert these values into the geospatial metadata block, if enabled. (#9541) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#geospatial-bounding-box
A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files. (PR #9600) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#h5web-previewer
Two file previewers for GeoTIFF and Shapefiles are now available for visualizing geotiff image files and zipped Shapefiles on a map. See https://github.com/gdcc/dataverse-previewers
A new alternative is available for setting up the Dataverse dependencies for the development environment through Docker Compose. (PR #9417)
New alternative, explained in the documentation, to build the Sphinx guides through a Docker container. (PR #9417)
A container has been added called "configbaker" that configures Dataverse while running in containers. This allows developers to spin up Dataverse with a single command. (PR #9574)
Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting. External apps using the direct upload API can now query Dataverse to discover which algorithm should be used. Sites that have been using an algorithm other than MD5 and direct upload and/or dvwebloader may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/5.14/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files. (PR #9482)
The OAI_ORE metadata export (and hence the archival Bag for a dataset) now includes information about file embargoes. (PR #9698)
The DatasetFieldType attribute "displayFormat" is now returned by the API. (PR #9668)
An API named "MyData" has been available for years but is newly documented. It is used to get a list of the objects (datasets, collections or datafiles) that an authenticated user can modify. (PR #9596)
A Go client library for Dataverse APIs is now available. See https://guides.dataverse.org/en/5.14/api/client-libraries.html
A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags
A feature flag called "api-bearer-auth" has been added. This allows OIDC user accounts to send authenticated API requests using Bearer Tokens. Note: This feature is limited to OIDC! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags (PR #9591)

Complete List of Changes

For the complete list of code changes in this release, see the 5.14 milestone on GitHub.

- Java
Published by kcondon almost 3 years ago

dataverse - v5.14

Dataverse Software 5.14

(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin undeploy dataverse-5.13

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.14.war

5. Restart Payara

service payara stop
service payara start

6. Update the Citation metadata block: (the update makes the field Series repeatable)

wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory; /home/dataverse/langBundles used in the example below.

wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
cp citation.properties /home/dataverse/langBundles

7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

7a. For installations without custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
- cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Start Solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
There are two ways to regenerate the schema. Either collect the output of the Dataverse schema API and feed it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):

wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml

OR, alternatively, edit the following lines in your schema.xml by hand (to indicate that series and its components are now multiValued="true"):

<field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
Restart Solr instance (usually service solr restart depending on solr/OS)
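
Whichever route you take in 7b, a quick check that the three series fields ended up multiValued can catch a missed edit. This is a sketch using grep only. The function name is ours, and it assumes each field is declared on a single line, as in the lines quoted above.

```shell
# Hypothetical check: succeed only if series, seriesInformation and seriesName
# are all declared with multiValued="true" in the given schema.xml.
series_fields_multivalued() {
  schema="$1"
  for field in series seriesInformation seriesName; do
    # Find the <field> line for this exact name, then require multiValued="true" on it.
    if ! grep "<field name=\"$field\"" "$schema" | grep -q 'multiValued="true"'; then
      echo "not multiValued: $field" >&2
      return 1
    fi
  done
}
```

For example: series_fields_multivalued /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml && echo "schema looks updated"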

8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.
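
For orientation, the re-export in the Admin Guide is a single call to the admin API. The sketch below assumes the admin API is reachable on localhost:8080 and that the endpoint path is /api/admin/metadata/reExportAll, as documented there. The wrapper name is ours.

```shell
# Hypothetical wrapper around the re-export call described in the Admin Guide.
# It only triggers the re-export; check the server logs for progress.
reexport_all() {
  base_url="${1:-http://localhost:8080}"
  curl -s "$base_url/api/admin/metadata/reExportAll"
}
```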

9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

  curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.

Generally, Handles is known to work reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and upgrading to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of this writing). An older version may seem to be running just fine, but keep in mind that it may stop working unexpectedly at any moment because of an incompatibility introduced in a Java rpm upgrade, or anything similarly unpredictable.

Handles is also very good about backward compatibility. In most cases you can simply stop the old version, unpack the new version from the distribution, start it on the existing config and database files, and it will keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid surprises should they finally decide to drop the old database format, for example. We recommend two specific things: 1) Make sure your service is using a JSON version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory), and 2) make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format). Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Stop the service before converting the files, and keep a full backup of the existing server directory, just in case. Do not hesitate to contact Handles support with any questions you may have, as they are very responsive and helpful.

New JVM Options and MicroProfile Config Options

The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.

dataverse.pid.datacite.mds-api-url
dataverse.pid.datacite.rest-api-url
dataverse.pid.datacite.username
dataverse.pid.datacite.password
dataverse.pid.handlenet.key.path
dataverse.pid.handlenet.key.passphrase
dataverse.pid.handlenet.index
dataverse.pid.permalink.base-url
dataverse.pid.ezid.api-url
dataverse.pid.ezid.username
dataverse.pid.ezid.password

The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.

dataverse.signposting.level1-author-limit
dataverse.signposting.level1-item-limit

The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.

dataverse.api.allow-incomplete-metadata
dataverse.ui.show-validity-filter
dataverse.ui.allow-review-for-incomplete

The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.

dataverse.spi.export.directory

The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.

dataverse.mail.support-email
dataverse.mail.cc-support-on-contact-emails

The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.

dataverse.netcdf.geo-extract-s3-direct-upload

Backward Incompatibilities

The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.

Using the new External Exporters framework

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.

See "Mechanism Added for Adding External Exporters".

Publishing via API

When publishing a dataset via API, it now mirrors the UI behavior by requiring that the dataset has either a standard license configured, or has valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message.

See "Handling of license information fixed in the API" for guidance on how to ensure that datasets created or updated via native API have a license configured.

Detailed Release Highlights, New Features and Use Case Scenarios

For Dataverse developers, support for running Dataverse in Docker (experimental)

Developers can experiment with running Dataverse in Docker: (PR #9439)

This is an image developers build locally (or can pull from Docker Hub). It is not meant for production use!

To provide a complete container-based local development environment, developers can deploy a Dataverse container from the new image in addition to other containers for necessary dependencies: https://guides.dataverse.org/en/5.14/container/dev-usage.html

Please note that with this emerging solution we will sunset older tooling like docker-aio and docker-dcm. We envision more testing possibilities in the future, to be discussed as part of the Dataverse Containerization Working Group. There is no sunsetting roadmap yet, but you have been warned. If there is some specific feature of these tools you would like to be kept, please reach out.

Indexing performance improved

Performance has improved noticeably, especially for large datasets containing thousands of files. Uploading files one by one to a dataset is much faster now, making it possible to upload thousands of files in an acceptable timeframe. Beyond uploads, all edit operations on datasets containing many files are faster. Performance tweaks include indexing datasets in the background and reducing the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingest to finish. Ingest was already running in the background, but it took a lock that prevented updates to the dataset and degraded performance for datasets containing many files. (PR #9558)

For installations using MDC (Make Data Count), it is now possible to display both the MDC metrics and the legacy access counts, generated before MDC was enabled.

This is enabled via the new setting :MDCStartDate that specifies the cutoff date. If a dataset has any legacy access counts collected prior to that date, those numbers will be displayed in addition to any MDC numbers recorded since then. (PR #6543)

Changes to PID Provider JVM Settings

In preparation for a future feature to use multiple PID providers at the same time, all JVM settings for PID providers can now be configured using MicroProfile Config. They were also renamed to match the name of the provider being configured.

Please watch your log files for deprecation warnings. Your old settings will be picked up, but you should migrate to the new names to avoid unnecessary log clutter and get prepared for more future changes. An example message looks like this:

[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;| Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]
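
To turn a busy log into a migration checklist, the warnings can be reduced to old/new name pairs. The sketch below relies only on the message wording shown above. The function name is ours, and the log path in the usage line assumes the default domain1 layout.

```shell
# Hypothetical helper: list "old -> new" option names found in deprecation
# warnings of the form shown above, one pair per line, without duplicates.
deprecated_pid_options() {
  sed -n 's/.*Detected deprecated config option \([^ ]*\) in use\. Please update your config to use \([A-Za-z0-9._-]*[A-Za-z0-9]\).*/\1 -> \2/p' "$1" |
    sort -u
}
```

For example: deprecated_pid_options "$PAYARA/glassfish/domains/domain1/logs/server.log"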

Here is a list of the new settings:

dataverse.pid.datacite.mds-api-url
dataverse.pid.datacite.rest-api-url
dataverse.pid.datacite.username
dataverse.pid.datacite.password
dataverse.pid.handlenet.key.path
dataverse.pid.handlenet.key.passphrase
dataverse.pid.handlenet.index
dataverse.pid.permalink.base-url
dataverse.pid.ezid.api-url
dataverse.pid.ezid.username
dataverse.pid.ezid.password

See also https://guides.dataverse.org/en/5.14/installation/config.html#persistent-identifiers-and-publishing-datasets (multiple PRs: #8823 #8828)
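
The new names are set like any other JVM option. One detail worth sketching: asadmin treats the colon as a separator between options, so URL values need their colons escaped. The helper below only builds the argument string. Its name and the example URL are placeholders, not values to copy.

```shell
# Hypothetical helper: build the argument for "asadmin create-jvm-options",
# escaping colons in the value because asadmin uses ":" as a delimiter.
jvm_option_arg() {
  name="$1"
  value=$(printf '%s' "$2" | sed 's/:/\\:/g')
  printf '%s=%s\n' "-D$name" "$value"
}
```

For example: $PAYARA/bin/asadmin create-jvm-options "$(jvm_option_arg dataverse.pid.datacite.mds-api-url https://mds.example.org)"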

Signposting for Dataverse

This release adds Signposting support to Dataverse to improve machine discoverability of datasets and files. (PR #8424)

The following MicroProfile Config options are now available (these can be treated as JVM options):

dataverse.signposting.level1-author-limit
dataverse.signposting.level1-item-limit

Signposting is described in more detail in a new page in the Admin Guide on discoverability: https://guides.dataverse.org/en/5.14/admin/discoverability.html

Permalinks support

Dataverse now optionally supports PermaLinks, a type of persistent identifier that does not involve a global registry service. PermaLinks are appropriate for Intranet deployment and catalog use cases. (PR #8674)

Creating datasets with incomplete metadata through API

It is now possible to create a dataset with some nominally mandatory metadata fields left unpopulated. For details on the use case that led to this feature see issue #8822 and PR #8940.

The create dataset API call (POST to /api/dataverses/#dataverseId/datasets) is extended with the "doNotValidate" parameter. However, in order to be able to create a dataset with incomplete metadata, the Solr configuration must be updated first with the new "schema.xml" file (do not forget to run the metadata fields update script when you use custom metadata). Reindexing is optional, but recommended. Also, even when this feature is not used, it is recommended to update the Solr configuration and reindex the metadata. Finally, this new feature can be activated with the "dataverse.api.allow-incomplete-metadata" JVM option.
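
As a sketch of what such a call might look like: the wrapper below posts a JSON file to the create dataset endpoint with the doNotValidate parameter. Passing it as doNotValidate=true in the query string is our assumption about the parameter's form, and the function name, argument order, and file name are illustrative only.

```shell
# Hypothetical wrapper: create a dataset from a JSON file without validating
# mandatory fields. The server must have dataverse.api.allow-incomplete-metadata enabled.
# Arguments: server URL, API token, collection alias or id, JSON file.
create_incomplete_dataset() {
  server_url="$1"; api_token="$2"; collection="$3"; json_file="$4"
  curl -s -X POST \
    -H "X-Dataverse-key: $api_token" \
    -H "Content-type: application/json" \
    --upload-file "$json_file" \
    "$server_url/api/dataverses/$collection/datasets?doNotValidate=true"
}
```

For example: create_incomplete_dataset http://localhost:8080 "$API_TOKEN" root dataset-incomplete.json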

You can also enable a valid/incomplete metadata filter in the "My Data" page using the "dataverse.ui.show-validity-filter" JVM option. By default, this filter is not shown. When you wish to use this filter, you must reindex the datasets first, otherwise datasets with valid metadata will not be shown in the results.

It is not possible to publish datasets with incomplete or invalid metadata. By default, you also cannot send such datasets for review. If you wish to enable sending for review of datasets with incomplete metadata, turn on the "dataverse.ui.allow-review-for-incomplete" JVM option.

In order to customize the wording and add translations to the UI sections extended by this feature, you can edit the "Bundle.properties" file and the localized versions of that file. The property keys used by this feature are:

- incomplete
- valid
- dataset.message.incomplete.warning
- mydataFragment.validity
- dataverses.api.create.dataset.error.mustIncludeAuthorName

Registering PIDs (DOIs or Handles) for files in select collections

It is now possible to configure registering PIDs for files in individual collections.

For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the :FilePIDsEnabled section of the Configuration guide for details. (PR #9614)

Mechanism Added for Adding External Exporters

It is now possible for third parties to develop and share code to provide new metadata export formats for Dataverse. Export formats can be made available via the Dataverse UI and API or configured for use in Harvesting. Dataverse now provides developers with a separate dataverse-spi JAR file that contains the Java interfaces and classes required to create a new metadata Exporter. Once a new Exporter has been created and packaged as a JAR file, administrators can use it by specifying a local directory for third party Exporters, dropping the Exporter JAR there, and restarting Payara. This mechanism also allows new Exporters to replace any of Dataverse's existing metadata export formats. (PR #9175) See also https://guides.dataverse.org/en/5.14/developers/metadataexport.html

Backward Incompatibilities

Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.

New JVM/MicroProfile Settings

dataverse.spi.export.directory - specifies a directory, readable by the Dataverse server. Any Exporter JAR files placed in this directory will be read by Dataverse and used to add/replace the specified metadata format.

Contact Email Improvements

Email sent from the contact forms to the contact(s) for a collection, dataset, or datafile can now optionally be cc'd to a support email address. The support email address can be changed from the default :SystemEmail address to a separate :SupportEmail address. When multiple contacts are listed, the system will now send one email to all contacts (with the optional cc if configured) instead of separate emails to each contact. Contact names with a comma that refer to Organizations will no longer have the name parts reversed in the email greeting. A new protected/admin feedback API has been added. (PR #9186) See https://guides.dataverse.org/en/5.14/api/native-api.html#send-feedback-to-contact-s

New JVM/MicroProfile Settings

dataverse.mail.support-email - allows a separate email, distinct from the :SystemEmail, to be used as the to address in emails from the contact form/feedback API.
dataverse.mail.cc-support-on-contact-emails - include the support email address as a CC: entry when contact/feedback emails are sent to the contacts for a collection, dataset, or datafile.

Support for Grouping Dataset Files by Folder and Category Tag

Dataverse now supports grouping dataset files by folder and/or optionally by Tag/Category. The default for whether to order by folder can be changed via :OrderByFolder. Ordering by category must be enabled by an administrator via the :CategoryOrder parameter, which is used to specify which tags appear first (e.g. to put Documentation files before Data or Code files, etc.). These Group-By options work with the existing sort options, i.e. sorting alphabetically means that files within each folder or tag group will be sorted alphabetically. :AllowUserManagementOfOrder can be set to true to allow users to turn folder ordering and category ordering (if enabled) on or off in the current dataset view. (PR #9204)

New Settings

:CategoryOrder - a comma separated list of Category/Tag names defining the order in which files with those tags should be displayed. The setting can include custom tag names along with the pre-defined defaults (Documentation, Data, and Code, which can be overridden by the :FileCategories setting).
:OrderByFolder - defaults to true - whether to group files in the same folder together
:AllowUserManagementOfOrder - default false - allow users to toggle ordering on/off in the dataset display
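
These are database settings, so they can be changed through the admin settings API in the same way as the :FilePIDsEnabled example in the upgrade instructions. A minimal sketch follows. The wrapper name and the DATAVERSE_URL variable are ours, and the tag list is only an example value.

```shell
# Hypothetical wrapper around the admin settings API (PUT with the value as body).
# DATAVERSE_URL is an assumed variable; it falls back to localhost:8080.
put_setting() {
  name="$1"; value="$2"
  curl -s -X PUT -d "$value" \
    "${DATAVERSE_URL:-http://localhost:8080}/api/admin/settings/$name"
}
```

For example: put_setting :CategoryOrder 'Documentation,Data,Code' followed by put_setting :AllowUserManagementOfOrder true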

Metadata field Series now repeatable

This enhancement allows depositors to define multiple instances of the metadata field Series in the Citation Metadata block.

Data contained in a dataset may belong to multiple series. Making the field repeatable makes it possible to reflect this fact in the dataset metadata. (PR #9256)

Guides in PDF Format

An experimental version of the guides in PDF format is available at http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/ (PR #9474)

Advice for anyone who wants to help improve the PDF is available at https://guides.dataverse.org/en/5.14/developers/documentation.html#pdf-version-of-the-guides

Datasets API extended

The following APIs have been added: (PR #9592)

/api/datasets/summaryFieldNames
/api/datasets/privateUrlDatasetVersion/{privateUrlToken}
/api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
/api/datasets/{datasetId}/versions/{version}/citation
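
As an illustration of the last endpoint in the list, the sketch below requests the citation for one version of a dataset. The helper name is ours. We assume the usual version specifiers (a number such as 1.0, or :latest) are accepted here, and that unpublished versions additionally need an API token, which this sketch does not send.

```shell
# Hypothetical helper: fetch the citation for one version of a dataset.
# Arguments: server URL, dataset database id, version (e.g. 1.0 or :latest).
dataset_version_citation() {
  server_url="$1"; dataset_id="$2"; version="$3"
  curl -s "$server_url/api/datasets/$dataset_id/versions/$version/citation"
}
```

For example: dataset_version_citation http://localhost:8080 42 1.0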

Extra fields included in the JSON metadata

The following fields are now available in the native JSON output:

alternativePersistentId
publicationDate
citationDate

(PR #9657)

Files downloaded from Binder are now in their original format.

For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest). (PR #9483)

This should make it easier to write code to reproduce results as the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.

For details, see #9374, https://github.com/jupyterhub/repo2docker/issues/1242, and https://github.com/jupyterhub/repo2docker/pull/1253.

Handling of license information fixed in the API

(PR #9568)

When publishing a dataset via API, Dataverse now requires the dataset to have either a standard license configured or valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without either will fail with an error message. This introduces a backward incompatibility: if you have scripts that automatically create, update and publish datasets, the publish step may start failing. Unfortunately, problems with the datasets APIs made it difficult to manage licenses, so an API user was likely to end up with a dataset that has neither. In this release we have addressed this with the following fixes:

We fixed the incompatibility between the format in which license information was exported in JSON and the format the create and update APIs expected for import (https://github.com/IQSS/dataverse/issues/9155). This means that the following JSON format can now be imported:

"license": {
  "name": "CC0 1.0",
  "uri": "http://creativecommons.org/publicdomain/zero/1.0"
}

However, for the sake of backward compatibility the old format

"license" : "CC0 1.0"

will be accepted as well.

We have added the default license (CC0) to the model json file that we provide and recommend to use as the model in the Native API Guide (https://github.com/IQSS/dataverse/issues/9364).

And we have corrected the misleading language in the same guide where we used to recommend to users that they select, edit and re-import only the .metadataBlocks fragment of the json metadata representing the latest version. There are in fact other useful pieces of information that need to be preserved in the update (such as the "license" section above). So the recommended way of creating base json for updates via the API is to select everything but the "files" section, with (for example) the following jq command:

jq '.data | del(.files)'
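
Put together with the download of the latest version, the filter might be used as in the sketch below. The function name and argument order are ours, and the request shape (the :persistentId form of the versions endpoint with an X-Dataverse-key header) is our reading of the Native API Guide rather than something stated in these notes.

```shell
# Hypothetical helper: download the latest version of a dataset and keep
# everything under .data except the "files" section, as recommended above.
# Arguments: server URL, API token, persistent identifier (e.g. a DOI).
update_base_json() {
  server_url="$1"; api_token="$2"; pid="$3"
  curl -s -H "X-Dataverse-key: $api_token" \
    "$server_url/api/datasets/:persistentId/versions/:latest?persistentId=$pid" |
    jq '.data | del(.files)'
}
```

For example: update_base_json "$SERVER_URL" "$API_TOKEN" "$PERSISTENT_ID" > dataset-update.json, then edit the file and send it back with the update call described in the guide.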

Please see the Update Metadata For a Dataset section of our Native API Guide for more information.

New External Tool Type and Implementation

With this release a new experimental external tool type has been added to the Dataverse Software. The tool type is "query" and its first implementation is an experimental tool named Ask the Data which allows users to ask natural language queries of tabular files in Dataverse. More information is available in the External Tools section of the guides. (PR #9737) See https://guides.dataverse.org/en/5.14/admin/external-tools.html#file-level-query-tools

Default Value for File PIDs registration has changed

The default for whether PIDs are registered for files or not is now false.

Installations where file PIDs were enabled by default will have to add the :FilePIDsEnabled = true setting to maintain the existing functionality.

See Step 9 of the upgrade instructions:

If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:

curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled

It is now possible to allow File PIDs to be enabled/disabled per collection. See the :AllowEnablingFilePIDsPerCollection section of the Configuration guide for details.

For example, registration of PIDs for files can now be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default.

Changes and fixes in this release not already mentioned above include:

An endpoint for deleting a file has been added to the native API: https://guides.dataverse.org/en/5.14/api/native-api.html#deleting-files (PR #9383)
A date column has been added to the restricted file access request overview, indicating when the earliest request by that user was made. An issue was fixed where the request list was not updated when a request was approved or rejected. (PR #9257)
Changes made in v5.13 and v5.14 in multiple PRs to improve the embedded Schema.org metadata in dataset pages will only be propagated to the Schema.Org JSON-LD metadata export if a reExportAll() is done. (PR #9102)
It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation field and from the CrossRef Funding Registry (Fundreg) for the Funding Information/Agency field, both in the standard Citation metadata block. Applying these scripts to other fields and developing other scripts that target child fields are now possible. (PR #9402)
Dataverse now supports requiring a secret key to add or edit metadata in specified "system" metadata blocks. Changing the metadata in such system metadata blocks is not allowed without the key and is currently only allowed via API. (PR #9388)
An attempt will be made to extract a geospatial bounding box (west, south, east, north) from NetCDF and HDF5 files and then insert these values into the geospatial metadata block, if enabled. (#9541) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#geospatial-bounding-box
A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files. (PR #9600) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#h5web-previewer
Two file previewers for GeoTIFF and Shapefiles are now available for visualizing geotiff image files and zipped Shapefiles on a map. See https://github.com/gdcc/dataverse-previewers
New alternative to set up the Dataverse dependencies for the development environment through Docker Compose. (PR #9417)
New alternative, explained in the documentation, to build the Sphinx guides through a Docker container. (PR #9417)
A container has been added called "configbaker" that configures Dataverse while running in containers. This allows developers to spin up Dataverse with a single command. (PR #9574)
Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting. External apps using the direct upload API can now query Dataverse to discover which algorithm should be used. Sites that have been using an algorithm other than MD5 and direct upload and/or dvwebloader may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/5.14/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files. (PR #9482)
The OAI_ORE metadata export (and hence the archival Bag for a dataset) now includes information about file embargoes. (PR #9698)
The DatasetFieldType attribute "displayFormat" is now returned by the API. (PR #9668)
An API named "MyData" has been available for years but is newly documented. It is used to get a list of the objects (datasets, collections or datafiles) that an authenticated user can modify. (PR #9596)
A Go client library for Dataverse APIs is now available. See https://guides.dataverse.org/en/5.14/api/client-libraries.html
A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags
A feature flag called "api-bearer-auth" has been added. This allows OIDC user accounts to send authenticated API requests using Bearer Tokens. Note: This feature is limited to OIDC! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags (PR #9591)

Complete List of Changes

For the complete list of code changes in this release, see the 5.14 milestone on GitHub.

Published by kcondon almost 3 years ago

dataverse - v5.13

Dataverse Software 5.13

This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Schema.org Improvements (Some Backward Incompatibility)

The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.

Please be advised that these improvements have the chance to break integrations that rely on the old, less compliant structure. For details see the "backward incompatibility" section below. (Issue #7349)

Folder Uploads via Web UI (dvwebloader, S3 only)

For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)

Long Descriptions of Collections (Dataverses) are Now Truncated

Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)

License Sorting

Licenses shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)

Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search

Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)

Support for NetCDF and HDF5 Files

NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.

For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.

An extractNcml API endpoint has been added, especially for installations with existing NetCDF and HDF5 files. After upgrading, they can iterate through these files and try to extract an NcML file.

See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)

Support for .eln Files (Electronic Laboratory Notebooks)

The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, etc.

Improved Security for External Tools

External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)

Geospatial Search (API Only)

Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.

The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
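
A search using the two parameters might look like the sketch below. We assume geo_point takes "latitude,longitude" and geo_radius is a distance in kilometers, and that both must be supplied together; please check the Search API guide for the authoritative description. The helper name is ours.

```shell
# Hypothetical helper: search within a radius of a point.
# Arguments: server URL, query, point as "lat,long", radius (assumed km).
geo_search() {
  server_url="$1"; query="$2"; point="$3"; radius="$4"
  curl -s -G "$server_url/api/search" \
    --data-urlencode "q=$query" \
    --data-urlencode "geo_point=$point" \
    --data-urlencode "geo_radius=$radius"
}
```

For example: geo_search http://localhost:8080 '*' 42.3,-71.1 1.5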

Reproducibility and Code Execution with Binder

Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)

CodeMeta (Software) Metadata Support (Experimental)

Experimental support for research software metadata deposits has been added.

By adding a metadata block for CodeMeta, we take another step toward adding first class support of diverse FAIR objects, such as research software and computational workflows.

There is more work underway to make Dataverse installations around the world "research software ready."

Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)

Mechanism Added for Stopping a Harvest in Progress

It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)

API Endpoint Listing Metadata Block Details has been Extended

The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:

controlledVocabularyValues: All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
isControlledVocabulary: Whether or not this field has a controlled vocabulary.
multiple: Whether or not the field supports multiple values.

See Metadata Blocks in the API Guide for details. (PR #9213)

Advanced Database Settings

You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require, though installations already setting this parameter in the Postgres connection string will need to move it to dataverse.db.parameters. See the new Database Persistence section of the Installation Guide for details. (PR #8915)

Support for Cleaning up Leftover Files in Dataset Storage

Experimental feature: the leftover files stored in the Dataset storage location that are not in the file list of that Dataset, but are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.

OAI Server Bug Fixed

A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release not already mentioned above include:

Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also Configuration Guide)
To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
A persistent identifier, CSRT, is added to the Related Publication field's ID Type child field. For datasets published with CSRT IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.

New JVM Options and MicroProfile Config Options

The following JVM option is now available:

dataverse.personOrOrg.assumeCommaInPersonName - the default is false

The following MicroProfile Config options are now available (these can be treated as JVM options):

dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
dataverse.api.signing-secret - used by signed URLs
dataverse.solr.host
dataverse.solr.port
dataverse.solr.protocol
dataverse.solr.core
dataverse.solr.path
dataverse.rserve.host

The following existing JVM options are now available via MicroProfile Config:

dataverse.siteUrl
dataverse.fqdn
dataverse.files.directory
dataverse.rserve.host
dataverse.rserve.port
dataverse.rserve.user
dataverse.rserve.password
dataverse.rserve.tempdir
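Because these options are read through MicroProfile Config, they can usually also be supplied as environment variables. The sketch below applies the name mapping described in the MicroProfile Config specification (characters that are not alphanumeric or "_" become "_", then the name is upper-cased). The helper name mpconfig_env_name is hypothetical, and whether your deployment picks up environment variables is an assumption to verify.

```shell
# Derive the environment variable name that a MicroProfile Config
# implementation also checks for a given property name.
mpconfig_env_name() {
  # Replace every character outside [A-Za-z0-9_] with "_", then upper-case.
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | tr 'a-z' 'A-Z'
  echo
}

mpconfig_env_name dataverse.solr.host
mpconfig_env_name dataverse.api.signing-secret
```

This is mostly useful in containers, where exporting a variable is often easier than editing JVM options.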

Notes for Developers and Integrators

See the "Backward Incompatibilities" section below.

Backward Incompatibilities

Schema.org

The following changes have been made to Schema.org exports (necessary for the improvements mentioned above):

Descriptions are now joined and truncated to less than 5K characters.
The "citation"/"text" key has been replaced by a "citation"/"name" key.
File entries now have the mimetype reported as 'encodingFormat' rather than 'fileFormat' to better conform with the Schema.org specification for DataDownload entries. Download URLs are now sent for all files unless the dataverse.files.hide-schema-dot-org-download-urls setting is set to true.
Author/creators now have an @type of Person or Organization and any affiliation (affiliation for Person, parentOrganization for Organization) is now an object of @type Organization

License Files

License files are now required to contain the new "sortOrder" column. Attempting to create a new license without this field returns an error. See the Configuring Licenses section of the Installation Guide for reference.

Complete List of Changes

For the complete list of code changes in this release, see the 5.13 milestone on GitHub.

Installation

If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!

After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from version 4.x to 5.0 of the Dataverse software following the instructions in the release notes for version 5.0. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.13.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.13.war

5. Restart Payara

service payara stop
service payara start

6. Reload citation metadata block

wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory.

wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.properties
cp citation.properties /home/dataverse/langBundles

7. Replace the Solr schema.xml to allow multiple production locations and to support geospatial indexing. See the specific instructions below for installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).

Note: with this release support for indexing of the experimental workflow metadata block has been removed from the standard schema.xml. If you are using the workflow metadata block be sure to follow the instructions in step 7b) below to maintain support for indexing workflow metadata.

7a. For installations without custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
- cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Start solr instance (usually service solr start, depending on Solr/OS)

7b. For installations with custom or experimental metadata blocks:

Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Edit the following line in your schema.xml (to indicate that productionPlace is now multiValued="true"):

<field name="productionPlace" type="string" stored="true" indexed="true" multiValued="true"/>
Add the following lines to your schema.xml to add support for geospatial indexing:

<field name="geolocation" type="location_rpt" multiValued="true" stored="true" indexed="true"/>
<field name="boundingBox" type="bbox" multiValued="true" stored="true" indexed="true"/>
<fieldType name="bbox" class="solr.BBoxField" geo="true" distanceUnits="kilometers" numberType="pdouble" />
Start Solr instance (usually service solr start, depending on Solr/OS)

Optional Upgrade Step: Reindex Linked Dataverse Collections

Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections. To fix the display of collections that have already been linked, you must re-index the linked collections. This query will provide a list of commands to re-index the affected collections:

select 'curl http://localhost:8080/api/admin/index/dataverses/' || tmp.dvid from (select distinct dataverse_id as dvid from dataverselinkingdataverse) as tmp

The result of the query will be a list of re-index commands such as:

curl http://localhost:8080/api/admin/index/dataverses/633

where '633' is the id of the linked collection.
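If you prefer to generate the commands outside the database, the same list can be produced from plain ids. A minimal sketch, assuming the ids arrive one per line on standard input; the helper name reindex_commands is hypothetical, and it only prints commands rather than running them.

```shell
# Read collection database ids (one per line) on stdin and print the
# matching re-index command for each. Nothing is executed here.
reindex_commands() {
  while IFS= read -r dvid; do
    # Skip blank lines so empty input produces no commands.
    [ -n "$dvid" ] || continue
    printf 'curl http://localhost:8080/api/admin/index/dataverses/%s\n' "$dvid"
  done
}

printf '633\n634\n' | reindex_commands
```

After reviewing the printed commands, they can be piped to sh to run them. Feeding the function from psql (for example with its -At flags to get bare values) is one option, with the database name depending on your installation.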

Optional Upgrade Step: Run File Detection on .eln Files

Now that .eln files are recognized, you can run the Redetect File Type API on them to switch them from "unknown" to "ELN Archive". Afterward, you can reindex these files to make them appear in search facets.
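A minimal sketch of how a redetect request URL might be put together, assuming the endpoint shape /api/files/{id}/redetect with a dryRun query parameter as described in the API Guide. The server address, the file id 24, and the helper name redetect_url are illustrative assumptions.

```shell
# Build the Redetect File Type URL for one file.
# SERVER_URL and the file id used below are assumptions for illustration.
SERVER_URL="http://localhost:8080"

redetect_url() {
  # $1 = file database id
  # $2 = "true" to only report the detected type (default), "false" to save it
  printf '%s/api/files/%s/redetect?dryRun=%s\n' "$SERVER_URL" "$1" "${2:-true}"
}

# Print the URL only; no request is sent.
redetect_url 24 true
```

The request itself would be a POST with an API token, for example curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(redetect_url 24 true)", where API_TOKEN is a variable you set yourself. Starting with dryRun=true lets you confirm the detected type before saving anything.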

Published by kcondon over 3 years ago

dataverse - v5.12.1

Dataverse Software 5.12.1

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Bug Fix for "Internal Server Error" When Creating a New Remote Account

Unfortunately, as of 5.11 new remote users have seen "Internal Server Error" when creating an account (or checking notifications just after creating an account). Remote users are those who log in with institutional (Shibboleth), OAuth (ORCID, GitHub, or Google) or OIDC providers.

This is a transient error that can be worked around by reloading the browser (or logging out and back in again) but it's obviously a very poor user experience and a bad first impression. This bug is the primary reason we are putting out this patch release. Other features and bug fixes are coming along for the ride.

Ability to Disable OAuth Sign Up While Allowing Existing Accounts to Log In

A new option called :AllowRemoteAuthSignUp has been added providing a mechanism for disabling new account signups for specific OAuth2 authentication providers (Orcid, GitHub, Google etc.) while still allowing logins for already-existing accounts using this authentication method.

See the Installation Guide for more information on the setting.

Production Date Now Used for Harvested Datasets in Addition to Distribution Date (`oai_dc` format)

This release fixes the year displayed in the citation for harvested datasets, especially for the oai_dc format.

For normal datasets, the date used is the "citation date" which is by default the publication date (the first release date) unless you change it.

However, for a harvested dataset, the distribution date was used instead and this date is not always present in the harvested metadata.

Now, the production date is used for harvested datasets in addition to the distribution date when harvesting with the oai_dc format.

Publication Date Now Used for Harvested Dataset if Production Date is Not Set (`oai_dc` format)

For exports and harvesting in oai_dc format, if "Production Date" is not set, "Publication Date" is now used instead. This change is reflected in the Dataverse 4+ Metadata Crosswalk linked from the Appendix of the User Guide.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

Users creating an account by logging in with Shibboleth, OAuth, or OIDC should not see errors. (Issue #9029, PR #9030)
When harvesting datasets, I want the Production Date if I can't get the Distribution Date (PR #8732)
When harvesting datasets, I want the Publication Date if I can't get the Production Date (PR #8733)
As a sysadmin I'd like to disable (temporarily or permanently) sign ups from OAuth providers while allowing existing users to continue to log in from that provider (PR #9112)
As a C/C++ developer I want to use Dataverse APIs (PR #9070)

New DB Settings

The following DB settings have been added:

:AllowRemoteAuthSignUp

See the Database Settings section of the Guides for more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.12.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.12.1.war

5. Restart Payara

service payara stop
service payara start

Upcoming Versions of Payara

With the recent release of Payara 6 (Payara 6.2022.1 being the first version), the days of free-to-use Payara 5.x Platform Community versions are numbered. Specifically, Payara's blog post says, "Payara Platform Community 5.2022.4 has been released today as the penultimate Payara 5 Community release."

Given the end of free-to-use Payara 5 versions, we plan to get the Dataverse software working on Payara 6 (#8305), which will require substantial efforts from the IQSS team and community members, as this also means shifting our app to be a Jakarta EE 10 application (upgrading from EE 8). We are currently working out the details and will share news as soon as we can. Rest assured we will do our best to provide you with a smooth transition. You can follow along in Issue #8305 and related pull requests and you are, of course, very welcome to participate by testing and otherwise contributing, as always.

Published by pdurbin over 3 years ago

dataverse - v5.12

Dataverse Software 5.12

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Support for Globus

Globus can be used to transfer large files. Part of "Harvard Data Commons Additions" below.

Support for Remote File Storage

Dataset files can be stored at remote URLs. Part of "Harvard Data Commons Additions" below.

New Computational Workflow Metadata Block

The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows.

To add the new metadata block, follow the instructions in the Admin Guide: https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html

The location of the new metadata block tsv file is scripts/api/data/metadatablocks/computational_workflow.tsv. Part of "Harvard Data Commons Additions" below.

Support for Linked Data Notifications (LDN)

Linked Data Notifications (LDN) is a standard from the W3C. Part of "Harvard Data Commons Additions" below.

Harvard Data Commons Additions

As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include:

Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores.
Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store.
Initial support for computational workflows, including a new metadata block and detected filetypes.
Support for archiving to any S3 store using Dataverse's RDA-conformant BagIT file format (a BagPack).
Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version.
Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs.
Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status.
Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually.
Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset.
A new capability to provide custom per-field instructions in dataset templates.
The following file extensions are now detected:
- wdl=text/x-workflow-description-language
- cwl=text/x-computational-workflow-language
- nf=text/x-nextflow
- Rmd=text/x-r-notebook
- rb=text/x-ruby-script
- dag=text/x-dagman

Improvements to Fields that Appear in the Citation Metadata Block

Grammar, style and consistency improvements have been made to the titles, tooltip description text, and watermarks of metadata fields that appear in the Citation metadata block.

This includes fields that dataset depositors can edit in the Citation Metadata accordion (i.e. fields controlled by the citation.tsv and citation.properties files) and fields whose values are system-generated, such as the Dataset Persistent ID, Previous Dataset Persistent ID, and Publication Date fields whose titles and tooltips are configured in the bundles.properties file.

The changes should provide clearer information to curators, depositors, and people looking for data about what the fields are for.

A new page in the Style Guides called "Text" has also been added. The new page includes a section called "Metadata Text Guidelines" with a link to a Google Doc where the guidelines are being maintained for now since we expect them to be revised frequently.

New Static Search Facet: Metadata Types

A new static search facet has been added to the search side panel. This new facet is called "Metadata Types" and is driven from metadata blocks. When a metadata field value is inserted into a dataset, an entry for the metadata block it belongs to is added to this new facet.

This new facet needs to be configured for it to appear on the search side panel. The configuration assigns to a dataverse what metadata blocks to show. The configuration is inherited by child dataverses.

To configure the new facet, use the Metadata Block Facet API: https://guides.dataverse.org/en/5.12/api/native-api.html#set-metadata-block-facet-for-a-dataverse-collection

Broader MicroProfile Config Support for Developers

As of this release, many JVM options can be set using any MicroProfile Config Source.

Currently this change is only relevant to developers but as settings are migrated to the new "lookup" pattern documented in the Consuming Configuration section of the Developer Guide, anyone installing the Dataverse software will have much greater flexibility when configuring those settings, especially within containers. These changes will be announced in future releases.

Please note that an upgrade to Payara 5.2021.8 or higher is required to make use of this. Payara 5.2021.5 threw exceptions, as explained in PR #8823.

HTTP Range Requests: New HTTP Status Codes and Headers for Datafile Access API

The Basic File Access resource for datafiles (/api/access/datafile/$id) was slightly modified in order to comply better with the HTTP specification for range requests.

If the request contains a "Range" header:
- The returned HTTP status is now 206 (Partial Content) instead of 200
- A "Content-Range" header is returned containing information about the returned bytes
- An "Accept-Ranges" header with value "bytes" is returned

CORS rules/headers were modified accordingly:
- The "Range" header is added to "Access-Control-Allow-Headers"
- The "Content-Range" and "Accept-Ranges" headers are added to "Access-Control-Expose-Headers"

This new functionality has enabled a Zip Previewer and file extractor for zip files, an external tool.
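To illustrate what a client sees, the sketch below interprets a "Content-Range" value of the standard HTTP form "bytes first-last/total" and reports how many bytes the partial response covers. The helper name content_range_length and the sample values are illustrative; the request shown in the comment assumes a server URL and a file id that you supply.

```shell
# A range request could look like this (SERVER_URL and ID are yours to set):
#   curl -i -H "Range: bytes=0-99" "$SERVER_URL/api/access/datafile/$ID"
# and should answer with status 206 and a Content-Range header.

# Compute how many bytes a Content-Range value such as "bytes 0-99/1234" covers.
content_range_length() {
  range=${1#bytes }     # strip the unit, leaving "0-99/1234"
  range=${range%%/*}    # strip the total size, leaving "0-99"
  first=${range%-*}     # first byte position
  last=${range#*-}      # last byte position (inclusive)
  echo $(( last - first + 1 ))
}

content_range_length 'bytes 0-99/1234'
```

Both positions are inclusive, which is why the length is last minus first plus one.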

File Type Detection When File Has No Extension

File types are now detected based on the filename when the file has no extension.

The following filenames are now detected:

Makefile=text/x-makefile
Snakemake=text/x-snakemake
Dockerfile=application/x-docker-file
Vagrantfile=application/x-vagrant-file

These are defined in MimeTypeDetectionByFileName.properties.
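The lookup amounts to an exact match on the filename. The following sketch mimics that idea with the four entries listed above. It is not Dataverse's implementation; the helper name mime_for_filename and the application/octet-stream fallback are assumptions made for illustration.

```shell
# Map an extensionless filename to a MIME type using the entries listed above.
mime_for_filename() {
  name=${1##*/}    # drop any leading directory components
  case "$name" in
    Makefile)    echo text/x-makefile ;;
    Snakemake)   echo text/x-snakemake ;;
    Dockerfile)  echo application/x-docker-file ;;
    Vagrantfile) echo application/x-vagrant-file ;;
    *)           echo application/octet-stream ;;   # assumed fallback
  esac
}

mime_for_filename /tmp/project/Dockerfile
```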

Upgrade to Payara 5.2022.3 Highly Recommended

With lots of bug and security fixes included, we encourage everyone to upgrade to Payara 5.2022.3 as soon as possible. See below for details.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891)
Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325)
Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751)
Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770)
Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901)
Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610)
Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696)
Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775)
Collection managers can create templates that include custom instructions on how to fill out specific metadata fields.
Dataset update API users are given more information when the dataset they are updating is out of compliance with Terms of Access requirements (Issue #8859)
Adds a new setting (:ControlledVocabularyCustomJavaScript) that allows a JavaScript file to be loaded into the dataset page for the purpose of showing controlled vocabulary as a list (Issue #8722)
Fixes an issue with the Redetect File Type API (Issue #7527)
Terms of Use is now imported when using DDI format through harvesting or the native API. (Issue #8715, PR #8743)
Optimizes some code to improve application memory usage (Issue #8871)
Fixes sample data to reflect custom licenses.
Fixes the Archival Status Input API (available to superusers) (Issue #8924)
Small bugs have been fixed in the dataset export in the JSON and DDI formats; eliminating the export of "undefined" as a metadata language in the former, and a duplicate keyword tag in the latter. (Issue #8868)

New DB Settings

The following DB settings have been added:
- :ShibAffiliationOrder - Select the first or last entry in an Affiliation array
- :ShibAffiliationSeparator (default: ";") - Set the separator for the Affiliation array
- :LDNMessageHosts
- :GlobusBasicToken
- :GlobusEndpoint
- :GlobusStores
- :GlobusAppUrl
- :GlobusPollingInterval
- :GlobusSingleFileTransfer
- :S3ArchiverConfig
- :S3ArchiverProfile
- :DRSArchiverConfig
- :ControlledVocabularyCustomJavaScript

See the Database Settings section of the Guides for more information.

Notes for Dataverse Installation Administrators

Enabling Experimental Capabilities

Several of the capabilities introduced in v5.12 are "experimental" in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of the Dataverse software. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilities and to plan for future upgrades.

Notes for Developers and Integrators

See the "Backward Incompatibilities" section below.

Backward Incompatibilities

OAI-ORE and Archiving Changes

The Admin API call to manually submit a dataset version for archiving has changed to require POST instead of GET and to have a name making it clearer that archiving is being done for a given dataset version: /api/admin/submitDatasetVersionToArchive.

Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8449). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. as part of the Dataverse Uploader, to allow re-creation of a dataset version from an archival bag) will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) Administrators should be aware that re-creating archival bags, e.g. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability).

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

Instructions for Upgrading to Payara 5.2022.3

Note: with the approaching EOL for the Payara 5 Community release train it's likely we will switch to a yet-to-be-released Payara 6 in the not-so-far-away future.

We recommend that you make sure you have followed all Payara-related update instructions from past releases. (The latest Payara update was for v5.6.)

Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.

The steps below are a simple matter of reusing your existing domain directory with the new distribution. But we also recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes

Please note that the deletion of the lib/databases directory below is only required once, for this upgrade (see Issue #8230 for details).

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases

3. Move the current Payara directory out of the way

mv $PAYARA $PAYARA.MOVED

4. Download the new Payara version (5.2022.3), and unzip it in its place

5. Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1

6. Start Payara

service payara start

7. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.12.war

8. Restart Payara

service payara stop
service payara start
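Steps 3 to 5 above (move the old installation aside, unzip the new one, put the old domain1 back) can be sketched as a small helper. This is an illustration under assumptions, not a prescribed procedure: the function name reuse_domain and its two arguments are hypothetical, and it copies the old domain rather than moving it so that the old tree stays available as a fallback.

```shell
# Reuse a preserved domain1 inside a freshly unzipped Payara tree.
# $1 = old installation that was moved aside (for example $PAYARA.MOVED)
# $2 = freshly unzipped installation (for example $PAYARA)
reuse_domain() {
  old=$1
  new=$2
  if [ ! -d "$old/glassfish/domains/domain1" ]; then
    echo "no domain1 found under $old" >&2
    return 1
  fi
  # Remove the stock domain that ships with the new distribution.
  rm -rf "$new/glassfish/domains/domain1"
  mkdir -p "$new/glassfish/domains"
  # Copy the old domain into place, preserving permissions and timestamps.
  cp -Rp "$old/glassfish/domains/domain1" "$new/glassfish/domains/domain1"
}

# Example call, to be run with Payara stopped:
#   reuse_domain "$PAYARA.MOVED" "$PAYARA"
```

Because the helper refuses to run when the old domain1 is missing, a mistyped path cannot delete the new domain and leave nothing in its place.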

Additional Upgrade Steps

Update the Citation metadata block:

wget https://github.com/IQSS/dataverse/releases/download/v5.12/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
Run ReExportAll to update metadata files (OAI_ORE, JSON and DDI formats are affected by the changes and bug fixes in this release; PRs #8770 and #8868). Optionally, for those using the Dataverse software's BagIt-based archiving, re-archive dataset versions archived using prior versions of the Dataverse software. This will be recommended/required in a future release.

Published by pdurbin over 3 years ago

dataverse - v5.11.1

Dataverse Software 5.11.1

This is a bug fix release of the Dataverse Software. The .war file for v5.11 will no longer be made available and installations should upgrade directly from v5.10.1 to v5.11.1. To do so you will need to follow the instructions for installing release 5.11 using the v5.11.1 war file. (Note specifically the upgrade steps 6-9 from the 5.11 release note; most importantly, the ones related to the citation block and the Solr schema). If you had previously installed v5.11 (no longer available), follow the simplified instructions below.

Release Highlights

Dataverse Software 5.11 contains two critical issues that are fixed in this release.

First, if you delete a file from a published version of a dataset, the file will be deleted from the file system (or S3) and lose its "owner id" in the database. For details, see Issue #8867.

Second, if you are a superuser, it's possible to click "Delete Draft" and delete a published dataset if it has restricted files. For details, see #8845 and #8742.

Notes for Dataverse Installation Administrators

Identifying Datasets with Deleted Files

If you have been running 5.11, check if any files show "null" for the owner id. The "owner" of a file is the parent dataset:

select * from dvobject where dtype = 'DataFile' and owner_id is null;

For any of these files, change the owner id to the database id of the parent dataset. In addition, the file on disk (or in S3) is likely gone. Look at the "storageidentifier" field from the query above to determine the location of the file then restore the file from backup.
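
The owner id can be restored with an UPDATE on the same table. The sketch below only prints the statement so it can be reviewed before it is run. It assumes that the id column of dvobject holds the file's database id, as returned by the query above. The function name owner_fix_sql and the example ids are hypothetical.

```shell
# Hypothetical helper: print (not run) the SQL that re-attaches one file
# to its parent dataset. $1 = file's dvobject id, $2 = parent dataset id.
owner_fix_sql() {
  file_id="$1"
  dataset_id="$2"

  # Accept plain non-negative integers only.
  for n in "$file_id" "$dataset_id"; do
    case "$n" in
      ''|*[!0-9]*) echo "ids must be numeric" >&2; return 1 ;;
    esac
  done

  # The "owner_id is null" guard keeps the statement from touching healthy rows.
  printf "update dvobject set owner_id = %s where id = %s and dtype = 'DataFile' and owner_id is null;\n" \
    "$dataset_id" "$file_id"
}

# Example with made-up ids (file 42, dataset 7):
#   owner_fix_sql 42 7
```

Back up the database before applying any statement produced this way.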

Identifying Datasets Superusers May Have Accidentally Destroyed

Check the "actionlogrecord" table for DestroyDatasetCommand. While these "destroy" entries are normal when a superuser uses the API to destroy datasets, an entry is also created if a superuser has accidentally deleted a published dataset in the web interface with the "Delete Draft" button.

Complete List of Changes

For the complete list of code changes in this release, see the 5.11.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.1. To upgrade from 5.10.1, follow the instructions for installing release 5.11 using the v5.11.1 war file. If you had previously installed v5.11 (no longer available), follow the simplified instructions below.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.11.1.war

5. Restart Payara

service payara stop
service payara start

- Java
Published by kcondon almost 4 years ago

dataverse - v5.11

Dataverse Software 5.11

Please note: We have removed the 5.11 war file and dvinstall.zip because there are very serious bugs in the 5.11 release. For the upgrade instructions below, please use the 5.11.1 war file instead. New installations should start with 5.11.1. The bugs are explained in the 5.11.1 release notes.

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Terms of Access or Request Access Required for Restricted Files

Beginning in this release, datasets with restricted files must have either Terms of Access or Request Access enabled. This change is to ensure that for each file in a Dataverse installation there is a clear path to get to the data, either through requesting access to the data or to provide context about why requesting access is not enabled.

Published datasets are not affected by this change. Datasets that are in draft and that have neither Terms of Access nor Request Access enabled must be updated to select one or the other (or both). Otherwise, datasets cannot be further edited or published. Dataset authors will be able to tell if their dataset is affected by the presence of the following message at the top of their dataset (when they are logged in):

"Datasets with restricted files are required to have Request Access enabled or Terms of Access to help people access the data. Please edit the dataset to confirm Request Access or provide Terms of Access to be in compliance with the policy."

At this point, authors should click "Edit Dataset" then "Terms" and then check the box for "Request Access" or fill in "Terms of Access for Restricted Files" (or both). Afterwards, authors will be able to further edit metadata and publish.

In the "Notes for Dataverse Installation Administrators" section, we have provided a query to help proactively identify datasets that need to be updated.

Muting Notifications

Users can control which notifications they receive if the system is configured to allow this. See also Issue #7492 and PR #8530.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

Terms of Access or Request Access required for restricted files. (Issue #8191, PR #8308)
Users can control which notifications they receive if the system is configured to allow this. (Issue #7492, PR #8530)
A 500 error was occurring when creating a dataset if a template did not have an associated "termsofuseandaccess". See "Legacy Templates Issue" below for details. (Issue #8599, PR #8789)
Tabular ingest can be skipped via API. (Issue #8525, PR #8532)
The "Verify Email" button has been changed to "Send Verification Email" and rather than sometimes showing a popup now always sends a fresh verification email (and invalidates previous verification emails). (Issue #8227, PR #8579)
For Shibboleth users, the emailconfirmed timestamp is now set on login and the UI should show "Verified". (Issue #5663, PR #8579)
Information about the license selection (or custom terms) is now available in the confirmation popup when contributors click "Submit for Review". Previously, this was only available in the confirmation popup for the "Publish" button, which contributors do not see. (Issue #8561, PR #8691)
For installations configured to support multiple languages, controlled vocabulary fields that do not allow multiple entries (e.g. journalArticleType) are now indexed properly. (Issue #8595, PR #8601, PR #8624)
Two-letter ISO-639-1 codes for languages are now supported, in metadata imports and harvesting. (Issue #8139, PR #8689)
The API endpoint for listing notifications has been enhanced to show the subject, text, and timestamp of notifications. (Issue #8487, PR #8530)
The API Guide has been updated to explain that the Content-type header is now (as of Dataverse 5.6) necessary to create datasets via native API. (Issue #8663, PR #8676)
Admin API endpoints have been added to find and delete dataset templates. (Issue #8600, PR #8706)
The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse data files, validating checksums along the way. See the BagIt File Handler section of the Installation Guide for details. (Issue #8608, PR #8677)
For BagIt Export, the number of threads used when zipping data files into an archival bag is now configurable using the :BagGeneratorThreads database setting. (Issue #8602, PR #8606)
PostgreSQL 14 can now be used (though we've tested mostly with 13). PostgreSQL 10+ is required. (Issue #8295, PR #8296)
As always, widgets can be embedded in the <iframe> HTML tag, but the HTTP header "Content-Security-Policy" is now being sent on non-widget pages to prevent them from being embedded. (PR #8662)
URIs in the experimental Semantic API have changed (details below). (Issue #8533, PR #8592)
Installations running Make Data Count can upgrade to Counter Processor-0.1.04. (Issue #8380, PR #8391)
PrimeFaces, the UI framework we use, has been upgraded from 10 to 11. (Issue #8456, PR #8652)

Notes for Dataverse Installation Administrators

Identifying Datasets Requiring Terms of Access or Request Access Changes

In support of the change to require either Terms of Access or Request Access for all restricted files (see above for details), we have provided a query to identify datasets in your installation where at least one restricted file has neither Terms of Access nor Request Access enabled:

https://github.com/IQSS/dataverse/blob/v5.11/scripts/issues/8191/datasetswithouttoaorrequest_access

This will allow you to reach out to those dataset owners as appropriate.

Legacy Templates Issue

When custom license functionality was added, dataverses that had older legacy templates as their default template would not allow the creation of a new dataset (500 error).

This occurred because those legacy templates did not have an associated termsofuseandaccess linked to them.

In this release, we run a script that creates a default empty termsofuseandaccess for each of these templates and links them.

Note the termsofuseandaccess that are created this way default to using the license with id=1 (cc0) and the fileaccessrequest to false.

PostgreSQL Version 10+ Required

This release upgrades the bundled PostgreSQL JDBC driver to support major version 14.

Note that the newer PostgreSQL driver required a Flyway version bump, which entails positive and negative consequences:

The newer version of Flyway supports PostgreSQL 14 and includes a number of security fixes.
As of version 8.0 the Flyway Community Edition dropped support for PostgreSQL 9.6 and older.

This means that as foreshadowed in the 5.10 and 5.10.1 release notes, version 10 or higher of PostgreSQL is now required. For suggested upgrade steps, please see "PostgreSQL Update" in the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10

Counter Processor 0.1.04 Support

This release includes support for counter-processor-0.1.04 for processing Make Data Count metrics. If you are running Make Data Count support, you should reinstall/reconfigure counter-processor as described in the latest Guides. (For existing installations, note that counter-processor-0.1.04 requires a newer version of Python, so you will need to follow the full counter-processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse.)

New JVM Options and DB Settings

The following DB settings have been added:

:ShowMuteOptions
:AlwaysMuted
:NeverMuted
:CreateDataFilesMaxErrorsToDisplay
:BagItHandlerEnabled
:BagValidatorJobPoolSize
:BagValidatorMaxErrors
:BagValidatorJobWaitInterval
:BagGeneratorThreads

See the Database Settings section of the Guides for more information.

Notes for Developers and Integrators

See the "Backward Incompatibilities" section below.

Backward Incompatibilities

Semantic API Changes

This release includes an update to the experimental semantic API and the underlying assignment of URIs to metadata block terms that are not explicitly mapped to terms in community vocabularies. The change affects the output of the OAI_ORE metadata export, the OAI_ORE file in archival bags, and the input/output allowed for those terms in the semantic API.

For those updating integrating code or existing files intended for input into this release of Dataverse, URIs of the form...

https://dataverse.org/schema/<block name>/<parentField name>#<childField title>

and

https://dataverse.org/schema/<block name>/<Field title>

...are both replaced with URIs of the form...

https://dataverse.org/schema/<block name>/<Field name>.

Create Dataset API Requires Content-type Header (Since 5.6)

Due to a code change introduced in Dataverse 5.6, calls to the native API without the Content-type header will fail to create a dataset. The API Guide has been updated to indicate the necessity of this header: https://guides.dataverse.org/en/5.11/api/native-api.html#create-a-dataset-in-a-dataverse-collection
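
A minimal sketch of a create-dataset call that includes the header. The endpoint path and the X-Dataverse-key header follow the native API Guide linked above as best recalled and should be checked against it. The wrapper name create_dataset and the CURL override are hypothetical conveniences for previewing the request.

```shell
# Hypothetical wrapper around curl. Set CURL=echo to print the arguments
# instead of sending the request. API_TOKEN must hold a valid API token.
create_dataset() {
  server_url="$1"     # e.g. http://localhost:8080
  parent_alias="$2"   # alias of the parent Dataverse collection
  json_file="$3"      # file containing the dataset JSON

  # Without the Content-type header the dataset is not created (since 5.6).
  ${CURL:-curl} -H "X-Dataverse-key:$API_TOKEN" \
    -H "Content-type:application/json" \
    -X POST "$server_url/api/dataverses/$parent_alias/datasets" \
    --upload-file "$json_file"
}

# Preview only:
#   CURL=echo create_dataset http://localhost:8080 root dataset.json
```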

Complete List of Changes

For the complete list of code changes in this release, see the 5.11 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.11.war

5. Restart Payara

service payara stop
service payara start

6. Reload citation metadata block

wget https://github.com/IQSS/dataverse/releases/download/v5.11/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

7. Update Solr schema.xml

Note that if you have custom metadata blocks you can skip this step and proceed to the next one.

Edit schema.xml and for journalArticleType change multiValued from "false" to "true" and then restart Solr. Alternatively, download and use the version from https://github.com/IQSS/dataverse/releases/download/v5.11/schema.xml . By default the file can be found at /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml.
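
The manual edit can also be scripted. The sketch below assumes that the field definition sits on a single line beginning with `<field name="journalArticleType"`, which should be confirmed in your own schema.xml first. The function name make_multivalued is hypothetical.

```shell
# Hypothetical helper: flip multiValued from "false" to "true" for one field.
# $1 = path to schema.xml, $2 = field name (defaults to journalArticleType).
make_multivalued() {
  schema="$1"
  field="${2:-journalArticleType}"

  # -i.bak edits in place and leaves the original as schema.xml.bak.
  # The address restricts the substitution to the named field's line.
  sed -i.bak "/<field name=\"$field\"/ s/multiValued=\"false\"/multiValued=\"true\"/" "$schema"
}

# Example, using the default path mentioned above:
#   make_multivalued /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml
```

As with the manual edit, Solr must be restarted afterwards.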

7b. For installations with custom metadata blocks

Use the script provided in the release to add the custom fields to the base schema.xml installed in the previous step.

wget https://github.com/IQSS/dataverse/releases/download/v5.11/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml

(Note that the curl command above calls the admin API on localhost to obtain the list of the custom fields. In the unlikely case that you are running the main Dataverse Application and Solr on different servers, generate the schema.xml on the application node, then copy it onto the Solr server.)

In either case, reload the Solr schema (see https://guides.dataverse.org/en/5.11/admin/metadatacustomization.html#updating-the-solr-schema):

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"

8. Re-export metadata files (only OAI_ORE is affected)

People archiving Bags should re-archive. Follow the directions in the Admin Guide

9. (Optional) Delete duplicate templates in database

Prior to this release, making a copy of a dataset template created two copies, only one of which was visible in the Dataverse collection and usable. The other was not assigned to a collection and was invisible to the user (#8600).

If you would like to remove these orphan templates you may run the following script:

https://github.com/IQSS/dataverse/blob/v5.11/scripts/issues/8600/deleteorphantemplates_8600.sh

Also, admin APIs for finding and deleting templates have been added: https://guides.dataverse.org/en/5.11/api/native-api.html#list-dataset-templates

- Java
Published by kcondon about 4 years ago

dataverse - v5.10.1

Dataverse Software 5.10.1

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Bug Fix for Request Access

Dataverse Software 5.10 contains a bug where the "Request Access" button doesn't work from the file listing on the dataset page if the dataset contains custom terms. This has been fixed in PR #8555.

Bug Fix for Searching and Selecting Controlled Vocabulary Values

Dataverse Software 5.10 contains a bug where the search option is no longer present when selecting from more than ten controlled vocabulary values. This has been fixed in PR #8521.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

Users can use the "Request Access" button when the dataset has custom terms. (Issue #8553, PR #8555)
Users can search when selecting from more than ten controlled vocabulary values. (Issue #8519, PR #8521)
The default file categories ("Documentation", "Data", and "Code") can be redefined through the :FileCategories database setting. (Issue #8461, PR #8478)
Documentation on troubleshooting Excel ingest errors was improved. (PR #8541)
Internationalized controlled vocabulary values can now be searched. (Issue #8286, PR #8435)
Curation labels can be internationalized. (Issue #8381, PR #8466)
"NONE" is no longer accepted as a license using the SWORD API (since 5.10). See "Backward Incompatibilities" below for details. (Issue #8551, PR #8558).

Notes for Dataverse Installation Administrators

PostgreSQL Version 10+ Required Soon

Because 5.10.1 is a bug fix release, an upgrade to PostgreSQL is not required. However, this upgrade is still coming in the next non-bug fix release. For details, please see the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10

Payara Upgrade

You may notice that the Payara version used in the install scripts has been updated from 5.2021.5 to 5.2021.6. This was to address a bug where it was not possible to easily update the logging level. For existing installations, this release does not require upgrading Payara and a Payara upgrade is not part of the Upgrade Instructions below. For more information, see PR #8508.

New JVM Options and DB Settings

The following DB settings have been added:

:FileCategories - The default list of the pre-defined file categories ("Documentation", "Data" and "Code") can now be redefined with a comma-separated list (e.g. 'Docs,Data,Code,Workflow').

See the Database Settings section of the Guides for more information.
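
A minimal sketch of changing such a setting from the command line. It assumes the admin settings endpoint at /api/admin/settings/ on localhost, which is not shown in these notes and should be verified in the Database Settings section. The wrapper name set_db_setting and the CURL and SERVER_URL variables are hypothetical.

```shell
# Hypothetical wrapper: store a database setting through the admin API.
# Set CURL=echo to print the arguments instead of sending the request.
set_db_setting() {
  name="$1"    # setting name including the leading colon, e.g. :FileCategories
  value="$2"   # new value

  ${CURL:-curl} -X PUT -d "$value" \
    "${SERVER_URL:-http://localhost:8080}/api/admin/settings/$name"
}

# Example using the value quoted above:
#   set_db_setting :FileCategories 'Docs,Data,Code,Workflow'
```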

Notes for Developers and Integrators

In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the SWORD API.

Backward Incompatibilities

As of Dataverse 5.10, "NONE" is no longer supported as a valid license when creating a dataset using the SWORD API. The API Guide has been updated to reflect this. Additionally, if you specify an invalid license, a list of available licenses will be returned in the response.

Complete List of Changes

For the complete list of code changes in this release, see the 5.10.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.10.1.war

5. Restart Payara

service payara stop
service payara start

- Java
Published by kcondon about 4 years ago

dataverse - v5.10

Dataverse Software 5.10

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Multiple License Support

Users can now select from a set of configured licenses in addition to or instead of the previous Creative Commons CC0 choice or provide custom terms of use (if configured) for their datasets. Administrators can configure their Dataverse instance via API to allow any desired license as a choice and can enable or disable the option to allow custom terms. Administrators can also mark licenses as "inactive" to disallow future use while keeping that license for existing datasets. For upgrades, only the CC0 license will be preinstalled. New installations will have both CC0 and CC BY preinstalled. The Configuring Licenses section of the Installation Guide shows how to add or remove licenses.

Note: Datasets in existing installations will automatically be updated to conform to new requirements that custom terms cannot be used with a standard license and that custom terms cannot be empty. Administrators may wish to manually update datasets with these conditions if they do not like the automated migration choices. See the "Notes for Dataverse Installation Administrators" section below for details.

This release also makes the license selection and/or custom terms more prominent when publishing and viewing a dataset and when downloading files.

Ingest and File Upload Messaging Improvements

Messaging around ingest failure has been softened to prevent support tickets. In addition, messaging during file upload has been improved, especially with regard to showing size limits and providing links to the guides about tabular ingest. For screenshots and additional details see PR #8271.

Downloading of Guestbook Responses with Fewer Clicks

A download button has been added to the page that lists guestbooks. This saves a click but you can still download responses from the "View Responses" page, as before.

Also, links to the guides about guestbooks have been added in additional places.

Dynamically Request Arbitrary Metadata Fields from Search API

The Search API now allows arbitrary metadata fields to be requested when displaying results from datasets. You can request all fields from metadata blocks or pick and choose certain fields.

The new parameter is called metadata_fields and the Search API documentation contains details and examples: https://guides.dataverse.org/en/5.10/api/search.html
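
A minimal sketch of building such a request. The `block:field` and `block:*` value formats are recalled from the Search API documentation linked above and should be confirmed there. The helper name search_url is hypothetical, and no URL encoding is performed.

```shell
# Hypothetical helper: build a Search API URL that requests extra metadata.
# $1 = server URL, $2 = query, remaining args = metadata_fields values.
search_url() {
  base="$1"
  query="$2"
  shift 2

  url="$base/api/search?q=$query&type=dataset"

  # The parameter is repeated once per requested field or block.
  for field in "$@"; do
    url="$url&metadata_fields=$field"
  done

  printf '%s\n' "$url"
}

# Example: request every field of the citation block.
#   curl "$(search_url http://localhost:8080 '*' 'citation:*')"
```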

Solr 8 Upgrade

The Dataverse Software now runs on Solr 8.11.1, the latest available stable release in the Solr 8.x series.

PostgreSQL Upgrade

A PostgreSQL upgrade is not required for this release but is planned for the next release. See below for details.

Major Use Cases and Infrastructure Enhancements

Changes and fixes in this release include:

When creating or updating datasets, users can select from a set of licenses configured by the administrator (CC, CC BY, custom licenses, etc.) or provide custom terms (if the installation is configured to allow them). (Issue #7440, PR #7920)
Users can get better feedback on tabular ingest errors and more information about size limits when uploading files. (Issue #8205, PR #8271)
Users can more easily download guestbook responses and learn how guestbooks work. (Issue #8244, PR #8402)
Search API users can specify additional metadata fields to be returned in the search results. (Issue #7863, PR #7942)
The "Preview" tab on the file page can now show restricted files. (Issue #8258, PR #8265)
Users wanting to upload files from GitHub to Dataverse can learn about a new GitHub Action called "Dataverse Uploader". (PR #8416)
Users requesting access to files now get feedback that it was successful. (Issue #7469, PR #8341)
Users may notice various accessibility improvements. (Issue #8321, PR #8322)
Users of the Social Science metadata block can now add multiples of the "Collection Mode" field. (Issue #8452, PR #8473)
Guestbooks now support multi-line text area fields. (Issue #8288, PR #8291)
Guestbooks can better handle commas in responses. (Issue #8193, PR #8343)
Dataset editors can now deselect a guestbook. (Issue #2257, PR #8403)
Administrators with a large actionlogrecord table can read docs on archiving and then trimming it. (Issue #5916, PR #8292)
Administrators can list locks across all datasets. (PR #8445)
Administrators can run a version of Solr that doesn't include a version of log4j2 with serious known vulnerabilities. We trust that you have patched the version of Solr you are running now following the instructions that were sent out. An upgrade to the latest version is recommended for extra peace of mind. (PR #8415)
Administrators can run a version of Dataverse that doesn't include a version of log4j with known vulnerabilities. (PR #8377)

Notes for Dataverse Installation Administrators

Updating for Multiple License Support

Adding and Removing Licenses and How Existing Datasets Will Be Automatically Updated

As part of installing or upgrading an existing installation, administrators may wish to add additional license choices and/or configure Dataverse to allow custom terms. Adding additional licenses is managed via API, as explained in the Configuring Licenses section of the Installation Guide. Licenses are described via a JSON structure providing a name, URL, short description, and optional icon URL. Additionally licenses may be marked as active (selectable for new or updated datasets) or inactive (only allowed on existing datasets) and one license can be marked as the default. Custom Terms are allowed by default (backward compatible with the current option to select "No" to using CC0) and can be disabled by setting :AllowCustomTermsOfUse to false.

Further, administrators should review the following automated migration of existing licenses and terms into the new license framework and, if desired, should manually find and update any datasets for which the automated update is problematic. To understand the migration process, it is useful to understand how the multiple license feature works in this release:

"Custom Terms", aka a custom license, are defined through entries in the following fields of the dataset "Terms" tab:

Terms of Use
Confidentiality Declaration
Special Permissions
Restrictions
Citation Requirements
Depositor Requirements
Conditions
Disclaimer

"Custom Terms" require, at a minimum, a non-blank entry in the "Terms of Use" field. Entries in other fields are optional.

Since these fields are intended for terms/conditions that would potentially conflict with or modify the terms in a standard license, they are no longer shown when a standard license is selected.

In earlier Dataverse releases, it was possible to select the CC0 license and have entries in the fields above. It was also possible to say "No" to using CC0 and leave all of these terms fields blank.

The automated process will update existing datasets as follows.

"CC0 Waiver" and no entries in the fields above -> CC0 License (no change)
No CC0 Waiver and an entry in the "Terms of Use" field and possibly other fields listed above -> "Custom Terms" with the same entries in these fields (no change)
CC0 Waiver and an entry in some of the fields listed -> "Custom Terms" with the following text prepended in the "Terms of Use" field: "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions:"
No CC0 Waiver and an entry in a field(s) other than the "Terms of Use" field -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available with limited information on how it can be used. You may wish to communicate with the Contact(s) specified before use."
No CC0 Waiver and no entry in any of the listed fields -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available without information on how it can be used. You should communicate with the Contact(s) specified before use."
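
The five rules above amount to a small decision table. The sketch below restates them as a function for checking what will happen to a given dataset. It is an illustration of the rules as written here, not the migration code itself, and the function name and output labels are hypothetical.

```shell
# Hypothetical restatement of the migration rules.
# $1 = CC0 waiver selected (yes/no)
# $2 = "Terms of Use" field has an entry (yes/no)
# $3 = any of the other listed terms fields has an entry (yes/no)
migrated_license() {
  cc0="$1"
  tou="$2"
  other="$3"

  if [ "$cc0" = yes ]; then
    if [ "$tou" = no ] && [ "$other" = no ]; then
      echo "CC0 (no change)"
    else
      echo "Custom Terms (CC0 notice prepended to Terms of Use)"
    fi
  elif [ "$tou" = yes ]; then
    echo "Custom Terms (no change)"
  elif [ "$other" = yes ]; then
    echo "Custom Terms (limited-information Terms of Use added)"
  else
    echo "Custom Terms (no-information Terms of Use added)"
  fi
}

# Example: CC0 selected, no Terms of Use, but a Disclaimer entered.
#   migrated_license yes no yes
```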

Administrators who have datasets where CC0 has been selected along with additional terms, or datasets where the Terms of Use field is empty, may wish to modify those datasets prior to upgrading to avoid the automated changes above. This is discussed next.

Handling Datasets that No Longer Comply With Licensing Rules

In most Dataverse installations, one would expect the vast majority of datasets to either use the CC0 Waiver or have non-empty Terms of Use. As noted above, these will be migrated without any issue. Administrators may however wish to find and manually update datasets that specified a CC0 license but also had terms (no longer allowed) or had no license and no terms of use (also no longer allowed) rather than accept the default migrations for these datasets listed above.

Finding and Modifying Datasets with a CC0 License and Non-Empty Terms

To find datasets with a CC0 license and non-empty terms:

select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi,
  v.alias as dataverse_alias,
  case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version,
  dv.id as datasetversion_id, t.id as termsofuseandaccess_id,
  t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions,
  t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer
from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v
where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id
  and t.license='CC0'
  and not (t.termsofuse is null and t.confidentialitydeclaration is null and t.specialpermissions is null
    and t.restrictions is null and t.citationrequirements is null and t.depositorrequirements is null
    and t.conditions is null and t.disclaimer is null);

The datasetdoi column will let you find and view the affected dataset in the Dataverse web interface. The version column will indicate which version(s) are relevant. The dataverse_alias will tell you which Dataverse collection the dataset is in (and may be useful if you want to adjust all datasets in a given collection). The termsofuseandaccess_id column indicates which specific entry in that table is associated with the dataset/version. The remaining columns show the values of any terms fields.

There are two options to migrate such datasets:

Option 1: Set all terms fields to null:

update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;

or to change several at once:

update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id in (<comma separated list of termsofuseandaccess_ids>);

Option 2: Change the Dataset version(s) to not use the CC0 waiver and modify the Terms of Use (and/or other fields) as you wish to indicate that the CC0 waiver was previously selected:

update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id=<id>;

or

update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id in (<comma separated list of termsofuseandaccess_ids>);

Finding and Modifying Datasets without a CC0 License and with Empty Terms

To find datasets without a CC0 license and with empty terms:

select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and (t.license='NONE' or t.license is null) and t.termsofuse is null;

As before, there are a couple of options.

Option 1: These datasets could be updated to use CC0:

update termsofuseandaccess set license='CC0', confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;

Option 2: Terms of Use could be added:

update termsofuseandaccess set termsofuse='New text. ' where id=<id>;

In both cases, the same where id in (<comma separated list of termsofuseandaccess_ids>); ending could be used to change multiple datasets/versions at once.
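
When many rows need the same change, the id list can be assembled by a small helper rather than typed by hand. The following is a minimal sketch, not part of the release: the function name in_clause and the ids 101, 102 and 103 are hypothetical, and the statement is only printed so it can be reviewed before being pasted into psql.

```shell
#!/usr/bin/env bash
# Build an "id in (...)" clause from termsofuseandaccess ids given as arguments.
# Anything that is not a plain positive integer is rejected.
in_clause() {
  local ids=() id
  for id in "$@"; do
    [[ "$id" =~ ^[0-9]+$ ]] || { echo "not a numeric id: $id" >&2; return 1; }
    ids+=("$id")
  done
  [ "${#ids[@]}" -gt 0 ] || { echo "no ids given" >&2; return 1; }
  local IFS=,                       # join array elements with commas
  echo "id in (${ids[*]})"
}

# Print (do not run) the "add Terms of Use" statement for a batch of example ids.
echo "update termsofuseandaccess set termsofuse='New text. ' where $(in_clause 101 102 103);"
```

Rejecting non-numeric input keeps a stray value from the query output (for example a column header) out of the generated statement.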

Standardizing Custom Licenses

If many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include:

Creating and posting an external document that includes the custom terms, i.e. an HTML document with sections corresponding to the terms fields that are used.
Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license
Using the Dataverse API to register the new license as one of the options available in your installation
Using the API to make sure the license is active and deciding whether the license should also be the default
Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms.

The benefits of this approach are:

usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms
efficiency: custom terms are stored per dataset, whereas a license is registered once and all uses of it refer to the same object and external URL
security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits

Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it:

UPDATE termsofuseandaccess SET license_id = (SELECT license.id FROM license WHERE license.name = '<Your License Name>'), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null WHERE termsofuseandaccess.termsofuse LIKE '%<Unique phrase in your Terms of Use>%';

Note that this information is also available in the Configuring Licenses section of the Installation Guide. Look for "Standardizing Custom Licenses".
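
As a rough illustration of the "define" and "register" steps, the sketch below writes a license description to a JSON file and prints the API call that would submit it. Everything specific here is an assumption to verify against the API Guide for your version: the JSON field names (name, uri, shortDescription, iconUrl, active), the /api/licenses endpoint, the need for a superuser API token, and all example values and URLs.

```shell
#!/usr/bin/env bash
# Placeholder values; adjust for your installation before use.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
LICENSE_JSON="${LICENSE_JSON:-./custom-license.json}"

# Describe the license. Field names are assumed; check the API Guide.
cat > "$LICENSE_JSON" <<'EOF'
{
  "name": "Example Lab Data Terms 1.0",
  "uri": "https://example.org/licenses/lab-terms-1.0.html",
  "shortDescription": "Standard terms for data deposited by Example Lab.",
  "iconUrl": "https://example.org/licenses/lab-terms.png",
  "active": true
}
EOF

# Print the registration call instead of running it, so it can be reviewed first.
echo "curl -X POST -H 'Content-Type: application/json' -H 'X-Dataverse-key: \$API_TOKEN' --data-binary @$LICENSE_JSON $SERVER_URL/api/licenses"
```

The name given in the JSON file is the value the SQL update above would match in its license.name subquery.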

PostgreSQL Version 10+ Required

If you are still using PostgreSQL 9.x, now is the time to upgrade. PostgreSQL 9.x is now EOL (no longer supported, as of January 2022), and in the next version of the Dataverse Software we plan to upgrade the Flyway library (used for database migrations) to a version that will no longer work with versions prior to PostgreSQL 10. See PR #8296 for more on this upcoming Flyway upgrade.

The Dataverse Software has been tested with PostgreSQL versions up to 13. The current stable version 13.5 is recommended. If that's not an option for reasons specific to your installation (for example, if PostgreSQL 13.5 is not available for the OS distribution you are using), any 10+ version should work.

See the upgrade section below for more information.

Providing S3 Storage Credentials via MicroProfile Config

With this release, you may use two new JVM options (dataverse.files.<id>.access-key and dataverse.files.<id>.secret-key) to pass an access key identifier and a secret access key for S3-based storage definitions without creating the files used by the AWS CLI tools (~/.aws/config & ~/.aws/credentials).

This has been added to ease setups using containers (Docker, Podman, Kubernetes, OpenShift) or testing and development installations. Find additional documentation and a word of warning in the Installation Guide.
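
In container setups it is common to supply MicroProfile Config values as environment variables. Under the standard MicroProfile Config naming rule, a property is matched by an environment variable whose name replaces every character other than letters, digits and underscore with an underscore and is then upper-cased. The sketch below applies that rule to the two new options. It assumes a storage id of s3 (an example only) and that your deployment reads environment variables as a config source; the helper name mp_env_name is hypothetical.

```shell
#!/usr/bin/env bash
# Map a MicroProfile Config property name to its upper-case environment variable form.
mp_env_name() {
  printf '%s\n' "$1" | tr -c 'A-Za-z0-9_\n' '_' | tr 'a-z' 'A-Z'
}

mp_env_name "dataverse.files.s3.access-key"
mp_env_name "dataverse.files.s3.secret-key"

# The same values can instead be set as JVM options, for example:
#   $PAYARA/bin/asadmin create-jvm-options "-Ddataverse.files.s3.access-key=<access key id>"
```

For the s3 example this prints DATAVERSE_FILES_S3_ACCESS_KEY and DATAVERSE_FILES_S3_SECRET_KEY.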

New JVM Options and DB Settings

The following JVM settings have been added:

dataverse.files.<id>.access-key - S3 access key ID.
dataverse.files.<id>.secret-key - S3 secret access key.

See the JVM Options section of the Installation Guide for more information.

The following DB settings have been added:

:AllowCustomTermsOfUse (default: true) - allow users to provide Custom Terms instead of choosing one of the configured standard licenses.

See the Database Settings section of the Guides for more information.

Notes for Developers and Integrators

In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the native JSON format.

Backward Incompatibilities

With the change to support multiple licenses, which can include cases where CC0 is not an option, and the decision to prohibit two previously possible cases (no license with no entry in the "Terms of Use" field, and a standard license combined with entries in "Terms of Use", "Special Permissions", and related fields), this release contains changes to the display, API payloads, and export metadata that are not backward compatible. These include:

"CC0 Waiver" has been replaced by "CC0 1.0" (the short name specified by Creative Commons) in the web interface, API payloads, and export formats that include a license name. (Note that installation admins can alter the license name in the database to maintain the original "CC0 Waiver" text, if desired.)
Schema.org metadata in page headers and the Schema.org JSON-LD metadata export now reference the license via URL (which should avoid the current warning from Google about an invalid license object in the page metadata).
Metadata exports and import methods (including SWORD) now use either the license name (e.g. in the JSON export) or URL (e.g. in the OAI_ORE export) rather than the previously hardcoded value of "CC0" or "CC0 Waiver" (if the CC0 license is available, its default name would be "CC0 1.0").
API calls (e.g. for import, migrate) that specify both a license and custom terms will be considered an error, as would having no license and an empty/blank value for "Terms of Use".
Rollback. In general, one should not deploy an earlier release over a database that has been modified by deployment of a later release. (Make a db backup before upgrading and use that copy if you go back to a prior version.) Due to the nature of the db changes in this release, attempts to deploy an earlier version of Dataverse will fail unless the database is also restored to its pre-release state.

Also, note that since CC0 Waiver is no longer a hardcoded option, text strings that reference it have been edited or removed from Bundle.properties. This means that the ability to provide translations of the CC0 license name/description has been removed. The initial release of multiple license functionality doesn't include an alternative mechanism to provide translations of license names/descriptions, so this is a regression in capability (see #8346). The instructions and help information about license and terms remain internationalizable; it is only the name/description of the licenses themselves that cannot yet be translated.

An update in the metadata block Social Science changes the field CollectionMode to allow multiple values. This changes the way the field is encoded in the native JSON format. From

"typeName": "collectionMode", "multiple": false, "typeClass": "primitive", "value": "some text"

to

"typeName": "collectionMode", "multiple": true, "typeClass": "primitive", "value": ["some text", "more text"]

Complete List of Changes

For the complete list of code changes in this release, see the 5.10 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.10.war

5. Restart Payara

service payara stop
service payara start

6. Update the Social Science metadata block

wget https://github.com/IQSS/dataverse/releases/download/v5.10/social_science.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @social_science.tsv -H "Content-type: text/tab-separated-values"
Note that this update also requires an updated Solr schema. We strongly recommend that you upgrade Solr as part of this release, by installing the latest stable release from scratch (see below). In the process you will configure it with the latest version of the schema as distributed with this Dataverse release, so no further steps will be needed. If you have already upgraded, or have some very good reason to stay on the old version a little longer, please refer to https://guides.dataverse.org/en/5.10/admin/metadatacustomization.html#updating-the-solr-schema for information on updating your Solr schema in place.

7. Run ReExportall to update Exports

Following the directions in the Admin Guide

8. Upgrade Solr

See "Additional Release Steps" below for how to upgrade Solr.

Additional Release Steps

Solr Upgrade

With this release we upgrade to the latest available stable release in the Solr 8.x branch. We recommend a fresh installation of Solr (the index will be empty) followed by an "index all".

Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.

See http://guides.dataverse.org/en/5.10/installation/prerequisites.html#installing-solr for more information.

Please note that after you have followed the instructions above you will have Solr installed with the default schema that lists all the fields in the standard Dataverse metadata blocks. If your installation uses any custom metadata blocks, please refer to https://guides.dataverse.org/en/5.10/admin/metadatacustomization.html#updating-the-solr-schema for information on updating your Solr schema to include these extra fields.

PostgreSQL Upgrade

The tested and recommended way of upgrading an existing database is as follows:

Export your current database with pg_dumpall.
Install the new version of PostgreSQL (make sure it's running on the same port, so that no changes are needed in the Payara configuration).
Re-import the database with psql, as the user postgres.

It is strongly recommended to use the versions of pg_dumpall and psql from the old and new versions of PostgreSQL, respectively. For example, the commands below were used to migrate a database running under PostgreSQL 9.6 to 13.5. Adjust the versions and the path names to match your environment.

Back up/export:

/usr/pgsql-9.6/bin/pg_dumpall -U postgres > /tmp/backup.sql

Restore/import:

/usr/pgsql-13/bin/psql -U postgres -f /tmp/backup.sql

When upgrading the production database here at Harvard IQSS we were able to go from version 9.6 all the way to 13.3 without any issues.

You may want to try these backup and restore steps on a test server to get an accurate estimate of how much downtime to expect with the final production upgrade. That of course will depend on the size of your database.

Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.

Published by kcondon over 4 years ago

dataverse - v5.9

Dataverse Software 5.9

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Dataverse Collection Page Optimizations

The Dataverse Collection page, which also serves as the search page and the homepage in most Dataverse installations, has been optimized, with a specific focus on reducing the number of queries for each page load. These optimizations will be more noticeable on Dataverse installations with higher traffic.

Support for HTTP "Range" Header for Partial File Downloads

Dataverse now supports the HTTP "Range" header, which allows users to download parts of a file. Here are some examples:

bytes=0-9 gets the first 10 bytes.
bytes=10-19 gets 10 bytes from the middle.
bytes=-10 gets the last 10 bytes.
bytes=10- gets all bytes except the first 10.

Only a single range is supported. For more information, see the Data Access API section of the API Guide.
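
To make the byte arithmetic concrete, the sketch below computes how many bytes a single range returns for a file of a given size, following the usual HTTP range semantics (offsets are zero-based and inclusive). It is an illustration only and does not contact a server: the function name range_length is hypothetical, and unsatisfiable ranges (for example a start beyond the end of the file) are not handled.

```shell
#!/usr/bin/env bash
# Bytes returned for one satisfiable range spec against a file of SIZE bytes.
range_length() {
  local spec="${1#bytes=}" size="$2" first last
  case "$spec" in
    -*) first=$((size - ${spec#-})); last=$((size - 1)) ;;  # suffix form: last N bytes
    *-) first="${spec%-}";           last=$((size - 1)) ;;  # open-ended: from offset to end
    *)  first="${spec%-*}";          last="${spec#*-}"  ;;  # explicit first-last
  esac
  if [ "$first" -lt 0 ]; then first=0; fi                   # suffix longer than the file
  if [ "$last" -ge "$size" ]; then last=$((size - 1)); fi   # end past the last byte
  echo $((last - first + 1))
}

for spec in bytes=0-9 bytes=10-19 bytes=-10 bytes=10-; do
  echo "$spec on a 100-byte file -> $(range_length "$spec" 100) bytes"
done

# Against a real installation the header would be sent like this
# (SERVER_URL and FILE_ID are placeholders):
#   curl -H "Range: bytes=0-9" "$SERVER_URL/api/access/datafile/$FILE_ID"
```

For a 100-byte file the loop reports 10, 10, 10 and 90 bytes respectively.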

Support for Optional External Metadata Validation Scripts

The Dataverse software now allows an installation administrator to provide custom scripts for additional metadata validation when datasets are being published and/or when Dataverse collections are being published or modified. The Harvard Dataverse Repository has been using this mechanism to combat content that violates our Terms of Use, specifically spam content. All the validation or verification logic is defined in these external scripts, thus making it possible for an installation to add checks custom-tailored to their needs.

Please note that only the metadata are subject to these validation checks. This does not check the content of any uploaded files.

For more information, see the Database Settings section of the Guide. The new settings are listed below, in the "New JVM Options and DB Settings" section of these release notes.
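
A validation script can be very small. The sketch below assumes that Dataverse invokes the script with a single argument, the path to a file containing the metadata as JSON, and treats a zero exit status as "valid"; confirm the exact contract in the Database Settings section before relying on it. The function name and the banned phrases are hypothetical examples.

```shell
#!/usr/bin/env bash
# Hypothetical validator: returns 0 to accept the metadata, non-zero to reject it.
validate_metadata() {
  local metadata_file="$1"
  # An unreadable or missing file is treated as a failure (status 2).
  [ -r "$metadata_file" ] || return 2
  # Reject if any banned phrase appears (case-insensitive, fixed strings).
  if grep -qiF -e 'buy cheap' -e 'online casino' "$metadata_file"; then
    return 1
  fi
  return 0
}

# A standalone script would end by validating the file it was given:
#   validate_metadata "$1"; exit $?
```

Keeping the check in a function makes it easy to try against saved metadata files before pointing a setting at the script.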

Displaying Author's Identifier as Link

In the dataset page's metadata tab the author's identifier is now displayed as a clickable link, which points to the profile page in the external service (ORCID, VIAF etc.) in cases where the identifier scheme provides a resolvable landing page. If the identifier does not match the expected scheme, a link is not shown.

Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API. These updates include:

Auxiliary files can now also be associated with non-tabular files
Auxiliary files can now be deleted
Duplicate Auxiliary files can no longer be created
A new API has been added to list Auxiliary files by their origin
Some auxiliary files were being saved with the wrong content type (MIME type) but now the user can supply the content type on upload, overriding the type that would otherwise be assigned
Improved error reporting
A bugfix involving checksums for Auxiliary files

Please note that the Auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.

Major Use Cases and Infrastructure Enhancements

Newly-supported major use cases in this release include:

The Dataverse collection page has been optimized, resulting in quicker load times on one of the most common pages in the application (Issue #7804, PR #8143)
Users will now be able to specify a certain byte range in their downloads via API, allowing for downloads of file parts. (Issue #6397, PR #8087)
A Dataverse installation administrator can now set up metadata validation for datasets and Dataverse collections, allowing for publish-time and create-time checks for all content. (Issue #8155, PR #8245)
Users will be provided with clickable links to authors' ORCIDs and other IDs in the dataset metadata (Issue #7978, PR #7979)
Users will now be able to associate Auxiliary files with non-tabular files (Issue #8235, PR #8237)
Users will no longer be able to create duplicate Auxiliary files (Issue #8235, PR #8237)
Users will be able to delete Auxiliary files (Issue #8235, PR #8237)
Users can retrieve a list of Auxiliary files based on their origin (Issue #8235, PR #8237)
Users will be able to supply the content type of Auxiliary files on upload (Issue #8241, PR #8282)
The indexing process has been updated so that datasets with fewer files are indexed first, resulting in fewer failures and making it easier to identify problematically-large datasets. (Issue #8097, PR #8152)
Users will no longer be able to create metadata records with problematic special characters, which would later require Dataverse installation administrator intervention and a database change (Issue #8018, PR #8242)
The Dataverse software will now appropriately recognize files with the .geojson extension as GeoJSON files rather than "unknown" (Issue #8261, PR #8262)
A Dataverse installation administrator can now retrieve more information about role deletion from the ActionLogRecord (Issue #2912, PR #8211)
Users will be able to use a new role to allow a user to respond to file download requests without also giving them the power to manage the dataset (Issue #8109, PR #8174)
Users will no longer be forced to update their passwords when moving from Dataverse 3.x to Dataverse 4.x (PR #7916)
Improved accessibility of buttons on the Dataset and File pages (Issue #8247, PR #8257)

Notes for Dataverse Installation Administrators

Indexing Performance on Datasets with Large Numbers of Files

We discovered that whenever a full reindexing needs to be performed, datasets with large numbers of files take an exceptionally long time to index. For example, in the Harvard Dataverse Repository, it takes several hours for a dataset that has 25,000 files. In situations where the Solr index needs to be erased and rebuilt from scratch (such as a Solr version upgrade, or a corrupt index, etc.) this can significantly delay the repopulation of the search catalog.

We are still investigating the reasons behind this performance issue. For now, even though some improvements have been made, a dataset with thousands of files is still going to take a long time to index. In this release, we've made a simple change to the reindexing process, to index any such datasets at the very end of the batch, after all the datasets with fewer files have been reindexed. This does not improve the overall reindexing time, but will repopulate the bulk of the search index much faster for the users of the installation.
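
The effect of the reordering can be pictured with a simple sort. The sketch below is an illustration of the idea only, not the code used by Dataverse: it takes lines of the form "dataset_id file_count" (made-up values) and prints the ids in ascending order of file count, so the largest datasets come last.

```shell
#!/usr/bin/env bash
# Read "dataset_id file_count" lines on stdin; print ids, fewest files first.
index_order() {
  sort -t ' ' -k2,2n -k1,1n | cut -d ' ' -f1
}

printf '%s\n' '11 25000' '12 3' '13 140' | index_order
```

With the sample input, dataset 12 is listed first, then 13, and the 25,000-file dataset 11 last.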

Custom Analytics Code Changes

You should update your custom analytics code to capture a bug fix related to tracking within the dataset files table. This release restores that tracking.

For more information, see the documentation and sample analytics code snippet provided in the Installation Guide. This update can be used on any version 5.4+.

New ManageFilePermissions Permission

Dataverse can now support a use case in which an Admin or Curator would like to delegate the ability to grant access to restricted files to other users. This can be implemented by creating a custom role (e.g. DownloadApprover) that has the new ManageFilePermissions permission. This release introduces the new permission, and a Flyway script adjusts the existing Admin and Curator roles so they continue to have the ability to grant file download requests.

Thumbnail Defaults

New default values have been added for the JVM settings dataverse.dataAccess.thumbnail.image.limit and dataverse.dataAccess.thumbnail.pdf.limit, of 3MB and 1MB respectively. This means that, unless specified otherwise by the JVM settings already in your domain configuration, the application will skip attempting to generate thumbnails for image files and PDFs that are above these size limits. In previous versions, if these limits were not explicitly set, the application would try to create thumbnails for files of unlimited size, which would occasionally cause problems with very large images.
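
The size check described above can be sketched as follows. This is an illustration under stated assumptions, not the application's code: the byte values 3000000 and 1000000 are assumed equivalents of "3MB" and "1MB", a file exactly at the limit is assumed to be allowed, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
IMAGE_LIMIT="${IMAGE_LIMIT:-3000000}"   # assumed byte value for "3MB"
PDF_LIMIT="${PDF_LIMIT:-1000000}"       # assumed byte value for "1MB"

# Returns 0 if a thumbnail would be attempted for a file of the given kind and size.
thumbnail_attempted() {
  local kind="$1" size_bytes="$2" limit
  case "$kind" in
    image) limit="$IMAGE_LIMIT" ;;
    pdf)   limit="$PDF_LIMIT" ;;
    *)     return 1 ;;                  # other kinds are outside this sketch
  esac
  [ "$size_bytes" -le "$limit" ]
}

thumbnail_attempted image 2500000 && echo "2.5 MB image: thumbnail attempted"
thumbnail_attempted pdf 2500000 || echo "2.5 MB PDF: skipped"
```

Installations that need thumbnails for larger files can raise the limits by setting the two JVM options explicitly.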

New JVM Options and DB Settings

The following DB settings allow configuration of the external metadata validator:

:DataverseMetadataValidatorScript
:DataverseMetadataPublishValidationFailureMsg
:DataverseMetadataUpdateValidationFailureMsg
:DatasetMetadataValidatorScript
:DatasetMetadataValidationFailureMsg
:ExternalValidationAdminOverride

See the Database Settings section of the Guides for more information.

Notes for Developers and Integrators

Two sections of the Developer Guide have been updated:

Instructions on how to sync a PR in progress with develop have been added in the version control section
Guidance on avoiding inefficiencies in JSF render logic has been added to the "Tips" section

Complete List of Changes

For the complete list of code changes in this release, see the 5.9 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.9.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.9.war

5. Restart Payara

service payara stop
service payara start

6. Run ReExportall to update JSON Exports

Following the directions in the Admin Guide

Additional Release Steps

(for installations collecting web analytics)

1. Update custom analytics code per the Installation Guide.

(for installations with GeoJSON files)

1. Redetect GeoJSON files to update the type from "Unknown" to GeoJSON, following the directions in the API Guide

2. Kick off full reindex following the directions in the Admin Guide

Published by kcondon over 4 years ago

dataverse - v5.8

Dataverse Software 5.8

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Support for Data Embargoes

The Dataverse Software now supports file-level embargoes. The ability to set embargoes, up to a maximum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Embargoes section of the Dataverse Software Guides.

Users can configure a specific embargo, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Embargo' menu item and entering information in a popup dialog. Embargoes can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
While embargoed, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). After the embargo expires, files become accessible. If the files were also restricted, they remain inaccessible and functionality is the same as for any restricted file.
By default, the citation date reported for the dataset and the datafiles in version 1.0 reflects the longest embargo period on any file in version 1.0, which is consistent with recommended practice from DataCite. Administrators can still specify an alternate date field to be used in the citation date via the Set Citation Date Field Type for a Dataset API Call.

The work to add this functionality was initiated by Data Archiving and Networked Services (DANS-KNAW), the Netherlands. It was further developed by the Global Dataverse Community Consortium (GDCC) in cooperation with and with funding from DANS.

Major Use Cases and Infrastructure Enhancements

Newly-supported major use cases in this release include:

Users can set file-level embargoes. (Issue #7743, #4052, #343, PR #8020)
Improved accessibility of form labels on the advanced search page (Issue #8169, PR #8170)

Notes for Dataverse Installation Administrators

Mitigate Solr Schema Management Problems

With Release 5.5, the <copyField> definitions had been reincluded into schema.xml to fix searching for datasets.

This release includes a final update to schema.xml and a new script update-fields.sh to manage your custom metadata fields, and to provide opportunities for other future improvements. The broken script updateSchemaMDB.sh has been removed.

You will need to replace your schema.xml with the one provided in order to make sure that the new script can function. If you do not use any custom metadata blocks in your installation, this is the only change to be made. If you do use custom metadata blocks you will need to take a few extra steps, enumerated in the step-by-step instructions below.

New JVM Options and DB Settings

:MaxEmbargoDurationInMonths controls whether embargoes are allowed in a Dataverse instance and can limit the maximum duration users are allowed to specify. A value of 0 months or non-existent setting indicates embargoes are not supported. A value of -1 allows embargoes of any length.
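
The three cases of the setting can be summarized in a small decision function. The sketch below only models the rule as stated above; the function name is hypothetical, embargo lengths are simplified to whole months, and an empty first argument stands for a setting that has not been defined.

```shell
#!/usr/bin/env bash
# Usage: embargo_allowed <value of :MaxEmbargoDurationInMonths> <requested months>
# Returns 0 if the requested embargo length would be permitted.
embargo_allowed() {
  local max="${1:-0}" requested="$2"    # missing setting behaves like 0
  if [ "$max" -eq -1 ]; then return 0; fi   # -1: any length allowed
  if [ "$max" -eq 0 ]; then return 1; fi    # 0 or unset: embargoes not supported
  [ "$requested" -gt 0 ] && [ "$requested" -le "$max" ]
}

embargo_allowed 24 12  && echo "12 months with a 24-month maximum: allowed"
embargo_allowed 24 36  || echo "36 months with a 24-month maximum: refused"
embargo_allowed "" 6   || echo "setting absent: embargoes not supported"
embargo_allowed -1 600 && echo "maximum of -1: any length allowed"
```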

Complete List of Changes

For the complete list of code changes in this release, see the 5.8 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.8.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.8.war

5. Restart Payara

service payara stop
service payara start

6. Update Solr schema.xml.

/usr/local/solr/solr-8.8.1/server/solr/collection1/conf is used in the examples below as the location of your Solr schema. Please adapt it to the correct location, if different in your installation. Use find / -name schema.xml if in doubt.

6a. Replace schema.xml with the base version included in this release.

wget https://github.com/IQSS/dataverse/releases/download/v5.8/schema.xml
cp schema.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf

For installations that are not using any Custom Metadata Blocks, you can skip the next step.

6b. For installations with Custom Metadata Blocks

Use the script provided in the release to add the custom fields to the base schema.xml installed in the previous step.

wget https://github.com/IQSS/dataverse/releases/download/v5.8/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.8.1/server/solr/collection1/conf/schema.xml

(Note that the curl command above calls the admin api on localhost to obtain the list of the custom fields. In the unlikely case that you are running the main Dataverse Application and Solr on different servers, generate the schema.xml on the application node, then copy it onto the Solr server)

7. Restart Solr

Usually service solr stop; service solr start, but may be different on your system. See the Installation Guide for more details.

Published by kcondon over 4 years ago

dataverse - v5.7

Dataverse Software 5.7

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Experimental Support for External Vocabulary Services

Dataverse can now be configured to associate specific metadata fields with third-party vocabulary services to provide an easy way for users to select values from those vocabularies. The mapping involves use of external Javascripts. Two such scripts have been developed so far: one for vocabularies served via the SKOSMOS protocol and one allowing people to be identified via their ORCID. The guides contain info about the new :CVocConf setting used for configuration and additional information about this functionality. Scripts, examples, and additional documentation are available at the GDCC GitHub Repository.

Please watch the online presentation, read the document with requirements and join the Dataverse Working Group on Ontologies and Controlled Vocabularies if you have some questions and want to contribute.

This functionality was initially developed by Data Archiving and Networked Services (DANS-KNAW), the Netherlands, and funded by SSHOC, "Social Sciences and Humanities Open Cloud". SSHOC has received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782. It was further improved by the Global Dataverse Community Consortium (GDCC) and extended with the support of semantic search.

Curation Status Labels

A new :AllowedCurationLabels setting allows sysadmins to define one or more sets of labels that can be applied to a draft Dataset version via the user interface or API to indicate the status of the dataset with respect to a defined curation process.

Labels are completely customizable (alphanumeric or spaces, up to 32 characters, e.g. "Author contacted", "Privacy Review", "Awaiting paper publication"). Superusers can select a specific set of labels, or disable this functionality per collection. Anyone who can publish a draft dataset (e.g. curators) can set/change/remove labels (from the set specified for the collection containing the dataset) via the user interface or via an API. The API also would allow external tools to search for, read and set labels on Datasets, providing an integration mechanism. Labels are visible on the Dataset page and in Dataverse collection listings/search results. Internally, the labels have no effect, and at publication, any existing label will be removed. A reporting API call allows admins to get a list of datasets and their curation statuses.

The Solr schema must be updated as part of installing the release of Dataverse containing this feature for it to work.
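
As a minimal sketch of the label constraints described above (alphanumeric characters or spaces, at most 32 characters), the hypothetical helper below checks candidate labels before they are placed in the :AllowedCurationLabels JSON. The function name, the set name "Standard Process", and the commented curl call against the admin settings API on localhost are illustrative assumptions; the guides remain the authoritative reference for the setting's format.

```shell
# Hypothetical helper: succeed only if the label is 1-32 characters long
# and contains nothing but ASCII letters, digits, and spaces.
valid_label() {
  label=$1
  [ -n "$label" ] || return 1
  [ "${#label}" -le 32 ] || return 1
  # Strip every allowed character; anything left over is not allowed.
  rest=$(printf '%s' "$label" | LC_ALL=C tr -d 'A-Za-z0-9 ')
  [ -z "$rest" ]
}

# Check the example labels mentioned above; nothing is printed if all pass.
for l in "Author contacted" "Privacy Review" "Awaiting paper publication"; do
  valid_label "$l" || echo "rejected: $l"
done

# Assumed way to store one label set (not run here):
# curl -X PUT -d '{"Standard Process":["Author contacted","Privacy Review"]}' \
#   http://localhost:8080/api/admin/settings/:AllowedCurationLabels
```

Checking labels locally first is only a convenience; the installation itself decides which labels it accepts.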

Major Use Cases

Newly-supported major use cases in this release include:

Administrators will be able to set up integrations with external vocabulary services, allowing for autocomplete-assisted metadata entry, metadata standardization, and better integration with other systems (Issue #7711, PR #7946)
Users viewing datasets in the root Dataverse collection will now see breadcrumbs that have a link back to the root Dataverse collection (Issue #7527, PR #8078)
Users will be able to more easily differentiate between datasets and files through new iconography (Issue #7991, PR #8021)
Users retrieving large guestbooks over the API will experience fewer failures (Issue #8073, PR #8084)
Dataverse collection administrators can specify which language will be used when entering metadata for new Datasets in a collection, based on a list of languages specified by the Dataverse installation administrator (Issue #7388, PR #7958)
- Users will see the language used for metadata entry indicated at the document or element level in metadata exports (Issue #7388, PR #7958)
- Administrators will now be able to specify the language(s) of controlled vocabulary entries, in addition to the installation's default language (Issue #6751, PR #7959)
Administrators and curators can now receive notifications when a dataset is created (Issue #8069, PR #8070)
Administrators with large files in their installation can disable the automatic checksum verification process at publish time (Issue #8043, PR #8074)

Notes for Dataverse Installation Administrators

Dataset Creation Notifications for Administrators

A new :SendNotificationOnDatasetCreation setting has been added. When true, administrators and curators (those who can publish the dataset) will get a notification when a new dataset is created. This makes it easier to track activity in a Dataverse and, for example, allow admins to follow up when users do not publish a new dataset within some period of time.

Skip Checksum Validation at Publish Based on Size

When a user requests to publish a dataset, the time taken to complete the publishing process varies based on the dataset/datafile size.

With the additional settings of :DatasetChecksumValidationSizeLimit and :DataFileChecksumValidationSizeLimit, the checksum validation can be skipped while publishing.

If the Dataverse administrator chooses to set these values, it's strongly recommended to have an external auditing system run periodically in order to monitor the integrity of the files in the Dataverse installation.

Guestbook Response API Update

With this release the Retrieve Guestbook Responses for a Dataverse Collection API will no longer produce a file by default. You may specify an output file by adding a -o $YOURFILENAME to the curl command.
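
A minimal sketch of such a call follows. The endpoint path, the X-Dataverse-key header, and both function names are assumptions made for illustration and should be checked against the Native API Guide for your version.

```shell
# Hypothetical helper: build the request URL for a collection's guestbook
# responses. The endpoint path is an assumption, not taken from this release note.
guestbook_url() {
  server=$1      # e.g. https://demo.dataverse.org
  collection=$2  # collection alias or numeric id
  printf '%s/api/dataverses/%s/guestbookResponses\n' "$server" "$collection"
}

# Hypothetical wrapper: save the response to the file named in $3 using -o.
# Without -o, curl writes the response to standard output.
fetch_guestbook() {
  curl -sS -H "X-Dataverse-key:$API_TOKEN" -o "$3" "$(guestbook_url "$1" "$2")"
}

# Usage (not run here):
# API_TOKEN=xxxxxxxx fetch_guestbook https://demo.dataverse.org root responses.csv
```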

Dynamic JavaServer Faces Configuration Options

This release includes a new way to easily change JSF settings via MicroProfile Config, especially useful during development. See the development guide on "Debugging" for more information.

Enhancements to DDI Metadata Exports

Several changes have been made to the DDI exports to improve support for internationalization and to improve compliance with CESSDA requirements. These changes include:

Addition of xml:lang attributes specifying the dataset metadata language at the document level and for individual elements such as title and description
Specification of controlled vocabulary terms in duplicate elements in multiple languages (in the installation default language and, if different, the dataset metadata language)

While these changes are intended to improve harvesting and integration with external systems, they could break existing connections that make assumptions about the elements and attributes that have been changed.

New JVM Options and DB Settings

:SendNotificationOnDatasetCreation - A boolean setting that, if true, will send an email and notification to additional users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset.
:DatasetChecksumValidationSizeLimit - Disables the checksum validation while publishing for any dataset whose size is greater than the limit.
:DataFileChecksumValidationSizeLimit - Disables the checksum validation while publishing for any datafile whose size is greater than the limit.
:CVocConf - A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service.
:MetadataLanguages - Sets which languages can be used when entering dataset metadata.
:AllowedCurationLabels - A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.
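
Database settings such as those listed above are typically stored through the admin settings API. The sketch below is a dry run under stated assumptions: the helper name is hypothetical, the API is assumed to be reachable on localhost:8080, and the byte value used for the size limit is only an example, since the accepted units are not stated here. It prints the commands rather than running them.

```shell
# Hypothetical dry-run helper: print the curl command that would store a
# database setting through the admin settings API on localhost.
setting_cmd() {
  name=$1
  value=$2
  printf "curl -X PUT -d '%s' http://localhost:8080/api/admin/settings/%s\n" "$value" "$name"
}

# Print (do not execute) the commands for two of the settings above.
setting_cmd :SendNotificationOnDatasetCreation true
# The limit is written in bytes here; that unit is an assumption.
setting_cmd :DataFileChecksumValidationSizeLimit 2000000000
```

Printing the command first makes it easy to review the value before it is applied to a production installation.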

Notes for Tool Developers and Integrators

Bags Now Support File Paths

The original Bag generation code stored all dataset files directly under the /data directory. With the addition in Dataverse of a directory path for files and then a change to allow files with different paths to have the same name, archival Bags will now use the directory path from Dataverse to avoid name collisions within the /data directory. Prior to this update, Bags from Datasets with multiple files with the same name would have been created with only one of the files with that name (with warnings in the log, but still generating the Bag).

Complete List of Changes

For the complete list of code changes in this release, see the 5.7 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.7.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.7.war

5. Restart payara

service payara stop
service payara start
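
For step 1 above, the name passed to undeploy has to match what list-applications reports. A minimal sketch of extracting that name is shown below; the helper name is hypothetical, and the assumed output format (one line per application, starting with the application name, such as "dataverse-5.6  <ejb, web>") should be confirmed on your system.

```shell
# Hypothetical helper: read `asadmin list-applications` output on stdin and
# print the name of the first deployed application that starts with "dataverse".
# Prints nothing if no such application is listed.
deployed_dataverse() {
  awk '$1 ~ /^dataverse/ { print $1; exit }'
}

# Usage (not run here):
# app=$("$PAYARA/bin/asadmin" list-applications | deployed_dataverse)
# [ -n "$app" ] && "$PAYARA/bin/asadmin" undeploy "$app"
```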

Additional Release Steps

1. Replace Solr schema.xml to allow Curation Labels to be used. See specific instructions below for those installations with custom metadata blocks (1a) and those without (1b).

1a.

For installations with Custom Metadata Blocks:

-stop solr instance (usually service solr stop, depending on solr installation/OS, see the Installation Guide)

-add the following line to your schema.xml:

<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>

-restart solr instance (usually service solr start, depending on solr/OS)
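
The manual edit in 1a can be sketched as a small idempotent helper. The function name is hypothetical, and inserting the field just before the closing </schema> tag is an assumption about acceptable placement; if your schema.xml groups its field definitions elsewhere, add the line by hand next to them instead.

```shell
# Hypothetical helper: add the externalStatus field to a schema.xml file
# unless it is already declared. A copy of the original is kept as
# schema.xml.bak before the file is rewritten.
add_external_status() {
  schema=$1
  field='<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>'
  # Nothing to do if the field is already present.
  if grep -q 'name="externalStatus"' "$schema"; then
    return 0
  fi
  cp "$schema" "$schema.bak" || return 1
  # Print the new field line immediately before the closing </schema> tag.
  awk -v f="$field" '/<\/schema>/ { print "    " f } { print }' "$schema.bak" > "$schema"
}

# Usage (not run here), with Solr stopped:
# add_external_status /usr/local/solr/solr-8.8.1/server/solr/collection1/conf/schema.xml
```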

1b.

For installations without Custom Metadata Blocks:

-stop solr instance (usually service solr stop, depending on solr installation/OS, see the Installation Guide)

-replace schema.xml

cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf

-start solr instance (usually service solr start, depending on solr/OS)

- Java
Published by kcondon over 4 years ago

dataverse - v5.6

Dataverse Software 5.6

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Anonymized Access in Support of Double Blind Review

Dataverse installations can select whether or not to allow users to create anonymized private URLs and can control which specific identifying fields are anonymized. If this is enabled, users can create private URLs that do not expose identifying information about dataset depositors, allowing for double blind reviews of datasets in the Dataverse installation.

Guestbook Responses API

A new API to retrieve Guestbook responses has been added. This makes it easier to retrieve the records for large guestbooks and also makes it easier to integrate with external systems.

Dataset Semantic API (Experimental)

Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata (#5899).

This development was supported by the Research Data Alliance, DANS, and Sciences PO and follows the recommendations from the Research Data Repository Interoperability Working Group.

Dataset Migration API (Experimental)

Datasets can now be imported following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier migration from one Dataverse installation to another, and migration from other systems. This experimental, superuser only, endpoint also allows keeping the existing persistent identifier (where the authority and shoulder match those for which the software is configured) and publication dates.

This development was supported by DANS and the Research Data Alliance and follows the recommendations from the Research Data Repository Interoperability Working Group.

Direct Upload API Now Available for Adding Multiple Files' Metadata to the Dataset

Using the Direct Upload API, users can now add metadata of multiple files to the dataset after the files exist in the S3 bucket. This makes direct uploads more efficient and reduces server load by only updating the dataset once instead of once per file. For more information, see the Direct DataFile Upload/Replace API section of the Dataverse Software Guides.

Major Use Cases

Newly-supported major use cases in this release include:

Users can create Private URLs that anonymize dataset metadata, allowing for double blind peer review. (Issue #1724, PR #7908)
Users can download Guestbook records using a new API. (Issue #7767, PR #7931)
Users can update terms metadata using the new semantic API. (Issue #5899, PR #7414)
Users can retrieve, set, and update metadata using a new, flatter JSON-LD format. (Issue #6497, PR #7414)
Administrators can use the Traces API to retrieve information about specific types of user activity (Issue #7952, PR #7953)

Notes for Dataverse Installation Administrators

New Database Constraint

A new DB Constraint has been added in this release. Full instructions on how to identify whether or not your database needs any cleanup before the upgrade can be found in the Dataverse software GitHub repository. This information was also emailed out to Dataverse installation contacts.

Payara 5.2021.5 (or Higher) Required

Some changes in this release require an upgrade to Payara 5.2021.5 or higher. (See the upgrade section).

Instructions on how to update can be found in the Payara documentation. We've included the necessary steps below, but we recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting.

Installations upgrading from a previous Payara version shouldn't encounter a logging configuration bug in Payara-5.2021.5, but if your server.log fills with repeated notes about logging configuration and WELD complaints about loading beans, see the paragraph on logging.properties in the Installation Guide.

Enhancement to DDI Metadata Exports

To increase support for internationalization and to improve compliance with CESSDA requirements, DDI exports now have a holdings element with a URI attribute whose value is the URL form of the dataset PID.

New JVM Options and DB Settings

:AnonymizedFieldTypeNames can be used to enable creation of anonymized Private URLs and to specify which fields will be anonymized.

Notes for Tool Developers and Integrators

Semantic API

The new Semantic API is especially helpful in data migrations and getting metadata into a Dataverse installation. Learn more in the Developers Guide.

Complete List of Changes

For the complete list of code changes in this release, see the 5.6 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.6.

The steps below include a required upgrade to Payara 5.2021.5 or higher. (It is a simple matter of reusing your existing domain directory with the new distribution). But we also recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara documentation

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Move the current Payara directory out of the way

mv $PAYARA $PAYARA.MOVED

4. Download the new Payara version (5.2021.5+), and unzip it in its place

5. Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1

6. Start Payara

service payara start

7. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.6.war

8. Restart payara

service payara stop
service payara start

9. Run ReExportAll to update JSON Exports: http://guides.dataverse.org/en/5.6/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
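
Steps 3 to 5 above give no command for restoring the preserved domain. A minimal sketch, assuming the new distribution has already been unzipped to the original location, is shown below. The function name and the backup location for the stock domain1 are hypothetical; the stock copy is moved out of the domains directory so that only one domain directory remains there.

```shell
# Hypothetical helper for steps 3-5: given the moved-aside old Payara root
# and the freshly unzipped new root, set the new stock domain1 aside and
# move the preserved domain1 into its place.
restore_domain() {
  old_root=$1   # e.g. /usr/local/payara5.MOVED
  new_root=$2   # e.g. /usr/local/payara5
  target=$new_root/glassfish/domains/domain1
  [ -d "$old_root/glassfish/domains/domain1" ] || return 1
  [ -d "$new_root/glassfish/domains" ] || return 1
  # Keep the stock domain as a backup outside the domains directory.
  if [ -d "$target" ]; then
    mv "$target" "$new_root/glassfish/domain1.STOCK" || return 1
  fi
  mv "$old_root/glassfish/domains/domain1" "$target"
}

# Usage (not run here), with Payara stopped:
# restore_domain "$PAYARA.MOVED" "$PAYARA"
```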

Additional Release Steps

If your installation relies on the database-side stored procedure for generating sequential numeric identifiers:

Note that you can skip this step if your installation uses the default-style, randomly-generated six alphanumeric character-long identifiers for your datasets! This is the case with most Dataverse installations.

The underlying database framework has been modified in this release, to make it easier for installations to create custom procedures for generating identifier strings that suit their needs. Your current configuration will be automatically updated by the database upgrade (Flyway) script incorporated in the release. No manual configuration changes should be necessary. However, after the upgrade, we recommend that you confirm that your installation can still create new datasets, and that they are still assigned sequential numeric identifiers. In the unlikely chance that this is no longer working, please re-create the stored procedure following the steps described in the documentation for the :IdentifierGenerationStyle setting in the Configuration section of the Installation Guide for this release (v5.6).

(Running the script supplied there will NOT overwrite the position on the sequence you are currently using!)

- Java
Published by kcondon almost 5 years ago

dataverse - v5.5

Dataverse Software 5.5

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Note: this release has a change to the default value for the :ZipDownloadLimit setting, from 100 MB to 0 bytes. If you have not previously adjusted this setting from the default, your Dataverse installation will no longer generate zip files once v5.5 is installed, as the setting will now be 0 bytes. This behavior will be revisited in a later release.

Release Highlights

Auxiliary Files Accessible Through the UI

Auxiliary Files can now be downloaded from the web interface. Auxiliary files uploaded as type=DP appear under "Differentially Private Statistics" under file level download. The rest appear under "Other Auxiliary Files".

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.

Improved Workflow for Downloading Large Zip Files

Users trying to download a zip file larger than the Dataverse installation's :ZipDownloadLimit will now receive messaging that the zip file is too large, and the user will be presented with alternate access options. Previously, the zip file would download and files above the :ZipDownloadLimit would be excluded and noted in a MANIFEST.TXT file.

Guidelines on Depositing Code

The Software Metadata Working Group has created guidelines on depositing research code in a Dataverse installation. Learn more in the Dataset Management section of the Dataverse Guides.

New Metrics API

Users can retrieve new types of metrics and per-collection metrics. The new capabilities are described in the guides. A new version of the Dataverse Metrics web app adds interactive graphs to display these metrics. Anyone running the existing Dataverse Metrics app will need to upgrade or apply a small patch to continue retrieving metrics from Dataverse instances upgrading to this release.

Major Use Cases

Newly-supported major use cases in this release include:

Users can now select and download auxiliary files through the UI. (Issue #7400, PR #7729)
Users attempting to download zip files above the installation's size limit will receive better messaging and be directed to other download options. (Issue #7714, PR #7806)
Superusers can now sort users on the Dashboard. (Issue #7814, PR #7815)
Users can now access expanded and new metrics through a new API (Issue #7177, PR #7178)
Dataverse collection administrators can now add a search facet on their collection pages for the Geospatial metadatablock's "Other" field, so that others can narrow searches in their collections using the values entered in that "Other" field (Issue #7399, PR #7813)
Depositors can now receive guidance about depositing code into a Dataverse installation (PR #7717)

Notes for Dataverse Installation Administrators

Simple Search Fix for Solr Configuration

The introduction in v4.17 of a schema_dv_mdb_copies.xml file as part of the Solr configuration accidentally removed the contents of most metadata fields from the index used for simple searches in Dataverse (i.e. when one types a word without indicating which field to search in the normal search box). This was somewhat ameliorated/hidden by the fact that many common fields such as description were still included by other means.

This release removes the schema_dv_mdb_copies.xml file and includes the updates needed in the schema.xml file. Installations with no custom metadata blocks can simply replace their current schema.xml file for Solr, restart Solr, and run a 'Reindex in Place' as described in the guides.

Installations using custom metadata blocks should manually copy the contents of their schema_dv_mdb_copies.xml file (excluding the enclosing <schema> element and only including the <copyField> elements) into their schema.xml file, replacing the section between

and

.

In existing schema.xml files, this section currently includes only one line:

<xi:include href="schema_dv_mdb_copies.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />.

In this release, that line has already been replaced with the default set of <copyFields>. It doesn't matter whether schema_dv_mdb_copies.xml was originally created manually or via the recommended updateSchemaMDB.sh script, and this fix will work with all prior versions of Dataverse from v4.17 on. If you make further changes to metadata blocks in your installation, you can repeat this process (i.e. run updateSchemaMDB.sh, copy the entries in schema_dv_mdb_copies.xml into the same section of schema.xml, restart solr, and reindex).

Once schema.xml is updated, Solr should be restarted and a 'Reindex in Place' will be required. (Future Dataverse Software versions will avoid this manual copy step.)

Geospatial Metadata Block Updated

The Geospatial metadata block (geospatial.tsv) was updated. Dataverse collection administrators can now add a search facet on their collection pages for the metadata block's "Other" field, so that people searching in their collections can narrow searches using the values entered in that field.

Extended support for S3 Download Redirects ("Direct Downloads")

If your installation uses S3 for storage and you have "direct downloads" enabled, please note that it will now cover the following download types that were not handled by redirects in the earlier versions: saved originals of tabular data files, cached RData frames, resized thumbnails for image files and other auxiliary files. In other words, all the forms of the file download API that take extra arguments, such as "format" or "imageThumb" - for example:

/api/access/datafile/12345?format=original

/api/access/datafile/:persistentId?persistentId=doi:1234/ABCDE/FGHIJ&imageThumb=true

etc., that were previously excluded.

Since browsers follow redirects automatically, this change should not in any way affect the web GUI users. However, some API users may experience problems, if they use it in a way that does not expect to receive a redirect response. For example, if a user has a script where they expect to download a saved original of an ingested tabular file with the following command:

curl "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta

it will fail to save the file when it receives a 303 (redirect) response instead of 200. So they will need to add "-L" to the command line above, to instruct curl to follow redirects:

curl -L "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta

Most of your API users have likely figured it out already, since you enabled S3 redirects for "straightforward" downloads in your installation. But we feel it was worth a heads up, just in case.
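
To find out whether a given script is affected, one option is to look at the status code a download URL returns before redirects are followed. The sketch below uses hypothetical helper names; only curl options that exist in standard curl (-sS, -o, -w with %{http_code}) are used.

```shell
# Succeed if the argument is a 3xx (redirect) HTTP status code.
is_redirect() {
  case $1 in
    3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical probe: print the status code for a URL without following
# redirects and without saving the response body.
probe_status() {
  curl -sS -o /dev/null -w '%{http_code}\n' "$1"
}

# Usage (not run here):
# code=$(probe_status "https://yourhost.edu/api/access/datafile/12345?format=original")
# is_redirect "$code" && echo "add -L so curl follows the redirect"
```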

Authenticated User Deactivated Field Updated

The "deactivated" field on the Authenticated User table has been updated to be a non-nullable field. When the field was added in version 5.3, it was set to 'false' in an update script. If for whatever reason that update failed in the 5.3 deploy, you will need to re-run it before deploying 5.5. The update query you may need to run is: UPDATE authenticateduser SET deactivated = false WHERE deactivated IS NULL;

Notes for Tool Developers and Integrators

S3 Download Redirects

See above note about download redirects. If your application integrates with the Dataverse software using the APIs, you may need to change how redirects are handled in your tool or integration.

Complete List of Changes

For the complete list of code changes in this release, see the 5.5 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.5.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.5.war

5. Restart payara

service payara stop
service payara start

Additional Release Steps

1. Follow the steps to update your Solr configuration, found in the "Notes for Dataverse Installation Administrators" section above. Note that there are different instructions for Dataverse installations running with custom metadata blocks and those without.

2. Update Geospatial Metadata Block (if used)

wget https://github.com/IQSS/dataverse/releases/download/v5.5/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"

- Java
Published by kcondon about 5 years ago

dataverse - v5.4.1

Dataverse Software 5.4.1

This release provides a fix for a regression introduced in 5.4 and implements a few other small changes. Please use 5.4.1 for production deployments instead of 5.4.

Release Highlights

API Backwards Compatibility Maintained

The syntax in the example in the Basic File Access section of the Dataverse Software Guides will continue to work.

Direct Upload API Now Available for Replacing Files

Users can now replace files using the direct upload API. For more information, see the Direct DataFile Upload/Replace API section of the Dataverse Software Guides.

Complete List of Changes

For the complete list of code changes in this release, see the 5.4.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

1. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

2. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

3. Start Payara

service payara start

4. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.4.1.war

5. Restart payara

service payara stop
service payara start

- Java
Published by kcondon about 5 years ago

dataverse - v5.4

Dataverse Software 5.4

This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. Please note that there is an API backwards compatibility issue in 5.4, and we recommend using 5.4.1 for any production environments.

Release Highlights

Deactivate Users API, Get User Traces API, Revoke Roles API

A new API has been added to deactivate users to prevent them from logging in, receiving communications, or otherwise being active in the system. Deactivating a user is an alternative to deleting a user, especially when the latter is not possible due to the amount of interaction the user has had with the Dataverse installation. In order to learn more about a user before deleting, deactivating, or merging, a new "get user traces" API is available that will show objects created, roles, group memberships, and more. Finally, the "remove all roles" button available in the superuser dashboard is now also available via API.

New File Access API

A new API offers a crawlable view of the folders and files within a dataset:

/api/datasets/<dataset id>/dirindex/

will output a simple HTML listing, based on the standard Apache directory index, with Access API download links for individual files and recursive calls to the API above for sub-folders. Please see the Native API Guide for more information.

Using this API, wget --recursive (or a similar crawling client) can be used to download all the files in a dataset, preserving the file names and folder structure, without having to use the download-as-zip API. In addition to being faster (zipping is a relatively resource-intensive operation on the server side), this process can be restarted if interrupted (with wget --continue or equivalent), unlike zipped multi-file downloads, which always have to start from the beginning.

On a system that uses S3 with download redirects, the individual file downloads will be handled by S3 directly (with the exception of tabular files), without having to be proxied through the Dataverse application.
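
The crawl described above can be sketched with wget roughly as follows. This is a minimal sketch under stated assumptions, not an official recipe: the server URL and dataset ID are placeholders, the helper names dirindex_url and crawl_dataset are hypothetical, and the --cut-dirs depth is a guess based on the /api/datasets/<dataset id>/dirindex/ path and may need adjusting.

```shell
#!/bin/sh
# Placeholders for illustration only; replace with your own installation and dataset.
SERVER_URL="${SERVER_URL:-https://dataverse.example.edu}"
DATASET_ID="${DATASET_ID:-24}"

# Build the directory-index URL for a dataset (hypothetical helper).
dirindex_url() {
  printf '%s/api/datasets/%s/dirindex/\n' "$1" "$2"
}

# Crawl the HTML listing and follow its download links (hypothetical helper).
#   --recursive             follow links found in the listing
#   --execute robots=off    do not let robots.txt stop the crawl
#   --no-host-directories   do not create a top-level directory named after the host
#   --cut-dirs=3            drop leading path components (assumed depth)
#   --content-disposition   name saved files from the server's Content-Disposition header
crawl_dataset() {
  wget --recursive --execute robots=off \
       --no-host-directories --cut-dirs=3 \
       --content-disposition \
       "$(dirindex_url "$1" "$2")"
}

# Usage (performs network requests, so it is left commented out):
#   crawl_dataset "$SERVER_URL" "$DATASET_ID"
```

The functions are only defined here, so nothing is downloaded until crawl_dataset is called explicitly.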

Restricted Files and DDI "dataDscr" Information (Summary Statistics, Variable Names, Variable Labels)

In previous releases, DDI "dataDscr" information (summary statistics, variable names, and variable labels, sometimes known as "variable metadata") for successfully ingested tabular files was available even if the files were restricted. This has been changed in the following ways:

At the dataset level, DDI exports no longer show "dataDscr" information for restricted files. There is only one version of this export and it is the version that's suitable for public consumption with the "dataDscr" information hidden for restricted files.
Similarly, at the dataset level, the DDI HTML Codebook no longer shows "dataDscr" information for restricted files.
At the file level, "dataDscr" information is no longer publicly available for restricted files. In practice, it was only possible to get this publicly via API (the download/access button was hidden).
At the file level, "dataDscr" (variable metadata) information can still be downloaded for restricted files if you have access to download the file.

Search with Accented Characters

Many languages include characters that have close analogs in ASCII, e.g. (á, à, â, ç, é, è, ê, ë, í, ó, ö, ú, ù, û, ü…). This release changes the default Solr configuration to allow search to match words based on these associations, e.g. a search for Mercè would match the word Merce in a Dataset, and vice versa. This should generally be helpful, but can result in false positives, e.g. "canon" will be found when searching for "cañon".

Java 11, PostgreSQL 13, and Solr 8 Support/Upgrades

Several of the core components of the Dataverse Software have been upgraded. Specifically:

The Dataverse Software now runs on and requires Java 11. This provides performance and security enhancements, allows developers to take advantage of new and updated Java features, and moves the project to a platform with better long-term support. This upgrade requires a few extra steps in the release process, outlined below.
The Dataverse Software has now been tested with PostgreSQL versions up to 13. Versions 9.6+ will still work, but this update is necessary to support the software beyond PostgreSQL EOL later in 2021.
The Dataverse Software now runs on Solr 8.8.1, the latest available stable release in the Solr 8.x series.

Saved Search Performance Improvements

A refactoring has greatly improved Saved Search performance in the application. If your installation has multiple, potentially long-running Saved Searches in place, this greatly improves the probability that those search jobs will complete without timing out.

Worldmap/Geoconnect Integration Now Obsolete

As of this release, the Geoconnect/Worldmap integration is no longer available. The Harvard University Worldmap is going through a migration process, and instead of updating this code to work with the new infrastructure, the decision was made to pursue future Geospatial exploration/analysis through other tools, following the External Tools Framework in the Dataverse Software.

Guides Updates

The Dataverse Software Guides have been updated to follow recent changes to how different terms are used across the Dataverse Project. For more information, see Mercè's note to the community:

https://groups.google.com/g/dataverse-community/c/pD-aFrpXMPo

Conditionally Required Metadata Fields

Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could be either optional or required, i.e. if required you must always have (at least one) value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author Name.

In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.

Major Use Cases

Newly-supported major use cases in this release include:

Dataverse Installation Administrators can now deactivate users using a new API. (Issue #2419, PR #7629)
Superusers can remove all of a user's assigned roles using a new API. (Issue #2419, PR #7629)
Superusers can use an API to gather more information about actions a user has taken in the system in order to make an informed decision about whether or not to deactivate or delete a user. (Issue #2419, PR #7629)
Superusers will now be able to harvest from installations using ISO-639-3 language codes. (Issue #7638, PR #7690)
Users interacting with the workflow system will receive status messages (Issue #7564, PR #7635)
Users interacting with prepublication workflows will see speed improvements (Issue #7681, PR #7682)
API Users will receive Dataverse collection API responses in a deterministic order. (Issue #7634, PR #7708)
API Users will be able to access a list of crawlable URLs for file download, allowing for faster and easily resumable transfers. (Issue #7084, PR #7579)
Users will no longer be able to access summary stats for restricted files. (Issue #7619, PR #7642)
Users will now see truncated versions of long strings (primarily checksums) throughout the application (Issue #6685, PR #7312)
Users will now be able to easily copy checksums, API tokens, and private URLs with a single click (Issue #6039, Issue #6685, PR #7539, PR #7312)
Users uploading data through the Direct Upload API will now be able to use additional checksums (Issue #7600, PR #7602)
Users searching for content will now be able to search using non-ASCII characters. (Issue #820, PR #7378)
Users can now replace files in draft datasets, a functionality previously only available on published datasets. (Issue #7149, PR #7337)
Dataverse Installation Administrators can now set subfields of compound fields as conditionally required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name. (Issue #7606, PR #7608)

Notes for Dataverse Installation Administrators

Java 11 Upgrade

There are some things to note and keep in mind regarding the move to Java 11:

You should install the JDK/JRE following your usual methods, depending on your operating system. An example of this on a RHEL/CentOS 7 or RHEL/CentOS 8 system is:

$ sudo yum remove java-1.8.0-openjdk java-1.8.0-openjdk-devel java-1.8.0-openjdk-headless

$ sudo yum install java-11-openjdk-devel

The remove command may provide an error message if -headless isn't installed.
We targeted and tested Java 11, but 11+ will likely work. Java 11 was targeted because of its long term support.
If you're moving from a Dataverse installation that was previously running Glassfish 4.x (typically this would be Dataverse Software 4.x), you will need to adjust some JVM options in domain.xml as part of the upgrade process. We've provided these optional steps below. These steps are not required if your first installed Dataverse Software version was running Payara 5.x (typically Dataverse Software 5.x).

PostgreSQL Versions Up To 13 Supported

Up until this release our installation guide "strongly recommended" installing PostgreSQL 9.6. While that version is known to be very stable, it is nearing its end-of-life (in Nov. 2021). The Dataverse Software has now been tested with versions up to 13. If you decide to upgrade PostgreSQL, the tested and recommended way of doing that is as follows:

Export your current database with pg_dumpall;
Install the new version of PostgreSQL; (make sure it's running on the same port, etc. so that no changes are needed in the Payara configuration)
Re-import the database with psql, as the postgres user.

Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.
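
The dump-and-restore steps above can be sketched as follows. This is a minimal sketch, assuming the commands are run as the postgres user and that pg_dumpall and psql are on the PATH. The helper names (run, export_all, import_all), the DRY_RUN switch, and the dump file path are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: dump all databases and roles from the old server into one SQL file.
export_all() {
  run pg_dumpall -f "$1"
}

# Step 3: replay the dump into the newly installed server.
import_all() {
  run psql -d postgres -f "$1"
}

# Usage, as the postgres user (the dump path is a placeholder):
#   export_all /tmp/dataverse_all.sql   # while the old version is still running
#   ... install and start the new PostgreSQL version on the same port ...
#   import_all /tmp/dataverse_all.sql
```

Setting DRY_RUN=1 first prints each command instead of running it, which makes it easy to review the sequence before touching a production database.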

Solr Upgrade

With this release we upgrade to the latest available stable release in the Solr 8.x branch. We recommend a fresh installation of Solr 8.8.1 (the index will be empty) followed by an "index all".

Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.

See http://guides.dataverse.org/en/5.4/installation/prerequisites.html#installing-solr for more information.

Managing Conditionally Required Metadata Fields

Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could be either optional or required, i.e. if required you must always have (at least one) value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author Name.

In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.

This change required some modifications to how "required" is defined in the metadata .tsv files (for compound fields).

Prior to this release, the value of required for the parent compound field did not matter and so was set to false.

Going forward:

For optional, the parent compound field would be required = false and all children would be required = false.
For required, the parent compound field would be required = true and at least one child would be required = true.
For conditionally required, the parent compound field would be required = false and at least one child would be required = true.
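
The three cases above can be summarized as a small decision table. The following sketch is illustrative only: classify_compound is a hypothetical helper, not part of the Dataverse Software, and it takes the parent's required value and whether at least one child is required as plain true/false arguments.

```shell
#!/bin/sh
# classify_compound PARENT_REQUIRED ANY_CHILD_REQUIRED
# Maps the two flags onto the cases listed in the release notes.
classify_compound() {
  # Accept TRUE/true/False etc. by lowercasing both arguments.
  parent=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  child=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$parent/$child" in
    false/false) echo "optional" ;;
    true/true)   echo "required" ;;
    false/true)  echo "conditionally required" ;;
    # parent required with no required child is not one of the listed cases
    *)           echo "not covered by the listed cases" ;;
  esac
}

# Example: Producer (parent, required = false) with Producer Name (child, required = true)
classify_compound FALSE TRUE
```

The only difference between "required" and "conditionally required" is the parent's value, which is why the parent's setting now matters in the .tsv files.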

This release updates the citation .tsv file that is distributed with the software for the required parent compound fields (e.g. author), as well as sets Producer Name to be conditionally required. No other distributed .tsv files were updated, as they did not have any required compound values.

If you have created any custom metadata .tsv files, you will need to make the same (type of) changes there.

Citation Metadata Block Update

Due to the changes for Conditionally Required Metadata Fields, and a minor update in the citation metadata block to support extra ISO-639-3 language codes, a block upgrade is required. Instructions are provided below.

Retroactively Store Original File Size

Beginning in Dataverse Software 4.10, the size of the saved original file (for an ingested tabular datafile) was stored in the database. For files added before this change, we provide an API that retrieves and permanently stores the sizes for any already existing saved originals. See Datafile Integrity API for more information.

This was documented as a step in previous release notes, but we are noting it in these release notes to give it more visibility.

DB Cleanup for Saved Searches

A previous version of the Dataverse Software changed the indexing logic so that when a user links a Dataverse collection, its children are also indexed as linked. This means that the children do not need to be separately linked, and in this version we removed the logic that creates a saved search to create those links when a Dataverse collection is linked.

We recommend cleaning up the database to a) remove these saved searches and b) remove the links for the objects. This can be done with a few queries, which are available in the folder here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7398/

There are four sets of queries available, and they should be run in this order:

ssfordeletion.txt to identify the Saved Searches to be deleted
delete_ss.txt to delete the Saved Searches identified in the previous query
dldfordeletion.txt to identify the linked datasets and Dataverse collections to be deleted
delete_dld.txt to delete the linked datasets and Dataverse collections identified in the previous queries

Note: removing these saved searches and links should not affect what users will see as linked, due to the aforementioned indexing change. Similarly, leaving them in place should not affect anything; removing them simply cleans up unnecessary rows in the database.

DB Cleanup for Superusers Releasing without Version Updates

In datasets where a superuser has run the Curate command and the update included a change to the fileaccessrequest flag, those changes would not be reflected appropriately in the published version. This should be a rare occurrence.

Instead of an automated solution, we recommend inspecting the affected datasets and correcting the fileaccessrequest flag as appropriate. You can identify the affected datasets via a query, which is available in the folder here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7687/

New JVM Options and Database Settings

For installations that were previously running on Dataverse Software 4.x, a number of new JVM options need to be added as part of the upgrade. The JVM Options are enumerated in the detailed upgrade instructions below.

Two new Database settings were added:

:InstallationName
:ExportInstallationAsDistributorOnlyWhenNotSet

For an overview of these new options, please see the Installation Guide.

Notes for Tool Developers and Integrators

UTF-8 Characters and Spaces in File Names

UTF-8 characters in filenames are now preserved when downloaded.

Dataverse installations will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.4+.

Note that this follows a change from 5.1 that only corrected this for installations running with S3 storage. This makes the behavior consistent across installations running all types of file storage.

Complete List of Changes

For the complete list of code changes in this release, see the 5.4 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.

1. Upgrade to Java 11.

2. Upgrade to Solr 8.8.1.

If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.

In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.

export PAYARA=/usr/local/payara5

(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)

3. Undeploy the previous version.

$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>

4. Stop Payara and remove the generated directory

service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated

5. (only required for installations previously running Dataverse Software 4.x!) In other words, if you have a domain.xml that originated under Glassfish 4, the below JVM Options need to be added. If your Dataverse installation was first installed on the 5.x series, these JVM options should already be present.

In domain.xml:

Remove the following JVM options from the <config name="server-config"><java-config> section:

<jvm-options>-Djava.endorsed.dirs=/usr/local/payara5/glassfish/modules/endorsed:/usr/local/payara5/glassfish/lib/endorsed</jvm-options>

<jvm-options>-Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext</jvm-options>

Add the following JVM options to the <config name="server-config"><java-config> section:

<jvm-options>[9|]--add-opens=java.base/jdk.internal.loader=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/java.lang=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/java.net=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/java.nio=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/java.util=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/sun.nio.ch=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.management/sun.management=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/sun.net.www.protocol.jrt=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.naming/javax.naming.spi=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED</jvm-options>

<jvm-options>[9|]--add-opens=java.logging/java.util.logging=ALL-UNNAMED</jvm-options>

6. Start Payara

service payara start

7. Deploy this version.

$PAYARA/bin/asadmin deploy dataverse-5.4.war

8. Restart Payara

service payara stop
service payara start

9. Reload Citation Metadata Block:

wget https://github.com/IQSS/dataverse/releases/download/v5.4/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

Additional Release Steps

1. Confirm that the schema.xml was updated with the new v5.4 version when you updated Solr.

2. Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration.

For example: (modify the path names as needed)

cd /usr/local/solr-8.8.1/server/solr/collection1/conf
wget https://github.com/IQSS/dataverse/releases/download/v5.4/updateSchemaMDB.sh
chmod +x updateSchemaMDB.sh
./updateSchemaMDB.sh -t .

See https://guides.dataverse.org/en/5.4/admin/metadatacustomization.html#updating-the-solr-schema for more information.

3. Do a clean reindex by first clearing then indexing. Re-indexing is required to get full functionality from this change. Please refer to the guides on how to clear and index if needed.
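
A clear-then-index sequence might look like the sketch below. The two endpoint paths are assumptions recalled from the Admin Guide's Solr index section and should be checked against the guides for your version. The server address follows the localhost:8080 convention used elsewhere in these notes, and the helper names and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Address of the local Dataverse installation (assumed default).
SERVER="${SERVER:-http://localhost:8080}"

# Print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clear the Solr index (endpoint path is an assumption; verify in the guides).
clear_index() {
  run curl "$SERVER/api/admin/index/clear"
}

# Start an asynchronous "index all" (endpoint path is an assumption; verify in the guides).
start_reindex() {
  run curl "$SERVER/api/admin/index"
}

# Usage:
#   clear_index
#   start_reindex
```

Until indexing completes, search results will be partial, so this is best run during a quiet period.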

4. Upgrade Postgres.

Export your current database with pg_dumpall;
Install the new version of PostgreSQL; (make sure it's running on the same port, etc. so that no changes are needed in the Payara configuration)
Re-import the database with psql, as the postgres user.

Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.

5. Retroactively store original file size

Use the Datafile Integrity API to ensure that the sizes of all original files are stored in the database.

6. DB Cleanup for Superusers Releasing without Version Updates

In datasets where a superuser has run the Curate command and the update included a change to the fileaccessrequest flag, those changes would not be reflected appropriately in the published version. This should be a rare occurrence.

Instead of an automated solution, we recommend inspecting the affected datasets and correcting the fileaccessrequest flag as appropriate. You can identify the affected datasets via a query, which is available in the folder here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7687/

7. (Optional, but recommended) DB Cleanup for Saved Searches and Linked Objects

Perform the DB Cleanup for Saved Searches and Linked Objects, summarized in the "Notes for Dataverse Installation Administrators" section above.

8. Take a backup of the Worldmap links, if any.

9. (Only required if custom metadata blocks are used in your Dataverse installation) Update any custom metadata blocks:

In the .tsv for any custom metadata blocks, for any subfield that has a required value of TRUE, find the corresponding parent field and change its required value to TRUE.

Note: As there is an accompanying Flyway script that updates the values directly in the database, you do not need to reload these metadata .tsv files via API unless you make additional changes, e.g. setting some compound fields to be conditionally required.
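
To find which parent fields need attention in a custom block, something like the following sketch may help. It assumes the .tsv has a #datasetField header row whose columns are literally named "required" and "parent", and that data rows start with an empty first column. The function name parents_needing_required is hypothetical. It only lists parents and edits nothing.

```shell
#!/bin/sh
# Print each parent field that has at least one child with required = TRUE.
parents_needing_required() {
  awk -F '\t' '
    /^#datasetField/ {             # header row: remember each column position by name
      for (i = 1; i <= NF; i++) col[$i] = i
      in_fields = 1
      next
    }
    /^#/ { in_fields = 0; next }   # any other section header ends the field rows
    in_fields && col["required"] && col["parent"] &&
    toupper($col["required"]) == "TRUE" && $col["parent"] != "" {
      print $col["parent"]
    }
  ' "$1" | sort -u
}

# Usage (the file name is a placeholder):
#   parents_needing_required custom_block.tsv
```

Each name printed is a parent compound field whose own required value should be reviewed according to the step above.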

Published by kcondon about 5 years ago

dataverse - v5.3

Dataverse 5.3

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Auxiliary Files (Experimental)

Auxiliary files can now be added to datafiles and accessed using new experimental API endpoints. These endpoints allow additional metadata that was not generated by Dataverse to be added alongside datafiles in Dataverse.

The support for auxiliary files in Dataverse is being driven by integration with the Open Differential Privacy (DP) Project and is designed to support the deposit and retrieval of differentially private metadata, but the endpoints are not specific to differential privacy use cases.

Additional Banner Functionality

Banners in Dataverse can now be set to allow dismissal by a logged-in user. Previously, banners would persist until they were removed by an administrator. This allows administrators to more easily communicate one-time messages to users.

File Tags Searchable from Advanced Search and Dataset Search

File tags ("Documentation", "Data", "Code", etc.) now appear on the Advanced Search page.

Performing a search for files on the dataset page now includes file tags. Previously, only file name and file description were searched.

Easier Configuration of Database Connections

Previously, the configuration of the database connections has been quite static and not very easy to update. This has been an issue especially for cloud and container usage. Using new technologies provided by the move to Payara, you can now more easily configure the connection to your PostgreSQL DB.

Using the MicroProfile Config API (Issue #7000, Issue #7418), you can much more easily specify configuration details. For an overview of supported options, please see the Installation Guide.

Note that some settings have been moved from domain.xml to code, such as min and max pool size.

Major Use Cases

Newly-supported use cases in this release include:

Users can use an API to add auxiliary files to files in order to provide metadata representations for specific tools or integrations (Issue #7275, PR #7350)
Administrators can use a new API to manage banner messages and take advantage of new banner display options (Issue #7263, PR #7434)
Users replacing files will now have their files renamed when a file name conflict exists, making the behavior consistent with upload and edit (Issue #7335, PR #7336)
Users will now be able to search on file tags on the advanced search and dataset pages (Issue #7194, PR #7385)

Notes for Dataverse Installation Administrators

Payara 5.2020.6 (or Higher) Required

Some changes in this release require an upgrade to Payara 5.2020.6 or higher.

Instructions on how to update can be found in the Payara documentation.

New Banner API, Obsolete DB Settings

The functionality previously provided by the DB settings :StatusMessageHeader and :StatusMessageText is no longer supported and is now provided through the Manage Banner Messages API. Learn more in the API Guide.

New Database Settings and JVM Options

Several new JVM options have been added in this release:

dataverse.db.name
dataverse.db.user
dataverse.db.password
dataverse.db.host
dataverse.db.port

For an overview of these new options, please see the Installation Guide.

See above note about obsolete DB options.

Introducing MicroProfile Config API

With this Dataverse release, Dataverse Administrators can start to make use of the MicroProfile Config API.

This will benefit both developers and sysadmins, but the codebase will have to be refactored to make use of it. As this will take time, we will always provide a backward compatible way of using it.

For more details about these new options, please see the Consuming Configuration section of the Developer Guide.

Java Message Service Configuration

The Ingest process uses the Java Message Service (JMS) to create ingest tasks in a queue. That queue was previously configured from the command line or in domain.xml. It is now configured in code.

In the unlikely case that you want to change any of these settings, feel free to change and recompile, or raise an issue on GitHub. See IngestQueueProducer for more details.

If you want to clean up your existing installation, you can delete the old, unused queue like this:

<payara install path>/bin/asadmin delete-connector-connection-pool --cascade=true jms IngestQueueConnectionFactoryPool

Notes for Tool Developers and Integrators

Experimental Auxiliary File Support

Experimental endpoints have been added to allow auxiliary files to be added to datafiles. These auxiliary files can be deposited and accessed via API. Later releases will include options for accessing these files through the UI. For more information, see the Auxiliary File Support section of the Developer Guide.

Complete List of Changes

For the complete list of code changes in this release, see the 5.3 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Upgrade to Payara 5.2020.6 or higher.

Instructions on how to update can be found in the Payara documentation.

It would likely be safer to upgrade Payara first, while still running Dataverse 5.2, and then proceed with the steps below. Upgrading from an earlier version of Payara should be a straightforward process: undeploy Dataverse; stop Payara; move the current Payara directory out of the way; unzip the new Payara version in its place; replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1; start Payara; deploy Dataverse 5.2. We still recommend that you read the detailed upgrade instructions above. If you run into any issues with this upgrade, having done it separately will help you distinguish them from any problems with the upgrade of Dataverse proper. If you are still using a pre-5.0 version of Dataverse with Glassfish 4, please follow the upgrade instructions in the Dataverse 5.0 release notes, but use the latest version of Payara 5 (5.2020.7, as of this writing).

2. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

3. Update your database connection.

Please configure your connection details, replacing each ${DB_...} placeholder with the value for your installation.

<payara install path>/bin/asadmin create-system-properties "dataverse.db.user=${DB_USER}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.host=${DB_HOST}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.port=${DB_PORT}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.name=${DB_NAME}"
echo "AS_ADMIN_ALIASPASSWORD=${DB_PASS}" > /tmp/password.txt
<payara install path>/bin/asadmin create-password-alias --passwordfile /tmp/password.txt dataverse.db.password
rm /tmp/password.txt

4. In domain.xml, verify that the __TimerPool jdbc-connection-pool is using the H2 database, as follows (if you have the old Derby version from Glassfish 4, replace it):

<jdbc-connection-pool datasource-classname="org.h2.jdbcx.JdbcDataSource" name="__TimerPool" res-type="javax.sql.XADataSource">
  <property name="URL" value="jdbc:h2:${com.sun.aas.instanceRoot}/lib/databases/ejbtimer;AUTO_SERVER=TRUE"></property>
</jdbc-connection-pool>

5. Reset the EJB timer database back to default:

<payara install path>/bin/asadmin set configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/__TimerPool

6. Delete the old password alias and DB pool:

<payara install path>/bin/asadmin delete-jdbc-connection-pool --cascade=true dvnDbPool
<payara install path>/bin/asadmin delete-password-alias db_password_alias

7. Stop Payara, remove the generated and ejbtimer database directories, then restart.

service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
rm -rf <payara install path>/glassfish/domains/domain1/lib/databases/ejbtimer
service payara start

8. Deploy this version.

<payara install path>/bin/asadmin deploy dataverse-5.3.war

9. Restart Payara

service payara stop
service payara start

Published by kcondon over 5 years ago

dataverse - v5.2

Dataverse 5.2

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Preview When Guestbooks or Terms Exist

Previously, file preview was only available when files were publicly downloadable. Now if a guestbook or terms (or both) are configured for the dataset, they will be shown in the Preview tab and once they are agreed to, the file preview will appear (#6919).

Preview Only External Tools

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).

Dataset Page Edit Options Consolidation

As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved to a "kebab" menu to allow for better consistency and future scalability.

Google Cloud Archiver

Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.

Major Use Cases

Newly-supported use cases in this release include:

Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
Dataverse Administrators can set up a regular export to Google Cloud so that the installation's data is preserved (Issue #7140, PR #7292)
Dataverse Administrators can use a regex when defining a group (Issue #7344, PR #7351)
External Tool Developers can use a new API endpoint to retrieve a user's information (Issue #7307, PR #7345)

Notes for Dataverse Installation Administrators

Converting Explore External Tools to Preview Only

When the war file is deployed, a SQL migration script will convert dataverse-previewers to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.

If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.

New Database Settings and JVM Options

Installations integrating with Google Cloud Archiver will need to use two new database settings:

:GoogleCloudProject - the name of the project managing the bucket
:GoogleCloudBucket - the name of the bucket to use

For more information, see the Google Cloud Configuration section of the Installation Guide.

Automation of Make Data Count Scripts

Scripts have been added in order to automate Make Data Count processing. For more information, see the Make Data Count section of the Admin Guide.

Notes for Tool Developers and Integrators

Preview Only External Tools, "hasPreviewMode"

A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.

Multiple Types for External Tools

External tools now support multiple types. In practice, "explore" plus "preview" is the only combination that changes the UI compared with having just one of the two types (see "preview only" above). Multiple types are specified in the JSON manifest as an array in "types". The older, single "type" is still supported but should be considered deprecated.
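As an illustration, a manifest that declares both types might be written as in the sketch below. Only "type" and "types" come from these notes; the other field names follow the External Tools guide as best recalled, and the tool name, URL and content type are made-up placeholders. The registration command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: write a manifest that declares both types in a "types" array.
# Tool name, URL and content type are hypothetical placeholders.
MANIFEST="${MANIFEST:-${TMPDIR:-/tmp}/exampleTool.json}"

cat > "$MANIFEST" <<'EOF'
{
  "displayName": "Example Tool",
  "description": "Hypothetical tool used to illustrate multiple types.",
  "scope": "file",
  "types": ["explore", "preview"],
  "toolUrl": "https://example.org/tool",
  "contentType": "text/plain",
  "toolParameters": {
    "queryParameters": [
      { "fileid": "{fileId}" }
    ]
  }
}
EOF

# Registration goes through the admin API (printed here, not executed):
echo "curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools --upload-file $MANIFEST"
```

Changing the array to ["preview"] would give a preview-only tool.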

User Information Endpoint

A new API endpoint retrieves a user's information so that tools can email users if needed.

Complete List of Changes

For the complete list of code changes in this release, see the 5.2 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

2. Stop payara, remove the generated directory, then start payara.

service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start

3. Deploy this version.

<payara install path>/bin/asadmin deploy dataverse-5.2.war

4. Restart payara

service payara stop
service payara start
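The name passed to undeploy has to match what list-applications reports. A small sketch of extracting it is shown below; it assumes output lines of the form "dataverse-5.1.1  <ejb, web>" and runs on captured sample text rather than a live server. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: pick the deployed Dataverse application name out of
# "asadmin list-applications" output so it can be passed to "undeploy".
# Assumes output lines of the form "dataverse-5.1.1  <ejb, web>".
deployed_dataverse() {
    # Reads list-applications output on stdin, prints the first matching name.
    awk '$1 ~ /^dataverse/ { print $1; exit }'
}

# Example with captured sample output (not a live server):
sample='dataverse-5.1.1  <ejb, web>
Command list-applications executed successfully.'
app=$(printf '%s\n' "$sample" | deployed_dataverse)
echo "would run: asadmin undeploy $app"
```

On a real system the function would be fed from <payara install path>/bin/asadmin list-applications instead of the sample text.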

- Java
Published by kcondon over 5 years ago

dataverse - v5.1.1

Dataverse 5.1.1

This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.

Release Highlights

Connection Pool Size Configuration Option, Connection Optimizations

Dataverse 5.1 improved the efficiency of making S3 connections through use of an http connection pool. This release adds optimizations around closing streams and channels that may hold S3 http connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and makes the size configurable, so a larger pool can be set if needed.

Major Use Cases

Newly-supported use cases in this release include:

Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)

Notes for Dataverse Installation Administrators

5.1.1 vs. 5.1 for Production Use

As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.

New JVM Option for Connection Pool Size

Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:

./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"

(where <id> is the identifier of your S3 file store, likely "s3"). The JVM Options section of the Configuration Guide has more information.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the Dataverse 5.1 Release Notes.

1. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

2. Stop payara, remove the generated directory, then start payara.

service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start

3. Deploy this version.

<payara install path>/bin/asadmin deploy dataverse-5.1.1.war

4. Restart payara.

- Java
Published by kcondon over 5 years ago

dataverse - Dataverse 5.1

Dataverse 5.1

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Large File Upload for Installations Using AWS S3

The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB.

Dataset-Specific Stores

In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.

Major Use Cases

Newly-supported use cases in this release include:

Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files has duplicate names (Issue #80, PR #7276)
Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause stale search results to not load. (Issue #4225, PR #7211)

Notes for Dataverse Installation Administrators

New API for setting a Dataset-level Store

This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the Admin Guide.

Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload

Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the Developers Guide.

While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).

New APIs for keeping Solr records in sync

This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the Admin Guide.

Documentation for Purging the Ingest Queue

At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the Admin Guide now has specific steps.

Biomedical Metadata Block Updated

The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this in your installation.

Notes for Tool Developers and Integrators

Spaces in File Names

Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.

Complete List of Changes

For the complete list of code changes in this release, see the 5.1 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.

1. Undeploy the previous version.

<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>

(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)

2. Stop payara, remove the generated directory, then start payara.

service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start

3. Deploy this version.

<payara install path>/bin/asadmin deploy dataverse-5.1.war

4. Restart payara.

Additional Upgrade Steps

Update Biomedical Metadata Block (if used), Reload Solr, ReExportAll

wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"

Check if your Solr installation is running with the latest schema.xml config file (https://github.com/IQSS/dataverse/releases/download/v5.1/schema.xml), update if needed.

Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration. For example (modify the path names as needed):

cd /usr/local/solr-7.7.2/server/solr/collection1/conf
wget https://github.com/IQSS/dataverse/releases/download/v5.1/updateSchemaMDB.sh
chmod +x updateSchemaMDB.sh
./updateSchemaMDB.sh -t .

See http://guides.dataverse.org/en/5.1/admin/metadatacustomization.html?highlight=updateschemamdb for more information.

Run ReExportall to update JSON Exports:

http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

- Java
Published by kcondon over 5 years ago

dataverse - Dataverse 5.0

Dataverse 5.0

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Please note that this is a major release and these are long release notes. We offer no apologies. :)

Release Highlights

Continued Dataset and File Redesign: Dataset and File Button Redesign, Responsive Layout

The buttons available on the Dataset and File pages have been redesigned. This change is to provide more scalability for future expanded options for data access and exploration, and to provide a consistent experience between the two pages. The dataset and file pages have also been redesigned to be more responsive and function better across multiple devices.

This is an important step in the incremental process of the Dataset and File Redesign project, following the release of on-page previews, filtering and sorting options, tree view, and other enhancements. Additional features in support of these redesign efforts will follow in later 5.x releases.

Payara 5

A major upgrade of the application server provides security updates, access to new features like MicroProfile Config API, and will enable upgrades to other core technologies.

Note that moving from Glassfish to Payara will be required as part of the move to Dataverse 5.

Download Dataset

Users can now more easily download all files in a dataset through both the UI and API. If this causes server instability, it's suggested that Dataverse Installation Administrators take advantage of the new Standalone Zipper Service described below.

Download All Option on the Dataset Page

In previous versions of Dataverse, downloading all files from a dataset meant several clicks to select files and initiate the download. The Dataset Page now includes a Download All option for both the original and archival formats of the files in a dataset under the "Access Dataset" button.

Download All Files in a Dataset by API

In previous versions of Dataverse, downloading all files from a dataset via API was a two step process:

Find all the database ids of the files.
Download all the files, using those ids (comma-separated).

Now you can download all files from a dataset (assuming you have access to them) via API by passing the dataset persistent ID (PID such as DOI or Handle) or the dataset's database id. Versions are also supported, and you can pass :draft, :latest, :latest-published, or numbers (1.1, 2.0) similar to the "download metadata" API.
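A minimal sketch of building such a request is shown below. The endpoint path is written as recalled from the Data Access API guide and should be checked there; the DOI is a placeholder and the function name is made up for illustration. The script only prints the curl command.

```shell
#!/bin/sh
# Sketch: build the "download all files" URL for a dataset PID and version.
# The endpoint path is an assumption to check against the Data Access API
# guide; the DOI below is a placeholder.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

dataset_download_url() {
    # $1 = persistent ID, $2 = optional version (:draft, :latest, 1.1, ...)
    if [ -n "$2" ]; then
        echo "$SERVER_URL/api/access/dataset/:persistentId/versions/$2?persistentId=$1"
    else
        echo "$SERVER_URL/api/access/dataset/:persistentId/?persistentId=$1"
    fi
}

url=$(dataset_download_url "doi:10.5072/FK2/EXAMPLE" ":draft")
# -L follows redirects, -O -J saves under the server-supplied file name.
echo "curl -L -O -J -H \"X-Dataverse-key:\$API_TOKEN\" \"$url\""
```

Omitting the second argument gives the URL without a version segment.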

A Multi-File, Zipped Download Optimization

In this release we are offering an experimental optimization for the multi-file, download-as-zip functionality. If this option is enabled, instead of enforcing size limits, we attempt to serve all the files that the user requested (that they are authorized to download), but the request is redirected to a standalone zipper service running as a cgi executable. This moves these potentially long-running jobs completely outside the Application Server (Payara) and prevents service threads from becoming locked serving them. Since zipping is also a CPU-intensive task, it is possible to have this service running on a different host system, thus freeing the cycles on the main Application Server. The system running the service needs to have access to the database as well as to the storage filesystem, and/or S3 bucket.

Please consult the scripts/zipdownload/README.md in the Dataverse 5 source tree.

The components of the standalone "zipper tool" can also be downloaded here:

https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip

Updated File Handling

Files without extensions can now be uploaded through the UI. This release also changes the way Dataverse handles duplicate (filename or checksum) files in a dataset. Specifically:

Files with the same checksum can be included in a dataset, even if the files are in the same directory.
Files with the same filename can be included in a dataset as long as the files are in different directories.
If a user uploads a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, Dataverse will display an error.
If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file will not be able to be replaced.
If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.

Pre-Publish DOI Reservation with DataCite

Dataverse installations using DataCite will be able to reserve the persistent identifiers for datasets with DataCite ahead of publishing time. This allows the DOI to be reserved earlier in the data sharing process and makes the step of publishing datasets simpler and less error-prone.

Primefaces 8

Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Major Use Cases

Newly-supported use cases in this release include:

Users will be presented with a new workflow around dataset and file access and exploration. (Issue #6684, PR #6909)
Users will experience a UI appropriate across a variety of device sizes. (Issue #6684, PR #6909)
Users will be able to download an entire dataset without needing to select all the files in that dataset. (Issue #6564, PR #6262)
Users will be able to download all files in a dataset with a single API call. (Issue #4529, PR #7086)
Users will have DOIs reserved for their datasets upon dataset create instead of at publish time. (Issue #5093, PR #6901)
Users will be able to upload files without extensions. (Issue #6634, PR #6804)
Users will be able to upload files with the same name in a dataset, as long as those files are in different file paths. (Issue #4813, PR #6924)
Users will be able to upload files with the same checksum in a dataset. (Issue #4813, PR #6924)
Users will be less likely to encounter locks during the publishing process due to PID providers being unavailable. (Issue #6918, PR #7118)
Users will now have their files validated during publish, and in the unlikely event that anything has happened to the files between deposit and publish, they will be able to take corrective action. (Issue #6558, PR #6790)
Administrators will likely see more success with Harvesting, as many minor harvesting issues have been resolved. (Issues #7127, #7128, #4597, #7056, #7052, #7023, #7009, and #7003)
Administrators can now enable an external zip service that frees up application server resources and allows the zip download limit to be increased. (Issue #6505, PR #6986)
Administrators can now create groups based on users' email domains. (Issue #6936, PR #6974)
Administrators can now set date facets to be organized chronologically. (Issue #4977, PR #6958)
Administrators can now link harvested datasets using an API. (Issue #5886, PR #6935)
Administrators can now destroy datasets with mapped shapefiles. (Issue #4093, PR #6860)

Notes for Dataverse Installation Administrators

Glassfish to Payara

This upgrade requires a few extra steps. See the detailed upgrade instructions below.

Dataverse Installations Using DataCite: Upgrade Action Required

If you are using DataCite as your DOI provider you must add a new JVM option called "doi.dataciterestapiurlstring" with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. More information about this JVM option can be found in the Installation Guide.
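The sketch below prints the asadmin commands this implies, under the assumption that Payara lives in /usr/local/payara5 and using a made-up helper name. The colon in the URL is escaped because asadmin treats an unescaped ":" as a separator between options. Nothing is executed.

```shell
#!/bin/sh
# Sketch: print the asadmin commands for the DataCite REST API option.
# PAYARA is an assumed install path; nothing is run against a server.
PAYARA="${PAYARA:-/usr/local/payara5}"

datacite_jvm_commands() {
    # $1 = "production" or "test"
    case "$1" in
        production) url='https\://api.datacite.org' ;;
        test)       url='https\://api.test.datacite.org' ;;
        *) echo "usage: datacite_jvm_commands production|test" >&2; return 1 ;;
    esac
    printf '%s\n' "$PAYARA/bin/asadmin create-jvm-options \"-Ddoi.dataciterestapiurlstring=$url\""
    # Shows whether the older doi.mdcbaseurlstring option is still present;
    # delete-jvm-options needs the exact option string reported here.
    printf '%s\n' "$PAYARA/bin/asadmin list-jvm-options | grep doi.mdcbaseurlstring"
}

datacite_jvm_commands test
```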

"doi.mdcbaseurlstring" should be deleted if it was previously set.

Dataverse Installations Using DataCite: Upgrade Action Recommended

For installations that are using DataCite, Dataverse v5.0 introduces a change in the process of registering the Persistent Identifier (DOI) for a dataset. Instead of registering it when the dataset is published for the first time, Dataverse will try to "reserve" the DOI when it's created (by registering it as a "draft", using DataCite terminology). When the user publishes the dataset, the DOI will be publicized as well (by switching the registration status to "findable"). This approach makes the process of publishing datasets simpler and less error-prone.

New APIs have been provided for finding any unreserved DataCite-issued DOIs in your Dataverse, and for reserving them (see below). While not required - the user can still attempt to publish a dataset with an unreserved DOI - having all the identifiers reserved ahead of time is recommended. If you are upgrading an installation that uses DataCite, we specifically recommend that you reserve the DOIs for all your pre-existing unpublished drafts as soon as Dataverse v5.0 is deployed, since none of them were registered at create time. This can be done using the following API calls:

/api/pids/unreserved will report the ids of the datasets
/api/pids/:persistentId/reserve reserves the assigned DOI with DataCite (will need to be run on every id reported by the first API).

See the Native API Guide for more information.

Scripted, the whole process would look as follows (adjust as needed):

```
API_TOKEN='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

curl -s -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/pids/unreserved |
# the API outputs JSON; note the use of jq to parse it:
jq '.data.count[].pid' | tr -d '"' |
while read doi
do
    curl -s -H "X-Dataverse-key:$API_TOKEN" -X POST "http://localhost:8080/api/pids/:persistentId/reserve?persistentId=$doi"
done
```

Going forward, once all the DOIs have been reserved for the legacy drafts, you may still get an occasional dataset with an unreserved identifier. DataCite service instability would be a potential cause. There is no reason to expect that to happen often, but it is not impossible. You may consider running the script above (perhaps with some extra diagnostics added) regularly, from a cron job or otherwise, to address this preemptively.

Terms of Use Display Updates

In this release we’ve fixed an issue that would cause the Application Terms of Use to not display when the user's language is set to a language that does not match one of the languages for which terms were created and registered for that Dataverse installation. Instead of the expected Terms of Use, users signing up could receive the “There are no Terms of Use for this Dataverse installation” message. This could potentially result in some users signing up for an account without having the proper Terms of Use displayed. This will only affect installations that use the :ApplicationTermsOfUse setting.

Please note that there is not currently a native workflow in Dataverse to display updated Terms of Use to a user or to force re-agreement. This would only potentially affect users that have signed up since the upgrade to 4.17 (or a following release if 4.17 was skipped).

Datafiles Validation when Publishing Datasets

When a user requests to publish a dataset, Dataverse will now attempt to validate the physical files in the dataset, by recalculating the checksums and verifying them against the values in the database. The goal is to prevent any corrupted files in published datasets. Most of the instances of actual damage to physical files that we've seen in the past happened while the datafiles were still in the Draft state (physical files become essentially read-only once published), so this is the logical place to catch any such issues.

If any files in the dataset fail the validation, the dataset does not get published, and the user is notified that they need to contact their Dataverse support in order to address the issue before another attempt to publish can be made. See the "Troubleshooting" section of the Guide on how to fix such problems.

This validation will be performed asynchronously, the same way as the registration of the file-level persistent ids. Similarly to the file PID registration, this validation process can be disabled on your system, with the setting :FileValidationOnPublishEnabled. (A Dataverse admin may choose to disable it if, for example, they are already running an external auditing system to monitor the integrity of the files in their Dataverse, and would prefer the publishing process to take less time). See the Configuration section of the Installation Guide.
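The idea behind this check can be illustrated outside Dataverse with a short sketch: recompute a file's checksum and compare it with a stored value. MD5 is assumed here purely for illustration, the function name is made up, and Dataverse performs the real validation internally rather than through a script like this.

```shell
#!/bin/sh
# Sketch of the idea behind publish-time validation: recompute a file's
# checksum and compare it with the stored value. MD5 is an assumed
# algorithm; this is not how Dataverse itself is invoked.
verify_checksum() {
    # $1 = path to the physical file, $2 = checksum recorded in the database
    actual=$(md5sum "$1" | awk '{ print $1 }')
    if [ "$actual" = "$2" ]; then
        echo "OK $1"
    else
        echo "MISMATCH $1 (expected $2, got $actual)"
        return 1
    fi
}

# Demo on a throwaway file.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
stored=$(md5sum "$tmp" | awk '{ print $1 }')
verify_checksum "$tmp" "$stored"            # unchanged file passes
printf 'corrupted\n' >> "$tmp"
verify_checksum "$tmp" "$stored" || echo "validation would block publishing"
rm -f "$tmp"
```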

Please note that we are not aware of any bugs in the current versions of Dataverse that would result in damage to users' files. But you may have some legacy files in your archive that were affected by some issue in the past, or perhaps affected by something outside Dataverse, so we are adding this feature out of an abundance of caution. One problem we experienced in early versions of Dataverse was a scenario where a user attempted to delete a Draft file from an unpublished version and the database transaction failed for whatever reason, but only after the physical file had already been deleted from the filesystem. This left a datafile entry in the dataset with the corresponding physical file missing. The fix for this case, since the user wanted to delete the file in the first place, is simply to confirm it and purge the datafile entity from the database.

The Setting :PIDAsynchRegFileCount is Deprecated as of 5.0

It used to specify the number of datafiles in the dataset to warrant adding a lock during publishing. As of v5.0 all datasets get locked for the duration of the publishing process. The setting will be ignored if present.

Location Changes for Related Projects

The dataverse-ansible and dataverse-previewers repositories have been moved to the GDCC Organization on GitHub. If you have been referencing the dataverse-ansible repository from IQSS and the dataverse-previewers from QDR, please instead use them from their new locations:

https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible
https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers

Harvesting Improvements

Many updates have been made to address common Harvesting failures. You may see Harvests complete more often and have a higher success rate on a dataset-by-dataset basis.

New JVM Options and Database Settings

Several new JVM options and DB Settings have been added in this release. More documentation about each of these settings can be found in the Configuration section of the Installation Guide.

New JVM Options

doi.dataciterestapiurlstring: Set with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. Must be set if you are using DataCite as your DOI provider.
dataverse.useripaddresssourceheader: If set, specifies an HTTP Header such as X-Forwarded-For to use to retrieve the user's IP address. This setting is useful in cases such as running Dataverse behind load balancers where the default option of getting the Remote Address from the servlet isn't correct (e.g. it would be the load balancer IP address). Note that unless your installation always sets the header you configure here, this could be used as a way to spoof the user's address. See the Configuration section of the Installation Guide for more information about proper use and security concerns.
http.request-timeout-seconds: To facilitate large file upload and download, the Dataverse installer bumps the Payara server-config.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds setting from its default 900 seconds (15 minutes) to 1800 (30 minutes).

New Database Settings

:CustomZipDownloadServiceUrl: If defined, this is the URL of the zipping service outside the main application server where zip downloads should be directed (instead of /api/access/datafiles/).
:ShibAttributeCharacterSetConversionEnabled: By default, all attributes received from Shibboleth are converted from ISO-8859-1 to UTF-8. You can disable this behavior by setting to false.
:ChronologicalDateFacets: Facets with Date/Year are sorted chronologically by default, with the most recent value first. To have them sorted by number of hits, e.g. with the year with the most results first, set this to false.
:NavbarGuidesUrl: Set to a fully-qualified URL which will be used for the "User Guide" link in the navbar.
:FileValidationOnPublishEnabled: Toggles validation of the physical files in the dataset when it's published, by recalculating the checksums and comparing against the values stored in the DataFile table. By default this setting is absent and Dataverse assumes it to be true. If enabled, the validation will be performed asynchronously, similarly to how we handle assigning persistent identifiers to datafiles, with the dataset locked for the duration of the publishing process.
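As a rough sketch, database settings like these are typically changed through the admin settings API. The helper names below are made up, the URL assumes the API is reachable on localhost:8080, and the script only prints the curl commands it would run.

```shell
#!/bin/sh
# Sketch: print curl commands that change or reset a database setting.
# Assumes the admin settings API at localhost:8080; nothing is executed.
SETTINGS_URL="${SETTINGS_URL:-http://localhost:8080/api/admin/settings}"

setting_put_cmd()    { printf "curl -X PUT -d '%s' %s/%s\n" "$2" "$SETTINGS_URL" "$1"; }
setting_delete_cmd() { printf "curl -X DELETE %s/%s\n" "$SETTINGS_URL" "$1"; }

# Sort date facets by number of hits instead of chronologically.
setting_put_cmd :ChronologicalDateFacets false
# Skip checksum validation on publish (e.g. when an external audit already runs).
setting_put_cmd :FileValidationOnPublishEnabled false
# Remove the setting again so the default (validation enabled) applies.
setting_delete_cmd :FileValidationOnPublishEnabled
```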

Custom Analytics Code Changes

You should update your custom analytics code to implement necessary changes for tracking updated dataset and file buttons. There was also a fix to the analytics code that will now properly track downloads for tabular files.

For more information, see the documentation and sample analytics code snippet provided in Installation Guide > Configuration > Web Analytics Code to reflect the changes implemented in this version (#6938/#6684).

Tracking Users' IP Addresses Behind an Address-Masking Proxy

It is now possible to collect real user IP addresses in MDC logs and/or set up an IP group on a system running behind a proxy/load balancer that hides the addresses of incoming requests. See "Recording User IP Addresses" in the Configuration section of the Installation Guide.

Reload Astrophysics Metadata Block (if used)

Tooltips have been updated for the Astrophysics Metadata Block. If you'd like these updated Tooltips to be displayed to users of your installation, you should update the Astrophysics Metadata Block:

curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"

We've included this in the step-by-step instructions below.

Run ReExportall

We made changes to the JSON Export in this release. If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process following the steps in the Admin Guide.

We've included this in the step-by-step instructions below.

Notes for Tool Developers and Integrators

Complete List of Changes

For the complete list of code changes in this release, see the 5.0 Milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade Instructions

Upgrade from Glassfish 4.1 to Payara 5

The instructions below describe the upgrade procedure based on moving an existing glassfish4 domain directory under Payara. We recommend this method instead of setting up a brand-new Payara domain using the installer because it appears to be the easiest way to recreate your current configuration and preserve all your data.

Download Payara, v5.2020.2 as of this writing:

curl -L -O https://github.com/payara/Payara/releases/download/payara-server-5.2020.2/payara-5.2020.2.zip
sha256sum payara-5.2020.2.zip

The expected SHA-256 checksum is 1f5f7ea30901b1b4c7bcdfa5591881a700c9b7e2022ae3894192ba97eb83cc3e
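Rather than comparing the digest by eye, the check can be scripted. The sketch below wraps sha256sum -c in a made-up helper and only reports a result if the archive is present in the current directory.

```shell
#!/bin/sh
# Sketch: fail early if a downloaded archive does not match its published hash.
check_sha256() {
    # $1 = file, $2 = expected SHA-256 hex digest
    # "sha256sum -c" expects "<digest>  <file>" lines (two spaces).
    printf '%s  %s\n' "$2" "$1" | sha256sum -c - >/dev/null 2>&1
}

if [ -f payara-5.2020.2.zip ]; then
    if check_sha256 payara-5.2020.2.zip \
        1f5f7ea30901b1b4c7bcdfa5591881a700c9b7e2022ae3894192ba97eb83cc3e; then
        echo "checksum OK"
    else
        echo "checksum mismatch, do not install" >&2
    fi
else
    echo "payara-5.2020.2.zip not found in the current directory"
fi
```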

Unzip it somewhere (/usr/local is a safe bet)

sudo unzip payara-5.2020.2.zip -d /usr/local/

Copy the Postgres driver to /usr/local/payara5/glassfish/lib

sudo cp /usr/local/glassfish4/glassfish/lib/postgresql-42.2.9.jar /usr/local/payara5/glassfish/lib/

Move payara5/glassfish/domains/domain1 out of the way

sudo mv /usr/local/payara5/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/domain1.orig

Undeploy the Dataverse web application (if deployed; version 4.20 is assumed in the example below)

sudo /usr/local/glassfish4/bin/asadmin list-applications
sudo /usr/local/glassfish4/bin/asadmin undeploy dataverse-4.20

Stop Glassfish; copy domain1 to Payara

sudo /usr/local/glassfish4/bin/asadmin stop-domain
sudo cp -ar /usr/local/glassfish4/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/

Remove the cache directories

sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/generated/
sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/osgi-cache/

In domain.xml:

Replace the -XX:PermSize and -XX:MaxPermSize JVM options with -XX:MetaspaceSize and -XX:MaxMetaspaceSize.

-XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m
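The replacement can also be done with a stream edit instead of by hand. This is a sketch only: it assumes the old options appear in domain.xml as `-XX:PermSize=<n>` and `-XX:MaxPermSize=<n>` with an optional size suffix, and the function name is hypothetical.

```shell
# Read a domain.xml on stdin and write the updated copy to stdout.
# The 256m/512m values are the ones given in these release notes.
permgen_to_metaspace() {
  sed -E \
    -e 's/-XX:MaxPermSize=[0-9]+[kKmMgG]?/-XX:MaxMetaspaceSize=512m/' \
    -e 's/-XX:PermSize=[0-9]+[kKmMgG]?/-XX:MetaspaceSize=256m/'
}
```

Writing to a new file first, for example `permgen_to_metaspace < domain.xml > domain.xml.new`, makes it easy to review the result with `diff` before replacing the original.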

Add the below JVM options beneath the -Ddataverse settings:

-Dfish.payara.classloading.delegate=false -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+DisableExplicitGC

Also in domain.xml, replace the following element:

<jdbc-connection-pool datasource-classname="org.apache.derby.jdbc.EmbeddedXADataSource" name="__TimerPool" res-type="javax.sql.XADataSource"> <property name="databaseName" value="${com.sun.aas.instanceRoot}/lib/databases/ejbtimer"></property><property name="connectionAttributes" value=";create=true"></property></jdbc-connection-pool>

with

<jdbc-connection-pool datasource-classname="org.h2.jdbcx.JdbcDataSource" name="__TimerPool" res-type="javax.sql.XADataSource"> <property name="URL" value="jdbc:h2:${com.sun.aas.instanceRoot}/lib/databases/ejbtimer;AUTO_SERVER=TRUE"></property> </jdbc-connection-pool>

Change any full pathnames /usr/local/glassfish4/... to /usr/local/payara5/... or whatever it is in your case. (Specifically check the -Ddataverse.files.directory and -Ddataverse.files.file.directory JVM options)
In domain1/config/jhove.conf, change the hard-coded /usr/local/glassfish4 path, as above.
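To confirm that no old pathnames were missed, a recursive search of the copied domain can help. The sketch below assumes a `grep` that supports `-r` (GNU and BSD both do); the function name and the default path argument are illustrative.

```shell
# List every line under a directory that still mentions the old Glassfish path.
# Prints "file:line:text" for each match; returns 1 when nothing is found.
find_old_glassfish_paths() {
  grep -rn -- "${2:-/usr/local/glassfish4}" "$1"
}
```

For example, `find_old_glassfish_paths /usr/local/payara5/glassfish/domains/domain1/config` should print nothing once domain.xml and jhove.conf have been updated.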

(Optional): If you renamed your service account from glassfish to payara or appserver, update the ownership permissions. The Installation Guide recommends a service account of dataverse:

sudo chown -R dataverse /usr/local/payara5/glassfish/domains/domain1
sudo chown -R dataverse /usr/local/payara5/glassfish/lib

You will also need to check that the service account has write permission on the files directory, if it is located outside the old Glassfish domain, and/or that the service account has the correct AWS credentials, if you are using S3 for storage.

Finally, start Payara:

sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain

Deploy the Dataverse 5 warfile:

sudo -u dataverse /usr/local/payara5/bin/asadmin deploy /path/to/dataverse-5.0.war

Then restart Payara:

sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain
sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain

Additional Upgrade Steps

Update Astrophysics Metadata Block (if used)

wget https://github.com/IQSS/dataverse/releases/download/v5.0/astrophysics.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"

(Recommended) Run ReExportall to update JSON Exports

http://guides.dataverse.org/en/5.0/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

(Required for installations using DataCite) Add the JVM option doi.dataciterestapiurlstring

For production environments:

/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.datacite.org"

For test environments:

/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.test.datacite.org"

The JVM option doi.mdcbaseurlstring should be deleted if it was previously set, for example:

/usr/local/payara5/bin/asadmin delete-jvm-options "\-Ddoi.mdcbaseurlstring=https\://api.test.datacite.org"

(Recommended for installations using DataCite) Pre-register DOIs

Execute the script described in the section "Dataverse Installations Using DataCite: Upgrade Action Recommended" earlier in the Release Note.

Please consult the earlier sections of the Release Note for any additional configuration options that may apply to your installation.

Published by djbrooke almost 6 years ago

dataverse - 4.20

Dataverse 4.20

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Multiple Store Support

Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).

General information about this capability can be found below and in the Configuration Guide - File Storage section.

S3 Direct Upload support

S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With Minio the theoretical limit should be 5 TB and 50+ GB file uploads have been tested successfully. (In practice other factors such as network timeouts may prevent a successful upload of a multi-TB file, and Minio instances may be configured with a < 5 TB single HTTP call limit.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit.
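The "lower of the two limits" rule can be worked through with a small calculation. The provider figures below (5 TB maximum object size, 5 GB single-call limit) and the use of binary units are assumptions for illustration only; substitute the values your provider documents.

```shell
# Print the smaller of two byte counts.
min_bytes() {
  if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Binary units are an assumption; the notes only say "5 GB" and "5 TB".
GB=$((1024 * 1024 * 1024))
TB=$((1024 * GB))

# Hypothetical provider: 5 TB maximum object size, 5 GB single-call limit.
effective_limit=$(min_bytes $((5 * TB)) $((5 * GB)))
echo "effective direct-upload limit: $effective_limit bytes"
```

Here the single-call limit is the binding one, which is why direct upload tops out well below the maximum object size.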

General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.

Integration Test Coverage Reporting

The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

New APIs

New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed.

More information can be found in the API Guide.

Major Use Cases

Newly-supported use cases in this release include:

Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262)
Users will be able to upload large files directly to S3. (Issue #6489, PR #6490)
Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628)
Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488)
Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622)
Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609)
Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623)

Notes for Dataverse Installation Administrators

Potential Data Integrity Issue

We recently discovered a potential data integrity issue in Dataverse databases that takes two forms. One manifests itself as duplicate DataFile objects created for the same uploaded file (https://github.com/IQSS/dataverse/issues/6522); the other as duplicate DataTable (tabular metadata) objects linked to the same DataFile (https://github.com/IQSS/dataverse/issues/6510). This issue impacted approximately 0.03% of datasets in Harvard's Dataverse.

To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here:

https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/checkdatafiles6522_6510.sh

The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration.

If neither of the two issues is present in your database, you will see a message "... no duplicate DataFile objects in your database" and "no tabular files affected by this issue in your database".

If either or both kinds of duplicates are detected, the script will provide further instructions. We will need you to send us the produced output. We will then assist you in resolving the issues in your database.
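If the diagnostic is run unattended, its output can be checked for the two "all clear" messages quoted above. This is a sketch under the assumption that the script prints those messages verbatim; the function name is hypothetical.

```shell
# Read the diagnostic script's output on stdin.
# Returns 0 only when both "all clear" messages are present.
diagnostic_is_clean() (
  out=$(cat)
  printf '%s\n' "$out" | grep -q 'no duplicate DataFile objects in your database' &&
    printf '%s\n' "$out" | grep -q 'no tabular files affected by this issue in your database'
)
```

Saving the output to a file first and then running `diagnostic_is_clean < check.out` keeps a copy on hand in case it needs to be sent in for assistance.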

Multiple Store Support Changes

Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.

Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get id 'file', an 's3' store would have the id 's3'.

With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files.

The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:

For a file store:

./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"

For an s3 store:

./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.
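The delete-and-recreate pattern for the remaining S3 options can be generated mechanically. The helper below is a sketch that only prints the commands for review; its name is made up, and it assumes each old option is passed as a `name=value` string.

```shell
# Print the delete/create pair for one old-style option such as
# "dataverse.files.s3-url-expiration-minutes=120". Nothing is executed.
# Values containing ':' still need escaping as '\:' for asadmin.
s3_option_migration() (
  old=$1
  case "$old" in
    dataverse.files.s3-*) ;;
    *) echo "not an old-style s3 option: $old" >&2; exit 1 ;;
  esac
  # Replace only the '-' that directly follows 's3'.
  new=$(printf '%s\n' "$old" | sed 's/^dataverse\.files\.s3-/dataverse.files.s3./')
  printf './asadmin delete-jvm-options "-D%s"\n' "$old"
  printf './asadmin create-jvm-options "-D%s"\n' "$new"
)
```

Calling `s3_option_migration 'dataverse.files.s3-url-expiration-minutes=120'` prints one delete and one create command that can be pasted into a terminal after checking them.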

Once these options are set, restarting the Glassfish service is all that is needed to complete the change.

Note that the "-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above.

Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores.

Direct S3 Upload Changes

Direct upload to S3 is enabled per store by one new jvm option:

./asadmin create-jvm-options "\-Ddataverse.files.<id>.upload-redirect=true"

The existing :MaxFileUploadSizeInBytes property and dataverse.files.<id>.url-expiration-minutes jvm option for the same store also apply to direct upload.

Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option.
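Putting the two options together for one store might look like the sketch below. It only prints the asadmin commands; the store id and the size limit are placeholders, and treating the `ingestsizelimit` value as a byte count is an assumption that should be checked against the Configuration Guide.

```shell
# Print the commands that enable direct upload for a store and cap ingest.
# $1 = store id, $2 = ingest size limit (assumed to be in bytes).
direct_upload_commands() {
  cat <<EOF
./asadmin create-jvm-options "\-Ddataverse.files.$1.upload-redirect=true"
./asadmin create-jvm-options "\-Ddataverse.files.$1.ingestsizelimit=$2"
EOF
}
```

For example, `direct_upload_commands s3 2147483648` prints the pair of commands for a store with id 's3'.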

API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use.

Solr Update

With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) followed by an "index all".

Before you start the "index all", Dataverse will appear to be empty because the search results come from Solr. As indexing progresses, results will appear until indexing is complete.

Dataverse Linking Fix

The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these counts. Going forward, this will happen automatically whenever a dataverse is linked.

Google Analytics Download Tracking Bug

The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternately, sites can modify their existing files to include the one-line change made in the example file at line 120.

Run ReExportall

We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.

New JVM Options and Database Settings

New JVM Options for file storage drivers

The JVM option dataverse.files.file.directory=<your directory> controls where temporary files are stored (in the /temp subdir of the defined directory), independent of the location of any 'file' store defined above.
The JVM option dataverse.files.<id>.upload-redirect enables direct upload of files added to a dataset to the S3 bucket. (S3 stores only!)
The JVM option dataverse.files.<id>.ingestsizelimit controls the maximum size of files for which ingest will be attempted, for the given file store.

New Database Settings for Shibboleth

The database setting :ShibAffiliationAttribute can now be set to prevent affiliations for Shibboleth users from being reset upon each log in.

Notes for Tool Developers and Integrators

Integration Test Coverage Reporting

API-based integration tests are run every time a branch is merged to develop and the percentage of code covered by these integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

Guestbook Column Changes

Users of downloaded guestbooks should note that two new columns have been added:

Dataset PID
File PID

If you are expecting columns in the CSV file to be in a particular order, you will need to make adjustments.

Old columns: Guestbook, Dataset, Date, Type, File Name, File Id, User Name, Email, Institution, Position, Custom Questions

New columns: Guestbook, Dataset, Dataset PID, Date, Type, File Name, File Id, File PID, User Name, Email, Institution, Position, Custom Questions
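One way to make downstream scripts robust to this kind of change is to select columns by header name instead of by position. The sketch below assumes a simple comma-separated export with a header row and no quoted fields containing commas; the function name is hypothetical.

```shell
# Print the values of one named column from a CSV read on stdin.
# Exits non-zero if the header row does not contain that column.
csv_column() {
  awk -F',' -v name="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) exit 1
      next
    }
    { print $col }
  '
}
```

With this approach, `csv_column "File Name" < guestbook.csv` returns the same data before and after the two PID columns were added.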

API Changes

As reported in #6570, the affiliation for dataset contacts was wrapped in parentheses in the JSON output from the Search API. These parentheses have now been removed. This is a backward-incompatible change, but it is not expected to cause issues for integrators.
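Integrators who talk to installations on both older and newer versions could normalize the affiliation value themselves. The filter below is a sketch, assuming the affiliation string has already been extracted from the JSON and is passed one value per line; the function name is made up.

```shell
# Strip one pair of enclosing parentheses, if present, from each input line.
# Values without enclosing parentheses pass through unchanged.
strip_affiliation_parens() {
  sed -e 's/^(\(.*\))$/\1/'
}
```

Both `(Harvard University)` and `Harvard University` come out as `Harvard University`, so the same client code works against either behavior.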

Role Name Change

The role alias provided in API responses has changed, so if anything was hard-coded to "editor" instead of "contributor" it will need to be updated.

Complete List of Changes

For the complete list of code changes in this release, see the 4.20 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop Glassfish, remove the generated directory, and start Glassfish again.

service glassfish stop
rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Install and configure Solr v7.7.2

See http://guides.dataverse.org/en/4.20/installation/prerequisites.html#installing-solr

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.20.war

The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:

For a file store:

./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"

For an s3 store:

./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"

Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above - delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.

Restart glassfish.

Update Citation Metadata Block

wget https://github.com/IQSS/dataverse/releases/download/v4.20/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

Kick off full reindex

http://guides.dataverse.org/en/4.20/admin/solr-search-index.html

(Recommended) Run ReExportall to update JSON Exports

http://guides.dataverse.org/en/4.20/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

Published by kcondon about 6 years ago

dataverse - 4.19

Dataverse 4.19

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Open ID Connect Support

Dataverse now provides basic support for any OpenID Connect (OIDC) compliant authentication provider.

Prior to supporting this standard, new authentication methods needed to be added by pull request. OIDC support provides a standardized way for authentication, sharing user information, and more. You are able to use any compliant provider just by loading a configuration file, without touching the codebase. While the usual prominent providers like Google and others feature OIDC support there are plenty of other options to easily attach your installation to a custom authentication provider, using enterprise grade software.

See the OpenID Connect Login Options documentation in the Installation Guide for more details.

This is to be extended with support for attribute mapping, group syncing and more in future versions of the code.

Python Installer

We are introducing a new installer script, written in Python. It is intended to eventually replace the old installer (written in Perl). For now it is being offered as an (experimental) alternative.

See README_python.txt in scripts/installer and/or in the installer bundle for more information.

Major Use Cases

Newly-supported use cases in this release include:

Dataverse installation administrators will be able to experiment with a Python Installer (Issue #3937, PR #6484)
Dataverse installation administrators will be able to set up OIDC-compliant login options by editing a configuration file and with no need for a code change (Issue #6432, PR #6433)
Following setup by a Dataverse administrator, users will be able to log in using OIDC-compliant methods (Issue #6432, PR #6433)
Users of the Search API will see additional fields in the JSON output (Issues #6300, #6396, PR #6441)
Users loading the support form will now be presented with the math challenge as expected and will be able to successfully send an email to support (Issue #6307, PR #6462)
Users of https://mybinder.org can now spin up Jupyter Notebooks and other computational environments from Dataverse DOIs (Issue #4714, PR #6453)

Notes for Dataverse Installation Administrators

Security vulnerability in Solr

A serious security issue has recently been identified in multiple versions of the Solr search engine, including v7.3, which Dataverse currently uses. Follow the instructions below to verify that your installation is safe from a potential attack. You can also consult the following link for a detailed description of the issue:

RCE in Solr via Velocity Template.

The vulnerability allows an intruder to execute arbitrary code on the system running Solr. Fortunately, it can only be exploited if the Solr API access point is open to direct access from public networks (aka, "the outside world"), which is NOT needed in a Dataverse installation.

We have always recommended having Solr (port 8983) firewalled off from public access in our installation guides. But we recommend that you double-check your firewall settings and verify that the port is not accessible from outside networks. The simplest quick test is to try the following URL in your browser:

  `http://<your Solr server address>:8983`

and confirm that you get "access denied" or that it times out, etc.
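The same test can be run from the command line on a machine outside your network. The sketch below assumes `curl` is installed; the function name and the 5-second timeout are illustrative choices.

```shell
# Returns 0 if an HTTP request to the Solr port succeeds (the port is
# reachable from where this runs) and non-zero if the connection is
# refused, times out, or is answered with an HTTP error such as 403.
# $1 = Solr host, $2 = port (defaults to 8983)
solr_port_reachable() {
  curl --silent --fail --max-time 5 --output /dev/null "http://$1:${2:-8983}/"
}
```

A safe setup is one where `solr_port_reachable <your Solr server address>` fails when run from an outside host.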

In most cases, when Solr runs on the same server as the Dataverse web application, you will only want the port accessible from localhost. We also recommend that you add the following arguments to the Solr startup command: -j jetty.host=127.0.0.1. This will make Solr accept connections from localhost only, adding redundancy in case of a firewall failure.

In a case where Solr needs to run on a different host, make sure that the firewall limits access to the port only to the Dataverse web host(s), by specific ip address(es).

We would also like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.

Citation and Geospatial Metadata Block Updates

We updated two metadata blocks in this release. Updating these metadata blocks is mentioned in the step-by-step upgrade instructions below.

Run ReExportall

We made changes to the JSON Export in this release (#6426). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.

BinderHub

https://mybinder.org now supports spinning up Jupyter Notebooks and other computational environments from Dataverse DOIs.

Widgets update for OpenScholar

We updated the code for widgets so that they will keep working in OpenScholar sites after the upcoming OpenScholar upgrade to Drupal 8. If users of your dataverse have embedded widgets on an OpenScholar site that upgrades to Drupal 8, you will need to run this Dataverse version (or later) for the widgets to keep working.

Payara tech preview

Dataverse 4 has always run on Glassfish 4.1 but changes in this release (PR #6523) should open the door to upgrading to Payara 5 eventually. Production installations of Dataverse should remain on Glassfish 4.1 but feedback from any experiments running Dataverse on Payara 5 is welcome via the usual channels.

Notes for Tool Developers and Integrators

Search API

The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.

Additional fields are now available via the Search API, mostly related to information about specific dataset versions.

Complete List of Changes

For the complete list of code changes in this release, see the 4.19 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop Glassfish, remove the generated directory, and start Glassfish again.

service glassfish stop
rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.19.war

Restart glassfish.

Update Geospatial Metadata Block

wget https://github.com/IQSS/dataverse/releases/download/v4.19/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"

(Optional) Run ReExportall to update JSON Exports

http://guides.dataverse.org/en/4.19/admin/metadataexport.html?highlight=export#batch-exports-through-the-api

Published by djbrooke over 6 years ago

dataverse - 4.18.1

Dataverse 4.18.1

This release provides a fix for a regression introduced in 4.18 and implements a few other small changes.

Release Highlights

Proper Validation Messages

When creating or editing dataset metadata, users were not receiving field-level indications about what entries failed validation and were only receiving a message at the top of the page. This fix restores field-level indications.

Major Use Cases

Use cases in this release include:

Users will receive the proper messaging when dataset metadata entries are not valid.
Users can now view the expiration date of an API token and revoke a token on the API Token tab of the account page.

Complete List of Changes

For the complete list of code changes in this release, see the 4.18.1 milestone in Github.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop Glassfish, remove the generated directory, and start Glassfish again.

service glassfish stop
rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.1.war

Restart glassfish.
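The upgrade steps above can be collected into a dry-run helper that prints the commands for review before anything is executed. This is a sketch: the function name is hypothetical, and the final stop/start pair is one way of doing the restart, reusing the service commands shown above.

```shell
# Print the upgrade commands for a given Glassfish install path and warfile.
# $1 = the "<glassfish install path>" placeholder, $2 = path to the .war file.
# Nothing is executed.
print_upgrade_steps() (
  gf=$1
  war=$2
  cat <<EOF
$gf/glassfish4/bin/asadmin list-applications
$gf/glassfish4/bin/asadmin undeploy dataverse
service glassfish stop
rm -rf $gf/glassfish4/glassfish/domains/domain1/generated
service glassfish start
$gf/glassfish4/bin/asadmin deploy $war
service glassfish stop
service glassfish start
EOF
)
```

For example, `print_upgrade_steps /usr/local /tmp/dataverse-4.18.1.war` prints the eight commands in order. The application name passed to undeploy should match what list-applications reports on your system.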

Published by kcondon over 6 years ago

dataverse - 4.18

Dataverse 4.18

Note: There is an issue in 4.18 with the display of validation messages on the dataset page (#6380) and we recommend using 4.18.1 for any production environments.

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

File Page Previews and Previewers

File-level External Tools can now be configured to display in a "Preview Mode" designed for embedding within the file landing page.

While not technically part of this release, previewers have been made available for several common file types. The previewers support spreadsheet, image, text, document, audio, video, and HTML files, and more. These previewers can be found in the Qualitative Data Repository GitHub repository. The spreadsheet viewer was contributed by the Dataverse SSHOC project.

Microsoft Login

Users can now create Dataverse accounts and log in using self-provisioned Microsoft accounts such as live.com and outlook.com. Users can also use Microsoft accounts managed by their institutions. This new feature not only makes it easier to log in to Dataverse but will also streamline the interaction between any external tools that utilize Azure services that require login.

Add Data and Host Dataverse

More workflows to add data have been added across the UI, including a new button on the My Data tab of the Account page, as well as a link in the Dataverse navbar, which will display on every page. This will provide users with much easier access to start depositing data. By default, the Host Dataverse will be the installation root dataverse for these new Add Data workflows, but there is now a dropdown component that allows creators to select any dataverse in which they have permission to create a new dataverse or dataset.

Primefaces 7

Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.

Integration Test Pipeline and Test Health Reporting

As part of the Dataverse Community's ongoing efforts to provide more robust automated testing infrastructure, and in support of the project's desire to have the develop branch constantly in a "release ready" state, API-based integration tests are now run every time a branch is merged to develop. The status of the last test run is available as a badge at the bottom of the README.md file that serves as the homepage of the Dataverse GitHub repository.

Make Data Count Metrics Updates

A new configuration option has been added that allows Make Data Count metrics to be collected, but not reflected in the front end. This option was designed to allow installations to collect and verify metrics for a period before turning on the display to users. It is suggested that installations turn on Make Data Count as part of the upgrade.

Search API Enhancements

The Dataverse Search API will now display unpublished content when an API token is passed (and appropriate permissions exist).

Additional Dataset Author Identifiers

The following dataset author identifiers are now supported:

DAI: https://en.wikipedia.org/wiki/Digital_Author_Identifier
ResearcherID: http://researcherid.com
ScopusID: https://www.scopus.com

Major Use Cases

Newly-supported use cases in this release include:

Users can view previews of several common file types, eliminating the need to download or explore a file just to get a quick look.
Users can log in using self-provisioned Microsoft accounts and also can log in using Microsoft accounts managed by an organization.
Dataverse administrators can now revoke and regenerate API tokens with an API call.
Users will receive notifications when their ingests complete, and will be informed if the ingest was a success or failure.
Dataverse developers will receive feedback about the health of the develop branch after their pull request was merged.
Dataverse tool developers will be able to query the Dataverse API for unpublished data as well as published data.
Dataverse administrators will be able to collect Make Data Count metrics without turning on the display for users.
Users with a DAI, ResearcherID, or ScopusID can use these author identifiers in their datasets.

Notes for Dataverse Installation Administrators

API Token Management

You can now delete a user's API token, recreate a user's API token, and find a token's expiration date. See the Native API guide for more information.

New JVM Options

doi.mdcbaseurlstring allows Dataverse administrators to use a test base URL for Make Data Count.

New Database Settings

:DisplayMDCMetrics can be set to false to disable display of MDC metrics.
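
A minimal sketch of toggling this setting through the admin settings API follows. The helper names and the SERVER_URL default are assumptions for illustration, not part of Dataverse.

```shell
# Base URL of the installation; localhost:8080 is an assumption.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the admin settings URL for a setting name.
settings_url() {
  printf '%s/api/admin/settings/%s\n' "$SERVER_URL" "$1"
}

# Hypothetical helper: set :DisplayMDCMetrics to "true" or "false".
set_display_mdc_metrics() {
  case "$1" in
    true|false) curl -X PUT -d "$1" "$(settings_url :DisplayMDCMetrics)" ;;
    *) echo "usage: set_display_mdc_metrics true|false" >&2; return 2 ;;
  esac
}

# Example (not run here): keep collecting metrics but hide them from users.
# set_display_mdc_metrics false
```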

Notes for Tool Developers and Integrators

Preview Mode

Tool Developers can now add the hasPreviewMode parameter to their file level external tools. This setting provides an embedded, simplified view of the tool on the file pages for any installation that installs the tool. See Building External Tools for more information.

API Token Management

If your tool writes content back to Dataverse, you can now take advantage of administrative endpoints that delete and re-create API tokens. You can also use an endpoint that provides the expiration date of a specific API token. See the Native API guide for more information.

View Unpublished Data Using Search API

If you pass a token, the search API output will include unpublished content.
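
To illustrate the difference, here is a small sketch that runs the same query with and without a token. The helper names, SERVER_URL, and API_TOKEN are assumptions, and the query term is assumed to be a single word that needs no URL encoding.

```shell
# Base URL of the installation; localhost:8080 is an assumption.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build a Search API URL for a simple query term.
search_url() {
  printf '%s/api/search?q=%s\n' "$SERVER_URL" "$1"
}

# Anonymous search: published content only.
search_public() {
  curl -s "$(search_url "$1")"
}

# Search with a token: results may also include unpublished content
# that the token's owner has permission to see.
search_with_token() {
  curl -s -H "X-Dataverse-key:$API_TOKEN" "$(search_url "$1")"
}

# Example (not run here):
# API_TOKEN=xxxxxxxx search_with_token data
```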

Complete List of Changes

For the complete list of code changes in this release, see the 4.18 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start.

service glassfish stop
remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.war

Restart glassfish.
Update Citation Metadata Block

wget https://github.com/IQSS/dataverse/releases/download/v4.18/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

(Recommended) Enable Make Data Count if your installation plans to make use of it at some point in the future.

Published by kcondon over 6 years ago

dataverse - 4.17

Dataverse 4.17

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Dataset Level Explore Tools

Tools that integrate with Dataverse can now be launched from the dataset page! This makes it possible to develop and add tools that work across the entire dataset instead of single files. Tools to verify reproducibility and allow researchers to compute on an entire dataset will take advantage of this new infrastructure.

Performance Enhancements

Dataverse now allows installation administrators to configure the session timeout for logged in users using the new :LoginSessionTimeout setting. (Session length for anonymous users has been reduced from 24 hours to 10 minutes.) Setting this lower will release system resources as configured and will result in better performance (less memory use) throughout a Dataverse installation.

Dataverse and Dataset pages have also been optimized to discard more of the objects they allocate immediately after the page load, keeping less memory permanently tied up for the duration of the user's login session. These savings are especially significant in the Dataverse page.

Major Use Cases

Newly-supported use cases in this release include:

As a user, I can launch and utilize external tools that allow me to work across the code, data, and other files in a dataset.
As a user, I can add a footer to my dataverse to show the logo for a funder or other entity.
As a developer, I can build external tools to verify reproducibility or allow computation.
As a developer, I can check to see the impact of my proposed changes on memory utilization.
As an installation administrator, I can make a quick configuration change to provide a better experience for my installation's users.

Notes for Dataverse Installation Administrators

Configurable User Session Timeout

Idle session timeout for logged-in users has been made configurable in this release. The default is now set to 8 hours (this is a change from the previous default value of 24 hours). If you want to change it, set the setting :LoginSessionTimeout to the new value in minutes. For example, to reduce the timeout to 4 hours:

curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout

Once again, this is the session timeout for logged-in users only. Anonymous sessions time out after the default session-timeout value (also in minutes) in the web.xml of the Dataverse application, which is set to 10 minutes. You will most likely never need to change this, but if you do, configure it by editing the web.xml file.

Flexible Solr Schema, optionally reconfigure Solr

With this release, we moved all fields in Solr search index that relate to the default metadata schemas from schema.xml to separate files. Custom metadata block configuration of the search index can be more easily automated that way. For details, see admin/metadatacustomization.html#updating-the-solr-schema.

This is optional, but all future changes will go to these files. It might be a good idea to reconfigure Solr now, or at least to watch for changes to these files in future releases. Here's how:

You will need to replace or modify your schema.xml with the recent one (containing XML includes)
Copy schema_dv_mdb_fields.xml and schema_dv_mdb_copies.xml to the same location as the schema.xml
A re-index is not necessary as long as no other changes happened, as this is only a reorganization of Solr fields from a single schema.xml file into multiple files.
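
The copy step above could be scripted roughly as follows. This is a sketch under assumptions: the helper name is ours, and both directories are arguments because the location of the new files and of your Solr conf directory depend on your installation.

```shell
# Hypothetical helper: copy the three schema files into the Solr conf directory.
# Usage: copy_schema_files <dir containing the new files> <solr conf dir>
copy_schema_files() {
  src=$1
  dest=$2
  # Check that everything is present first, so a partial copy is avoided.
  for f in schema.xml schema_dv_mdb_fields.xml schema_dv_mdb_copies.xml; do
    if [ ! -f "$src/$f" ]; then
      echo "missing $src/$f" >&2
      return 1
    fi
  done
  for f in schema.xml schema_dv_mdb_fields.xml schema_dv_mdb_copies.xml; do
    cp "$src/$f" "$dest/$f" || return 1
  done
}

# Example (not run here); both paths are placeholders:
# copy_schema_files /tmp/dvinstall /usr/local/solr/server/solr/collection1/conf
```

Solr usually needs to be restarted before it picks up schema changes, so plan the copy for a moment when a short search outage is acceptable.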

In case you use custom metadata blocks, you might find the new updateSchemaMDB.sh script beneficial. Again, see http://guides.dataverse.org/en/4.17/admin/metadatacustomization.html#updating-the-solr-schema

Memory Benchmark Test

Developers and installation administrators can take advantage of new scripts to produce graphs of memory usage and garbage collection events. This is helpful for developers to investigate the implications of changes on memory usage and it is helpful for installation administrators to compare graphs across releases or time periods. For details see the scripts/tests/ec2-memory-benchmark directory.

New Database Settings

:LoginSessionTimeout controls the session timeout (in minutes) for logged-in users.

Notes for Tool Developers and Integrators

New Features and Breaking Changes for External Tool Developers

The good news is that external tools can now be defined at the dataset level and there is new and improved documentation for external tool developers, linked below.

Additionally, the reserved words {datasetPid}, {filePid}, and {localeCode} were added. Please consider making it possible to translate your tool into various languages! The reserved word {datasetVersion} has been made more flexible.

The bad news is that there are two breaking changes. First, tools must now define a "scope" of either "file" or "dataset" for the manifest to be successfully loaded into Dataverse. Existing tools in a Dataverse installation will be assigned a scope of "file" automatically by a SQL migration script but new installations of Dataverse will need to load an updated manifest file with this new "scope" variable.

Second, file level tools that did not previously define a "contentType" are now required to do so. In previous releases, file level tools that did not define a contentType were automatically given a contentType of "text/tab-separated-values" but now Dataverse will refuse to load the manifest file if contentType is not specified.
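
For illustration, a file level manifest that satisfies both new requirements might look like the sketch below. Only "scope" and "contentType" come from the notes above; the other fields reflect our understanding of the manifest format, and the tool name, description, and toolUrl are placeholders rather than a real tool.

```shell
# Hypothetical helper: print a minimal file level tool manifest.
print_manifest() {
  cat <<'EOF'
{
  "displayName": "Example Tool",
  "description": "Placeholder description.",
  "scope": "file",
  "type": "explore",
  "contentType": "text/tab-separated-values",
  "toolUrl": "https://example.org/tool",
  "toolParameters": {
    "queryParameters": [
      { "fileid": "{fileId}" }
    ]
  }
}
EOF
}

# Hypothetical helper: load the manifest into a local installation.
load_manifest() {
  print_manifest | curl -X POST -H 'Content-type: application/json' \
    --data-binary @- http://localhost:8080/api/admin/externalTools
}

# Example (not run here):
# load_manifest
```

A dataset level tool would use "scope": "dataset" instead; see the Building External Tools section for the authoritative list of fields.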

The Dataverse team has been reaching out to tool makers about these breaking changes and getting various tools working in the https://github.com/IQSS/dataverse-ansible repo. Thank you for your patience as the dust settles around the external tool framework.

For more information, check out the new Building External Tools section of the API Guide.

Complete List of Changes

For the complete list of code changes in this release, see the 4.17 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start

service glassfish stop
remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.17.war

Restart glassfish
Update Citation Metadata Block

wget https://github.com/IQSS/dataverse/releases/download/v4.17/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

If you have any trouble adding an external tool at the dataset level and see warnings about "contenttype" in server.log, it is recommended that you run the following SQL update from pull request #6460:

 ALTER TABLE externaltool ALTER contenttype DROP NOT NULL;

Published by kcondon over 6 years ago

dataverse - 4.16

Dataverse 4.16

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

Release Highlights

Metrics Redesign

The metrics view at both the Dataset and File level has been redesigned. The main driver of this redesign has been the expanded metrics (citations and views) provided through an integration with Make Data Count, but installations that do not adopt Make Data Count will also be able to take advantage of the new metrics panel.

HTML Codebook Export

Users will now be able to download HTML Codebooks as an additional Dataset Export type. This codebook is a more human-readable version of the DDI Codebook 2.5 metadata export. It provides valuable information about the contents and structure of a dataset and will increase reusability of the datasets in Dataverse.

Harvesting Improvements

The Harvesting code will now better handle problematic records during incremental harvests. Fixing this will mean not only fewer manual interventions by installation administrators to keep harvesting running, but it will also mean users can more easily find and access data that is important to their research.

Major Use Cases

Newly-supported use cases in this release include:

As a user, I can view the works that have cited a dataset.
As a user, I can view the downloads and views for a dataset, based on the Make Data Count standard.
As a user, I can export an HTML codebook for a dataset.
As a user, I can expect harvested datasets to be made available more regularly.
As a user, I'll encounter fewer locks as I go through the publishing process.
As an installation administrator, I no longer need to destroy a PID in another system after destroying a dataset in Dataverse.

Notes for Dataverse Installation Administrators

Run ReExportall

We made changes to the citation block in this release that will require installations to run ReExportall as part of the upgrade process. We've included this in the detailed instructions below.

Custom Analytics Code Changes

You should update your custom analytics code to include CDATA sections, inside the script tags, around the javascript code. We have updated the documentation and sample analytics code snippet provided in Installation Guide > Configuration > Web Analytics Code to fix a bug that broke the rendering of the 403 and 500 custom error pages (#5967).

Destroy Updates

Destroying Datasets in Dataverse will now unregister/delete the PID with the PID provider. This eliminates the need for an extra step to "clean up" a PID registration after destroying a Dataset.

Deleting Notifications

In making the fix for #5687 we discovered that notifications created prior to 2018 may have been invalidated. With this release we advise that these older notifications are deleted from the database. The following query can be used for this purpose:

delete from usernotification where date_part('year', senddate) < 2018;
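
Before running the delete, it may be worth checking how many rows it would affect. The sketch below is our own illustration rather than part of the release: it reuses the table and column names from the query above, while the helper name and the psql connection details (database dvndb, user postgres) are assumptions to adapt.

```shell
# Hypothetical helper: print a query counting notifications sent before a given year.
count_old_notifications_sql() {
  printf "select count(*) from usernotification where date_part('year', senddate) < %s;\n" "$1"
}

# Example (not run here); database name and user are assumptions:
# psql -U postgres -d dvndb -c "$(count_old_notifications_sql 2018)"
```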

Lock Improvements

In 4.15 a new lock was added to prevent parallel edits. After seeing that the lock was not being released as expected, which required administrator intervention, we've adjusted this code to release the lock as expected.

New Database Settings

:AllowCors - Allows Cross-Origin Resource Sharing (CORS). By default this setting is absent and Dataverse assumes it to be true.

Notes for Tool Developers and Integrators

OpenAIRE Export Changes

The OpenAIRE metadata export now correctly expresses information about a dataset's Production Place and GeoSpatial Bounding Box. When users add metadata to Dataverse's Production Place and GeoSpatial Bounding Box fields, those fields are now mapped to separate DataCite geoLocation properties.

Metadata about the software name and version used to create a dataset, Software Name and Software Version, are re-mapped from DataCite's more general descriptionType="Methods" property to descriptionType="TechnicalInfo", which was added in a recent version of the DataCite schema in order to improve discoverability of metadata about the software used to create datasets.

Complete List of Changes

For the complete list of code changes in this release, see the 4.16 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation

If this is a new installation, please see our Installation Guide.

Upgrade

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start

service glassfish stop
remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
service glassfish start

Deploy this version.

<glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.16.war

Restart glassfish
Update Citation Metadata Block

curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"

Run ReExportall to update the citations

http://guides.dataverse.org/en/4.16/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
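
As a sketch, the batch re-export described in that guide section can be triggered with a single request to the admin API. The helper names and the SERVER_URL default are assumptions; the endpoint path is the one we understand the guide to document.

```shell
# Base URL of the installation; localhost:8080 is an assumption.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the reExportAll URL.
reexport_all_url() {
  printf '%s/api/admin/metadata/reExportAll\n' "$SERVER_URL"
}

# Hypothetical helper: trigger the re-export of all published datasets.
reexport_all() {
  curl "$(reexport_all_url)"
}

# Example (not run here):
# reexport_all
```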

Published by kcondon almost 7 years ago

dataverse - 4.15.1

This release adds an important Solr optimization, an API for editing variable metadata, and fixes a bug on the dataset page with searching and filtering of tags with spaces.

For the complete list of issues, see the 4.15.1 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation:

If this is a new installation, please see our Installation Guide.

Upgrade:

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.15.1.war
Restart glassfish

Published by djbrooke almost 7 years ago

dataverse - 4.15

Note: There is a stability issue in 4.15 and we recommend waiting for 4.15.1 for any production environments. 4.15.1 will also contain fixes for issue #5972, which provides better filtering and sorting for file tags that have spaces.

Note: PostgreSQL 9.6 is required. Previous versions of PostgreSQL do not support ALTER TABLE ADD COLUMN IF NOT EXISTS which is used in an upgrade script. Newer versions of PostgreSQL such as version 10 have not been tested.

This release adds the ability to filter and sort the files in a dataset, better recognition and categorization of file types, accessibility enhancements, and a new API to load language packs in support of internationalization.

For the complete list of issues, see the 4.15 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation:

If this is a new installation, please see our Installation Guide.

Upgrade:

In an effort to prevent accidental duplicate accounts, user spoofing, or other username-based confusion, this release introduces a database constraint that no longer allows usernames that are exactly the same but use different capitalization, e.g. Bob11 vs. bob11. You may need to do some cleanup before upgrading to deal with existing usernames like this.

To check whether you have any usernames like this that need cleaning up, run the case insensitive duplicate queries from our Useful Queries doc.
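
If the Useful Queries doc is not at hand, a query along the following lines should surface the same kind of collision. It is our own approximation, not the query from that doc; it assumes usernames live in the useridentifier column of the authenticateduser table, and the psql connection details (database dvndb, user postgres) are assumptions as well.

```shell
# Hypothetical helper: print a query listing usernames that collide when
# compared case-insensitively (e.g. Bob11 vs. bob11).
duplicate_usernames_sql() {
  cat <<'EOF'
select lower(useridentifier) as username, count(*) as accounts
from authenticateduser
group by lower(useridentifier)
having count(*) > 1;
EOF
}

# Example (not run here); database name and user are assumptions:
# duplicate_usernames_sql | psql -U postgres -d dvndb
```

An empty result means there is nothing to clean up before the upgrade.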

Once you identify the usernames that need cleaning up, you should use either Merge User Accounts (if it’s the same person) or Change User Identifier (if they are different people). After the cleanup you can safely upgrade without issue.

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
A new version of file type detection software, Jhove, is added in this release. It requires an update of its configuration file: jhove.conf. Download the new configuration file from the Dataverse release page on GitHub, or from the source tree at https://raw.githubusercontent.com/IQSS/dataverse/master/conf/jhove/jhove.conf , and place it in the config directory of your Glassfish domain. For example: /usr/local/glassfish4/glassfish/domains/domain1/config/jhove.conf.

Important: If your Glassfish installation directory is different from /usr/local/glassfish4, make sure to edit the header of the config file, to reflect the correct location.
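
The download and placement could be sketched as below. DOMAIN_DIR and the helper names are assumptions; the default matches the example path given above.

```shell
# Glassfish domain directory; this default matches the example path above.
DOMAIN_DIR="${DOMAIN_DIR:-/usr/local/glassfish4/glassfish/domains/domain1}"
JHOVE_CONF_URL="https://raw.githubusercontent.com/IQSS/dataverse/master/conf/jhove/jhove.conf"

# Hypothetical helper: where the configuration file should end up.
jhove_conf_target() {
  printf '%s/config/jhove.conf\n' "$DOMAIN_DIR"
}

# Hypothetical helper: download the file straight into place.
install_jhove_conf() {
  wget -O "$(jhove_conf_target)" "$JHOVE_CONF_URL"
}

# Example (not run here):
# install_jhove_conf
```

This sketch does not edit the header of the config file, so installations outside /usr/local/glassfish4 still need the manual adjustment described above.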

Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.15.war
Restart glassfish
Replace Solr schema.xml to allow sorting and filtering on the file page
- stop solr instance (service solr stop, depending on solr installation/OS, see http://guides.dataverse.org/en/4.15/installation/prerequisites.html#solr-init-script)
- replace schema.xml
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-7.3.1/server/solr/collection1/conf
cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-7.3.1/server/solr/collection1/conf
- start solr instance (service solr start, depending on solr/OS)
Kick off in place reindex
http://guides.dataverse.org/en/4.15/admin/solr-search-index.html#reindex-in-place
curl -X DELETE http://localhost:8080/api/admin/index/timestamps
curl http://localhost:8080/api/admin/index/continue
Redetect file types using the new Redetect File Types API:

https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/api/native-api.rst#id31
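
As a sketch of how the redetect call might be wrapped, the helpers below default to a dry run so that nothing is saved until you ask for it. The helper names, SERVER_URL, and API_TOKEN are assumptions; the endpoint and its dryRun parameter are as we understand them from the Native API guide linked above.

```shell
# Base URL of the installation; localhost:8080 is an assumption.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the redetect URL for a file database id.
# The second argument is the dryRun flag and defaults to "true".
redetect_url() {
  printf '%s/api/files/%s/redetect?dryRun=%s\n' "$SERVER_URL" "$1" "${2:-true}"
}

# Hypothetical helper: call the API with the token held in $API_TOKEN.
redetect_file() {
  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$(redetect_url "$1" "$2")"
}

# Examples (not run here):
# redetect_file 24          # report the detected type without saving
# redetect_file 24 false    # save the newly detected type
```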

Published by djbrooke about 7 years ago

dataverse - 4.14

This release adds OpenAIRE-compliant exports, an option on the Dashboard for superusers to move datasets, and expanded analytics options.

For the complete list of issues, see the 4.14 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation:

If this is a new installation, please see our Installation Guide.

Upgrade:

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.14.war
Restart glassfish

Published by kcondon about 7 years ago

dataverse - 4.13

This release adds a file tree view at the Dataset level and adds a new API for file level metadata edits. It also reverts an API change from the previous release.

For the complete list of issues, see the 4.13 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation:

If this is a new installation, please see our Installation Guide.

Upgrade:

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
Upgrade your version of PostgreSQL to at least 9.3. Version 9.6 is recommended.
NOTE for Dataverse Installations running OpenStack Swift:

All Swift properties have now been migrated to domain.xml, so you no longer need to maintain a separate swift.properties file; this offers better governability and performance. Furthermore, the Swift credential's password is now stored using create-password-alias, which encrypts the password so that it does not appear in plain text in domain.xml.

In order to migrate to these new configuration settings, please visit http://guides.dataverse.org/en/4.13/installation/config.html#swift-storage

Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.13.war
Restart glassfish

Published by djbrooke about 7 years ago

dataverse - 4.12

Note: Before using the User Management APIs on Shibboleth or OAuth users, we recommend upgrading to the 4.14 release or later, which will contain the fix for issue #5811. If you have renamed users and are experiencing issues, please contact support@dataverse.org.

This release adds User Management APIs, the ability to edit the hierarchy of files in a dataset, backend support for Make Data Count, and guidance on best practices for making datasets appear in search engines.

For the complete list of issues, see the 4.12 milestone in GitHub.

For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.

Installation:

If this is a new installation, please see our Installation Guide.

Upgrade:

Undeploy the previous version.

<glassfish install path>/glassfish4/bin/asadmin list-applications
<glassfish install path>/glassfish4/bin/asadmin undeploy dataverse

Stop glassfish and remove the generated directory, start
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
Upgrade your version of PostgreSQL to at least 9.3. Version 9.6 is recommended.
Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.12.war
Restart glassfish
Replace Solr schema.xml
- stop solr instance (service solr stop, depending on solr installation/OS, see http://guides.dataverse.org/en/4.12/installation/prerequisites.html#solr-init-script)
- replace schema.xml
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-7.3.0/server/solr/collection1/conf
- start solr instance (service solr start, depending on solr/OS)
Kick off in place reindex
http://guides.dataverse.org/en/4.12/admin/solr-search-index.html#reindex-in-place
curl -X DELETE http://localhost:8080/api/admin/index/timestamps
curl http://localhost:8080/api/admin/index/continue
If you are using Web Analytics, please review your "analytics-code.html" fragment (described in Installation Guide > Configuration > Web Analytics Code), and see if any of the script lines contain an empty "async" attribute. In the documentation provided by Google, its value is left blank (as in
