Recent Releases of dataverse
dataverse - v6.7.1
Dataverse 6.7.1
This is a bug fix release for Dataverse 6.7 that fixes a performance problem when loading the "create dataset" and "edit dataset" pages. For details see #11700.
Complete List of Changes
For the complete list of code changes in this release, see the 6.7.1 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Upgrade Instructions
You only need to follow the redeployment instructions below if you had deployed the originally released dataverse-6.7.war in the few days before it was removed from the release page. If you followed the 6.7 instructions in their current form, you should already be running dataverse-6.7.1 and do not need to do anything else.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.7.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. List deployed applications
shell
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
shell
$PAYARA/bin/asadmin undeploy dataverse-6.7
3. Download and deploy this version
shell
wget https://github.com/IQSS/dataverse/releases/download/v6.7/dataverse-6.7.1.war
$PAYARA/bin/asadmin deploy dataverse-6.7.1.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
Published by ofahimIQSS 11 months ago
dataverse - v6.7
Dataverse 6.7
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.7 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights
Highlights for Dataverse 6.7 include:
- Keeping S3 storage working after December 2025
- Limiting files per dataset
- Curation status label enhancements
- Configurable search services
- Linking drafts
- Dataset metadata exports from drafts
- API for switching Datasets to DOIs, for example
- TK Labels
- Model Context Protocol (MCP) server
- A new AI Guide
- OJS 3 is now a supported integration
- Tagged Docker images
- Rate limiting statistics API
- Infrastructure: Payara upgraded to 6.2025.3
- Security fixes
Features Added
Keep S3 Storage Working After December 2025
To support S3 storage, Dataverse uses the AWS SDK. We have upgraded to v2 of this SDK because v1 reaches End Of Life (EOL) in December 2025.
As part of the upgrade, the payload-signing setting for S3 stores (dataverse.files.<id>.payload-signing) has been removed because it is no longer necessary. With the updated SDK, a payload signature will automatically be sent when required (and not sent when not required).
Dataverse developers should note that LocalStack is used to test S3 and older versions appear to be incompatible. The development environment has been upgraded from LocalStack v2.3.2 to v4.2.0, which seems to work fine.
See also #11073 and #11360.
Limiting Files Per Dataset
It's now possible to set a limit on the number of files that can be uploaded to a dataset. Limits can be set globally, per collection, or per dataset.
See also the guides, #11275, and #11359.
Curation Status Label Enhancements
The External/Curation Status Label mechanism has been enhanced:
- adding tracking of who creates the status label and when
- keeping a history of past statuses
- updating the CSV report to include the creation time and assigner of a status
- updating the getCurationStatus API call to return a JSON object for the status with label, assigner, and create time
- adding an includeHistory query param for these API calls to allow seeing prior statuses
- adding a facet to allow filtering by curation status (for users able to set them)
- adding the creation time to Solr as a pdate to support search by time period, e.g. current status set prior to a given date
- standardizing the language around "curation status" vs "external status"
- adding a "curation-status" class to displayed labels to allow styling
- adding a dataverse.ui.show-curation-status-to-all feature flag that allows users who can see a draft but not publish it to also view the curation status
Due to changes in the Solr schema (the addition of fields "curationStatus" and "curationStatusCreateTime"), updating the Solr schema and reindexing is required as described below in upgrade instructions. Background reindexing should be OK. See also #9247 and #11268.
Configurable Search Services
Dataverse now has an experimental capability to dynamically add and configure new search engines. The current Dataverse user interface can be configured to use a specified search engine instead of the built-in Solr search. The Search API now supports an optional searchService query parameter that allows using any configured search engine. An additional /api/search/services endpoint allows discovery of the installed services.
In addition to two trivial example services designed for testing, Dataverse ships with two search engine classes that support calling an externally hosted search service (via HTTP GET or POST). These classes rely on the internal Solr search to perform access control and to format the final results, simplifying development of such an external engine.
Details about the new functionality are described in the guides. See also #11281.
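As a rough sketch of how the new query parameter and discovery endpoint might be used from the command line, the helper below only builds the request URL. The function name search_url, the service name myExternalService, and the localhost base URL are assumptions for illustration; only the searchService parameter and the /api/search/services endpoint come from the notes above.

```shell
# Assumed base URL; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a Search API URL that routes the query to a named search service.
# $1 = query (assumed to be URL-safe already), $2 = configured service name
search_url() {
  printf '%s/api/search?q=%s&searchService=%s\n' "$SERVER_URL" "$1" "$2"
}

# Usage sketch (not executed here):
#   curl "$SERVER_URL/api/search/services"          # discover installed services
#   curl "$(search_url climate myExternalService)"  # search with one of them
```

Omitting searchService should fall back to whichever engine the installation treats as its default.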
Linking Drafts
It is now possible to link draft datasets to other Dataverse collections. As usual, the datasets will only become publicly visible in the linked collection(s) after they have been published. To publish a linked dataset, your account must have the "Publish Dataset" permission for the Dataverse collection in which the dataset was originally created. Permissions in the linked Dataverse collections do not apply. See also #10134.
Dataset Metadata Can Be Exported From Draft Datasets (via API)
In previous versions of Dataverse, it was only possible to export metadata from published datasets. It is now possible to export metadata from draft datasets via API as long as you supply an API token that has access to the draft. As before, when exporting metadata from published datasets, only the latest published version is supported. Internal exporters have been updated to work with drafts but external exporters might need to be updated (Croissant definitely does). See "upgrade instructions" below for details. See the guides, #11305, and #11398.
API for Switching Datasets to DOIs, for Example
In some cases, you might want draft datasets to begin their life with zero-cost PIDs such as Permalinks and later decide to give certain datasets a DOI. To support use cases like this, a new API for persistent identifier reconciliation has been added.
Here's how it works. An unpublished dataset can be updated with a new pidProvider. If a persistent identifier was already registered when the dataset was registered, this is undone and the new provider (if changed in the meantime) is used. Note that this change does not affect the storage repository where the old identifier is still used. See the guides, #10501, and #10567.
TK Labels
New API calls to find projects at https://localcontextshub.org associated with a dataset have been added. This supports integration via an external vocabulary script that allows users to associate such a project with their dataset and display the associated Notices and Traditional Knowledge (TK) Labels.
Connecting to LocalContexts requires a LocalContexts API Key. Both the production and sandbox (test) LocalContexts servers are supported.
See also the guides and #11294.
Model Context Protocol (MCP) Server for Dataverse
Model Context Protocol (MCP) is a standard for AI Agents to communicate with tools and services, announced in November 2024.
An MCP server for Dataverse has been deployed to https://mcp.dataverse.org, powered by the code at https://github.com/gdcc/mcp-dataverse written by Vyacheslav Tykhonov.
All are welcome to experiment with the MCP Server and give feedback in the thread on Google Group and Zulip. See also #11474.
AI Guide
Information about various Dataverse-related AI efforts has been documented in a new AI Guide. See also #11540 and #11541.
OJS 3 is Supported
OJS 3 (version 3.3 and higher) is now supported as an integration with Dataverse. See the guides and #11518 for details.
Tagged Docker Images
Container image management has been enhanced to provide better support for multiple Dataverse releases and improved maintenance workflows.
Versioned Image Tags: Application ("dataverse") and Config Baker images on Docker Hub now have versioned tags, supporting the latest three Dataverse software releases. This enables users to pin to specific versions (e.g. 6.7), providing better stability for production deployments. Previously, the "alpha" tag could be used, but it was always overwritten by the latest release. Now, you can choose the 6.7 tag, for example, to stay on that version. Please note that the "alpha" tag should no longer be used and will likely be deleted. The equivalent is the new "latest" tag.
Backport Support: Application and Config Baker image builds now support including code backports for past releases, enabling the delivery of security fixes and critical updates to older (supported) versions.
Enhanced Documentation: Container image documentation has been updated to reflect the new versioning scheme and maintenance processes.
Config Baker Base Image Change: The Config Baker image has been migrated from Alpine to Ubuntu as its base operating system, aligning with other container images in the project for consistency and better compatibility. The past releases have not been migrated, only future releases (6.7+) will use Ubuntu.
Workflow Responsibility Split: GitHub Actions workflows for containers have been reorganized with a clear separation of concerns:
- container_maintenance.yml handles all release-time and maintenance activities
- Other workflows focus solely on preview images for development merges and pull requests
These improvements provide more robust container image lifecycle management, better security update delivery, and clearer operational procedures for both development and production environments. See also the Container Guide, #10618, and #11477.
Rate Limiting Statistics API
A new Rate Limiting Statistics API gives insight into the current state of rate limiting such as the number of users being limited and the number of available bucket tokens for a command.
See also the guides, #11413, and #11424.
Payara 6.2025.3
The recommended Payara version has been updated to Payara-6.2025.3. See the upgrade instructions below and #11357.
Rclone Support
Rclone ("rsync for cloud storage") is a command-line program to sync files and directories to and from different cloud storage providers. As of version 1.70 Rclone supports Dataverse. See the announcement, the guides, #11608, and #11609.
Unique Filenames for Zip Downloads
The Data Access APIs that generate multi-file zipped bundles will offer file name suggestions based on the persistent identifiers (for example, doi-10.70122-fk2-xxyyzz.zip), instead of the fixed filename dataverse_files.zip as in prior versions. This means you'll see unique names in your "downloads" folder. See the guides, #9620, and #11466.
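To illustrate the naming pattern, here is a minimal sketch that derives a zip name from a persistent identifier. The transformation rule (lowercase the PID and replace ":" and "/" with "-") is an assumption inferred from the single example above, not a statement of how Dataverse implements it, and zip_name_for_pid is a hypothetical helper.

```shell
# Hypothetical helper: guess the suggested zip name for a dataset PID.
# Assumed rule: lowercase the PID, then replace ':' and '/' with '-'.
zip_name_for_pid() {
  printf '%s.zip\n' "$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's|[:/]|-|g')"
}

zip_name_for_pid "doi:10.70122/FK2/XXYYZZ"   # prints doi-10.70122-fk2-xxyyzz.zip
```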
New Metadata Field Type: String
The "string" type has been added as a new field type for metadata fields.
In contrast to "text" fields, "string" fields are stored and indexed exactly as provided, without any text analysis or transformations.
This field type is suitable for fields like IDs (e.g. ORCIDs) or enums, where exact matches are required when searching.
See also #11147 and #11321.
Tabular Tags Can Now Be Replaced
Previously the API POST /files/{id}/metadata/tabularTags could only add new tags to the tabular tags list. Now with the query parameter ?replace=true the list of tags will be replaced.
See also the guides, #11292, and #11359.
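The difference between adding and replacing comes down to the query string. The sketch below builds the endpoint URL either way; file_list_url is a hypothetical helper, the localhost base URL and file id 42 are placeholders, and the JSON body shown in the usage comment is an assumption that should be checked against the guides.

```shell
# Assumed base URL; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# $1 = numeric file id
# $2 = list name in the path (tabularTags here)
# $3 = "true" to replace the existing list; anything else keeps the add-only behavior
file_list_url() {
  if [ "$3" = "true" ]; then
    printf '%s/api/files/%s/metadata/%s?replace=true\n' "$SERVER_URL" "$1" "$2"
  else
    printf '%s/api/files/%s/metadata/%s\n' "$SERVER_URL" "$1" "$2"
  fi
}

# Usage sketch (not executed here; body shape is an assumption):
#   curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
#     -d '{"tabularTags": ["Survey"]}' "$(file_list_url 42 tabularTags true)"
```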
Make Data Count Improvements
Counter Processor, used to power Make Data Count metrics in Dataverse, is now maintained in the https://github.com/gdcc/counter-processor repository. Multiple improvements to efficiency and scalability have been made. The example counter_daily.sh and counter_weekly.sh scripts that automate using Counter Processor, available from the MDC section of the Dataverse Guides, have been updated to work with the latest Counter Processor release and also have minor improvements. See also #11489.
Improved Navigation for Guides
Navigation across the guides has been improved. You can now click in the upper left to go "home". The navbar has been simplified with fewer links. The bottom of every page now has "Next" and "Previous" links. A "Source" link at the bottom has also been added. See also #10942.
Video Subtitles (vtt Files)
Video subtitles (vtt files) are now supported and indexed using full text indexing, if configured.
All new files uploaded with a .vtt extension will be assigned the content type "text/vtt" and shown as "Web Video Text Tracks". See upgrade instructions below to convert existing files.
The upgrade instructions below also explain how to upgrade to v1.5 of the Dataverse Previewers, which includes an updated video previewer that supports subtitles. The new previewer version presents vtt files as subtitles for videos, and the naming convention is <video-basename>.<language-tag>.vtt. The previewer does not rely on the content type. A proper content type may hint users to ask permission for the subtitles together with a video.
See also #11041.
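The naming convention can be sketched as a small helper that derives the expected subtitle file name for a video. It assumes that "video-basename" means the video file name without its final extension; subtitle_name and the sample file names are hypothetical.

```shell
# $1 = video file name, $2 = language tag (e.g. en, fr)
# Assumption: <video-basename> is the file name with its last extension removed.
subtitle_name() {
  printf '%s.%s.vtt\n' "${1%.*}" "$2"
}

subtitle_name "lecture01.mp4" "en"   # prints lecture01.en.vtt
```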
Dataset Types Can Set Available Licenses
Licenses (e.g. "MIT") can now be linked to dataset types (e.g. "software") using new superuser APIs. The create Dataset Type APIs have been extended to allow you to set metadata blocks and/or licenses on the creation of a Dataset Type. (You can change both later.)
If a license is not available for a given Dataset Type then the Create Dataset API will prevent that license from being applied to the dataset. Also, the UI will only show those licenses that are available to the Dataset Type.
For more information, see the guides (overview, new APIs), #10520, and #11385.
Loading Metadata Blocks in Docker
The tutorial on running Dataverse in Docker has been updated to include how to load a metadata block and then update Solr to know about the new fields. See also #11004 and #11204.
Solr Indexing Speed Improved
The performance of Solr indexing has been significantly improved, particularly for datasets with many files.
A new dataverse.solr.min-files-to-use-proxy MicroProfile setting can be used to further improve performance and lower memory requirements for datasets with many files (e.g. 500+). It defaults to Integer.MAX_VALUE, which disables the new functionality. See also #11374.
Improved Efficiency for Per-Request Filters
This release improves the performance of Dataverse's per-request handling of CORS Headers and API calls.
It adds new JVM options/Microprofile settings (starting with dataverse.cors and dataverse.api) replacing the now deprecated database settings (starting with :BlockedApi and :AllowCors). (See "new settings" and "deprecated settings" below for a full list.)
Additional changes:
- CORS headers can now be configured with a list of desired origins, methods, and allowed and exposed headers.
- An X-Dataverse-unblock-key header has been added that can be used instead of the less secure unblock-key query parameter when the :BlockedApiPolicy is set to unblock-key.
- Warnings have been added to the log if the Blocked API settings are misconfigured or if the key is weak (when the unblock-key policy is used).
- The new dataverse.api.blocked.key can be configured using Payara password aliases or other secure storage options.
See also the guides and #11454.
New OIDC Feature Flag
A new feature flag called API_BEARER_AUTH_USE_BUILTIN_USER_ON_ID_MATCH has been introduced, which allows the use of a built-in user account when an identity match is found during OIDC API bearer token authentication.
This feature enables automatic association of an incoming Identity Provider (IdP) identity with an existing built-in user account, bypassing the need for additional user registration steps.
See the guides, #11193, #11197, and #11314.
dataverse-metadata-crawler
The dataverse-metadata-crawler was added to the guides. See #11581.
Bugs Fixed
Reduced Chance of Losing Metadata Edits
Changes were made to the "edit dataset metadata" page to reduce the chance of losing metadata edits.
The remedy for the problem consists of two parts:
- Do not show the "Host Dataverse" field when there is nothing to choose. This mimics the behaviour for templates.
- Previously, if you accidentally started typing in the "Host Dataverse" field, undid the change with backspace, filled in the other metadata fields, and saved the draft, the page would get blocked due to an exception, and reloading the page would erase all your input. The exception (caused by an invalid argument) is remedied by returning the currently selected value.
See also #11301.
Improved "Role Has Already Been Granted" Message
A simple "role has already been granted" message is now given, fixing a bug where "dataset" was incorrectly indicated instead of "collection". See also #11191 and #11362.
NcML Previewer Bug Fix
Dataverse Previewers v1.4 contains a bug in the NcML previewer that prevents it from working with signed URLs. See #11252 for screenshots.
This has been fixed in the "betatest" and v1.5 versions of the previewer. See also #11252 and #11311. Upgrading to v1.5 of all previewers is recommended in the upgrade instructions below.
Search API Bug Fix
The Search API now returns all type totals (Dataverses, Datasets, and Files) regardless of the list of types requested. Previously, types that were not requested were returned with a total count of 0. For example, &type=dataverse&type=dataset would result in "Files": 0 since type=file was not requested. Now all counts show the correct totals. See also #11280.
Other Bug Fixes
- Deeply nested compound fields are not (yet) supported by Dataverse but the Search API now properly avoids returning duplicate values for them. See #11172.
- An issue causing more than one edit of a versionNote to fail, when done without a page refresh, has been fixed. See #11394.
- The deaccessionedReason was missing in the fileDifferenceSummary json object returned by API GET "$SERVER_URL/api/files/{ID}/versionDifferences". See #11438.
- Memory usage has been reduced and potential memory leaks closed in the metadata exporters. See #11417.
API Updates
Update File Metadata API
A new API endpoint has been added to allow updating file metadata for one or more files in a dataset. See the Native API documentation for details on usage and #11271.
Extend Restrict API to Include New Attributes
The "restrict" API only allowed for a boolean to update the restricted attribute of a file. For backward compatibility, this is still supported, but now a richer JSON object can be passed instead. The JSON object allows for the required restrict flag as well as optional attributes: enableAccessRequest and termsOfAccess.
If enableAccessRequest is false then the termsOfAccess text must also be included.
See the guides, #11299, and #11349.
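A minimal sketch of building the richer JSON body follows. The attribute names restrict, enableAccessRequest, and termsOfAccess come from the notes above; the helper restrict_payload is hypothetical, it assumes the terms text contains no quotes or backslashes, and the endpoint in the usage comment is an assumption to verify against the guides.

```shell
# $1 = restrict (true|false), $2 = enableAccessRequest (true|false),
# $3 = termsOfAccess text (optional; assumed free of quotes and backslashes)
restrict_payload() {
  # Mirror the documented rule: no access requests means terms must be given.
  if [ "$2" = "false" ] && [ -z "$3" ]; then
    echo "termsOfAccess is required when enableAccessRequest is false" >&2
    return 1
  fi
  if [ -n "$3" ]; then
    printf '{"restrict": %s, "enableAccessRequest": %s, "termsOfAccess": "%s"}\n' "$1" "$2" "$3"
  else
    printf '{"restrict": %s, "enableAccessRequest": %s}\n' "$1" "$2"
  fi
}

# Usage sketch (not executed here; endpoint path is an assumption):
#   curl -H "X-Dataverse-key:$API_TOKEN" -X PUT -H "Content-Type: application/json" \
#     -d "$(restrict_payload true false 'Contact the authors for access.')" \
#     "$SERVER_URL/api/files/$FILE_ID/restrict"
```

Checking the rule on the client side is only a convenience; the server remains the authority on what it accepts.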
Categories Can Now Be Replaced
Previously the API POST /files/{id}/metadata/categories could only add new categories to the categories list. Now with the query parameter ?replace=true the list of categories will be replaced.
See also the guides, #11401, and #11359.
Application Terms of Use Available via API
It's now possible to retrieve the Application Terms of Use (called General Terms of Use in the UI) via API. These are the terms users agree to when creating an account. See the guides, #11415 and #11422.
dvObject and type Fields Added to Featured Items
Dataverse Featured Items can now be linked to Dataverses, Datasets, or Datafiles.
Pre-existing featured items as well as new items without dvObjects will be defaulted to type=custom. See also #11414.
Edit Dataset Metadata: Removing Fields
The "edit dataset metadata" endpoint now allows removing fields (by sending empty values) as long as they are not required by the dataset. See also #11243.
Edit Dataset Metadata: Prevent Inconsistencies
A new optional sourceInternalVersionNumber query parameter has been added. It prevents inconsistencies by managing updates that may occur from other users while a dataset is being edited. See also #11243.
api/roles/userSelectable
A new endpoint (api/roles/userSelectable) has been implemented, which returns the appropriate roles that the calling user can use as filters when searching within their data. See #11434.
Security Updates
This release contains important security updates. If you are not receiving security notices, please sign up by following the steps in the guides.
Updates for Documentation Writers
Sphinx Upgraded
Sphinx has been upgraded to 7.4.0 and new dependencies have been added, including semver. Please re-run the pip install -r requirements.txt setup step to upgrade your environment. Otherwise you might see an error like ModuleNotFoundError: No module named 'semver'.
Updates for Developers
Development of Dataverse on Windows
Development of Dataverse on Windows has been confirmed to work as long as you use WSL rather than cmd.exe. See the updated quickstart, the rewritten page on Windows, #10606, and #11583.
Keycloak SPI for Built-In Users
A Keycloak SPI, builtin-users-spi, has been implemented that allows the use of Keycloak on instances with built-in accounts for OIDC authentication, enabling the use of the SPA on those instances.
Looking ahead, this authenticator SPI could also support mapping Shibboleth users coming in through Keycloak to existing Shib users without changing the provider in the Dataverse database. However, this would require changes to the storage provider to support more than just built-in users.
The SPI code is available in the Dataverse code repository (conf/keycloak/builtin-users-spi).
File Previews Available in Dev Environment, More Docs
In Dataverse 6.5 File Previewers were enabled in the "demo or eval" containerized (Dockerized) environment (#11025). These previewers are now available in the development environment as well and documentation has been added explaining how to configure them. See also #10506 and #11181.
XML Parsers
The configuration of XML parsers used in Dataverse has been centralized and unused functionality has been turned off to enhance security. See #11619.
End-Of-Life (EOL) Announcements
Whole Tale EOL
Unfortunately, the Whole Tale project is no longer active and has been removed from the list of integrations in the Admin Guide. See #11497.
New Settings
The following settings have been added:
- dataverse.api.blocked.policy: Policy for blocking API endpoints
- dataverse.api.blocked.endpoints: List of API endpoints to be blocked (comma-separated)
- dataverse.api.blocked.key: Key for unblocking API endpoints
- dataverse.bagit.sourceorg.name
- dataverse.cors.origin: Allowed origins for CORS requests
- dataverse.cors.methods: Allowed HTTP methods for CORS requests
- dataverse.cors.headers.allow: Allowed headers for CORS requests
- dataverse.cors.headers.expose: Headers to expose in CORS responses
- dataverse.files.hide-schema-dot-org-download-urls: now configurable via MicroProfile Config, see #11482
- dataverse.localcontexts.url
- dataverse.localcontexts.api-key
- dataverse.search.services.directory
- dataverse.search.default-service
- dataverse.solr.min-files-to-use-proxy
- dataverse.ui.show-curation-status-to-all
- :GetExternalSearchUrl
- :GetExternalSearchName
- :PostExternalSearchUrl
- :PostExternalSearchName
Deprecated Settings
- bagit.SourceOrganization entry in Bundle.properties
- :AllowCors
- :BlockedApiPolicy
- :BlockedApiEndpoints
- :BlockedApiKey
Removed Settings
dataverse.files.<id>.payload-signing: See #11360
Backward Incompatible Changes
Generally speaking, see the API Changelog for a list of backward-incompatible API changes.
showmydata removed from Search API
An undocumented Search API parameter called "showmydata" has been removed. It was never exercised by tests and is believed to be unused. API users should use the MyData API instead. See #11287 and #11375.
curationStatus API
/api/datasets/{id}/curationStatus API now includes a JSON object with curation label, createtime, and assigner rather than a string label and it supports a new boolean includeHistory parameter (default false) that returns a JSON array of statuses. See #11268.
listCurationStates API
/api/datasets/{id}/listCurationStates includes new columns "Status Set Time" and "Status Set By" listing the time the current status was applied and by whom. It also supports the boolean includeHistory parameter. See #11268.
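Clients that previously parsed a plain string from curationStatus will need to handle a JSON object (or an array when history is requested). The sketch below only builds the request URL; curation_status_url, the localhost base URL, and dataset id 24 are assumptions for illustration.

```shell
# Assumed base URL; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# $1 = dataset id, $2 = includeHistory (true|false; defaults to false)
curation_status_url() {
  printf '%s/api/datasets/%s/curationStatus?includeHistory=%s\n' \
    "$SERVER_URL" "$1" "${2:-false}"
}

# Usage sketch (not executed here):
#   curl -H "X-Dataverse-key:$API_TOKEN" "$(curation_status_url 24 true)"
```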
XML serialization of empty elements
Due to updates in libraries used by Dataverse, XML serialization may have changed slightly with respect to whether self-closing tags are used for empty elements. This primarily affects XML-based metadata exports. The XML structure of the export itself has not changed, so this is only an incompatibility if you are not using an XML parser. See #11360.
Complete List of Changes
For the complete list of code changes in this release, see the 6.7 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.6.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. List deployed applications
shell
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
shell
$PAYARA/bin/asadmin undeploy dataverse-6.6
3. Stop Payara
shell
sudo service payara stop
4. Upgrade to Payara-6.2025.3
The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.6.)
Move the current Payara directory out of the way:
shell
mv $PAYARA $PAYARA.6.2025.2
Download the new Payara version 6.2025.3 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.3/payara-6.2025.3.zip), and unzip it in its place:
shell
cd /usr/local
unzip payara-6.2025.3.zip
Replace the brand new payara6/glassfish/domains/domain1 with your old, preserved domain1:
shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.6.2025.2/glassfish/domains/domain1 payara6/glassfish/domains/
5. Download and deploy this version
shell
wget https://github.com/IQSS/dataverse/releases/download/v6.7/dataverse-6.7.war
$PAYARA/bin/asadmin deploy dataverse-6.7.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
6. For installations with internationalization or text customizations:
Please remember to update translations via Dataverse language packs.
If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.7/src/main/java/propertyFiles.
7. Restart Payara
shell
sudo service payara stop
sudo service payara start
8. If you have enabled the Croissant exporter, update it and run reExportAll to update dataset metadata exports
Since Dataverse 6.6 was released on 2025-03-18, two versions of the Croissant exporter have been released. You are encouraged to upgrade to the latest version, which is 0.1.5.
Under "installation" at the README at https://github.com/gdcc/exporter-croissant you'll find instructions about upgrading the Croissant exporter. In the same repo you can find a changelog if you are curious about what has changed.
Afterwards, we recommend reexporting all dataset metadata. (Reexporting just a single export format, like Croissant, is not supported.) Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.
shell
curl http://localhost:8080/api/admin/metadata/reExportAll
9. Archival bags
If you are using archival bags, be sure that the dataverse.bagit.sourceorg.name JVM option is set.
Archival Bags now use the JVM option dataverse.bagit.sourceorg.name in generating the bag-info.txt file's "Internal-Sender-Identifier" (in addition to its use for "Source-Organization") rather than pulling the value from a deprecated bagit.SourceOrganization entry in Bundle.properties ("Internal-Sender-Identifier" is generated by appending " Catalog" in both cases). Sites using archival bags would not see a change if these settings were already using the same value. See #10680 and #11416.
10. API Filters
Per-request filtering has been improved. Migrate to the new settings as explained below as the old settings have been deprecated.
The deprecated database settings will continue to work in this version. To use the new settings (which are more efficient), follow the steps below.
If :AllowCors is not set or is true:
shell
bin/asadmin create-jvm-options -Ddataverse.cors.origin=*
Optionally, set origin to a list of hosts and/or set other CORS JvmSettings.
Your currently blocked API endpoints can be found at http://localhost:8080/api/admin/settings/:BlockedApiEndpoints
Copy them into the new setting with the following command. As with the deprecated setting, the endpoints should be comma-separated.
shell
bin/asadmin create-jvm-options '-Ddataverse.api.blocked.endpoints=<current :BlockedApiEndpoints>'
If :BlockedApiPolicy is set and is not 'drop'
shell
bin/asadmin create-jvm-options '-Ddataverse.api.blocked.policy=<current :BlockedApiPolicy>'
If :BlockedApiPolicy is 'unblock-key' and :BlockedApiKey is set
shell
echo "API_BLOCKED_KEY_ALIAS=<value of :BlockedApiKey>" > /tmp/dataverse.api.blocked.key.txt
shell
sudo -u dataverse /usr/local/payara6/bin/asadmin create-password-alias --passwordfile /tmp/dataverse.api.blocked.key.txt
When you are prompted "Enter the value for the aliasname operand", enter api_blocked_key_alias
You should see "Command create-password-alias executed successfully."
shell
bin/asadmin create-jvm-options '-Ddataverse.api.blocked.key=${ALIAS=api_blocked_key_alias}'
Restart Payara:
shell
service payara restart
Check server.log to verify that your new settings are in effect.
Cleanup: delete deprecated settings:
shell
curl -X DELETE http://localhost:8080/api/admin/settings/:AllowCors
curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiEndpoints
curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiPolicy
curl -X DELETE http://localhost:8080/api/admin/settings/:BlockedApiKey
11. Upgrade to Dataverse Previewers v1.5
Dataverse Previewers has been upgraded to v1.5. See the announcement for upgrade instructions.
12. Re-detect video subtitle (vtt) files
Existing files with extension ".vtt" will keep the content type application/octet-stream presented as "Unknown".
The following query shows the number of files per extension with an "Unknown" content type:
sql
SELECT substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(*) as count
FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id
WHERE f.contenttype = 'application/octet-stream'
GROUP BY extension;
If vtt does not appear in the result, you are done. Otherwise, you may want to update the content type for existing files and reindex those datasets.
First figure out which datasets would need reindexing:
sql
select distinct
o.protocol, o.authority, o.identifier,
v.versionnumber, v.minorversionnumber, v.versionstate
from datafile f
left join filemetadata m on f.id = m.datafile_id
left join datasetversion v on v.id = m.datasetversion_id
left join dvobject o on o.id = v.dataset_id
WHERE contenttype = 'application/octet-stream'
AND 'vtt' = substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2))
;
Then update the content type for the files:
sql
UPDATE datafile SET contenttype = 'text/vtt' WHERE id IN (
SELECT datafile_id FROM filemetadata m
WHERE contenttype = 'application/octet-stream'
AND 'vtt' = substring(m.label from (length(label) - strpos(reverse(m.label), '.') + 2))
);
The vtt files will be reindexed in a step below.
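The substring(... reverse ... strpos ...) expression used in the queries above returns whatever follows the last dot in the file label, and an empty string when the label has no dot. The shell function below is a hypothetical mirror of that logic (the function name and sample labels are made up), which may help when checking by hand which labels the queries would match.

```shell
# Mirror of the SQL expression: return the text after the last dot in a
# file label, or an empty string when the label contains no dot.
label_extension() {
  case "$1" in
    *.*) printf '%s\n' "${1##*.}" ;;
    *)   printf '\n' ;;
  esac
}

label_extension "interview.en.vtt"   # prints: vtt
label_extension "README"             # prints an empty line
```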
13. Update Solr schema and reindex
Due to changes in the Solr schema (the addition of fields "curationStatus" and "curationStatusCreateTime"), updating the Solr schema and reindexing is required.
Download the updated schema.xml file:
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.7/conf/solr/schema.xml
cp schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
13a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.7/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
Note that Docker-based installations use a different directory: solr/data/data/collection1/conf/schema.xml.
- Start Solr instance (usually service solr start, depending on Solr/OS).
14. Reindex Solr
shell
curl http://localhost:8080/api/admin/index
Published by ofahimIQSS 11 months ago
dataverse - v6.6
Dataverse 6.6
Please note: Dataverse 6.6 was released in March 2025 but GitHub shows a newer date because we had to update the tag and the master branch in git. (For the gory details, please see the doc about it.) The war file and dvinstall.zip are original, as released in March 2025.
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.6 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights
Highlights for Dataverse 6.6 include:
- metadata fields can be "display on create" per collection
- ORCIDs linked to accounts
- version notes
- harvesting from DataCite
- citations using Citation Style Language (CSL)
- license metadata enhancements
- metadata fields now support range searches (dates, integers, etc.)
- more accurate search highlighting
- collections can be moved by using the superuser dashboard
- new 3D Objects metadata block
- new Archival metadata block (experimental)
- optionally prevent publishing of datasets without files
- Signposting output now contains links to all dataset metadata export formats
- infrastructure updates (Payara and Solr)
We talked about many of these highlights in a recent community call; if you'd like to watch the video, the discussion starts around 22:30.
Features Added
Metadata Fields Can Be "Display on Create" Per Collection
Collection administrators can now configure which metadata fields appear during dataset creation through the displayOnCreate property, even when fields are not required. This provides greater control over metadata visibility and can help improve metadata completeness.
Currently this feature can only be configured via API, but a UI implementation is planned in #11221. See #10476, #11224, and #11312.
ORCIDs Linked to Accounts
Dataverse now includes improved integration with ORCID, supported through a grant to GDCC from the ORCID Global Participation Fund.
Specifically, Dataverse users can now link their Dataverse account with their ORCID profile. Previously, this was only available to users who logged in with ORCID. Once the account is linked, Dataverse will automatically prepopulate the author metadata with the user's ORCID when they create a dataset.
This functionality leverages Dataverse's existing support for login via ORCID, but can be turned on independently of it. If ORCID login is enabled, the user's ORCID will automatically be added to their profile. If the user has logged in via some other mechanism, they are able to click a button to initiate a similar authentication process in which the user must log in to their ORCID account and approve the connection.
Feedback from installations that enable this functionality is requested and we expect that updates can be made in the next Dataverse release.
See the User Guide, Installation Guide, #7284, and #11222.
Version Notes
Dataverse now supports the option of adding a version note before or during the publication of a dataset. These notes can be used, for example, to indicate why a version was created or how it differs from the prior version. Whether this feature is enabled is controlled by the flag dataverse.feature.enable-version-note. Version notes are shown in the user interface (in the dataset page version table), indexed (as versionNote), available via the API, and have been added to the JSON, DDI, DataCite, and OAI-ORE exports.
With the addition of this feature, work has been done to clean up and rename fields that have been used for specifying the reason for deaccessioning a dataset and providing an optional link to a non-Dataverse location where the dataset can still be found. The former was listed in some JSON-based API calls and exports as "versionNote" and is now "deaccessionNote", while the latter was referred to as "archiveNote" and is now "deaccessionLink".
Further, some database consolidation has been done to combine the deaccessionlink and archivenote fields, which appear to have both been used for the same purpose. The deaccessionlink database field is older and also was not displayed in the current UI. Going forward, only the deaccessionlink column exists.
See the User Guide, API Guide, #8431, and #11068.
OAI-PMH Harvesting from DataCite
DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There's been a lot of interest in the community in being able to harvest from them. This way, it will be possible to harvest metadata from institution X even if institution X does not maintain an OAI server of their own, if they happen to register their DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the pre-defined set of ALL the records registered by institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set. The feature is already in use at Harvard Dataverse, as a beta version patch.
For various reasons, in order to take advantage of this feature, harvesting clients must be created using the /api/harvest/clients API. Once configured, however, harvests can be run from the Harvesting Clients control panel in the UI.
DataCite-harvesting clients must be configured with 2 new feature flags, useListRecords and useOaiIdentifiersAsPids (added in Dataverse 6.5). Note that these features may be of use when harvesting from other sources, not just from DataCite.
See the Admin Guide, API Guide, #10909, and #11011.
Citations Using Citation Style Language (CSL)
This release adds support for generating citations in any of the standard independent formats specified using the Citation Style Language.
The CSL formats are available to copy/paste if you click "Cite Dataset" and then "View Styled Citations" on the dataset page. An API call to retrieve a dataset citation in EndNote, RIS, BibTeX, and CSLJson format has also been added. The first three have been available as downloads from the UI (CSLJson is not) but have not been directly accessible via API until now. The CSLJson format is new to Dataverse and can be used with open source libraries to generate all of the other CSL-style citations.
Admins can use a new dataverse.csl.common-styles setting to highlight commonly used styles. Common styles are listed in the pop-up, while others can be found by type-ahead search in a list of 1000+ options.
See the User Guide, Settings, API Guide, and #11163.
License Metadata Enhancements
- Added new fields to licenses: rightsIdentifier, rightsIdentifierScheme, schemeUri, languageCode. See JSON files under Adding Licenses in the guides
- Updated DataCite metadata export to include rightsIdentifier, rightsIdentifierScheme, and schemeUri consistent with the DataCite 4.5 schema and examples
- Enhanced metadata exports to include all new license fields
- Existing licenses from the example set included with Dataverse will be automatically updated with new fields
- Existing API calls support the new optional fields
See below for upgrade instructions. See also #10883 and #11232.
Range Search
This release enhances how numerical and date fields are indexed in Solr. Previously, all fields were indexed as English text (text_en), but with this update:
- Integer fields are indexed as plong
- Float fields are indexed as pdouble
- Date fields are indexed as date_range (solr.DateRangeField)
This change enables range queries when searching from both the UI and the API, such as dateOfDeposit:[2000-01-01 TO 2014-12-31] or targetSampleActualSize:[25 TO 50]. See below for a full list of fields that now support range search.
Additionally, search result highlighting is now more accurate, ensuring that only fields relevant to the query are highlighted in search results. If the query is specifically limited to certain fields, the highlighting is now limited to those fields as well. See #10887.
Specifically, the following fields were updated:
- coverage.Depth
- coverage.ObjectCount
- coverage.ObjectDensity
- coverage.Redshift.MaximumValue
- coverage.Redshift.MinimumValue
- coverage.RedshiftValue
- coverage.SkyFraction
- coverage.Spectral.CentralWavelength
- coverage.Spectral.MaximumWavelength
- coverage.Spectral.MinimumWavelength
- coverage.Temporal.StartTime
- coverage.Temporal.StopTime
- dateOfCollectionEnd
- dateOfCollectionStart
- dateOfDeposit
- distributionDate
- dsDescriptionDate
- journalPubDate
- productionDate
- resolution.Redshift
- targetSampleActualSize
- timePeriodCoveredEnd
- timePeriodCoveredStart
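A range query like the ones shown above can be sent to the Search API with curl. The following is a minimal sketch: the function names are made up for illustration, and SERVER_URL is an assumed variable that falls back to a local installation. The brackets and spaces in the query need URL encoding, which curl's --data-urlencode option handles.

```shell
# Build a range query of the form field:[from TO to].
range_query() {
  printf '%s:[%s TO %s]\n' "$1" "$2" "$3"
}

# Send the query to the Search API. SERVER_URL is an assumed variable that
# defaults to a local installation; -G turns the encoded data into a query string.
search_range() {
  curl -s -G "${SERVER_URL:-http://localhost:8080}/api/search" \
    --data-urlencode "q=$(range_query "$1" "$2" "$3")"
}

range_query dateOfDeposit 2000-01-01 2014-12-31
# prints: dateOfDeposit:[2000-01-01 TO 2014-12-31]
# To run the search: search_range dateOfDeposit 2000-01-01 2014-12-31
```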
New 3D Objects Metadata Block
A new metadata block has been added for describing 3D object data. You can download it from the guides. See also #11120 and #11167.
All new Dataverse installations will receive this metadata block by default. We recommend adding it by following the upgrade instructions below.
New Archival Metadata Block (Experimental)
An experimental "Archival" metadata block has been added, downloadable from the User Guide. The purpose of the metadata block is to enable repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that is your own institutional archive or an external archive, e.g. a historical archive. Feedback is welcome! See also #10626.
Prevent Publishing of Datasets Without Files
Datasets without files can be optionally prevented from being published through a new "requireFilesToPublishDataset" boolean defined at the collection level. This boolean can be set only via API and only by a superuser. See Change Collection Attributes. If the boolean is not set, the parent collection is consulted. If you do not set the boolean, the existing behavior of datasets being able to be published without files will continue. Superusers can still publish datasets whether or not the boolean is set. See #10981 and #10994.
Metadata Source Facet Can Now Differentiate Between Harvested Sources
The behavior of the feature flag index-harvested-metadata-source and the "Metadata Source" facet, which were added and updated, respectively, in Dataverse 6.3 (through pull requests #10464 and #10651), has been updated. A new field called "Source Name" has been added to harvesting clients.
Before Dataverse 6.3, all harvested content (datasets and files) appeared together under "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the index-harvested-metadata-source feature flag (and reindexing) resulted in harvested content appearing under the nickname for whatever harvesting client was used to bring in the content. This meant that instead of having all harvested content lumped together under "Harvested", content would appear under "client1", "client2", etc.
With this release, enabling the index-harvested-metadata-source feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the API), and reindexing (see upgrade instructions below) results in the source name appearing under the "Metadata Source" facet rather than the harvesting client nickname. This gives you more control over the name that appears under the "Metadata Source" facet and allows you to reuse the same source name to group harvested content from various harvesting clients under the same name if you wish.
Previously, index-harvested-metadata-source was not documented in the guides, but now you can find information about it under Feature Flags. See also #10217 and #11217.
Globus Framework Improvements
The improvements and optimizations in this release build on top of the earlier work (such as #10781). They are based on the experience gained at IQSS as part of the production rollout of the Large Data Storage services that utilize Globus.
The changes in this release focus on improving Globus downloads, i.e., transfers from Dataverse-linked Globus volumes to users' Globus collections. Most importantly, the mechanism of "Asynchronous Task Monitoring", first introduced in #10781 for uploads, has been extended to handle downloads as well. This generally makes downloads more reliable, specifically in how Dataverse manages temporary access rules granted to users, minimizing the risk of subsequent downloads failing because of stale access rules left in place.
Multiple other improvements have been made making the underlying Globus framework more reliable and robust.
See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide, #11057, and #11125.
OIDC Bearer Tokens
The release extends the OIDC API auth mechanism, available through feature flag api-bearer-auth, to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token.
To register a new user who has authenticated via an OIDC provider, a new endpoint has been implemented (/users/register). A feature flag named api-bearer-auth-provide-missing-claims has been implemented to allow sending missing user claims in the request JSON. This is useful when the identity provider does not supply the necessary claims. However, this flag will only be considered if the api-bearer-auth feature flag is enabled. If the latter is not enabled, the api-bearer-auth-provide-missing-claims flag will be ignored.
A feature flag named api-bearer-auth-handle-tos-acceptance-in-idp has been implemented. When enabled, it specifies that Terms of Service acceptance is managed by the identity provider, eliminating the need to explicitly include the acceptance in the user registration request JSON.
See the guides, #10959, and #10972.
Signposting Output Now Contains Links to All Dataset Metadata Export Formats
When Signposting was added in Dataverse 5.14 (#8981), it provided links only for the schema.org metadata export format.
The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats, including any external exporters, such as Croissant, that have been enabled.
This provides a lightweight machine-readable way to first retrieve a list of links, such as via an HTTP HEAD request, to each available metadata export format and then follow up with a request for the export format of interest.
In addition, the content type for the schema.org dataset metadata export format has been corrected. It was application/json and now it is application/ld+json.
See also the guides, #10542 and #11045.
Dataset Types Can Be Linked to Metadata Blocks
Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.
This will have the following effects for the APIs used by the new Dataverse UI:
- The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
- The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
Mostly in order to write automated tests for the above, a displayOnCreate API endpoint has been added.
For more information, see the guides (overview, new APIs), #10519 and #11001.
Other Features
- In addition to the API Move a Dataverse Collection, it is now possible for a Dataverse administrator to move a collection using the Dataverse dashboard. See #10304 and #11150.
- The Preview URL popup and related documentation have been updated to give more information about anonymous access, including the names of the dataset fields that will be withheld from the Anonymous Preview URL user and to suggest how to review the URL before releasing it. See also #11159 and #11164.
- ROR (Research Organization Registry) has been added as an Author Identifier Type for when the author is an organization rather than a person. Like ORCID, ROR will appear in the "Datacite" metadata export format. See #11075 and #11118.
- The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. This improves the citation associated with these datasets, but the change affects only newly harvested datasets. See "Upgrade Instructions" below on how to re-harvest. For more information, see the guides, #8739, and #9013.
- A new harvest status differentiates between a complete harvest with errors ("completed with failures") and without errors ("completed"). Also, harvest status labels are now internationalized. See #9294 and #11017.
- The OAI-ORE exporter can now export metadata containing nested compound fields or compound fields within compound fields. See #10809 and #11190.
- It is now possible to edit a custom role with the same alias. See #8808 and #10612.
- The Metadata Customization documentation has been updated to explain how to implement a boolean fieldtype (look for "boolean"). See #7961 and #11064.
- The version of Stata files is now detected during S3 direct upload (as it was for normal uploads), allowing ingest of Stata 14 and 15 files that have been uploaded directly. See the guides, #10108, and #11054.
- It is now possible to populate the "Keyword" metadata field from an OntoPortal service. The code has been shared to the GDCC dataverse-external-vocab-support GitHub repository. See #11258.
- Support for legacy configuration of a PermaLink PID provider, such as using the :Protocol, :Authority, and :Shoulder settings, has been fixed. See #10516 and #10521.
- On the home page for each guide (User Guide, etc.) there was an overwhelming amount of information in the form of a deeply nested table of contents. The depth of the table of contents has been reduced to two levels, making the home page for each guide more readable. Compare the User Guide for 6.5 vs. 6.6 and see #11166.
- For compliance with GDPR and other privacy regulations, advice on adding a cookie consent popup has been added to the guides. See the new cookie consent section and #10320.
- A new file has been added to import the French Open License to Dataverse: licenseEtalab-2.0.json. You can download it from the guides. This license, which is compatible with the Creative Commons license, is recommended by the French government for open documents. See #9301, #9302, and #11302.
- The API that lists versions of a dataset now features an optional excludeMetadataBlocks parameter, which defaults to "false" for backward compatibility. For a dataset with a large number of versions and/or metadataBlocks, having the metadata blocks included can dramatically increase the volume of the output. See also the guides, #10171, and #10778.
- Deeply nested metadata fields are not supported but the code used to generate the Solr schema has been adjusted to support them. See #11136.
- The tutorial on running Dataverse in Docker has been updated to explain how to configure the root collection using a JSON file (#10541 and #11201) and now uses the Permalink PID provider instead of the FAKE DOI Provider (#11107 and #11108).
- Payara application server has been upgraded to version 6.2025.2. See #11126 and #11128.
- Solr has been upgraded to version 9.8.0. See #10713.
- For testing purposes, the FAKE PID provider can now be used with file PIDs enabled. (The FAKE provider is not recommended for any production use.) See #10979.
Bugs Fixed
- A bug which caused users of the Anonymous Review URL to have some metadata of published datasets withheld has been fixed. See #11202 and #11164.
- A bug that caused ORCIDs starting with "https://orcid.org/" entered as author identifier to be ignored when creating the DataCite metadata has been fixed. This primarily affected users of the ORCID external vocabulary script; for the manual entry form, we used to recommend not using the URL form. The display of authorIdentifier, when not using any external vocabulary scripts, has been improved so that either the plain identifier (e.g. "0000-0002-1825-0097") or its URL form (e.g. "https://orcid.org/0000-0002-1825-0097") will result in valid links in the display (for identifier types that have a URL form). The URL form is now recommended when doing manual entry. See #11242.
- Multiple small issues with the formatting of PIDs in the DDI exporters, and EndNote and BibTeX citation formats have been addressed. These should improve the ability to import Dataverse citations into reference managers and fix potential issues harvesting datasets using PermaLinks. See #10768, #10769, #11165, and #10790.
- On the Advanced Search page, the metadata fields are now displayed in the correct order as defined in the TSV file via the displayOrder value, making the order the same as when you view or edit metadata. Note that fields that are not defined in the TSV file, like the "Persistent ID" and "Publication Date", will be displayed at the end. See #11272 and #11279.
- Bugs that caused 1) guestbook questions to appear along with terms of use/terms of access in the request access dialog when no guestbook was configured, and 2) terms of access to not be shown when using the per-file request access/download menu items have been fixed. Text related to configuring the choice to have guestbooks appear when file access is requested or when files are downloaded has been updated to make it clearer that this affects only datasets where guestbooks have been configured. See #11203.
- The file page version table now shows whether a file has been replaced. See #11142 and #11145.
- We fixed an issue where draft versions of datasets were sorted using the release timestamp of their most recent major version. This caused newer drafts to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". Sorting now uses the last update timestamp when sorting draft datasets. The sorting behavior of published major and minor dataset versions is unchanged. There is no need to reindex datasets because Solr is being upgraded (see "Upgrade Instructions"), which will result in an empty database that will be reindexed. See #11178.
- Some external controlled vocabulary scripts/configurations, when used on a metadata field that is single-valued, could result in indexing failure for the dataset, e.g. when the script tried to index both the identifier and name of the identified entity. Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema. Configuring the Solr schema and running the update-fields.sh script as usually recommended when using custom metadata blocks (see "Upgrade Instructions") will resolve the issue. See the guides, #11095, and #11096.
- The OpenAIRE metadata export format can now correctly process one or multiple productionPlaces as geolocation. See #9546 and #11194.
- We fixed a bug that caused adding free-form provenance to a file to fail. See #11145.
- A bug has been fixed which could cause publication of datasets to fail in cases where they were not assigned a DOI at creation. See #11234 and #11236.
- When users request access to files, the people who have permission to grant access received an email with a link that didn't work due to a trailing period (full stop) right next to the link, e.g. https://demo.dataverse.org/permissions-manage-files.xhtml?id=9. A space has been added to fix this. See #10384 and #11115.
- Harvesting clients now use the correct granularity while re-running a partial harvest, using the from parameter. The correct granularity comes from the Identify verb request. See #11020 and #11038.
- Access requests were missing on the File Permission page after upgrading from Dataverse 6.0. This has been corrected with a database update script. See #10714 and #11061.
- When a dataset has a long running lock, including when it is "in review", Dataverse will now slow the page refresh rate over time. See #11264 and #11269.
- The /api/info/metrics/files/monthly API call had a bug that resulted in files being counted each time they were published in a new version if those publication events occurred in different months. This resulted in an over-count. The /api/info/metrics/files and /api/info/metrics/files/toMonth API calls had a bug that resulted in files that were published but no longer in the latest published version as of the specified date (now, or the date entered in the /toMonth variant) not being counted. This resulted in an under-count. See #11189.
- DatasetFieldTypes in MetadataBlock response that are also a child of another DatasetFieldType were being returned twice. The child DatasetFieldType was included in the "fields" object as well as in the "childFields" of its parent DatasetFieldType. This fix suppresses the standalone object so only one instance of the DatasetFieldType is returned (in the "childFields" of its parent). This fix changes the JSON output of the API /api/dataverses/{dataverseAlias}/metadatablocks (see "Backward Incompatible Changes", below). See #10472 and #11066.
- A bug that caused replacing files via API when file PIDs were enabled to fail has been fixed. See #10975 and #10979.
- The :CustomDatasetSummaryFields setting now allows spaces along with a comma separating field names. In addition, a bug that caused license information to be hidden if there are no values for any of the custom fields specified has been fixed. See #11228 and #11229.
- Dataverse 6.5 introduced a bug which causes search to fail for non-superusers in multiple groups when the AVOID_EXPENSIVE_SOLR_JOIN feature flag is set to true. This release fixes the bug. See #11133 and #11134.
- We fixed a bug with My Data where listing collections for a user with only rights on harvested collections would result in a server error response. See #11083.
- Minor styling fixes for the Related Publication field and fields using ORCID or ROR have been made. See #11053, #10964, and #11106.
- In the Search API, files were displaying DRAFT version instead of latest released version under dataset_citation. See #10735 and #11051.
- Unnecessary Solr documents were being created when a file was added or deleted from a draft dataset. These documents could accumulate and potentially impact performance. There is no action to take because this release includes a new Solr version, which will start with an empty database. See #11113 and #11114.
- When using the API to update a collection, omitting optional fields such as inputLevels, facetIds, or metadataBlockNames caused data to be deleted. The fix no longer deletes data for these fields. Two new flags have been added to the metadataBlocks JSON object to signal the deletion of the data: inheritMetadataBlocksFromParent: true and inheritFacetsFromParent: true. See the guides, #11130, and #11144.
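For the collection update fix described in the last item above, the two new flags sit inside the metadataBlocks object of the request JSON. The sketch below prints only that fragment; the function name is hypothetical, and the other attributes a real update request would carry are deliberately omitted, so treat it as an illustration of the flag placement rather than a complete payload.

```shell
# Print a JSON fragment (not a complete request body) showing where the
# two new flags go. All other collection attributes are omitted here.
inherit_flags_fragment() {
  cat <<'EOF'
{
  "metadataBlocks": {
    "inheritMetadataBlocksFromParent": true,
    "inheritFacetsFromParent": true
  }
}
EOF
}

inherit_flags_fragment
```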
API Updates
Search API Returns Additional Fields for Files
Added new fields to search results type=files
For Files:
- restricted: boolean
- canDownloadFile: boolean (from file user permission)
- categories: array of string "categories" would be similar to what it is in metadata api.
For tabular files:
- tabularTags: array of string, for example, {"tabularTags" : ["Event", "Genomics", "Geospatial"]}
- variables: number/int shows how many variables we have for the tabular file
- observations: number/int shows how many observations for the tabular file
See #11027 and #11097.
Backend Support for Collection Featured Items
CRUD endpoints for Collection Featured Items have been implemented. In particular, the following endpoints have been implemented:
- Create a featured item (POST /api/dataverses/<dataverse_id>/featuredItems)
- Update a featured item (PUT /api/dataverseFeaturedItems/<item_id>)
- Delete a featured item (DELETE /api/dataverseFeaturedItems/<item_id>)
- List all featured items in a collection (GET /api/dataverses/<dataverse_id>/featuredItems)
- Delete all featured items in a collection (DELETE /api/dataverses/<dataverse_id>/featuredItems)
- Update all featured items in a collection (PUT /api/dataverses/<dataverse_id>/featuredItems)
See also the "Settings Added" section, #10943 and #11124.
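The endpoints listed above can be summarized as a small lookup, sketched below. The action names (list, create, and so on) and the function name featured_items_request are made up for illustration; only the HTTP methods and paths come from the list above. Request bodies are not shown because their format is documented in the guides.

```shell
#!/bin/sh
# Print "<METHOD> <path>" for a featured-items action.
# $2 is a collection id/alias for collection-level actions, or an item id otherwise.
featured_items_request() {
  action=$1
  id=$2
  case $action in
    list)       echo "GET /api/dataverses/$id/featuredItems" ;;
    create)     echo "POST /api/dataverses/$id/featuredItems" ;;
    update-all) echo "PUT /api/dataverses/$id/featuredItems" ;;
    delete-all) echo "DELETE /api/dataverses/$id/featuredItems" ;;
    update)     echo "PUT /api/dataverseFeaturedItems/$id" ;;
    delete)     echo "DELETE /api/dataverseFeaturedItems/$id" ;;
    *)          echo "unknown action: $action" >&2; return 1 ;;
  esac
}

# Not run here (needs a live installation and an API token):
#   set -- $(featured_items_request list root)
#   curl -s -X "$1" -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL$2"
featured_items_request list root
```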
Other API Updates
- Multiple files can be deleted from a dataset at once. See the guides and #11230.
- An API has been added to get the "classic" download count from a dataset with an optional includeMDC parameter (for Make Data Count). See the guides, #11244 and #11282.
- An API has been added that lists the collections that the user has access to via the permission passed. See the guides, #6467, and #10906.
- An API has been added to get dataset versions including a summary of differences between consecutive versions where available. See the docs, #10888, and #10945.
- An API has been added to list the versions of a data file, showing any changes that affected the file with each version. See the guides, #11198 and #11237.
- The Search API has a new parameter called show_type_counts. If you set it to true, it will return total_count_per_object_type for the types dataverse, dataset, and files (#11065 and #11082) even if the search result for any given type is 0 (#11127 and #11138).
- CRUD operations for external tools are now available for superusers from non-localhost. See the guides, #10930 and #11079.
- A new API endpoint has been added that allows a global role to be updated. See the guides and #10612.
- An API has been added to send feedback to the collection, dataset, or data file's contacts. If necessary, you can rate limit the CheckRateLimitForDatasetFeedbackCommand and configure the new :ContactFeedbackMessageSizeLimit database setting. See the guides, #11129, and #11162.
- /api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See "Backward Incompatible Changes" below and #10764.
- A new query param, returnChildCount, has been added to the getDataverse endpoint (/api/dataverses/{id}) for optionally retrieving the child count, which represents the number of collections, datasets, or files within the collection (direct children only). See also #11255 and #11259.
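Two of the items above, show_type_counts and returnChildCount, are plain query parameters. A minimal sketch of adding them to a request URL follows; the helper name with_param is hypothetical, and the collection alias root is used only as an example.

```shell
#!/bin/sh
# Append key=value to a URL, using "?" for the first parameter and "&" afterwards.
with_param() {
  case $1 in
    *\?*) printf '%s&%s=%s\n' "$1" "$2" "$3" ;;
    *)    printf '%s?%s=%s\n' "$1" "$2" "$3" ;;
  esac
}

base=${SERVER_URL:-http://localhost:8080}
with_param "$base/api/search?q=*" show_type_counts true
with_param "$base/api/dataverses/root" returnChildCount true
# Either URL can then be passed to curl in double quotes, e.g.:
#   curl -s "$(with_param "$base/api/dataverses/root" returnChildCount true)"
```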
End-Of-Life (EOL) Announcements
PostgreSQL 13 reaches EOL on 13 November 2025
Per https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. Our first step toward moving off version 13 was to switch our testing to version 16, as we've noted in the guides. You are encouraged to start planning your upgrade and may want to review the Dataverse 5.4 release notes as the upgrade process (e.g. pg_dumpall, etc.) will likely be similar. If you notice any bumps along the way, please let us know!
Dataverse developers using Docker have been using PostgreSQL 17 since Dataverse 6.5 (#10912). (Developers not using Docker who are still on PostgreSQL 13 are encouraged to upgrade.) Older or newer versions should work, within reason.
See also #11212 and #11215.
Security
SameSite Cookie Attribute
The SameSite cookie attribute is defined in an upcoming revision to RFC 6265 (HTTP State Management Mechanism) called 6265bis ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict".
"If no SameSite attribute is set, the cookie is treated as Lax by default" by browsers according to MDN. This was the previous behavior of Dataverse, to not set the SameSite attribute.
New Dataverse installations now explicitly set the SameSite cookie attribute to "Lax" out of the box through the installer (in the case of a "classic" installation) or through an updated base image (in the case of a Docker installation). Classic installations should follow the upgrade instructions below to bring their installation up to date with the behavior for new installations. Docker installations will automatically get the updated base image.
While you are welcome to experiment with "Strict", which is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP cheatsheet, our testing so far indicates that some functionality, such as OIDC login, seems to be incompatible with "Strict".
You should avoid the use of "None" as it is less secure than "Lax". See also the guides, https://github.com/IQSS/dataverse-security/issues/27, #11210, and the upgrade instructions below.
Settings Added
- dataverse.feature.enable-version-note
- dataverse.csl.common-styles
- dataverse.files.featured-items.image-maxsize - It sets the maximum allowed size of the image that can be added to a featured item.
- dataverse.files.featured-items.image-uploads - It specifies the name of the subdirectory for saving featured item images within the docroot directory.
- dataverse.feature.api-bearer-auth-provide-missing-claims
- dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp
- :ContactFeedbackMessageSizeLimit
Backward Incompatible Changes
Generally speaking, see the API Changelog for a list of backward-incompatible API changes.
- /api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See #10764.
- The JSON response of API call /api/dataverses/{dataverseAlias}/metadatablocks will no longer include the DatasetFieldTypes in "fields" if they are children of another DatasetFieldType. The child DatasetFieldType will only be included in the "childFields" of its parent DatasetFieldType. See #10472 and #11066.
- versionNote has been renamed to deaccessionNote. archiveNote has been renamed to deaccessionLink. See #11068.
- The Show Role API endpoint was returning 401 Unauthorized when a permission check failed. This has been corrected to return 403 Forbidden instead. That is, the API token is known to be good (401 otherwise) but the user lacks permission (403 is now sent). See also the API Changelog, #10340, and #11116.
- Changes to PID formatting occur in the DDI/DDI Html export formats and the EndNote and BibTex citation formats. These changes correct errors and improve conformance with best practices but could break parsing of these formats. See #10768, #10769, #11165, and #10790.
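Scripts that call the Show Role endpoint may need to tell the two status codes apart after this change. The sketch below shows one way to do that; the function name explain_status and the message wording are illustrative, and the /api/roles/$ID path in the comment is an assumption to check against the API Guide.

```shell
#!/bin/sh
# Map an HTTP status code to the meaning described in the Show Role item above.
explain_status() {
  case $1 in
    401) echo "API token missing or not recognized" ;;
    403) echo "token accepted, but the user lacks permission" ;;
    2??) echo "success" ;;
    *)   echo "other status: $1" ;;
  esac
}

# Not run here (needs a live installation):
#   status=$(curl -s -o /dev/null -w '%{http_code}' \
#     -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/roles/$ID")
#   explain_status "$status"
explain_status 403
```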
Complete List of Changes
For the complete list of code changes in this release, see the 6.6 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.5.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. List deployed applications
shell
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
shell
$PAYARA/bin/asadmin undeploy dataverse-6.5
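Because the name passed to undeploy should match what list-applications reports, it can help to extract that name rather than type it. This is a sketch under the assumption that list-applications prints one application per line with the name in the first column; the sample output and the function name deployed_dataverse_app are illustrative.

```shell
#!/bin/sh
# Read "asadmin list-applications" output on stdin and print the first
# application whose name starts with "dataverse".
deployed_dataverse_app() {
  awk '$1 ~ /^dataverse/ { print $1; exit }'
}

# Sample output (assumed format) piped through the helper:
printf 'dataverse-6.5  <ejb, web>\nCommand list-applications executed successfully.\n' \
  | deployed_dataverse_app

# Not run here (needs Payara):
#   app=$("$PAYARA/bin/asadmin" list-applications | deployed_dataverse_app)
#   [ -n "$app" ] && "$PAYARA/bin/asadmin" undeploy "$app"
```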
3. Stop Payara
shell
sudo service payara stop
4. Upgrade to Payara 6.2025.2
The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.3.)
Move the current Payara directory out of the way:
shell
mv $PAYARA $PAYARA.6.2024.6
Download the new Payara version 6.2025.2 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.2/payara-6.2025.2.zip), and unzip it in its place:
shell
cd /usr/local
unzip payara-6.2025.2.zip
Replace the brand new payara6/glassfish/domains/domain1 with your old, preserved domain1:
shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.6.2024.6/glassfish/domains/domain1 payara6/glassfish/domains/
5. Download and deploy this version
shell
wget https://github.com/IQSS/dataverse/releases/download/v6.6/dataverse-6.6.war
$PAYARA/bin/asadmin deploy dataverse-6.6.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
6. For installations with internationalization or text customizations:
Please remember to update translations via Dataverse language packs.
If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.6/src/main/java/propertyFiles.
7. Decide to enable (or not) the index-harvested-metadata-source feature flag
Decide whether or not to enable the dataverse.feature.index-harvested-metadata-source feature flag described above, in the guides, #10217 and #11217. The reason to decide now is that reindexing is required and the next steps involve restarting Payara and upgrading Solr, which will result in a fresh index.
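If you do enable the flag, it can be supplied either as a JVM option or as an environment variable. The sketch below derives the environment variable name using the MicroProfile Config convention (non-alphanumeric characters become underscores, then the name is uppercased). The function name flag_env_name is hypothetical, and you should confirm in the guides which configuration source your installation uses.

```shell
#!/bin/sh
# Convert a setting name to its MicroProfile Config environment variable form.
flag_env_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | tr 'a-z' 'A-Z'
  echo
}

flag=dataverse.feature.index-harvested-metadata-source
echo "JVM option:  -D$flag=true"
echo "Environment: $(flag_env_name "$flag")=true"
```

With a classic installation the JVM option would typically be added through asadmin create-jvm-options, followed by the Payara restart described in the next steps.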
8. Configure SameSite
To bring your Dataverse installation in line with new installations, as described above and in the guides, we recommend running the following commands:
```
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-value=Lax
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-enabled=true
```
Please note that "None" is less secure than "Lax" and should be avoided. You can test the setting by inspecting headers with curl, looking at the JSESSIONID cookie for "SameSite=Lax" (yes, it's expected to be repeated, probably due to a bug in Payara) like this:
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/;SameSite=Lax;SameSite=Lax
Before making the changes above, SameSite attribute should be absent, like this:
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/
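The manual check above can be scripted. The following sketch reads response headers on stdin and prints the SameSite value found on the JSESSIONID cookie, or nothing if the attribute is absent. The function name samesite_value is made up for this example, and the cookie value in the sample line is shortened.

```shell
#!/bin/sh
# Print the SameSite value of the JSESSIONID cookie from HTTP response headers.
# A repeated attribute (as shown above) is reported once.
samesite_value() {
  grep -i '^Set-Cookie: JSESSIONID=' \
    | sed -n 's/.*[;[:space:]]SameSite=\([A-Za-z]*\).*/\1/p' \
    | head -n 1
}

# Sample header line in the same shape as the curl output above:
printf 'Set-Cookie: JSESSIONID=abc123; Path=/;SameSite=Lax;SameSite=Lax\n' | samesite_value

# Not run here (needs a live installation):
#   curl -s -I http://localhost:8080 | samesite_value
```

An empty result means the attribute has not been set yet, which matches the "before" output shown above.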
8. Restart Payara
shell
sudo service payara stop
sudo service payara start
9. Update metadata blocks
These changes reflect incremental improvements made to the handling of core metadata fields.
Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```
The 3D Objects metadata block is included in all new installations of Dataverse so we recommend adding it.
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/3d_objects.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file 3d_objects.tsv
```
10. Upgrade Solr
Solr 9.8.0 is now the version recommended in our Installation Guide and used with automated testing. Additionally, due to the new range search support feature and the addition of fields (e.g. versionNote, fileRestricted, canDownloadFile, variableCount, and observations), the default schema.xml file has changed so you must upgrade.
Install Solr 9.8.0 following the instructions from the Installation Guide.
The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
10a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
- Start Solr instance (usually service solr start, depending on Solr/OS).
11. Reindex Solr
shell
curl http://localhost:8080/api/admin/index
12. Run reExportAll to update dataset metadata exports
For existing published datasets, additional license metadata will not be available from DataCite or in metadata exports until
- the dataset is republished or
- the /api/admin/metadata/{id}/reExportDataset is run for the dataset or
- the /api/datasets/{id}/modifyRegistrationMetadata API is run for the dataset or
- the global versions of these API calls (/api/admin/metadata/reExportAll, /api/datasets/modifyRegistrationPIDMetadataAll) are used.
For this reason, we recommend reexporting all dataset metadata. For more advanced usage, please see the guides.
shell
curl http://localhost:8080/api/admin/metadata/reExportAll
13. (Optional) Re-harvest datasets
The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. For more information, see the guides, #8739, and #9013.
This improves the citation associated with these datasets, but the change only affects newly harvested datasets.
If you would like to pick up this change for existing harvested datasets, you should re-harvest them. This can be accomplished by deleting and re-adding each harvesting client, followed by a harvesting run. You may want to use harvesting client APIs to save (serialize), add, and remove clients.
Published by pdurbin 11 months ago
dataverse - v6.6
Dataverse 6.6
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.6 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights
Highlights for Dataverse 6.6 include:
- metadata fields can be "display on create" per collection
- ORCIDs linked to accounts
- version notes
- harvesting from DataCite
- citations using Citation Style Language (CSL)
- license metadata enhancements
- metadata fields now support range searches (dates, integers, etc.)
- more accurate search highlighting
- collections can be moved by using the superuser dashboard
- new 3D Objects metadata block
- new Archival metadata block (experimental)
- optionally prevent publishing of datasets without files
- Signposting output now contains links to all dataset metadata export formats
- infrastructure updates (Payara and Solr)
In a recent community call, we talked about many of these highlights if you'd like to watch the video (around 22:30).
Features Added
Metadata Fields Can Be "Display on Create" Per Collection
Collection administrators can now configure which metadata fields appear during dataset creation through the displayOnCreate property, even when fields are not required. This provides greater control over metadata visibility and can help improve metadata completeness.
Currently this feature can only be configured via API, but a UI implementation is planned in #11221. See #10476, #11224, and #11312.
ORCIDs Linked to Accounts
Dataverse now includes improved integration with ORCID, supported through a grant to GDCC from the ORCID Global Participation Fund.
Specifically, Dataverse users can now link their Dataverse account with their ORCID profile. Previously, this was only available to users who logged in with ORCID. Once linked, Dataverse will automatically prepopulate their ORCID to their author metadata when they create a dataset.
This functionality leverages Dataverse's existing support for login via ORCID, but can be turned on independently of it. If ORCID login is enabled, the user's ORCID will automatically be added to their profile. If the user has logged in via some other mechanism, they are able to click a button to initiate a similar authentication process in which the user must log in to their ORCID account and approve the connection.
Feedback from installations that enable this functionality is requested and we expect that updates can be made in the next Dataverse release.
See the User Guide, Installation Guide, #7284, and #11222.
Version Notes
Dataverse now supports the option of adding a version note before or during the publication of a dataset. These notes can be used, for example, to indicate why a version was created or how it differs from the prior version. Whether this feature is enabled is controlled by the flag dataverse.feature.enable-version-note. Version notes are shown in the user interface (in the dataset page version table), indexed (as versionNote), available via the API, and have been added to the JSON, DDI, DataCite, and OAI-ORE exports.
With the addition of this feature, work has been done to clean up and rename fields that have been used for specifying the reason for deaccessioning a dataset and providing an optional link to a non-Dataverse location where the dataset still can be found. The former was listed in some JSON-based API calls and exports as "versionNote" and is now "deaccessionNote", while the latter was referred to as "archiveNote" and is now "deaccessionLink".
Further, some database consolidation has been done to combine the deaccessionlink and archivenote fields, which appear to have both been used for the same purpose. The deaccessionlink database field is older and also was not displayed in the current UI. Going forward, only the deaccessionlink column exists.
See the User Guide, API Guide, #8431, and #11068.
OAI-PMH Harvesting from DataCite
DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There's been a lot of interest in the community in being able to harvest from them. This way, it will be possible to harvest metadata from institution X even if institution X does not maintain an OAI server of its own, as long as it registers its DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the pre-defined set of ALL the records registered by institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set. The feature is already in use at Harvard Dataverse, as a beta version patch.
For various reasons, in order to take advantage of this feature harvesting clients must be created using the /api/harvest/clients API. Once configured however, harvests can be run from the Harvesting Clients control panel in the UI.
DataCite-harvesting clients must be configured with 2 new feature flags, useListRecords and useOaiIdentifiersAsPids (added in Dataverse 6.5). Note that these features may be of use when harvesting from other sources, not just from DataCite.
See the Admin Guide, API Guide, #10909, and #11011.
Citations Using Citation Style Language (CSL)
This release adds support for generating citations in any of the standard independent formats specified using the Citation Style Language.
The CSL formats are available to copy/paste if you click "Cite Dataset" and then "View Styled Citations" on the dataset page. An API call to retrieve a dataset citation in EndNote, RIS, BibTeX, and CSLJson format has also been added. The first three have been available as downloads from the UI (CSLJson is not) but have not been directly accessible via API until now. The CSLJson format is new to Dataverse and can be used with open source libraries to generate all of the other CSL-style citations.
Admins can use a new dataverse.csl.common-styles setting to highlight commonly used styles. Common styles are listed in the pop-up, while others can be found by type-ahead search in a list of 1000+ options.
See the User Guide, Settings, API Guide, and #11163.
License Metadata Enhancements
- Added new fields to licenses: rightsIdentifier, rightsIdentifierScheme, schemeUri, languageCode. See JSON files under Adding Licenses in the guides
- Updated DataCite metadata export to include rightsIdentifier, rightsIdentifierScheme, and schemeUri consistent with the DataCite 4.5 schema and examples
- Enhanced metadata exports to include all new license fields
- Existing licenses from the example set included with Dataverse will be automatically updated with new fields
- Existing API calls support the new optional fields
See below for upgrade instructions. See also #10883 and #11232.
Range Search
This release enhances how numerical and date fields are indexed in Solr. Previously, all fields were indexed as English text (text_en), but with this update:
- Integer fields are indexed as plong
- Float fields are indexed as pdouble
- Date fields are indexed as date_range (solr.DateRangeField)
This change enables range queries when searching from both the UI and the API, such as dateOfDeposit:[2000-01-01 TO 2014-12-31] or targetSampleActualSize:[25 TO 50]. See below for a full list of fields that now support range search.
Additionally, search result highlighting is now more accurate, ensuring that only fields relevant to the query are highlighted in search results. If the query is specifically limited to certain fields, the highlighting is now limited to those fields as well. See #10887.
Specifically, the following fields were updated:
- coverage.Depth
- coverage.ObjectCount
- coverage.ObjectDensity
- coverage.Redshift.MaximumValue
- coverage.Redshift.MinimumValue
- coverage.RedshiftValue
- coverage.SkyFraction
- coverage.Spectral.CentralWavelength
- coverage.Spectral.MaximumWavelength
- coverage.Spectral.MinimumWavelength
- coverage.Temporal.StartTime
- coverage.Temporal.StopTime
- dateOfCollectionEnd
- dateOfCollectionStart
- dateOfDeposit
- distributionDate
- dsDescriptionDate
- journalPubDate
- productionDate
- resolution.Redshift
- targetSampleActualSize
- timePeriodCoveredEnd
- timePeriodCoveredStart
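For the fields listed above, a range query can be assembled and sent through the Search API roughly as follows. This is a sketch: the function name range_query is hypothetical, and the curl call in the comment relies on -G with --data-urlencode so that the brackets and spaces in the query are encoded.

```shell
#!/bin/sh
# Build a range query in the form field:[low TO high].
range_query() {
  printf '%s:[%s TO %s]\n' "$1" "$2" "$3"
}

range_query dateOfDeposit 2000-01-01 2014-12-31
range_query targetSampleActualSize 25 50

# Not run here (needs a live installation):
#   curl -s -G "$SERVER_URL/api/search" \
#     --data-urlencode "q=$(range_query dateOfDeposit 2000-01-01 2014-12-31)"
```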
New 3D Objects Metadata Block
A new metadata block has been added for describing 3D object data. You can download it from the guides. See also #11120 and #11167.
All new Dataverse installations will receive this metadata block by default. We recommend adding it by following the upgrade instructions below.
New Archival Metadata Block (Experimental)
An experimental "Archival" metadata block has been added, downloadable from the User Guide. The purpose of the metadata block is to enable repositories to register metadata relating to the potential archiving of the dataset at a depositor archive, whether that is your own institutional archive or an external archive (e.g. a historical archive). Feedback is welcome! See also #10626.
Prevent Publishing of Datasets Without Files
Datasets without files can be optionally prevented from being published through a new "requireFilesToPublishDataset" boolean defined at the collection level. This boolean can be set only via API and only by a superuser. See Change Collection Attributes. If the boolean is not set, the parent collection is consulted. If you do not set the boolean, the existing behavior of datasets being able to be published without files will continue. Superusers can still publish datasets whether or not the boolean is set. See #10981 and #10994.
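The lookup described above (collection first, then parent, with publishing allowed when nothing is set) can be modeled in a few lines. This is a simplified illustration, not the actual implementation: it assumes the lookup continues upward through each ancestor, and the function name and the "unset" marker are invented for the example. Superusers bypass the result entirely.

```shell
#!/bin/sh
# Arguments: the requireFilesToPublishDataset value of the collection, then its
# parent, then its grandparent, and so on. Each is "true", "false", or "unset".
# Prints the first value that is set, or "false" when none is.
effective_require_files() {
  for value in "$@"; do
    case $value in
      true|false) echo "$value"; return 0 ;;
    esac
  done
  echo false
}

effective_require_files unset true    # parent requires files
effective_require_files unset unset   # nothing set anywhere: existing behavior
```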
Metadata Source Facet Can Now Differentiate Between Harvested Sources
The behavior of the feature flag index-harvested-metadata-source and the "Metadata Source" facet, which were added and updated, respectively, in Dataverse 6.3 (through pull requests #10464 and #10651), has been updated. A new field called "Source Name" has been added to harvesting clients.
Before Dataverse 6.3, all harvested content (datasets and files) appeared together under "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the index-harvested-metadata-source feature flag (and reindexing) resulted in harvested content appearing under the nickname for whatever harvesting client was used to bring in the content. This meant that instead of having all harvested content lumped together under "Harvested", content would appear under "client1", "client2", etc.
With this release, enabling the index-harvested-metadata-source feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the API), and reindexing (see upgrade instructions below) results in the source name appearing under the "Metadata Source" facet rather than the harvesting client nickname. This gives you more control over the name that appears under the "Metadata Source" facet and allows you to reuse the same source name to group harvested content from various harvesting clients under the same name if you wish.
Previously, index-harvested-metadata-source was not documented in the guides, but now you can find information about it under Feature Flags. See also #10217 and #11217.
Globus Framework Improvements
The improvements and optimizations in this release build on top of the earlier work (such as #10781). They are based on the experience gained at IQSS as part of the production rollout of the Large Data Storage services that utilize Globus.
The changes in this release focus on improving Globus downloads, i.e., transfers from Dataverse-linked Globus volumes to users' Globus collections. Most importantly, the mechanism of "Asynchronous Task Monitoring", first introduced in #10781 for uploads, has been extended to handle downloads as well. This generally makes downloads more reliable, specifically in how Dataverse manages temporary access rules granted to users, minimizing the risk of subsequent downloads failing because of stale access rules left in place.
Multiple other improvements have been made making the underlying Globus framework more reliable and robust.
See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide, #11057, and #11125.
OIDC Bearer Tokens
The release extends the OIDC API auth mechanism, available through feature flag api-bearer-auth, to properly handle cases where BearerTokenAuthMechanism successfully validates the token but cannot identify any Dataverse user because there is no account associated with the token.
To register a new user who has authenticated via an OIDC provider, a new endpoint has been implemented (/users/register). A feature flag named api-bearer-auth-provide-missing-claims has been implemented to allow sending missing user claims in the request JSON. This is useful when the identity provider does not supply the necessary claims. However, this flag will only be considered if the api-bearer-auth feature flag is enabled. If the latter is not enabled, the api-bearer-auth-provide-missing-claims flag will be ignored.
A feature flag named api-bearer-auth-handle-tos-acceptance-in-idp has been implemented. When enabled, it specifies that Terms of Service acceptance is managed by the identity provider, eliminating the need to explicitly include the acceptance in the user registration request JSON.
See the guides, #10959, and #10972.
Signposting Output Now Contains Links to All Dataset Metadata Export Formats
When Signposting was added in Dataverse 5.14 (#8981), it provided links only for the schema.org metadata export format.
The output of HEAD, GET, and the Signposting "linkset" API have all been updated to include links to all available dataset metadata export formats, including any external exporters, such as Croissant, that have been enabled.
This provides a lightweight machine-readable way to first retrieve a list of links, such as via a HTTP HEAD request, to each available metadata export format and then follow up with a request for the export format of interest.
In addition, the content type for the schema.org dataset metadata export format has been corrected. It was application/json and now it is application/ld+json.
See also the guides, #10542 and #11045.
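To illustrate the two-step pattern described above, the sketch below pulls the metadata export links out of a Link header. It assumes the export links carry rel="describedby" and that entries are separated by commas; the sample header, the example.org URLs, and the function name describedby_links are illustrative only.

```shell
#!/bin/sh
# Read a Link header on stdin and print the URL of each rel="describedby" entry.
# Assumes entries are comma-separated and the URLs contain no commas.
describedby_links() {
  tr ',' '\n' | grep 'rel="describedby"' | sed -n 's/.*<\([^>]*\)>.*/\1/p'
}

# Hypothetical header for demonstration:
printf '%s\n' 'Link: <https://example.org/a>;rel="cite-as", <https://example.org/b>;rel="describedby";type="application/ld+json", <https://example.org/c>;rel="describedby";type="application/xml"' \
  | describedby_links

# Not run here (needs a live installation and a dataset PID):
#   curl -s -I "$SERVER_URL/dataset.xhtml?persistentId=$PID" | grep -i '^Link:' | describedby_links
```

Each printed URL can then be requested on its own to retrieve the export format of interest.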
Dataset Types Can Be Linked to Metadata Blocks
Metadata blocks (e.g. "CodeMeta") can now be linked to dataset types (e.g. "software") using new superuser APIs.
This will have the following effects for the APIs used by the new Dataverse UI:
- The list of fields shown when creating a dataset will include fields marked as "displayoncreate" (in the tsv/database) for metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
- The metadata blocks shown when editing a dataset will include metadata blocks (e.g. "CodeMeta") that are linked to the dataset type (e.g. "software") that is passed to the API.
Mostly in order to write automated tests for the above, a displayOnCreate API endpoint has been added.
For more information, see the guides (overview, new APIs), #10519 and #11001.
Other Features
- In addition to the API Move a Dataverse Collection, it is now possible for a Dataverse administrator to move a collection using the Dataverse dashboard. See #10304 and #11150.
- The Preview URL popup and related documentation have been updated to give more information about anonymous access, including the names of the dataset fields that will be withheld from the Anonymous Preview URL user and to suggest how to review the URL before releasing it. See also #11159 and #11164.
- ROR (Research Organization Registry) has been added as an Author Identifier Type for when the author is an organization rather than a person. Like ORCID, ROR will appear in the "Datacite" metadata export format. See #11075 and #11118.
- The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. This improves the citation associated with these datasets, but the change affects only newly harvested datasets. See "Upgrade Instructions" below on how to re-harvest. For more information, see the guides, #8739, and #9013.
- A new harvest status differentiates between a complete harvest with errors ("completed with failures") and without errors ("completed"). Also, harvest status labels are now internationalized. See #9294 and #11017.
- The OAI-ORE exporter can now export metadata containing nested compound fields or compound fields within compound fields. See #10809 and #11190.
- It is now possible to edit a custom role with the same alias. See #8808 and #10612.
- The Metadata Customization documentation has been updated to explain how to implement a boolean fieldtype (look for "boolean"). See #7961 and #11064.
- The version of Stata files is now detected during S3 direct upload (as it was for normal uploads), allowing ingest of Stata 14 and 15 files that have been uploaded directly. See the guides, #10108, and #11054.
- It is now possible to populate the "Keyword" metadata field from an OntoPortal service. The code has been shared to the GDCC dataverse-external-vocab-support GitHub repository. See #11258.
- Support for legacy configuration of a PermaLink PID provider, such as using the :Protocol, :Authority, and :Shoulder settings, has been fixed. See #10516 and #10521.
- On the home page for each guide (User Guide, etc.) there was an overwhelming amount of information in the form of a deeply nested table of contents. The depth of the table of contents has been reduced to two levels, making the home page for each guide more readable. Compare the User Guide for 6.5 vs. 6.6 and see #11166.
- For compliance with GDPR and other privacy regulations, advice on adding a cookie consent popup has been added to the guides. See the new cookie consent section and #10320.
- A new file has been added to import the French Open License to Dataverse: licenseEtalab-2.0.json. You can download it from the guides. This license, which is compatible with the Creative Commons license, is recommended by the French government for open documents. See #9301, #9302, and #11302.
- The API that lists versions of a dataset now features an optional excludeMetadataBlocks parameter, which defaults to "false" for backward compatibility. For a dataset with a large number of versions and/or metadataBlocks, having the metadata blocks included can dramatically increase the volume of the output. See also the guides, #10171, and #10778.
- Deeply nested metadata fields are not supported but the code used to generate the Solr schema has been adjusted to support them. See #11136.
- The tutorial on running Dataverse in Docker has been updated to explain how to configure the root collection using a JSON file (#10541 and #11201) and now uses the Permalink PID provider instead of the FAKE DOI Provider (#11107 and #11108).
- Payara application server has been upgraded to version 6.2025.2. See #11126 and #11128.
- Solr has been upgraded to version 9.8.0. See #10713.
- For testing purposes, the FAKE PID provider can now be used with file PIDs enabled. (The FAKE provider is not recommended for any production use.) See #10979.
Bugs Fixed
- A bug that caused users of the Anonymous Preview URL to have some metadata of published datasets withheld has been fixed. See #11202 and #11164.
- A bug that caused ORCIDs starting with "https://orcid.org/" entered as author identifier to be ignored when creating the DataCite metadata has been fixed. This primarily affected users of the ORCID external vocabulary script; for the manual entry form, we used to recommend not using the URL form. The display of authorIdentifier, when not using any external vocabulary scripts, has been improved so that either the plain identifier (e.g. "0000-0002-1825-0097") or its URL form (e.g. "https://orcid.org/0000-0002-1825-0097") will result in valid links in the display (for identifier types that have a URL form). The URL form is now recommended when doing manual entry. See #11242 and #11242.
- Multiple small issues with the formatting of PIDs in the DDI exporters, and EndNote and BibTeX citation formats have been addressed. These should improve the ability to import Dataverse citations into reference managers and fix potential issues harvesting datasets using PermaLinks. See #10768, #10769, #11165, and #10790.
- On the Advanced Search page, the metadata fields are now displayed in the correct order as defined in the TSV file via the displayOrder value, making the order the same as when you view or edit metadata. Note that fields that are not defined in the TSV file, like the "Persistent ID" and "Publication Date", will be displayed at the end. See #11272 and #11279.
- Bugs that caused 1) guestbook questions to appear along with terms of use/terms of access in the request access dialog when no guestbook was configured, and 2) terms of access to not be shown when using the per-file request access/download menu items have been fixed. Text related to configuring the choice to have guestbooks appear when file access is requested or when files are downloaded has been updated to make it clearer that this affects only datasets where guestbooks have been configured. See #11203.
- The file page version table now shows whether a file has been replaced. See #11142 and #11145.
- We fixed an issue where draft versions of datasets were sorted using the release timestamp of their most recent major version. This caused newer drafts to appear incorrectly alongside their corresponding major version, instead of at the top, when sorted by "newest first". Sorting now uses the last update timestamp when sorting draft datasets. The sorting behavior of published major and minor dataset versions is unchanged. There is no need to reindex datasets because Solr is being upgraded (see "Upgrade Instructions"), which will result in an empty database that will be reindexed. See #11178.
- Some external controlled vocabulary scripts/configurations, when used on a metadata field that is single-valued, could result in indexing failure for the dataset, e.g. when the script tried to index both the identifier and name of the identified entity. Dataverse has been updated to correctly indicate the need for a multi-valued Solr field in these cases in the call to /api/admin/index/solr/schema. Configuring the Solr schema and running the update-fields.sh script as usually recommended when using custom metadata blocks (see "Upgrade Instructions") will resolve the issue. See the guides, #11095, and #11096.
- The OpenAIRE metadata export format can now correctly process one or multiple productionPlaces as geolocation. See #9546 and #11194.
- We fixed a bug that caused adding free-form provenance to a file to fail. See #11145.
- A bug has been fixed which could cause publication of datasets to fail in cases where they were not assigned a DOI at creation. See #11234 and #11236.
- When users request access to files, the people who have permission to grant access received an email with a link that didn't work due to a trailing period (full stop) right next to the link, e.g. https://demo.dataverse.org/permissions-manage-files.xhtml?id=9. A space has been added to fix this. See #10384 and #11115.
- Harvesting clients now use the correct granularity while re-running a partial harvest, using the from parameter. The correct granularity comes from the Identify verb request. See #11020 and #11038.
- Access requests were missing on the File Permission page after upgrading from Dataverse 6.0. This has been corrected with a database update script. See #10714 and #11061.
- When a dataset has a long running lock, including when it is "in review", Dataverse will now slow the page refresh rate over time. See #11264 and #11269.
- The /api/info/metrics/files/monthly API call had a bug that resulted in files being counted each time they were published in a new version if those publication events occurred in different months. This resulted in an over-count. The /api/info/metrics/files and /api/info/metrics/files/toMonth API calls had a bug that resulted in files that were published but no longer in the latest published version as of the specified date (now, or the date entered in the /toMonth variant) not being counted. This resulted in an under-count. See #11189.
- DatasetFieldTypes in MetadataBlock response that are also a child of another DatasetFieldType were being returned twice. The child DatasetFieldType was included in the "fields" object as well as in the "childFields" of its parent DatasetFieldType. This fix suppresses the standalone object so only one instance of the DatasetFieldType is returned (in the "childFields" of its parent). This fix changes the JSON output of the API /api/dataverses/{dataverseAlias}/metadatablocks (see "Backward Incompatible Changes", below). See #10472 and #11066.
- A bug that caused replacing files via API when file PIDs were enabled to fail has been fixed. See #10975 and #10979.
- The :CustomDatasetSummaryFields setting now allows spaces along with a comma separating field names. In addition, a bug that caused license information to be hidden if there are no values for any of the custom fields specified has been fixed. See #11228 and #11229.
- Dataverse 6.5 introduced a bug that caused search to fail for non-superusers in multiple groups when the AVOID_EXPENSIVE_SOLR_JOIN feature flag is set to true. This release fixes the bug. See #11133 and #11134.
- We fixed a bug with My Data where listing collections for a user with only rights on harvested collections would result in a server error response. See #11083.
- Minor styling fixes for the Related Publication field and fields using ORCID or ROR have been made. See #11053, #10964, and #11106.
- In the Search API, files were displaying the DRAFT version instead of the latest released version under dataset_citation. See #10735 and #11051.
- Unnecessary Solr documents were being created when a file was added or deleted from a draft dataset. These documents could accumulate and potentially impact performance. There is no action to take because this release includes a new Solr version, which will start with an empty database. See #11113 and #11114.
- When using the API to update a collection, omitting optional fields such as inputLevels, facetIds, or metadataBlockNames caused data to be deleted. The fix no longer deletes data for these fields. Two new flags have been added to the metadataBlocks JSON object to signal the deletion of the data: inheritMetadataBlocksFromParent: true and inheritFacetsFromParent: true. See the guides, #11130, and #11144.
API Updates
Search API Returns Additional Fields for Files
Added new fields to search results type=files
For Files:
- restricted: boolean
- canDownloadFile: boolean (from file user permission)
- categories: array of strings, similar to "categories" in the metadata API.
For tabular files:
- tabularTags: array of string, for example {"tabularTags" : ["Event", "Genomics", "Geospatial"]}
- variables: number/int shows how many variables we have for the tabular file
- observations: number/int shows how many observations for the tabular file
See #11027 and #11097.
Backend Support for Collection Featured Items
CRUD endpoints for Collection Featured Items have been implemented. In particular, the following endpoints have been implemented:
- Create a featured item (POST /api/dataverses/<dataverse_id>/featuredItems)
- Update a featured item (PUT /api/dataverseFeaturedItems/<item_id>)
- Delete a featured item (DELETE /api/dataverseFeaturedItems/<item_id>)
- List all featured items in a collection (GET /api/dataverses/<dataverse_id>/featuredItems)
- Delete all featured items in a collection (DELETE /api/dataverses/<dataverse_id>/featuredItems)
- Update all featured items in a collection (PUT /api/dataverses/<dataverse_id>/featuredItems)
See also the "Settings Added" section, #10943 and #11124.
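The endpoint list above can be captured in a small lookup helper. This is only a sketch: the function name `featured_item_request` and the operation labels are hypothetical, while the methods and paths are the ones listed above.

```shell
# Hypothetical helper: print "METHOD PATH" for a featured-item operation.
# $1 = operation, $2 = collection id (or item id for update-item/delete-item).
featured_item_request() {
  case "$1" in
    create)      echo "POST /api/dataverses/$2/featuredItems" ;;
    list)        echo "GET /api/dataverses/$2/featuredItems" ;;
    update-all)  echo "PUT /api/dataverses/$2/featuredItems" ;;
    delete-all)  echo "DELETE /api/dataverses/$2/featuredItems" ;;
    update-item) echo "PUT /api/dataverseFeaturedItems/$2" ;;
    delete-item) echo "DELETE /api/dataverseFeaturedItems/$2" ;;
    *)           echo "unknown operation: $1" >&2; return 1 ;;
  esac
}

# Typical use (SERVER_URL and API_TOKEN are placeholders you must set):
#   set -- $(featured_item_request list root)
#   curl -H "X-Dataverse-key: $API_TOKEN" -X "$1" "$SERVER_URL$2"
```

Keeping the method and path together in one place makes it harder to mix up the per-item endpoints (under dataverseFeaturedItems) with the per-collection ones (under dataverses).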
Other API Updates
- Multiple files can be deleted from a dataset at once. See the guides and #11230.
- An API has been added to get the "classic" download count from a dataset with an optional includeMDC parameter (for Make Data Count). See the guides, #11244 and #11282.
- An API has been added that lists the collections that the user has access to via the permission passed. See the guides, #6467, and #10906.
- An API has been added to get dataset versions including a summary of differences between consecutive versions where available. See the docs, #10888, and #10945.
- An API has been added to list the versions of a data file, showing any changes that affected the file with each version. See the guides, #11198 and #11237.
- The Search API has a new parameter called show_type_counts. If you set it to true, it will return total_count_per_object_type for the types dataverse, dataset, and files (#11065 and #11082) even if the search result for any given type is 0 (#11127 and #11138).
- CRUD operations for external tools are now available for superusers from non-localhost. See the guides, #10930 and #11079.
- A new API endpoint has been added that allows a global role to be updated. See the guides and #10612.
- An API has been added to send feedback to the collection, dataset, or data file's contacts. If necessary, you can rate limit the CheckRateLimitForDatasetFeedbackCommand and configure the new :ContactFeedbackMessageSizeLimit database setting. See the guides, #11129, and #11162.
- /api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See "Backward Incompatible Changes" below and #10764.
- A new query param, returnChildCount, has been added to the getDataverse endpoint (/api/dataverses/{id}) for optionally retrieving the child count, which represents the number of collections, datasets, or files within the collection (direct children only). See also #11255 and #11259.
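Two of the new query parameters in the list above, show_type_counts and returnChildCount, can be tried with plain URLs. The sketch below only builds the URLs; the helper names and the `SERVER_URL` value are assumptions for a local installation, not part of Dataverse.

```shell
SERVER_URL="http://localhost:8080"   # assumed local installation

# Print a Search API URL that asks for per-type counts.
# The query string in $1 is used as-is, so it must already be URL-encoded.
search_with_type_counts_url() {
  echo "${SERVER_URL}/api/search?q=$1&show_type_counts=true"
}

# Print a getDataverse URL that asks for the number of direct children.
collection_with_child_count_url() {
  echo "${SERVER_URL}/api/dataverses/$1?returnChildCount=true"
}

# Typical use (requires a running Dataverse):
#   curl "$(search_with_type_counts_url '*')"
#   curl "$(collection_with_child_count_url root)"
```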
End-Of-Life (EOL) Announcements
PostgreSQL 13 reaches EOL on 13 November 2025
Per https://www.postgresql.org/support/versioning/ PostgreSQL 13 reaches EOL on 13 November 2025. Our first step toward moving off version 13 was to switch our testing to version 16, as we've noted in the guides. You are encouraged to start planning your upgrade and may want to review the Dataverse 5.4 release notes as the upgrade process (e.g. pg_dumpall, etc.) will likely be similar. If you notice any bumps along the way, please let us know!
Dataverse developers using Docker have been using PostgreSQL 17 since Dataverse 6.5 (#10912). (Developers not using Docker who are still on PostgreSQL 13 are encouraged to upgrade.) Older or newer versions should work, within reason.
See also #11212 and #11215.
Security
SameSite Cookie Attribute
The SameSite cookie attribute is defined in an upcoming revision to RFC 6265 (HTTP State Management Mechanism) called 6265bis ("bis" meaning "repeated"). The possible values are "None", "Lax", and "Strict".
"If no SameSite attribute is set, the cookie is treated as Lax by default" by browsers according to MDN. This was the previous behavior of Dataverse, to not set the SameSite attribute.
New Dataverse installations now explicitly set the SameSite cookie attribute to "Lax" out of the box through the installer (in the case of a "classic" installation) or through an updated base image (in the case of a Docker installation). Classic installations should follow the upgrade instructions below to bring their installation up to date with the behavior for new installations. Docker installations will automatically get the updated base image.
While you are welcome to experiment with "Strict", which is intended to help prevent Cross-Site Request Forgery (CSRF) attacks, as described in the RFC proposal and an OWASP cheatsheet, our testing so far indicates that some functionality, such as OIDC login, seems to be incompatible with "Strict".
You should avoid the use of "None" as it is less secure than "Lax". See also the guides, https://github.com/IQSS/dataverse-security/issues/27, #11210, and the upgrade instructions below.
Settings Added
- dataverse.feature.enable-version-note
- dataverse.csl.common-styles
- dataverse.files.featured-items.image-maxsize - It sets the maximum allowed size of the image that can be added to a featured item.
- dataverse.files.featured-items.image-uploads - It specifies the name of the subdirectory for saving featured item images within the docroot directory.
- dataverse.feature.api-bearer-auth-provide-missing-claims
- dataverse.feature.api-bearer-auth-handle-tos-acceptance-in-idp
- :ContactFeedbackMessageSizeLimit
Backward Incompatible Changes
Generally speaking, see the API Changelog for a list of backward-incompatible API changes.
- /api/metadatablocks is no longer returning duplicated metadata properties and does not omit metadata properties when called. See #10764.
- The JSON response of API call /api/dataverses/{dataverseAlias}/metadatablocks will no longer include the DatasetFieldTypes in "fields" if they are children of another DatasetFieldType. The child DatasetFieldType will only be included in the "childFields" of its parent DatasetFieldType. See #10472 and #11066.
- versionNote has been renamed to deaccessionNote. archiveNote has been renamed to deaccessionLink. See #11068.
- The Show Role API endpoint was returning 401 Unauthorized when a permission check failed. This has been corrected to return 403 Forbidden instead. That is, the API token is known to be good (401 otherwise) but the user lacks permission (403 is now sent). See also the API Changelog, #10340, and #11116.
- Changes to PID formatting occur in the DDI/DDI Html export formats and the EndNote and BibTex citation formats. These changes correct errors and improve conformance with best practices but could break parsing of these formats. See #10768, #10769, #11165, and #10790.
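Clients that previously treated every 401 from the Show Role API as "no permission" may need to handle 403 as well. A minimal sketch of that distinction follows; `explain_role_status` is a hypothetical helper, and the wording of the messages is ours rather than Dataverse's.

```shell
# Hypothetical helper: translate an HTTP status code from the Show Role API
# into a short explanation, following the 401/403 distinction described above.
explain_role_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "API token missing or not accepted" ;;
    403) echo "API token accepted, but the user lacks permission" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Typical use (ROLE_URL and API_TOKEN are placeholders you must set):
#   status="$(curl -s -o /dev/null -w '%{http_code}' -H "X-Dataverse-key: $API_TOKEN" "$ROLE_URL")"
#   explain_role_status "$status"
```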
Complete List of Changes
For the complete list of code changes in this release, see the 6.6 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.5.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
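As a small guard for the advice above about not running commands as root, a wrapper script could refuse to continue when invoked as root. The function name `require_non_root` is hypothetical and not part of Dataverse or Payara; it takes the numeric user ID as an argument so that it can be exercised without switching users.

```shell
# Hypothetical helper (not shipped with Dataverse or Payara).
# Returns 0 when the given numeric UID is not root (0), 1 otherwise.
require_non_root() {
  uid="$1"
  if [ "$uid" -eq 0 ]; then
    echo "Refusing to run as root; use the account that runs Payara instead." >&2
    return 1
  fi
  return 0
}

# Typical use before running asadmin commands:
#   require_non_root "$(id -u)" && "$PAYARA/bin/asadmin" list-applications
```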
1. List deployed applications
shell
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
shell
$PAYARA/bin/asadmin undeploy dataverse-6.5
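Because the name passed to undeploy must match what list-applications reports, the lookup can be scripted. The sketch below rests on an assumption: that `asadmin list-applications` prints one application per line with the name in the first column. The function name `deployed_dataverse_app` is hypothetical.

```shell
# Hypothetical helper: read `asadmin list-applications` output on stdin and
# print the first application whose name starts with "dataverse".
# Assumes one application per line, with the name in the first column.
deployed_dataverse_app() {
  awk '$1 ~ /^dataverse/ { print $1; exit }'
}

# Typical use (requires a running Payara):
#   app="$("$PAYARA/bin/asadmin" list-applications | deployed_dataverse_app)"
#   [ -n "$app" ] && "$PAYARA/bin/asadmin" undeploy "$app"
```

If the helper prints nothing, no matching application was found and the undeploy step should be checked by hand.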
3. Stop Payara
shell
sudo service payara stop
4. Upgrade to Payara 6.2025.2
The steps below reuse your existing domain directory with the new distribution of Payara. You may also want to review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The most recent Payara update was for v6.3.)
Move the current Payara directory out of the way:
shell
mv $PAYARA $PAYARA.6.2024.6
Download the new Payara version 6.2025.2 (from https://www.payara.fish/downloads/payara-platform-community-edition/ or https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2025.2/payara-6.2025.2.zip), and unzip it in its place:
shell
cd /usr/local
unzip payara-6.2025.2.zip
Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1:
shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.6.2024.6/glassfish/domains/domain1 payara6/glassfish/domains/
5. Download and deploy this version
shell
wget https://github.com/IQSS/dataverse/releases/download/v6.6/dataverse-6.6.war
$PAYARA/bin/asadmin deploy dataverse-6.6.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
sudo service payara start
6. For installations with internationalization or text customizations:
Please remember to update translations via Dataverse language packs.
If you have text customizations you can get the latest English files from https://github.com/IQSS/dataverse/tree/v6.6/src/main/java/propertyFiles.
7. Decide to enable (or not) the index-harvested-metadata-source feature flag
Decide whether or not to enable the dataverse.feature.index-harvested-metadata-source feature flag described above, in the guides, #10217 and #11217. The reason to decide now is that reindexing is required and the next steps involve restarting Payara and upgrading Solr, which will result in a fresh index.
8. Configure SameSite
To bring your Dataverse installation in line with new installations, as described above and in the guides, we recommend running the following commands:
```
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-value=Lax
./asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.cookie-same-site-enabled=true
```
Please note that "None" is less secure than "Lax" and should be avoided. You can test the setting by inspecting headers with curl, looking at the JSESSIONID cookie for "SameSite=Lax" (yes, it's expected to be repeated, probably due to a bug in Payara) like this:
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/;SameSite=Lax;SameSite=Lax
Before making the changes above, the SameSite attribute should be absent, like this:
% curl -s -I http://localhost:8080 | grep JSESSIONID
Set-Cookie: JSESSIONID=6574324d75aebeb86dc96ecb3bb0; Path=/
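The manual check above can be turned into a pass/fail test. This is a sketch only: `has_samesite` is a hypothetical helper that reads response headers on standard input and looks for the expected value on the JSESSIONID cookie.

```shell
# Hypothetical helper: read HTTP response headers on stdin and succeed only
# if a JSESSIONID Set-Cookie header carries the expected SameSite value.
has_samesite() {
  expected="$1"
  grep -i '^Set-Cookie: JSESSIONID=' | grep -qi "SameSite=${expected}"
}

# Typical use against a running installation:
#   curl -s -I http://localhost:8080 | has_samesite Lax && echo "SameSite=Lax is set"
```

The repeated attribute mentioned above does not affect the result, since a single match is enough.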
8. Restart Payara
shell
sudo service payara stop
sudo service payara start
9. Update metadata blocks
These changes reflect incremental improvements made to the handling of core metadata fields.
Expect the loading of the citation block to take several seconds because of its size (especially due to the number of languages).
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```
The 3D Objects metadata block is included in all new installations of Dataverse so we recommend adding it.
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/scripts/api/data/metadatablocks/3d_objects.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file 3d_objects.tsv
```
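Since both blocks are loaded with the same pair of commands, the step can be wrapped in a loop. The following is a sketch under the same assumptions as the commands above (release tag v6.6, Dataverse on localhost:8080); the helper names `block_tsv_url` and `load_block` are hypothetical.

```shell
# Release tag and server are taken from the commands above; adjust as needed.
DATAVERSE_TAG="v6.6"
SERVER_URL="http://localhost:8080"

# Hypothetical helper: print the raw GitHub URL of a metadata block TSV file.
block_tsv_url() {
  echo "https://raw.githubusercontent.com/IQSS/dataverse/${DATAVERSE_TAG}/scripts/api/data/metadatablocks/$1.tsv"
}

# Hypothetical helper: download one block and load it through the admin API.
load_block() {
  wget -q -O "$1.tsv" "$(block_tsv_url "$1")" &&
    curl "${SERVER_URL}/api/admin/datasetfield/load" \
      -H "Content-type: text/tab-separated-values" \
      -X POST --upload-file "$1.tsv"
}

# Typical use (requires network access and a running Dataverse):
#   for block in citation 3d_objects; do load_block "$block" || break; done
```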
10. Upgrade Solr
Solr 9.8.0 is now the version recommended in our Installation Guide and used with automated testing. Additionally, due to the new range search support feature and the addition of fields (e.g. versionNote, fileRestricted, canDownloadFile, variableCount, and observations), the default schema.xml file has changed so you must upgrade.
Install Solr 9.8.0 following the instructions from the Installation Guide.
The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.8.0/server/solr/collection1/conf
10a. For installations with additional metadata blocks or external controlled vocabulary scripts, update fields
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.6/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.8.0/server/solr/collection1/conf/schema.xml
- Start Solr instance (usually service solr start, depending on Solr/OS).
11. Reindex Solr
shell
curl http://localhost:8080/api/admin/index
12. Run reExportAll to update dataset metadata exports
For existing published datasets, additional license metadata will not be available from DataCite or in metadata exports until
- the dataset is republished or
- the /api/admin/metadata/{id}/reExportDataset is run for the dataset or
- the /api/datasets/{id}/modifyRegistrationMetadata API is run for the dataset or
- the global versions of these API calls (/api/admin/metadata/reExportAll, /api/datasets/modifyRegistrationPIDMetadataAll) are used.
For this reason, we recommend reexporting all dataset metadata. For more advanced usage, please see the guides.
shell
curl http://localhost:8080/api/admin/metadata/reExportAll
13. (Optional) Re-harvest datasets
The publisher value of harvested datasets is now attributed to the dataset's distributor instead of its producer. For more information, see the guides, #8739, and #9013.
This improves the citation associated with these datasets, but the change only affects newly harvested datasets.
If you would like to pick up this change for existing harvested datasets, you should re-harvest them. This can be accomplished by deleting and re-adding each harvesting client, followed by a harvesting run. You may want to use harvesting client APIs to save (serialize), add, and remove clients.
Published by stevenwinship over 1 year ago
dataverse - v6.5
Dataverse 6.5
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.5 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights
Highlights for Dataverse 6.5 include:
- new API endpoints, including editing of collections, Search API file counts, listing of exporters, comparing dataset versions, and auditing data files
- UX improvements, especially Preview URLs
- increased harvesting flexibility
- performance gains
- a security vulnerability addressed
- many bug fixes
- and more! Please see below.
Features Added
Private URL Renamed to Preview URL and Improved
The name of the URL that may be used by dataset administrators to share a draft version of a dataset has been changed from Private URL to Preview URL.
Also, additional information about the creation of Preview URLs has been added to the popup accessed via the edit menu of the Dataset Page.
Users of the Anonymous Preview URL will no longer be able to see the name of the Dataverse that the dataset is in but will be able to see the name of the repository.
Any Private URLs created in previous versions of Dataverse will continue to work.
The old "privateUrl" API endpoints for the creation and deletion of Preview (formerly Private) URLs have been deprecated. They will continue to work but please switch to the "previewUrl" equivalents that have been documented in the API Guide.
See also #8184, #8185, #10950, #10961, and #11085.
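For scripts that still call the deprecated endpoints, the switch may be as small as a path substitution. The sketch below assumes that the new path differs from the old one only in the final segment (privateUrl becoming previewUrl); the API Guide remains the authority on the exact paths. The function name `to_preview_url_path` is hypothetical.

```shell
# Hypothetical helper: rewrite a deprecated ".../privateUrl" API path to
# its "previewUrl" equivalent. Assumes only the final segment differs.
to_preview_url_path() {
  echo "$1" | sed 's#/privateUrl$#/previewUrl#'
}

# Typical use:
#   to_preview_url_path /api/datasets/24/privateUrl
```

Paths that do not end in /privateUrl are passed through unchanged.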
Showing Differences Between Dataset Versions is More Scalable
Showing differences between dataset versions, which is done during dataset edit operations and to populate the dataset page versions table, has been made significantly more scalable. See #10814 and #10818.
Version Differences Details Sorting Added
In order to facilitate the comparison between the draft version and the published version of a dataset, a sort on subfields has been added. See #10969.
Reindexing After a Role Assignment is Less Memory Intensive
Adding or removing a user from a role on a collection, particularly the root collection, could lead to a significant increase in memory use, resulting in Dataverse itself failing with an out-of-memory condition. Such changes now consume much less memory. A Solr reindexing step is included in the upgrade instructions below. See also #10697 and #10698.
Longer Custom Questions in Guestbooks
Custom questions in Guestbooks can now be more than 255 characters and the bug causing a silent failure when questions were longer than this limit has been fixed. See also #9492, #10117, and #10118.
PostgreSQL and Flyway Updates
This release bumps the version of PostgreSQL and Flyway used in containers as well as the PostgreSQL JDBC driver used in all installations, including classic (non-Docker) installations. PostgreSQL and its driver have been bumped to version 17. Flyway has been bumped to version 10.
PostgreSQL 13 remains the version used with automated testing, leading us to continue to recommend that version for classic installations.
As of Flyway 10, supporting older versions of PostgreSQL no longer requires a paid subscription. While we don't encourage the use of older PostgreSQL versions, this flexibility may benefit some of our long-standing installations in their upgrade paths.
As part of this update, the containerized development environment now uses Postgres 17 instead of 16. Developers must delete their data (rm -rf docker-dev-volumes) and start with an empty database (rerun the quickstart in the dev guide), as explained on the dev mailing list.
The Docker compose file used for evaluations or demos has been upgraded from Postgres 13 to 17.
See also #10889 and #10912.
Harvesting "oai_dc" Metadata Prefix When Extended With Specific Namespaces
Some data repositories extend the "oai_dc" metadata prefix with specific namespaces. In this case, harvesting of these datasets into Dataverse was not possible because an XML parsing error was raised.
Harvesting of these datasets has been fixed by excluding tags with namespaces that are not "dc:". That is, only metadata in the "dc" namespace is harvested. See #10837.
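To make the filtering idea concrete, here is a line-based illustration. It is not how Dataverse implements the fix: it assumes one element per line, which real OAI-PMH responses do not guarantee, and the helper name `keep_dc_elements` is hypothetical.

```shell
# Illustration only: keep elements in the "dc" namespace and drop elements
# from other namespaces. Assumes one element per line; a real harvester
# would use an XML parser instead of line matching.
keep_dc_elements() {
  grep -E '^[[:space:]]*<dc:'
}

# Typical use:
#   keep_dc_elements < record.xml
```

An element such as `<dcterms:issued>` is dropped because its prefix is not exactly "dc".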
Harvested Dataset PID from Record Header
When harvesting, Dataverse can now use the identifier from the OAI-PMH record header as the persistent id for the harvested dataset.
This will allow harvesting from sources that do not include a persistent id in their oai_dc metadata records, but use valid DOIs or handles as the OAI-PMH record header identifiers.
It is also possible to optionally configure a harvesting client to use this OAI-PMH identifier as the preferred choice for the persistent id. See the Harvesting Clients API section of the Guides, #11049 and #10982 for more information.
Harvested Datasets Can Have Multiple "otherId" Values
When harvesting using the DDI format, datasets can now have multiple "otherId" values. See #10772.
Multiple Languages in Docker
Documentation has been added to explain how to set up multiple languages (e.g. English and French) in the tutorial for setting up Dataverse in Docker.
See the tutorial, #10939, and #10940.
GlobusBatchLookupSize
An optimization has been added for the Globus upload workflow, with a corresponding new database setting: :GlobusBatchLookupSize
See the Database Settings section of the guides, #10977, and #11040 for more information.
Bugs Fixed
Relation Type (Related Publication) and DataCite
The subfield "Relation Type" was added to the field "Related Publication" in Dataverse 6.4 (#10632) but couldn't be used without workarounds described in an announcement about the problem. The bug has been fixed and workarounds are no longer required. See #10926 and the announcement above.
Sort Order for Files
"Newest" and "Oldest" were reversed when sorting files on the dataset landing page. This has been fixed. See #10742 and #11000.
Guestbook Email Validation
In the Guestbook UI form, the email address is now checked for validity. See #10661 and #11022.
Updating Files Now Possible When Latest and Only Dataset Version is Deaccessioned
When the latest and only version of a dataset was deaccessioned, trying to update the dataset's files caused an error. This has been fixed. See #9351 and #10901.
My Data Filter by Username Feature Restored
The superuser-only feature of filtering by a username on the My Data page was not working. Entering a username in the "Results for Username" field now returns data for the desired user. See also #7239 and #10980.
Better Handling of Parallel Edit/Publish Errors
Improvements have been made in handling the errors when a dataset has been edited in one browser window and an attempt is made to edit or publish it in another. (This practice is discouraged, by the way.) See #10793 and #10794.
Facets Filter Labels Now Translated Above Search Results
On the main page, it's possible to filter results using search facets. If internationalization (i18n) has been enabled in the Dataverse installation, allowing pages to be displayed in several languages, the facets were correctly translated in the filter column at the left. However, they were not being translated above the search results, remaining in the default language, English. This has been fixed. See #9408 and #10158.
Unpublished File Bug Fix Related to Deaccessioning
A bug fix was made related to retrieval of the major version of a Dataset when all major versions were deaccessioned. This fixes the incorrect showing of the files as "Unpublished" in the search list even when they are published. In the upgrade instructions below, there is a step to reindex Solr. See also #10947 and #10974.
Minor DataCiteXML Fix (Useless Null)
A minor bug fix was made to avoid sending a useless ", null" in the DataCiteXML sent to DataCite and in the DataCite export when a dataset has a metadata entry for "Software Name" and no entry for "Software Version". The bug fix will update datasets upon publication. Existing published datasets with this problem can be fixed by pushing updated metadata to DataCite for the affected datasets and re-exporting the dataset metadata. See "Pushing updated metadata to DataCite" in the upgrade instructions below. See also #10919.
PIDs and Make Data Count Citation Retrieval
Make Data Count (MDC) citation retrieval with the PID settings has been fixed. PID parsing in Dataverse is now case insensitive, improving interaction with services that may change the case of PIDs. Warnings related to managed/excluded PID lists for PID providers have been reduced. See #10708.
Quirk in Overview Display When Using External Controlled Variables
This bug fix corrects an issue where duplicated entries appeared on the metadata page. It was fixed by correcting an IF-clause in metadataFragment.xhtml. See #11005 and #11034.
Globus "missing properties" Logging Fixed
In previous releases, logging would show Globus-related strings were missing from properties files. This has been fixed. See #11030.
API Updates
Editing Collections
A new endpoint (PUT /api/dataverses/<identifier>) for updating an existing collection (dataverse) has been added. It uses the same JSON structure as the one used for collection creation. See also the docs, #10904, and #10925.
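A minimal sketch of calling the new endpoint with curl follows. The collection alias "root", the file name collection.json, the SERVER_URL and API_TOKEN values, and the update_collection helper are assumptions made for illustration; the X-Dataverse-key header is the one used elsewhere in these notes. Setting the hypothetical DRY_RUN variable prints the command instead of sending it.

```shell
# Assumed values; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"

# update_collection <alias> <json-file>
# Sends the JSON file (same structure as for collection creation) via PUT.
# With DRY_RUN set, the curl command is printed rather than executed.
update_collection() {
  ${DRY_RUN:+echo} curl -X PUT \
    -H "X-Dataverse-key:$API_TOKEN" \
    --upload-file "$2" \
    "$SERVER_URL/api/dataverses/$1"
}

# Preview the request for a hypothetical collection alias "root".
DRY_RUN=1 update_collection root collection.json
```

Drop DRY_RUN=1 to send the request to a running installation.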
fileCount Added to Search API
A new search field called fileCount can be searched to discover the number of files per dataset. The upgrade instructions below explain how to update your Solr schema.xml file to add the new field and reindex Solr. See also #8941 and #10598.
List Dataset Metadata Exporters
A list of available dataset metadata exporters can now be retrieved programmatically via API. See the docs and #10739.
Comparing Dataset Versions
An API has been added to compare dataset versions. See the docs, #10888, and #10945.
Audit Data Files
A superuser-only API endpoint has been added to audit datasets with data files where the physical files are missing or the file metadata is missing. See the docs, #11016, and #220.
Update Collection API Inheritance
The update collection (dataverse) API endpoint has been updated to support an "inherit from parent" configuration for metadata blocks, facets, and input levels.
Previously, not setting these fields meant using a copy of the settings from the parent collection, which could get out of sync. See also the docs, #11018, and #11026.
isMetadataBlockRoot and isFacetRoot
The JSON payload of the "get collection" endpoint has been extended to include properties isMetadataBlockRoot and isFacetRoot. See also the docs, #11012, and #11013.
Whitespace Trimming When Loading Metadata Block TSV Files
When loading custom metadata blocks using the api/admin/datasetfield/load API endpoint, whitespace can be introduced into field names. Whitespace is now trimmed from the beginning and end of all values read into the API before persisting them. See #10688 and #10696.
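The effect of the trimming can be illustrated with a small awk filter. This is only a sketch of the idea, not the code Dataverse runs; the trim_tsv name and the sample row are made up for the example.

```shell
# trim_tsv: strip leading and trailing spaces from every tab-separated value
# read on stdin; spaces inside a value are left untouched.
trim_tsv() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
         print
       }'
}

# A sample row with stray spaces around two of its values.
printf ' title \tTitle\t text\n' | trim_tsv
```

Running a custom TSV file through a filter like this before loading it is one way to check whether it contains stray whitespace.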
Image URLs from the Search API
As of 6.4 (#10855) image_url is being returned from the Search API. The logic has been updated to only show the image if all of the following are true:
- The data file is not harvested
- A thumbnail is available for the data file
- If the data file is restricted, then the caller must have DownloadFile permission for the data file
- The data file is NOT actively embargoed
- The data file's retention period has NOT expired
See also #10875 and #10886.
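The conditions listed above can be read as a single predicate. The sketch below encodes them as a shell function; the function name and the "true"/"false" flag arguments are hypothetical and do not correspond to actual Dataverse code or API fields.

```shell
# should_return_image_url <harvested> <has_thumbnail> <restricted> \
#                         <can_download> <embargoed> <retention_expired>
# Every argument is the string "true" or "false". Returns 0 (success) only
# when all of the listed conditions hold.
should_return_image_url() {
  [ "$1" = false ] || return 1   # the data file is not harvested
  [ "$2" = true ] || return 1    # a thumbnail is available
  # a restricted file requires DownloadFile permission
  if [ "$3" = true ] && [ "$4" != true ]; then return 1; fi
  [ "$5" = false ] || return 1   # not actively embargoed
  [ "$6" = false ] || return 1   # retention period has not expired
  return 0
}

# A restricted file whose caller lacks DownloadFile permission.
if should_return_image_url false true true false false false; then
  echo "image_url returned"
else
  echo "image_url omitted"
fi
```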
Metrics API Bug Fixes
Two bugs in the Metrics API have been fixed:
- The /datasets and /datasets/byMonth endpoints could report incorrect values when called with the "dataLocation" parameter (which allows getting metrics for local, remote (harvested), or all datasets), because the metrics cache was not storing different values for these cases.
- Metrics endpoints whose calculation relied on finding the latest published dataset version returned incorrect results when the minor version number was greater than 9.
The upgrade instructions below include a step for clearing the metrics cache.
See also #10379 and #10865.
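These notes do not say what caused the minor-version bug. As a purely illustrative sketch, one way an error tied to minor versions above 9 can arise is when a version is collapsed into a single decimal number (major + 0.1 * minor, an assumed encoding and not necessarily what Dataverse did): version 1.10 then becomes indistinguishable from 2.0. Keeping major and minor as separate zero-padded parts avoids the collision. Both helper names are hypothetical.

```shell
# Assumed, lossy encoding: major + 0.1 * minor as one decimal number.
encode_decimal() {
  awk -v major="$1" -v minor="$2" 'BEGIN { printf "%.1f\n", major + 0.1 * minor }'
}

# Collision-free key: zero-padded major and minor kept as separate parts.
encode_padded() {
  printf '%05d.%05d\n' "$1" "$2"
}

encode_decimal 1 9    # prints 1.9
encode_decimal 1 10   # prints 2.0, the same as version 2.0 below
encode_decimal 2 0    # prints 2.0
encode_padded 1 10    # prints 00001.00010
encode_padded 2 0     # prints 00002.00000
```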
API Tokens
An optional query parameter called "returnExpiration" has been added to the /api/users/token/recreate endpoint, which, if set to true, returns the expiration time in the response. See the docs, #10857 and #10858.
The /api/users/token endpoint has been extended to support any auth mechanism for retrieving the token information. Previously this endpoint only accepted an API token to retrieve its information. Now it accepts any authentication mechanism and returns the associated API token information. See #10914 and #10924.
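As a rough sketch, the two token endpoints might be called as follows. SERVER_URL, API_TOKEN, the helper names, and the DRY_RUN switch are assumptions for illustration, and the use of POST for the recreate call is assumed rather than stated in these notes, so check the API Guide before relying on it.

```shell
# Assumed values; adjust for your installation.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"
API_TOKEN="${API_TOKEN:-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}"

# recreate_token: recreate the caller's API token and ask for the
# expiration time to be included in the response.
recreate_token() {
  ${DRY_RUN:+echo} curl -X POST \
    -H "X-Dataverse-key:$API_TOKEN" \
    "$SERVER_URL/api/users/token/recreate?returnExpiration=true"
}

# token_info: retrieve the token information. An API token is used here,
# although the endpoint now accepts any authentication mechanism.
token_info() {
  ${DRY_RUN:+echo} curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/users/token"
}

# With DRY_RUN set, the commands are printed instead of executed.
DRY_RUN=1 recreate_token
DRY_RUN=1 token_info
```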
Settings Added
:GlobusBatchLookupSize
Backward Incompatible Changes
Generally speaking, see the API Changelog for a list of backward-incompatible API changes.
List Collections Linked to a Dataset
The API endpoint that returns a list of collections that a dataset has been linked to has been improved to provide a more structured JSON response. See the docs, #9650, and #9665.
Complete List of Changes
For the complete list of code changes in this release, see the 6.5 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.4.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. By default, Payara runs as the dataverse user. In the commands below, we use sudo to run the commands as a non-root user.
Also, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. List deployed applications
shell
$PAYARA/bin/asadmin list-applications
2. Undeploy the previous version (should match "list-applications" above)
shell
$PAYARA/bin/asadmin undeploy dataverse-6.4
3. Stop and start Payara
shell
sudo service payara stop
sudo service payara start
4. Download and deploy this version
shell
wget https://github.com/IQSS/dataverse/releases/download/v6.5/dataverse-6.5.war
$PAYARA/bin/asadmin deploy dataverse-6.5.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
sudo service payara stop
sudo rm -rf $PAYARA/glassfish/domains/domain1/generated
sudo rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
sudo rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
5. For installations with internationalization:
Please remember to update translations via Dataverse language packs.
6. Restart Payara
shell
sudo service payara stop
sudo service payara start
7. Update Solr schema.xml file. Start with the standard v6.5 schema.xml, then, if your installation uses any custom or experimental metadata blocks, update it to include the extra fields (step 7a).
Run the commands below as a non-root user.
Stop Solr (usually sudo service solr stop, depending on Solr installation/OS, see the Installation Guide).
shell
sudo service solr stop
Replace schema.xml
Please note that the path to Solr may differ from the example below.
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.5/conf/solr/schema.xml
sudo cp schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf
Start Solr (but if you use any custom metadata blocks, perform the next step, 7a, first).
shell
sudo service solr start
7a. For installations with custom or experimental metadata blocks:
Before starting Solr, update the schema.xml file to include all the extra metadata fields that your installation uses.
We do this by collecting the output of Dataverse's Solr schema API endpoint (/api/admin/index/solr/schema) and piping it to the update-fields.sh script which updates the schema.xml file supplied as an argument.
The example below assumes the default installation location of Solr, but you can modify the commands as needed.
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.5/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | sudo ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml
Now start Solr.
shell
sudo service solr start
8. Reindex Solr
Below is the simplest way to reindex Solr:
shell
curl http://localhost:8080/api/admin/index
The API above rebuilds the existing index. If you want to be absolutely sure that your index is up-to-date and consistent, you may consider wiping it clean and reindexing everything from scratch (see the guides). Just note that, depending on the size of your database, a full reindex may take a while and the users will be seeing incomplete search results during that window.
9. Run reExportAll to update dataset metadata exports
Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.
shell
curl http://localhost:8080/api/admin/metadata/reExportAll
10. Clear metrics cache
Run the clearMetricsCache API endpoint to remove old cached values that may be incorrect.
shell
curl -X DELETE http://localhost:8080/api/admin/clearMetricsCache
11. Pushing updated metadata to DataCite
(If you don't use DataCite, you can skip this. Also, if you aren't affected by the "useless null" bug described above, you can skip this.)
Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):
curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll
This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using the following command:
grep 'Failure for id' server.log
Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the Reserve a PID API call and the newly documented /api/datasets/<id>/modifyRegistration call respectively. See https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.
PIDs can also be updated by a superuser on a per-dataset basis using
curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata
Published by ofahimIQSS over 1 year ago
dataverse - v6.4
Dataverse 6.4
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.4 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
New features in Dataverse 6.4:
- Enhanced DataCite Metadata, including "Relation Type"
- All ISO 639-3 languages are now supported
- There is now a button for "Unlink Dataset"
- Users will have DOIs/PIDs reserved for their files as part of file upload instead of at publication time
- Datasets can now have types such as "software" or "workflow"
- Croissant support
- RO-Crate support
- and more! Please see below.
New client library:
- Rust
This release also fixes two important bugs described below and in a post on the mailing list:
- "Update Current Version" can cause metadata loss
- Publishing breaks designated dataset thumbnail, messes up collection page
Additional details on the above as well as many more features and bug fixes included in the release are described below. Read on!
Features Added
Enhanced DataCite Metadata, Including "Relation Type"
Within the "Related Publication" field, a new subfield has been added called "Relation Type" that allows for the most common values recommended by DataCite: IsCitedBy, Cites, IsSupplementTo, IsSupplementedBy, IsReferencedBy, and References. For existing datasets where no "Relation Type" has been specified, "IsSupplementTo" is assumed.
Dataverse now supports the DataCite v4.5 schema. Additional metadata is now being sent to DataCite including metadata about related publications and files in the dataset. Improved metadata is being sent including how PIDs (ORCID, ROR, DOIs, etc.), license/terms, geospatial, and other metadata are represented. The enhanced metadata will automatically be sent to DataCite when datasets are created and published. Additionally, after publication, you can inspect what was sent by looking at the DataCite XML export.
The additions are in rough alignment with the OpenAIRE XML export, but there are some minor differences in addition to the Relation Type addition, including an update to the DataCite 4.5 schema. For details see #10632, #10615 and the design document referenced there.
Multiple backward-incompatible changes and bug fixes have been made to API calls (three of the four were previously undocumented) related to updating PID target URLs and metadata at the provider service:
- Update Target URL for a Published Dataset at the PID provider
- Update Target URL for all Published Datasets at the PID provider
- Update Metadata for a Published Dataset at the PID provider
- Update Metadata for all Published Datasets at the PID provider
Full List of ISO 639-3 Languages Now Supported
The controlled vocabulary values list for the metadata field "Language" in the citation block has now been extended to include roughly 7920 ISO 639-3 values.
Some of the language entries in the pre-6.4 list correspond to "macro languages" in ISO-639-3 and admins/users may wish to update to use the corresponding individual language entries from ISO-639-3. As these cases are expected to be rare (they do not involve major world languages), finding them is not covered in the release notes. Anyone who desires help in this area is encouraged to reach out to the Dataverse community via any of the standard communication channels.
ISO 639-3 codes were downloaded from sil.org and the file used for merging with the existing citation.tsv was "iso-639-3.tab". See also #8578 and #10762.
Unlink Dataset Button
A new "Unlink Dataset" button has been added to the dataset page to allow a user to unlink a dataset from a collection. To unlink a dataset the user must have permission to link the dataset. Additionally, the existing API for unlinking datasets has been updated to no longer require superuser access as the "Publish Dataset" permission is now enough. See also #10583 and #10689.
Pre-Publish File DOI Reservation
Dataverse installations using DataCite as a persistent identifier (PID) provider (or other providers that support reserving PIDs) will be able to reserve PIDs for files when they are uploaded (rather than at publication time). Note that reserving file DOIs can slow uploads with large numbers of files so administrators may need to adjust timeouts (specifically any Apache "ProxyPass / ajp://localhost:8009/ timeout=" setting in the recommended Dataverse configuration). See also #7334.
Initial Support for Dataset Types
Out of the box, all datasets now have the type "dataset" but superusers can add additional types. At this time the type of a dataset can only be set at creation time via API. The types "dataset", "software", and "workflow" (just those three, for now) will be sent to DataCite (as resourceTypeGeneral) when the dataset is published.
For details see the guides, #10517 and #10694. Please note that this feature is highly experimental and is expected to evolve.
Croissant Support (Metadata Export)
A new metadata export format called Croissant is now available as an external metadata exporter. It is oriented toward making datasets consumable by machine learning.
For more about the Croissant exporter, including installation instructions, see https://github.com/gdcc/exporter-croissant. See also #10341, #10533, and discussion on the mailing list.
Please note: the Croissant exporter works best with Dataverse 6.2 and higher (where it updates the content of <head> as described in the guides) but can be used with 6.0 and higher (to get the export functionality).
RO-Crate Support (Metadata Export)
Dataverse now supports RO-Crate as a metadata export format. This functionality is not available out of the box, but you can enable one or more RO-Crate exporters from the list of external exporters. See also #10744 and #10796.
Rust API Client Library
A Dataverse API client library for the Rust programming language is now available at https://github.com/gdcc/rust-dataverse and has been added to the list of client libraries in the API Guide. See also #10758.
Collection Thumbnail Logo for Featured Collections
Collections can now have a thumbnail logo that is displayed when the collection is configured as a featured collection. If present, this thumbnail logo is shown. Otherwise, the collection logo is shown. Configuration is done under the "Theme" for a collection as explained in the guides. See also #10291 and #10433.
Saved Searches Can Be Deleted
Saved searches can now be deleted via API. See the Saved Search section of the API Guide, #9317 and #10198.
Notification Email Improvement
When notification emails are sent, the part of the closing that says "contact us for support at" will now show the support email address (dataverse.mail.support-email), when configured, instead of the default system email address. Using the system email address here was particularly problematic when it was a "noreply" address. See also #10287 and #10504.
Ability to Disable Automatic Thumbnail Selection
It is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the feature flag dataverse.feature.disable-dataset-thumbnail-autoselect. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image. See also #10820.
More Flexible PermaLinks
The configuration setting dataverse.pid.*.permalink.base-url, which is used for PermaLinks, has been updated to support greater flexibility. Previously, the string /citation?persistentId= was automatically appended to the configured base URL. With this update, the base URL will now be used exactly as configured, without any automatic additions. See also #10775.
Globus Async Framework
A new alternative implementation of Globus polling during upload data transfers has been added in this release. This experimental framework does not rely on the instance staying up continuously for the duration of the transfer and saves the state information about Globus upload requests in the database. See globus-use-experimental-async-framework under Feature Flags and dataverse.files.globus-monitoring-server in the Installation Guide. See also #10623 and #10781.
CVoc (Controlled Vocabulary): Allow ORCID and ROR to Be Used Together in Author Field
Changes in Dataverse and updates to the ORCID and ROR external vocabulary scripts support deploying these for the citation block author field (and others). See also #10711, #10712, and https://github.com/gdcc/dataverse-external-vocab-support/pull/22.
Development on Windows
New instructions have been added for developers on Windows trying to run a Dataverse development environment using Windows Subsystem for Linux (WSL). See the guides, #10606, and #10608.
Experimental Crossref PID (DOI) Provider
Crossref can now be used as a PID (DOI) provider, but this feature is experimental. Please provide feedback through the usual channels. See also the guides, #8581, and #10806.
Improved JSON Schema Validation for Datasets
JSON Schema validation has been enhanced with checks for required and allowed child objects as well as type checking for field types including primitive, compound and controlledVocabulary. More user-friendly error messages help pinpoint the issues in the dataset JSON. See Retrieve a Dataset JSON Schema for a Collection in the API Guide, #10169, and #10543.
Counter Processor 1.05 Support (Make Data Count)
Counter Processor 1.05 is now supported for use with Make Data Count. If you are running Counter Processor, you should reinstall/reconfigure it as described in the latest guides. Note that Counter Processor 1.05 requires Python 3, so you will need to follow the full Counter Processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse. See also #10479.
Version Tags for Container Base Images
With this release we introduce a detailed maintenance workflow for our container images. As output of the Containerization Working Group, the community takes another step towards production ready containers available directly from the core project.
The maintenance workflow regularly updates the Container Base Image, which contains the operating system, Java, Payara, and tools and libraries required by the Dataverse application. Shipping these rolling releases as well as immutable revisions is the foundation for secure and reliable Dataverse Application Container images. See also #10478 and #10827.
Bugs Fixed
Update Current Version
A significant bug in the superuser-only Update Current Version publication option was fixed. If the "Update Current Version" option was used when changes were made to the dataset terms (rather than to dataset metadata) or if the PID provider service was down or returned an error, the update would fail and render the dataset unusable and require restoration from a backup. The fix in this release allows the update to succeed in both of these cases and redesigns the functionality such that any unknown issues should not make the dataset unusable (i.e. the error would be reported and the dataset would remain in its current state with the last-published version as it was and changes still in the draft version.)
If you do not plan to upgrade to Dataverse 6.4 right away, you are encouraged to alert your superusers to this issue (see this post). Here are some workarounds for pre-6.4 versions:
- Change the "dataset.updateRelease" entry in the Bundle.properties file (or local language version) to "Do Not Use" or similar (this doesn't disable the button but alerts superusers to the issue), or
- Edit the dataset.xhtml file to remove the lines below, delete the contents of the generated and osgi-cache directories in the Dataverse Payara domain, and restart the Payara server. This will remove the "Update Current Version" from the UI.
<c:if test="#{dataverseSession.user.isSuperuser()}">
<f:selectItem rendered="#" itemLabel="#{bundle['dataset.updateRelease']}" itemValue="3" />
</c:if>
Again, the workarounds above are only for pre-6.4 versions. The bug has been fixed in Dataverse 6.4. See also #10797.
Broken Thumbnails
Dataverse 6.3 introduced a bug where publishing would break the dataset thumbnail, which in turn broke the rendering of the parent collection (dataverse) page.
This bug has been fixed but any existing broken thumbnails must be fixed manually. See "clearThumbnailFailureFlag" in the upgrade instructions below.
Additionally, it is now possible to turn off the feature that automatically selects one of the image datafiles to serve as the thumbnail of the parent dataset. An admin can turn it off by enabling the feature flag <jvm-options>-Ddataverse.feature.disable-dataset-thumbnail-autoselect=true</jvm-options>. When the feature is disabled, a user can still manually pick a thumbnail image, or upload a dedicated thumbnail image.
See also #10819, #10820, and the post on the mailing list.
No License, No Terms of Use
When datasets have neither a license nor custom terms of use, the dataset page will now indicate this. Also, these datasets will no longer be indexed as having custom terms. See also #8796, #10513, and #10614.
CC0 License Bug Fix
At a high level, some datasets have been mislabeled as "Custom License" when they should have been "CC0 1.0". This has been corrected.
In Dataverse 5.10, datasets with only "CC0 Waiver" in the "termsofuse" field were converted to "Custom License" (instead of the CC0 1.0 license) through a SQL migration script (see #10634). On deployment of Dataverse 6.4, a new SQL migration script will be run automatically to correct this, changing these datasets to CC0. You can review the script in #10634; it only affects datasets that meet all of the following conditions:
- The existing "Terms of Use" must be equal to "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions: CC0 Waiver" (this was set in #10634).
- The following terms fields must be empty: Confidentiality Declaration, Special Permissions, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer.
- The license ID must not be assigned.
The script will set the license ID to that of the CC0 1.0 license and remove the contents of "termsofuse" field. See also #9081 and #10634.
Remap oai_dc Export and Harvesting Format Fields: dc:type and dc:date
The oai_dc export and harvesting format has had the following fields remapped:
- dc:type was mapped to the field "Kind of Data". Now it is hard-coded to the word "Dataset".
- dc:date was mapped to the field "Production Date" when available and otherwise to "Publication Date". Now it is mapped to the field "Publication Date" or the field used for the citation date, if set (see Set Citation Date Field Type for a Dataset).
In order for these changes to be reflected in existing datasets, a reexport all should be run (mentioned below). See #8129 and #10737.
Zip File No Longer Misdetected as Shapefile (Hidden Directories)
When detecting file types, Dataverse would previously detect a zip file as a shapefile if it contained markers of a shapefile in hidden directories. These hidden directories are now ignored when deciding if a zip file is a shapefile or not. See also #8945 and #10627.
External Controlled Vocabulary
This release fixes a bug (introduced in v6.3) in the external controlled vocabulary mechanism that could cause indexing to fail (with a NullPointerException) when a script is configured for one child field and no other child fields were managed. See also #10869 and #10870.
Valid JSON in Error Response
When any ApiBlockingFilter policy applies to a request, the JSON in the body of the error response is now valid JSON. See also #10085.
Docker Container Base Image Security and Compatibility
- Switch "wait-for" to "wait4x", aligned with the Configbaker Image
- Update "jattach" to v2.2
- Install AMD64 / ARM64 versions of tools as necessary
- Run base image as unprivileged user by default instead of root (this was an oversight from OpenShift changes)
- Linux User, Payara Admin and Domain Master passwords:
  - Print hints about default, public knowledge passwords in place for
  - Enable replacing these passwords at container boot time
- Enable building with updated Temurin JRE image based on Ubuntu 24.04 LTS
- Fix entrypoint script troubles with pre- and postboot script files
- Unify location of files at CONFIG_DIR=/opt/payara/config, avoid writing to other places
See also #10508, #10672 and #10722.
Cleanup of Temp Directories
In this release we addressed an issue where copies of files uploaded via the UI were left in one specific temp directory (.../domain1/uploads by default). We would like to remind all the installation admins that it is strongly recommended to have some automated (and aggressive) cleanup mechanisms in place for all the temp directories used by Dataverse. For example, at Harvard/IQSS we have the following configuration for the PrimeFaces uploads directory above: (note that, even with this fix in place, PrimeFaces will be leaving a large number of small log files in that location)
Instead of the default location (.../domain1/uploads) we use a directory on a dedicated partition, outside of the filesystem where Dataverse is installed, via the following JVM option:
<jvm-options>-Ddataverse.files.uploads=/uploads/web</jvm-options>
and we have a dedicated cronjob that runs every 30 minutes and deletes everything older than 2 hours in that directory:
15,45 * * * * /bin/find /uploads/web/ -mmin +119 -type f -name "upload*" -exec rm -f {} \; > /dev/null 2>&1
Trailing Commas in Author Name Now Permitted
When an author name ended in a comma (e.g. "Smith," or "Smith, " with a trailing space), the dataset page broke after publishing (the user was shown a "500" error page). The underlying issue, which broke the JSON-LD Schema.org output on the page, has been fixed. See #10343 and #10776.
API Updates
Search API: affiliation, parentDataverseName, image_url, etc.
The Search API (/api/search) response now includes additional fields, depending on the type.
For collections (dataverses):
- "affiliation"
- "parentDataverseName"
- "parentDataverseIdentifier"
- "image_url" (optional)
javascript
"items": [
{
"name": "Darwin's Finches",
...
"affiliation": "Dataverse.org",
"parentDataverseName": "Root",
"parentDataverseIdentifier": "root",
"image_url":"/api/access/dvCardImage/{identifier}"
(etc, etc)
For datasets:
- "image_url" (optional)
javascript
"items": [
{
...
"image_url": "http://localhost:8080/api/datasets/2/logo"
...
(etc, etc)
For files:
- "releaseOrCreateDate"
- "image_url" (optional)
javascript
"items": [
{
"name": "test.png",
...
"releaseOrCreateDate": "2016-05-10T12:53:39Z",
"image_url":"/api/access/datafile/42?imageThumb=true"
(etc, etc)
These examples are also shown in the Search API section of the API Guide.
The image_url field was already part of the SolrSearchResult JSON (and incorrectly appeared in Search API documentation), but it wasn't returned by the API because it was appended only after the Solr query was executed in the SearchIncludeFragment of JSF (the old/current UI framework). Now, the field is set in SearchServiceBean, ensuring it is always returned by the API when an image is available.
The Solr schema.xml file has been updated to include a new field called "dvParentAlias" for supporting the new response field "parentDataverseIdentifier". See upgrade instructions below.
See also #10810 and #10811.
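To look at the new fields for one object type at a time, the Search API can be queried with its q and type parameters. The sketch below only builds the request URL; the helper name search_url is hypothetical, the query is assumed to be URL-encoded already, and the localhost address follows the examples used elsewhere in these notes.

```shell
# Hypothetical helper: build a Search API URL for one object type.
# Usage: search_url BASE_URL QUERY TYPE   (TYPE: dataverse, dataset or file)
search_url() {
    base=${1%/}   # drop one trailing slash, if any
    printf '%s/api/search?q=%s&type=%s\n' "$base" "$2" "$3"
}

# Example (needs a running installation, so it is left commented out):
# curl -s "$(search_url http://localhost:8080 finch dataverse)"
```

With type=dataverse the items in the response should carry the collection fields listed above, and with type=file the file fields.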
Search API: publicationStatuses
The Search API (/api/search) response will now include publicationStatuses in the JSON response as long as the list is not empty.
Example:
javascript
"items": [
{
"name": "Darwin's Finches",
...
"publicationStatuses": [
"Unpublished",
"Draft"
],
(etc, etc)
See also #10733 and #10738.
Search Facet Information Exposed
A new endpoint /api/datasetfields/facetables lists all facetable dataset fields defined in the installation, as described in the guides.
A new optional query parameter "returnDetails" has been added to the /api/dataverses/{identifier}/facets/ endpoint to include detailed information about each DataverseFacet, as described in the guides. See also #10726 and #10727.
User Permissions on Collections
A new endpoint at /api/dataverses/{identifier}/userPermissions for obtaining the user permissions on a collection (dataverse) has been added. See also the guides, #10749 and #10751.
addDataverse Extended
The addDataverse (/api/dataverses/{identifier}) API endpoint has been extended to allow adding metadata blocks, input levels and facet IDs at creation time, as the Dataverse page in create mode does in JSF. See also the guides, #10633 and #10644.
Metadata Blocks and Display on Create
The /api/dataverses/{identifier}/metadatablocks endpoint has been fixed to not return fields marked as displayOnCreate=true if there is an input level with include=false, when query parameters returnDatasetFieldTypes=true and onlyDisplayedOnCreate=true are set. See also #10741 and #10767.
The fields "depositor" and "dateOfDeposit" in the citation.tsv metadata block file have been updated to have the property "displayOnCreate" set to TRUE. In practice, only the API is affected because the UI has special logic that already shows these fields when datasets are created. See also #10850 and #10884.
Feature Flags Can Be Listed
It is now possible to list all feature flags and see if they are enabled or not. See also the guides and #10732.
Settings Added
The following settings have been added:
- dataverse.feature.disable-dataset-thumbnail-autoselect
- dataverse.feature.globus-use-experimental-async-framework
- dataverse.files.globus-monitoring-server
- dataverse.pid.*.crossref.url
- dataverse.pid.*.crossref.rest-api-url
- dataverse.pid.*.crossref.username
- dataverse.pid.*.crossref.password
- dataverse.pid.*.crossref.depositor
- dataverse.pid.*.crossref.depositor-email
Backward Incompatible Changes
- The oai_dc export format has changed. See the "Remap oai_dc" section above.
- Several APIs related to DataCite have changed. See "More and Better Data Sent to DataCite" above.
Complete List of Changes
For the complete list of code changes in this release, see the 6.4 milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.3.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
shell
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. Undeploy the previous version
shell
$PAYARA/bin/asadmin undeploy dataverse-6.3
2. Stop and start Payara
shell
service payara stop
sudo service payara start
3. Deploy this version
shell
$PAYARA/bin/asadmin deploy dataverse-6.4.war
Note: if you have any trouble deploying, stop Payara, remove the following directories, start Payara, and try to deploy again.
shell
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
4. For installations with internationalization:
Please remember to update translations via Dataverse language packs.
5. Restart Payara
shell
service payara stop
service payara start
6. Update metadata blocks
These changes reflect incremental improvements made to the handling of core metadata fields.
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
```
7. Update Solr schema.xml file. Start with the standard v6.4 schema.xml, then, if your installation uses any custom or experimental metadata blocks, update it to include the extra fields (step 7a).
Stop Solr (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
shell
service solr stop
Replace schema.xml
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/schema.xml
cp schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf
Start Solr (but if you use any custom metadata blocks, perform the next step, 7a first).
shell
service solr start
7a. For installations with custom or experimental metadata blocks:
Before starting Solr, update the schema to include all the extra metadata fields that your installation uses. We do this by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the names of the directories, if different):
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.4/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml
Now start Solr.
8. Reindex Solr
Below is the simplest way to reindex Solr:
shell
curl http://localhost:8080/api/admin/index
The API above rebuilds the existing index "in place". If you want to be absolutely sure that your index is up-to-date and consistent, you may consider wiping it clean and reindexing everything from scratch (see the guides). Just note that, depending on the size of your database, a full reindex may take a while and the users will be seeing incomplete search results during that window.
9. Run reExportAll to update dataset metadata exports
This step is necessary because of changes described above for the Datacite and oai_dc export formats.
Below is the simple way to reexport all dataset metadata. For more advanced usage, please see the guides.
shell
curl http://localhost:8080/api/admin/metadata/reExportAll
10. Pushing updated metadata to DataCite
(If you don't use DataCite, you can skip this.)
Above you updated the citation metadata block and Solr with the new "relationType" field. With these two changes, the "Relation Type" fields will be available and creation/publication of datasets will result in the expanded XML being sent to DataCite. You've also already run "reExportAll" to update the Datacite metadata export format.
Entries at DataCite for published datasets can be updated by a superuser using an API call (newly documented):
curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/modifyRegistrationPIDMetadataAll
This will loop through all published datasets (and released files with PIDs). As long as the loop completes, the call will return a 200/OK response. Any PIDs for which the update fails can be found using the following command:
grep 'Failure for id' server.log
Failures may occur if PIDs were never registered, or if they were never made findable. Any such cases can be fixed manually in DataCite Fabrica or using the Reserve a PID API call and the newly documented /api/datasets/<id>/modifyRegistration call respectively. See https://guides.dataverse.org/en/6.4/admin/dataverses-datasets.html#send-dataset-metadata-to-pid-provider. Please reach out with any questions.
PIDs can also be updated by a superuser on a per-dataset basis using
curl -X POST -H 'X-Dataverse-key:<key>' http://localhost:8080/api/datasets/<id>/modifyRegistrationMetadata
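After a bulk update, a quick count of failures can help decide whether manual follow-up is needed. The sketch below wraps the grep shown above; the helper name count_pid_failures is hypothetical, and nothing is assumed about the log line format beyond the 'Failure for id' text quoted in these instructions.

```shell
# Hypothetical helper: count PID update failures recorded in a Payara log.
# Usage: count_pid_failures LOGFILE
count_pid_failures() {
    [ -r "$1" ] || { echo "cannot read: $1" >&2; return 1; }
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'Failure for id' "$1" || true
}
```

A result of 0 for server.log suggests that every PID in the loop was updated; any other number is the count of lines to inspect with the grep command above.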
Additional Upgrade Steps
11. If there are broken thumbnails
To restore any broken thumbnails caused by the bug described above, you can call the http://localhost:8080/api/admin/clearThumbnailFailureFlag API, which will attempt to clear the flag on all files (regardless of whether caused by this bug or some other problem with the file) or the http://localhost:8080/api/admin/clearThumbnailFailureFlag/$FILE_ID to clear the flag for individual files. Calling the former, batch API is recommended.
12. PermaLinks with custom base-url
If you currently use PermaLinks with a custom base-url: You must manually append /citation?persistentId= to the base URL to maintain functionality.
If you use PermaLinks without a configured base-url, no changes are required.
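For installations with a custom base-url, the new value is simply the old one with the suffix appended. A minimal sketch of that string change follows; the helper name permalink_base_url is hypothetical, and how the resulting value is stored depends on how your installation configures its PID provider.

```shell
# Hypothetical helper: derive the new base-url value from the old one.
# Usage: permalink_base_url OLD_BASE_URL
permalink_base_url() {
    suffix='/citation?persistentId='
    old=${1%/}                      # tolerate a trailing slash
    case $old in
        *"$suffix") printf '%s\n' "$old" ;;            # already updated
        *)          printf '%s%s\n' "$old" "$suffix" ;;
    esac
}
```

For a placeholder base URL such as https://example.org this prints https://example.org/citation?persistentId= and running it again on that result leaves the value unchanged.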
Published by ofahimIQSS over 1 year ago
dataverse - v6.3
Dataverse 6.3
Summary
- New Contributor Guide. The UX Working Group released a new Dataverse Contributor Guide.
- Search Performance Improvements. Solr indexing and searching were improved, speeding up performance. Larger installations take note.
- Dataverse Now Supports File-level Retention Periods. See the Retention Periods section of the guide for details.
- API Optimizations for Large Datasets. Search API and permission checking have been improved for datasets with thousands of files.
- Improved Controlled Vocabulary Support. Improvements include updates to the citation metadata block's Language field and multiple extensions added to the external vocabulary mechanism.
- Improved Detection of RO-Crate Files. Dataverse now detects mime-types based on filename extensions and detects RO-Crate metadata files.
- Sitemap Now Supports More Than 50K Items. Dataverse can now handle more than 50,000 items when generating sitemap files. For details, see the sitemap section of the Installation Guide.
- Infrastructure Updates. Payara and Solr have been updated.
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.3 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Table of Contents
- Release Highlights
- Features
- Bug Fixes
- API
- Settings
- Complete List of Changes
- Getting Help
- Upgrade instructions
Release Highlights
Solr Search and Indexing Improvements
Multiple improvements have been made to the way Solr indexing and searching is done. Response times should be significantly improved.
Two experimental feature flags called "add-publicobject-solr-field" and "avoid-expensive-solr-join" have been added to change how Solr documents are indexed for public objects and how Solr queries are constructed to accommodate access to restricted content (drafts, etc.). It is hoped that they will help with performance, especially on large instances and under load.
Before the search feature flag ("avoid-expensive...") can be turned on, the indexing flag must be enabled, and a full reindex performed. Otherwise publicly available objects are NOT going to be shown in search results.
A feature flag called "reduce-solr-deletes" has been added to improve how datafiles are indexed. When the flag is enabled, Dataverse will avoid pre-emptively deleting existing Solr documents for the files prior to sending updated information. This should improve performance and will allow additional optimizations going forward.
The /api/admin/index/status and /api/admin/index/clear-orphans calls (see https://guides.dataverse.org/en/latest/admin/solr-search-index.html#index-and-database-consistency) will now find and remove (respectively) additional permissions related Solr documents that were not being detected before. Reducing the overall number of documents will improve Solr performance and large sites may wish to periodically call the "clear-orphans" API.
Dataverse now relies on the autoCommit and autoSoftCommit settings in the Solr configuration instead of explicitly committing documents to the Solr index. This improves indexing speed.
See also #10554, #10654, and #10579.
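The ordering constraint for the two experimental flags (indexing flag first, then a reindex, and only then the search flag) can be written down as a dry-run plan. The sketch below prints commands rather than running them. It assumes the flags are set as JVM options through asadmin create-jvm-options, that PAYARA points at your Payara installation, and that the in-place reindex call used in the upgrade instructions is sufficient; JVM options typically require a Payara restart to take effect, which is not shown.

```shell
# Dry-run sketch: print the order of operations described above without
# executing anything. Adjust PAYARA and the URL for your installation.
print_solr_flag_plan() {
    payara=${PAYARA:-/usr/local/payara6}
    # 1. Enable the indexing flag first.
    echo "$payara/bin/asadmin create-jvm-options \"-Ddataverse.feature.add-publicobject-solr-field=true\""
    # 2. Reindex so that public objects carry the new Solr field.
    echo "curl http://localhost:8080/api/admin/index"
    # 3. Only after the reindex has finished, enable the search flag.
    echo "$payara/bin/asadmin create-jvm-options \"-Ddataverse.feature.avoid-expensive-solr-join=true\""
}
```

Printing the plan first makes it harder to enable the search flag too early, which would hide publicly available objects from search results.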
File Retention Period
Dataverse now supports file-level retention periods. The ability to set retention periods, with a minimum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Retention Periods section of the User Guide.
Users can configure a specific retention period, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the "Retention Period" menu item and entering information in a popup dialog. Retention periods can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
After the retention period expires, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). The file (landing) page and all the metadata remain available.
Features
Large Datasets Improvements
For scenarios involving API calls related to large datasets (numerous files, for example ~10k), the following have been optimized:
- The Search API endpoint.
- The permission checking logic present in PermissionServiceBean.
See also #10415.
Improved Controlled Vocabulary for Citation Block
The Controlled Vocabulary Values list for the "Language" metadata field in the citation block has been improved, with some missing two- and three-letter ISO 639 codes added, as well as more alternative names for some of the languages, making all these extra language identifiers importable. See also #8243.
Updates on Support for External Vocabulary Services
Multiple extensions of the external vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.
These changes include:
- Improved Indexing with Compound Fields: When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the indexIn in retrieval-filtering. See also #10505 and GDCC/dataverse-external-vocab-support documentation.
- Broader Support for Indexing Service Responses: Indexing of the results from retrieval-filtering responses can now handle additional formats including JSON arrays of strings and values from arbitrary keys within a JSON Object. See #10505.
- HTTP Headers: You are now able to add HTTP request headers required by the service you are implementing. See #10331.
- Flexible params in retrievalUri: You can now use managed-fields field names as well as the term-uri-field field name as parameters in the retrieval-uri when configuring an external vocabulary service. {0} as an alternative to using the term-uri-field name is still supported for backward compatibility. Also you can specify if the value must be url encoded with encodeUrl:. See #10404.
  For example: "retrieval-uri": "https://data.agroportal.lirmm.fr/ontologies/{keywordVocabulary}/classes/{encodeUrl:keywordermURL}"
- Hidden HTML Fields: External controlled vocabulary scripts, configured via :CVocConf, can now access the values of managed fields as well as the term-uri-field for use in constructing the metadata view for a dataset. These values are now added as hidden elements in the HTML and can be found with the HTML attribute data-cvoc-metadata-name. See also #10503.
A Contributor Guide is now available
A new Contributor Guide has been added by the UX Working Group (#10531 and #10532).
URL Validation Is More Permissive
URL validation now allows two slashes in the path component of the URL.
Among other things, this allows metadata fields of url type to be filled with more complex URLs such as https://archive.softwareheritage.org/browse/directory/561bfe6698ca9e58b552b4eb4e56132cac41c6f9/?origin_url=https://github.com/gem-pasteur/macsyfinder&revision=868637fce184865d8e0436338af66a2648e8f6e1&snapshot=1bde3cb370766b10132c4e004c7cb377979928d1
See also #9750 and #9739.
Improved Detection of RO-Crate Files
Dataverse now detects mime-types based on full filenames (including the extension), which allows it to recognize RO-Crate metadata files.
From now on, filenames with extensions can be added into MimeTypeDetectionByFileName.properties file. Filenames added there will take precedence over simply recognizing files by extensions. For example, two new filenames are added into that file:
ro-crate-metadata.json=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
ro-crate-metadata.jsonld=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
Therefore, files named ro-crate-metadata.json will from now on be detected as RO-Crate metadata files instead of as generic JSON files.
For more information on the RO-Crate specifications, see https://www.researchobject.org/ro-crate
See also #10015.
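The precedence rule described in this section (a full filename match wins over an extension match) can be illustrated with a small lookup. This is a sketch of the idea only and not the Dataverse implementation; the function name detect_mime and the extension fallbacks are assumptions made for the example.

```shell
# Illustrative lookup, NOT the Dataverse implementation: full filenames are
# checked first, and only then the extension.
# Usage: detect_mime FILENAME
detect_mime() {
    name=${1##*/}   # strip any leading directory
    profile='profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"'
    case $name in
        # Entries in the spirit of MimeTypeDetectionByFileName.properties
        ro-crate-metadata.json|ro-crate-metadata.jsonld)
            printf 'application/ld+json; %s\n' "$profile" ;;
        # Extension-based fallback (assumed mappings for the example)
        *.json)   echo 'application/json' ;;
        *.jsonld) echo 'application/ld+json' ;;
        *)        echo 'application/octet-stream' ;;
    esac
}
```

Because the case branches are tried in order, ro-crate-metadata.json never reaches the generic *.json branch, while a file such as data.json still does.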
New S3 Tagging Configuration Option
If your S3 store does not support tagging and gives an error if you configure direct upload, you can disable the tagging by using the dataverse.files.<id>.disable-tagging JVM option. For more details, see the section on S3 tags in the guides, #10022 and #10029.
Feature Flag To Remove the Required "Reason" Field in the "Return to Author" Dialog
A required (non-empty) reason field was added to the "Return to Author" dialog in v6.2. Installations that handle author communications through email or another system may prefer not to be required to use this new field. v6.3 includes a new disable-return-to-author-reason feature flag that can be enabled to drop the reason field from the dialog and make sending a reason optional in the api/datasets/{id}/returnToAuthor call. See also #10655.
Improved Use of Dataverse Thumbnail
Dataverse will use the dataset thumbnail, if one is defined, rather than the generic Dataverse logo in the Open Graph metadata header. This means the image will be seen when, for example, the dataset is referenced on Facebook. See also #5621.
Improved Email Notifications When Guestbook is Used for File Access Requests
Multiple improvements have been made to guestbook response emails, making them easier to organize and process. The subject line of the notification email now includes the name and user identifier of the requestor. Additionally, the body of the email now includes the user id of the requestor. Finally, the guestbook responses have been sorted and spaced to improve readability. See also #10581.
New keywordTermURI Metadata Field in the Citation Metadata Block
A new metadata field, keywordTermURI, has been added in the citation metadata block (as a fourth child field under the keyword parent field). This has been done to improve usability and to facilitate the integration of controlled vocabulary services, adding the possibility of saving the "term" and/or its associated URI. For more information, see #10288 and PR #10371.
Updated Computational Workflow Metadata Block
The optional computational workflow metadata block has been updated to present a clickable link for the External Code Repository URL field. See also #10339.
Metadata Source Facet Added
An option has been added to index the name of the harvesting client as the "Metadata Source" of harvested datasets and files; if enabled, the Metadata Source facet will show separate entries for the content harvested from different sources, instead of the current, default behavior where there is one "Harvested" facet for all such content.
To enable this feature, set the optional feature flag (JVM option) dataverse.feature.index-harvested-metadata-source=true before reindexing.
See also #10611 and #10651.
Additional Facet Settings
Extra settings have been added giving an instance admin more choices in selectively limiting the availability of search facets on the collection and dataset pages.
See Disable Solr Facets under the configuration section of the Installation Guide for more info as well as #10570.
Sitemap Now Supports More Than 50k Items
Dataverse can now handle more than 50,000 items when generating sitemap files, splitting the content across multiple files to comply with the Sitemap protocol. For details, see the sitemap section of the Installation Guide. See also #8936 and #10321.
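The number of files needed follows from the Sitemap protocol limit of 50,000 URLs per file. The arithmetic is sketched below; the function name sitemap_file_count is hypothetical, and the sketch says nothing about how Dataverse names or indexes the generated files.

```shell
# Illustrative arithmetic only: how many sitemap files are needed when each
# file may hold at most 50,000 URLs (the Sitemap protocol limit).
# Usage: sitemap_file_count TOTAL_URLS
sitemap_file_count() {
    limit=50000
    # Integer ceiling division: (n + limit - 1) / limit
    echo $(( ($1 + limit - 1) / limit ))
}
```

For example, an installation with 120,000 items would need three files under this rule, while one with exactly 50,000 items still fits in a single file.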
MIT and Apache 2.0 Licenses Added
New files have been added to import the MIT and Apache 2.0 Licenses to Dataverse:
- licenseMIT.json
- licenseApache-2.0.json
Guidance has been added to the guides to explain the procedure for adding new licenses to Dataverse.
See also #10425.
3D Viewer by Open Forest Data
3DViewer by openforestdata.pl has been added to the list of external tools. See also #10561.
Datalad Integration With Dataverse
DataLad has been integrated with Dataverse. For more information, see the integrations section of the guides. See also #10468.
Rsync Support Has Been Deprecated
Support for rsync has been deprecated. Information has been removed from the guides for rsync and related software such as Data Capture Module (DCM) and Repository Storage Abstraction Layer (RSAL). You can still find this information in older versions of the guides. See Settings, below, for deprecated settings. See also #8985.
Bug Fixes
OpenAPI Re-Enabled
In Dataverse 6.0, the Payara update caused the /openapi URL to stop working:
- https://github.com/IQSS/dataverse/issues/9981
- https://github.com/payara/Payara/issues/6369
In addition to fixing the /openapi URL, we are also making some changes on how we provide the OpenAPI document:
When it worked in Dataverse 5.x, the /openapi output was generated automatically by Payara, but in this release we have switched to OpenAPI output produced by the SmallRye OpenAPI plugin. This gives us finer control over the output.
For more information, see the section on OpenAPI in the API Guide and #10328.
Re-Addition of "Cell Counting" to Life Sciences Block
In the Life Sciences metadata block under the "Measurement Type" field the value cell counting was accidentally removed in v5.1. It has been restored. See also #8655 and #9735.
Math Challenge Fixed on 403 Error Page
On the "forbidden" (403) error page, the math challenge now correctly displays so that the contact form can be submitted. See also #10466.
Ingest Option Bug Fixed
A bug that prevented the "Ingest" option in the file page "Edit File" menu from working has been fixed. See also #10568.
Incomplete Metadata Bug Fix
A bug was fixed where the incomplete metadata label was being shown for published datasets with incomplete metadata in certain scenarios. This label will now be shown for draft versions of such datasets and published datasets that the user can edit. This label can also be made invisible for published datasets (regardless of edit rights) with the new option dataverse.ui.show-validity-label-when-published set to false. See also #10116.
Identical Role Error Message
An error is now correctly reported when an attempt is made to assign an identical role to the same collection, dataset, or file. See also #9729 and #10465.
API
Superuser Endpoint
The existing API endpoint for toggling the superuser status of a user has been deprecated in favor of a new API endpoint that allows you to explicitly and idempotently set the status as true or false. For details, see the API Guide, #9887 and #10440.
New Featured Collections Endpoints
New API endpoints have been added to allow you to add or remove featured collections from a collection.
See also the sections on listing, setting, and removing featured collections in the API Guide, #10242 and #10459.
Dataset Version Endpoint Extended
The API endpoint for getting the Dataset version has been extended to include latestVersionPublishingStatus. See also #10330.
New Optional Query Parameters for Metadatablocks Endpoints
New optional query parameters have been added to api/metadatablocks and api/dataverses/{id}/metadatablocks endpoints:
- returnDatasetFieldTypes: Whether or not to return the dataset field types present in each metadata block. If not set, the default value is false.
- Setting the query parameter onlyDisplayedOnCreate=true also returns metadata blocks with dataset field type input levels configured as required on the General Information page of the collection, in addition to the metadata blocks and their fields with the property displayOnCreate=true.
See also #10389.
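Since both query parameters are optional, a request only needs to carry the ones that are switched on. The sketch below builds such a URL for the collection-level endpoint; the helper name metadatablocks_url and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: build the metadata blocks URL for one collection,
# appending only the optional query parameters that are set to "true".
# Usage: metadatablocks_url BASE_URL COLLECTION_ID [RETURN_TYPES] [ONLY_ON_CREATE]
metadatablocks_url() {
    url="${1%/}/api/dataverses/$2/metadatablocks"
    sep='?'
    if [ "${3:-false}" = true ]; then
        url="$url${sep}returnDatasetFieldTypes=true"
        sep='&'
    fi
    if [ "${4:-false}" = true ]; then
        url="$url${sep}onlyDisplayedOnCreate=true"
    fi
    printf '%s\n' "$url"
}
```

Leaving both flags off yields the plain endpoint, where returnDatasetFieldTypes falls back to its default of false.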
Dataverse Payload Includes Release Status
The Dataverse object returned by /api/dataverses has been extended to include "isReleased": {boolean}. See also #10491.
New Field Type Input Level Endpoint
A new endpoint api/dataverses/{id}/inputLevels has been created for updating the dataset field type input levels of a collection via API. See also #10477.
Banner Message Endpoint Extended
The endpoint api/admin/bannerMessage has been extended so the ID is returned when created. See also #10565.
Settings
Database Settings:
New:
- :DisableSolrFacets
Deprecated (used with rsync):
- :DataCaptureModuleUrl
- :DownloadMethods
- :LocalDataAccessPath
- :RepositoryStorageAbstractionLayerUrl
New Configuration Options
- dataverse.files.<id>.disable-tagging
- dataverse.feature.add-publicobject-solr-field
- dataverse.feature.avoid-expensive-solr-join
- dataverse.feature.reduce-solr-deletes
- dataverse.feature.disable-return-to-author-reason
- dataverse.feature.index-harvested-metadata-source
- dataverse.ui.show-validity-label-when-published
Complete List of Changes
For the complete list of code changes in this release, see the 6.3 Milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan accordingly, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.2.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only supported way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands, we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-6.2
2. Stop Payara and remove the following directories:
shell
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
3. Upgrade Payara to v6.2024.6
With this version of Dataverse, we encourage you to upgrade to version 6.2024.6. This will address security issues accumulated since the release of 6.2023.8.
Note that if you are using GDCC containers, this upgrade is included when pulling new release images. No manual intervention is necessary.
The steps below are a simple matter of reusing your existing domain directory with the new distribution. But we recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes. We also recommend you ensure you followed all update instructions from the past releases regarding Payara. (The latest Payara update was for v6.0.)
Move the current Payara directory out of the way:
shell
mv $PAYARA $PAYARA.2023.8
Download the new Payara version 6.2024.6 (from https://www.payara.fish/downloads/payara-platform-community-edition/), and unzip it in its place:
shell
wget https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2024.6/payara-6.2024.6.zip
shell
cd /usr/local
unzip payara-6.2024.6.zip
Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1:
shell
mv payara6/glassfish/domains/domain1 payara6/glassfish/domains/domain1_DIST
mv payara6.2023.8/glassfish/domains/domain1 payara6/glassfish/domains/
Make sure that you have the following --add-opens options in your payara6/glassfish/domains/domain1/config/domain.xml. If not present, add them:
<jvm-options>--add-opens=java.management/javax.management=ALL-UNNAMED</jvm-options>
<jvm-options>--add-opens=java.management/javax.management.openmbean=ALL-UNNAMED</jvm-options>
<jvm-options>[17|]--add-opens=java.base/java.io=ALL-UNNAMED</jvm-options>
<jvm-options>[21|]--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED</jvm-options>
(Note that you likely already have the java.base/java.io option there, but without the [17|] prefix. Make sure to replace it with the version above)
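The four options listed above can be checked mechanically before Payara is started. The sketch below reports any that are missing from a given domain.xml; the function name missing_add_opens is hypothetical, and the check is a plain fixed-string search for each full line, so a java.base/java.io entry without the [17|] prefix is reported as missing.

```shell
# Hypothetical helper: report which of the required --add-opens options are
# missing from a domain.xml file. Prints nothing when all four are present.
# Usage: missing_add_opens /path/to/domain.xml
missing_add_opens() {
    for opt in \
        '<jvm-options>--add-opens=java.management/javax.management=ALL-UNNAMED</jvm-options>' \
        '<jvm-options>--add-opens=java.management/javax.management.openmbean=ALL-UNNAMED</jvm-options>' \
        '<jvm-options>[17|]--add-opens=java.base/java.io=ALL-UNNAMED</jvm-options>' \
        '<jvm-options>[21|]--add-opens=java.base/jdk.internal.misc=ALL-UNNAMED</jvm-options>'
    do
        # -F: fixed string (the brackets are literal), -q: no output
        grep -qF -- "$opt" "$1" || printf 'missing: %s\n' "$opt"
    done
}
```

Running it against payara6/glassfish/domains/domain1/config/domain.xml and getting no output suggests that no manual edits are needed for this step.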
Start Payara:
shell
sudo service payara start
4. Deploy this version.
shell
$PAYARA/bin/asadmin deploy dataverse-6.3.war
5. For installations with internationalization:
- Please remember to update translations via Dataverse language packs.
6. Restart Payara
shell
service payara stop
service payara start
7. Update the following metadata blocks to reflect the incremental improvements made to the handling of core metadata fields:
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv
```
7a. If you are using the optional computational workflow metadata block, update it:
```shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/scripts/api/data/metadatablocks/computational_workflow.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file computational_workflow.tsv
```
8. Upgrade Solr
Solr 9.4.1 is now the version recommended in our Installation Guide and used with automated testing. There is a known security issue in the previously recommended version 9.3.0: https://nvd.nist.gov/vuln/detail/CVE-2023-36478. While the risk of an exploit should not be significant unless the Solr instance is accessible from outside networks (which we have always recommended against), we recommend upgrading.
Install Solr 9.4.1 following the instructions from the Installation Guide.
The instructions in the guide suggest using the config files from the installer zip bundle. When upgrading an existing instance, it may be easier to download them from the source tree:
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/solrconfig.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/schema.xml
cp solrconfig.xml schema.xml /usr/local/solr/solr-9.4.1/server/solr/collection1/conf
8a. For installations with custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide).
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
shell
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.3/conf/solr/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.4.1/server/solr/collection1/conf/schema.xml
- Start Solr instance (usually service solr start, depending on Solr/OS).
9. Enable the Metadata Source facet for harvested content (Optional):
If you choose to enable this new feature, set the optional feature flag (jvm option) dataverse.feature.index-harvested-metadata-source=true before reindexing.
10. Reindex Solr, if you upgraded Solr (recommended), or chose to enable any options that require a reindex:
shell
curl http://localhost:8080/api/admin/index
Note: if you choose to perform a migration of your keywordValue metadata fields (section below), that will require a reindex as well, so do that first.
Notes for Dataverse Installation Administrators
Data migration to the new keywordTermURI field
You can migrate your keywordValue data containing URIs to the new keywordTermURI field.
In case of data migration, view the affected data with the following database query:
sql
SELECT value FROM datasetfieldvalue dfv
INNER JOIN datasetfield df ON df.id = dfv.datasetfield_id
WHERE df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND value ILIKE 'http%';
If you wish to migrate your data, a database update is then necessary:
sql
UPDATE datasetfield df
SET datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordTermURI')
FROM datasetfieldvalue dfv
WHERE dfv.datasetfield_id = df.id
AND df.datasetfieldtype_id = (SELECT id FROM datasetfieldtype WHERE name = 'keywordValue')
AND dfv.value ILIKE 'http%';
A reindex in place will be required. ReExportAll will need to be run to update the metadata exports of the dataset. Follow the directions in the Admin Guide.
Published by pdurbin almost 2 years ago
dataverse - v6.2
Dataverse 6.2
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Table of Contents
- 💡 Release Highlights
- 🪲 Bug fixes
- 💾 Persistence
- 🌐 API
- ⚠️ Backward Incompatibilities
- 📖 Guides
- ⚙️ New Settings
- 📋 Complete List of Changes
- 🛟 Getting Help
- 💻 Upgrade instructions
💡Release Highlights
Search and Facet by License
Licenses have been added to the search facets in the search side panel to filter datasets by license (e.g. CC0).
Datasets with Custom Terms are aggregated under the "Custom Terms" value of this facet. See the Licensing section of the guide for more details on configured Licenses and Custom Terms.
For more information, see #9060.
Licenses can also be used to filter the Search API results using the fq parameter, for example: /api/search?q=*&fq=license%3A%22CC0+1.0%22 for CC0 1.0. See the Search API guide for more examples.
For more information, see #10204.
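The encoded fq value in the example above can be built from a plain license name. The following is a minimal sketch: license_fq is a hypothetical helper, and it encodes only the colon, the surrounding quotes, and spaces, so names with other special characters would need a full URL encoder.

```shell
# Hypothetical helper: wrap a license name in quotes and URL-encode the
# colon (%3A), the quotes (%22), and any spaces (+) for use as an fq value.
license_fq() {
    printf 'license%%3A%%22%s%%22\n' "$(printf '%s' "$1" | sed 's/ /+/g')"
}

license_fq 'CC0 1.0'   # prints license%3A%22CC0+1.0%22
```

The result can be appended to a search request, for instance curl "http://localhost:8080/api/search?q=*&fq=$(license_fq 'CC0 1.0')", assuming Dataverse is reachable on localhost:8080 as in the upgrade instructions.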
When Returning Datasets to Authors, Reviewers Can Add a Note to the Author
The popup for returning a dataset to its author now lets the reviewer type a message explaining the reasons for the return and any edits needed. The message is sent to the author by email.
Please note that this note is mandatory, but that you can still type a creative and meaningful comment such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.
For more information, see #10137.
Support for Using Multiple PID Providers
This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts (managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections, and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.
These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended and will be required in a future version.
For more information check the PID settings on this link.
Rate Limiting
The option to rate limit has been added to prevent users from overtaxing the system either deliberately or by runaway automated processes. Rate limiting can be configured on a tier level with tier 0 being reserved for guest users and tiers 1 and above for authenticated users. Superuser accounts are exempt from rate limiting.
Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database. Two database settings configure rate limiting: :RateLimitingDefaultCapacityTiers and :RateLimitingCapacityByTierAndAction. If either of these settings exists in the database, rate limiting is enabled; if neither exists, rate limiting is disabled.
For more details check the detailed guide on this link.
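As a rough illustration of the tier, command, and hourly limit combination, the sketch below assembles one entry for :RateLimitingCapacityByTierAndAction. The helper name rate_limit_entry is hypothetical, and the JSON field names (tier, limitPerHour, actions) are an assumption about the format described in the guide, so check them against the guide before use.

```shell
# Hypothetical helper: emit one JSON entry (tier, hourly limit, command
# names). The field names are assumed; verify them in the guide.
rate_limit_entry() {
    tier="$1"; limit="$2"; shift 2
    actions=""
    for a in "$@"; do
        # Append each command name as a quoted, comma-separated JSON string.
        actions="${actions:+$actions, }\"$a\""
    done
    printf '{"tier": %s, "limitPerHour": %s, "actions": [%s]}\n' "$tier" "$limit" "$actions"
}

# Guests (tier 0) limited to 10 calls per hour for two example commands.
rate_limit_entry 0 10 GetDatasetCommand GetPrivateUrlCommand
```

One or more such entries, wrapped in a JSON array, could then be stored with the settings API, for example curl -X PUT -d "[$(rate_limit_entry 0 10 GetDatasetCommand)]" http://localhost:8080/api/admin/settings/:RateLimitingCapacityByTierAndAction.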
Simplified SMTP Configuration
With this release, we deprecate the usage of asadmin create-javamail-resource to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.
At this point, no action is required if you want to keep your current configuration. Warnings will show in your server logs to inform and remind you about the deprecation. A future major release of Dataverse may remove this way of configuration.
Please do take the opportunity to update your SMTP configuration. Details can be found in the Installation Guide, starting with the SMTP/Email Configuration section.
Once reconfiguration is complete, you should remove legacy, unused config. First, run asadmin delete-javamail-resource mail/notifyMailSession as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail as this database setting has been replaced with dataverse.mail.system-email as described below.
Please note: as there have been problems with email delivered to SPAM folders when the "From" within mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email once you migrate to the new way of configuration.
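A basic setup with the new JVM options might look like the sketch below. The helper name smtp_jvm_options and the example address, host, and port are placeholders, and only three of the new settings are shown; the Installation Guide remains the authoritative list.

```shell
# Hypothetical helper: print JVM options for a basic SMTP setup.
# Arguments: system email address, SMTP host, SMTP port.
smtp_jvm_options() {
    printf '%s=%s\n' '-Ddataverse.mail.system-email' "$1"
    printf '%s=%s\n' '-Ddataverse.mail.mta.host' "$2"
    printf '%s=%s\n' '-Ddataverse.mail.mta.port' "$3"
}

# Placeholder values; replace with your own.
smtp_jvm_options 'dataverse@example.edu' 'smtp.example.edu' 25
```

Each printed line could then be passed to $PAYARA/bin/asadmin create-jvm-options, one option per call. Values containing a colon generally need escaping for asadmin, so review the output before applying it.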
Binder Redirect
If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect
For more information, see #10360.
Optional Croissant 🥐 Exporter Support
When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.
For more information, see #10382.
Harvesting Handle Missing Controlled Values
Allows datasets to be harvested with Controlled Vocabulary Values that existed in the originating Dataverse installation but are not in the harvesting Dataverse installation. For more information, view the changes to the endpoint here.
Add .QPJ and .QMD Extensions to Shapefile Handling
Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.
For more information, see #10305.
Ingested Tabular Data Files Can Be Stored With the Variable Name Header
Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.
The Access API can take advantage of Direct Download for .tab files saved with these headers on S3, since the headers no longer have to be generated and added to the streamed content on the fly.
This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse handles both newly ingested files and any existing legacy files stored without these headers, transparently to the user. For example, the Access API will continue delivering tab-delimited files with this header line, whether it adds the line dynamically for legacy files or reads complete files directly from storage for the ones stored with it.
We are planning to add an API for converting existing legacy tabular files in a future release.
For more information, see #10282.
Uningest/Reingest Options Available in the File Page Edit Menu
New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can publish the associated dataset and by superusers, allowing for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).
The /api/files/
For more information, see #10319.
Sphinx Guides Now Support Markdown Format and Tabs
Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:
pip install -r requirements.txt
For more information, see #10111.
Number of Concurrent Indexing Operations Now Configurable
A new MicroProfile setting called dataverse.solr.concurrency.max-async-indexes has been added that controls the maximum number of simultaneously running asynchronous dataset index operations (defaults to 4).
For more information, see #10388.
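MicroProfile Config settings such as this one can usually also be supplied as environment variables, using the name mapping from the MicroProfile Config specification: every character that is not a letter or digit becomes an underscore, and the result is upper-cased. The sketch below applies that mapping; the helper name mp_env_name is hypothetical.

```shell
# Hypothetical helper: map a MicroProfile Config property name to its
# environment variable form (non-alphanumerics to "_", then upper case).
mp_env_name() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9' '_' | tr 'a-z' 'A-Z'
    echo
}

mp_env_name dataverse.solr.concurrency.max-async-indexes
# prints DATAVERSE_SOLR_CONCURRENCY_MAX_ASYNC_INDEXES
```

On a classic installation the same setting could instead be passed as a JVM option such as -Ddataverse.solr.concurrency.max-async-indexes=8, where 8 is only an example value.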
🪲 Bug fixes
Publication Status Facet Restored
In version 6.1, the publication status facet location was unintentionally moved to the bottom. In this version, we have restored the original order.
Assign a Role With Higher Permissions Than Its Own Role Has Been Fixed
The permissions required to assign a role have been fixed. It is no longer possible to assign a role that includes permissions that the assigning user doesn't have.
Geospatial Metadata Block Fields for North and South Renamed
The Geospatial metadata block fields for north and south were labeled incorrectly as longitudes, as reported in #5645. After updating to this version of Dataverse, users will need to update any API client code that uses "northLongitude" and "southLongitude" to use "northLatitude" and "southLatitude", respectively, as mentioned on the mailing list. Also, we have updated the tooltips in the Geospatial metadata block, where the use of commas instead of dots in coordinate values was incorrectly suggested.
OAI-PMH Error Handling Has Been Improved
OAI-PMH error handling has been improved to display a machine-readable error in XML rather than a 500 error with no further information.
- /oai?foo=bar will show "No argument 'verb' found"
- /oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"
Granting File Access Without Access Request
A bug introduced with the guestbook-at-request feature has been fixed. Requests are not deleted when granted; they are now given the state "granted".
Harvesting Redirects Fixed
Redirects from search cards back to the original source for datasets harvested from "Generic OAI Archives", i.e. non-Dataverse OAI servers, have been fixed.
💾 Persistence
Missing Database Constraints
This release adds two missing database constraints that ensure that the externalvocabularyvalue table has only one entry for each uri and that the oaiset table has only one set for each spec. In the very unlikely case that your existing database has duplicate entries, the install will fail. You can check for duplicates by running the following commands:
SELECT uri, count(*) FROM externalvocabularyvalue group by uri;
And:
SELECT spec, count(*) FROM oaiset group by spec;
Then remove any duplicate rows (where count > 1).
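On a large table it can be easier to list only the duplicated values instead of scanning every count. The sketch below generates such a query with a HAVING clause; the helper name dup_query is hypothetical, and the queries themselves are a variation on the two shown above.

```shell
# Hypothetical helper: print a query that lists only the values of a column
# that occur more than once. Arguments: table name, column name.
dup_query() {
    printf 'SELECT %s, count(*) FROM %s GROUP BY %s HAVING count(*) > 1;\n' "$2" "$1" "$2"
}

dup_query externalvocabularyvalue uri
dup_query oaiset spec
```

The output can be piped to psql for your Dataverse database. An empty result from both queries means there are no duplicates to remove.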
Universe Field in Variablemetadata Table Changed
The universe field in the variablemetadata table was changed from varchar(255) to text. The change was made to support longer strings in the "universe" metadata field, similar to the rest of the text fields in the variablemetadata table.
PostgreSQL Versions
This release adds install script support for the new permissions model in PostgreSQL versions 15+, and bumps Flyway to support PostgreSQL 16.
PostgreSQL 13 remains the version used with automated testing.
🌐 API
Listing Collection/Dataverse API
Listing collection/dataverse role assignments via API still requires ManageDataversePermissions, but listing dataset role assignments via API now requires only ManageDatasetPermissions.
New API Endpoint for Clearing an Individual Dataset From Solr
A new Index API endpoint has been added allowing an admin to clear an individual dataset from Solr.
For more information visit the documentation on this link
New Accounts Metrics API
Users can retrieve new types of metrics related to user accounts. The new capabilities are described in the guides.
New canDownloadAtLeastOneFile Endpoint
The /api/datasets/{id}/versions/{versionId}/canDownloadAtLeastOneFile endpoint has been created.
This API endpoint indicates if the calling user can download at least one file from a dataset version. Note that Shibboleth group permissions are not considered.
Harvesting Client Endpoint Extended
The API endpoint api/harvest/clients/{harvestingClientNickname} has been extended to include the following fields:
- allowHarvestingMissingCVV: enable/disable allowing datasets to be harvested with controlled vocabulary values that exist in the originating Dataverse server but are not present in the harvesting Dataverse server. The default is false.
Note: This setting is only available to the API and not currently accessible/settable via the UI.
Version Files Endpoint Extended
The response for the getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been modified to include the total count of available records as totalCount:x.
This will aid in pagination by allowing the caller to know how many pages can be iterated through. The existing API (getVersionFileCounts) to return the count will still be available.
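The page count follows from totalCount by ceiling division. The sketch below shows the arithmetic; page_count is a hypothetical helper, and it assumes the caller pages through results with a fixed page size (for example via limit and offset query parameters, which should be confirmed in the API guide).

```shell
# Hypothetical helper: number of pages needed for a given totalCount and
# page size, computed with integer ceiling division.
page_count() {
    total="$1"; limit="$2"
    echo $(( (total + limit - 1) / limit ))
}

page_count 45 10   # prints 5
```

With 45 records and a page size of 10, a client would request five pages, the last one holding the remaining five records.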
Metadata Blocks Endpoint Extended
The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:
- isRequired: Whether or not this field is required
- displayOrder: The display order of the field in create/edit forms
- typeClass: The type class of this field ("controlledVocabulary", "compound", or "primitive")
Get File Citation as JSON
It is now possible to retrieve via API the file citation as it appears on the file landing page. It is formatted in HTML and encoded in JSON.
This API is not for downloading various citation formats such as EndNote XML, RIS, or BibTeX.
For more information check the documentation on this link
Files Endpoint Extended
The API endpoint api/files/{id} has been extended to support the following optional query parameters:
- includeDeaccessioned: Indicates whether or not to consider deaccessioned dataset versions in the latest file search. (Default: false).
- returnDatasetVersion: Indicates whether or not to include the dataset version of the file in the response. (Default: false).
A new endpoint api/files/{id}/versions/{datasetVersionId} has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use :latest-published, :latest, :draft or 1.0 or any other available version identifier.
The endpoint supports the includeDeaccessioned and returnDatasetVersion optional query parameters, as does the api/files/{id} endpoint.
The api/files/{id}/draft endpoint is no longer available in favor of the new endpoint api/files/{id}/versions/{datasetVersionId}, which can use the version identifier :draft (api/files/{id}/versions/:draft) to obtain the same result.
Datasets, Dataverse Collections, and Datafiles Endpoints Extended
The API endpoints for getting datasets, Dataverse collections, and datafiles have been extended to support the following optional 'returnOwners' query parameter.
Including the parameter and setting it to true will add a hierarchy showing which dataset and dataverse collection(s) the object is part of to the json object returned.
For more information visit the full native API guide on this link
Endpoint Fixed: Datasets Metadata
The API endpoint api/datasets/{id}/metadata has been changed to default to the latest version of the dataset to which the user has access.
Experimental Make Data Count processingState API
An experimental Make Data Count processingState API has been added. For now it has been documented in the developer guide: https://guides.dataverse.org/en/6.2/developers/make-data-count.html#processing-archived-logs
⚠️ Backward Incompatibilities
To view a list of changes that can be impactful to your implementation please visit our detailed list of changes to the API.
📖 Guides
Container Guide, Documentation for Faster Redeploy
In the Container Guide, documentation for developers on how to quickly redeploy code has been added for Netbeans and improved for IntelliJ.
Also in the context of containers, a new option to skip deployment has been added and the war file is now consistently named "dataverse.war" rather than having a version in the filename, such as "dataverse-6.1.war". This predictability makes tooling easier.
Evaluation Version Tutorial on the Containers Guide
The Container Guide now contains a tutorial for running Dataverse in containers for demo or evaluation purposes: https://guides.dataverse.org/en/6.2/container/running/demo.html
New QA Guide
A new QA Guide is intended mostly for the core development team but may be of interest to contributors: https://guides.dataverse.org/en/6.2/develop/qa
⚙️ New Settings
MicroProfile Settings
The * indicates a provider id indicating which provider the setting is for
- dataverse.pid.providers
- dataverse.pid.default-provider
- dataverse.pid.*.type
- dataverse.pid.*.label
- dataverse.pid.*.authority
- dataverse.pid.*.shoulder
- dataverse.pid.*.identifier-generation-style
- dataverse.pid.*.datafile-pid-format
- dataverse.pid.*.managed-list
- dataverse.pid.*.excluded-list
- dataverse.pid.*.datacite.mds-api-url
- dataverse.pid.*.datacite.rest-api-url
- dataverse.pid.*.datacite.username
- dataverse.pid.*.datacite.password
- dataverse.pid.*.ezid.api-url
- dataverse.pid.*.ezid.username
- dataverse.pid.*.ezid.password
- dataverse.pid.*.permalink.base-url
- dataverse.pid.*.permalink.separator
- dataverse.pid.*.handlenet.index
- dataverse.pid.*.handlenet.independent-service
- dataverse.pid.*.handlenet.auth-handle
- dataverse.pid.*.handlenet.key.path
- dataverse.pid.*.handlenet.key.passphrase
- dataverse.spi.pidproviders.directory
- dataverse.solr.concurrency.max-async-indexes
SMTP Settings:
- dataverse.mail.system-email
- dataverse.mail.mta.host
- dataverse.mail.mta.port
- dataverse.mail.mta.ssl.enable
- dataverse.mail.mta.auth
- dataverse.mail.mta.user
- dataverse.mail.mta.password
- dataverse.mail.mta.allow-utf8-addresses
- Plus many more for advanced usage and special provider requirements. See configuration guide for a full list.
Database Settings:
- :RateLimitingDefaultCapacityTiers
- :RateLimitingCapacityByTierAndAction
- :StoreIngestedTabularFilesWithVarHeaders
📋 Complete List of Changes
For the complete list of code changes in this release, see the 6.2 Milestone in GitHub.
🛟 Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
💻 Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.1.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. Usually, when a Solr schema update is released, we recommend deploying the new version of Dataverse, then updating the schema.xml on the Solr side. With 6.2, we recommend installing the base schema first. Without it, Dataverse 6.2 will not be able to show any results after the initial deployment. If your instance is using any custom metadata blocks, you will need to further modify the schema; see step 8 of these instructions.
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- Replace schema.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/schema.xml
cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
- Start Solr instance (usually service solr start, depending on Solr/OS)
2. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-6.1
3. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
4. Start Payara
service payara start
5. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-6.2.war
The deployment of the war file may take some time on a large production database due to the database migration scripts that are part of the release.
6. Restart Payara
service payara stop
service payara start
7. Update the following Metadata Blocks to reflect the incremental improvements made to the handling of core metadata fields:
```
wget https://github.com/IQSS/dataverse/releases/download/v6.2/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/astrophysics.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file astrophysics.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv
```
8. For installations with custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- Run the update-fields.sh script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your Solr installation):
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml
- Restart Solr instance (usually service solr restart, depending on Solr/OS)
9. Reindex Solr:
For details, see https://guides.dataverse.org/en/6.2/admin/solr-search-index.html but here is the reindex command:
curl http://localhost:8080/api/admin/index
Published by landreev about 2 years ago
dataverse - v6.1
Dataverse 6.1
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.1 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release highlights
Guestbook at request
Dataverse can now be configured (via the dataverse.files.guestbook-at-request option) to display any configured guestbook to users when they request restricted files (new functionality) or when they download files (previous behavior).
The global default defined by this setting can be overridden at the collection level on the collection page and at the individual dataset level by a superuser using the API. The default, showing guestbooks when files are downloaded, remains as it was in prior Dataverse versions.
For details, see dataverse.files.guestbook-at-request and PR #9599.
Collection-level storage quotas
This release adds support for defining storage size quotas for collections. Please see the API guide for details. This is an experimental feature that has not yet been used in production on any real life Dataverse instance, but we are planning to try it out at Harvard/IQSS.
Please note that this release includes a database update (via a Flyway script) that will calculate the storage sizes of all the existing datasets and collections on the first deployment. On a large production database with tens of thousands of datasets this may add a couple of extra minutes to the first, initial deployment of Dataverse 6.1.
For details, see Storage Quotas for Collections in the Admin Guide.
Globus support (experimental), continued
Globus support in Dataverse has been expanded to include support for using file-based Globus endpoints, including the case where files are stored on tape and are not immediately accessible and for the case of referencing files stored on remote Globus endpoints. Support for using the Globus S3 Connector with an S3 store has been retained but requires changes to the Dataverse configuration. Please note:
- Globus functionality remains experimental/advanced in that it requires significant setup, differs in multiple ways from other file storage mechanisms, and may continue to evolve with the potential for backward incompatibilities.
- The functionality is configured per store and replaces the previous single-S3-Connector-per-Dataverse-instance model.
- Adding files to a dataset, and accessing files is supported via the Dataverse user interface through a separate dataverse-globus app.
- The functionality is also accessible via APIs (combining calls to the Dataverse and Globus APIs)
Backward incompatibilities:
- The configuration for use of a Globus S3 Connector has changed and is aligned with the standard store configuration mechanism.
- The new functionality is incompatible with older versions of the dataverse-globus app, and the Globus-related functionality in the UI will only function correctly if a Dataverse 6.1 compatible version of the dataverse-globus app is configured.
New JVM options:
- A new "globus" store type and associated store-related options have been added. These are described in the File Storage section of the Installation Guide.
- dataverse.files.globus-cache-maxage - specifies the number of minutes Dataverse will wait between an initial request for a file transfer and when that transfer must begin.
Obsolete Settings: the :GlobusBasicToken, :GlobusEndpoint, and :GlobusStores settings are no longer used
Further details can be found in the Big Data Support section of the Developer Guide.
Alternative Title now allows multiple values
Alternative Title now allows multiple values. Note that JSON used to create a dataset with an Alternative Title must be changed. See "Backward incompatibilities" below and PR #9440 for details.
External tools: configure tools now available at the dataset level
Read/write "configure" tools (a type of external tool) are now available at the dataset level. They appear under the "Edit Dataset" menu. See External Tools in the Admin Guide and PR #9925.
S3 out-of-band upload
In some situations, direct upload might not work from the UI, e.g., when S3 storage is not accessible from the internet. This release adds an option to allow direct uploads via API only. This way, a third-party application can use direct upload from within the internal network, while there is no direct download available to the users via UI. By default, Dataverse supports uploading files via the add a file to a dataset API. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server). With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the Adding the Uploaded file to the Dataset API call (described in the Direct DataFile Upload/Replace API page) used to add metadata and inform Dataverse that a new file has been added to the relevant store.
JSON Schema for datasets
Functionality has been added to help validate dataset JSON prior to dataset creation. There are two new API endpoints in this release. The first takes in a collection alias and returns a custom dataset schema based on the required fields of the collection. The second takes in a collection alias and a dataset JSON file and does an automated validation of the JSON file against the custom schema for the collection. In this release functionality is limited to JSON format validation and validating required elements. Future releases will address field types, controlled vocabulary, etc. See Retrieve a Dataset JSON Schema for a Collection in the API Guide and PR #10109.
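As a rough sketch of how these two endpoints might be called, the shell functions below only build the request URLs. The curl invocations are left as comments because they need a running installation. The paths come from the endpoint list in these notes; the SERVER_URL default, the "root" alias, the dataset.json file name, the API_TOKEN variable, and the use of POST for validation are assumptions, so please check the API Guide before relying on them.

```shell
# Base URL of the installation (assumed default; adjust as needed).
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build the URL that returns the custom dataset schema for a collection.
# $1 = collection alias
dataset_schema_url() {
  printf '%s/api/dataverses/%s/datasetSchema\n' "$SERVER_URL" "$1"
}

# Build the URL that validates a dataset JSON file against that schema.
# $1 = collection alias
validate_dataset_json_url() {
  printf '%s/api/dataverses/%s/validateDatasetJson\n' "$SERVER_URL" "$1"
}

dataset_schema_url root
validate_dataset_json_url root

# Possible calls (assumed header name, method, and file name):
# curl -H "X-Dataverse-key: $API_TOKEN" "$(dataset_schema_url root)"
# curl -H "X-Dataverse-key: $API_TOKEN" -X POST -H "Content-type: application/json" \
#      --upload-file dataset.json "$(validate_dataset_json_url root)"
```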
OpenID Connect (OIDC) improvements
Using MicroProfile Config for provisioning
With this release it is possible to provision a single OIDC-based authentication provider by using MicroProfile Config instead of or in addition to the classic Admin API provisioning.
If you are using an external OIDC provider component as an identity management system and/or broker to other authentication providers such as Google, eduGain SAML and so on, this might make your life easier during instance setups and reconfiguration. You no longer need to generate the necessary JSON file.
Adding PKCE Support
Some OIDC providers require using PKCE as an additional security layer. As of this version, you can enable support for this on any OIDC provider you configure. (Note that OAuth2 providers have not been upgraded.)
For both features, see the OIDC section of the Installation Guide and PR #9273.
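MicroProfile Config can generally read a setting from an environment variable as well as from a JVM option. The sketch below assumes the standard MicroProfile Config name mapping (characters that are not letters or digits become "_", then the name is upper-cased) applies to your deployment. The helper name mpconfig_env_name is hypothetical, and the Installation Guide remains the reference for which sources are supported.

```shell
# Convert a MicroProfile Config property name to its upper-case
# environment variable form: every character that is not a letter or
# digit becomes "_", then the result is upper-cased.
# $1 = property name, e.g. dataverse.auth.oidc.enabled
mpconfig_env_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9' '_' | tr 'a-z' 'A-Z'
  printf '\n'
}

mpconfig_env_name dataverse.auth.oidc.enabled
mpconfig_env_name dataverse.auth.oidc.auth-server-url
mpconfig_env_name dataverse.auth.oidc.pkce.enabled
```

A container-based setup could then export the resulting names (for example DATAVERSE_AUTH_OIDC_ENABLED) instead of editing JVM options.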
Solr improvements
As of this release, application-side support has been added for the "circuit breaker" mechanism in Solr that makes it drop requests more gracefully when the search engine is experiencing load issues.
Please see the Installing Solr section of the Installation Guide.
New release of Dataverse Previewers (including a Markdown previewer)
Version 1.4 of the standard Dataverse Previewers from https://github.com/gdcc/dataverse-previewers is available. The new version supports the use of signedUrls rather than API keys when previewing restricted files (including files in draft dataset versions). Upgrading is highly recommended. Please note:
- SignedUrls can now be used with PrivateUrl access tokens, which allows PrivateUrl users to view previewers that are configured to use SignedUrls. See #10093.
- Launching a dataset-level configuration tool will automatically generate an API token when needed. This is consistent with how other types of tools work. See #10045.
- There is now a Markdown (.md) previewer.
New or improved APIs
The development of a new UI for Dataverse is driving the addition or improvement of many APIs.
New API endpoints
- deaccessionDataset (/api/datasets/{id}/versions/{versionId}/deaccession): version deaccessioning through API (Given a dataset and a version).
- /api/files/{id}/downloadCount
- /api/files/{id}/dataTables
- /api/files/{id}/metadata/tabularTags New endpoint to set tabular file tags.
- canManageFilePermissions (/api/access/datafile/{id}/userPermissions): Added for getting user permissions on a file.
- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (Total count, per content type, per access status and per category name).
- setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
- userFileAccessRequested (/api/access/datafile/{id}/userFileAccessRequested): Returns true or false depending on whether or not the calling user has requested access to a particular file.
- hasBeenDeleted (/api/files/{id}/hasBeenDeleted): Know if a particular file that existed in a previous version of the dataset no longer exists in the latest version.
- getZipDownloadLimit (/api/info/zipDownloadLimit): Get the configured zip file download limit. The response contains the long value of the limit in bytes.
- getMaxEmbargoDurationInMonths (/api/info/settings/:MaxEmbargoDurationInMonths): Get the maximum embargo duration in months, if available, configured through the database setting :MaxEmbargoDurationInMonths.
- getDatasetJsonSchema (/api/dataverses/{id}/datasetSchema): Get a dataset schema with the fields required by a given dataverse collection.
- validateDatasetJsonSchema (/api/dataverses/{id}/validateDatasetJson): Validate that a dataset JSON file is in proper format and contains the required elements and fields for a given dataverse collection.
- downloadTmpFile (/api/admin/downloadTmpFile): For testing purposes, allows files to be downloaded from /tmp.
Pagination of files in dataset versions
- optional pagination has been added to /api/datasets/{id}/versions that may be useful in datasets with a large number of versions
- a new flag includeFiles is added to both /api/datasets/{id}/versions and /api/datasets/{id}/versions/{vid} (true by default), providing an option to drop the file information from the output
- when files are requested to be included, some database lookup optimizations have been added to improve the performance on datasets with large numbers of files.
This is reflected in the Dataset Versions API section of the Guide.
DataFile API payload has been extended to include the following fields
- tabularData: Boolean field to know if the DataFile is of tabular type
- fileAccessRequest: Boolean field to know if the file access requests are enabled on the Dataset (DataFile owner)
- friendlyType: String
The getVersionFiles endpoint (/api/datasets/{id}/versions/{versionId}/files) has been extended to support pagination, ordering, and optional filtering
- Access status: through the accessStatus query parameter, which supports the following values:
  - Public
  - Restricted
  - EmbargoedThenRestricted
  - EmbargoedThenPublic
- Category name: through the categoryName query parameter. To return files to which the particular category has been added.
- Content type: through the contentType query parameter. To return files matching the requested content type. For example: "image/png".
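To illustrate how the filters combine, here is a minimal sketch that builds a getVersionFiles URL with two of the query parameters named above. The dataset id 24, version 1.0, and the SERVER_URL default are hypothetical values, and the helper name is made up for this example.

```shell
# Base URL of the installation (assumed default; adjust as needed).
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a getVersionFiles URL filtered by access status and content type.
# $1 = dataset id, $2 = version id, $3 = accessStatus, $4 = contentType
version_files_url() {
  printf '%s/api/datasets/%s/versions/%s/files?accessStatus=%s&contentType=%s\n' \
    "$SERVER_URL" "$1" "$2" "$3" "$4"
}

version_files_url 24 1.0 Public image/png

# Possible call against a running installation:
# curl "$(version_files_url 24 1.0 Public image/png)"
```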
Additional improvements to existing API endpoints
- getVersionFiles (/api/datasets/{id}/versions/{versionId}/files): Extended to support optional filtering by search text through the searchText query parameter. The search will be applied to the labels and descriptions of the dataset files. Added tabularTagName to return files to which the particular tabular tag has been added. Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain files.
- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain file counts. Added support for filtering by optional criteria query parameter:
  - contentType
  - accessStatus
  - categoryName
  - tabularTagName
  - searchText
- getDownloadSize (/api/datasets/{identifier}/versions/{versionId}/downloadsize): Added optional boolean query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain files. Added a new optional query parameter "mode". This parameter applies a filter criterion to the operation and supports the following values:
  - All (Default): Includes both archival and original sizes for tabular files
  - Archival: Includes only the archival size for tabular files
  - Original: Includes only the original size for tabular files.
- /api/datasets/{id}/versions/{versionId}: New query parameter includeDeaccessioned added to consider deaccessioned versions when searching for versions.
- /api/datasets/{id}/userPermissions: Get user permissions on a dataset. The user permissions that this API call checks, returned as booleans, are the following:
  - Can view the unpublished dataset
  - Can edit the dataset
  - Can publish the dataset
  - Can manage the dataset permissions
  - Can delete the dataset draft
- getDatasetVersionCitation (/api/datasets/{id}/versions/{versionId}/citation) endpoint now accepts a new boolean optional query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain the citation.
Improvements for developers
- Developers can enjoy a dramatically faster feedback loop when iterating on code if they are using Netbeans or IntelliJ IDEA Ultimate (with the Payara Platform Tools plugin). For details, see https://guides.dataverse.org/en/6.1/container/dev-usage.html#intellij-idea-ultimate-and-payara-platform-tools and the thread on the mailing list.
- Developers can now test S3 locally by using the Dockerized development environment, which now includes both LocalStack and MinIO. API (end to end) tests are in S3AccessIT.
- In addition, a new integration test class (not an API test, the new Testcontainers-based test launched with mvn verify) has been added at S3AccessIOLocalstackIT. It uses Testcontainers to spin up Localstack for S3 testing and does not require Dataverse to be running.
- With this release, we add a new type of testing to Dataverse: integration tests which are not end-to-end tests (like our API tests). Starting with OIDC authentication support, we test regularly on CI for working condition of both OIDC login options in UI and API.
- The testing and development Keycloak realm has been updated with more users and compatibility with Keycloak 21.
- The support for setting JVM options during testing has been improved for developers. You now may add the @JvmSetting annotation to classes (also inner classes) and reference factory methods for values. This improvement is also paving the way to enable manipulating JVM options during end-to-end tests on remote ends.
- As part of these testing improvements, the code coverage report file for unit tests has moved from target/jacoco.exec to target/coverage-reports/jacoco-unit.exec.
Major use cases and infrastructure enhancements
Changes and fixes in this release not already mentioned above include:
- Validation has been added for the Geographic Bounding Box values in the Geospatial metadata block. This will prevent improperly defined bounding boxes from being created via the edit page or metadata imports. This also fixes the issue where existing datasets with invalid geoboxes were quietly failing to get reindexed. See PR #10142.
- Dataverse's OAIORE Metadata Export format and archival BagIT exports (which include the OAI-ORE metadata export file) have been updated to include information about the dataset version state, e.g. RELEASED or DEACCESSIONED and to indicate which version of Dataverse was used to create the archival Bag. As part of the latter, the current OAIORE Metadata format has been given a 1.0.0 version designation and it is expected that any future changes to the OAIORE export format will result in a version change and that tools such as DVUploader that can recreate datasets from archival Bags will start indicating which version(s) of the OAIORE format they can read. Dataverse installations that have been using archival Bags may wish to update any existing archival Bags they have, e.g. by deleting existing Bags and using the Dataverse archival Bag export API to generate updated versions.
- For BagIT export, it is now possible to configure the following information in bag-info.txt. (Previously, customization was possible by editing Bundle.properties but this is no longer supported.) For details, see https://guides.dataverse.org/en/6.1/installation/config.html#bag-info-txt
  - Source-Organization from dataverse.bagit.sourceorg.name.
  - Organization-Address from dataverse.bagit.sourceorg.address.
  - Organization-Email from dataverse.bagit.sourceorg.email.
- This release fixes several issues (#9952, #9953, #9957) where the Signposting output did not match the Signposting specification. These changes introduce backward-incompatibility, but since Signposting support was added recently (in Dataverse 5.14 in PR #8981), we feel it's best to do this clean up and not support the old implementation that was not fully compliant with the spec.
  - To fix #9952, we surround the license info with < and >.
  - To fix #9953, we no longer wrap the response in a {"status":"OK","data":{ JSON object. This has also been noted in the guides at https://dataverse-guide--9955.org.readthedocs.build/en/9955/api/native-api.html#retrieve-signposting-information
  - To fix #9957, we corrected the mime/content type, changing it from json+ld to ld+json. For backward compatibility, we are still supporting the old one, for now.
- It's now possible to configure the docroot, which holds collection logos and more. See dataverse.files.docroot in the Installation Guide and PR #9819.
- We have started maintaining an API changelog of breaking changes: https://guides.dataverse.org/en/6.1/api/changelog.html See also #10060.
New configuration options
- dataverse.auth.oidc.auth-server-url
- dataverse.auth.oidc.client-id
- dataverse.auth.oidc.client-secret
- dataverse.auth.oidc.enabled
- dataverse.auth.oidc.pkce.enabled
- dataverse.auth.oidc.pkce.max-cache-age
- dataverse.auth.oidc.pkce.max-cache-size
- dataverse.auth.oidc.pkce.method
- dataverse.auth.oidc.subtitle
- dataverse.auth.oidc.title
- dataverse.bagit.sourceorg.address
- dataverse.bagit.sourceorg.email
- dataverse.bagit.sourceorg.name
- dataverse.files.docroot
- dataverse.files.globus-cache-maxage
- dataverse.files.guestbook-at-request
- dataverse.files.{driverId}.upload-out-of-band
Backward incompatibilities
- Since Alternative Title is now repeatable, the JSON you send to create or edit a dataset must be an array rather than a simple string. For example, instead of "value": "Alternative Title", you must send "value": ["Alternative Title1", "Alternative Title2"]
- The fixes for several issues (#9952, #9953, #9957) where the Signposting output did not match the Signposting specification introduce backward incompatibility. See above for details.
- For BagIT export, if you were configuring values in bag-info.txt using Bundle.properties, you must switch to the new dataverse.bagit JVM options mentioned above. For details, see https://guides.dataverse.org/en/6.1/installation/config.html#bag-info-txt
- See "Globus support" above for backward incompatibilities specific to Globus.
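For the Alternative Title change listed above, the following sketch prints what the field might look like inside dataset JSON once it is an array. The surrounding keys (typeName, multiple, typeClass) are assumptions based on the usual shape of dataset JSON, the two titles are placeholders, and the helper does no JSON escaping.

```shell
# Print the Alternative Title field in its new array form.
# $1 and $2 = two placeholder titles (must not contain double quotes,
# since this sketch does not escape them).
alternative_title_field() {
  cat <<EOF
{
  "typeName": "alternativeTitle",
  "multiple": true,
  "typeClass": "primitive",
  "value": ["$1", "$2"]
}
EOF
}

alternative_title_field "Alternative Title1" "Alternative Title2"
```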
Complete list of changes
For the complete list of code changes in this release, see the 6.1 Milestone in GitHub.
Getting help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.0.
0. These instructions assume that you are upgrading from 6.0. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 6.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 6 is installed in /usr/local/payara6. If not, adjust as needed.
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-6.0
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-6.1.war
As noted above, deployment of the war file might take several minutes due to a database migration script required for the new storage quotas feature.
5. Restart Payara
service payara stop
service payara start
6. Update Geospatial Metadata Block (to improve validation of bounding box values)
wget https://github.com/IQSS/dataverse/releases/download/v6.1/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv
6a. Update Citation Metadata Block (to make Alternative Title repeatable)
wget https://github.com/IQSS/dataverse/releases/download/v6.1/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
7. Update Solr schema.xml to allow multiple Alternative Titles to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).
7a. For installations without custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- Replace schema.xml
wget https://github.com/IQSS/dataverse/releases/download/v6.1/schema.xml
cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
- Start Solr instance (usually service solr start, depending on Solr/OS)
7b. For installations with custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- There are 2 ways to regenerate the schema: Either by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):
wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/9.3.0/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml
OR, alternatively, you can edit the following line in your schema.xml by hand as follows (to indicate that alternative title is now multiValued="true"):
<field name="alternativeTitle" type="text_en" multiValued="true" stored="true" indexed="true"/>
- Restart Solr instance (usually service solr restart depending on solr/OS)
8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.
Published by landreev over 2 years ago
dataverse - v6.0
Dataverse 6.0
This is a platform upgrade release. Payara, Solr, and Java have been upgraded. No features have been added to the Dataverse software itself. Only a handful of bugs were fixed.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project!
Release Highlights (Major Upgrades, Breaking Changes)
This release contains major upgrades to core components. Detailed upgrade instructions can be found below.
Runtime
- The required Java version has been increased from version 11 to 17.
- See PR #9764 for details.
- Payara application server has been upgraded to version 6.2023.8.
- This is a required update.
- Please note that Payara Community 5 has reached end of life
- See PR #9685 and PR #9795 for details.
- Solr has been upgraded to version 9.3.0.
- See PR #9787 for details.
- PostgreSQL 13 remains the tested and supported version.
- See the PostgreSQL section of the Installation Guide for details.
Development
- Removal of Vagrant and Docker All In One (docker-aio), deprecated in Dataverse v5.14. See PR #9838 and PR #9685 for details.
- All tests have been migrated to use JUnit 5 exclusively from now on. See PR #9796 for details.
Installation
If this is a new installation, please follow our Installation Guide. Please don't be shy about asking for help if you need it!
Once you are in production, we would be delighted to update our map of Dataverse installations around the world to include yours! Please create an issue or email us at support@dataverse.org to join the club!
You are also very welcome to join the Global Dataverse Community Consortium (GDCC).
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 5.14.
Upgrade from Java 11 to Java 17
Java 17 is now required for Dataverse. Solr can run under Java 11 or Java 17 but the latter is recommended. In preparation for the Java upgrade, stop both Dataverse/Payara and Solr.
- Undeploy Dataverse, if deployed, using the unprivileged service account.
sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications
sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14
- Stop Payara 5.
sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain
- Stop Solr 8.
sudo systemctl stop solr.service
- Install Java 17.
Assuming you are using RHEL or a derivative such as Rocky Linux:
sudo yum install java-17-openjdk
- Set Java 17 as the default.
Assuming you are using RHEL or a derivative such as Rocky Linux:
sudo alternatives --config java
- Test that Java 17 is the default.
java -version
Upgrade from Payara 5 to Payara 6
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
- Download Payara 6.2023.8.
curl -L -O https://nexus.payara.fish/repository/payara-community/fish/payara/distributions/payara/6.2023.8/payara-6.2023.8.zip
- Unzip it to /usr/local (or your preferred location).
sudo unzip payara-6.2023.8.zip -d /usr/local/
- Change ownership of the unzipped Payara to your "service" user ("dataverse" by default).
sudo chown -R dataverse /usr/local/payara6
- Undeploy Dataverse, if deployed, using the unprivileged service account.
sudo -u dataverse /usr/local/payara5/bin/asadmin list-applications
sudo -u dataverse /usr/local/payara5/bin/asadmin undeploy dataverse-5.14
- Stop Payara 5, if running.
sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain
- Copy Dataverse-related lines from Payara 5 to Payara 6 domain.xml.
sudo -u dataverse cp /usr/local/payara6/glassfish/domains/domain1/config/domain.xml /usr/local/payara6/glassfish/domains/domain1/config/domain.xml.orig
sudo egrep 'dataverse|doi' /usr/local/payara5/glassfish/domains/domain1/config/domain.xml > lines.txt
sudo vi /usr/local/payara6/glassfish/domains/domain1/config/domain.xml
If any JVM options reference the old payara5 path (/usr/local/payara5) be sure to change it to payara6.
The lines will appear in two sections, examples shown below (but your content will vary).
Section 1: system properties (under <server name="server" config-ref="server-config">)
<system-property name="dataverse.db.user" value="dvnuser"></system-property>
<system-property name="dataverse.db.host" value="localhost"></system-property>
<system-property name="dataverse.db.port" value="5432"></system-property>
<system-property name="dataverse.db.name" value="dvndb"></system-property>
<system-property name="dataverse.db.password" value="dvnsecret"></system-property>
Note: if you used the Dataverse installer, you won't have a dataverse.db.password property. See "Create password aliases" below.
Section 2: JVM options (under <java-config classpath-suffix="" debug-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=9009" system-classpath="">, the one under <config name="server-config">, not under <config name="default-config">)
<jvm-options>-Ddataverse.files.directory=/usr/local/dvn/data</jvm-options>
<jvm-options>-Ddataverse.files.file.type=file</jvm-options>
<jvm-options>-Ddataverse.files.file.label=file</jvm-options>
<jvm-options>-Ddataverse.files.file.directory=/usr/local/dvn/data</jvm-options>
<jvm-options>-Ddataverse.rserve.host=localhost</jvm-options>
<jvm-options>-Ddataverse.rserve.port=6311</jvm-options>
<jvm-options>-Ddataverse.rserve.user=rserve</jvm-options>
<jvm-options>-Ddataverse.rserve.password=rserve</jvm-options>
<jvm-options>-Ddataverse.auth.password-reset-timeout-in-minutes=60</jvm-options>
<jvm-options>-Ddataverse.timerServer=true</jvm-options>
<jvm-options>-Ddataverse.fqdn=dev1.dataverse.org</jvm-options>
<jvm-options>-Ddataverse.siteUrl=https://dev1.dataverse.org</jvm-options>
<jvm-options>-Ddataverse.files.storage-driver-id=file</jvm-options>
<jvm-options>-Ddoi.username=testaccount</jvm-options>
<jvm-options>-Ddoi.password=notmypassword</jvm-options>
<jvm-options>-Ddoi.baseurlstring=https://mds.test.datacite.org/</jvm-options>
<jvm-options>-Ddoi.dataciterestapiurlstring=https://api.test.datacite.org</jvm-options>
- Check the Xmx setting in domain.xml.
Under /usr/local/payara6/glassfish/domains/domain1/config/domain.xml, check the Xmx setting under <config name="server-config">, where you put the JVM options, not the one under <config name="default-config">. Note that there are two such settings, and you want to adjust the one in the stanza with Dataverse options. This sets the JVM heap size; a good rule of thumb is half of your system's total RAM. You may specify the value in MB (8192m) or GB (8g).
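The "half of total RAM" rule of thumb can be worked through as below. This is only an arithmetic sketch: the helper name is hypothetical, it expects the total memory in kB (the unit that MemTotal uses in /proc/meminfo on Linux), and you still need to edit domain.xml yourself.

```shell
# Suggest an Xmx value equal to half of the given total memory.
# $1 = total memory in kB
half_ram_xmx() {
  printf '%s\n' "-Xmx$(( $1 / 1024 / 2 ))m"
}

# Worked example: 16 GiB of RAM is 16777216 kB,
# which is 16384 MB, and half of that is 8192 MB.
half_ram_xmx 16777216

# On Linux, the real total could be read like this:
# half_ram_xmx "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```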
- Copy jhove.conf and jhoveConfig.xsd from Payara 5, edit and change payara5 to payara6.
sudo cp /usr/local/payara5/glassfish/domains/domain1/config/jhove* /usr/local/payara6/glassfish/domains/domain1/config/
sudo chown dataverse /usr/local/payara6/glassfish/domains/domain1/config/jhove*
sudo -u dataverse vi /usr/local/payara6/glassfish/domains/domain1/config/jhove.conf
- Copy logos from Payara 5 to Payara 6.
These logos are for collections (dataverses).
sudo -u dataverse cp -r /usr/local/payara5/glassfish/domains/domain1/docroot/logos /usr/local/payara6/glassfish/domains/domain1/docroot
- If you are using Make Data Count (MDC), edit :MDCLogPath.
Your :MDCLogPath database setting might be pointing to a Payara 5 directory such as /usr/local/payara5/glassfish/domains/domain1/logs. If so, edit this to be Payara 6. You'll probably want to copy your logs over as well.
- Update systemd unit file (or other init system) from /usr/local/payara5 to /usr/local/payara6, if applicable.
- Start Payara.
sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain
- Create a Java mail resource, replacing "localhost" for mailhost with your mail relay server, and replacing "localhost" for fromaddress with the FQDN of your Dataverse server.
sudo -u dataverse /usr/local/payara6/bin/asadmin create-javamail-resource --mailhost "localhost" --mailuser "dataversenotify" --fromaddress "do-not-reply@localhost" mail/notifyMailSession
- Create password aliases for your database, rserve and datacite jvm-options, if you're using them.
echo "AS_ADMIN_ALIASPASSWORD=yourDBpassword" > /tmp/dataverse.db.password.txt
sudo -u dataverse /usr/local/payara6/bin/asadmin create-password-alias --passwordfile /tmp/dataverse.db.password.txt
When you are prompted "Enter the value for the aliasname operand", enter dataverse.db.password
You should see "Command create-password-alias executed successfully."
You'll want to perform similar commands for rserve_password_alias and doi_password_alias if you're using Rserve and/or DataCite.
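The "similar commands" for the other aliases could look roughly like the sketch below. The helper name and the example password are hypothetical, and passing the alias name as an operand (instead of answering the prompt) is an assumption about asadmin's syntax, so the asadmin lines are left as comments.

```shell
# Write a password file in the AS_ADMIN_ALIASPASSWORD format used above
# and print the path of the file that was created.
# $1 = password
make_alias_password_file() {
  f="$(mktemp)" || return 1
  printf 'AS_ADMIN_ALIASPASSWORD=%s\n' "$1" > "$f"
  printf '%s\n' "$f"
}

# Possible usage, run as the dataverse user so that asadmin can read
# the temporary file (requires a running Payara 6):
# pwfile="$(make_alias_password_file yourRservePassword)"
# /usr/local/payara6/bin/asadmin create-password-alias --passwordfile "$pwfile" rserve_password_alias
# rm -f "$pwfile"
```

Removing the temporary file afterwards keeps the plain-text password from lingering on disk.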
- Enable workaround for FISH-7722.
The following workaround is for https://github.com/payara/Payara/issues/6337
sudo -u dataverse /usr/local/payara6/bin/asadmin create-jvm-options --add-opens=java.base/java.io=ALL-UNNAMED
- Create the network listener on port 8009.
sudo -u dataverse /usr/local/payara6/bin/asadmin create-network-listener --protocol http-listener-1 --listenerport 8009 --jkenabled true jk-connector
- Deploy the Dataverse 6.0 war file.
sudo -u dataverse /usr/local/payara6/bin/asadmin deploy /path/to/dataverse-6.0.war
- Check that you get a version number from Dataverse.
This is just a sanity check that Dataverse has been deployed properly.
curl http://localhost:8080/api/info/version
- Perform one final Payara restart to ensure that timers are initialized properly.
sudo -u dataverse /usr/local/payara6/bin/asadmin stop-domain
sudo -u dataverse /usr/local/payara6/bin/asadmin start-domain
Upgrade from Solr 8 to 9
Solr has been upgraded to Solr 9. You must install Solr fresh and reindex. You cannot use your old schema.xml because the format has changed.
The instructions below are copied from https://guides.dataverse.org/en/6.0/installation/prerequisites.html#installing-solr and tweaked a bit for an upgrade scenario.
We assume that you already have a user called "solr" (from the instructions above), added during your initial installation of Solr. We also assume that you have already stopped Solr 8 as explained in the instructions above about upgrading Java.
- Become the "solr" user and then download and configure Solr.
su - solr
cd /usr/local/solr
wget https://archive.apache.org/dist/solr/solr/9.3.0/solr-9.3.0.tgz
tar xvzf solr-9.3.0.tgz
cd solr-9.3.0
cp -r server/solr/configsets/_default server/solr/collection1
- Unzip "dvinstall.zip" from this release. Unzip it into /tmp. Then copy the following files into place.
cp /tmp/dvinstall/schema*.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
- A Dataverse installation requires a change to the jetty.xml file that ships with Solr.
Edit /usr/local/solr/solr-9.3.0/server/etc/jetty.xml, increasing requestHeaderSize from 8192 to 102400
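If you prefer to script this edit, a sed one-liner along the following lines may work. It is a sketch that assumes the value 8192 appears on the same line as requestHeaderSize in jetty.xml; the sample line used in the demonstration is an assumption about the file's contents, so inspect your own jetty.xml (and the .bak backup) afterwards.

```shell
# Raise the request header size from 8192 to 102400 in the given file.
# A backup copy with a .bak suffix is kept next to it.
# $1 = path to jetty.xml
bump_request_header_size() {
  sed -i.bak '/requestHeaderSize/ s/8192/102400/' "$1"
}

# Demonstration on a throwaway file rather than the real jetty.xml.
demo="$(mktemp)"
echo '<Set name="requestHeaderSize"><Property name="solr.jetty.request.header.size" default="8192" /></Set>' > "$demo"
bump_request_header_size "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"

# Against a real installation (path taken from the step above):
# bump_request_header_size /usr/local/solr/solr-9.3.0/server/etc/jetty.xml
```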
- Tell Solr to create the core "collection1" on startup.
echo "name=collection1" > /usr/local/solr/solr-9.3.0/server/solr/collection1/core.properties
- Update your init script.
Your init script may be located at /etc/systemd/system/solr.service, for example. Update the path to Solr to be /usr/local/solr/solr-9.3.0.
- Start Solr using your init script and check collection1.
The collection1 check below should print out fields Dataverse uses like "dsDescription".
systemctl start solr.service
curl http://localhost:8983/solr/collection1/schema/fields
- If you have custom metadata blocks installed, you must update your Solr schema.xml to include your custom fields.
For details, please see https://guides.dataverse.org/en/6.0/admin/metadatacustomization.html#updating-the-solr-schema
At a high level you will be copying custom fields from the output of http://localhost:8080/api/admin/index/solr/schema or using a script to automate this.
- Reindex Solr.
For details, see https://guides.dataverse.org/en/6.0/admin/solr-search-index.html but here is the reindex command:
curl http://localhost:8080/api/admin/index
Potential Archiver Incompatibilities with Payara 6
The Google Cloud and DuraCloud archivers may not work in Dataverse 6.0.
This is due to the archivers' dependence on libraries that include classes in javax.* packages that are no longer available. If these classes are actually used when the archivers run, the archivers would fail. As these two archivers require additional setup, they have not been tested in 6.0. Community members using these archivers or considering their use are encouraged to test them with 6.0 and report any errors and/or provide fixes for them that can be included in future releases.
Bug Fix for Dataset Templates with Custom Terms of Use
A bug was fixed for the following scenario:
- Create a template with custom terms.
- Set that template as the default.
- Try to create a dataset.
- A 500 error appears before the form to create dataset is even shown.
For more details, see issue #9825 and PR #9892
Complete List of Changes
For the complete list of code changes in this release, see the 6.0 Milestone in GitHub.
Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
- Java
Published by kcondon almost 3 years ago
dataverse - v5.14
Dataverse Software 5.14
(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.
Installation
If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!
After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.
Upgrade Instructions
0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-5.13
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.14.war
5. Restart Payara
service payara stop
service payara start
6. Update the Citation metadata block: (the update makes the field Series repeatable)
wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory; /home/dataverse/langBundles used in the example below.
wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
cp citation.properties /home/dataverse/langBundles
7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).
7a. For installations without custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- Replace schema.xml
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
- Start Solr instance (usually service solr start, depending on Solr/OS)
7b. For installations with custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- There are 2 ways to regenerate the schema. Either collect the output of the Dataverse schema API and feed it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):
wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml
OR, alternatively, edit the following lines in your schema.xml by hand (to indicate that series and its components are now multiValued="true"):
<field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
- Restart Solr instance (usually service solr restart, depending on Solr/OS)
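Before restarting Solr, it can help to confirm that the three series fields in schema.xml are in fact multiValued. The sketch below is one way to check, assuming each field definition sits on a single line; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether series, seriesInformation and seriesName
# are declared multiValued="true" in the given schema.xml.
# Assumption: each <field .../> definition is on one line.
check_series_multivalued() {
    schema="$1"
    status=0
    for field in series seriesInformation seriesName; do
        # The closing quote in the pattern keeps "series" from matching "seriesName".
        if grep "<field name=\"$field\"" "$schema" | grep -q 'multiValued="true"'; then
            echo "$field: multiValued"
        else
            echo "$field: NOT multiValued (or missing)"
            status=1
        fi
    done
    return "$status"
}

# Example (adjust the path to your Solr installation):
# check_series_multivalued /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml
```

A non-zero exit status means at least one of the three fields still needs the attribute.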
8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.
9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.
Generally, Handles is known to work reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and upgrading to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of this writing). An older version may seem to be running just fine, but keep in mind that it may stop working unexpectedly at any moment because of some incompatibility introduced in a Java rpm upgrade, or anything similarly unpredictable.
Handles is also very good about backward compatibility. In most cases you can simply stop the old version, unpack the new version from the distribution and start it on the existing config and database files, and it will just keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid surprises should they finally decide to drop the old database format, for example. The two specific things we recommend: 1) Make sure your service is using a json version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory) and 2) Make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format). Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Make sure to stop the service before converting the files and make sure to have a full backup of the existing server directory, just in case. Do not hesitate to contact Handles support with any questions you may have, as they are very responsive and helpful.
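To see whether either of the two recommended conversions applies to you, you can look for the legacy files named above in your Handles server directory. A minimal sketch follows; the function name and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list legacy Handles files (siteinfo.bin, handles.jdb, nas.jdb)
# found in the given server directory. Returns 1 if any are present, 0 otherwise.
check_handle_legacy_files() {
    server_dir="$1"
    found=0
    for f in siteinfo.bin handles.jdb nas.jdb; do
        if [ -e "$server_dir/$f" ]; then
            echo "legacy file present: $server_dir/$f"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no legacy files found in $server_dir"
    fi
    return "$found"
}

# Example (the directory is an assumption; use your own server directory):
# check_handle_legacy_files /hs/svr_1
```

The check only reports what it finds. The conversion itself should follow README.txt in the Handles distribution, with the service stopped and a backup in place.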
New JVM Options and MicroProfile Config Options
The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.
- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password
The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.
- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.
- dataverse.api.allow-incomplete-metadata
- dataverse.ui.show-validity-filter
- dataverse.ui.allow-review-for-incomplete
The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.
dataverse.spi.export.directory
The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.
- dataverse.mail.support-email
- dataverse.mail.cc-support-on-contact-emails
The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.
dataverse.netcdf.geo-extract-s3-direct-upload
Backward Incompatibilities
The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.
Using the new External Exporters framework
Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.
See "Mechanism Added for Adding External Exporters".
Publishing via API
When publishing a dataset via API, it now mirrors the UI behavior by requiring that the dataset has either a standard license configured, or has valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message.
See "Handling of license information fixed in the API" for guidance on how to ensure that datasets created or updated via native API have a license configured.
Detailed Release Highlights, New Features and Use Case Scenarios
For Dataverse developers, support for running Dataverse in Docker (experimental)
Developers can experiment with running Dataverse in Docker: (PR #9439)
This is an image developers build locally (or can pull from Docker Hub). It is not meant for production use!
To provide a complete container-based local development environment, developers can deploy a Dataverse container from the new image in addition to other containers for necessary dependencies: https://guides.dataverse.org/en/5.14/container/dev-usage.html
Please note that with this emerging solution we will sunset older tooling like docker-aio and docker-dcm.
We envision more testing possibilities in the future, to be discussed as part of the Dataverse Containerization Working Group. There is no sunsetting roadmap yet, but you have been warned.
If there is some specific feature of these tools you would like to be kept, please reach out.
Indexing performance improved
This release brings noticeable performance improvements, especially for large datasets containing thousands of files. Uploading files one by one to a dataset is much faster now, which makes uploading thousands of files feasible in an acceptable timeframe. All edit operations on datasets containing many files are faster as well, not only uploads. Performance tweaks include indexing datasets in the background and reducing the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingest to finish. Ingest was already running in the background, but it took a lock that prevented updates to the dataset and degraded performance for datasets containing many files. (PR #9558)
For installations using MDC (Make Data Count), it is now possible to display both the MDC metrics and the legacy access counts, generated before MDC was enabled.
This is enabled via the new setting :MDCStartDate that specifies the cutoff date. If a dataset has any legacy access counts collected prior to that date, those numbers will be displayed in addition to any MDC numbers recorded since then. (PR #6543)
Changes to PID Provider JVM Settings
In preparation for a future feature to use multiple PID providers at the same time, all JVM settings for PID providers have been enabled to be configured using MicroProfile Config. In the same go, they were renamed to match the name of the provider to be configured.
Please watch your log files for deprecation warnings. Your old settings will be picked up, but you should migrate to the new names to avoid unnecessary log clutter and get prepared for more future changes. An example message looks like this:
[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;|
Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]
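To find out which deprecated options your installation still uses, you can scan the server log for this message. The sketch below extracts the old and new names from lines in the format shown above. The function name is hypothetical, and the log location in the usage comment is the usual Payara default, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print "old -> new" once for each deprecated option
# reported in the given log file, based on the warning text shown above.
list_deprecated_options() {
    sed -n 's/.*Detected deprecated config option \([^ ]*\) in use\. Please update your config to use \([A-Za-z0-9_.-]*[A-Za-z0-9_-]\).*/\1 -> \2/p' "$1" | sort -u
}

# Example (assumed default log location):
# list_deprecated_options "$PAYARA/glassfish/domains/domain1/logs/server.log"
```

Each reported pair names the setting to remove and the setting to create in its place.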
Here is a list of the new settings:
- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password
See also https://guides.dataverse.org/en/5.14/installation/config.html#persistent-identifiers-and-publishing-datasets (multiple PRs: #8823 #8828)
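Because these settings are read through MicroProfile Config, they can be supplied either as JVM options or as environment variables. The sketch below derives the environment variable name from an option name using the MicroProfile Config mapping rule (every character that is not a letter or digit becomes an underscore, then the name is uppercased). The function name is hypothetical, and the asadmin line in the comment uses a placeholder value.

```shell
#!/bin/sh
# Hypothetical helper: map a MicroProfile Config property name to the
# environment variable name that MicroProfile Config will also look up.
mp_env_name() {
    printf '%s\n' "$1" | tr -c 'A-Za-z0-9\n' '_' | tr 'a-z' 'A-Z'
}

# Example:
# mp_env_name dataverse.pid.datacite.mds-api-url
#
# The same setting as a JVM option (placeholder value):
# $PAYARA/bin/asadmin create-jvm-options "-Ddataverse.pid.datacite.username=myuser"
```

Environment variables tend to suit container deployments, while JVM options fit a classic Payara installation.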
Signposting for Dataverse
This release adds Signposting support to Dataverse to improve machine discoverability of datasets and files. (PR #8424)
The following MicroProfile Config options are now available (these can be treated as JVM options):
- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
Signposting is described in more detail in a new page in the Admin Guide on discoverability: https://guides.dataverse.org/en/5.14/admin/discoverability.html
Permalinks support
Dataverse now optionally supports PermaLinks, a type of persistent identifier that does not involve a global registry service. PermaLinks are appropriate for Intranet deployment and catalog use cases. (PR #8674)
Creating datasets with incomplete metadata through API
It is now possible to create a dataset with some nominally mandatory metadata fields left unpopulated. For details on the use case that led to this feature see issue #8822 and PR #8940.
The create dataset API call (POST to /api/dataverses/#dataverseId/datasets) is extended with the "doNotValidate" parameter. However, in order to be able to create a dataset with incomplete metadata, the Solr configuration must be updated first with the new "schema.xml" file (do not forget to run the metadata fields update script when you use custom metadata). Reindexing is optional, but recommended. Also, even when this feature is not used, it is recommended to update the Solr configuration and reindex the metadata. Finally, this new feature can be activated with the "dataverse.api.allow-incomplete-metadata" JVM option.
You can also enable a valid/incomplete metadata filter in the "My Data" page using the "dataverse.ui.show-validity-filter" JVM option. By default, this filter is not shown. When you wish to use this filter, you must reindex the datasets first, otherwise datasets with valid metadata will not be shown in the results.
It is not possible to publish datasets with incomplete metadata. By default, you also cannot send such datasets for review. If you wish to enable sending for review of datasets with incomplete metadata, turn on the "dataverse.ui.allow-review-for-incomplete" JVM option.
In order to customize the wording and add translations to the UI sections extended by this feature, you can edit the "Bundle.properties" file and the localized versions of that file. The property keys used by this feature are:
- incomplete
- valid
- dataset.message.incomplete.warning
- mydataFragment.validity
- dataverses.api.create.dataset.error.mustIncludeAuthorName
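As an illustration of the API call described in this section, the sketch below builds the create-dataset URL with the "doNotValidate" parameter and posts a JSON file to it. The query-string form (doNotValidate=true), the variable names, the function names and the file name are assumptions made for this example. The call needs a running installation with dataverse.api.allow-incomplete-metadata enabled and a valid API token.

```shell
#!/bin/sh
# Base URL of the installation (assumed default).
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the create-dataset URL for a parent collection
# (alias or id in $1) with metadata validation switched off.
create_dataset_url() {
    printf '%s/api/dataverses/%s/datasets?doNotValidate=true\n' "$SERVER_URL" "$1"
}

# Hypothetical helper: send the request. $1 = parent collection, $2 = JSON file.
# API_TOKEN must hold the token of a user allowed to create datasets.
create_incomplete_dataset() {
    curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" \
         -X POST "$(create_dataset_url "$1")" --upload-file "$2"
}

# Example:
# create_incomplete_dataset root dataset-incomplete.json
```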
Registering PIDs (DOIs or Handles) for files in select collections
It is now possible to configure registering PIDs for files in individual collections.
For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the :FilePIDsEnabled section of the Configuration guide for details. (PR #9614)
Mechanism Added for Adding External Exporters
It is now possible for third parties to develop and share code to provide new metadata export formats for Dataverse. Export formats can be made available via the Dataverse UI and API or configured for use in Harvesting. Dataverse now provides developers with a separate dataverse-spi JAR file that contains the Java interfaces and classes required to create a new metadata Exporter. Once a new Exporter has been created and packaged as a JAR file, administrators can use it by specifying a local directory for third party Exporters, dropping the Exporter JAR there, and restarting Payara. This mechanism also allows new Exporters to replace any of Dataverse's existing metadata export formats. (PR #9175) See also https://guides.dataverse.org/en/5.14/developers/metadataexport.html
Backward Incompatibilities
Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.
New JVM/MicroProfile Settings
dataverse.spi.export.directory - specifies a directory, readable by the Dataverse server. Any Exporter JAR files placed in this directory will be read by Dataverse and used to add/replace the specified metadata format.
Contact Email Improvements
Email sent from the contact forms to the contact(s) for a collection, dataset, or datafile can now optionally be cc'd to a support email address. The support email address can be changed from the default :SystemEmail address to a separate :SupportEmail address. When multiple contacts are listed, the system will now send one email to all contacts (with the optional cc if configured) instead of separate emails to each contact. Contact names with a comma that refer to Organizations will no longer have the name parts reversed in the email greeting. A new protected/admin feedback API has been added. (PR #9186) See https://guides.dataverse.org/en/5.14/api/native-api.html#send-feedback-to-contact-s
New JVM/MicroProfile Settings
dataverse.mail.support-email - allows a separate email address, distinct from the :SystemEmail, to be used as the "to" address in emails from the contact form/feedback API.
dataverse.mail.cc-support-on-contact-emails - include the support email address as a CC: entry when contact/feedback emails are sent to the contacts for a collection, dataset, or datafile.
Support for Grouping Dataset Files by Folder and Category Tag
Dataverse now supports grouping dataset files by folder and/or optionally by Tag/Category. The default for whether to order by folder can be changed via :OrderByFolder. Ordering by category must be enabled by an administrator via the :CategoryOrder parameter, which is used to specify which tags appear first (e.g. to put Documentation files before Data or Code files, etc.). These Group-By options work with the existing sort options, i.e. sorting alphabetically means that files within each folder or tag group will be sorted alphabetically. :AllowUserManagementOfOrder can be set to true to allow users to turn folder ordering and category ordering (if enabled) on or off in the current dataset view. (PR #9204)
New Settings
:CategoryOrder - a comma separated list of Category/Tag names defining the order in which files with those tags should be displayed. The setting can include custom tag names along with the pre-defined defaults (Documentation, Data, and Code, which can be overridden by the :FileCategories setting).
:OrderByFolder - defaults to true - whether to group files in the same folder together
:AllowUserManagementOfOrder - default false - allow users to toggle ordering on/off in the dataset display
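These are database settings, so they can be changed through the admin settings API in the same way as :FilePIDsEnabled in the upgrade instructions. The sketch below wraps that call. The function name, the DATAVERSE_URL and DRY_RUN variables, and the example values are assumptions for illustration.

```shell
#!/bin/sh
# Base URL of the installation (assumed default).
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

# Hypothetical helper: set a database setting via the admin settings API.
# $1 = setting name including the leading colon, $2 = value.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
put_setting() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "curl -X PUT -d '$2' $DATAVERSE_URL/api/admin/settings/$1"
    else
        curl -X PUT -d "$2" "$DATAVERSE_URL/api/admin/settings/$1"
    fi
}

# Example values for the three settings described above:
# put_setting :CategoryOrder 'Documentation,Data,Code'
# put_setting :OrderByFolder true
# put_setting :AllowUserManagementOfOrder true
```

Running with DRY_RUN=1 first is a cheap way to review the exact commands before they touch the installation.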
Metadata field Series now repeatable
This enhancement allows depositors to define multiple instances of the metadata field Series in the Citation Metadata block.
Data contained in a dataset may belong to multiple series. Making the field repeatable makes it possible to reflect this fact in the dataset metadata. (PR #9256)
Guides in PDF Format
An experimental version of the guides in PDF format is available at http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/ (PR #9474)
Advice for anyone who wants to help improve the PDF is available at https://guides.dataverse.org/en/5.14/developers/documentation.html#pdf-version-of-the-guides
Datasets API extended
The following APIs have been added: (PR #9592)
- /api/datasets/summaryFieldNames
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
- /api/datasets/{datasetId}/versions/{version}/citation
Extra fields included in the JSON metadata
The following fields are now available in the native JSON output:
- alternativePersistentId
- publicationDate
- citationDate
(PR #9657)
Files downloaded from Binder are now in their original format.
For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest). (PR #9483)
This should make it easier to write code to reproduce results, as the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.
For details, see #9374, https://github.com/jupyterhub/repo2docker/issues/1242, and https://github.com/jupyterhub/repo2docker/pull/1253.
Handling of license information fixed in the API
(PR #9568)
When publishing a dataset via API, it now requires the dataset to either have a standard license configured, or have valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message. This introduces a backward incompatibility: if you have scripts that automatically create, update and publish datasets, this last step may start failing. Unfortunately, there were some problems with the datasets APIs that made it difficult to manage licenses, so an API user was likely to end up with a dataset that had neither of the above. In this release we have addressed this with the following fixes:
We fixed the incompatibility between the format in which license information was exported in json, and the format the create and update APIs were expecting it for import (https://github.com/IQSS/dataverse/issues/9155). This means that the following json format can now be imported:
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0"
}
However, for the sake of backward compatibility the old format
"license" : "CC0 1.0"
will be accepted as well.
We have added the default license (CC0) to the model json file that we provide and recommend to use as the model in the Native API Guide (https://github.com/IQSS/dataverse/issues/9364).
And we have corrected the misleading language in the same guide where we used to recommend to users that they select, edit and re-import only the .metadataBlocks fragment of the json metadata representing the latest version. There are in fact other useful pieces of information that need to be preserved in the update (such as the "license" section above). So the recommended way of creating base json for updates via the API is to select everything but the "files" section, with (for example) the following jq command:
jq '.data | del(.files)'
Please see the Update Metadata For a Dataset section of our Native API Guide for more information.
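The jq filter above can be tried on a small sample before using it on real output. The sketch below wraps the filter in a function and shows, in a comment, how it might be combined with a request for the latest version. The function name, variable names, output file name and the request URL form are assumptions for this example, and jq must be installed.

```shell
#!/bin/sh
# Hypothetical helper: read dataset version JSON on stdin and keep everything
# under .data except the "files" section, as recommended above.
base_json_for_update() {
    jq '.data | del(.files)'
}

# Example with an inline sample document (the "license" section survives, "files" is dropped):
# printf '%s' '{"status":"OK","data":{"license":{"name":"CC0 1.0"},"files":[]}}' | base_json_for_update
#
# Example against a running installation (URL form is an assumption; see the Native API Guide):
# curl -H "X-Dataverse-key:$API_TOKEN" \
#   "$SERVER_URL/api/datasets/:persistentId/versions/:latest?persistentId=$PERSISTENT_ID" \
#   | base_json_for_update > dataset-update.json
```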
New External Tool Type and Implementation
With this release a new experimental external tool type has been added to the Dataverse Software. The tool type is "query" and its first implementation is an experimental tool named Ask the Data which allows users to ask natural language queries of tabular files in Dataverse. More information is available in the External Tools section of the guides. (PR #9737) See https://guides.dataverse.org/en/5.14/admin/external-tools.html#file-level-query-tools
Default Value for File PIDs registration has changed
The default for whether PIDs are registered for files or not is now false.
Installations where file PIDs were enabled by default will have to add the :FilePIDsEnabled = true setting to maintain the existing functionality.
See Step 9 of the upgrade instructions:
If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
It is now possible to allow File PIDs to be enabled/disabled per collection. See the :AllowEnablingFilePIDsPerCollection section of the Configuration guide for details.
For example, registration of PIDs for files can now be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default.
Changes and fixes in this release not already mentioned above include:
- An endpoint for deleting a file has been added to the native API: https://guides.dataverse.org/en/5.14/api/native-api.html#deleting-files (PR #9383)
- A date column has been added to the restricted file access request overview, indicating when the earliest request by that user was made. An issue was fixed where the request list was not updated when a request was approved or rejected. (PR #9257)
- Changes made in v5.13 and v5.14 in multiple PRs to improve the embedded Schema.org metadata in dataset pages will only be propagated to the Schema.Org JSON-LD metadata export if a reExportAll() is done. (PR #9102)
- It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation field and from the CrossRef Funding Registry (Fundreg) for the Funding Information/Agency field, both in the standard Citation metadata block. Applying these scripts to other fields and developing other scripts that target child fields are now possible. (PR #9402)
- Dataverse now supports requiring a secret key to add or edit metadata in specified "system" metadata blocks. Changing the metadata in such system metadata blocks is not allowed without the key and is currently only allowed via API. (PR #9388)
- An attempt will be made to extract a geospatial bounding box (west, south, east, north) from NetCDF and HDF5 files and then insert these values into the geospatial metadata block, if enabled. (#9541) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#geospatial-bounding-box
- A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files. (PR #9600) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#h5web-previewer
- Two file previewers for GeoTIFF and Shapefiles are now available for visualizing geotiff image files and zipped Shapefiles on a map. See https://github.com/gdcc/dataverse-previewers
- New alternative to setup the Dataverse dependencies for the development environment through Docker Compose. (PR #9417)
- New alternative, explained in the documentation, to build the Sphinx guides through a Docker container. (PR #9417)
- A container has been added called "configbaker" that configures Dataverse while running in containers. This allows developers to spin up Dataverse with a single command. (PR #9574)
- Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting. External apps using the direct upload API can now query Dataverse to discover which algorithm should be used. Sites that have been using an algorithm other than MD5 and direct upload and/or dvwebloader may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/5.14/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files. (PR #9482)
- The OAI_ORE metadata export (and hence the archival Bag for a dataset) now includes information about file embargoes. (PR #9698)
- DatasetFieldType attribute "displayFormat", is now returned by the API. (PR #9668)
- An API named "MyData" has been available for years but is newly documented. It is used to get a list of the objects (datasets, collections or datafiles) that an authenticated user can modify. (PR #9596)
- A Go client library for Dataverse APIs is now available. See https://guides.dataverse.org/en/5.14/api/client-libraries.html
- A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags
- A feature flag called "api-bearer-auth" has been added. This allows OIDC user accounts to send authenticated API requests using Bearer Tokens. Note: This feature is limited to OIDC! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags (PR #9591)
Complete List of Changes
For the complete list of code changes in this release, see the 5.14 milestone on GitHub.
- Java
Published by kcondon almost 3 years ago
dataverse - v5.14
Dataverse Software 5.14
(If this note appears truncated on the GitHub Releases page, you can view it in full in the source tree: https://github.com/IQSS/dataverse/blob/master/doc/release-notes/5.14-release-notes.md)
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Please note that, as an experiment, the sections of this release note are organized in a different order. The Upgrade and Installation sections are at the top, with the detailed sections highlighting new features and fixes further down.
Installation
If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!
After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.
Upgrade Instructions
0. These instructions assume that you are upgrading from 5.13. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to 5.14.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-5.13
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.14.war
5. Restart Payara
service payara stop
service payara start
6. Update the Citation metadata block: (the update makes the field Series repeatable)
wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place it in the dataverse.lang.directory; /home/dataverse/langBundles used in the example below.
wget https://github.com/IQSS/dataverse/releases/download/v5.14/citation.properties
cp citation.properties /home/dataverse/langBundles
7. Update Solr schema.xml to allow multiple series to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).
7a. For installations without custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- Replace schema.xml
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
- Start Solr instance (usually service solr start, depending on Solr/OS)
7b. For installations with custom or experimental metadata blocks:
- Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
- There are 2 ways to regenerate the schema: Either by collecting the output of the Dataverse schema API and feeding it to the update-fields.sh script that we supply, as in the example below (modify the command lines as needed):
wget https://raw.githubusercontent.com/IQSS/dataverse/master/conf/solr/8.11.1/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.8.1/server/solr/collection1/conf/schema.xml
OR, alternatively, you can edit the following lines in your schema.xml by hand as follows (to indicate that series and its components are now multiValued="true"):
<field name="series" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="seriesInformation" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="seriesName" type="text_en" multiValued="true" stored="true" indexed="true"/>
- Restart Solr instance (usually service solr restart, depending on Solr/OS)
8. Run ReExportAll to update dataset metadata exports. Follow the directions in the Admin Guide.
9. If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
10. If your installation uses Handles as persistent identifiers (instead of DOIs): remember to upgrade your Handles service installation to a currently supported version.
Generally, Handles is known to be working reliably even when running older versions that haven't been officially supported in years. We still recommend checking on your service and making sure to upgrade to a supported version (the latest version is 9.3.1, https://www.handle.net/hnr-source/handle-9.3.1-distribution.tar.gz, as of this writing). An older version may seemingly be running just fine for you, but do keep in mind that it may stop working unexpectedly at any moment because of some incompatibility introduced in a Java rpm upgrade, or anything similarly unpredictable.
Handles is also very good about backward compatibility. Meaning, in most cases you can simply stop the old version, unpack the new version from the distribution and start it on the existing config and database files, and it'll just keep working. However, it is a good idea to keep up with the recommended format upgrades, for the sake of efficiency and to avoid any surprises, should they finally decide to drop the old database format, for example. The two specific things we recommend: 1) Make sure your service is using a json version of the siteinfo bundle (i.e., if you are still using siteinfo.bin, convert it to siteinfo.json and remove the binary file from the service directory) and 2) Make sure you are using the newer bdbje database format for your handles catalog (i.e., if you still have the files handles.jdb and nas.jdb in your server directory, convert them to the new format). Follow the simple conversion instructions in the file README.txt in the Handles software distribution. Make sure to stop the service before converting the files and make sure to have a full backup of the existing server directory, just in case. Do not hesitate to contact the Handles support with any questions you may have, as they are very responsive and helpful.
New JVM Options and MicroProfile Config Options
The following PID provider options are now available. See the section "Changes to PID Provider JVM Settings" below for more information.
- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password
The following MicroProfile Config options have been added as part of Signposting support. See the section "Signposting for Dataverse" below for details.
- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
The following JVM options are described in the "Creating datasets with incomplete metadata through API" section below.
- dataverse.api.allow-incomplete-metadata
- dataverse.ui.show-validity-filter
- dataverse.ui.allow-review-for-incomplete
The following JVM/MicroProfile setting is for External Exporters. See "Mechanism Added for Adding External Exporters" below.
dataverse.spi.export.directory
The following JVM/MicroProfile settings are for handling of support emails. See "Contact Email Improvements" below.
- dataverse.mail.support-email
- dataverse.mail.cc-support-on-contact-emails
The following JVM/MicroProfile setting is for extracting a geospatial bounding box even if S3 direct upload is enabled.
dataverse.netcdf.geo-extract-s3-direct-upload
Backward Incompatibilities
The following list of potential backward incompatibilities references the sections of the "Detailed Release Highlights..." portion of the document further below where the corresponding changes are explained in detail.
Using the new External Exporters framework
Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.
See "Mechanism Added for Adding External Exporters".
Publishing via API
When publishing a dataset via API, it now mirrors the UI behavior by requiring that the dataset has either a standard license configured, or has valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message.
See "Handling of license information fixed in the API" for guidance on how to ensure that datasets created or updated via native API have a license configured.
Detailed Release Highlights, New Features and Use Case Scenarios
For Dataverse developers, support for running Dataverse in Docker (experimental)
Developers can experiment with running Dataverse in Docker: (PR #9439)
This is an image developers build locally (or can pull from Docker Hub). It is not meant for production use!
To provide a complete container-based local development environment, developers can deploy a Dataverse container from the new image in addition to other containers for necessary dependencies: https://guides.dataverse.org/en/5.14/container/dev-usage.html
Please note that with this emerging solution we will sunset older tooling like docker-aio and docker-dcm.
We envision more testing possibilities in the future, to be discussed as part of the Dataverse Containerization Working Group. There is no sunsetting roadmap yet, but you have been warned.
If there is some specific feature of these tools you would like to be kept, please reach out.
Indexing performance improved
Performance has improved noticeably, especially for large datasets containing thousands of files. Uploading files one by one to the dataset is much faster now, allowing thousands of files to be uploaded in an acceptable timeframe. Not only uploading a file, but all edit operations on datasets containing many files, got faster. Performance tweaks include indexing of the datasets in the background and optimizations in the number of indexing operations needed. Furthermore, updates to the dataset no longer wait for ingesting to finish. Ingesting was already running in the background, but it took a lock, preventing updating the dataset and degrading performance for datasets containing many files. (PR #9558)
For installations using MDC (Make Data Count), it is now possible to display both the MDC metrics and the legacy access counts, generated before MDC was enabled.
This is enabled via the new setting :MDCStartDate that specifies the cutoff date. If a dataset has any legacy access counts collected prior to that date, those numbers will be displayed in addition to any MDC numbers recorded since then. (PR #6543)
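As a minimal sketch, the cutoff could be stored through the admin settings API in the same way as the other database settings in these notes. The `SERVER_URL` variable, the helper names, and the YYYY-MM-DD date format are assumptions made for illustration; check the Configuration guide for the exact format your installation expects.

```shell
# Assumed base URL of the installation; adjust as needed.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Succeed only if $1 looks like YYYY-MM-DD (a format check, not a calendar check).
is_iso_date() {
  case "$1" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Store the cutoff date in the :MDCStartDate setting (hypothetical helper).
set_mdc_start_date() {
  if ! is_iso_date "$1"; then
    echo "expected a date like 2019-10-01, got: $1" >&2
    return 1
  fi
  curl -X PUT -d "$1" "$SERVER_URL/api/admin/settings/:MDCStartDate"
}
```

With these helpers defined, `set_mdc_start_date 2019-10-01` would send the request; the date shown is only an example.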
Changes to PID Provider JVM Settings
In preparation for a future feature to use multiple PID providers at the same time, all JVM settings for PID providers have been enabled to be configured using MicroProfile Config. In the same go, they were renamed to match the name of the provider to be configured.
Please watch your log files for deprecation warnings. Your old settings will be picked up, but you should migrate to the new names to avoid unnecessary log clutter and get prepared for more future changes. An example message looks like this:
[#|2023-03-31T16:55:27.992+0000|WARNING|Payara 5.2022.5|edu.harvard.iq.dataverse.settings.source.AliasConfigSource|_ThreadID=30;_ThreadName=RunLevelControllerThread-1680281704925;_TimeMillis=1680281727992;_LevelValue=900;|
Detected deprecated config option doi.username in use. Please update your config to use dataverse.pid.datacite.username.|#]
Here is a list of the new settings:
- dataverse.pid.datacite.mds-api-url
- dataverse.pid.datacite.rest-api-url
- dataverse.pid.datacite.username
- dataverse.pid.datacite.password
- dataverse.pid.handlenet.key.path
- dataverse.pid.handlenet.key.passphrase
- dataverse.pid.handlenet.index
- dataverse.pid.permalink.base-url
- dataverse.pid.ezid.api-url
- dataverse.pid.ezid.username
- dataverse.pid.ezid.password
See also https://guides.dataverse.org/en/5.14/installation/config.html#persistent-identifiers-and-publishing-datasets (multiple PRs: #8823 #8828)
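Because these settings are now read through MicroProfile Config, they can in principle also be supplied as environment variables. The MicroProfile Config specification maps a property name to an environment variable by replacing every character that is not alphanumeric or an underscore with `_` and then upper-casing the result. The sketch below only prints the resulting names; the helper name `mp_env_name` is hypothetical, and whether your deployment actually picks up environment variables is an assumption to verify against the Configuration guide.

```shell
# Print the environment variable name that MicroProfile Config would look up
# for a property: non-alphanumeric characters become "_", then upper-case.
mp_env_name() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | tr 'a-z' 'A-Z'
  printf '\n'
}

# Show the mapping for a few of the new PID provider settings.
for prop in \
  dataverse.pid.datacite.username \
  dataverse.pid.datacite.mds-api-url \
  dataverse.pid.handlenet.key.path
do
  printf '%s -> %s\n' "$prop" "$(mp_env_name "$prop")"
done
```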
Signposting for Dataverse
This release adds Signposting support to Dataverse to improve machine discoverability of datasets and files. (PR #8424)
The following MicroProfile Config options are now available (these can be treated as JVM options):
- dataverse.signposting.level1-author-limit
- dataverse.signposting.level1-item-limit
Signposting is described in more detail in a new page in the Admin Guide on discoverability: https://guides.dataverse.org/en/5.14/admin/discoverability.html
Permalinks support
Dataverse now optionally supports PermaLinks, a type of persistent identifier that does not involve a global registry service. PermaLinks are appropriate for Intranet deployment and catalog use cases. (PR #8674)
Creating datasets with incomplete metadata through API
It is now possible to create a dataset with some nominally mandatory metadata fields left unpopulated. For details on the use case that led to this feature see issue #8822 and PR #8940.
The create dataset API call (POST to /api/dataverses/#dataverseId/datasets) is extended with the "doNotValidate" parameter. However, in order to be able to create a dataset with incomplete metadata, the Solr configuration must be updated first with the new "schema.xml" file (do not forget to run the metadata fields update script when you use custom metadata). Reindexing is optional, but recommended. Also, even when this feature is not used, it is recommended to update the Solr configuration and reindex the metadata. Finally, this new feature can be activated with the "dataverse.api.allow-incomplete-metadata" JVM option.
You can also enable a valid/incomplete metadata filter in the "My Data" page using the "dataverse.ui.show-validity-filter" JVM option. By default, this filter is not shown. When you wish to use this filter, you must reindex the datasets first, otherwise datasets with valid metadata will not be shown in the results.
It is not possible to publish datasets with incomplete or invalid metadata. By default, you also cannot send such datasets for review. If you wish to enable sending for review of datasets with incomplete metadata, turn on the "dataverse.ui.allow-review-for-incomplete" JVM option.
In order to customize the wording and add translations to the UI sections extended by this feature, you can edit the "Bundle.properties" file and the localized versions of that file. The property keys used by this feature are:
- incomplete
- valid
- dataset.message.incomplete.warning
- mydataFragment.validity
- dataverses.api.create.dataset.error.mustIncludeAuthorName
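The API call described in this section can be sketched as follows, assuming the "dataverse.api.allow-incomplete-metadata" JVM option is already enabled and the Solr schema has been updated. The `SERVER_URL` and `API_TOKEN` variables, the helper names, the `true` value for the parameter, and passing "doNotValidate" as a query parameter are assumptions for illustration; consult the Native API Guide for the authoritative form.

```shell
# Assumed base URL of the installation; adjust as needed.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build the create-dataset URL for a parent collection ($1) with validation skipped.
incomplete_dataset_url() {
  printf '%s/api/dataverses/%s/datasets?doNotValidate=true\n' "$SERVER_URL" "$1"
}

# POST a dataset JSON file ($2) to the parent collection ($1).
# Expects API_TOKEN to be set in the environment.
create_incomplete_dataset() {
  curl -H "X-Dataverse-key:$API_TOKEN" \
       -H "Content-type:application/json" \
       -X POST "$(incomplete_dataset_url "$1")" \
       --data-binary "@$2"
}
```

For example, `create_incomplete_dataset root dataset-incomplete.json` would post a hypothetical file to a collection with the alias `root`.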
Registering PIDs (DOIs or Handles) for files in select collections
It is now possible to configure registering PIDs for files in individual collections.
For example, registration of PIDs for files can be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default. See the :FilePIDsEnabled section of the Configuration guide for details. (PR #9614)
Mechanism Added for Adding External Exporters
It is now possible for third parties to develop and share code to provide new metadata export formats for Dataverse. Export formats can be made available via the Dataverse UI and API or configured for use in Harvesting. Dataverse now provides developers with a separate dataverse-spi JAR file that contains the Java interfaces and classes required to create a new metadata Exporter. Once a new Exporter has been created and packaged as a JAR file, administrators can use it by specifying a local directory for third party Exporters, dropping the Exporter JAR there, and restarting Payara. This mechanism also allows new Exporters to replace any of Dataverse's existing metadata export formats. (PR #9175) See also https://guides.dataverse.org/en/5.14/developers/metadataexport.html
Backward Incompatibilities
Care should be taken when replacing Dataverse's internal metadata export formats as third party code, including other third party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command for the metadata exports of existing datasets to be updated.
New JVM/MicroProfile Settings
dataverse.spi.export.directory - specifies a directory, readable by the Dataverse server. Any Exporter JAR files placed in this directory will be read by Dataverse and used to add/replace the specified metadata format.
Contact Email Improvements
Email sent from the contact forms to the contact(s) for a collection, dataset, or datafile can now optionally be cc'd to a support email address. The support email address can be changed from the default :SystemEmail address to a separate :SupportEmail address. When multiple contacts are listed, the system will now send one email to all contacts (with the optional cc if configured) instead of separate emails to each contact. Contact names with a comma that refer to Organizations will no longer have the name parts reversed in the email greeting. A new protected/admin feedback API has been added. (PR #9186) See https://guides.dataverse.org/en/5.14/api/native-api.html#send-feedback-to-contact-s
New JVM/MicroProfile Settings
- dataverse.mail.support-email - allows a separate email address, distinct from the :SystemEmail, to be used as the "to" address in emails from the contact form/feedback API.
- dataverse.mail.cc-support-on-contact-emails - include the support email address as a CC: entry when contact/feedback emails are sent to the contacts for a collection, dataset, or datafile.
Support for Grouping Dataset Files by Folder and Category Tag
Dataverse now supports grouping dataset files by folder and/or optionally by Tag/Category. The default for whether to order by folder can be changed via :OrderByFolder. Ordering by category must be enabled by an administrator via the :CategoryOrder parameter which is used to specify which tags appear first (e.g. to put Documentation files before Data or Code files, etc.) These Group-By options work with the existing sort options, i.e. sorting alphabetically means that files within each folder or tag group will be sorted alphabetically. :AllowUserManagementOfOrder can be set to true to allow users to turn folder ordering and category ordering (if enabled) on or off in the current dataset view. (PR #9204)
New Settings
- :CategoryOrder - a comma separated list of Category/Tag names defining the order in which files with those tags should be displayed. The setting can include custom tag names along with the pre-defined defaults (Documentation, Data, and Code, which can be overridden by the :FileCategories setting).
- :OrderByFolder - defaults to true - whether to group files in the same folder together
- :AllowUserManagementOfOrder - default false - allow users to toggle ordering on/off in the dataset display
Metadata field Series now repeatable
This enhancement allows depositors to define multiple instances of the metadata field Series in the Citation Metadata block.
Data contained in a dataset may belong to multiple series. Making the field repeatable makes it possible to reflect this fact in the dataset metadata. (PR #9256)
Guides in PDF Format
An experimental version of the guides in PDF format is available at http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/ (PR #9474)
Advice for anyone who wants to help improve the PDF is available at https://guides.dataverse.org/en/5.14/developers/documentation.html#pdf-version-of-the-guides
Datasets API extended
The following APIs have been added: (PR #9592)
- /api/datasets/summaryFieldNames
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}
- /api/datasets/privateUrlDatasetVersion/{privateUrlToken}/citation
- /api/datasets/{datasetId}/versions/{version}/citation
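As a rough illustration of how these endpoints might be called, the sketch below builds two of the URLs from the paths listed in this section and fetches a citation with curl. The `SERVER_URL` and `API_TOKEN` variables, the helper names, and the example identifiers are assumptions; whether a token is required depends on the dataset's publication state.

```shell
# Assumed base URL of the installation; adjust as needed.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# URL of the citation for one version ($2) of a dataset with database id $1.
citation_url() {
  printf '%s/api/datasets/%s/versions/%s/citation\n' "$SERVER_URL" "$1" "$2"
}

# URL of the dataset version behind a private URL token ($1).
private_url_version_url() {
  printf '%s/api/datasets/privateUrlDatasetVersion/%s\n' "$SERVER_URL" "$1"
}

# Fetch the citation; expects API_TOKEN to be set in the environment.
get_citation() {
  curl -s -H "X-Dataverse-key:$API_TOKEN" "$(citation_url "$1" "$2")"
}
```

For example, `get_citation 24 1.0` would request the citation for version 1.0 of a hypothetical dataset with id 24.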
Extra fields included in the JSON metadata
The following fields are now available in the native JSON output:
- alternativePersistentId
- publicationDate
- citationDate
(PR #9657)
Files downloaded from Binder are now in their original format.
For example, data.dta (a Stata file) will be downloaded instead of data.tab (the archival version Dataverse creates as part of a successful ingest). (PR #9483)
This should make it easier to write code to reproduce results as the dataset authors and subsequent researchers are likely operating on the original file format rather than the format that Dataverse creates.
For details, see #9374, https://github.com/jupyterhub/repo2docker/issues/1242, and https://github.com/jupyterhub/repo2docker/pull/1253.
Handling of license information fixed in the API
(PR #9568)
When publishing a dataset via API, it now requires the dataset to either have a standard license configured, or have valid Custom Terms of Use (if allowed by the instance). Attempting to publish a dataset without such will fail with an error message. This introduces a backward incompatibility, and if you have scripts that automatically create, update and publish datasets, this last step may start failing. Unfortunately, there were some problems with the datasets APIs that made it difficult to manage licenses, so an API user was likely to end up with a dataset that had neither of the above. In this release we have addressed it by making the following fixes:
We fixed the incompatibility between the format in which license information was exported in json, and the format the create and update APIs were expecting it for import (https://github.com/IQSS/dataverse/issues/9155). This means that the following json format can now be imported:
"license": {
"name": "CC0 1.0",
"uri": "http://creativecommons.org/publicdomain/zero/1.0"
}
However, for the sake of backward compatibility the old format
"license" : "CC0 1.0"
will be accepted as well.
We have added the default license (CC0) to the model json file that we provide and recommend to use as the model in the Native API Guide (https://github.com/IQSS/dataverse/issues/9364).
And we have corrected the misleading language in the same guide where we used to recommend to users that they select, edit and re-import only the .metadataBlocks fragment of the json metadata representing the latest version. There are in fact other useful pieces of information that need to be preserved in the update (such as the "license" section above). So the recommended way of creating base json for updates via the API is to select everything but the "files" section, with (for example) the following jq command:
jq '.data | del(.files)'
Please see the Update Metadata For a Dataset section of our Native API Guide for more information.
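The recommended update flow can be sketched end to end as below. It wraps the jq filter given in this section; the `SERVER_URL` and `API_TOKEN` variables, the helper names, the `dataset-update.json` file name, and the use of the persistent-id form of the version endpoints are assumptions for illustration, not a definitive procedure.

```shell
# Assumed base URL of the installation; adjust as needed.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Keep everything in the version JSON except the "files" section
# (the jq filter recommended in this section); reads stdin, writes stdout.
strip_files() {
  jq '.data | del(.files)'
}

# URL of a dataset version ($2, e.g. :latest or :draft) for a persistent id ($1).
version_url() {
  printf '%s/api/datasets/:persistentId/versions/%s?persistentId=%s\n' "$SERVER_URL" "$2" "$1"
}

# Download the latest version, drop "files", and save the result for editing.
fetch_update_base() {
  curl -s -H "X-Dataverse-key:$API_TOKEN" "$(version_url "$1" :latest)" \
    | strip_files > dataset-update.json
}

# Send the edited JSON back as the draft version.
put_update() {
  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT "$(version_url "$1" :draft)" \
       --upload-file dataset-update.json
}
```

Editing `dataset-update.json` between `fetch_update_base` and `put_update` keeps the "license" section intact, which is the point of selecting everything but "files".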
New External Tool Type and Implementation
With this release a new experimental external tool type has been added to the Dataverse Software. The tool type is "query" and its first implementation is an experimental tool named Ask the Data which allows users to ask natural language queries of tabular files in Dataverse. More information is available in the External Tools section of the guides. (PR #9737) See https://guides.dataverse.org/en/5.14/admin/external-tools.html#file-level-query-tools
Default Value for File PIDs registration has changed
The default for whether PIDs are registered for files or not is now false.
Installations where file PIDs were enabled by default will have to add the :FilePIDsEnabled = true setting to maintain the existing functionality.
See Step 9 of the upgrade instructions:
If your installation did not have :FilePIDsEnabled set, you will need to set it to true to keep file PIDs enabled:
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
It is now possible to allow File PIDs to be enabled/disabled per collection. See the :AllowEnablingFilePIDsPerCollection section of the Configuration guide for details.
For example, registration of PIDs for files can now be enabled in a specific collection when it is disabled instance-wide. Or it can be disabled in specific collections where it is enabled by default.
Changes and fixes in this release not already mentioned above include:
- An endpoint for deleting a file has been added to the native API: https://guides.dataverse.org/en/5.14/api/native-api.html#deleting-files (PR #9383)
- A date column has been added to the restricted file access request overview, indicating when the earliest request by that user was made. An issue was fixed where the request list was not updated when a request was approved or rejected. (PR #9257)
- Changes made in v5.13 and v5.14 in multiple PRs to improve the embedded Schema.org metadata in dataset pages will only be propagated to the Schema.Org JSON-LD metadata export if a reExportAll() is done. (PR #9102)
- It is now possible to write external vocabulary scripts that target a single child field in a metadata block. Example scripts are now available at https://github.com/gdcc/dataverse-external-vocab-support that can be configured to support lookup from the Research Organization Registry (ROR) for the Author Affiliation Field and for the CrossRef Funding Registry (Fundreg) in the Funding Information/Agency field, both in the standard Citation metadata block. Application of these scripts to other fields, and the development of other scripts targeting child fields, are now possible. (PR #9402)
- Dataverse now supports requiring a secret key to add or edit metadata in specified "system" metadata blocks. Changing the metadata in such system metadata blocks is not allowed without the key and is currently only allowed via API. (PR #9388)
- An attempt will be made to extract a geospatial bounding box (west, south, east, north) from NetCDF and HDF5 files and then insert these values into the geospatial metadata block, if enabled. (#9541) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#geospatial-bounding-box
- A file previewer called H5Web is now available for exploring and visualizing NetCDF and HDF5 files. (PR #9600) See https://guides.dataverse.org/en/5.14/user/dataset-management.html#h5web-previewer
- Two file previewers for GeoTIFF and Shapefiles are now available for visualizing geotiff image files and zipped Shapefiles on a map. See https://github.com/gdcc/dataverse-previewers
- New alternative to set up the Dataverse dependencies for the development environment through Docker Compose. (PR #9417)
- New alternative, explained in the documentation, to build the Sphinx guides through a Docker container. (PR #9417)
- A container has been added called "configbaker" that configures Dataverse while running in containers. This allows developers to spin up Dataverse with a single command. (PR #9574)
- Direct upload via the Dataverse UI will now support any algorithm configured via the :FileFixityChecksumAlgorithm setting. External apps using the direct upload API can now query Dataverse to discover which algorithm should be used. Sites that have been using an algorithm other than MD5 and direct upload and/or dvwebloader may want to use the /api/admin/updateHashValues call (see https://guides.dataverse.org/en/5.14/installation/config.html?highlight=updatehashvalues#filefixitychecksumalgorithm) to replace any MD5 hashes on existing files. (PR #9482)
- The OAI_ORE metadata export (and hence the archival Bag for a dataset) now includes information about file embargoes. (PR #9698)
- DatasetFieldType attribute "displayFormat", is now returned by the API. (PR #9668)
- An API named "MyData" has been available for years but is newly documented. It is used to get a list of the objects (datasets, collections or datafiles) that an authenticated user can modify. (PR #9596)
- A Go client library for Dataverse APIs is now available. See https://guides.dataverse.org/en/5.14/api/client-libraries.html
- A feature flag called "api-session-auth" has been added temporarily to aid in the development of the new frontend (#9063) but will be removed once bearer tokens (#9229) have been implemented. There is a security risk (CSRF) in enabling this flag! Do not use it in production! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags
- A feature flag called "api-bearer-auth" has been added. This allows OIDC user accounts to send authenticated API requests using Bearer Tokens. Note: This feature is limited to OIDC! For more information, see https://guides.dataverse.org/en/5.14/installation/config.html#feature-flags (PR #9591)
Complete List of Changes
For the complete list of code changes in this release, see the 5.14 milestone on GitHub.
- Java
Published by kcondon almost 3 years ago
dataverse - v5.13
Dataverse Software 5.13
This release brings new features, enhancements, and bug fixes to the Dataverse software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Schema.org Improvements (Some Backward Incompatibility)
The Schema.org metadata used as an export format and also embedded in dataset pages has been updated to improve compliance with Schema.org's schema and Google's recommendations for Google Dataset Search.
Please be advised that these improvements may break integrations that rely on the old, less compliant structure. For details see the "backward incompatibility" section below. (Issue #7349)
Folder Uploads via Web UI (dvwebloader, S3 only)
For installations using S3 for storage and with direct upload enabled, a new tool called DVWebloader can be enabled that allows web users to upload a folder with a hierarchy of files and subfolders while retaining the relative paths of files (similarly to how the DVUploader tool does it on the command line, but with the convenience of using the browser UI). See Folder Upload in the User Guide for details. (PR #9096)
Long Descriptions of Collections (Dataverses) are Now Truncated
Like datasets, long descriptions of collections (dataverses) are now truncated by default but can be expanded with a "read full description" button. (PR #9222)
License Sorting
Licenses as shown in the dropdown in the UI can now be sorted by superusers. See the Sorting Licenses section of the Installation Guide for details. (PR #8697)
Metadata Field Production Location Now Repeatable, Facetable, and Enabled for Advanced Search
Depositors can now click the plus sign to enter multiple instances of the metadata field "Production Location" in the citation metadata block. Additionally this field now appears on the Advanced Search page and can be added to the list of search facets. (PR #9254)
Support for NetCDF and HDF5 Files
NetCDF and HDF5 files are now detected based on their content rather than just their file extension. Both "classic" NetCDF 3 files and more modern NetCDF 4 files are detected based on content. Detection for older HDF4 files is only done through the file extension ".hdf", as before.
For NetCDF and HDF5 files, an attempt will be made to extract metadata in NcML (XML) format and save it as an auxiliary file. There is a new NcML previewer available in the dataverse-previewers repo.
An extractNcml API endpoint has been added, especially for installations with existing NetCDF and HDF5 files. After upgrading, they can iterate through these files and try to extract an NcML file.
See the NetCDF and HDF5 section of the User Guide for details. (PR #9239)
Support for .eln Files (Electronic Laboratory Notebooks)
The .eln file format is used by Electronic Laboratory Notebooks as an exchange format for experimental protocols, results, sample descriptions, etc.
Improved Security for External Tools
External tools can now be configured to use signed URLs to access the Dataverse API as an alternative to API tokens. This eliminates the need for tools to have access to the user's API token in order to access draft or restricted datasets and datafiles. Signed URLs can be transferred via POST or via a callback when triggering a tool via GET. See Authorization Options in the External Tools documentation for details. (PR #9001)
Geospatial Search (API Only)
Geospatial search is supported via the Search API using two new parameters: geo_point and geo_radius.
The fields that are geospatially indexed are "West Longitude", "East Longitude", "North Latitude", and "South Latitude" from the "Geographic Bounding Box" field in the geospatial metadata block. (PR #8239)
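A geospatial query might be assembled as in the sketch below. The `SERVER_URL` variable, the helper names, the `latitude,longitude` form of geo_point, and a geo_radius expressed in kilometers are assumptions to check against the Search API guide; the coordinates are only an example.

```shell
# Assumed base URL of the installation; adjust as needed.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Build a Search API URL restricted to a circle around a point.
# $1 = query, $2 = latitude, $3 = longitude, $4 = radius
geo_search_url() {
  printf '%s/api/search?q=%s&geo_point=%s,%s&geo_radius=%s\n' \
    "$SERVER_URL" "$1" "$2" "$3" "$4"
}

# Run the search and print the raw JSON response.
geo_search() {
  curl -s "$(geo_search_url "$@")"
}
```

For example, `geo_search '*' 42.3 -71.1 1.5` would search all content within the given radius of that point.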
Reproducibility and Code Execution with Binder
Binder has been added to the list of external tools that can be added to a Dataverse installation. From the dataset page, you can launch Binder, which spins up a computational environment in which you can explore the code and data in the dataset, or write new code, such as a Jupyter notebook. (PR #9341)
CodeMeta (Software) Metadata Support (Experimental)
Experimental support for research software metadata deposits has been added.
By adding a metadata block for CodeMeta, we take another step toward adding first class support of diverse FAIR objects, such as research software and computational workflows.
There is more work underway to make Dataverse installations around the world "research software ready."
Note: Like the metadata block for computational workflows before, CodeMeta is listed under Experimental Metadata in the guides. Experimental means it's brand new, opt-in, and might need future tweaking based on experience of usage in the field. We hope for feedback from installations on the new metadata block to optimize and lift it from the experimental stage. (PR #7877)
Mechanism Added for Stopping a Harvest in Progress
It is now possible for a sysadmin to stop a long-running harvesting job. See Harvesting Clients in the Admin Guide for more information. (PR #9187)
API Endpoint Listing Metadata Block Details has been Extended
The API endpoint /api/metadatablocks/{block_id} has been extended to include the following fields:
- controlledVocabularyValues: All possible values for fields with a controlled vocabulary. For example, the values "Agricultural Sciences", "Arts and Humanities", etc. for the "Subject" field.
- isControlledVocabulary: Whether or not this field has a controlled vocabulary.
- multiple: Whether or not the field supports multiple values.
See Metadata Blocks in the API Guide for details. (PR #9213)
Advanced Database Settings
You can now enable advanced database connection pool configurations useful for debugging and monitoring as well as other settings. Of particular interest may be sslmode=require, though installations already setting this parameter in the Postgres connection string will need to move it to dataverse.db.parameters. See the new Database Persistence section of the Installation Guide for details. (PR #8915)
Support for Cleaning up Leftover Files in Dataset Storage
Experimental feature: the leftover files stored in the Dataset storage location that are not in the file list of that Dataset, but are named following the Dataverse technical convention for dataset files, can be removed with the new Cleanup Storage of a Dataset API endpoint.
OAI Server Bug Fixed
A bug introduced in 5.12 was preventing the Dataverse OAI server from serving incremental harvesting requests from clients. It was fixed in this release (PR #9316).
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release not already mentioned above include:
- Administrators can configure an alternative storage location where files uploaded via the UI are temporarily stored during the transfer from client to server. (PR #8983, See also Configuration Guide)
- To improve performance, Dataverse estimates download counts. This release includes an update that makes the estimate more accurate. (PR #8972)
- Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files). (PR #9018)
- A persistent identifier, CSTR, is added to the Related Publication field's ID Type child field. For datasets published with CSTR IDs, Dataverse will also include them in the datasets' Schema.org metadata exports. (Issue #8838)
- Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections.
New JVM Options and MicroProfile Config Options
The following JVM option is now available:
- dataverse.personOrOrg.assumeCommaInPersonName - the default is false
The following MicroProfile Config options are now available (these can be treated as JVM options):
- dataverse.files.uploads - alternative storage location of generated temporary files for UI file uploads
- dataverse.api.signing-secret - used by signed URLs
- dataverse.solr.host
- dataverse.solr.port
- dataverse.solr.protocol
- dataverse.solr.core
- dataverse.solr.path
- dataverse.rserve.host
The following existing JVM options are now available via MicroProfile Config:
- dataverse.siteUrl
- dataverse.fqdn
- dataverse.files.directory
- dataverse.rserve.host
- dataverse.rserve.port
- dataverse.rserve.user
- dataverse.rserve.password
- dataverse.rserve.tempdir
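Environment variables are one of the standard MicroProfile Config sources. Under the specification's mapping rule, a property can be supplied through a variable whose name replaces every character that is not a letter, digit, or underscore with an underscore and is then uppercased. The helper below is a small sketch of that rule; the function name is made up for illustration, and whether a given option is actually read this way in your installation should be checked in the Installation Guide.

```shell
mpconfig_env_name() {
  # $1 = a property name such as dataverse.solr.host
  # Replace anything outside [A-Za-z0-9_] with "_" and then uppercase the result.
  printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_' | tr 'a-z' 'A-Z'
  echo
}

mpconfig_env_name dataverse.solr.host
mpconfig_env_name dataverse.api.signing-secret
```

The two calls print DATAVERSE_SOLR_HOST and DATAVERSE_API_SIGNING_SECRET, which are the names you would export in the environment of the application server process.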
Notes for Developers and Integrators
See the "Backward Incompatibilities" section below.
Backward Incompatibilities
Schema.org
The following changes have been made to Schema.org exports (necessary for the improvements mentioned above):
- Descriptions are now joined and truncated to less than 5K characters.
- The "citation"/"text" key has been replaced by a "citation"/"name" key.
- File entries now have the mimetype reported as 'encodingFormat' rather than 'fileFormat' to better conform with the Schema.org specification for DataDownload entries. Download URLs are now sent for all files unless the dataverse.files.hide-schema-dot-org-download-urls setting is set to true.
- Author/creators now have an @type of Person or Organization, and any affiliation (affiliation for Person, parentOrganization for Organization) is now an object of @type Organization.
License Files
License files are now required to contain the new "sortOrder" column. Attempting to create a new license without this field returns an error. See the Configuring Licenses section of the Installation Guide for reference.
Complete List of Changes
For the complete list of code changes in this release, see the 5.13 milestone on GitHub.
Installation
If this is a new installation, please see our Installation Guide. Please don't be shy about asking for help if you need it!
After your installation has gone into production, you are welcome to add it to our map of installations by opening an issue in the dataverse-installations repo.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from version 4.x to 5.0 of the Dataverse software following the instructions in the release notes for version 5.0. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.13.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.13.war
5. Restart Payara
service payara stop
service payara start
6. Reload citation metadata block
wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
If you are running an English-only installation, you are finished with the citation block. Otherwise, download the updated citation.properties file and place in the dataverse.lang.directory.
wget https://github.com/IQSS/dataverse/releases/download/v5.13/citation.properties
cp citation.properties /home/dataverse/langBundles
7. Replace Solr schema.xml to allow multiple production locations and support for geospatial indexing to be used. See specific instructions below for those installations without custom metadata blocks (7a) and those with custom metadata blocks (7b).
Note: with this release support for indexing of the experimental workflow metadata block has been removed from the standard schema.xml. If you are using the workflow metadata block be sure to follow the instructions in step 7b) below to maintain support for indexing workflow metadata.
7a. For installations without custom or experimental metadata blocks:
Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Replace schema.xml
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.11.1/server/solr/collection1/conf
Start Solr instance (usually service solr start, depending on Solr/OS)
7b. For installations with custom or experimental metadata blocks:
Stop Solr instance (usually service solr stop, depending on Solr installation/OS, see the Installation Guide)
Edit the following line in your schema.xml (to indicate that productionPlace is now multiValued="true"):
<field name="productionPlace" type="string" stored="true" indexed="true" multiValued="true"/>
Add the following lines to your schema.xml to add support for geospatial indexing:
<!-- Dataverse geospatial search -->
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#rpt -->
<field name="geolocation" type="location_rpt" multiValued="true" stored="true" indexed="true"/>
<!-- https://solr.apache.org/guide/8_11/spatial-search.html#bboxfield -->
<field name="boundingBox" type="bbox" multiValued="true" stored="true" indexed="true"/>
<!-- Dataverse - per GeoBlacklight, adding field type for bboxField that enables, among other things, overlap ratio calculations -->
<fieldType name="bbox" class="solr.BBoxField" geo="true" distanceUnits="kilometers" numberType="pdouble" />
Restart Solr instance (usually service solr start, depending on Solr/OS)
Optional Upgrade Step: Reindex Linked Dataverse Collections
Datasets that are part of linked dataverse collections will now be displayed in their linking dataverse collections. To fix the display of collections that have already been linked, you must re-index the linked collections. This query will provide a list of commands to re-index the affected collections:
select 'curl http://localhost:8080/api/admin/index/dataverses/'
|| tmp.dvid from (select distinct dataverse_id as dvid
from dataverselinkingdataverse) as tmp
The result of the query will be a list of re-index commands such as:
curl http://localhost:8080/api/admin/index/dataverses/633
where '633' is the id of the linked collection.
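If you would rather build the commands in the shell than in SQL, a minimal sketch follows. It reads one collection id per line and prints the matching re-index command; the function name and the sample ids are placeholders, and nothing is executed against the server.

```shell
reindex_commands() {
  # Reads collection ids from stdin, one per line, and prints one curl command per id.
  while IFS= read -r id; do
    case "$id" in
      ''|*[!0-9]*) continue ;;  # skip blank or non-numeric lines
    esac
    printf 'curl http://localhost:8080/api/admin/index/dataverses/%s\n' "$id"
  done
}

# Dry run with two made-up ids.
printf '633\n634\n' | reindex_commands
```

With real data, you could feed the function from psql in unaligned, tuples-only mode (for example psql -At -c "select distinct dataverse_id from dataverselinkingdataverse" with your own connection options), review the printed commands, and only then pipe them to sh.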
Optional Upgrade Step: Run File Detection on .eln Files
Now that .eln files are recognized, you can run the Redetect File Type API on them to switch them from "unknown" to "ELN Archive". Afterward, you can reindex these files to make them appear in search facets.
- Java
Published by kcondon over 3 years ago
dataverse - v5.12.1
Dataverse Software 5.12.1
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Bug Fix for "Internal Server Error" When Creating a New Remote Account
Unfortunately, as of 5.11 new remote users have seen "Internal Server Error" when creating an account (or checking notifications just after creating an account). Remote users are those who log in with institutional (Shibboleth), OAuth (ORCID, GitHub, or Google) or OIDC providers.
This is a transient error that can be worked around by reloading the browser (or logging out and back in again) but it's obviously a very poor user experience and a bad first impression. This bug is the primary reason we are putting out this patch release. Other features and bug fixes are coming along for the ride.
Ability to Disable OAuth Sign Up While Allowing Existing Accounts to Log In
A new option called :AllowRemoteAuthSignUp has been added, providing a mechanism for disabling new account signups for specific OAuth2 authentication providers (ORCID, GitHub, Google, etc.) while still allowing logins for already-existing accounts using this authentication method.
See the Installation Guide for more information on the setting.
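The sketch below shows one way the setting's value might be assembled before sending it to the admin settings API. The JSON shape (a "default" key plus optional per-provider keys) and the provider id "google" are assumptions based on the Installation Guide's description of this setting; confirm both there before using them.

```shell
signup_policy_json() {
  # $1 = default policy (true|false)
  # Optional: $2 = provider id, $3 = policy for that provider
  if [ -n "${2:-}" ]; then
    printf '{"default":"%s","%s":"%s"}\n' "$1" "$2" "$3"
  else
    printf '{"default":"%s"}\n' "$1"
  fi
}

# Disable sign up everywhere, or only for one provider (hypothetical id).
signup_policy_json false
signup_policy_json true google false
```

A value could then be stored with something like curl -X PUT -d "$(signup_policy_json false)" http://localhost:8080/api/admin/settings/:AllowRemoteAuthSignUp on the server itself.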
Production Date Now Used for Harvested Datasets in Addition to Distribution Date (oai_dc format)
This release fixes the year displayed in the citation for harvested datasets, especially for the oai_dc format.
For normal datasets, the date used is the "citation date", which is by default the publication date (the first release date) unless you change it.
However, for a harvested dataset, the distribution date was used instead, and this date is not always present in the harvested metadata.
Now, the production date is used for harvested datasets in addition to the distribution date when harvesting with the oai_dc format.
Publication Date Now Used for Harvested Dataset if Production Date is Not Set (oai_dc format)
For exports and harvesting in oai_dc format, if "Production Date" is not set, "Publication Date" is now used instead. This change is reflected in the Dataverse 4+ Metadata Crosswalk linked from the Appendix of the User Guide.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Users creating an account by logging in with Shibboleth, OAuth, or OIDC should not see errors. (Issue #9029, PR #9030)
- When harvesting datasets, I want the Production Date if I can't get the Distribution Date (PR #8732)
- When harvesting datasets, I want the Publication Date if I can't get the Production Date (PR #8733)
- As a sysadmin I'd like to disable (temporarily or permanently) sign ups from OAuth providers while allowing existing users to continue to log in from that provider (PR #9112)
- As a C/C++ developer I want to use Dataverse APIs (PR #9070)
New DB Settings
The following DB settings have been added:
:AllowRemoteAuthSignUp
See the Database Settings section of the Guides for more information.
Complete List of Changes
For the complete list of code changes in this release, see the 5.12.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
shell
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version
shell
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara
shell
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
shell
service payara start
4. Deploy this version.
shell
$PAYARA/bin/asadmin deploy dataverse-5.12.1.war
5. Restart Payara
shell
service payara stop
service payara start
Upcoming Versions of Payara
With the recent release of Payara 6 (Payara 6.2022.1 being the first version), the days of free-to-use Payara 5.x Platform Community versions are numbered. Specifically, Payara's blog post says, "Payara Platform Community 5.2022.4 has been released today as the penultimate Payara 5 Community release."
Given the end of free-to-use Payara 5 versions, we plan to get the Dataverse software working on Payara 6 (#8305), which will require substantial efforts from the IQSS team and community members, as this also means shifting our app to be a Jakarta EE 10 application (upgrading from EE 8). We are currently working out the details and will share news as soon as we can. Rest assured we will do our best to provide you with a smooth transition. You can follow along in Issue #8305 and related pull requests and you are, of course, very welcome to participate by testing and otherwise contributing, as always.
- Java
Published by pdurbin over 3 years ago
dataverse - v5.12
Dataverse Software 5.12
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Support for Globus
Globus can be used to transfer large files. Part of "Harvard Data Commons Additions" below.
Support for Remote File Storage
Dataset files can be stored at remote URLs. Part of "Harvard Data Commons Additions" below.
New Computational Workflow Metadata Block
The new Computational Workflow metadata block will allow depositors to effectively tag datasets as computational workflows.
To add the new metadata block, follow the instructions in the Admin Guide: https://guides.dataverse.org/en/5.12/admin/metadatacustomization.html
The location of the new metadata block tsv file is scripts/api/data/metadatablocks/computational_workflow.tsv. Part of "Harvard Data Commons Additions" below.
Support for Linked Data Notifications (LDN)
Linked Data Notifications (LDN) is a standard from the W3C. Part of "Harvard Data Commons Additions" below.
Harvard Data Commons Additions
As reported at the 2022 Dataverse Community Meeting, the Harvard Data Commons project has supported a wide range of additions to the Dataverse software that improve support for Big Data, Workflows, Archiving, and interaction with other repositories. In many cases, these additions build upon features developed within the Dataverse community by Borealis, DANS, QDR, TDL, and others. Highlights from this work include:
- Initial support for Globus file transfer to upload to and download from a Dataverse managed S3 store. The current implementation disables file restriction and embargo on Globus-enabled stores.
- Initial support for Remote File Storage. This capability, enabled via a new RemoteOverlay store type, allows a file stored in a remote system to be added to a dataset (currently only via API) with download requests redirected to the remote system. Use cases include referencing public files hosted on external web servers as well as support for controlled access managed by Dataverse (e.g. via restricted and embargoed status) and/or by the remote store.
- Initial support for computational workflows, including a new metadata block and detected filetypes.
- Support for archiving to any S3 store using Dataverse's RDA-conformant BagIt file format (a BagPack).
- Improved error handling and performance in archival bag creation and new options such as only supporting archiving of one dataset version.
- Additions/corrections to the OAI-ORE metadata format (which is included in archival bags) such as referencing the name/mimetype/size/checksum/download URL of the original file for ingested files, the inclusion of metadata about the parent collection(s) of an archived dataset version, and use of the URL form of PIDs.
- Display of archival status within the dataset page versions table, richer status options including success, pending, and failure states, with a complete API for managing archival status.
- Support for batch archiving via API as an alternative to the current options of configuring archiving upon publication or archiving each dataset version manually.
- Initial support for sending and receiving Linked Data Notification messages indicating relationships between a dataset and external resources (e.g. papers or other datasets) that can be used to trigger additional actions, such as the creation of a back-link to provide, for example, bi-directional linking between a published paper and a Dataverse dataset.
- A new capability to provide custom per-field instructions in dataset templates.
- The following file extensions are now detected:
- wdl=text/x-workflow-description-language
- cwl=text/x-computational-workflow-language
- nf=text/x-nextflow
- Rmd=text/x-r-notebook
- rb=text/x-ruby-script
- dag=text/x-dagman
Improvements to Fields that Appear in the Citation Metadata Block
Grammar, style and consistency improvements have been made to the titles, tooltip description text, and watermarks of metadata fields that appear in the Citation metadata block.
This includes fields that dataset depositors can edit in the Citation Metadata accordion (i.e. fields controlled by the citation.tsv and citation.properties files) and fields whose values are system-generated, such as the Dataset Persistent ID, Previous Dataset Persistent ID, and Publication Date fields whose titles and tooltips are configured in the bundles.properties file.
The changes should provide clearer information to curators, depositors, and people looking for data about what the fields are for.
A new page in the Style Guides called "Text" has also been added. The new page includes a section called "Metadata Text Guidelines" with a link to a Google Doc where the guidelines are being maintained for now since we expect them to be revised frequently.
New Static Search Facet: Metadata Types
A new static search facet has been added to the search side panel. This new facet is called "Metadata Types" and is driven from metadata blocks. When a metadata field value is inserted into a dataset, an entry for the metadata block it belongs to is added to this new facet.
This new facet needs to be configured for it to appear on the search side panel. The configuration specifies which metadata blocks a dataverse shows in the facet. The configuration is inherited by child dataverses.
To configure the new facet, use the Metadata Block Facet API: https://guides.dataverse.org/en/5.12/api/native-api.html#set-metadata-block-facet-for-a-dataverse-collection
Broader MicroProfile Config Support for Developers
As of this release, many JVM options can be set using any MicroProfile Config Source.
Currently this change is only relevant to developers but as settings are migrated to the new "lookup" pattern documented in the Consuming Configuration section of the Developer Guide, anyone installing the Dataverse software will have much greater flexibility when configuring those settings, especially within containers. These changes will be announced in future releases.
Please note that an upgrade to Payara 5.2021.8 or higher is required to make use of this. Payara 5.2021.5 threw exceptions, as explained in PR #8823.
HTTP Range Requests: New HTTP Status Codes and Headers for Datafile Access API
The Basic File Access resource for datafiles (/api/access/datafile/$id) was slightly modified in order to comply better with the HTTP specification for range requests.
If the request contains a "Range" header:
- The returned HTTP status is now 206 (Partial Content) instead of 200
- A "Content-Range" header is returned containing information about the returned bytes
- An "Accept-Ranges" header with value "bytes" is returned
CORS rules/headers were modified accordingly:
- The "Range" header is added to "Access-Control-Allow-Headers"
- The "Content-Range" and "Accept-Ranges" headers are added to "Access-Control-Expose-Headers"
This new functionality has enabled a Zip Previewer and file extractor for zip files, an external tool.
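A client typically sends a header such as "Range: bytes=0-9" and then checks the "Content-Range" header of the 206 response. The helper below is a small sketch of that check: it takes a Content-Range value in the standard "bytes first-last/total" form and reports how many bytes the response body should contain. The function name is made up, and the sample values are illustrative rather than taken from a real response.

```shell
content_range_length() {
  # $1 = value of a Content-Range header, e.g. "bytes 0-9/100"
  range=${1#bytes }     # drop the unit         -> "0-9/100"
  range=${range%%/*}    # drop the total length -> "0-9"
  first=${range%-*}     # first byte position
  last=${range#*-}      # last byte position (inclusive)
  echo $(( last - first + 1 ))
}

content_range_length "bytes 0-9/100"
```

The call prints 10, because both ends of a byte range are inclusive. A request to try against your own installation might look like curl -i -H "Range: bytes=0-9" "$SERVER_URL/api/access/datafile/$ID", where SERVER_URL and ID are placeholders.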
File Type Detection When File Has No Extension
File types are now detected based on the filename when the file has no extension.
The following filenames are now detected:
- Makefile=text/x-makefile
- Snakemake=text/x-snakemake
- Dockerfile=application/x-docker-file
- Vagrantfile=application/x-vagrant-file
These are defined in MimeTypeDetectionByFileName.properties.
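To make the mapping concrete, here is an illustrative lookup written as a shell function. It is not the Dataverse implementation, which reads the properties file named above; the exact-match behavior and the "unknown" fallback are assumptions made for the sketch.

```shell
mime_for_filename() {
  # $1 = a file name or path; only the final path component is considered.
  name=${1##*/}
  case "$name" in
    Makefile)    echo text/x-makefile ;;
    Snakemake)   echo text/x-snakemake ;;
    Dockerfile)  echo application/x-docker-file ;;
    Vagrantfile) echo application/x-vagrant-file ;;
    *)           echo unknown ;;  # placeholder for "no match by file name"
  esac
}

mime_for_filename src/Dockerfile
```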
Upgrade to Payara 5.2022.3 Highly Recommended
With lots of bug and security fixes included, we encourage everyone to upgrade to Payara 5.2022.3 as soon as possible. See below for details.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Administrators can configure an S3 store used in Dataverse to support users uploading/downloading files via Globus File Transfer. (PR #8891)
- Administrators can configure a RemoteOverlay store to allow files that remain hosted by a remote system to be added to a dataset. (PR #7325)
- Administrators can configure the Dataverse software to send archival Bag copies of published dataset versions to any S3-compatible service. (PR #8751)
- Users can see information about a dataset's parent collection(s) in the OAI-ORE metadata export. (PR #8770)
- Users and administrators can now use the OAI-ORE metadata export to retrieve and assess the fixity of the original file (for ingested tabular files) via the included checksum. (PR #8901)
- Archiving via RDA-conformant Bags is more robust and is more configurable. (PR #8773, #8747, #8699, #8609, #8606, #8610)
- Users and administrators can see the archival status of the versions of the datasets they manage in the dataset page version table. (PR #8748, #8696)
- Administrators can configure messaging between their Dataverse installation and other repositories that may hold related resources or services interested in activity within that installation. (PR #8775)
- Collection managers can create templates that include custom instructions on how to fill out specific metadata fields.
- Dataset update API users are given more information when the dataset they are updating is out of compliance with Terms of Access requirements (Issue #8859)
- Adds a new setting (:ControlledVocabularyCustomJavaScript) that allows a JavaScript file to be loaded into the dataset page for the purpose of showing controlled vocabulary as a list (Issue #8722)
- Fixes an issue with the Redetect File Type API (Issue #7527)
- Terms of Use is now imported when using DDI format through harvesting or the native API. (Issue #8715, PR #8743)
- Optimizes some code to improve application memory usage (Issue #8871)
- Fixes sample data to reflect custom licenses.
- Fixes the Archival Status Input API (available to superusers) (Issue #8924)
- Small bugs have been fixed in the dataset export in the JSON and DDI formats; eliminating the export of "undefined" as a metadata language in the former, and a duplicate keyword tag in the latter. (Issue #8868)
New DB Settings
The following DB settings have been added:
- :ShibAffiliationOrder - Select the first or last entry in an Affiliation array
- :ShibAffiliationSeparator (default: ";") - Set the separator for the Affiliation array
- :LDNMessageHosts
- :GlobusBasicToken
- :GlobusEndpoint
- :GlobusStores
- :GlobusAppUrl
- :GlobusPollingInterval
- :GlobusSingleFileTransfer
- :S3ArchiverConfig
- :S3ArchiverProfile
- :DRSArchiverConfig
- :ControlledVocabularyCustomJavaScript
See the Database Settings section of the Guides for more information.
Notes for Dataverse Installation Administrators
Enabling Experimental Capabilities
Several of the capabilities introduced in v5.12 are "experimental" in the sense that further changes and enhancements to these capabilities should be expected and that these changes may involve additional work, for those who use the initial implementations, when upgrading to newer versions of the Dataverse software. Administrators wishing to use them are encouraged to stay in touch, e.g. via the Dataverse Community Slack space, to understand the limits of current capabilities and to plan for future upgrades.
Notes for Developers and Integrators
See the "Backward Incompatibilities" section below.
Backward Incompatibilities
OAI-ORE and Archiving Changes
The Admin API call to manually submit a dataset version for archiving has changed to require POST instead of GET and to have a name making it clearer that archiving is being done for a given dataset version: /api/admin/submitDatasetVersionToArchive.
Earlier versions of the archival bags included the ingested (tab-separated-value) version of tabular files while providing the checksum of the original file (Issue #8449). This release fixes that by including the original file and its metadata in the archival bag. This means that archival bags created prior to this version do not include a way to validate ingested files. Further, it is likely that capabilities in development (i.e. as part of the Dataverse Uploader) to allow re-creation of a dataset version from an archival bag will only be fully compatible with archival bags generated by a Dataverse instance at a release > v5.12. (Specifically, at a minimum, since only the ingested file is included in earlier archival bags, an upload via DVUploader would not result in the same original file/ingested version as in the original dataset.) Administrators should be aware that re-creating archival bags, i.e. via the new batch archiving API, may be advisable now and will be recommended at some point in the future (i.e. there will be a point where we will start versioning archival bags and will start maintaining backward compatibility for older versions as part of transitioning this from being an experimental capability).
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.12.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
Instructions for Upgrading to Payara 5.2022.3
Note: with the approaching EOL for the Payara 5 Community release train it's likely we will switch to a yet-to-be-released Payara 6 in the not-so-far-away future.
We recommend you ensure that you followed all Payara update instructions from past releases. (The latest Payara update was for v5.6.)
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
The steps below are a simple matter of reusing your existing domain directory with the new distribution. But we also recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara Release Notes
Please note that the deletion of the lib/databases directory below is only required once, for this upgrade (see Issue #8230 for details).
shell
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version
shell
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara
shell
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
rm -rf $PAYARA/glassfish/domains/domain1/osgi-cache
rm -rf $PAYARA/glassfish/domains/domain1/lib/databases
3. Move the current Payara directory out of the way
shell
mv $PAYARA $PAYARA.MOVED
4. Download the new Payara version (5.2022.3), and unzip it in its place
5. Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1
6. Start Payara
shell
service payara start
7. Deploy this version.
shell
$PAYARA/bin/asadmin deploy dataverse-5.12.war
8. Restart Payara
shell
service payara stop
service payara start
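Steps 3 to 5 above leave the exact commands for swapping in the old domain up to you. The function below is one possible sketch of step 5, assuming the old installation was moved aside as in step 3 and the new distribution was unzipped to the original location. The function name and the domain1_DIST backup name are illustrative choices, not part of the official instructions.

```shell
swap_in_old_domain() {
  # $1 = directory of the freshly unzipped Payara
  # $2 = directory of the old Payara that was moved aside
  new="$1"
  old="$2"
  # Keep the pristine domain1 shipped with the new distribution for reference.
  mv "$new/glassfish/domains/domain1" "$new/glassfish/domains/domain1_DIST" || return 1
  # Put the preserved domain (configuration, deployed state) in its place.
  mv "$old/glassfish/domains/domain1" "$new/glassfish/domains/domain1"
}

# Example call, run while Payara is stopped:
# swap_in_old_domain "$PAYARA" "$PAYARA.MOVED"
```

Renaming the distribution's domain1 rather than deleting it keeps a clean copy to compare against if the reused domain misbehaves after the upgrade.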
Additional Upgrade Steps
Update the Citation metadata block:
wget https://github.com/IQSS/dataverse/releases/download/v5.12/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
Run ReExportAll to update metadata files (OAI_ORE, JSON and DDI formats are affected by the changes and bug fixes in this release; PRs #8770 and #8868).
Optionally, for those using the Dataverse software's BagIt-based archiving, re-archive dataset versions archived using prior versions of the Dataverse software. This will be recommended/required in a future release.
Published by pdurbin over 3 years ago
dataverse - v5.11.1
Dataverse Software 5.11.1
This is a bug fix release of the Dataverse Software. The .war file for v5.11 will no longer be made available and installations should upgrade directly from v5.10.1 to v5.11.1. To do so you will need to follow the instructions for installing release 5.11 using the v5.11.1 war file. (Note specifically the upgrade steps 6-9 from the 5.11 release note; most importantly, the ones related to the citation block and the Solr schema). If you had previously installed v5.11 (no longer available), follow the simplified instructions below.
Release Highlights
Dataverse Software 5.11 contains two critical issues that are fixed in this release.
First, if you delete a file from a published version of a dataset, the file will be deleted from the file system (or S3) and lose its "owner id" in the database. For details, see Issue #8867.
Second, if you are a superuser, it's possible to click "Delete Draft" and delete a published dataset if it has restricted files. For details, see #8845 and #8742.
Notes for Dataverse Installation Administrators
Identifying Datasets with Deleted Files
If you have been running 5.11, check if any files show "null" for the owner id. The "owner" of a file is the parent dataset:
select * from dvobject where dtype = 'DataFile' and owner_id is null;
For any of these files, change the owner id to the database id of the parent dataset. In addition, the file on disk (or in S3) is likely gone. Look at the "storageidentifier" field from the query above to determine the location of the file then restore the file from backup.
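The owner id fix described above can be sketched as a small helper that prints the UPDATE statement for one file, so it can be reviewed before being run. The function name reattach_sql is hypothetical, and the psql user and database names in the usage comment are assumptions to adjust for your installation.

```shell
# Print an UPDATE that gives an orphaned file back its parent dataset.
# $1: database id of the file (dvobject.id)
# $2: database id of the parent dataset
reattach_sql() {
  # The extra conditions make the statement a no-op for rows that are not orphaned files.
  printf "update dvobject set owner_id = %d where id = %d and dtype = 'DataFile' and owner_id is null;\n" "$2" "$1"
}

# Review the statement, then pipe it to psql, for example:
# reattach_sql 1234 567 | psql -U dvnapp dvndb
```

This only repairs the database row. The file itself still has to be restored from backup to the location given by "storageidentifier".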
Identifying Datasets Superusers May Have Accidentally Destroyed
Check the "actionlogrecord" table for DestroyDatasetCommand. While these "destroy" entries are normal when a superuser uses the API to destroy datasets, an entry is also created if a superuser has accidentally deleted a published dataset in the web interface with the "Delete Draft" button.
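A query along the following lines can list the relevant entries. This is a sketch only: the column names actionsubtype, starttime, useridentifier and info are assumptions about the actionlogrecord table and should be checked against your schema, and the helper name destroy_log_sql is hypothetical.

```shell
# Print a query listing DestroyDatasetCommand entries in the action log.
# The "like" pattern is used in case the command is stored with its full Java package name.
destroy_log_sql() {
  printf "select starttime, useridentifier, info from actionlogrecord where actionsubtype like '%%DestroyDatasetCommand' order by starttime;\n"
}

# Example (user and database names are assumptions):
# destroy_log_sql | psql -U dvnapp dvndb
```

Entries whose user and time do not match a deliberate API call are the ones worth investigating.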
Complete List of Changes
For the complete list of code changes in this release, see the 5.11.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.1. To upgrade from 5.10.1, follow the instructions for installing release 5.11 using the v5.11.1 war file. If you had previously installed v5.11 (no longer available), follow the simplified instructions below.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.11.1.war
5. Restart Payara
service payara stop
service payara start
Published by kcondon almost 4 years ago
dataverse - v5.11
Dataverse Software 5.11
Please note: We have removed the 5.11 war file and dvinstall.zip because there are very serious bugs in the 5.11 release. For the upgrade instructions below, please use the 5.11.1 war file instead. New installations should start with 5.11.1. The bugs are explained in the 5.11.1 release notes.
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Terms of Access or Request Access Required for Restricted Files
Beginning in this release, datasets with restricted files must have either Terms of Access or Request Access enabled. This change is to ensure that for each file in a Dataverse installation there is a clear path to get to the data, either through requesting access to the data or to provide context about why requesting access is not enabled.
Published datasets are not affected by this change. Datasets that are in draft and that have neither Terms of Access nor Request Access enabled must be updated to select one or the other (or both). Otherwise, datasets cannot be further edited or published. Dataset authors will be able to tell if their dataset is affected by the presence of the following message at the top of their dataset (when they are logged in):
"Datasets with restricted files are required to have Request Access enabled or Terms of Access to help people access the data. Please edit the dataset to confirm Request Access or provide Terms of Access to be in compliance with the policy."
At this point, authors should click "Edit Dataset" then "Terms" and then check the box for "Request Access" or fill in "Terms of Access for Restricted Files" (or both). Afterwards, authors will be able to further edit metadata and publish.
In the "Notes for Dataverse Installation Administrators" section, we have provided a query to help proactively identify datasets that need to be updated.
See also Issue #8191 and PR #8308.
Muting Notifications
Users can control which notifications they receive if the system is configured to allow this. See also Issue #7492 and PR #8530.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Terms of Access or Request Access required for restricted files. (Issue #8191, PR #8308)
- Users can control which notifications they receive if the system is configured to allow this. (Issue #7492, PR #8530)
- A 500 error was occurring when creating a dataset if a template did not have an associated "termsofuseandaccess". See "Legacy Templates Issue" below for details. (Issue #8599, PR #8789)
- Tabular ingest can be skipped via API. (Issue #8525, PR #8532)
- The "Verify Email" button has been changed to "Send Verification Email" and rather than sometimes showing a popup now always sends a fresh verification email (and invalidates previous verification emails). (Issue #8227, PR #8579)
- For Shibboleth users, the emailconfirmed timestamp is now set on login and the UI should show "Verified". (Issue #5663, PR #8579)
- Information about the license selection (or custom terms) is now available in the confirmation popup when contributors click "Submit for Review". Previously, this was only available in the confirmation popup for the "Publish" button, which contributors do not see. (Issue #8561, PR #8691)
- For installations configured to support multiple languages, controlled vocabulary fields that do not allow multiple entries (e.g. journalArticleType) are now indexed properly. (Issue #8595, PR #8601, PR #8624)
- Two-letter ISO-639-1 codes for languages are now supported, in metadata imports and harvesting. (Issue #8139, PR #8689)
- The API endpoint for listing notifications has been enhanced to show the subject, text, and timestamp of notifications. (Issue #8487, PR #8530)
- The API Guide has been updated to explain that the Content-type header is now (as of Dataverse 5.6) necessary to create datasets via native API. (Issue #8663, PR #8676)
- Admin API endpoints have been added to find and delete dataset templates. (Issue #8600, PR #8706)
- The BagIt file handler detects and transforms zip files with a BagIt package format into Dataverse data files, validating checksums along the way. See the BagIt File Handler section of the Installation Guide for details. (Issue #8608, PR #8677)
- For BagIt Export, the number of threads used when zipping data files into an archival bag is now configurable using the :BagGeneratorThreads database setting. (Issue #8602, PR #8606)
- PostgreSQL 14 can now be used (though we've tested mostly with 13). PostgreSQL 10+ is required. (Issue #8295, PR #8296)
- As always, widgets can be embedded in the <iframe> HTML tag, but the HTTP header "Content-Security-Policy" is now being sent on non-widget pages to prevent them from being embedded. (PR #8662)
- URIs in the experimental Semantic API have changed (details below). (Issue #8533, PR #8592)
- Installations running Make Data Count can upgrade to Counter Processor-0.1.04. (Issue #8380, PR #8391)
- PrimeFaces, the UI framework we use, has been upgraded from 10 to 11. (Issue #8456, PR #8652)
Notes for Dataverse Installation Administrators
Identifying Datasets Requiring Terms of Access or Request Access Changes
In support of the change to require either Terms of Access or Request Access for all restricted files (see above for details), we have provided a query to identify datasets in your installation where at least one restricted file has neither Terms of Access nor Request Access enabled:
https://github.com/IQSS/dataverse/blob/v5.11/scripts/issues/8191/datasetswithouttoaorrequest_access
This will allow you to reach out to those dataset owners as appropriate.
Legacy Templates Issue
When custom license functionality was added, dataverses that had older legacy templates as their default template would not allow the creation of a new dataset (500 error).
This occurred because those legacy templates did not have an associated termsofuseandaccess linked to them.
In this release, we run a script that creates a default empty termsofuseandaccess for each of these templates and links them.
Note the termsofuseandaccess that are created this way default to using the license with id=1 (cc0) and the fileaccessrequest to false.
See also Issue #8599 and PR #8789.
PostgreSQL Version 10+ Required
This release upgrades the bundled PostgreSQL JDBC driver to support major version 14.
Note that the newer PostgreSQL driver required a Flyway version bump, which entails positive and negative consequences:
- The newer version of Flyway supports PostgreSQL 14 and includes a number of security fixes.
- As of version 8.0 the Flyway Community Edition dropped support for PostgreSQL 9.6 and older.
This means that as foreshadowed in the 5.10 and 5.10.1 release notes, version 10 or higher of PostgreSQL is now required. For suggested upgrade steps, please see "PostgreSQL Update" in the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10
Counter Processor 0.1.04 Support
This release includes support for counter-processor-0.1.04 for processing Make Data Count metrics. If you are running Make Data Count support, you should reinstall/reconfigure counter-processor as described in the latest Guides. (For existing installations, note that counter-processor-0.1.04 requires a newer version of Python, so you will need to follow the full counter-processor install. Also note that if you configure the new version the same way, it will reprocess the days in the current month when it is first run. This is normal and will not affect the metrics in Dataverse.)
New JVM Options and DB Settings
The following DB settings have been added:
- :ShowMuteOptions
- :AlwaysMuted
- :NeverMuted
- :CreateDataFilesMaxErrorsToDisplay
- :BagItHandlerEnabled
- :BagValidatorJobPoolSize
- :BagValidatorMaxErrors
- :BagValidatorJobWaitInterval
- :BagGeneratorThreads
See the Database Settings section of the Guides for more information.
Notes for Developers and Integrators
See the "Backward Incompatibilities" section below.
Backward Incompatibilities
Semantic API Changes
This release includes an update to the experimental semantic API and the underlying assignment of URIs to metadata block terms that are not explicitly mapped to terms in community vocabularies. The change affects the output of the OAI_ORE metadata export, the OAI_ORE file in archival bags, and the input/output allowed for those terms in the semantic API.
For those updating integrating code or existing files intended for input into this release of Dataverse, URIs of the form...
https://dataverse.org/schema/<block name>/<parentField name>#<childField title>
and
https://dataverse.org/schema/<block name>/<Field title>
...are both replaced with URIs of the form...
https://dataverse.org/schema/<block name>/<Field name>.
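To illustrate the new form, the sketch below builds the URI for a term from a block name and a field name. The names myblock and myChildField are hypothetical, and the helper term_uri is not part of Dataverse.

```shell
# Build the URI now used for a metadata block term that is not mapped to a community vocabulary.
# $1: metadata block name
# $2: field name (the internal name, not the display title; no parent field segment)
term_uri() {
  printf 'https://dataverse.org/schema/%s/%s\n' "$1" "$2"
}

term_uri myblock myChildField
```

A child field that was previously addressed through its parent field and its title is now addressed by its own name directly under the block.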
Create Dataset API Requires Content-type Header (Since 5.6)
Due to a code change introduced in Dataverse 5.6, calls to the native API without the Content-type header will fail to create a dataset. The API Guide has been updated to indicate the necessity of this header: https://guides.dataverse.org/en/5.11/api/native-api.html#create-a-dataset-in-a-dataverse-collection
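A call that includes the header might look like the sketch below. The wrapper create_dataset and the CURL override are hypothetical conveniences. The endpoint path follows the section of the API Guide linked above and should be checked against it.

```shell
# Create a dataset in a collection via the native API, always sending the Content-type header.
# $1: server URL, $2: alias of the parent collection, $3: JSON file describing the dataset
# API_TOKEN must hold a valid API token. Set CURL=echo to preview the command without sending it.
create_dataset() {
  ${CURL:-curl} -H "X-Dataverse-key:$API_TOKEN" -H 'Content-type:application/json' \
    -X POST "$1/api/dataverses/$2/datasets" --upload-file "$3"
}

# Example:
# API_TOKEN=xxxxxxxx create_dataset http://localhost:8080 root dataset.json
```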
Complete List of Changes
For the complete list of code changes in this release, see the 5.11 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.11.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.11.war
5. Restart Payara
service payara stop
service payara start
6. Reload citation metadata block
wget https://github.com/IQSS/dataverse/releases/download/v5.11/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
7. Update Solr schema.xml
Note that if you have custom metadata blocks you can skip this step and proceed to the next one.
Edit schema.xml and for journalArticleType change multiValued from "false" to "true" and then restart Solr. Alternatively, download and use the version from https://github.com/IQSS/dataverse/releases/download/v5.11/schema.xml . By default the file can be found at /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml.
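The manual edit can be scripted roughly as follows. This sketch assumes the journalArticleType field is defined on a single line of schema.xml. The function name is hypothetical.

```shell
# Switch journalArticleType to multiValued="true" in a Solr schema.xml.
# A copy of the original file is kept next to it with a .bak suffix.
# $1: path to schema.xml
make_journal_article_type_multivalued() {
  # Only the line that defines journalArticleType is touched.
  sed -i.bak '/name="journalArticleType"/ s/multiValued="false"/multiValued="true"/' "$1"
}

# Example:
# make_journal_article_type_multivalued /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml
```

Restart Solr afterwards, as described above.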
7b. For installations with custom metadata blocks
Use the script provided in the release to add the custom fields to the base schema.xml installed in the previous step.
wget https://github.com/IQSS/dataverse/releases/download/v5.11/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml
(Note that the curl command above calls the admin API on localhost to obtain the list of the custom fields. In the unlikely case that you are running the main Dataverse Application and Solr on different servers, generate the schema.xml on the application node, then copy it onto the Solr server.)
In either case, reload the Solr schema (see https://guides.dataverse.org/en/5.11/admin/metadatacustomization.html#updating-the-solr-schema):
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"
8. Re-export metadata files (only OAI_ORE is affected)
People archiving Bags should re-archive. Follow the directions in the Admin Guide
9. (Optional) Delete duplicate templates in database
Prior to this release, making a copy of a dataset template created two copies, only one of which was visible in the Dataverse collection and usable. The other was not assigned a collection and was invisible to the user (#8600).
If you would like to remove these orphan templates you may run the following script:
https://github.com/IQSS/dataverse/blob/v5.11/scripts/issues/8600/deleteorphantemplates_8600.sh
Also, admin APIs for finding and deleting templates have been added: https://guides.dataverse.org/en/5.11/api/native-api.html#list-dataset-templates
Published by kcondon about 4 years ago
dataverse - v5.10.1
Dataverse Software 5.10.1
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Bug Fix for Request Access
Dataverse Software 5.10 contains a bug where the "Request Access" button doesn't work from the file listing on the dataset page if the dataset contains custom terms. This has been fixed in PR #8555.
Bug Fix for Searching and Selecting Controlled Vocabulary Values
Dataverse Software 5.10 contains a bug where the search option is no longer present when selecting from more than ten controlled vocabulary values. This has been fixed in PR #8521.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- Users can use the "Request Access" button when the dataset has custom terms. (Issue #8553, PR #8555)
- Users can search when selecting from more than ten controlled vocabulary values. (Issue #8519, PR #8521)
- The default file categories ("Documentation", "Data", and "Code") can be redefined through the :FileCategories database setting. (Issue #8461, PR #8478)
- Documentation on troubleshooting Excel ingest errors was improved. (PR #8541)
- Internationalized controlled vocabulary values can now be searched. (Issue #8286, PR #8435)
- Curation labels can be internationalized. (Issue #8381, PR #8466)
- "NONE" is no longer accepted as a license using the SWORD API (since 5.10). See "Backward Incompatibilities" below for details. (Issue #8551, PR #8558).
Notes for Dataverse Installation Administrators
PostgreSQL Version 10+ Required Soon
Because 5.10.1 is a bug fix release, an upgrade to PostgreSQL is not required. However, this upgrade is still coming in the next non-bug fix release. For details, please see the release notes for 5.10: https://github.com/IQSS/dataverse/releases/tag/v5.10
Payara Upgrade
You may notice that the Payara version used in the install scripts has been updated from 5.2021.5 to 5.2021.6. This was to address a bug where it was not possible to easily update the logging level. For existing installations, this release does not require upgrading Payara and a Payara upgrade is not part of the Upgrade Instructions below. For more information, see PR #8508.
New JVM Options and DB Settings
The following DB settings have been added:
:FileCategories - The default list of the pre-defined file categories ("Documentation", "Data" and "Code") can now be redefined with a comma-separated list (e.g. 'Docs,Data,Code,Workflow').
See the Database Settings section of the Guides for more information.
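As a sketch, the setting can be stored through the admin settings API like other database settings. The wrapper set_file_categories, the SERVER_URL default and the CURL override are hypothetical conveniences.

```shell
# Store a comma-separated list of file categories in the :FileCategories database setting.
# $1: the list, e.g. 'Docs,Data,Code,Workflow'
# SERVER_URL defaults to a local installation. Set CURL=echo to preview the command.
set_file_categories() {
  ${CURL:-curl} -X PUT -d "$1" "${SERVER_URL:-http://localhost:8080}/api/admin/settings/:FileCategories"
}

# Example:
# set_file_categories 'Docs,Data,Code,Workflow'
```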
Notes for Developers and Integrators
In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the SWORD API.
Backward Incompatibilities
As of Dataverse 5.10, "NONE" is no longer supported as a valid license when creating a dataset using the SWORD API. The API Guide has been updated to reflect this. Additionally, if you specify an invalid license, a list of available licenses will be returned in the response.
Complete List of Changes
For the complete list of code changes in this release, see the 5.10.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.10.1.war
5. Restart Payara
service payara stop
service payara start
Published by kcondon about 4 years ago
dataverse - v5.10
Dataverse Software 5.10
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Multiple License Support
Users can now select from a set of configured licenses in addition to or instead of the previous Creative Commons CC0 choice or provide custom terms of use (if configured) for their datasets. Administrators can configure their Dataverse instance via API to allow any desired license as a choice and can enable or disable the option to allow custom terms. Administrators can also mark licenses as "inactive" to disallow future use while keeping that license for existing datasets. For upgrades, only the CC0 license will be preinstalled. New installations will have both CC0 and CC BY preinstalled. The Configuring Licenses section of the Installation Guide shows how to add or remove licenses.
Note: Datasets in existing installations will automatically be updated to conform to new requirements that custom terms cannot be used with a standard license and that custom terms cannot be empty. Administrators may wish to manually update datasets with these conditions if they do not like the automated migration choices. See the "Notes for Dataverse Installation Administrators" section below for details.
This release also makes the license selection and/or custom terms more prominent when publishing and viewing a dataset and when downloading files.
Ingest and File Upload Messaging Improvements
Messaging around ingest failure has been softened to prevent support tickets. In addition, messaging during file upload has been improved, especially with regard to showing size limits and providing links to the guides about tabular ingest. For screenshots and additional details see PR #8271.
Downloading of Guestbook Responses with Fewer Clicks
A download button has been added to the page that lists guestbooks. This saves a click but you can still download responses from the "View Responses" page, as before.
Also, links to the guides about guestbooks have been added in additional places.
Dynamically Request Arbitrary Metadata Fields from Search API
The Search API now allows arbitrary metadata fields to be requested when displaying results from datasets. You can request all fields from metadata blocks or pick and choose certain fields.
The new parameter is called metadata_fields and the Search API documentation contains details and examples: https://guides.dataverse.org/en/5.10/api/search.html
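For illustration, a request URL using the new parameter can be assembled as below. The block:field value syntax (citation:* for all fields of the citation block) is our reading of the Search API documentation and should be confirmed there. The helper search_url is hypothetical and does no URL encoding.

```shell
# Build a Search API URL that asks for extra metadata fields on dataset results.
# $1: server URL
# $2: query string
# $3: value for metadata_fields (assumed form: block:field, or block:* for a whole block)
search_url() {
  printf '%s/api/search?q=%s&type=dataset&metadata_fields=%s\n' "$1" "$2" "$3"
}

# Example:
# curl "$(search_url https://demo.dataverse.org '*' 'citation:*')"
```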
Solr 8 Upgrade
The Dataverse Software now runs on Solr 8.11.1, the latest available stable release in the Solr 8.x series.
PostgreSQL Upgrade
A PostgreSQL upgrade is not required for this release but is planned for the next release. See below for details.
Major Use Cases and Infrastructure Enhancements
Changes and fixes in this release include:
- When creating or updating datasets, users can select from a set of licenses configured by the administrator (CC, CC BY, custom licenses, etc.) or provide custom terms (if the installation is configured to allow them). (Issue #7440, PR #7920)
- Users can get better feedback on tabular ingest errors and more information about size limits when uploading files. (Issue #8205, PR #8271)
- Users can more easily download guestbook responses and learn how guestbooks work. (Issue #8244, PR #8402)
- Search API users can specify additional metadata fields to be returned in the search results. (Issue #7863, PR #7942)
- The "Preview" tab on the file page can now show restricted files. (Issue #8258, PR #8265)
- Users wanting to upload files from GitHub to Dataverse can learn about a new GitHub Action called "Dataverse Uploader". (PR #8416)
- Users requesting access to files now get feedback that it was successful. (Issue #7469, PR #8341)
- Users may notice various accessibility improvements. (Issue #8321, PR #8322)
- Users of the Social Science metadata block can now add multiples of the "Collection Mode" field. (Issue #8452, PR #8473)
- Guestbooks now support multi-line text area fields. (Issue #8288, PR #8291)
- Guestbooks can better handle commas in responses. (Issue #8193, PR #8343)
- Dataset editors can now deselect a guestbook. (Issue #2257, PR #8403)
- Administrators with a large actionlogrecord table can read docs on archiving and then trimming it. (Issue #5916, PR #8292)
- Administrators can list locks across all datasets. (PR #8445)
- Administrators can run a version of Solr that doesn't include a version of log4j2 with serious known vulnerabilities. We trust that you have patched the version of Solr you are running now following the instructions that were sent out. An upgrade to the latest version is recommended for extra peace of mind. (PR #8415)
- Administrators can run a version of Dataverse that doesn't include a version of log4j with known vulnerabilities. (PR #8377)
Notes for Dataverse Installation Administrators
Updating for Multiple License Support
Adding and Removing Licenses and How Existing Datasets Will Be Automatically Updated
As part of installing or upgrading an existing installation, administrators may wish to add additional license choices and/or configure Dataverse to allow custom terms. Adding additional licenses is managed via API, as explained in the Configuring Licenses section of the Installation Guide. Licenses are described via a JSON structure providing a name, URL, short description, and optional icon URL. Additionally licenses may be marked as active (selectable for new or updated datasets) or inactive (only allowed on existing datasets) and one license can be marked as the default. Custom Terms are allowed by default (backward compatible with the current option to select "No" to using CC0) and can be disabled by setting :AllowCustomTermsOfUse to false.
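As a rough sketch of what a license description might look like, the helper below writes one to a file. The JSON key names (name, uri, shortDescription, active) and the endpoint in the usage comment are assumptions based on the description above. The Configuring Licenses section of the Installation Guide is the authoritative reference. The helper does not escape special characters in its arguments.

```shell
# Write a license description to a JSON file.
# $1: output file, $2: license name, $3: license URL, $4: short description
write_license_json() {
  cat > "$1" <<EOF
{
  "name": "$2",
  "uri": "$3",
  "shortDescription": "$4",
  "active": true
}
EOF
}

# Example (endpoint and header are assumptions, check the Installation Guide):
# write_license_json license.json "CC BY 4.0" "https://creativecommons.org/licenses/by/4.0/" "Creative Commons Attribution 4.0 International License."
# curl -X POST -H 'Content-Type: application/json' -H "X-Dataverse-key:$API_TOKEN" --data-binary @license.json http://localhost:8080/api/licenses
#
# Disabling custom terms, using the setting named above:
# curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCustomTermsOfUse
```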
Further, administrators should review the following automated migration of existing licenses and terms into the new license framework and, if desired, should manually find and update any datasets for which the automated update is problematic. To understand the migration process, it is useful to understand how the multiple license feature works in this release:
"Custom Terms", aka a custom license, are defined through entries in the following fields of the dataset "Terms" tab:
- Terms of Use
- Confidentiality Declaration
- Special Permissions
- Restrictions
- Citation Requirements
- Depositor Requirements
- Conditions
- Disclaimer
"Custom Terms" require, at a minimum, a non-blank entry in the "Terms of Use" field. Entries in other fields are optional.
Since these fields are intended for terms/conditions that would potentially conflict with or modify the terms in a standard license, they are no longer shown when a standard license is selected.
In earlier Dataverse releases, it was possible to select the CC0 license and have entries in the fields above. It was also possible to say "No" to using CC0 and leave all of these terms fields blank.
The automated process will update existing datasets as follows.
- "CC0 Waiver" and no entries in the fields above -> CC0 License (no change)
- No CC0 Waiver and an entry in the "Terms of Use" field and possibly other fields listed above -> "Custom Terms" with the same entries in these fields (no change)
- CC0 Waiver and an entry in some of the fields listed -> "Custom Terms" with the following text prepended in the "Terms of Use" field: "This dataset is made available under a Creative Commons CC0 license with the following additional/modified terms and conditions:"
- No CC0 Waiver and an entry in a field(s) other than the "Terms of Use" field -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available with limited information on how it can be used. You may wish to communicate with the Contact(s) specified before use."
- No CC0 Waiver and no entry in any of the listed fields -> "Custom Terms" with the following "Terms of Use" added: "This dataset is made available without information on how it can be used. You should communicate with the Contact(s) specified before use."
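The five cases above can be restated as a small decision function, which may help when predicting what will happen to a given dataset. This is only our restatement of the rules, not the migration code itself. The function name and the yes/no argument convention are hypothetical.

```shell
# Report how the automated migration treats a dataset version.
# $1: "yes" if the CC0 Waiver was selected, otherwise "no"
# $2: "yes" if the "Terms of Use" field is non-blank
# $3: "yes" if any other terms field is non-blank
migrated_terms() {
  cc0="$1"; tou="$2"; other="$3"
  if [ "$cc0" = yes ]; then
    if [ "$tou" = yes ] || [ "$other" = yes ]; then
      echo 'Custom Terms (CC0 notice prepended to Terms of Use)'
    else
      echo 'CC0 (no change)'
    fi
  elif [ "$tou" = yes ]; then
    echo 'Custom Terms (no change)'
  elif [ "$other" = yes ]; then
    echo 'Custom Terms ("limited information" Terms of Use added)'
  else
    echo 'Custom Terms ("without information" Terms of Use added)'
  fi
}

migrated_terms yes no yes
migrated_terms no no no
```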
Administrators who have datasets where CC0 has been selected along with additional terms, or datasets where the Terms of Use field is empty, may wish to modify those datasets prior to upgrading to avoid the automated changes above. This is discussed next.
Handling Datasets that No Longer Comply With Licensing Rules
In most Dataverse installations, one would expect the vast majority of datasets to either use the CC0 Waiver or have non-empty Terms of Use. As noted above, these will be migrated without any issue. Administrators may however wish to find and manually update datasets that specified a CC0 license but also had terms (no longer allowed) or had no license and no terms of use (also no longer allowed) rather than accept the default migrations for these datasets listed above.
Finding and Modifying Datasets with a CC0 License and Non-Empty Terms
To find datasets with a CC0 license and non-empty terms:
select CONCAT('doi:', dvo.authority, '/', dvo.identifier), v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and t.license='CC0' and not (t.termsofuse is null and t.confidentialitydeclaration is null and t.specialpermissions is null and t.restrictions is null and citationrequirements is null and t.depositorrequirements is null and t.conditions is null and t.disclaimer is null);
The datasetdoi column will let you find and view the affected dataset in the Dataverse web interface. The version column will indicate which version(s) are relevant. The dataverse_alias will tell you which Dataverse collection the dataset is in (and may be useful if you want to adjust all datasets in a given collection). The termsofuseandaccess_id column indicates which specific entry in that table is associated with the dataset/version. The remaining columns show the values of any terms fields.
There are two options to migrate such datasets:
Option 1: Set all terms fields to null:
update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
or to change several at once:
update termsofuseandaccess set termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id in (<comma separated list of termsofuseandaccess_ids>);
Option 2: Change the Dataset version(s) to not use the CC0 waiver and modify the Terms of Use (and/or other fields) as you wish to indicate that the CC0 waiver was previously selected:
update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id=<id>;
or
update termsofuseandaccess set license='NONE', termsofuse=concat('New text. ', termsofuse) where id in (<comma separated list of termsofuseandaccess_ids>);
Finding and Modifying Datasets without a CC0 License and with Empty Terms
To find datasets without a CC0 license and with empty terms:
select CONCAT('doi:', dvo.authority, '/', dvo.identifier) as datasetdoi, v.alias as dataverse_alias, case when versionstate='RELEASED' then concat(dv.versionnumber, '.', dv.minorversionnumber) else versionstate END as version, dv.id as datasetversion_id, t.id as termsofuseandaccess_id, t.termsofuse, t.confidentialitydeclaration, t.specialpermissions, t.restrictions, t.citationrequirements, t.depositorrequirements, t.conditions, t.disclaimer from dvobject dvo, termsofuseandaccess t, datasetversion dv, dataverse v where dv.dataset_id=dvo.id and dv.termsofuseandaccess_id=t.id and dvo.owner_id=v.id and (t.license='NONE' or t.license is null) and t.termsofuse is null;
As before, there are a couple options.
Option 1: These datasets could be updated to use CC0:
update termsofuseandaccess set license='CC0', confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null where id=<id>;
Option 2: Terms of Use could be added:
update termsofuseandaccess set termsofuse='New text. ' where id=<id>;
In both cases, the same where id in (<comma separated list of termsofuseandaccess_ids>); ending can be used to change multiple datasets/versions at once.
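When the list of ids comes from the output of the queries above, a small helper can render the "where id in (...)" ending and refuse anything that is not a number. This is only a sketch: the function names are hypothetical, and the statement text is copied from Option 2 with 'New text. ' left as a placeholder.

```python
from typing import Iterable


def in_clause(ids: Iterable[int]) -> str:
    """Render ids for `where id in (...)`; int() raises ValueError on non-numeric text."""
    values = sorted({int(i) for i in ids})
    if not values:
        raise ValueError("at least one id is required")
    return ", ".join(str(v) for v in values)


def add_terms_statement(ids: Iterable[int]) -> str:
    """Build the 'add Terms of Use' update for several termsofuseandaccess rows."""
    return ("update termsofuseandaccess set termsofuse='New text. ' "
            f"where id in ({in_clause(ids)});")
```

Review the generated statement before running it with psql, and make a database backup first.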
Standardizing Custom Licenses
If many datasets use the same set of Custom Terms, it may make sense to create and register a standard license including those terms. Doing this would include:
- Creating and posting an external document that includes the custom terms, i.e. an HTML document with sections corresponding to the terms fields that are used.
- Defining a name, short description, URL (where it is posted), and optionally an icon URL for this license
- Using the Dataverse API to register the new license as one of the options available in your installation
- Using the API to make sure the license is active and deciding whether the license should also be the default
- Once the license is registered with Dataverse, making an SQL update to change datasets/versions using that license to reference it instead of having their own copy of those custom terms.
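The items to define for a license (name, short description, URL, optional icon URL, active flag) can be collected in a small JSON document before registration. The sketch below only builds that document; the key names are assumptions and should be checked against the Configuring Licenses section of the Installation Guide before use.

```python
import json
from typing import Optional


def license_definition(name: str, short_description: str, url: str,
                       icon_url: Optional[str] = None, active: bool = True) -> str:
    """Serialize the license items as JSON (key names are assumptions, not confirmed)."""
    body = {
        "name": name,
        "shortDescription": short_description,
        "uri": url,            # where the external terms document is posted
        "active": active,
    }
    if icon_url:
        body["iconUrl"] = icon_url
    return json.dumps(body, indent=2)
```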
The benefits of this approach are:
- usability: the license can be selected for new datasets without allowing custom terms and without users having to cut/paste terms or collection administrators having to configure templates with those terms
- efficiency: custom terms are stored per dataset whereas licenses are registered once and all uses of it refer to the same object and external URL
- security: with the license terms maintained external to Dataverse, users cannot edit specific terms and curators do not need to check for edits
Once a standardized version of your Custom Terms is registered as a license, an SQL update like the following can be used to have datasets use it:
UPDATE termsofuseandaccess
SET license_id = (SELECT license.id FROM license WHERE license.name = '<Your License Name>'), termsofuse=null, confidentialitydeclaration=null, specialpermissions=null, restrictions=null, citationrequirements=null, depositorrequirements=null, conditions=null, disclaimer=null
WHERE termsofuseandaccess.termsofuse LIKE '%<Unique phrase in your Terms of Use>%';
Note that this information is also available in the Configuring Licenses section of the Installation Guide. Look for "Standardizing Custom Licenses".
PostgreSQL Version 10+ Required
If you are still using PostgreSQL 9.x, now is the time to upgrade. PostgreSQL 9.x is now EOL (no longer supported, as of January 2022), and in the next version of the Dataverse Software we plan to upgrade the Flyway library (used for database migrations) to a version that will no longer work with versions prior to PostgreSQL 10. See PR #8296 for more on this upcoming Flyway upgrade.
The Dataverse Software has been tested with PostgreSQL versions up to 13. The current stable version 13.5 is recommended. If that's not an option for reasons specific to your installation (for example, if PostgreSQL 13.5 is not available for the OS distribution you are using), any 10+ version should work.
See the upgrade section below for more information.
Providing S3 Storage Credentials via MicroProfile Config
With this release, you may use two new JVM options (dataverse.files.<id>.access-key and dataverse.files.<id>.secret-key) to pass an access key identifier and a secret access key for S3-based storage definitions without creating the files used by the AWS CLI tools (~/.aws/config & ~/.aws/credentials).
This has been added to ease setups using containers (Docker, Podman, Kubernetes, OpenShift) or testing and development installations. Find additional documentation and a word of warning in the Installation Guide.
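For container setups it is often convenient to pass such values as environment variables. MicroProfile Config looks up a property in the environment by replacing every character that is not alphanumeric or an underscore with an underscore and then upper-casing the result. The sketch below applies that rule to the two new option names; the storage id "s3" is a hypothetical example, and it assumes the installation resolves these options through MicroProfile Config sources.

```python
import re


def mp_config_env_name(property_name: str) -> str:
    """Apply the MicroProfile Config environment-variable mapping to a property name."""
    return re.sub(r"[^A-Za-z0-9_]", "_", property_name).upper()


def s3_credential_options(storage_id: str) -> dict:
    """Map each new option for one storage id to its environment-variable spelling."""
    names = [f"dataverse.files.{storage_id}.{suffix}"
             for suffix in ("access-key", "secret-key")]
    return {name: mp_config_env_name(name) for name in names}
```

For the storage id "s3" this yields DATAVERSE_FILES_S3_ACCESS_KEY and DATAVERSE_FILES_S3_SECRET_KEY.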
New JVM Options and DB Settings
The following JVM settings have been added:
- dataverse.files.<id>.access-key - S3 access key ID.
- dataverse.files.<id>.secret-key - S3 secret access key.
See the JVM Options section of the Installation Guide for more information.
The following DB settings have been added:
- :AllowCustomTermsOfUse (default: true) - allow users to provide Custom Terms instead of choosing one of the configured standard licenses.
See the Database Settings section of the Guides for more information.
Notes for Developers and Integrators
In the "Backward Incompatibilities" section below, note changes in the API regarding licenses and the native JSON format.
Backward Incompatibilities
With the change to support multiple licenses, which can include cases where CC0 is not an option, and the decision to prohibit two previously possible cases (no license and no entry in the "Terms of Use" field, a standard license and entries in "Terms of Use", "Special Permissions", and related fields), this release contains changes to the display, API payloads, and export metadata that are not backward compatible. These include:
- "CC0 Waiver" has been replaced by "CC0 1.0" (the short name specified by Creative Commons) in the web interface, API payloads, and export formats that include a license name. (Note that installation admins can alter the license name in the database to maintain the original "CC0 Waiver" text, if desired.)
- Schema.org metadata in page headers and the Schema.org JSON-LD metadata export now reference the license via URL (which should avoid the current warning from Google about an invalid license object in the page metadata).
- Metadata exports and import methods (including SWORD) now use either the license name (e.g. in the JSON export) or URL (e.g. in the OAI_ORE export) rather than the previously hardcoded value of "CC0" or "CC0 Waiver" (if the CC0 license is available, its default name is "CC0 1.0").
- API calls (e.g. for import, migrate) that specify both a license and custom terms will be considered an error, as would having no license and an empty/blank value for "Terms of Use".
- Rollback. In general, one should not deploy an earlier release over a database that has been modified by deployment of a later release. (Make a db backup before upgrading and use that copy if you go back to a prior version.) Due to the nature of the db changes in this release, attempts to deploy an earlier version of Dataverse will fail unless the database is also restored to its pre-release state.
Also, note that since CC0 Waiver is no longer a hardcoded option, text strings that reference it have been edited or removed from Bundle.properties. This means that the ability to provide translations of the CC0 license name/description has been removed. The initial release of multiple license functionality doesn't include an alternative mechanism to provide translations of license names/descriptions, so this is a regression in capability (see #8346). The instructions and help information about licenses and terms remain internationalizable; it is only the name/description of the licenses themselves that cannot yet be translated.
An update in the metadata block Social Science changes the field CollectionMode to allow multiple values. This changes the way the field is encoded in the native JSON format. From
"typeName": "collectionMode",
"multiple": false,
"typeClass": "primitive",
"value": "some text"
to
"typeName": "collectionMode",
"multiple": true,
"typeClass": "primitive",
"value": ["some text", "more text"]
Complete List of Changes
For the complete list of code changes in this release, see the 5.10 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.10.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.10.war
5. Restart payara
service payara stop
service payara start
6. Update the Social Science metadata block
wget https://github.com/IQSS/dataverse/releases/download/v5.10/social_science.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @social_science.tsv -H "Content-type: text/tab-separated-values"
- Note that this update also requires an updated Solr schema. We strongly recommend that you upgrade Solr as part of this release, by installing the latest stable release from scratch (see below). In the process you will configure it with the latest version of the schema as distributed with this Dataverse release, so no further steps will be needed. If you have already upgraded, or have some very good reason to stay on the old version a little longer, please refer to https://guides.dataverse.org/en/5.10/admin/metadatacustomization.html#updating-the-solr-schema for information on updating your Solr schema in place.
7. Run ReExportall to update Exports
Following the directions in the Admin Guide
8. Upgrade Solr
See "Additional Release Steps" below for how to upgrade Solr.
Additional Release Steps
Solr Upgrade
With this release we upgrade to the latest available stable release in the Solr 8.x branch. We recommend a fresh installation of Solr (the index will be empty) followed by an "index all".
Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.
See http://guides.dataverse.org/en/5.10/installation/prerequisites.html#installing-solr for more information.
Please note that after you have followed the instruction above you will have Solr installed with the default schema that lists all the fields in the standard Dataverse metadata blocks. If your installation uses any custom metadata blocks, please refer to https://guides.dataverse.org/en/5.10/admin/metadatacustomization.html#updating-the-solr-schema for information on updating your Solr schema to include these extra fields.
PostgreSQL Upgrade
The tested and recommended way of upgrading an existing database is as follows:
- Export your current database with pg_dumpall.
- Install the new version of PostgreSQL (make sure it's running on the same port, so that no changes are needed in the Payara configuration).
- Re-import the database with psql, as the user postgres.
It is strongly recommended to use the versions of the pg_dumpall and psql from the old and new versions of PostgreSQL, respectively. For example, the commands below were used to migrate a database running under PostgreSQL 9.6 to 13.5. Adjust the versions and the path names to match your environment.
Back up/export:
/usr/pgsql-9.6/bin/pg_dumpall -U postgres > /tmp/backup.sql
Restore/import:
/usr/pgsql-13/bin/psql -U postgres -f /tmp/backup.sql
When upgrading the production database here at Harvard IQSS we were able to go from version 9.6 all the way to 13.3 without any issues.
You may want to try these backup and restore steps on a test server to get an accurate estimate of how much downtime to expect with the final production upgrade. That of course will depend on the size of your database.
Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.
Published by kcondon over 4 years ago
dataverse - v5.9
Dataverse Software 5.9
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Dataverse Collection Page Optimizations
The Dataverse Collection page, which also serves as the search page and the homepage in most Dataverse installations, has been optimized, with a specific focus on reducing the number of queries for each page load. These optimizations will be more noticeable on Dataverse installations with higher traffic.
Support for HTTP "Range" Header for Partial File Downloads
Dataverse now supports the HTTP "Range" header, which allows users to download parts of a file. Here are some examples:
- bytes=0-9 gets the first 10 bytes.
- bytes=10-19 gets 10 bytes from the middle.
- bytes=-10 gets the last 10 bytes.
- bytes=9- gets all bytes except the first 9.
Only a single range is supported. For more information, see the Data Access API section of the API Guide.
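To make the offset arithmetic concrete, here is a minimal sketch that resolves a single-range header value into inclusive byte offsets for a file of known size. It follows standard HTTP range semantics and is not taken from the Dataverse code; the function name is hypothetical.

```python
from typing import Tuple


def resolve_range(header_value: str, size: int) -> Tuple[int, int]:
    """Return inclusive (first, last) byte offsets for a single-range header value."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled here")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form, e.g. bytes=-10: the last N bytes of the file.
        count = int(last)
        return max(size - count, 0), size - 1
    start = int(first)
    # Open-ended form, e.g. bytes=9-: run to the end of the file.
    end = int(last) if last else size - 1
    return start, min(end, size - 1)
```

For a 100-byte file, "bytes=-10" resolves to offsets 90 through 99 and "bytes=9-" to offsets 9 through 99.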
Support for Optional External Metadata Validation Scripts
The Dataverse software now allows an installation administrator to provide custom scripts for additional metadata validation when datasets are being published and/or when Dataverse collections are being published or modified. The Harvard Dataverse Repository has been using this mechanism to combat content that violates our Terms of Use, specifically spam content. All the validation or verification logic is defined in these external scripts, thus making it possible for an installation to add checks custom-tailored to their needs.
Please note that only the metadata are subject to these validation checks. This does not check the content of any uploaded files.
For more information, see the Database Settings section of the Guide. The new settings are listed below, in the "New JVM Options and DB Settings" section of these release notes.
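A validation script could be as simple as the sketch below. It assumes, without confirmation from these notes, that the script is called with one argument (the path to a file holding the metadata as JSON) and that a non-zero exit status signals a validation failure; check the Database Settings section for the actual contract. The banned phrases and function names are hypothetical.

```python
import json
import sys

BANNED_PHRASES = ("buy followers", "cheap pills")  # hypothetical examples


def find_banned(metadata_text: str, banned=BANNED_PHRASES) -> list:
    """Return the banned phrases found in any string value of the JSON metadata."""
    def strings(node):
        if isinstance(node, str):
            yield node
        elif isinstance(node, dict):
            for value in node.values():
                yield from strings(value)
        elif isinstance(node, list):
            for value in node:
                yield from strings(value)

    text = " ".join(strings(json.loads(metadata_text))).lower()
    return [phrase for phrase in banned if phrase in text]


def main(argv) -> int:
    """Assumed contract: argv[1] is the metadata file; non-zero means 'invalid'."""
    with open(argv[1], encoding="utf-8") as handle:
        return 1 if find_banned(handle.read()) else 0
```

To use it as a standalone script, end the file with a call such as sys.exit(main(sys.argv)).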
Displaying Author's Identifier as Link
In the dataset page's metadata tab the author's identifier is now displayed as a clickable link, which points to the profile page in the external service (ORCID, VIAF etc.) in cases where the identifier scheme provides a resolvable landing page. If the identifier does not match the expected scheme, a link is not shown.
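The rule "link only when the identifier matches the scheme" can be sketched for ORCID as follows. This is an illustration, not the Dataverse implementation: the function name is hypothetical and only the ORCID scheme is covered.

```python
import re
from typing import Optional

# An ORCID iD is four groups of four characters; the final character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def author_identifier_link(scheme: str, identifier: str) -> Optional[str]:
    """Return a profile URL for a well-formed ORCID iD, otherwise None (no link)."""
    if scheme == "ORCID" and ORCID_PATTERN.fullmatch(identifier):
        return f"https://orcid.org/{identifier}"
    return None
```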
Auxiliary File API Enhancements
This release includes updates to the Auxiliary File API. These updates include:
- Auxiliary files can now also be associated with non-tabular files
- Auxiliary files can now be deleted
- Duplicate Auxiliary files can no longer be created
- A new API has been added to list Auxiliary files by their origin
- Some auxiliary files were being saved with the wrong content type (MIME type), but now the user can supply the content type on upload, overriding the type that would otherwise be assigned
- Improved error reporting
- A bugfix involving checksums for Auxiliary files
Please note that the Auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.
Major Use Cases and Infrastructure Enhancements
Newly-supported major use cases in this release include:
- The Dataverse collection page has been optimized, resulting in quicker load times on one of the most common pages in the application (Issue #7804, PR #8143)
- Users will now be able to specify a certain byte range in their downloads via API, allowing for downloads of file parts. (Issue #6397, PR #8087)
- A Dataverse installation administrator can now set up metadata validation for datasets and Dataverse collections, allowing for publish-time and create-time checks for all content. (Issue #8155, PR #8245)
- Users will be provided with clickable links to authors' ORCIDs and other IDs in the dataset metadata (Issue #7978, PR #7979)
- Users will now be able to associate Auxiliary files with non-tabular files (Issue #8235, PR #8237)
- Users will no longer be able to create duplicate Auxiliary files (Issue #8235, PR #8237)
- Users will be able to delete Auxiliary files (Issue #8235, PR #8237)
- Users can retrieve a list of Auxiliary files based on their origin (Issue #8235, PR #8237)
- Users will be able to supply the content type of Auxiliary files on upload (Issue #8241, PR #8282)
- The indexing process has been updated so that datasets with fewer files are indexed first, resulting in fewer failures and making it easier to identify problematically-large datasets. (Issue #8097, PR #8152)
- Users will no longer be able to create metadata records with problematic special characters, which would later require Dataverse installation administrator intervention and a database change (Issue #8018, PR #8242)
- The Dataverse software will now appropriately recognize files with the .geojson extension as GeoJSON files rather than "unknown" (Issue #8261, PR #8262)
- A Dataverse installation administrator can now retrieve more information about role deletion from the ActionLogRecord (Issue #2912, PR #8211)
- Users will be able to use a new role to allow a user to respond to file download requests without also giving them the power to manage the dataset (Issue #8109, PR #8174)
- Users will no longer be forced to update their passwords when moving from Dataverse 3.x to Dataverse 4.x (PR #7916)
- Improved accessibility of buttons on the Dataset and File pages (Issue #8247, PR #8257)
Notes for Dataverse Installation Administrators
Indexing Performance on Datasets with Large Numbers of Files
We discovered that whenever a full reindexing needs to be performed, datasets with large numbers of files take an exceptionally long time to index. For example, in the Harvard Dataverse Repository, it takes several hours for a dataset that has 25,000 files. In situations where the Solr index needs to be erased and rebuilt from scratch (such as a Solr version upgrade, or a corrupt index, etc.) this can significantly delay the repopulation of the search catalog.
We are still investigating the reasons behind this performance issue. For now, even though some improvements have been made, a dataset with thousands of files is still going to take a long time to index. In this release, we've made a simple change to the reindexing process, to index any such datasets at the very end of the batch, after all the datasets with fewer files have been reindexed. This does not improve the overall reindexing time, but will repopulate the bulk of the search index much faster for the users of the installation.
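The ordering idea can be pictured with a few lines of Python. This is a sketch of the principle only; the function name and the mapping of dataset ids to file counts are hypothetical and do not reflect the actual indexing code.

```python
from typing import Dict, List


def reindex_order(file_counts: Dict[str, int]) -> List[str]:
    """Order dataset ids so that datasets with fewer files are indexed first."""
    # Ties are broken by id so the order is deterministic.
    return sorted(file_counts, key=lambda dataset: (file_counts[dataset], dataset))
```

With counts of 25,000, 3 and 120 files for datasets "a", "b" and "c", the order is "b", "c", "a": total work is unchanged, but most datasets become searchable before the largest one starts.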
Custom Analytics Code Changes
You should update your custom analytics code to capture a bug fix related to tracking within the dataset files table. This release restores that tracking.
For more information, see the documentation and sample analytics code snippet provided in the Installation Guide. This update can be used on any version 5.4+.
New ManageFilePermissions Permission
Dataverse can now support a use case in which an Admin or Curator would like to delegate the ability to grant access to restricted files to other users. This can be implemented by creating a custom role (e.g. DownloadApprover) that has the new ManageFilePermissions permission. This release introduces the new permission, and a Flyway script adjusts the existing Admin and Curator roles so they continue to have the ability to grant file download requests.
Thumbnail Defaults
New default values have been added for the JVM settings dataverse.dataAccess.thumbnail.image.limit and dataverse.dataAccess.thumbnail.pdf.limit, of 3MB and 1MB respectively. This means that, unless specified otherwise by the JVM settings already in your domain configuration, the application will skip attempting to generate thumbnails for image files and PDFs that are above these size limits.
In previous versions, if these limits were not explicitly set, the application would try to create thumbnails for files of unlimited size, which would occasionally cause problems with very large images.
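The effect of the new defaults can be sketched as a simple size check. The names below are hypothetical, and treating 1 MB as 1,000,000 bytes is an assumption; consult the JVM Options section of the Installation Guide for the exact default values.

```python
from typing import Optional

# Defaults described in these notes; the byte values assume 1 MB = 1,000,000 bytes.
DEFAULT_LIMITS = {"image": 3_000_000, "pdf": 1_000_000}


def should_attempt_thumbnail(kind: str, size_bytes: int,
                             configured_limit: Optional[int] = None) -> bool:
    """Return True when the file is not above the applicable size limit."""
    limit = DEFAULT_LIMITS[kind] if configured_limit is None else configured_limit
    return size_bytes <= limit
```

An explicitly configured limit takes precedence over the default, matching the "unless specified otherwise" behaviour described above.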
New JVM Options and DB Settings
The following DB settings allow configuration of the external metadata validator:
- :DataverseMetadataValidatorScript
- :DataverseMetadataPublishValidationFailureMsg
- :DataverseMetadataUpdateValidationFailureMsg
- :DatasetMetadataValidatorScript
- :DatasetMetadataValidationFailureMsg
- :ExternalValidationAdminOverride
See the Database Settings section of the Guides for more information.
Notes for Developers and Integrators
Two sections of the Developer Guide have been updated:
- Instructions on how to sync a PR in progress with develop have been added in the version control section
- Guidance on avoiding inefficiencies in JSF render logic has been added to the "Tips" section
Complete List of Changes
For the complete list of code changes in this release, see the 5.9 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.9.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.9.war
5. Restart payara
service payara stop
service payara start
6. Run ReExportall to update JSON Exports
Following the directions in the Admin Guide
Additional Release Steps
(for installations collecting web analytics)
1. Update custom analytics code per the Installation Guide.
(for installations with GeoJSON files)
1. Redetect GeoJSON files to update the type from "Unknown" to GeoJSON, following the directions in the API Guide
2. Kick off full reindex following the directions in the Admin Guide
Published by kcondon over 4 years ago
dataverse - v5.8
Dataverse Software 5.8
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Support for Data Embargoes
The Dataverse Software now supports file-level embargoes. The ability to set embargoes, up to a maximum duration (in months), can be configured by a Dataverse installation administrator. For more information, see the Embargoes section of the Dataverse Software Guides.
Users can configure a specific embargo, defined by an end date and a short reason, on a set of selected files or an individual file, by selecting the 'Embargo' menu item and entering information in a popup dialog. Embargoes can only be set, changed, or removed before a file has been published. After publication, only Dataverse installation administrators can make changes, using an API.
While embargoed, files cannot be previewed or downloaded (as if restricted, with no option to allow access requests). After the embargo expires, files become accessible. If the files were also restricted, they remain inaccessible and functionality is the same as for any restricted file.
By default, the citation date reported for the dataset and the datafiles in version 1.0 reflects the longest embargo period on any file in version 1.0, which is consistent with recommended practice from DataCite. Administrators can still specify an alternate date field to be used in the citation date via the Set Citation Date Field Type for a Dataset API Call.
The work to add this functionality was initiated by Data Archiving and Networked Services (DANS-KNAW), the Netherlands. It was further developed by the Global Dataverse Community Consortium (GDCC) in cooperation with and with funding from DANS.
Major Use Cases and Infrastructure Enhancements
Newly-supported major use cases in this release include:
- Users can set file-level embargoes. (Issue #7743, #4052, #343, PR #8020)
- Improved accessibility of form labels on the advanced search page (Issue #8169, PR #8170)
Notes for Dataverse Installation Administrators
Mitigate Solr Schema Management Problems
With Release 5.5, the <copyField> definitions were added back to schema.xml to fix searching for datasets.
This release includes a final update to schema.xml and a new script update-fields.sh to manage your custom metadata fields, and to provide opportunities for other future improvements. The broken script updateSchemaMDB.sh has been removed.
You will need to replace your schema.xml with the one provided in order to make sure that the new script can function. If you do not use any custom metadata blocks in your installation, this is the only change to be made. If you do use custom metadata blocks you will need to take a few extra steps, enumerated in the step-by-step instructions below.
New JVM Options and DB Settings
- :MaxEmbargoDurationInMonths controls whether embargoes are allowed in a Dataverse instance and can limit the maximum duration users are allowed to specify. A value of 0 months or non-existent setting indicates embargoes are not supported. A value of -1 allows embargoes of any length.
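The three cases of this setting (missing or 0, -1, and a positive number of months) can be sketched as follows. The function names are hypothetical, and measuring the limit in calendar months from today's date is an assumption about how the duration is counted.

```python
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    # Last day of the target month: first day of the following month minus one day.
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (next_first - date.resolution).day
    return date(year, month, min(start.day, last_day))


def embargo_allowed(max_months: Optional[int], today: date, end_date: date) -> bool:
    """Interpret :MaxEmbargoDurationInMonths for a requested embargo end date."""
    if max_months is None or max_months == 0:
        return False   # setting missing or 0: embargoes are not supported
    if max_months == -1:
        return True    # -1: embargoes of any length
    return end_date <= add_months(today, max_months)
```

With a setting of 12 and today's date of 2021-11-01, an end date of 2022-11-01 is accepted and 2022-11-02 is rejected.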
Complete List of Changes
For the complete list of code changes in this release, see the 5.8 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide. Please also contact us to get added to the Dataverse Project Map if you have not done so already.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.8.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.8.war
5. Restart payara
service payara stop
service payara start
6. Update Solr schema.xml.
/usr/local/solr/solr-8.8.1/server/solr/collection1/conf is used in the examples below as the location of your Solr schema. Please adapt it to the correct location, if different in your installation. Use find / -name schema.xml if in doubt.
6a. Replace schema.xml with the base version included in this release.
wget https://github.com/IQSS/dataverse/releases/download/v5.8/schema.xml
cp schema.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
For installations that are not using any Custom Metadata Blocks, you can skip the next step.
6b. For installations with Custom Metadata Blocks
Use the script provided in the release to add the custom fields to the base schema.xml installed in the previous step.
wget https://github.com/IQSS/dataverse/releases/download/v5.8/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-8.8.1/server/solr/collection1/conf/schema.xml
(Note that the curl command above calls the admin api on localhost to obtain the list of the custom fields. In the unlikely case that you are running the main Dataverse Application and Solr on different servers, generate the schema.xml on the application node, then copy it onto the Solr server)
7. Restart Solr
Usually service solr stop; service solr start, but may be different on your system. See the Installation Guide for more details.
Published by kcondon over 4 years ago
dataverse - v5.7
Dataverse Software 5.7
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Experimental Support for External Vocabulary Services
Dataverse can now be configured to associate specific metadata fields with third-party vocabulary services to provide an easy way for users to select values from those vocabularies. The mapping involves use of external Javascripts. Two such scripts have been developed so far: one for vocabularies served via the SKOSMOS protocol and one allowing people to be identified via their ORCID. The guides contain info about the new :CVocConf setting used for configuration and additional information about this functionality. Scripts, examples, and additional documentation are available at the GDCC GitHub Repository.
Please watch the online presentation, read the document with requirements and join the Dataverse Working Group on Ontologies and Controlled Vocabularies if you have some questions and want to contribute.
This functionality was initially developed by Data Archiving and Networked Services (DANS-KNAW), the Netherlands, and funded by SSHOC, "Social Sciences and Humanities Open Cloud". SSHOC has received funding from the European Union’s Horizon 2020 project call H2020-INFRAEOSC-04-2018, grant agreement #823782. It was further improved by the Global Dataverse Community Consortium (GDCC) and extended with the support of semantic search.
Curation Status Labels
A new :AllowedCurationLabels setting allows sysadmins to define one or more sets of labels that can be applied to a draft Dataset version via the user interface or API to indicate the status of the dataset with respect to a defined curation process.
Labels are completely customizable (alphanumeric or spaces, up to 32 characters, e.g. "Author contacted", "Privacy Review", "Awaiting paper publication"). Superusers can select a specific set of labels, or disable this functionality per collection. Anyone who can publish a draft dataset (e.g. curators) can set/change/remove labels (from the set specified for the collection containing the dataset) via the user interface or via an API. The API also allows external tools to search for, read and set labels on Datasets, providing an integration mechanism. Labels are visible on the Dataset page and in Dataverse collection listings/search results. Internally, the labels have no effect, and at publication, any existing label will be removed. A reporting API call allows admins to get a list of datasets and their curation statuses.
The Solr schema must be updated as part of installing the release of Dataverse containing this feature for it to work.
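As a rough illustration of the label constraints described above, the following shell sketch checks candidate labels before they are placed in :AllowedCurationLabels. The helper name is_valid_curation_label and the set name "Standard Process" are made up for this example, the check only accepts ASCII letters, digits, and spaces, and the commented-out call assumes the usual admin settings endpoint on localhost:8080.

```shell
# Hypothetical helper (not part of Dataverse): succeeds when a label is
# 1-32 characters long and contains only ASCII letters, digits, or spaces.
is_valid_curation_label() {
  label=$1
  # Reject empty labels and labels longer than 32 characters.
  [ -n "$label" ] || return 1
  [ "${#label}" -le 32 ] || return 1
  # Strip every allowed character; anything left over is disallowed.
  leftover=$(printf '%s' "$label" | tr -d 'A-Za-z0-9 ')
  [ -z "$leftover" ]
}

for label in "Author contacted" "Privacy Review" "Awaiting paper publication"; do
  if is_valid_curation_label "$label"; then
    echo "ok: $label"
  else
    echo "rejected: $label"
  fi
done

# Assumed admin call to store one label set (left commented out; adjust host and names):
# curl -X PUT -d '{"Standard Process":["Author contacted","Privacy Review","Awaiting paper publication"]}' \
#   http://localhost:8080/api/admin/settings/:AllowedCurationLabels
```

Checking labels locally first keeps a typo or an over-long label from reaching the setting, where it would be harder to spot.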
Major Use Cases
Newly-supported major use cases in this release include:
- Administrators will be able to set up integrations with external vocabulary services, allowing for autocomplete-assisted metadata entry, metadata standardization, and better integration with other systems (Issue #7711, PR #7946)
- Users viewing datasets in the root Dataverse collection will now see breadcrumbs that have a link back to the root Dataverse collection (Issue #7527, PR #8078)
- Users will be able to more easily differentiate between datasets and files through new iconography (Issue #7991, PR #8021)
- Users retrieving large guestbooks over the API will experience fewer failures (Issue #8073, PR #8084)
- Dataverse collection administrators can specify which language will be used when entering metadata for new Datasets in a collection, based on a list of languages specified by the Dataverse installation administrator (Issue #7388, PR #7958)
- Users will see the language used for metadata entry indicated at the document or element level in metadata exports (Issue #7388, PR #7958)
- Administrators will now be able to specify the language(s) of controlled vocabulary entries, in addition to the installation's default language (Issue #6751, PR #7959)
- Administrators and curators can now receive notifications when a dataset is created (Issue #8069, PR #8070)
- Administrators with large files in their installation can disable the automatic checksum verification process at publish time (Issue #8043, PR #8074)
Notes for Dataverse Installation Administrators
Dataset Creation Notifications for Administrators
A new :SendNotificationOnDatasetCreation setting has been added. When true, administrators and curators (those who can publish the dataset) will get a notification when a new dataset is created. This makes it easier to track activity in a Dataverse installation and, for example, allows admins to follow up when users do not publish a new dataset within some period of time.
Skip Checksum Validation at Publish Based on Size
When a user requests to publish a dataset, the time taken to complete the publishing process varies based on the dataset/datafile size.
With the additional settings of :DatasetChecksumValidationSizeLimit and :DataFileChecksumValidationSizeLimit, the checksum validation can be skipped while publishing.
If the Dataverse administrator chooses to set these values, it's strongly recommended to have an external auditing system run periodically in order to monitor the integrity of the files in the Dataverse installation.
Guestbook Response API Update
With this release, the Retrieve Guestbook Responses for a Dataverse Collection API will no longer produce a file by default. You may specify an output file by adding -o $YOURFILENAME to the curl command.
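A minimal sketch of such a call follows. The endpoint path and the guestbookId parameter are written from memory of the Native API Guide and should be checked there; the helper name guestbook_responses_url, the demo server, and the collection alias are placeholders for illustration only.

```shell
# Hypothetical helper: build the URL for retrieving guestbook responses of a collection.
# $1 = server URL, $2 = collection alias or id, $3 = optional guestbook id.
guestbook_responses_url() {
  url="$1/api/dataverses/$2/guestbookResponses"
  if [ -n "${3:-}" ]; then
    url="$url?guestbookId=$3"
  fi
  printf '%s\n' "$url"
}

# Placeholder values; replace with your own server and collection.
SERVER_URL=https://demo.dataverse.org
COLLECTION=root
URL=$(guestbook_responses_url "$SERVER_URL" "$COLLECTION")
echo "$URL"

# With -o the responses are written to a file of your choosing (left commented out):
# curl -f -H "X-Dataverse-key:$API_TOKEN" -o "$YOURFILENAME" "$URL"
```

Without -o, curl writes the responses to standard output, which is convenient for piping into other tools.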
Dynamic JavaServer Faces Configuration Options
This release includes a new way to easily change JSF settings via MicroProfile Config, especially useful during development. See the development guide on "Debugging" for more information.
Enhancements to DDI Metadata Exports
Several changes have been made to the DDI exports to improve support for internationalization and to improve compliance with CESSDA requirements. These changes include:
- Addition of xml:lang attributes specifying the dataset metadata language at the document level and for individual elements such as title and description
- Specification of controlled vocabulary terms in duplicate elements in multiple languages (in the installation default language and, if different, the dataset metadata language)
While these changes are intended to improve harvesting and integration with external systems, they could break existing connections that make assumptions about the elements and attributes that have been changed.
New JVM Options and DB Settings
- :SendNotificationOnDatasetCreation - A boolean setting that, if true, will send an email and notification to additional users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset.
- :DatasetChecksumValidationSizeLimit - Disables the checksum validation while publishing for any dataset size greater than the limit.
- :DataFileChecksumValidationSizeLimit - Disables the checksum validation while publishing for any datafiles greater than the limit.
- :CVocConf - A JSON-structured setting that configures Dataverse to associate specific metadatablock fields with external vocabulary services and specific vocabularies/sub-vocabularies managed by that service.
- :MetadataLanguages - Sets which languages can be used when entering dataset metadata.
- :AllowedCurationLabels - A JSON Object containing lists of allowed labels (up to 32 characters, spaces allowed) that can be set, via API or UI by users with the permission to publish a dataset. The set of labels allowed for datasets can be selected by a superuser - via the Dataverse collection page (Edit/General Info) or set via API call.
Notes for Tool Developers and Integrators
Bags Now Support File Paths
The original Bag generation code stored all dataset files directly under the /data directory. With the addition in Dataverse of a directory path for files and then a change to allow files with different paths to have the same name, archival Bags will now use the directory path from Dataverse to avoid name collisions within the /data directory. Prior to this update, Bags from Datasets with multiple files with the same name would have been created with only one of the files with that name (with warnings in the log, but still generating the Bag).
Complete List of Changes
For the complete list of code changes in this release, see the 5.7 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.7.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.7.war
5. Restart payara
service payara stop
service payara start
Additional Release Steps
1. Replace Solr schema.xml to allow Curation Labels to be used. See specific instructions below for those installations with custom metadata blocks (1a) and those without (1b).
1a. For installations with Custom Metadata Blocks:
- Stop the Solr instance (usually service solr stop, depending on your Solr installation/OS; see the Installation Guide)
- Add the following line to your schema.xml:
<field name="externalStatus" type="string" stored="true" indexed="true" multiValued="false"/>
- Restart the Solr instance (usually service solr start, depending on your Solr installation/OS)
1b. For installations without Custom Metadata Blocks:
- Stop the Solr instance (usually service solr stop, depending on your Solr installation/OS; see the Installation Guide)
- Replace schema.xml:
cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-8.8.1/server/solr/collection1/conf
- Start the Solr instance (usually service solr start, depending on your Solr installation/OS)
Published by kcondon over 4 years ago
dataverse - v5.6
Dataverse Software 5.6
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Anonymized Access in Support of Double Blind Review
Dataverse installations can select whether or not to allow users to create anonymized private URLs and can control which specific identifying fields are anonymized. If this is enabled, users can create private URLs that do not expose identifying information about dataset depositors, allowing for double blind reviews of datasets in the Dataverse installation.
Guestbook Responses API
A new API to retrieve Guestbook responses has been added. This makes it easier to retrieve the records for large guestbooks and also makes it easier to integrate with external systems.
Dataset Semantic API (Experimental)
Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format - following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field storage architecture). This new API also allows for the update of terms metadata (#5899).
This development was supported by the Research Data Alliance, DANS, and Sciences PO and follows the recommendations from the Research Data Repository Interoperability Working Group.
Dataset Migration API (Experimental)
Datasets can now be imported following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier migration from one Dataverse installation to another, and migration from other systems. This experimental, superuser-only endpoint also allows keeping the existing persistent identifier (where the authority and shoulder match those for which the software is configured) and publication dates.
This development was supported by DANS and the Research Data Alliance and follows the recommendations from the Research Data Repository Interoperability Working Group.
Direct Upload API Now Available for Adding Multiple Files' Metadata to the Dataset
Using the Direct Upload API, users can now add metadata of multiple files to the dataset after the files exist in the S3 bucket. This makes direct uploads more efficient and reduces server load by only updating the dataset once instead of once per file. For more information, see the Direct DataFile Upload/Replace API section of the Dataverse Software Guides.
Major Use Cases
Newly-supported major use cases in this release include:
- Users can create Private URLs that anonymize dataset metadata, allowing for double blind peer review. (Issue #1724, PR #7908)
- Users can download Guestbook records using a new API. (Issue #7767, PR #7931)
- Users can update terms metadata using the new semantic API. (Issue #5899, PR #7414)
- Users can retrieve, set, and update metadata using a new, flatter JSON-LD format. (Issue #6497, PR #7414)
- Administrators can use the Traces API to retrieve information about specific types of user activity (Issue #7952, PR #7953)
Notes for Dataverse Installation Administrators
New Database Constraint
A new DB Constraint has been added in this release. Full instructions on how to identify whether or not your database needs any cleanup before the upgrade can be found in the Dataverse software GitHub repository. This information was also emailed out to Dataverse installation contacts.
Payara 5.2021.5 (or Higher) Required
Some changes in this release require an upgrade to Payara 5.2021.5 or higher. (See the upgrade section).
Instructions on how to update can be found in the Payara documentation. We've included the necessary steps below, but we recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting.
Installations upgrading from a previous Payara version shouldn't encounter a logging configuration bug in Payara-5.2021.5, but if your server.log fills with repeated notes about logging configuration and WELD complaints about loading beans, see the paragraph on logging.properties in the Installation Guide.
Enhancement to DDI Metadata Exports
To increase support for internationalization and to improve compliance with CESSDA requirements, DDI exports now have a holdings element with a URI attribute whose value is the URL form of the dataset PID.
New JVM Options and DB Settings
:AnonymizedFieldTypeNames can be used to enable creation of anonymized Private URLs and to specify which fields will be anonymized.
Notes for Tool Developers and Integrators
Semantic API
The new Semantic API is especially helpful in data migrations and getting metadata into a Dataverse installation. Learn more in the Developers Guide.
Complete List of Changes
For the complete list of code changes in this release, see the 5.6 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.6.
The steps below include a required upgrade to Payara 5.2021.5 or higher. (It is a simple matter of reusing your existing domain directory with the new distribution). But we also recommend that you review the Payara upgrade instructions as it could be helpful during any troubleshooting: Payara documentation
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Move the current Payara directory out of the way
mv $PAYARA $PAYARA.MOVED
4. Download the new Payara version (5.2021.5+), and unzip it in its place
5. Replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1
6. Start Payara
service payara start
7. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.6.war
8. Restart payara
service payara stop
service payara start
9. Run ReExportAll to update the JSON exports. See http://guides.dataverse.org/en/5.6/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
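The directory swap in steps 3 to 5 above could be scripted roughly as follows. This is a sketch only: it assumes the new Payara distribution has already been unpacked at $PAYARA, that the old one was moved to $PAYARA.MOVED as in step 3, and that Payara is stopped. The function name restore_domain1 and the domain1.STOCK backup name are made up for this example.

```shell
# Hypothetical helper: put the preserved domain1 from the old Payara tree into the new one.
# $1 = new Payara directory, $2 = old (moved) Payara directory.
restore_domain1() {
  new=$1
  old=$2
  # Refuse to continue if the preserved domain is missing.
  [ -d "$old/glassfish/domains/domain1" ] || { echo "no domain1 in $old" >&2; return 1; }
  # Keep the stock domain that ships with the new distribution, just in case.
  if [ -d "$new/glassfish/domains/domain1" ]; then
    mv "$new/glassfish/domains/domain1" "$new/glassfish/domains/domain1.STOCK" || return 1
  fi
  mkdir -p "$new/glassfish/domains" || return 1
  # Copy rather than move, so the old tree stays intact as a fallback.
  cp -Rp "$old/glassfish/domains/domain1" "$new/glassfish/domains/domain1"
}

# Example invocation (left commented out; run as the Payara service user):
# restore_domain1 "$PAYARA" "$PAYARA.MOVED"
```

Copying instead of moving leaves $PAYARA.MOVED untouched, so the previous installation can be put back if the upgrade has to be rolled back.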
Additional Release Steps
If your installation relies on the database-side stored procedure for generating sequential numeric identifiers:
Note that you can skip this step if your installation uses the default-style, randomly-generated six alphanumeric character-long identifiers for your datasets! This is the case with most Dataverse installations.
The underlying database framework has been modified in this release, to make it easier for installations to create custom procedures for generating identifier strings that suit their needs. Your current configuration will be automatically updated by the database upgrade (Flyway) script incorporated in the release. No manual configuration changes should be necessary. However, after the upgrade, we recommend that you confirm that your installation can still create new datasets, and that they are still assigned sequential numeric identifiers. In the unlikely chance that this is no longer working, please re-create the stored procedure following the steps described in the documentation for the :IdentifierGenerationStyle setting in the Configuration section of the Installation Guide for this release (v5.6).
(Running the script supplied there will NOT overwrite the position on the sequence you are currently using!)
Published by kcondon almost 5 years ago
dataverse - v5.5
Dataverse Software 5.5
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Note: this release has a change to the default value for the :ZipDownloadLimit setting, from 100 MB to 0 bytes. If you have not previously adjusted this setting from the default, your Dataverse installation will no longer generate zip files once v5.5 is installed, as the setting will now be 0 bytes. This behavior will be revisited in a later release.
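Installations that relied on the previous default and want to keep generating zip files could set the limit explicitly. The sketch below converts 100 MB to bytes and prints, rather than runs, the admin settings call. It assumes 1 MB = 1024 * 1024 bytes, that the setting takes a value in bytes as the note above implies, and that the admin API is reachable on localhost:8080; the helper name mb_to_bytes is made up for this example.

```shell
# Hypothetical helper: convert megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

limit=$(mb_to_bytes 100)

# Print the assumed admin settings call instead of running it, so it can be reviewed first.
echo "curl -X PUT -d $limit http://localhost:8080/api/admin/settings/:ZipDownloadLimit"
```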
Release Highlights
Auxiliary Files Accessible Through the UI
Auxiliary Files can now be downloaded from the web interface. Auxiliary files uploaded as type=DP appear under "Differentially Private Statistics" under file level download. The rest appear under "Other Auxiliary Files".
Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the OpenDP Project. If the API endpoints are not needed they can be blocked.
Improved Workflow for Downloading Large Zip Files
Users trying to download a zip file larger than the Dataverse installation's :ZipDownloadLimit will now receive messaging that the zip file is too large, and the user will be presented with alternate access options. Previously, the zip file would download and files above the :ZipDownloadLimit would be excluded and noted in a MANIFEST.TXT file.
Guidelines on Depositing Code
The Software Metadata Working Group has created guidelines on depositing research code in a Dataverse installation. Learn more in the Dataset Management section of the Dataverse Guides.
New Metrics API
Users can retrieve new types of metrics and per-collection metrics. The new capabilities are described in the guides. A new version of the Dataverse Metrics web app adds interactive graphs to display these metrics. Anyone running the existing Dataverse Metrics app will need to upgrade or apply a small patch to continue retrieving metrics from Dataverse instances upgrading to this release.
Major Use Cases
Newly-supported major use cases in this release include:
- Users can now select and download auxiliary files through the UI. (Issue #7400, PR #7729)
- Users attempting to download zip files above the installation's size limit will receive better messaging and be directed to other download options. (Issue #7714, PR #7806)
- Superusers can now sort users on the Dashboard. (Issue #7814, PR #7815)
- Users can now access expanded and new metrics through a new API (Issue #7177, PR #7178)
- Dataverse collection administrators can now add a search facet on their collection pages for the Geospatial metadatablock's "Other" field, so that others can narrow searches in their collections using the values entered in that "Other" field (Issue #7399, PR #7813)
- Depositors can now receive guidance about depositing code into a Dataverse installation (PR #7717)
Notes for Dataverse Installation Administrators
Simple Search Fix for Solr Configuration
The introduction in v4.17 of a schema_dv_mdb_copies.xml file as part of the Solr configuration accidentally removed the contents of most metadata fields from the index used for simple searches in Dataverse (i.e. when one types a word without indicating which field to search in the normal search box). This was somewhat ameliorated/hidden by the fact that many common fields such as description were still included by other means.
This release removes the schema_dv_mdb_copies.xml file and includes the updates needed in the schema.xml file. Installations with no custom metadata blocks can simply replace their current schema.xml file for Solr, restart Solr, and run a 'Reindex in Place' as described in the guides.
Installations using custom metadata blocks should manually copy the contents of their schema_dv_mdb_copies.xml file (excluding the enclosing <schema> element and only including the <copyField> elements) into their schema.xml file, replacing the section between
<!-- Dataverse copyField from http://localhost:8080/api/admin/index/solr/schema -->
and
<!-- End: Dataverse-specific -->.
In existing schema.xml files, this section currently includes only one line:
<xi:include href="schema_dv_mdb_copies.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />.
In this release, that line has already been replaced with the default set of <copyField> elements.
It doesn't matter whether schema_dv_mdb_copies.xml was originally created manually or via the recommended updateSchemaMDB.sh script, and this fix will work with all prior versions of Dataverse from v4.17 on. If you make further changes to metadata blocks in your installation, you can repeat this process (i.e. run updateSchemaMDB.sh, copy the entries in schema_dv_mdb_copies.xml into the same section of schema.xml, restart Solr, and reindex).
Once schema.xml is updated, Solr should be restarted and a 'Reindex in Place' will be required. (Future Dataverse Software versions will avoid this manual copy step.)
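For installations with custom metadata blocks, the manual copy step described above could be sketched as follows. This is not a script shipped with the release: the function name merge_copyfields is made up, and the sketch assumes each <copyField> element sits on its own line and that both marker comments are present in schema.xml exactly as quoted above.

```shell
# Hypothetical helper: print a copy of schema.xml in which everything between the two
# Dataverse marker comments is replaced by the <copyField> lines from the copies file.
# $1 = schema_dv_mdb_copies.xml, $2 = schema.xml
merge_copyfields() {
  copies=$1
  schema=$2
  fields=$(mktemp) || return 1
  # Keep only the <copyField> elements; the enclosing <schema> element is dropped.
  grep '<copyField' "$copies" > "$fields"
  awk -v fields="$fields" '
    /<!-- Dataverse copyField from/ {
      print                                   # keep the opening marker
      while ((getline line < fields) > 0) print line
      skipping = 1                            # drop the old section content
      next
    }
    /<!-- End: Dataverse-specific -->/ { skipping = 0 }
    !skipping { print }
  ' "$schema"
  rm -f "$fields"
}

# Example (left commented out): write the result to a new file and review it
# before replacing schema.xml.
# merge_copyfields schema_dv_mdb_copies.xml schema.xml > schema.xml.new
```

Writing to a separate file first makes it easy to diff the result against the current schema.xml before restarting Solr and reindexing.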
Geospatial Metadata Block Updated
The Geospatial metadata block (geospatial.tsv) was updated. Dataverse collection administrators can now add a search facet on their collection pages for the metadata block's "Other" field, so that people searching in their collections can narrow searches using the values entered in that field.
Extended support for S3 Download Redirects ("Direct Downloads")
If your installation uses S3 for storage and you have "direct downloads" enabled, please note that it will now cover the following download types that were not handled by redirects in the earlier versions: saved originals of tabular data files, cached RData frames, resized thumbnails for image files and other auxiliary files. In other words, all the forms of the file download API that take extra arguments, such as "format" or "imageThumb" - for example:
/api/access/datafile/12345?format=original
/api/access/datafile/:persistentId?persistentId=doi:1234/ABCDE/FGHIJ&imageThumb=true
etc., that were previously excluded.
Since browsers follow redirects automatically, this change should not in any way affect the web GUI users. However, some API users may experience problems, if they use it in a way that does not expect to receive a redirect response. For example, if a user has a script where they expect to download a saved original of an ingested tabular file with the following command:
curl "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta
it will fail to save the file when it receives a 303 (redirect) response instead of 200. So they will need to add "-L" to the command line above, to instruct curl to follow redirects:
curl -L "https://yourhost.edu/api/access/datafile/12345?format=original" > orig.dta
Most of your API users have likely figured it out already, since you enabled S3 redirects for "straightforward" downloads in your installation. But we feel it was worth a heads up, just in case.
Authenticated User Deactivated Field Updated
The "deactivated" field on the Authenticated User table has been updated to be a non-nullable field. When the field was added in version 5.3 it was set to 'false' in an update script. If for whatever reason that update failed in the 5.3 deploy you will need to re-run it before deploying 5.5. The update query you may need to run is: UPDATE authenticateduser SET deactivated = false WHERE deactivated IS NULL;
Notes for Tool Developers and Integrators
S3 Download Redirects
See the note above about download redirects. If your application integrates with the Dataverse software using the APIs, you may need to change how redirects are handled in your tool or integration.
Complete List of Changes
For the complete list of code changes in this release, see the 5.5 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.5.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.5.war
5. Restart payara
service payara stop
service payara start
Additional Release Steps
1. Follow the steps to update your Solr configuration, found in the "Notes for Dataverse Installation Administrators" section above. Note that there are different instructions for Dataverse installations running with custom metadata blocks and those without.
2. Update Geospatial Metadata Block (if used)
wget https://github.com/IQSS/dataverse/releases/download/v5.5/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"
Published by kcondon about 5 years ago
dataverse - v5.4.1
Dataverse Software 5.4.1
This release provides a fix for a regression introduced in 5.4 and implements a few other small changes. Please use 5.4.1 for production deployments instead of 5.4.
Release Highlights
API Backwards Compatibility Maintained
The syntax in the example in the Basic File Access section of the Dataverse Software Guides will continue to work.
Direct Upload API Now Available for Replacing Files
Users can now replace files using the direct upload API. For more information, see the Direct DataFile Upload/Replace API section of the Dataverse Software Guides.
Complete List of Changes
For the complete list of code changes in this release, see the 5.4.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
1. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
2. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
3. Start Payara
service payara start
4. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.4.1.war
5. Restart payara
service payara stop
service payara start
Published by kcondon about 5 years ago
dataverse - v5.4
Dataverse Software 5.4
This release brings new features, enhancements, and bug fixes to the Dataverse Software. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project. Please note that there is an API backwards compatibility issue in 5.4, and we recommend using 5.4.1 for any production environments.
Release Highlights
Deactivate Users API, Get User Traces API, Revoke Roles API
A new API has been added to deactivate users to prevent them from logging in, receiving communications, or otherwise being active in the system. Deactivating a user is an alternative to deleting a user, especially when the latter is not possible due to the amount of interaction the user has had with the Dataverse installation. In order to learn more about a user before deleting, deactivating, or merging, a new "get user traces" API is available that will show objects created, roles, group memberships, and more. Finally, the "remove all roles" button available in the superuser dashboard is now also available via API.
New File Access API
A new API offers a crawlable view of the folders and files within a dataset:
/api/datasets/<dataset id>/dirindex/
will output a simple HTML listing, based on the standard Apache directory index, with Access API download links for individual files, and recursive calls to the API above for sub-folders. Please see the Native API Guide for more information.
Using this API, wget --recursive (or a similar crawling client) can be used to download all the files in a dataset, preserving the file names and folder structure, without having to use the download-as-zip API. In addition to being faster (zipping is a relatively resource-intensive operation on the server side), this process can be restarted if interrupted (with wget --continue or equivalent), unlike zipped multi-file downloads, which always have to start from the beginning.
On a system that uses S3 with download redirects, the individual file downloads will be handled by S3 directly (with the exception of tabular files), without having to be proxied through the Dataverse application.
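A crawl of a single dataset might look like the sketch below. The wget options are standard ones; the server URL, the dataset id, and the function names are placeholders, and the exact set of options that works best may differ per installation.

```shell
# Sketch only: SERVER_URL and the dataset id are placeholders.
SERVER_URL="${SERVER_URL:-https://demo.dataverse.org}"

# URL of the crawlable directory index for a dataset (by database id).
dirindex_url() { echo "$SERVER_URL/api/datasets/$1/dirindex/"; }

# Recursively fetch every file in the dataset, resuming partial downloads.
#   -r                     follow the links in the index, including sub-folders
#   -c                     continue a transfer that was interrupted earlier
#   -e robots=off          do not let robots.txt stop the crawl
#   -nH                    do not create a directory named after the host
#   --content-disposition  name files from the Content-Disposition header
crawl_dataset() {
  wget -r -c -e robots=off -nH --content-disposition "$(dirindex_url "$1")"
}
```

Calling crawl_dataset 24 again after an interruption would pick up where the previous run stopped, which is the restart behavior the zipped download cannot offer.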
Restricted Files and DDI "dataDscr" Information (Summary Statistics, Variable Names, Variable Labels)
In previous releases, DDI "dataDscr" information (summary statistics, variable names, and variable labels, sometimes known as "variable metadata") for tabular files that were ingested successfully was available even if files were restricted. This has been changed in the following ways:
- At the dataset level, DDI exports no longer show "dataDscr" information for restricted files. There is only one version of this export and it is the version that's suitable for public consumption with the "dataDscr" information hidden for restricted files.
- Similarly, at the dataset level, the DDI HTML Codebook no longer shows "dataDscr" information for restricted files.
- At the file level, "dataDscr" information is no longer publicly available for restricted files. In practice, it was only possible to get this publicly via API (the download/access button was hidden).
- At the file level, "dataDscr" (variable metadata) information can still be downloaded for restricted files if you have access to download the file.
Search with Accented Characters
Many languages include characters that have close analogs in ASCII, e.g. (á, à, â, ç, é, è, ê, ë, í, ó, ö, ú, ù, û, ü…). This release changes the default Solr configuration to allow search to match words based on these associations, e.g. a search for Mercè would match the word Merce in a Dataset, and vice versa. This should generally be helpful, but can result in false positives, e.g. "canon" will be found searching for "cañon".
Java 11, PostgreSQL 13, and Solr 8 Support/Upgrades
Several of the core components of the Dataverse Software have been upgraded. Specifically:
- The Dataverse Software now runs on and requires Java 11. This provides performance and security enhancements, allows developers to take advantage of new and updated Java features, and moves the project to a platform with better longer-term support. This upgrade requires a few extra steps in the release process, outlined below.
- The Dataverse Software has now been tested with PostgreSQL versions up to 13. Versions 9.6+ will still work, but this update is necessary to support the software beyond PostgreSQL EOL later in 2021.
- The Dataverse Software now runs on Solr 8.8.1, the latest available stable release in the Solr 8.x series.
Saved Search Performance Improvements
A refactoring has greatly improved Saved Search performance in the application. If your installation has multiple, potentially long-running Saved Searches in place, this greatly improves the probability that those search jobs will complete without timing out.
Worldmap/Geoconnect Integration Now Obsolete
As of this release, the Geoconnect/Worldmap integration is no longer available. The Harvard University Worldmap is going through a migration process, and instead of updating this code to work with the new infrastructure, the decision was made to pursue future Geospatial exploration/analysis through other tools, following the External Tools Framework in the Dataverse Software.
Guides Updates
The Dataverse Software Guides have been updated to follow recent changes to how different terms are used across the Dataverse Project. For more information, see Mercè's note to the community:
https://groups.google.com/g/dataverse-community/c/pD-aFrpXMPo
Conditionally Required Metadata Fields
Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could be either optional or required, i.e. if required you must always have (at least one) value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author name.
In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.
Major Use Cases
Newly-supported major use cases in this release include:
- Dataverse Installation Administrators can now deactivate users using a new API. (Issue #2419, PR #7629)
- Superusers can remove all of a user's assigned roles using a new API. (Issue #2419, PR #7629)
- Superusers can use an API to gather more information about actions a user has taken in the system in order to make an informed decision about whether or not to deactivate or delete a user. (Issue #2419, PR #7629)
- Superusers will now be able to harvest from installations using ISO-639-3 language codes. (Issue #7638, PR #7690)
- Users interacting with the workflow system will receive status messages (Issue #7564, PR #7635)
- Users interacting with prepublication workflows will see speed improvements (Issue #7681, PR #7682)
- API Users will receive Dataverse collection API responses in a deterministic order. (Issue #7634, PR #7708)
- API Users will be able to access a list of crawlable URLs for file download, allowing for faster and easily resumable transfers. (Issue #7084, PR #7579)
- Users will no longer be able to access summary stats for restricted files. (Issue #7619, PR #7642)
- Users will now see truncated versions of long strings (primarily checksums) throughout the application (Issue #6685, PR #7312)
- Users will now be able to easily copy checksums, API tokens, and private URLs with a single click (Issue #6039, Issue #6685, PR #7539, PR #7312)
- Users uploading data through the Direct Upload API will now be able to use additional checksums (Issue #7600, PR #7602)
- Users searching for content will now be able to search using non-ascii characters. (Issue #820, PR #7378)
- Users can now replace files in draft datasets, a functionality previously only available on published datasets. (Issue #7149, PR #7337)
- Dataverse Installation Administrators can now set subfields of compound fields as conditionally required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name. (Issue #7606, PR #7608)
Notes for Dataverse Installation Administrators
Java 11 Upgrade
There are some things to note and keep in mind regarding the move to Java 11:
You should install the JDK/JRE following your usual methods, depending on your operating system. An example of this on a RHEL/CentOS 7 or RHEL/CentOS 8 system is:
$ sudo yum remove java-1.8.0-openjdk java-1.8.0-openjdk-devel java-1.8.0-openjdk-headless
$ sudo yum install java-11-openjdk-devel
The remove command may provide an error message if -headless isn't installed.
We targeted and tested Java 11, but 11+ will likely work. Java 11 was targeted because of its long term support.
If you're moving from a Dataverse installation that was previously running Glassfish 4.x (typically this would be Dataverse Software 4.x), you will need to adjust some JVM options in domain.xml as part of the upgrade process. We've provided these optional steps below. These steps are not required if your first installed Dataverse Software version was running Payara 5.x (typically Dataverse Software 5.x).
PostgreSQL Versions Up To 13 Supported
Up until this release our installation guide "strongly recommended" installing PostgreSQL v. 9.6. While that version is known to be very stable, it is nearing its end-of-life (in Nov. 2021). Dataverse Software has now been tested with versions up to 13. If you decide to upgrade PostgreSQL, the tested and recommended way of doing that is as follows:
- Export your current database with pg_dumpall;
- Install the new version of PostgreSQL (make sure it's running on the same port, etc. so that no changes are needed in the Payara configuration);
- Re-import the database with psql, as the postgres user.
Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.
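The export and re-import steps above could be sketched as two small functions. This is only an illustration: DUMP_FILE and the function names are placeholders, both functions are meant to be run as the postgres user, and the installation of the new PostgreSQL version between them is left to your usual package tooling.

```shell
# Sketch only: DUMP_FILE is a placeholder path; run these as the postgres user.
DUMP_FILE="${DUMP_FILE:-/tmp/pg_dumpall.sql}"

# Before the upgrade: export every database, plus roles, from the old server.
export_cluster() { pg_dumpall > "$DUMP_FILE"; }

# After installing the new version: replay the dump into the fresh cluster.
import_cluster() { psql -d postgres -f "$DUMP_FILE"; }
```

Keeping the dump file until the upgraded installation has been verified leaves a simple way back to the old server.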
Solr Upgrade
With this release we upgrade to the latest available stable release in the Solr 8.x branch. We recommend a fresh installation of Solr 8.8.1 (the index will be empty) followed by an "index all".
Before you start the "index all", the Dataverse installation will appear to be empty because the search results come from Solr. As indexing progresses, partial results will appear until indexing is complete.
See http://guides.dataverse.org/en/5.4/installation/prerequisites.html#installing-solr for more information.
Managing Conditionally Required Metadata Fields
Prior to this release, when defining metadata for compound fields (via their dataset field types), fields could be either optional or required, i.e. if required you must always have (at least one) value for that field. For example, Author Name being required means you must have at least one Author with a nonempty Author name.
In order to support more robust metadata (and specifically to resolve #7551), we need to allow a third case: Conditionally Required, that is, the field is required if and only if any of its "sibling" fields are entered. For example, Producer Name is now conditionally required in the citation metadata block. A user does not have to enter a Producer, but if they do, they have to enter a Producer Name.
This change required some modifications to how "required" is defined in the metadata .tsv files (for compound fields).
Prior to this release, the value of required for the parent compound field did not matter and so was set to false.
Going forward:
- For optional, the parent compound field would be required = false and all children would be required = false.
- For required, the parent compound field would be required = true and at least one child would be required = true.
- For conditionally required, the parent compound field would be required = false and at least one child would be required = true.
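The three cases above reduce to a lookup on two values, which the following sketch makes explicit. The function name is hypothetical, and the second argument is a simplification: it stands for "at least one child has required = true" rather than for any single column in the .tsv file.

```shell
# Classify a compound field from the "required" values in its .tsv rows.
#   $1 = required value of the parent compound field (TRUE/FALSE)
#   $2 = TRUE if at least one child field is required, otherwise FALSE
compound_requirement() {
  parent=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  child=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  case "$parent/$child" in
    FALSE/FALSE) echo "optional" ;;
    TRUE/TRUE)   echo "required" ;;
    FALSE/TRUE)  echo "conditionally required" ;;
    *)           echo "not covered by the rules above" ;;
  esac
}
```

For example, compound_requirement false true prints "conditionally required", which is the Producer / Producer Name situation in the citation block.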
This release updates the citation .tsv file that is distributed with the software for the required parent compound fields (e.g. author), as well as sets Producer Name to be conditionally required. No other distributed .tsv files were updated, as they did not have any required compound values.
If you have created any custom metadata .tsv files, you will need to make the same (type of) changes there.
Citation Metadata Block Update
Due to the changes for Conditionally Required Metadata Fields, and a minor update in the citation metadata block to support extra ISO-639-3 language codes, a block upgrade is required. Instructions are provided below.
Retroactively Store Original File Size
Beginning in Dataverse Software 4.10, the size of the saved original file (for an ingested tabular datafile) was stored in the database. For files added before this change, we provide an API that retrieves and permanently stores the sizes for any already existing saved originals. See Datafile Integrity API for more information.
This was documented as a step in previous release notes, but we are noting it in these release notes to give it more visibility.
DB Cleanup for Saved Searches
A previous version of the Dataverse Software changed the indexing logic so that when a user links a Dataverse collection, its children are also indexed as linked. This means that the children do not need to be separately linked, and in this version we removed the logic that creates a saved search to create those links when a Dataverse collection is linked.
We recommend cleaning up the db to a) remove these saved searches and b) remove the links for the objects. We can do this via a few queries, which are available in the folder here:
https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7398/
There are four sets of queries available, and they should be run in this order:
- ssfordeletion.txt to identify the Saved Searches to be deleted
- delete_ss.txt to delete the Saved Searches identified in the previous query
- dldfordeletion.txt to identify the linked datasets and Dataverse collections to be deleted
- delete_dld.txt to delete the linked datasets and Dataverse collections identified in the previous queries
Note: removing these saved searches and links should not affect what users will see as linked, due to the aforementioned indexing change. Likewise, leaving them in place should not affect anything; removing them simply cleans up unnecessary rows in the database.
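One way to work through the four query files in order is sketched below. It assumes that each file contains plain SQL that psql can execute, which should be confirmed by reading the files first; DB_NAME and the function names are placeholders.

```shell
# Sketch only: DB_NAME is a placeholder; file names and folder URL come from the text above.
BASE_URL="https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7398"
DB_NAME="${DB_NAME:-dvndb}"

# The four query files, in the order they should be run.
cleanup_files() {
  printf '%s\n' ssfordeletion.txt delete_ss.txt dldfordeletion.txt delete_dld.txt
}

# Download all four files into the current directory.
fetch_cleanup_queries() {
  for f in $(cleanup_files); do
    wget -q "$BASE_URL/$f" || return 1
  done
}

# Run a single file, so the "identify" output can be reviewed before any delete.
run_cleanup_query() {
  psql -d "$DB_NAME" -f "$1"
}
```

Running the files one at a time, rather than in a loop, leaves room to inspect what ssfordeletion.txt and dldfordeletion.txt report before the corresponding delete file is applied.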
DB Cleanup for Superusers Releasing without Version Updates
In datasets where a superuser has run the Curate command and the update included a change to the fileaccessrequest flag, those changes would not be reflected appropriately in the published version. This should be a rare occurrence.
Instead of an automated solution, we recommend inspecting the affected datasets and correcting the fileaccessrequest flag as appropriate. You can identify the affected datasets via a query, which is available in the folder here:
https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7687/
New JVM Options and Database Settings
For installations that were previously running on Dataverse Software 4.x, a number of new JVM options need to be added as part of the upgrade. The JVM Options are enumerated in the detailed upgrade instructions below.
Two new Database settings were added:
- :InstallationName
- :ExportInstallationAsDistributorOnlyWhenNotSet
For an overview of these new options, please see the Installation Guide
Notes for Tool Developers and Integrators
UTF-8 Characters and Spaces in File Names
UTF-8 characters in filenames are now preserved when downloaded.
Dataverse installations will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.4+.
Note that this follows a change from 5.1 that only corrected this for installations running with S3 storage. This makes the behavior consistent across installations running all types of file storage.
Complete List of Changes
For the complete list of code changes in this release, see the 5.4 Milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse Software 4.x to Dataverse Software 5 following the instructions in the Dataverse Software 5 Release Notes. After upgrading from the 4.x series to 5.0, you should progress through the other 5.x releases before attempting the upgrade to 5.4.
1. Upgrade to Java 11.
2. Upgrade to Solr 8.8.1.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo to change to that user first. For example, sudo -i -u dataverse if dataverse is your dedicated application user.
In the following commands we assume that Payara 5 is installed in /usr/local/payara5. If not, adjust as needed.
export PAYARA=/usr/local/payara5
(or setenv PAYARA /usr/local/payara5 if you are using a csh-like shell)
3. Undeploy the previous version.
$PAYARA/bin/asadmin list-applications
$PAYARA/bin/asadmin undeploy dataverse<-version>
4. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
5. (only required for installations previously running Dataverse Software 4.x!) In other words, if you have a domain.xml that originated under Glassfish 4, the below JVM Options need to be added. If your Dataverse installation was first installed on the 5.x series, these JVM options should already be present.
In domain.xml:
Remove the following JVM options from the <config name="server-config"><java-config> section:
<jvm-options>-Djava.endorsed.dirs=/usr/local/payara5/glassfish/modules/endorsed:/usr/local/payara5/glassfish/lib/endorsed</jvm-options>
<jvm-options>-Djava.ext.dirs=${com.sun.aas.javaRoot}/lib/ext${path.separator}${com.sun.aas.javaRoot}/jre/lib/ext${path.separator}${com.sun.aas.instanceRoot}/lib/ext</jvm-options>
Add the following JVM options to the <config name="server-config"><java-config> section:
<jvm-options>[9|]--add-opens=java.base/jdk.internal.loader=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/java.lang=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/java.net=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/java.nio=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/java.util=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/sun.nio.ch=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.management/sun.management=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/sun.net.www.protocol.jrt=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.base/sun.net.www.protocol.jar=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.naming/javax.naming.spi=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED</jvm-options>
<jvm-options>[9|]--add-opens=java.logging/java.util.logging=ALL-UNNAMED</jvm-options>
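After editing, a quick check of domain.xml can confirm that the two obsolete options are gone and that the new ones are present. The sketch below assumes the default domain1 location under /usr/local/payara5; the function name is hypothetical, and the count covers every config section in the file, not only server-config.

```shell
# Sketch only: the default path assumes Payara in /usr/local/payara5 with domain1.
DOMAIN_XML="${DOMAIN_XML:-/usr/local/payara5/glassfish/domains/domain1/config/domain.xml}"

# Report whether the obsolete options remain and how many [9|] module options exist.
check_jvm_options() {
  file="${1:-$DOMAIN_XML}"
  if grep -q -e '-Djava.endorsed.dirs=' -e '-Djava.ext.dirs=' "$file"; then
    echo "obsolete options still present"
  else
    echo "obsolete options removed"
  fi
  # -F treats the pattern literally, so the brackets and pipe are not regex syntax.
  echo "module options found: $(grep -c -F '[9|]--add-' "$file")"
}
```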
6. Start Payara
service payara start
7. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-5.4.war
8. Restart Payara
service payara stop
service payara start
9. Reload Citation Metadata Block:
wget https://github.com/IQSS/dataverse/releases/download/v5.4/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
Additional Release Steps
1. Confirm that the schema.xml was updated with the new v5.4 version when you updated Solr.
2. Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration.
For example: (modify the path names as needed)
cd /usr/local/solr-8.8.1/server/solr/collection1/conf
wget https://github.com/IQSS/dataverse/releases/download/v5.4/updateSchemaMDB.sh
chmod +x updateSchemaMDB.sh
./updateSchemaMDB.sh -t .
See https://guides.dataverse.org/en/5.4/admin/metadatacustomization.html#updating-the-solr-schema for more information.
3. Do a clean reindex by first clearing then indexing. Re-indexing is required to get full functionality from this change. Please refer to the guides on how to clear and index if needed.
4. Upgrade Postgres.
- Export your current database with pg_dumpall;
- Install the new version of PostgreSQL (make sure it's running on the same port, etc. so that no changes are needed in the Payara configuration);
- Re-import the database with psql, as the postgres user.
Consult the PostgreSQL upgrade documentation for more information, for example https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL.
5. Retroactively store original file size
Use the Datafile Integrity API to ensure that the sizes of all original files are stored in the database.
6. DB Cleanup for Superusers Releasing without Version Updates
In datasets where a superuser has run the Curate command and the update included a change to the fileaccessrequest flag, those changes would not be reflected appropriately in the published version. This should be a rare occurrence.
Instead of an automated solution, we recommend inspecting the affected datasets and correcting the fileaccessrequest flag as appropriate. You can identify the affected datasets via a query, which is available in the folder here:
https://github.com/IQSS/dataverse/raw/develop/scripts/issues/7687/
7. (Optional, but recommended) DB Cleanup for Saved Searches and Linked Objects
Perform the DB Cleanup for Saved Searches and Linked Objects, summarized in the "Notes for Dataverse Installation Administrators" section above.
8. Take a backup of the Worldmap links, if any.
9. (Only required if custom metadata blocks are used in your Dataverse installation) Update any custom metadata blocks:
In the .tsv for any custom metadata blocks, for any subfield that has a required value of TRUE, find the corresponding parent field and change its required value to TRUE.
Note: As there is an accompanying Flyway script that updates the values directly in the database, you do not need to reload these metadata .tsv files via API, unless you make additional changes, e.g. set some compound fields to be conditionally required.
Published by kcondon about 5 years ago
dataverse - v5.3
Dataverse 5.3
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Auxiliary Files (Experimental)
Auxiliary files can now be added to datafiles and accessed using new experimental API endpoints. These endpoints allow additional, non-Dataverse generated metadata to be added alongside datafiles in Dataverse.
The support for auxiliary files in Dataverse is being driven by integration with the Open Differential Privacy (DP) Project and is designed to support the deposit and retrieval of differentially private metadata, but the endpoints are not specific to differential privacy use cases.
Additional Banner Functionality
Banners in Dataverse can now be set to allow dismissal by a logged in user. Previously, banners would persist until they were removed by an administrator. This allows administrators to more easily communicate one-time messages to users.
File Tags Searchable from Advanced Search and Dataset Search
File tags ("Documentation", "Data", "Code", etc.) now appear on the Advanced Search page.
Performing a search for files on the dataset page now includes file tags. Previously, only file name and file description were searched.
Easier Configuration of Database Connections
Previously, the configuration of the database connections has been quite static and not very easy to update. This has been an issue especially for cloud and container usage. Using new technologies provided by the move to Payara, you can now more easily configure the connection to your PostgreSQL DB.
Using MicroProfile Config API (Issue #7000, Issue #7418), you can much more easily specify configuration details. For an overview of supported options, please see the Installation Guide.
Note that some settings have been moved from domain.xml to code, such as min and max pool size.
Major Use Cases
Newly-supported use cases in this release include:
- Users can use an API to add auxiliary files to files in order to provide metadata representations for specific tools or integrations (Issue #7275, PR #7350)
- Administrators can use a new API to manage banner messages and take advantage of new banner display options (Issue #7263, PR #7434)
- Users replacing files will now have their files renamed when a file name conflict exists, making the behavior consistent with upload and edit (Issue #7335, PR #7336)
- Users will now be able to search on file tags on the advanced search and dataset pages (Issue #7194, PR #7385)
Notes for Dataverse Installation Administrators
Payara 5.2020.6 (or Higher) Required
Some changes in this release require an upgrade to Payara 5.2020.6 or higher.
Instructions on how to update can be found in the Payara documentation
New Banner API, Obsolete DB Settings
The functionality previously provided by the DB settings :StatusMessageHeader and :StatusMessageText is no longer supported and is now provided through the Manage Banner Messages API. Learn more in the API Guide.
New Database Settings and JVM Options
Several new JVM options have been added in this release:
- dataverse.db.name
- dataverse.db.user
- dataverse.db.password
- dataverse.db.host
- dataverse.db.port
For an overview of these new options, please see the Installation Guide
See above note about obsolete DB options.
Introducing MicroProfile Config API
With this Dataverse release, Dataverse Administrators can start to make use of the MicroProfile Config API.
This will benefit both developers and sysadmins, but the codebase will have to be refactored to make use of it. As this will take time, we will always provide a backward compatible way of using it.
For more details about these new options, please see the Consuming Configuration section of the Developer Guide.
Java Message System Configuration
The Ingest process uses the Java Message System to create ingest tasks in a queue. That queue had been configured from command line or domain.xml before. This has now changed to being done in code.
In the unlikely case you might want to change any of these settings, feel free to change and recompile or raise an issue on Github. See IngestQueueProducer for more details.
If you want to clean up your existing installation, you can delete the old, unused queue like this:
<payara install path>/bin/asadmin delete-connector-connection-pool --cascade=true jms IngestQueueConnectionFactoryPool
Notes for Tool Developers and Integrators
Experimental Auxiliary File Support
Experimental endpoints have been added to allow auxiliary files to be added to datafiles. These auxiliary files can be deposited and accessed via API. Later releases will include options for accessing these files through the UI. For more information, see the Auxiliary File Support section of the Developer Guide.
Complete List of Changes
For the complete list of code changes in this release, see the 5.3 Milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.
1. Upgrade to Payara 5.2020.6 or higher.
Instructions on how to update can be found in the Payara documentation.
It would likely be safer to upgrade Payara first, while still running Dataverse 5.2, and then proceed with the steps below. Upgrading from an earlier version of Payara should be a straightforward process: Undeploy Dataverse; stop Payara; move the current Payara directory out of the way; unzip the new Payara version in its place; replace the brand new payara/glassfish/domains/domain1 with your old, preserved domain1; start Payara, deploy Dataverse 5.2. We still recommend that you read the detailed upgrade instructions above; and, if you run into any issues with this upgrade, it will help to be able to separate them from any problems with the upgrade of Dataverse proper.
If you are still using a pre-5.0 version of Dataverse and Glassfish version 4, please follow the upgrade instructions in the Dataverse 5.0 release notes, but use the latest version of Payara 5 (5.2020.7, as of this writing).
2. Undeploy the previous version.
<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>
(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)
3. Update your database connection.
Please configure your connection details, replacing all the ${DB_...}.
<payara install path>/bin/asadmin create-system-properties "dataverse.db.user=${DB_USER}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.host=${DB_HOST}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.port=${DB_PORT}"
<payara install path>/bin/asadmin create-system-properties "dataverse.db.name=${DB_NAME}"
echo "AS_ADMIN_ALIASPASSWORD=${DB_PASS}" > /tmp/password.txt
<payara install path>/bin/asadmin create-password-alias --passwordfile /tmp/password.txt dataverse.db.password
rm /tmp/password.txt
4. In domain.xml, verify that the __TimerPool jdbc-connection-pool is using the H2 database, as follows (if you have the old Derby version from Glassfish 4, replace it):
<jdbc-connection-pool datasource-classname="org.h2.jdbcx.JdbcDataSource" name="__TimerPool" res-type="javax.sql.XADataSource">
  <property name="URL" value="jdbc:h2:${com.sun.aas.instanceRoot}/lib/databases/ejbtimer;AUTO_SERVER=TRUE"></property>
</jdbc-connection-pool>
5. Reset the EJB timer database back to default:
<payara install path>/bin/asadmin set configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/__TimerPool
6. Delete the old password alias and DB pool:
<payara install path>/bin/asadmin delete-jdbc-connection-pool --cascade=true dvnDbPool
<payara install path>/bin/asadmin delete-password-alias db_password_alias
7. Stop payara, remove the generated and ejbtimer database directories, then restart.
service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
rm -rf <payara install path>/glassfish/domains/domain1/lib/databases/ejbtimer
service payara start
8. Deploy this version.
<payara install path>/bin/asadmin deploy dataverse-5.3.war
9. Restart Payara
service payara stop
service payara start
Published by kcondon over 5 years ago
dataverse - v5.2
Dataverse 5.2
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
File Preview When Guestbooks or Terms Exist
Previously, file preview was only available when files were publicly downloadable. Now if a guestbook or terms (or both) are configured for the dataset, they will be shown in the Preview tab and once they are agreed to, the file preview will appear (#6919).
Preview Only External Tools
A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919).
Dataset Page Edit Options Consolidation
As part of the continued effort to redesign the Dataset and File pages, some of the edit options for a file on the dataset page are being moved to a "kebab" to allow for better consistency and future scalability.
Google Cloud Archiver
Dataverse Bags can now be sent to a bucket in Google Cloud, including those in the "Coldline" storage class, which provides less expensive but slower access.
Major Use Cases
Newly-supported use cases in this release include:
- Users can now preview files that have a guestbook or terms. (Issue #6919, PR #7369)
- External tool developers can indicate that their tool is "preview only". (Issue #6919, PR #7369)
- Dataverse Administrators can set up a regular export to Google Cloud so that the installation's data is preserved (Issue #7140, PR #7292)
- Dataverse Administrators can use a regex when defining a group (Issue #7344, PR #7351)
- External Tool Developers can use a new API endpoint to retrieve a user's information (Issue #7307, PR #7345)
Notes for Dataverse Installation Administrators
Converting Explore External Tools to Preview Only
When the war file is deployed, a SQL migration script will convert dataverse-previewers to have both "explore" and "preview" types so that they will continue to be displayed in the Preview tab.
If you would prefer that these tools be preview only, you can delete the tools, adjust the JSON manifests (changing "explore" to "preview"), and re-add them.
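As a rough sketch of the manifest adjustment, assuming a tool manifest saved in a local file (the file name `tool.json` and its contents below are hypothetical, and real previewer manifests contain more fields), the single `type` value could be rewritten with sed before the tool is re-added:

```shell
#!/bin/sh
# Hypothetical manifest for illustration only; real manifests come from the
# dataverse-previewers project and contain more fields.
workdir=$(mktemp -d)
cat > "$workdir/tool.json" <<'EOF'
{
  "displayName": "Example Previewer",
  "type": "explore",
  "contentType": "text/plain"
}
EOF

# Keep a backup, then change the tool type from "explore" to "preview".
cp "$workdir/tool.json" "$workdir/tool.json.bak"
sed 's/"type"[[:space:]]*:[[:space:]]*"explore"/"type": "preview"/' \
    "$workdir/tool.json.bak" > "$workdir/tool.json"

# Show the adjusted manifest for review.
cat "$workdir/tool.json"
```

The backup copy makes it easy to compare the manifest before and after the change; deleting and re-adding the tool itself is covered in the Admin Guide.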
New Database Settings and JVM Options
Installations integrating with Google Cloud Archiver will need to use two new database settings:
- :GoogleCloudProject - the name of the project managing the bucket
- :GoogleCloudBucket - the name of the bucket to use
For more information, see the Google Cloud Configuration section of the Installation Guide.
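A minimal sketch of storing these two settings through the admin settings API, assuming Dataverse is reachable at `http://localhost:8080`; the project and bucket names are placeholders, and `setting_cmd` is a helper made up for this example. It only prints the commands so they can be reviewed before being run:

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own project and bucket.
SERVER_URL="http://localhost:8080"
GOOGLE_CLOUD_PROJECT="my-project"        # hypothetical project name
GOOGLE_CLOUD_BUCKET="my-archive-bucket"  # hypothetical bucket name

# Print the curl command that stores one database setting.
setting_cmd() {
  printf 'curl -X PUT -d %s %s/api/admin/settings/%s\n' "$2" "$SERVER_URL" "$1"
}

# Review the printed commands, then run them on the Dataverse server.
setting_cmd ":GoogleCloudProject" "$GOOGLE_CLOUD_PROJECT"
setting_cmd ":GoogleCloudBucket" "$GOOGLE_CLOUD_BUCKET"
```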
Automation of Make Data Count Scripts
Scripts have been added in order to automate Make Data Count processing. For more information, see the Make Data Count section of the Admin Guide.
Notes for Tool Developers and Integrators
Preview Only External Tools, "hasPreviewMode"
A new external tool type called "preview" has been added that prevents the tool from being displayed under "Explore Options" under the "Access File" button on the file landing page (#6919). This "preview" type replaces "hasPreviewMode", which has been removed.
Multiple Types for External Tools
External tools now support multiple types. In practice, combining the types "explore" and "preview" is the only case that makes a difference in the UI compared to having just one or the other (see "preview only" above). Multiple types are specified in the JSON manifest with an array in "types". The older, single "type" is still supported but should be considered deprecated.
User Information Endpoint
A new API endpoint retrieves a user's information so that tools can email users if needed.
Complete List of Changes
For the complete list of code changes in this release, see the 5.2 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.
1. Undeploy the previous version.
<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>
(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)
2. Stop payara, remove the generated directory, then start payara.
service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start
3. Deploy this version.
<payara install path>/bin/asadmin deploy dataverse-5.2.war
4. Restart payara
service payara stop
service payara start
Published by kcondon over 5 years ago
dataverse - v5.1.1
Dataverse 5.1.1
This minor release adds important scaling improvements for installations running on AWS S3. It is recommended that 5.1.1 be used in production instead of 5.1.
Release Highlights
Connection Pool Size Configuration Option, Connection Optimizations
Dataverse 5.1 improved the efficiency of making S3 connections through use of an http connection pool. This release adds optimizations around closing streams and channels that may hold S3 http connections open and exhaust the connection pool. In parallel, this release increases the default pool size from 50 to 256 and adds the ability to increase the size of the connection pool, so a larger pool can be configured if needed.
Major Use Cases
Newly-supported use cases in this release include:
- Administrators of installations using S3 will be able to define the connection pool size, allowing better resource scaling for larger installations (Issue #7309, PR #7313)
Notes for Dataverse Installation Administrators
5.1.1 vs. 5.1 for Production Use
As mentioned above, we encourage 5.1.1 instead of 5.1 for production use.
New JVM Option for Connection Pool Size
Larger installations may want to increase the number of open S3 connections allowed (default is 256). For example, to set the value to 4096:
./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"
(where <id> is the identifier of your S3 file store, likely "s3"). The JVM Options section of the Configuration Guide has more information.
Complete List of Changes
For the complete list of code changes in this release, see the 5.1.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
These instructions assume that you've already successfully upgraded to Dataverse 5.1 following the instructions in the Dataverse 5.1 Release Notes.
1. Undeploy the previous version.
<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>
2. Stop payara, remove the generated directory, then start payara.
service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start
3. Deploy this version.
<payara install path>/bin/asadmin deploy dataverse-5.1.1.war
4. Restart payara.
Published by kcondon over 5 years ago
dataverse - Dataverse 5.1
Dataverse 5.1
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Large File Upload for Installations Using AWS S3
The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB.
Dataset-Specific Stores
In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.
Major Use Cases
Newly-supported use cases in this release include:
- Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
- Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
- Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
- Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files has duplicate names (Issue #80, PR #7276)
- Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
- Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause stale search results to not load. (Issue #4225, PR #7211)
Notes for Dataverse Installation Administrators
New API for setting a Dataset-level Store
- This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the Admin Guide.
Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload
Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the Developers Guide.
While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).
New APIs for keeping Solr records in sync
This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the Admin Guide.
Documentation for Purging the Ingest Queue
At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the Admin Guide now has specific steps.
Biomedical Metadata Block Updated
The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this in your installation.
Notes for Tool Developers and Integrators
Spaces in File Names
Dataverse Installations using S3 storage will no longer replace spaces in file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.
Complete List of Changes
For the complete list of code changes in this release, see the 5.1 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the Dataverse 5 Release Notes.
1. Undeploy the previous version.
<payara install path>/bin/asadmin list-applications
<payara install path>/bin/asadmin undeploy dataverse<-version>
(where <payara install path> is where Payara 5 is installed, for example: /usr/local/payara5)
2. Stop payara, remove the generated directory, then start payara.
service payara stop
rm -rf <payara install path>/glassfish/domains/domain1/generated
service payara start
3. Deploy this version.
<payara install path>/bin/asadmin deploy dataverse-5.1.war
4. Restart payara.
Additional Upgrade Steps
- Update Biomedical Metadata Block (if used), Reload Solr, ReExportAll
wget https://github.com/IQSS/dataverse/releases/download/v5.1/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"
Check if your Solr installation is running with the latest schema.xml config file (https://github.com/IQSS/dataverse/releases/download/v5.1/schema.xml), update if needed.
Run the script updateSchemaMDB.sh to generate updated solr schema files and preserve any other custom fields in your Solr configuration. For example (modify the path names as needed):
cd /usr/local/solr-7.7.2/server/solr/collection1/conf
wget https://github.com/IQSS/dataverse/releases/download/v5.1/updateSchemaMDB.sh
chmod +x updateSchemaMDB.sh
./updateSchemaMDB.sh -t .
See http://guides.dataverse.org/en/5.1/admin/metadatacustomization.html?highlight=updateschemamdb for more information.
Run ReExportall to update JSON Exports:
http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api
Published by kcondon over 5 years ago
dataverse - Dataverse 5.0
Dataverse 5.0
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Please note that this is a major release and these are long release notes. We offer no apologies. :)
Release Highlights
Continued Dataset and File Redesign: Dataset and File Button Redesign, Responsive Layout
The buttons available on the Dataset and File pages have been redesigned. This change is to provide more scalability for future expanded options for data access and exploration, and to provide a consistent experience between the two pages. The dataset and file pages have also been redesigned to be more responsive and function better across multiple devices.
This is an important step in the incremental process of the Dataset and File Redesign project, following the release of on-page previews, filtering and sorting options, tree view, and other enhancements. Additional features in support of these redesign efforts will follow in later 5.x releases.
Payara 5
A major upgrade of the application server provides security updates, access to new features like MicroProfile Config API, and will enable upgrades to other core technologies.
Note that moving from Glassfish to Payara will be required as part of the move to Dataverse 5.
Download Dataset
Users can now more easily download all files in a dataset through both the UI and API. If this causes server instability, it's suggested that Dataverse Installation Administrators take advantage of the new Standalone Zipper Service described below.
Download All Option on the Dataset Page
In previous versions of Dataverse, downloading all files from a dataset meant several clicks to select files and initiate the download. The Dataset Page now includes a Download All option for both the original and archival formats of the files in a dataset under the "Access Dataset" button.
Download All Files in a Dataset by API
In previous versions of Dataverse, downloading all files from a dataset via API was a two step process:
- Find all the database ids of the files.
- Download all the files, using those ids (comma-separated).
Now you can download all files from a dataset (assuming you have access to them) via API by passing the dataset persistent ID (PID such as DOI or Handle) or the dataset's database id. Versions are also supported, and you can pass :draft, :latest, :latest-published, or numbers (1.1, 2.0) similar to the "download metadata" API.
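As an illustration, the URLs for such a request could be assembled as below. This is a sketch that assumes the `/api/access/dataset/:persistentId` paths documented in the Data Access API guide; the server address and the DOI are placeholders, and `dataset_download_url` is a helper invented for this example:

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own.
SERVER_URL="http://localhost:8080"
PERSISTENT_ID="doi:10.5072/FK2/EXAMPLE"   # hypothetical PID

# Build the download URL for a dataset, optionally for a specific version
# (:draft, :latest, :latest-published, or a number such as 1.1).
dataset_download_url() {
  pid=$1
  version=$2
  if [ -n "$version" ]; then
    printf '%s/api/access/dataset/:persistentId/versions/%s?persistentId=%s\n' \
      "$SERVER_URL" "$version" "$pid"
  else
    printf '%s/api/access/dataset/:persistentId/?persistentId=%s\n' \
      "$SERVER_URL" "$pid"
  fi
}

dataset_download_url "$PERSISTENT_ID"
dataset_download_url "$PERSISTENT_ID" ":draft"
```

A printed URL can then be passed to something like `curl -L -O -J -H "X-Dataverse-key:$API_TOKEN" "<url>"`, where the API token is only needed for files that are not publicly downloadable.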
A Multi-File, Zipped Download Optimization
In this release we are offering an experimental optimization for the multi-file, download-as-zip functionality. If this option is enabled, instead of enforcing size limits, we attempt to serve all the files that the user requested (that they are authorized to download), but the request is redirected to a standalone zipper service running as a cgi executable. This moves these potentially long-running jobs completely outside the Application Server (Payara) and prevents service threads from becoming locked while serving them. Since zipping is also a CPU-intensive task, it is possible to have this service running on a different host system, thus freeing the cycles on the main Application Server. The system running the service needs to have access to the database as well as to the storage filesystem and/or S3 bucket.
Please consult the scripts/zipdownload/README.md in the Dataverse 5 source tree.
The components of the standalone "zipper tool" can also be downloaded here:
https://github.com/IQSS/dataverse/releases/download/v5.0/zipper.zip
Updated File Handling
Files without extensions can now be uploaded through the UI. This release also changes the way Dataverse handles duplicate (filename or checksum) files in a dataset. Specifically:
- Files with the same checksum can be included in a dataset, even if the files are in the same directory.
- Files with the same filename can be included in a dataset as long as the files are in different directories.
- If a user uploads a file to a directory where a file already exists with that directory/filename combination, Dataverse will adjust the file path and names by adding "-1" or "-2" as applicable. This change will be visible in the list of files being uploaded.
- If the directory or name of an existing or newly uploaded file is edited in such a way that would create a directory/filename combination that already exists, Dataverse will display an error.
- If a user attempts to replace a file with another file that has the same checksum, an error message will be displayed and the file will not be able to be replaced.
- If a user attempts to replace a file with a file that has the same checksum as a different file in the dataset, a warning will be displayed.
Pre-Publish DOI Reservation with DataCite
Dataverse installations using DataCite will be able to reserve the persistent identifiers for datasets with DataCite ahead of publishing time. This allows the DOI to be reserved earlier in the data sharing process and makes the step of publishing datasets simpler and less error-prone.
Primefaces 8
Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.
Major Use Cases
Newly-supported use cases in this release include:
- Users will be presented with a new workflow around dataset and file access and exploration. (Issue #6684, PR #6909)
- Users will experience a UI appropriate across a variety of device sizes. (Issue #6684, PR #6909)
- Users will be able to download an entire dataset without needing to select all the files in that dataset. (Issue #6564, PR #6262)
- Users will be able to download all files in a dataset with a single API call. (Issue #4529, PR #7086)
- Users will have DOIs reserved for their datasets upon dataset create instead of at publish time. (Issue #5093, PR #6901)
- Users will be able to upload files without extensions. (Issue #6634, PR #6804)
- Users will be able to upload files with the same name in a dataset, as long as those files are in different file paths. (Issue #4813, PR #6924)
- Users will be able to upload files with the same checksum in a dataset. (Issue #4813, PR #6924)
- Users will be less likely to encounter locks during the publishing process due to PID providers being unavailable. (Issue #6918, PR #7118)
- Users will now have their files validated during publish, and in the unlikely event that anything has happened to the files between deposit and publish, they will be able to take corrective action. (Issue #6558, PR #6790)
- Administrators will likely see more success with Harvesting, as many minor harvesting issues have been resolved. (Issues #7127, #7128, #4597, #7056, #7052, #7023, #7009, and #7003)
- Administrators can now enable an external zip service that frees up application server resources and allows the zip download limit to be increased. (Issue #6505, PR #6986)
- Administrators can now create groups based on users' email domains. (Issue #6936, PR #6974)
- Administrators can now set date facets to be organized chronologically. (Issue #4977, PR #6958)
- Administrators can now link harvested datasets using an API. (Issue #5886, PR #6935)
- Administrators can now destroy datasets with mapped shapefiles. (Issue #4093, PR #6860)
Notes for Dataverse Installation Administrators
Glassfish to Payara
This upgrade requires a few extra steps. See the detailed upgrade instructions below.
Dataverse Installations Using DataCite: Upgrade Action Required
If you are using DataCite as your DOI provider you must add a new JVM option called "doi.dataciterestapiurlstring" with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. More information about this JVM option can be found in the Installation Guide.
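A minimal sketch of adding this option with asadmin, assuming Payara 5 is installed in `/usr/local/payara5`. The helper `datacite_api_url` is made up for this example, and the script only prints the command for review. The colon in the URL is escaped because asadmin treats `:` as a separator inside JVM options:

```shell
#!/bin/sh
# Assumption: Payara 5 lives in /usr/local/payara5; adjust as needed.
PAYARA="${PAYARA:-/usr/local/payara5}"

# Pick the DataCite REST API URL for the environment ("production" or "test").
datacite_api_url() {
  case "$1" in
    production) echo "https://api.datacite.org" ;;
    test)       echo "https://api.test.datacite.org" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Escape ":" so that asadmin does not split the option at the colon.
escaped_url=$(datacite_api_url test | sed 's/:/\\:/g')

# Print the command for review instead of running it.
printf '%s\n' "$PAYARA/bin/asadmin create-jvm-options '-Ddoi.dataciterestapiurlstring=$escaped_url'"
```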
"doi.mdcbaseurlstring" should be deleted if it was previously set.
Dataverse Installations Using DataCite: Upgrade Action Recommended
For installations that are using DataCite, Dataverse v5.0 introduces a change in the process of registering the Persistent Identifier (DOI) for a dataset. Instead of registering it when the dataset is published for the first time, Dataverse will try to "reserve" the DOI when it's created (by registering it as a "draft", using DataCite terminology). When the user publishes the dataset, the DOI will be publicized as well (by switching the registration status to "findable"). This approach makes the process of publishing datasets simpler and less error-prone.
New APIs have been provided for finding any unreserved DataCite-issued DOIs in your Dataverse, and for reserving them (see below). While not required - the user can still attempt to publish a dataset with an unreserved DOI - having all the identifiers reserved ahead of time is recommended. If you are upgrading an installation that uses DataCite, we specifically recommend that you reserve the DOIs for all your pre-existing unpublished drafts as soon as Dataverse v5.0 is deployed, since none of them were registered at create time. This can be done using the following API calls:
- /api/pids/unreserved will report the ids of the datasets
- /api/pids/:persistentId/reserve reserves the assigned DOI with DataCite (will need to be run on every id reported by the first API).
See the Native API Guide for more information.
Scripted, the whole process would look as follows (adjust as needed):
```
API_TOKEN='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

# the API outputs JSON; note the use of jq to parse it:
curl -s -H "X-Dataverse-key:$API_TOKEN" http://localhost:8080/api/pids/unreserved |
jq '.data.count[].pid' | tr -d '"' |
while read doi
do
   curl -s -H "X-Dataverse-key:$API_TOKEN" -X POST "http://localhost:8080/api/pids/:persistentId/reserve?persistentId=$doi"
done
```
Going forward, once all the DOIs have been reserved for the legacy drafts, you may still get an occasional dataset with an unreserved identifier. DataCite service instability would be a potential cause. There is no reason to expect that to happen often, but it is not impossible. You may consider running the script above (perhaps with some extra diagnostics added) regularly, from a cron job or otherwise, to address this preemptively.
Terms of Use Display Updates
In this release we’ve fixed an issue that would cause the Application Terms of Use to not display when the user's language is set to a language that does not match one of the languages for which terms were created and registered for that Dataverse installation. Instead of the expected Terms of Use, users signing up could receive the “There are no Terms of Use for this Dataverse installation” message. This could potentially result in some users signing up for an account without having the proper Terms of Use displayed. This will only affect installations that use the :ApplicationTermsOfUse setting.
Please note that there is not currently a native workflow in Dataverse to display updated Terms of Use to a user or to force re-agreement. This would only potentially affect users that have signed up since the upgrade to 4.17 (or a following release if 4.17 was skipped).
Datafiles Validation when Publishing Datasets
When a user requests to publish a dataset, Dataverse will now attempt to validate the physical files in the dataset by recalculating the checksums and verifying them against the values in the database. The goal is to prevent any corrupted files in published datasets. Most of the instances of actual damage to physical files that we've seen in the past happened while the datafiles were still in the Draft state. (Physical files become essentially read-only once published.) So this is the logical place to catch any such issues.
If any files in the dataset fail the validation, the dataset does not get published, and the user is notified that they need to contact their Dataverse support in order to address the issue before another attempt to publish can be made. See the "Troubleshooting" section of the Guide on how to fix such problems.
This validation will be performed asynchronously, the same way as the registration of the file-level persistent ids. Similarly to the file PID registration, this validation process can be disabled on your system, with the setting :FileValidationOnPublishEnabled. (A Dataverse admin may choose to disable it if, for example, they are already running an external auditing system to monitor the integrity of the files in their Dataverse, and would prefer the publishing process to take less time). See the Configuration section of the Installation Guide.
Please note that we are not aware of any bugs in the current versions of Dataverse that would result in damage to users' files. But you may have some legacy files in your archive that were affected by some issue in the past, or perhaps affected by something outside Dataverse, so we are adding this feature out of an abundance of caution. One problem we experienced in early versions of Dataverse was a scenario where a user attempted to delete a Draft file from an unpublished version and the database transaction failed for whatever reason, but only after the physical file had already been deleted from the filesystem. This resulted in a datafile entry remaining in the dataset, but with the corresponding physical file missing. The fix for this case, since the user wanted to delete the file in the first place, is simply to confirm it and purge the datafile entity from the database.
The Setting :PIDAsynchRegFileCount is Deprecated as of 5.0
It used to specify the number of datafiles in the dataset to warrant adding a lock during publishing. As of v5.0 all datasets get locked for the duration of the publishing process. The setting will be ignored if present.
Location Changes for Related Projects
The dataverse-ansible and dataverse-previewers repositories have been moved to the GDCC Organization on GitHub. If you have been referencing the dataverse-ansible repository from IQSS and the dataverse-previewers from QDR, please instead use them from their new locations:
https://github.com/GlobalDataverseCommunityConsortium/dataverse-ansible
https://github.com/GlobalDataverseCommunityConsortium/dataverse-previewers
Harvesting Improvements
Many updates have been made to address common Harvesting failures. You may see Harvests complete more often and have a higher success rate on a dataset-by-dataset basis.
New JVM Options and Database Settings
Several new JVM options and DB Settings have been added in this release. More documentation about each of these settings can be found in the Configuration section of the Installation Guide.
New JVM Options
- doi.dataciterestapiurlstring: Set with a value of "https://api.datacite.org" for production environments and "https://api.test.datacite.org" for test environments. Must be set if you are using DataCite as your DOI provider.
- dataverse.useripaddresssourceheader: If set, specifies an HTTP Header such as X-Forwarded-For to use to retrieve the user's IP address. This setting is useful in cases such as running Dataverse behind load balancers where the default option of getting the Remote Address from the servlet isn't correct (e.g. it would be the load balancer IP address). Note that unless your installation always sets the header you configure here, this could be used as a way to spoof the user's address. See the Configuration section of the Installation Guide for more information about proper use and security concerns.
- http.request-timeout-seconds: To facilitate large file upload and download, the Dataverse installer bumps the Payara server-config.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds setting from its default 900 seconds (15 minutes) to 1800 (30 minutes).
New Database Settings
- :CustomZipDownloadServiceUrl: If defined, this is the URL of the zipping service outside the main application server where zip downloads should be directed (instead of /api/access/datafiles/).
- :ShibAttributeCharacterSetConversionEnabled: By default, all attributes received from Shibboleth are converted from ISO-8859-1 to UTF-8. You can disable this behavior by setting to false.
- :ChronologicalDateFacets: Facets with Date/Year are sorted chronologically by default, with the most recent value first. To have them sorted by number of hits, e.g. with the year with the most results first, set this to false.
- :NavbarGuidesUrl: Set to a fully-qualified URL which will be used for the "User Guide" link in the navbar.
- :FileValidationOnPublishEnabled: Toggles validation of the physical files in the dataset when it's published, by recalculating the checksums and comparing against the values stored in the DataFile table. By default this setting is absent and Dataverse assumes it to be true. If enabled, the validation will be performed asynchronously, similarly to how we handle assigning persistent identifiers to datafiles, with the dataset locked for the duration of the publishing process.
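The boolean settings above can be toggled, and later reset to their defaults, through the admin settings API. The following is a sketch under the assumption that Dataverse listens on `http://localhost:8080`; `toggle_cmd` and `reset_cmd` are helpers made up for this example, and they print the curl commands rather than running them:

```shell
#!/bin/sh
SERVER_URL="http://localhost:8080"
SETTINGS_API="$SERVER_URL/api/admin/settings"

# Print the command that sets a boolean database setting to true or false.
toggle_cmd() {
  case "$2" in
    true|false) printf 'curl -X PUT -d %s %s/%s\n' "$2" "$SETTINGS_API" "$1" ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
}

# Print the command that deletes a setting so the default applies again.
reset_cmd() {
  printf 'curl -X DELETE %s/%s\n' "$SETTINGS_API" "$1"
}

toggle_cmd ":FileValidationOnPublishEnabled" false   # skip checksum validation
toggle_cmd ":ChronologicalDateFacets" false          # sort date facets by hits
reset_cmd ":FileValidationOnPublishEnabled"          # back to the default (true)
```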
Custom Analytics Code Changes
You should update your custom analytics code to implement necessary changes for tracking updated dataset and file buttons. There was also a fix to the analytics code that will now properly track downloads for tabular files.
For more information, see the documentation and sample analytics code snippet provided in Installation Guide > Configuration > Web Analytics Code to reflect the changes implemented in this version (#6938/#6684).
Tracking Users' IP Addresses Behind an Address-Masking Proxy
It is now possible to collect real user IP addresses in MDC logs and/or set up an IP group on a system running behind a proxy/load balancer that hides the addresses of incoming requests. See "Recording User IP Addresses" in the Configuration section of the Installation Guide.
Reload Astrophysics Metadata Block (if used)
Tooltips have been updated for the Astrophysics Metadata Block. If you'd like these updated Tooltips to be displayed to users of your installation, you should update the Astrophysics Metadata Block:
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"
We've included this in the step-by-step instructions below.
Run ReExportall
We made changes to the JSON Export in this release. If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process following the steps in the Admin Guide.
We've included this in the step-by-step instructions below.
Notes for Tool Developers and Integrators
Complete List of Changes
For the complete list of code changes in this release, see the 5.0 Milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade Instructions
Upgrade from Glassfish 4.1 to Payara 5
The instructions below describe the upgrade procedure based on moving an existing glassfish4 domain directory under Payara. We recommend this method instead of setting up a brand-new Payara domain using the installer because it appears to be the easiest way to recreate your current configuration and preserve all your data.
- Download Payara, v5.2020.2 as of this writing:
curl -L -O https://github.com/payara/Payara/releases/download/payara-server-5.2020.2/payara-5.2020.2.zip
sha256sum payara-5.2020.2.zip
1f5f7ea30901b1b4c7bcdfa5591881a700c9b7e2022ae3894192ba97eb83cc3e
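The printed checksum can also be compared automatically rather than by eye. A small sketch, assuming GNU coreutils `sha256sum` is available; `verify_sha256` is a helper name introduced here for illustration:

```shell
#!/bin/sh
# Verify a file against an expected SHA-256 checksum.
# Returns 0 on a match, non-zero otherwise.
verify_sha256() {
  file=$1
  expected=$2
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

EXPECTED="1f5f7ea30901b1b4c7bcdfa5591881a700c9b7e2022ae3894192ba97eb83cc3e"

# Only check when the download is present in the current directory.
if [ -f payara-5.2020.2.zip ]; then
  if verify_sha256 payara-5.2020.2.zip "$EXPECTED"; then
    echo "checksum OK"
  else
    echo "checksum mismatch, do not install this file" >&2
  fi
fi
```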
- Unzip it somewhere (/usr/local is a safe bet)
sudo unzip payara-5.2020.2.zip -d /usr/local/
- Copy the Postgres driver to /usr/local/payara5/glassfish/lib
sudo cp /usr/local/glassfish4/glassfish/lib/postgresql-42.2.9.jar /usr/local/payara5/glassfish/lib/
- Move payara5/glassfish/domains/domain1 out of the way
sudo mv /usr/local/payara5/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/domain1.orig
- Undeploy the Dataverse web application (if deployed; version 4.20 is assumed in the example below)
sudo /usr/local/glassfish4/bin/asadmin list-applications
sudo /usr/local/glassfish4/bin/asadmin undeploy dataverse-4.20
- Stop Glassfish; copy domain1 to Payara
sudo /usr/local/glassfish4/bin/asadmin stop-domain
sudo cp -ar /usr/local/glassfish4/glassfish/domains/domain1 /usr/local/payara5/glassfish/domains/
- Remove the cache directories
sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/generated/
sudo rm -rf /usr/local/payara5/glassfish/domains/domain1/osgi-cache/
- In domain.xml:
Replace the -XX:PermSize and -XX:MaxPermSize JVM options with -XX:MetaspaceSize and -XX:MaxMetaspaceSize.
Add the below JVM options beneath the -Ddataverse settings:
Also in domain.xml, replace the following element:
<jdbc-connection-pool datasource-classname="org.apache.derby.jdbc.EmbeddedXADataSource" name="__TimerPool" res-type="javax.sql.XADataSource"> <property name="databaseName" value="${com.sun.aas.instanceRoot}/lib/databases/ejbtimer"></property><property name="connectionAttributes" value=";create=true"></property></jdbc-connection-pool>
with
<jdbc-connection-pool datasource-classname="org.h2.jdbcx.JdbcDataSource" name="__TimerPool" res-type="javax.sql.XADataSource"> <property name="URL" value="jdbc:h2:${com.sun.aas.instanceRoot}/lib/databases/ejbtimer;AUTO_SERVER=TRUE"></property> </jdbc-connection-pool>
Change any full pathnames /usr/local/glassfish4/... to /usr/local/payara5/... or whatever it is in your case. (Specifically check the -Ddataverse.files.directory and -Ddataverse.files.file.directory JVM options)
In domain1/config/jhove.conf, change the hard-coded /usr/local/glassfish4 path, as above.
(Optional): If you renamed your service account from glassfish to payara or appserver, update the ownership permissions. The Installation Guide recommends a service account of dataverse:
sudo chown -R dataverse /usr/local/payara5/glassfish/domains/domain1
sudo chown -R dataverse /usr/local/payara5/glassfish/lib
You will also need to check that the service account has write permission on the files directories, if they are located outside the old Glassfish domain. Also make sure the service account has the correct AWS credentials if you are using S3 for storage.
Finally, start Payara:
sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain
- Deploy the Dataverse 5 warfile:
sudo -u dataverse /usr/local/payara5/bin/asadmin deploy /path/to/dataverse-5.0.war
- Then restart Payara:
sudo -u dataverse /usr/local/payara5/bin/asadmin stop-domain
sudo -u dataverse /usr/local/payara5/bin/asadmin start-domain
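Two of the steps above, checking the Payara download and replacing the old /usr/local/glassfish4 paths in domain.xml and jhove.conf, can be sketched as small helper functions. This is a minimal sketch, not part of the official procedure: the function names and the OLD_ROOT and NEW_ROOT variables are hypothetical, and the blanket path replacement assumes every occurrence of the old root should change, so review the result (especially the -Ddataverse.files.directory and -Ddataverse.files.file.directory options) before starting Payara.

```shell
#!/bin/sh
# Hypothetical helpers for the Glassfish-to-Payara steps above.
# OLD_ROOT and NEW_ROOT are assumptions; adjust them to your installation.
OLD_ROOT=/usr/local/glassfish4
NEW_ROOT=/usr/local/payara5

# Succeeds only if the file's SHA-256 digest matches the expected value.
verify_sha256() {
    # $1 = file, $2 = expected hex digest
    printf '%s  %s\n' "$2" "$1" | sha256sum -c - >/dev/null 2>&1
}

# Replaces OLD_ROOT with NEW_ROOT in a file, keeping a backup as <file>.bak.
rewrite_paths() {
    cp "$1" "$1.bak" &&
    sed "s|$OLD_ROOT|$NEW_ROOT|g" "$1.bak" > "$1"
}

# Lists lines that still mention OLD_ROOT (non-zero exit status if none do).
remaining_old_paths() {
    grep -n "$OLD_ROOT" "$1"
}
```

For example, verify_sha256 payara-5.2020.2.zip followed by the digest listed above should succeed before you unzip, and rewrite_paths can be run as the file's owner on domain1/config/domain.xml and domain1/config/jhove.conf after copying domain1 to Payara.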
Additional Upgrade Steps
- Update Astrophysics Metadata Block (if used)
wget https://github.com/IQSS/dataverse/releases/download/v5.0/astrophysics.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @astrophysics.tsv -H "Content-type: text/tab-separated-values"
- (Recommended) Run ReExportall to update JSON Exports
- (Required for installations using DataCite) Add the JVM option doi.dataciterestapiurlstring
For production environments:
/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.datacite.org"
For test environments:
/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.test.datacite.org"
The JVM option doi.mdcbaseurlstring should be deleted if it was previously set, for example:
/usr/local/payara5/bin/asadmin delete-jvm-options "\-Ddoi.mdcbaseurlstring=https\://api.test.datacite.org"
- (Recommended for installations using DataCite) Pre-register DOIs
Execute the script described in the section "Dataverse Installations Using DataCite: Upgrade Action Recommended" earlier in the Release Note.
Please consult the earlier sections of the Release Note for any additional configuration options that may apply to your installation.
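The DataCite option above differs only in the API host between production and test environments. A minimal sketch of a helper that prints the matching command for review is shown here; the function name and the ASADMIN variable are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Sketch: print the asadmin command for the DataCite REST API JVM option.
# ASADMIN is an assumption; adjust it to your Payara location.
ASADMIN=/usr/local/payara5/bin/asadmin

datacite_jvm_option_cmd() {
    # $1 = "production" or "test"
    case "$1" in
        production) host=api.datacite.org ;;
        test)       host=api.test.datacite.org ;;
        *) echo "usage: datacite_jvm_option_cmd production|test" >&2; return 1 ;;
    esac
    # The leading dash and the colon are escaped, as in the commands above.
    printf '%s create-jvm-options "\\-Ddoi.dataciterestapiurlstring=https\\://%s"\n' \
        "$ASADMIN" "$host"
}
```

Printing the command rather than running it keeps the choice of environment visible before the domain configuration is changed.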
Published by djbrooke almost 6 years ago
dataverse - 4.20
Dataverse 4.20
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Multiple Store Support
Dataverse can now be configured to store files in more than one place at the same time (multiple file, s3, and/or swift stores).
General information about this capability can be found below and in the Configuration Guide - File Storage section.
S3 Direct Upload support
S3 stores can now optionally be configured to support direct upload of files, as one option for supporting upload of larger files. In the current implementation, each file is uploaded in a single HTTP call. For AWS, this limits file size to 5 GB. With Minio the theoretical limit should be 5 TB and 50+ GB file uploads have been tested successfully. (In practice other factors such as network timeouts may prevent a successful upload of a multi-TB file, and Minio instances may be configured with a < 5 TB single HTTP call limit.) No other S3 service providers have been tested yet. Their limits should be the lower of the maximum object size allowed and any single HTTP call upload limit.
General information about this capability can be found in the Big Data Support Guide with specific information about how to enable it in the Configuration Guide - File Storage section.
Integration Test Coverage Reporting
The percentage of code covered by the API-based integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository.
New APIs
New APIs for Role Management and Dataset Size have been added. Previously, managing roles at the dataset and file level was only possible through the UI. API users can now also retrieve the size of a dataset through an API call, with specific parameters depending on the type of information needed.
More information can be found in the API Guide.
Major Use Cases
Newly-supported use cases in this release include:
- Users will now be able to see the number of linked datasets and dataverses accurately reflected in the facet counts on the Dataverse search page. (Issue #6564, PR #6262)
- Users will be able to upload large files directly to S3. (Issue #6489, PR #6490)
- Users will be able to see the PIDs of datasets and files in the Guestbook export. (Issue #6534, PR #6628)
- Administrators will be able to configure multiple stores per Dataverse installation, which allow dataverse-level setting of storage location, upload size limits, and supported data transfer methods (Issue #6485, PR #6488)
- Administrators and integrators will be able to manage roles using a new API. (Issue #6290, PR #6622)
- Administrators and integrators will be able to determine a dataset's size. (Issue #6524, PR #6609)
- Integrators will now be able to retrieve the number of files in a dataset as part of a single API call instead of needing to count the number of files in the response. (Issue #6601, PR #6623)
Notes for Dataverse Installation Administrators
Potential Data Integrity Issue
We recently discovered a potential data integrity issue in Dataverse databases that manifests in two ways: one as duplicate DataFile objects created for the same uploaded file (https://github.com/IQSS/dataverse/issues/6522); the other as duplicate DataTable (tabular metadata) objects linked to the same DataFile (https://github.com/IQSS/dataverse/issues/6510). This issue impacted approximately 0.03% of datasets in Harvard's Dataverse.
To see if any datasets in your installation have been impacted by this data integrity issue, we've provided a diagnostic script here:
https://github.com/IQSS/dataverse/raw/develop/scripts/issues/6510/checkdatafiles6522_6510.sh
The script relies on the PostgreSQL utility psql to access the database. You will need to edit the credentials at the top of the script to match your database configuration.
If neither of the two issues is present in your database, you will see a message "... no duplicate DataFile objects in your database" and "no tabular files affected by this issue in your database".
If either or both kinds of duplicates are detected, the script will provide further instructions. Please send us the output it produces, and we will assist you in resolving the issues in your database.
Multiple Store Support Changes
Existing installations will need to make configuration changes to adopt this version, regardless of whether additional stores are to be added or not.
Multistore support requires that each store be assigned a label, id, and type - see the Configuration Guide for a more complete explanation. For an existing store, the recommended upgrade path is to assign the store id based on its type, i.e. a 'file' store would get id 'file', an 's3' store would have the id 's3'.
With this choice, no manual changes to datafile 'storageidentifier' entries are needed in the database. If you do not name your existing store using this convention, you will need to edit the database to maintain access to existing files.
The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:
For a file store:
./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"
For an s3 store:
./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"
Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above: delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.
Once these options are set, restarting the Glassfish service is all that is needed to complete the change.
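The renaming pattern for the remaining S3 options can be sketched as a helper that prints the delete/create pair for one legacy option. This is a minimal sketch under assumptions: the function name is hypothetical, the ./asadmin path mirrors the commands above, and the output is only printed so it can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: print the delete/create pair for one legacy S3 JVM option.
# Nothing is executed; review the printed commands before running them.
migrate_s3_option() {
    # $1 = option as currently configured, e.g.
    #      -Ddataverse.files.s3-bucket-name=<your_bucket_name>
    case "$1" in
        -Ddataverse.files.s3-*) ;;
        *) echo "not a legacy s3- option: $1" >&2; return 1 ;;
    esac
    # asadmin uses ':' as a separator, so colons in a value are escaped.
    old=$(printf '%s\n' "$1" | sed 's/:/\\:/g')
    # Only the first '-' after 's3' becomes '.'; the rest is unchanged.
    new=$(printf '%s\n' "$old" | sed 's/^-Ddataverse\.files\.s3-/-Ddataverse.files.s3./')
    printf './asadmin delete-jvm-options "%s"\n' "$old"
    printf './asadmin create-jvm-options "%s"\n' "$new"
}
```

Calling migrate_s3_option with the bucket-name option reproduces the last two commands shown above; any other option that starts with -Ddataverse.files.s3- is handled the same way.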
Note that the "-Ddataverse.files.directory", if defined, continues to control where temporary files are stored (in the /temp subdir of that directory), independent of the location of any 'file' store defined above.
Also note that the :MaxFileUploadSizeInBytes property has a new option to provide independent limits for each store instead of a single value for the whole installation. The default is to apply any existing limit defined by this property to all stores.
Direct S3 Upload Changes
Direct upload to S3 is enabled per store by one new jvm option:
./asadmin create-jvm-options "\-Ddataverse.files.<id>.upload-redirect=true"
The existing :MaxFileUploadSizeInBytes property and dataverse.files.<id>.url-expiration-minutes jvm option for the same store also apply to direct upload.
Direct upload via the Dataverse web interface is transparent to the user and handled automatically by the browser. Some minor differences in file upload exist: directly uploaded files are not unzipped and Dataverse does not scan their content to help in assigning a MIME type. Ingest of tabular files and metadata extraction from FITS files will occur, but can be turned off for files above a specified size limit through the new dataverse.files.<id>.ingestsizelimit jvm option.
API calls to support direct upload also exist, and, if direct upload is enabled for a store in Dataverse, the latest DVUploader (v1.0.8) provides a '-directupload' flag that enables its use.
Solr Update
With this release we upgrade to the latest available stable release in the Solr 7.x branch. We recommend a fresh installation of Solr 7.7.2 (the index will be empty) followed by an "index all".
Before you start the "index all", Dataverse will appear to be empty because the search results come from Solr. As indexing progresses, results will appear until indexing is complete.
Dataverse Linking Fix
The fix implemented for #6262 will display the datasets contained in linked dataverses in the linking dataverse. The full reindex described above will correct these counts. Going forward, this will happen automatically whenever a dataverse is linked.
Google Analytics Download Tracking Bug
The button tracking capability discussed in the installation guide (http://guides.dataverse.org/en/4.20/installation/config.html#id88) relies on an analytics-code.html file that must be configured using the :WebAnalyticsCode setting. The example file provided in the installation guide is no longer compatible with recent Dataverse releases (>v4.16). Installations using this feature should update their analytics-code.html file by following the installation instructions using the updated example file. Alternatively, sites can modify their existing files to include the one-line change made in the example file at line 120.
Run ReExportall
We made changes to the JSON Export in this release (Issue #6650, PR #6669). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.
New JVM Options and Database Settings
New JVM Options for file storage drivers
- The JVM option dataverse.files.file.directory=<your directory> controls where temporary files are stored (in the /temp subdir of the defined directory), independent of the location of any 'file' store defined above.
- The JVM option dataverse.files.<id>.upload-redirect enables direct upload of files added to a dataset to the S3 bucket. (S3 stores only!)
- The JVM option dataverse.files.<id>.ingestsizelimit controls the maximum size of files for which ingest will be attempted, for the given file store.
New Database Settings for Shibboleth
- The database setting :ShibAffiliationAttribute can now be set to prevent affiliations for Shibboleth users from being reset upon each log in.
Notes for Tool Developers and Integrators
Integration Test Coverage Reporting
API-based integration tests are run every time a branch is merged to develop and the percentage of code covered by these integration tests is now shown on a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository.
Guestbook Column Changes
Users of downloaded guestbooks should note that two new columns have been added:
- Dataset PID
- File PID
If you are expecting columns in the CSV file to be in a particular order, you will need to make adjustments.
Old columns: Guestbook, Dataset, Date, Type, File Name, File Id, User Name, Email, Institution, Position, Custom Questions
New columns: Guestbook, Dataset, Dataset PID, Date, Type, File Name, File Id, File PID, User Name, Email, Institution, Position, Custom Questions
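One way to make downstream scripts robust against this kind of change is to look columns up by header name rather than by position. The following is a minimal sketch, assuming a plain comma-separated export with a header row and no quoted fields containing commas; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print one guestbook column, looked up by header name.
# Assumes a plain comma-separated export with a header row and no quoted
# fields that contain commas; otherwise use a real CSV parser.
guestbook_column() {
    # $1 = column name (for example "Dataset PID"), $2 = CSV file
    awk -F',' -v name="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                header = $i
                gsub(/^[ \t]+|[ \t\r]+$/, "", header)  # trim padding
                if (header == name) col = i
            }
            if (!col) {
                print "column not found: " name > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $col }
    ' "$2"
}
```

With this approach, a script that reads "File Id" keeps working whether or not the "Dataset PID" and "File PID" columns are present.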
API Changes
As reported in #6570, the affiliation for dataset contacts had been wrapped in parentheses in the JSON output from the Search API. These parentheses have now been removed. This is a backward-incompatible change, but it is not expected to cause issues for integrators.
Role Name Change
The role alias provided in API responses has changed, so if anything was hard-coded to "editor" instead of "contributor" it will need to be updated.
Complete List of Changes
For the complete list of code changes in this release, see the 4.20 milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
- Undeploy the previous version.
- <glassfish install path>/glassfish4/bin/asadmin list-applications
- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
- Stop glassfish and remove the generated directory, start.
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
- Install and configure Solr v7.7.2
See http://guides.dataverse.org/en/4.20/installation/prerequisites.html#installing-solr
- Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.20.war
- The following set of commands to change the Glassfish JVM options will adapt an existing file or s3 store for this upgrade:
For a file store:
./asadmin create-jvm-options "\-Ddataverse.files.file.type=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.label=file"
./asadmin create-jvm-options "\-Ddataverse.files.file.directory=<your directory>"
For an s3 store:
./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"
./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"
./asadmin delete-jvm-options "-Ddataverse.files.s3-bucket-name=<your_bucket_name>"
./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=<your_bucket_name>"
Any additional S3 options you have set will need to be replaced as well, following the pattern in the last two lines above: delete the option that includes a '-' after 's3' and create the same option with the '-' replaced by a '.', using the same value you currently have configured.
- Restart glassfish.
- Update Citation Metadata Block
wget https://github.com/IQSS/dataverse/releases/download/v4.20/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
- Kick off full reindex
http://guides.dataverse.org/en/4.20/admin/solr-search-index.html
- (Recommended) Run ReExportall to update JSON Exports
Published by kcondon about 6 years ago
dataverse - 4.19
Dataverse 4.19
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Open ID Connect Support
Dataverse now provides basic support for any OpenID Connect (OIDC) compliant authentication provider.
Prior to supporting this standard, new authentication methods needed to be added by pull request. OIDC support provides a standardized way to handle authentication, sharing of user information, and more. You are able to use any compliant provider just by loading a configuration file, without touching the codebase. While prominent providers such as Google feature OIDC support, there are plenty of other options to easily attach your installation to a custom authentication provider, using enterprise-grade software.
See the OpenID Connect Login Options documentation in the Installation Guide for more details.
This is to be extended with support for attribute mapping, group syncing and more in future versions of the code.
Python Installer
We are introducing a new installer script, written in Python. It is intended to eventually replace the old installer (written in Perl). For now it is being offered as an (experimental) alternative.
See README_python.txt in scripts/installer and/or in the installer bundle for more information.
Major Use Cases
Newly-supported use cases in this release include:
- Dataverse installation administrators will be able to experiment with a Python Installer (Issue #3937, PR #6484)
- Dataverse installation administrators will be able to set up OIDC-compliant login options by editing a configuration file, with no need for a code change (Issue #6432, PR #6433)
- Following setup by a Dataverse administrator, users will be able to log in using OIDC-compliant methods (Issue #6432, PR #6433)
- Users of the Search API will see additional fields in the JSON output (Issues #6300, #6396, PR #6441)
- Users loading the support form will now be presented with the math challenge as expected and will be able to successfully send an email to support (Issue #6307, PR #6462)
- Users of https://mybinder.org can now spin up Jupyter Notebooks and other computational environments from Dataverse DOIs (Issue #4714, PR #6453)
Notes for Dataverse Installation Administrators
Security vulnerability in Solr
A serious security issue has recently been identified in multiple versions of the Solr search engine, including v7.3, which Dataverse is currently using. Follow the instructions below to verify that your installation is safe from a potential attack. You can also consult the following link for a detailed description of the issue:
RCE in Solr via Velocity Template.
The vulnerability allows an intruder to execute arbitrary code on the system running Solr. Fortunately, it can only be exploited if the Solr API access point is open to direct access from public networks (aka, "the outside world"), which is NOT needed in a Dataverse installation.
We have always recommended having Solr (port 8983) firewalled off from public access in our installation guides. But we recommend that you double-check your firewall settings and verify that the port is not accessible from outside networks. The simplest quick test is to try the following URL in your browser:
`http://<your Solr server address>:8983`
and confirm that you get "access denied" or that it times out, etc.
In most cases, when Solr runs on the same server as the Dataverse web application, you will only want the port accessible from localhost. We also recommend that you add the following arguments to the Solr startup command: -j jetty.host=127.0.0.1. This will make Solr accept connections from localhost only, adding redundancy in case of firewall failure.
In a case where Solr needs to run on a different host, make sure that the firewall limits access to the port only to the Dataverse web host(s), by specific IP address(es).
We would also like to reiterate that it is simply never a good idea to run Solr as root! Running the process as a non-privileged user would substantially minimize any potential damage even in the event that the instance is compromised.
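The browser check described above can also be scripted. The following is a minimal sketch, assuming curl is installed and that the script is run from a machine outside your network; the function name is hypothetical. A zero exit status means something answered successfully on the port, which is the situation you want to rule out.

```shell
#!/bin/sh
# Sketch: report whether a Solr port answers HTTP requests from this machine.
# Run it from a host OUTSIDE your network to mimic "the outside world".
solr_port_open() {
    # $1 = Solr host, $2 = port (defaults to 8983)
    command -v curl >/dev/null 2>&1 || { echo "curl is required" >&2; return 2; }
    # --fail turns HTTP errors such as 403 into a non-zero exit status;
    # --max-time makes a silently dropped connection count as closed.
    curl --silent --fail --output /dev/null --max-time 5 \
        "http://$1:${2:-8983}/"
}
```

For example, "if solr_port_open solr.example.edu; then echo 'WARNING: Solr is reachable'; fi", where solr.example.edu stands in for your Solr server address. A refused connection, a timeout, or an "access denied" response all produce a non-zero status.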
Citation and Geospatial Metadata Block Updates
We updated two metadata blocks in this release. Updating these metadata blocks is mentioned in the step-by-step upgrade instructions below.
Run ReExportall
We made changes to the JSON Export in this release (#6426). If you'd like these changes to be reflected in your JSON exports, you should run ReExportall as part of the upgrade process. We've included this in the step-by-step instructions below.
BinderHub
https://mybinder.org now supports spinning up Jupyter Notebooks and other computational environments from Dataverse DOIs.
Widgets update for OpenScholar
We updated the code for widgets so that they will keep working in OpenScholar sites after the upcoming OpenScholar upgrade to Drupal 8. If users of your dataverse have embedded widgets on an OpenScholar site that upgrades to Drupal 8, you will need to run this Dataverse version (or later) for the widgets to keep working.
Payara tech preview
Dataverse 4 has always run on Glassfish 4.1 but changes in this release (PR #6523) should open the door to upgrading to Payara 5 eventually. Production installations of Dataverse should remain on Glassfish 4.1 but feedback from any experiments running Dataverse on Payara 5 is welcome via the usual channels.
Notes for Tool Developers and Integrators
Search API
The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.
Additional fields are now available via the Search API, mostly related to information about specific dataset versions.
Complete List of Changes
For the complete list of code changes in this release, see the 4.19 milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
- Undeploy the previous version.
- <glassfish install path>/glassfish4/bin/asadmin list-applications
- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
- Stop glassfish and remove the generated directory, start.
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
- Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.19.war
- Restart glassfish.
- Update Geospatial Metadata Block
wget https://github.com/IQSS/dataverse/releases/download/v4.19/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @geospatial.tsv -H "Content-type: text/tab-separated-values"
- (Optional) Run ReExportall to update JSON Exports
Published by djbrooke over 6 years ago
dataverse - 4.18.1
Dataverse 4.18.1
This release provides a fix for a regression introduced in 4.18 and implements a few other small changes.
Release Highlights
Proper Validation Messages
When creating or editing dataset metadata, users were not receiving field-level indications about what entries failed validation and were only receiving a message at the top of the page. This fix restores field-level indications.
Major Use Cases
Use cases in this release include:
- Users will receive the proper messaging when dataset metadata entries are not valid.
- Users can now view the expiration date of an API token and revoke a token on the API Token tab of the account page.
Complete List of Changes
For the complete list of code changes in this release, see the 4.18.1 milestone in Github.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
- Undeploy the previous version.
- <glassfish install path>/glassfish4/bin/asadmin list-applications
- <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
- Stop glassfish and remove the generated directory, start.
- service glassfish stop
- remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
- service glassfish start
- Deploy this version.
- <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.1.war
- Restart glassfish.
Published by kcondon over 6 years ago
dataverse - 4.18
Dataverse 4.18
Note: There is an issue in 4.18 with the display of validation messages on the dataset page (#6380) and we recommend using 4.18.1 for any production environments.
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
File Page Previews and Previewers
File-level External Tools can now be configured to display in a "Preview Mode" designed for embedding within the file landing page.
While not technically part of this release, previewers have been made available for several common file types. The previewers support spreadsheet, image, text, document, audio, video, and HTML files, and more. These previewers can be found in the Qualitative Data Repository Github Repository. The spreadsheet viewer was contributed by the Dataverse SSHOC project.
Microsoft Login
Users can now create Dataverse accounts and log in using self-provisioned Microsoft accounts such as live.com and outlook.com. Users can also use Microsoft accounts managed by their institutions. This new feature not only makes it easier to log in to Dataverse but will also streamline the interaction between any external tools that utilize Azure services that require login.
Add Data and Host Dataverse
More workflows to add data have been added across the UI, including a new button on the My Data tab of the Account page, as well as a link in the Dataverse navbar, which will display on every page. This will give users much easier access to start depositing data. By default, the Host Dataverse will be the installation root dataverse for these new Add Data workflows, but there is now a dropdown component allowing creators to select any dataverse in which they have permission to create a new dataverse or dataset.
Primefaces 7
Primefaces, the open source UI framework upon which the Dataverse front end is built, has been updated to the most recent version. This provides security updates and bug fixes and will also allow Dataverse developers to take advantage of new features and enhancements.
Integration Test Pipeline and Test Health Reporting
As part of the Dataverse Community's ongoing efforts to provide more robust automated testing infrastructure, and in support of the project's desire to have the develop branch constantly in a "release ready" state, API-based integration tests are now run every time a branch is merged to develop. The status of the last test run is available as a badge at the bottom of the README.md file that serves as the homepage of Dataverse Github Repository.
Make Data Count Metrics Updates
A new configuration option has been added that allows Make Data Count metrics to be collected, but not reflected in the front end. This option was designed to allow installations to collect and verify metrics for a period before turning on the display to users. It is suggested that installations turn on Make Data Count as part of the upgrade.
Search API Enhancements
The Dataverse Search API will now display unpublished content when an API token is passed (and appropriate permissions exist).
Additional Dataset Author Identifiers
The following dataset author identifiers are now supported:
- DAI: https://en.wikipedia.org/wiki/DigitalAuthorIdentifier
- ResearcherID: http://researcherid.com
- ScopusID: https://www.scopus.com
Major Use Cases
Newly-supported use cases in this release include:
- Users can view previews of several common file types, eliminating the need to download or explore a file just to get a quick look.
- Users can log in using self-provisioned Microsoft accounts and also can log in using Microsoft accounts managed by an organization.
- Dataverse administrators can now revoke and regenerate API tokens with an API call.
- Users will receive notifications when their ingests complete, and will be informed if the ingest was a success or failure.
- Dataverse developers will receive feedback about the health of the develop branch after their pull request was merged.
- Dataverse tool developers will be able to query the Dataverse API for unpublished data as well as published data.
- Dataverse administrators will be able to collect Make Data Count metrics without turning on the display for users.
- Users with a DAI, ResearcherID, or ScopusID can use these author identifiers in their datasets.
Notes for Dataverse Installation Administrators
API Token Management
- You can now delete a user's API token, recreate a user's API token, and find a token's expiration date. See the Native API guide for more information.
New JVM Options
doi.mdcbaseurlstring allows Dataverse administrators to use a test base URL for Make Data Count.
New Database Settings
:DisplayMDCMetrics can be set to false to disable display of MDC metrics.
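Database settings such as this one are changed through the admin settings API. The following is a minimal sketch of a wrapper around that call; the function name and the SERVER_URL variable are hypothetical, and the default assumes the command is run on the Dataverse server itself, like the other localhost commands in these notes.

```shell
#!/bin/sh
# Sketch: set a database setting through the admin settings API.
# SERVER_URL is an assumption and defaults to the local installation.
set_db_setting() {
    # $1 = setting name with its leading colon, $2 = new value
    curl -X PUT -d "$2" \
        "${SERVER_URL:-http://localhost:8080}/api/admin/settings/$1"
}

# Hide Make Data Count metrics while they are still being verified:
#   set_db_setting :DisplayMDCMetrics false
```

The example call is left as a comment so that loading the file changes nothing until the function is invoked deliberately.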
Notes for Tool Developers and Integrators
Preview Mode
Tool Developers can now add the hasPreviewMode parameter to their file level external tools. This setting provides an embedded, simplified view of the tool on the file pages for any installation that installs the tool. See Building External Tools for more information.
API Token Management
If your tool writes content back to Dataverse, you can now take advantage of administrative endpoints that delete and re-create API tokens. You can also use an endpoint that provides the expiration date of a specific API token. See the Native API guide for more information.
View Unpublished Data Using Search API
If you pass a token, the search API output will include unpublished content.
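As a rough illustration, a token can be passed to the Search API in the X-Dataverse-key header. This is a minimal sketch, assuming curl is available; the function name and the SERVER_URL and API_TOKEN variables are placeholders supplied by the caller.

```shell
#!/bin/sh
# Sketch: call the Search API with an API token so that unpublished
# content the token's owner may see is included in the results.
# SERVER_URL and API_TOKEN are assumptions supplied by the caller.
search_with_token() {
    # $1 = search query, for example '*'
    curl --silent --get \
        -H "X-Dataverse-key:$API_TOKEN" \
        --data-urlencode "q=$1" \
        "$SERVER_URL/api/search"
}
```

Using --get together with --data-urlencode lets curl build the query string, so queries containing spaces or other special characters do not need to be encoded by hand.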
Complete List of Changes
For the complete list of code changes in this release, see the 4.18 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.18.war
4. Restart Glassfish.
5. Update the Citation Metadata Block.
   - wget https://github.com/IQSS/dataverse/releases/download/v4.18/citation.tsv
   - curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
6. (Recommended) Enable Make Data Count if your installation plans to make use of it at some point in the future.
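The undeploy, clean, and deploy sequence above can be sketched as a dry run that prints each command in order. This is only an illustration: the install location `/usr/local/glassfish4`, the WAR file path, and the function name `print_upgrade_steps` are assumptions, and the final restart is written as a stop followed by a start.

```shell
#!/bin/sh
# Assumed locations; adjust both to match your installation.
GLASSFISH_DIR="/usr/local/glassfish4"
WAR="dataverse-4.18.war"

# Print the upgrade commands in order instead of running them,
# so the sequence can be reviewed (or pasted) one line at a time.
print_upgrade_steps() {
  asadmin="$GLASSFISH_DIR/bin/asadmin"
  echo "$asadmin list-applications"
  echo "$asadmin undeploy dataverse"
  echo "service glassfish stop"
  echo "rm -rf $GLASSFISH_DIR/glassfish/domains/domain1/generated"
  echo "service glassfish start"
  echo "$asadmin deploy $WAR"
  # Restart: expressed with the same stop/start commands used above.
  echo "service glassfish stop"
  echo "service glassfish start"
}

print_upgrade_steps
```

Printing first makes it easier to confirm that the application name passed to undeploy matches the output of list-applications.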
Published by kcondon over 6 years ago
dataverse - 4.17
Dataverse 4.17
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Dataset Level Explore Tools
Tools that integrate with Dataverse can now be launched from the dataset page! This makes it possible to develop and add tools that work across the entire dataset instead of single files. Tools to verify reproducibility and allow researchers to compute on an entire dataset will take advantage of this new infrastructure.
Performance Enhancements
Dataverse now allows installation administrators to configure the session timeout for logged in users using the new :LoginSessionTimeout setting. (Session length for anonymous users has been reduced from 24 hours to 10 minutes.) Setting this lower will release system resources as configured and will result in better performance (less memory use) throughout a Dataverse installation.
Dataverse and Dataset pages have also been optimized to discard more of the objects they allocate immediately after the page load, so less memory stays tied up for the duration of the user's login session. These savings are especially significant on the Dataverse page.
Major Use Cases
Newly-supported use cases in this release include:
- As a user, I can launch and utilize external tools that allow me to work across the code, data, and other files in a dataset.
- As a user, I can add a footer to my dataverse to show the logo for a funder or other entity.
- As a developer, I can build external tools to verify reproducibility or allow computation.
- As a developer, I can check to see the impact of my proposed changes on memory utilization.
- As an installation administrator, I can make a quick configuration change to provide a better experience for my installation's users.
Notes for Dataverse Installation Administrators
Configurable User Session Timeout
Idle session timeout for logged-in users has been made configurable in this release. The default is now set to 8 hours (this is a change from the previous default value of 24 hours). If you want to change it, set the setting :LoginSessionTimeout to the new value in minutes. For example, to reduce the timeout to 4 hours:
curl -X PUT -d 240 http://localhost:8080/api/admin/settings/:LoginSessionTimeout
Once again, this is the session timeout for logged-in users only. Anonymous sessions time out after the default session-timeout value (also in minutes) in the web.xml of the Dataverse application, which is set to 10 minutes. You will most likely never need to change this, but if you do, configure it by editing the web.xml file.
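Because the setting is expressed in minutes, it is easy to slip when thinking in hours. The sketch below converts hours to minutes and prints the matching command; the helper names `hours_to_minutes` and `timeout_cmd` are invented for this example, and the URL assumes the default localhost port 8080.

```shell
#!/bin/sh
# Convert a whole number of hours to minutes.
hours_to_minutes() {
  echo $(( $1 * 60 ))
}

# Hypothetical helper: print the curl command that sets :LoginSessionTimeout.
# $1 = desired timeout in hours
timeout_cmd() {
  printf 'curl -X PUT -d %s http://localhost:8080/api/admin/settings/:LoginSessionTimeout\n' \
    "$(hours_to_minutes "$1")"
}

# Four hours, as in the example above.
timeout_cmd 4
```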
Flexible Solr Schema, optionally reconfigure Solr
With this release, we moved all fields in the Solr search index that relate to the default metadata schemas from schema.xml to separate files. Custom metadata block configuration of the search index can be more easily automated that way. For details, see admin/metadatacustomization.html#updating-the-solr-schema.
This is optional, but all future changes will go to these files. It might be a good idea to reconfigure Solr now, or at least to watch for changes to these files in future releases. Here's how:
- Replace or modify your schema.xml with the recent one (containing XML includes).
- Copy schema_dv_mdb_fields.xml and schema_dv_mdb_copies.xml to the same location as schema.xml.
- A re-index is not necessary as long as no other changes happened, as this is only a reorganization of Solr fields from a single schema.xml file into multiple files.
In case you use custom metadata blocks, you might find the new updateSchemaMDB.sh script beneficial. Again, see http://guides.dataverse.org/en/4.17/admin/metadatacustomization.html#updating-the-solr-schema
Memory Benchmark Test
Developers and installation administrators can take advantage of new scripts to produce graphs of memory usage and garbage collection events. This is helpful for developers to investigate the implications of changes on memory usage and it is helpful for installation administrators to compare graphs across releases or time periods. For details see the scripts/tests/ec2-memory-benchmark directory.
New Database Settings
:LoginSessionTimeout controls the session timeout (in minutes) for logged-in users.
Notes for Tool Developers and Integrators
New Features and Breaking Changes for External Tool Developers
The good news is that external tools can now be defined at the dataset level and there is new and improved documentation for external tool developers, linked below.
Additionally, the reserved words {datasetPid}, {filePid}, and {localeCode} were added. Please consider making it possible to translate your tool into various languages! The reserved word {datasetVersion} has been made more flexible.
The bad news is that there are two breaking changes. First, tools must now define a "scope" of either "file" or "dataset" for the manifest to be successfully loaded into Dataverse. Existing tools in existing Dataverse installations will be assigned a scope of "file" automatically by a SQL migration script, but new installations of Dataverse will need to load an updated manifest file with this new "scope" variable.
Second, file level tools that did not previously define a "contentType" are now required to do so. In previous releases, file level tools that did not define a contentType were automatically given a contentType of "text/tab-separated-values", but now Dataverse will refuse to load the manifest file if contentType is not specified.
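To make the two rules concrete, here is a hedged sketch that writes an illustrative manifest and runs a rough pre-flight check on it. Only "scope" and "contentType" come from the text above; the other field names and all values are assumptions for illustration, and the grep-based `check_manifest` function is a made-up helper, not Dataverse's own validation.

```shell
#!/bin/sh
# Write an illustrative manifest to a temporary file. All values are made up.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
{
  "displayName": "Example Tool",
  "description": "A hypothetical file level tool.",
  "scope": "file",
  "type": "explore",
  "toolUrl": "https://example.org/tool",
  "contentType": "text/tab-separated-values",
  "toolParameters": {
    "queryParameters": [
      { "locale": "{localeCode}" }
    ]
  }
}
EOF

# Rough check mirroring the two rules described above.
# This is a grep-based sketch, not a JSON parser.
check_manifest() {
  if ! grep -Eq '"scope"[[:space:]]*:[[:space:]]*"(file|dataset)"' "$1"; then
    echo "missing or invalid scope"
    return 1
  fi
  if grep -Eq '"scope"[[:space:]]*:[[:space:]]*"file"' "$1" &&
     ! grep -q '"contentType"' "$1"; then
    echo "file level tool without contentType"
    return 1
  fi
  echo "ok"
}

check_manifest "$manifest"
```

A manifest with scope "dataset" passes the check without a contentType, which matches the rule that only file level tools must declare one.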
The Dataverse team has been reaching out to tool makers about these breaking changes and getting various tools working in the https://github.com/IQSS/dataverse-ansible repo. Thank you for your patience as the dust settles around the external tool framework.
For more information, check out the new Building External Tools section of the API Guide.
Complete List of Changes
For the complete list of code changes in this release, see the 4.17 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.17.war
4. Restart Glassfish.
5. Update the Citation Metadata Block.
   - wget https://github.com/IQSS/dataverse/releases/download/v4.17/citation.tsv
   - curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
If you have any trouble adding an external tool at the dataset level and see warnings about "contenttype" in server.log, it is recommended that you run the following SQL update from pull request #6460:
ALTER TABLE externaltool ALTER contenttype DROP NOT NULL;
Published by kcondon over 6 years ago
dataverse - 4.16
Dataverse 4.16
This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Release Highlights
Metrics Redesign
The metrics view at both the Dataset and File level has been redesigned. The main driver of this redesign has been the expanded metrics (citations and views) provided through an integration with Make Data Count, but installations that do not adopt Make Data Count will also be able to take advantage of the new metrics panel.
HTML Codebook Export
Users will now be able to download HTML Codebooks as an additional Dataset Export type. This codebook is a more human-readable version of the DDI Codebook 2.5 metadata export and provides valuable information about the contents and structure of a dataset and will increase reusability of the datasets in Dataverse.
Harvesting Improvements
The Harvesting code will now better handle problematic records during incremental harvests. Fixing this will mean not only fewer manual interventions by installation administrators to keep harvesting running, but it will also mean users can more easily find and access data that is important to their research.
Major Use Cases
Newly-supported use cases in this release include:
- As a user, I can view the works that have cited a dataset.
- As a user, I can view the downloads and views for a dataset, based on the Make Data Count standard.
- As a user, I can export an HTML codebook for a dataset.
- As a user, I can expect harvested datasets to be made available more regularly.
- As a user, I'll encounter fewer locks as I go through the publishing process.
- As an installation administrator, I no longer need to destroy a PID in another system after destroying a dataset in Dataverse.
Notes for Dataverse Installation Administrators
Run ReExportall
We made changes to the citation block in this release that will require installations to run ReExportall as part of the upgrade process. We've included this in the detailed instructions below.
Custom Analytics Code Changes
You should update your custom analytics code to include CDATA sections, inside the script tags, around the JavaScript code. We have updated the documentation and sample analytics code snippet provided in Installation Guide > Configuration > Web Analytics Code to fix a bug that broke the rendering of the 403 and 500 custom error pages (#5967).
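As a hedged illustration of the shape this takes, the sketch below writes a small fragment to a temporary file and checks that the script body is wrapped in a CDATA section. The JavaScript line is a placeholder, not the snippet from the Installation Guide, and the file location is a temporary path rather than your real analytics file.

```shell
#!/bin/sh
# Write an illustrative analytics fragment to a temporary file.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
<script>
//<![CDATA[
  // Placeholder: your analytics provider's JavaScript goes here.
  var example = 1 < 2 && true;
//]]>
</script>
EOF

# Report whether both the opening and closing markers are present.
is_wrapped() {
  if grep -F -q '<![CDATA[' "$1" && grep -F -q ']]>' "$1"; then
    echo "wrapped"
  else
    echo "not wrapped"
  fi
}

is_wrapped "$fragment"
```

The comment slashes in front of the markers keep them from being interpreted as JavaScript, while characters such as < and && inside the section no longer need escaping.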
Destroy Updates
Destroying Datasets in Dataverse will now unregister/delete the PID with the PID provider. This eliminates the need for an extra step to "clean up" a PID registration after destroying a Dataset.
Deleting Notifications
In making the fix for #5687 we discovered that notifications created prior to 2018 may have been invalidated. With this release we advise that these older notifications be deleted from the database. The following query can be used for this purpose:
delete from usernotification where date_part('year', senddate) < 2018;
Lock Improvements
In 4.15 a new lock was added to prevent parallel edits. After seeing that the lock was not being released as expected, which required administrator intervention, we've adjusted this code to release the lock as expected.
New Database Settings
:AllowCors - Allows Cross-Origin Resource Sharing (CORS). By default this setting is absent and Dataverse assumes it to be true.
Notes for Tool Developers and Integrators
OpenAIRE Export Changes
The OpenAIRE metadata export now correctly expresses information about a dataset's Production Place and GeoSpatial Bounding Box. When users add metadata to Dataverse's Production Place and GeoSpatial Bounding Box fields, those fields are now mapped to separate DataCite geoLocation properties.
Metadata about the software name and version used to create a dataset, Software Name and Software Version, are re-mapped from DataCite's more general descriptionType="Methods" property to descriptionType="TechnicalInfo", which was added in a recent version of the DataCite schema in order to improve discoverability of metadata about the software used to create datasets.
Complete List of Changes
For the complete list of code changes in this release, see the 4.16 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation
If this is a new installation, please see our Installation Guide.
Upgrade
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.16.war
4. Restart Glassfish.
5. Update the Citation Metadata Block.
   - curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @citation.tsv -H "Content-type: text/tab-separated-values"
6. Run ReExportall to update the citations.
Published by kcondon almost 7 years ago
dataverse - 4.15.1
This release adds an important Solr optimization, an API for editing variable metadata, and fixes a bug on the dataset page with searching and filtering of tags with spaces.
For the complete list of issues, see the 4.15.1 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation:
If this is a new installation, please see our Installation Guide.
Upgrade:
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.15.1.war
4. Restart Glassfish.
Published by djbrooke almost 7 years ago
dataverse - 4.15
Note: There is a stability issue in 4.15 and we recommend waiting for 4.15.1 for any production environments. 4.15.1 will also contain fixes for issue #5972, which provides better filtering and sorting for file tags that have spaces.
Note: PostgreSQL 9.6 is required. Previous versions of PostgreSQL do not support ALTER TABLE ADD COLUMN IF NOT EXISTS which is used in an upgrade script. Newer versions of PostgreSQL such as version 10 have not been tested.
This release adds the ability to filter and sort the files in a dataset, better recognition and categorization of file types, accessibility enhancements, and a new API to load language packs in support of internationalization.
For the complete list of issues, see the 4.15 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation:
If this is a new installation, please see our Installation Guide.
Upgrade:
1. In an effort to prevent accidental duplicate accounts, user spoofing, or other username-based confusion, this release introduces a database constraint that no longer allows usernames that are exactly the same but use different capitalization, e.g. Bob11 vs. bob11. You may need to do some cleanup before upgrading to deal with existing usernames like this.
   To check whether you have any usernames like this that need cleaning up, run the case insensitive duplicate queries from our Useful Queries doc.
   Once you identify the usernames that need cleaning up, you should use either Merge User Accounts (if it’s the same person) or Change User Identifier (if they are different people). After the cleanup you can safely upgrade without issue.
2. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
3. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
4. A new version of file type detection software, Jhove, is added in this release. It requires an update of its configuration file: jhove.conf. Download the new configuration file from the Dataverse release page on GitHub, or from the source tree at https://raw.githubusercontent.com/IQSS/dataverse/master/conf/jhove/jhove.conf , and place it in your Glassfish domain's /config/ directory. For example: /usr/local/glassfish4/glassfish/domains/domain1/config/jhove.conf.
   Important: If your Glassfish installation directory is different from /usr/local/glassfish4, make sure to edit the header of the config file to reflect the correct location.
5. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.15.war
6. Restart Glassfish.
7. Replace Solr schema.xml to allow sorting and filtering on the file page.
   - Stop the Solr instance (service solr stop, depending on Solr installation/OS; see http://guides.dataverse.org/en/4.15/installation/prerequisites.html#solr-init-script).
   - Replace schema.xml and solrconfig.xml:
     cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-7.3.1/server/solr/collection1/conf
     cp /tmp/dvinstall/solrconfig.xml /usr/local/solr/solr-7.3.1/server/solr/collection1/conf
   - Start the Solr instance (service solr start, depending on Solr/OS).
8. Kick off an in-place reindex (http://guides.dataverse.org/en/4.15/admin/solr-search-index.html#reindex-in-place).
   - curl -X DELETE http://localhost:8080/api/admin/index/timestamps
   - curl http://localhost:8080/api/admin/index/continue
9. Redetect file types using the new Redetect File Types API:
   https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/api/native-api.rst#id31
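The username cleanup described at the start of these upgrade steps comes down to finding names that collide once case is ignored. The Useful Queries doc has the actual SQL; the sketch below only illustrates the idea on a made-up list of usernames, and the function name `find_case_duplicates` is hypothetical.

```shell
#!/bin/sh
# Made-up sample; in practice the names would come from your user table.
usernames="Bob11
alice
bob11
Carol
alice2"

# Lowercase every name, sort, and keep only values that occur more than once.
find_case_duplicates() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

find_case_duplicates "$usernames"
```

Each name printed corresponds to two or more accounts that need either Merge User Accounts or Change User Identifier before the upgrade.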
Published by djbrooke about 7 years ago
dataverse - 4.14
This release adds OpenAIRE-compliant exports, an option on the Dashboard for superusers to move datasets, and expanded analytics options.
For the complete list of issues, see the 4.14 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation:
If this is a new installation, please see our Installation Guide.
Upgrade:
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.14.war
4. Restart Glassfish.
Published by kcondon about 7 years ago
dataverse - 4.13
This release adds a file tree view at the Dataset level and adds a new API for file level metadata edits. It also reverts an API change from the previous release.
For the complete list of issues, see the 4.13 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation:
If this is a new installation, please see our Installation Guide.
Upgrade:
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Upgrade your version of PostgreSQL to at least 9.3. Version 9.6 is recommended.
   NOTE for Dataverse Installations running OpenStack Swift:
   All Swift properties have now been migrated to domain.xml, so a separate swift.properties file no longer needs to be maintained; this offers better governability and performance. Furthermore, the Swift credential's password is now stored using create-password-alias, which encrypts the password so that it does not appear in plain text in domain.xml.
   In order to migrate to these new configuration settings, please visit http://guides.dataverse.org/en/4.13/installation/config.html#swift-storage
4. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.13.war
5. Restart Glassfish.
Published by djbrooke about 7 years ago
dataverse - 4.12
Note: Before using the User Management APIs on Shibboleth or OAuth users, we recommend upgrading to the 4.14 release or later, which will contain the fix for issue #5811. If you have renamed users and are experiencing issues, please contact support@dataverse.org.
This release adds User Management APIs, the ability to edit the hierarchy of files in a dataset, backend support for Make Data Count, and guidance on best practices for making datasets appear in search engines.
For the complete list of issues, see the 4.12 milestone in GitHub.
For help with upgrading, installing, or general questions please post to the Dataverse Google Group or email support@dataverse.org.
Installation:
If this is a new installation, please see our Installation Guide.
Upgrade:
1. Undeploy the previous version.
   - <glassfish install path>/glassfish4/bin/asadmin list-applications
   - <glassfish install path>/glassfish4/bin/asadmin undeploy dataverse
2. Stop Glassfish, remove the generated directory, and start Glassfish again.
   - service glassfish stop
   - remove the generated directory: rm -rf <glassfish install path>/glassfish4/glassfish/domains/domain1/generated
   - service glassfish start
3. Upgrade your version of PostgreSQL to at least 9.3. Version 9.6 is recommended.
4. Deploy this version.
   - <glassfish install path>/glassfish4/bin/asadmin deploy <path>dataverse-4.12.war
5. Restart Glassfish.
6. Replace Solr schema.xml.
   - Stop the Solr instance (service solr stop, depending on Solr installation/OS; see http://guides.dataverse.org/en/4.12/installation/prerequisites.html#solr-init-script).
   - Replace schema.xml: cp /tmp/dvinstall/schema.xml /usr/local/solr/solr-7.3.0/server/solr/collection1/conf
   - Start the Solr instance (service solr start, depending on Solr/OS).
7. Kick off an in-place reindex (http://guides.dataverse.org/en/4.12/admin/solr-search-index.html#reindex-in-place).
   - curl -X DELETE http://localhost:8080/api/admin/index/timestamps
   - curl http://localhost:8080/api/admin/index/continue
8. If you are using Web Analytics, please review your "analytics-code.html" fragment (described in Installation Guide > Configuration > Web Analytics Code), and see if any of the script lines contain an empty "async" attribute. In the documentation provided by Google, its value is left blank (as in