gov.nasa.pds:harvest

Standalone Harvest client application providing the functionality for capturing and indexing product metadata into the PDS Registry system (https://github.com/nasa-pds/registry).

https://github.com/nasa-pds/harvest

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    13 of 25 committers (52.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary

Keywords

pds planetary registry

Keywords from Contributors

ingestion interactive spacy-extension network-simulation hacking optim projection generic sequences interpretability
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 6
  • Watchers: 6
  • Forks: 3
  • Open Issues: 33
  • Releases: 57
Topics
pds planetary registry
Created about 6 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Codeowners Security Zenodo

README.md

Harvest Tool


The Harvest Tool captures and indexes product metadata. Each discipline node of the Planetary Data System runs the tool to crawl the local data repositories, discovering products and indexing associated metadata into the Registry Service. As such, it's a sub-component of the PDS Registry Application (https://github.com/NASA-PDS/registry).

For more detailed documentation on this tool, see the PDS Registry Documentation: https://nasa-pds.github.io/registry/.

Documentation

The documentation for the latest release of the Harvest Tool, including release notes, installation, and operation of the software, is available to browse online.

If you would like to get the latest documentation, including any updates since the last release, you can execute the "mvn site:run" command and view the documentation locally at http://localhost:8080/.

👥 Contributing

Within the NASA Planetary Data System, we value the health of our community as much as the code. Towards that end, we ask that you read and practice what's described in these documents:

  • Our contributor's guide delineates the kinds of contributions we accept.
  • Our code of conduct outlines the standards of behavior we practice and expect of everyone who participates in our software.

🔢 Versioning

We use the SemVer philosophy for versioning this software.

🪛 Development

To develop this project, use your favorite text editor, or an integrated development environment with Java support, such as Eclipse. You'll also need Apache Maven version 3. With these tools, you can typically run

mvn package

to produce a complete package. This runs all the phases necessary, including compilation, testing, and package assembly. Other common Maven phases include:

  • compile - just compile the source code
  • test - just run unit tests
  • install - install into your local repository
  • deploy - deploy to a remote repository — note that the Roundup action does this automatically for releases

💂 Secrets Detection Setup and Update

The PDS uses Detect Secrets to help prevent committing information to a repository that should remain secret.

For Detect Secrets to work, there is a one-time setup required to your personal global Git configuration, as well as several steps to create or update the required .secrets.baseline file needed to avoid false positive failures of the software. See the wiki entry on Detect Secrets to learn how to do this.

🪝 Pre-Commit Hooks

This package comes with a configuration for Pre-Commit, a system for automating and standardizing git hooks for code linting, security scanning, etc. In this repository, we use Pre-Commit with Detect Secrets to prevent accidentally committing secrets, such as API keys and passwords, in code or commit messages.

Pre-Commit and detect-secrets are language-neutral, but they themselves are written in Python. To take advantage of these features, you'll need a nearby Python installation. A recommended way to do this is with a virtual Python environment. Using the command line interface, run:

$ python -m venv .venv
$ source .venv/bin/activate  # Use source .venv/bin/activate.csh if you're using a C-style shell
$ pip install pre-commit git+https://github.com/NASA-AMMOS/slim-detect-secrets.git@exp

See the Detect Secrets information above to set up your secrets baseline before proceeding.

Finally, install the pre-commit hooks:

pre-commit install
pre-commit install -t pre-push
pre-commit install -t prepare-commit-msg
pre-commit install -t commit-msg

You can then work normally. Pre-commit will run automatically during git commit and git push so long as the Python virtual environment is active.
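Because the hooks depend on the virtual environment being active, a quick check before committing can save a confusing failure. The following is a minimal sketch, not part of this repository: the function name venv_status is hypothetical, and it relies on the fact that the venv activate script exports the VIRTUAL_ENV variable.

```shell
#!/bin/sh
# Hypothetical check (not part of this repository): report whether a Python
# virtual environment is active. The venv "activate" script exports
# VIRTUAL_ENV, so an empty or unset value means no environment is active.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

# Print the status for the current shell session.
venv_status
```

If the status is inactive, run source .venv/bin/activate again before committing or pushing.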

👉 Note: For Detect Secrets to work, there is a one-time setup required to your personal global Git configuration. See the wiki entry on Detect Secrets to learn how to do this.

🚅 Continuous Integration & Deployment

Thanks to GitHub Actions and the Roundup Action, this software undergoes continuous integration and deployment. Every time a change is merged into the main branch, an "unstable" release (known in Java software development circles as a "SNAPSHOT") is created and delivered to the releases page and to the OSSRH.

You can make an official delivery by pushing a release/X.Y.Z branch to GitHub, replacing X with the major version number, Y with the minor version number, and Z with the micro version number. This results in a stable (non-SNAPSHOT) release generated and cryptographically signed (but by an automated process so alter trust expectations accordingly) and made available on the releases page and OSSRH; the website published; changelogs and requirements updated; and a new version number in the main branch prepared for future development.
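As a minimal sketch of the release/X.Y.Z branch convention described above, the helper below checks that a version has the X.Y.Z form and prints the matching branch name. The function name release_branch is hypothetical and not part of this repository, and the git commands in the trailing comment are an assumption about a typical workflow rather than a documented procedure.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the release
# branch name for a version of the form X.Y.Z, or fail if it is malformed.
release_branch() {
    if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        printf 'release/%s\n' "$1"
    else
        echo "error: version must look like X.Y.Z, got '$1'" >&2
        return 1
    fi
}

# prints: release/1.15.0
release_branch 1.15.0

# Assumed typical usage (requires a clone with push access):
#   git checkout -b "$(release_branch 1.15.0)"
#   git push origin "$(release_branch 1.15.0)"
```

Validating the version first avoids pushing a branch such as release/1.15 that the automated release process may not recognize.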

The following sections detail how to do this manually should the automated steps fail.

🔧 Manual Publication

👉 Note: Requires using PDS Maven Parent POM to ensure release profile is set.

Update Version Numbers

Update pom.xml for the release version or use the Maven Versions Plugin, e.g.:

$ # Skip this step if this is a RELEASE CANDIDATE, we will deploy as SNAPSHOT version for testing
$ VERSION=1.15.0
$ mvn -DnewVersion=$VERSION versions:set
$ git add pom.xml
$ git add */pom.xml

Update Changelog

Update the changelog using GitHub Changelog Generator. Note: Make sure you set $CHANGELOG_GITHUB_TOKEN in your .bash_profile or use the --token flag.

$ # For RELEASE CANDIDATE, set VERSION to future release version.
$ GITHUB_ORG=NASA-PDS
$ GITHUB_REPO=harvest
$ github_changelog_generator --future-release v$VERSION --user $GITHUB_ORG --project $GITHUB_REPO --configure-sections '{"improvements":{"prefix":"**Improvements:**","labels":["Epic"]},"defects":{"prefix":"**Defects:**","labels":["bug"]},"deprecations":{"prefix":"**Deprecations:**","labels":["deprecation"]}}' --no-pull-requests --token $GITHUB_TOKEN
$ git add CHANGELOG.md

Commit Changes

Commit changes using the following template commit message:

$ # For operational release
$ git commit -m "[RELEASE] Harvest v$VERSION"
$ # Push changes to main
$ git push --set-upstream origin main

Build and Deploy Software to Maven Central Repo

$ # For operational release
$ mvn --activate-profiles release clean site site:stage package deploy
$ # For release candidate
$ mvn clean site site:stage package deploy
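The two deployment commands above differ only in whether the release profile is activated. As a rough sketch of that distinction, the helper below prints the command for each kind of delivery without running Maven. The function name deploy_command and its release and candidate arguments are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the Maven
# command for the given kind of delivery without running it.
deploy_command() {
    goals="clean site site:stage package deploy"
    case "$1" in
        release)   echo "mvn --activate-profiles release $goals" ;;
        candidate) echo "mvn $goals" ;;
        *)         echo "usage: deploy_command release|candidate" >&2
                   return 1 ;;
    esac
}

# Show both variants side by side.
deploy_command release
deploy_command candidate
```

Printing the command instead of executing it makes it easy to review the exact goals and profile before deploying.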

Push Tagged Release

```console
$ # For Release Candidate, you may need to delete old SNAPSHOT tag
$ git push origin :v$VERSION
$ # Now tag and push
$ REPO=harvest
$ git tag v${VERSION} -m "[RELEASE] $REPO v$VERSION" -m "See CHANGELOG for more details."
$ git push --tags
```

Deploy Site to Github Pages

From the cloned repo:

$ git checkout gh-pages
$ # Copy the staged site over to the version-specific and default sites
$ rsync --archive --verbose target/staging/ .
$ git add .
$ # For operational release
$ git commit -m "Deploy v$VERSION docs"
$ # For release candidate
$ git commit -m "Deploy v${VERSION}-SNAPSHOT docs"
$ git push origin gh-pages

Update Versions For Development

Update pom.xml with the next SNAPSHOT version either manually or using the Maven Versions Plugin.

For RELEASE CANDIDATE, ignore this step.

$ git checkout main
$ # For release candidates, skip to push changes to main
$ VERSION=1.16.0-SNAPSHOT
$ mvn -DnewVersion=$VERSION versions:set
$ git add pom.xml
$ git commit -m "Update version for $VERSION development"
$ # Push changes to main
$ git push --set-upstream origin main
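The example versions in this procedure go from 1.15.0 for the release to 1.16.0-SNAPSHOT for development. Assuming that pattern (bump the minor number, reset the micro number to 0), the next development version could be derived as sketched below. The function name next_snapshot is hypothetical and not part of this repository, and a project may choose a different bump.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): given a release
# version X.Y.Z, print the next minor development version
# X.(Y+1).0-SNAPSHOT. Assumes plain decimal numbers without leading zeros.
next_snapshot() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text between the first and second dots
    printf '%s.%s.0-SNAPSHOT\n' "$major" "$((minor + 1))"
}

# prints: 1.16.0-SNAPSHOT
next_snapshot 1.15.0
```

The result can be assigned to VERSION before running the versions:set step, for example VERSION=$(next_snapshot 1.15.0).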

Complete Release in Github

Currently, the process of creating more formal release notes and attaching assets is done manually through the GitHub UI.

NOTE: Be sure to add the tar.gz and zip from the target/ directory to the release assets, and use the CHANGELOG generated above to create the RELEASE NOTES.

📃 License

The project is licensed under the Apache version 2 license.

Maven JAR Dependency Reference

  • Operational Releases: https://search.maven.org/search?q=g:gov.nasa.pds%20AND%20a:harvest&core=gav
  • Snapshots: https://oss.sonatype.org/content/repositories/snapshots/gov/nasa/pds/harvest/

If you want to access snapshots, add the following to your ~/.m2/settings.xml:

<profiles>
  <profile>
    <id>allow-snapshots</id>
    <activation><activeByDefault>true</activeByDefault></activation>
    <repositories>
      <repository>
        <id>snapshots-repo</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <releases><enabled>false</enabled></releases>
        <snapshots><enabled>true</enabled></snapshots>
      </repository>
    </repositories>
  </profile>
</profiles>

Owner

  • Name: NASA Planetary Data System Software
  • Login: NASA-PDS
  • Kind: organization
  • Email: pds-operator@jpl.nasa.gov

GitHub Events

Total
  • Create event: 55
  • Release event: 26
  • Issues event: 62
  • Watch event: 1
  • Delete event: 79
  • Issue comment event: 231
  • Push event: 89
  • Pull request review event: 9
  • Pull request event: 35
Last Year
  • Create event: 55
  • Release event: 26
  • Issues event: 62
  • Watch event: 1
  • Delete event: 79
  • Issue comment event: 231
  • Push event: 89
  • Pull request review event: 9
  • Pull request event: 35

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 842
  • Total Committers: 25
  • Avg Commits per committer: 33.68
  • Development Distribution Score (DDS): 0.784
Past Year
  • Commits: 113
  • Committers: 9
  • Avg Commits per committer: 12.556
  • Development Distribution Score (DDS): 0.46
Top Committers
Name Email Commits
PDSEN CI Bot p****i@j****v 182
shardman s****n@4****1 127
Jordan Padams j****s@j****v 119
mcayanan m****n@4****1 113
thomas loubrieu t****u@j****v 55
PDS dev admin p****i@g****m 45
Eugene t****t@t****m 42
Al Niessner A****r@x****x 27
Eugene t****2@y****m 22
dependabot[bot] 4****] 21
Eugene k****o@R****v 21
al-niessner 1****r 19
Sean Kelly k****y@s****z 9
Sean Hardman S****n@j****v 7
Alex Dunn a****n@j****v 7
thomas loubrieu 6****l 6
Michael Cayanan m****n@j****v 5
Thomas Loubrieu l****u@j****v 5
Mike Cayanan m****n@j****v 3
Jimmie Young j****g@j****v 2
Galen A Hollins G****s@j****v 1
Ramesh Maddegoda 9****a 1
GitHub Action a****n@g****m 1
Lyle Barner l****r@j****v 1
jpadams j****s@4****1 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 125
  • Total pull requests: 108
  • Average time to close issues: 4 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 14
  • Total pull request authors: 9
  • Average comments per issue: 3.68
  • Average comments per pull request: 0.99
  • Merged pull requests: 87
  • Bot issues: 0
  • Bot pull requests: 35
Past Year
  • Issues: 41
  • Pull requests: 33
  • Average time to close issues: 24 days
  • Average time to close pull requests: 20 days
  • Issue authors: 7
  • Pull request authors: 4
  • Average comments per issue: 3.54
  • Average comments per pull request: 0.85
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 14
Top Authors
Issue Authors
  • jordanpadams (61)
  • tloubrieu-jpl (29)
  • tdddblog (7)
  • plawton-umd (7)
  • scholes-ds (4)
  • rchenatjpl (3)
  • alexdunnjpl (3)
  • tariqksoliman (2)
  • mdrum (2)
  • gxtchen (2)
  • al-niessner (1)
  • nutjob4life (1)
  • imoon-ucla (1)
  • msbentley (1)
  • dependabot[bot] (1)
Pull Request Authors
  • dependabot[bot] (55)
  • al-niessner (35)
  • tdddblog (30)
  • alexdunnjpl (6)
  • jordanpadams (6)
  • nutjob4life (5)
  • tloubrieu-jpl (4)
  • ramesh-maddegoda (1)
  • lylebarner (1)
Top Labels
Issue Labels
bug (57) i&t.skip (30) requirement (27) sprint-backlog (27) B15.1 (22) needs:triage (19) s.medium (19) task (19) enhancement (19) icebox (18) p.should-have (16) s.high (16) p.must-have (15) B14.1 (13) wontfix (13) B12.1 (12) B15.0 (11) invalid (8) s.low (6) B13.1 (5) s.critical (5) B13.0 (5) B14.0 (4) B16 (4) p.could-have (4) i&t.issue (4) duplicate (4) B12.0 (4) Epic (4) B11.1 (3)
Pull Request Labels
dependencies (55) java (40) github_actions (11) sprint-backlog (6)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 30
repo1.maven.org: gov.nasa.pds:harvest

The Harvest Tool provides functionality for capturing and indexing product metadata. The tool will run locally at the Discipline Node to crawl the local data repository in order to discover products and index associated metadata with the Registry Service.

  • Versions: 30
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 32.0%
Forks count: 33.1%
Stargazers count: 36.2%
Average: 37.5%
Dependent packages count: 48.9%
Last synced: 6 months ago

Dependencies

.github/workflows/branch-cicd.yaml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-java v2 composite
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/stable-cicd.yaml actions
  • NASA-PDS/roundup-action stable composite
  • actions/cache v3 composite
  • actions/checkout v3 composite
.github/workflows/unstable-cicd.yaml actions
  • NASA-PDS/roundup-action stable composite
  • actions/cache v3 composite
  • actions/checkout v3 composite
pom.xml maven
  • com.google.code.gson:gson 2.8.9
  • commons-cli:commons-cli 1.4
  • commons-codec:commons-codec 1.15
  • commons-lang:commons-lang 2.6
  • gov.nasa.pds:registry-common 1.3.1
  • org.apache.tika:tika-core 1.23
  • org.json:json 20210307
.github/workflows/secrets-detection.yaml actions
  • actions/checkout v4 composite