Recent Releases of gleaner
gleaner - v3.0.9_dev4
What's Changed
- Better injection of version by @valentinedwv in https://github.com/gleanerio/gleaner/pull/196
- #198 headless never is headless wait < 0 by @valentinedwv in https://github.com/gleanerio/gleaner/pull/199
Full Changelog: https://github.com/gleanerio/gleaner/compare/v3.0.9dev3...v3.0.9dev4
- Go
Published by valentinedwv about 3 years ago
gleaner - v3.0.9_dev3
What's Changed
- Dv aws issues by @valentinedwv in https://github.com/gleanerio/gleaner/pull/186
- #190 headless broken. by @valentinedwv in https://github.com/gleanerio/gleaner/pull/191
- Df dev remove bolt by @fils in https://github.com/gleanerio/gleaner/pull/194
Full Changelog: https://github.com/gleanerio/gleaner/compare/v3.0.9dev2...v3.0.9dev3
- Go
Published by valentinedwv about 3 years ago
gleaner - v3.0.9_dev2
What's Changed
- Bump golang.org/x/net from 0.0.0-20220520000938-2e3eb7b945c2 to 0.7.0 by @dependabot in https://github.com/gleanerio/gleaner/pull/170
- docker dev matrix by @valentinedwv in https://github.com/gleanerio/gleaner/pull/173
- 175 aws region by @valentinedwv in https://github.com/gleanerio/gleaner/pull/176
- robots.txt not error. Identifier not an error by @valentinedwv in https://github.com/gleanerio/gleaner/pull/179
- Make the @id fixup work for arrays of jsonld as well by @nein09 in https://github.com/gleanerio/gleaner/pull/181
Full Changelog: https://github.com/gleanerio/gleaner/compare/v3.0.8...v3.0.9_dev2
- Go
Published by valentinedwv about 3 years ago
gleaner - v3.0.8_fix129
What's Changed
- runstats is output even if ctl-C is hit by @valentinedwv in https://github.com/gleanerio/gleaner/pull/138
- update glcon nabu dependency 202301 by @valentinedwv in https://github.com/gleanerio/gleaner/pull/139
- glcon readconfig by @valentinedwv in https://github.com/gleanerio/gleaner/pull/151
- Headless rework to implement headlessWait in javascript and make headless testable by @valentinedwv in https://github.com/gleanerio/gleaner/pull/153
- Consume JSON-LD metadata from a paged API by @nein09 in https://github.com/gleanerio/gleaner/pull/133
- Fix 124 identifiers sha by @valentinedwv in https://github.com/gleanerio/gleaner/pull/135
Full Changelog: https://github.com/gleanerio/gleaner/compare/v3.0.7-development...v3.0.8_fix129
- Go
Published by valentinedwv over 3 years ago
gleaner - Sync Nabu, Context Fixes, Identifiers
- Sync with Nabu Massive update
- context fixes to get files to normalize
- identifier changes to allow for an internal JSON path to be used as the basis for the identifier SHA
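A minimal sketch of the identifier idea, not Gleaner's actual implementation. It assumes a simplified dot-separated path instead of a full JSON path expression, and it assumes SHA-256 because the notes do not say which SHA variant is used. The names `lookup` and `identifierSHA` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// lookup walks a dot-separated path (a simplified stand-in for a real
// JSON path expression) through a decoded JSON document.
func lookup(doc map[string]interface{}, path string) (interface{}, error) {
	var cur interface{} = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("path %q: %q is not inside an object", path, key)
		}
		cur, ok = obj[key]
		if !ok {
			return nil, fmt.Errorf("path %q: key %q not found", path, key)
		}
	}
	return cur, nil
}

// identifierSHA hashes the value found at path, so documents that carry
// the same internal identifier get the same ID regardless of other fields.
func identifierSHA(jsonld []byte, path string) (string, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(jsonld, &doc); err != nil {
		return "", err
	}
	val, err := lookup(doc, path)
	if err != nil {
		return "", err
	}
	// Re-encode the value so strings, numbers and objects hash consistently.
	raw, err := json.Marshal(val)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw) // assumption: the SHA variant is not stated in the notes
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	doc := []byte(`{"@type":"Dataset","identifier":{"value":"doi:10.1234/abc"},"name":"Sample"}`)
	id, err := identifierSHA(doc, "identifier.value")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

Hashing only the value at the path means a change to an unrelated field, such as the name, does not change the resulting ID.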
What's Changed
- runstats is output even if ctl-C is hit by @valentinedwv in https://github.com/gleanerio/gleaner/pull/138
- update glcon nabu dependency 202301 by @valentinedwv in https://github.com/gleanerio/gleaner/pull/139
Full Changelog: https://github.com/gleanerio/gleaner/compare/v3.0.7-development...v3.0.8-fix129
- Go
Published by valentinedwv over 3 years ago
gleaner - Gleaner Console initial release
Configuration file changes.
What's Changed
- Handle getting multiple json-ld objects per page, headlessly or via http by @nein09 in https://github.com/gleanerio/gleaner/pull/35
- Complete glcon @valentinedwv
- integrate gleaner and nabu
- configuration generation
- one source @fils
Full Changelog: https://github.com/gleanerio/gleaner/compare/2.0.29...v3.0.1
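Getting multiple JSON-LD objects from one page can be sketched roughly as follows. This is an illustration under assumptions, not the code from the pull request: it uses a regular expression from the standard library where a real crawler would more likely use an HTML parser, and the name `extractJSONLD` is hypothetical. It collects every `application/ld+json` script block and flattens blocks that hold an array of objects.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"regexp"
)

// ldScript matches <script type="application/ld+json"> ... </script> blocks.
// A regular expression is a simplification; an HTML parser is more robust.
var ldScript = regexp.MustCompile(`(?is)<script[^>]*type\s*=\s*["']application/ld\+json["'][^>]*>(.*?)</script>`)

// extractJSONLD returns every JSON-LD object found on the page. A script
// block may hold a single object or an array of objects.
func extractJSONLD(page []byte) ([]json.RawMessage, error) {
	var out []json.RawMessage
	for _, m := range ldScript.FindAllSubmatch(page, -1) {
		body := bytes.TrimSpace(m[1])
		if len(body) == 0 {
			continue
		}
		if body[0] == '[' {
			// An array of JSON-LD objects: keep each element separately.
			var items []json.RawMessage
			if err := json.Unmarshal(body, &items); err != nil {
				return nil, err
			}
			out = append(out, items...)
			continue
		}
		if !json.Valid(body) {
			return nil, fmt.Errorf("invalid JSON-LD block")
		}
		out = append(out, json.RawMessage(body))
	}
	return out, nil
}

func main() {
	page := []byte(`<html><head>
<script type="application/ld+json">{"@type":"Dataset","name":"one"}</script>
<script type="application/ld+json">[{"@type":"Dataset","name":"two"},{"@type":"Dataset","name":"three"}]</script>
</head></html>`)
	objs, err := extractJSONLD(page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, o := range objs {
		fmt.Printf("%d: %s\n", i, o)
	}
}
```

The same function applies whether the page bytes came from a plain HTTP GET or from a headless render, since it only looks at the final markup.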
- Go
Published by valentinedwv over 4 years ago
gleaner - Ocean InfoHub dev rc1
This is a release based on the dev branch. It adds a few new features such as domain graph indexing. This covers the case where a domain cannot publish following the structured data on the web patterns but can publish a single graph file pointing to its resources.
This release also has the new prov building in place to aid in interfaces.
There are also some general code improvements (removing old systems, etc.).
- Go
Published by fils almost 5 years ago
gleaner - IGSN Sprint Release
This is a release for Gleaner based on some updates for the IGSN2040 multi-week sprint. This sprint is detailed elsewhere but was focused on testing the structured data on the web pattern for the IGSN PID architecture.
Some improvements here include:
- gzipped sitemap support (by popular request)
- improved headless support for dynamic JSON-LD injection (headless support is still rather implementation focused... needs to be more generic)
- improved performance for large resource counts
- better object store layout
- now JSON-LD 1.1 based for better support of more advanced context patterns
- thread count now controlled from config file
- delay for indexing calls can be set in config file
- SHACL service URL now set in config (allows use of cloud based SHACL services)
- general performance improvements along the way (fixed some bad code loops) ;)
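As an illustration of the gzipped sitemap item above, here is a small sketch under assumptions, not the code shipped in this release. It detects gzip by its two magic bytes rather than by file extension, and the names `parseSitemap` and `gzipBytes` are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// urlSet holds just the <loc> entries of a sitemap <urlset>.
type urlSet struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// parseSitemap returns the URLs listed in a sitemap. The input may be
// plain XML or gzip-compressed XML; gzip is detected by its magic bytes.
func parseSitemap(data []byte) ([]string, error) {
	var r io.Reader = bytes.NewReader(data)
	if len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b {
		zr, err := gzip.NewReader(r)
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		r = zr
	}
	var set urlSet
	if err := xml.NewDecoder(r).Decode(&set); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(set.URLs))
	for _, u := range set.URLs {
		locs = append(locs, strings.TrimSpace(u.Loc))
	}
	return locs, nil
}

// gzipBytes compresses data in memory; used here only to build a demo input.
func gzipBytes(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data) // writes to a bytes.Buffer do not fail
	zw.Close()
	return buf.Bytes()
}

func main() {
	sitemap := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/a</loc></url>
  <url><loc>https://example.org/b</loc></url>
</urlset>`)
	locs, err := parseSitemap(gzipBytes(sitemap))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(locs)
}
```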
Note this version will break on old config file formats. Be sure to add the thread and delay params to the config file. I'll fix that in later versions.
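To show what the thread and delay settings control, here is a rough sketch of a worker pool driven by those two values. It is not Gleaner's code: the `Config` field names are hypothetical (check the real config file for the actual keys), and `fakeFetch` stands in for a real indexing call.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
	"time"
)

// Config mirrors the two settings mentioned above. The field names are
// hypothetical and do not come from the real config file.
type Config struct {
	Threads int           // number of concurrent workers
	Delay   time.Duration // pause after each indexing call
}

// crawl runs fetch over urls with cfg.Threads workers, sleeping cfg.Delay
// after every call, and returns how many calls failed.
func crawl(urls []string, cfg Config, fetch func(string) error) int {
	if cfg.Threads < 1 {
		cfg.Threads = 1 // always keep at least one worker
	}
	jobs := make(chan string)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for i := 0; i < cfg.Threads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				if err := fetch(u); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
				time.Sleep(cfg.Delay)
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return failed
}

// fakeFetch stands in for a real HTTP request; it rejects non-https URLs.
func fakeFetch(u string) error {
	if !strings.HasPrefix(u, "https://") {
		return errors.New("unsupported URL: " + u)
	}
	return nil
}

func main() {
	urls := []string{"https://example.org/a", "ftp://example.org/b", "https://example.org/c"}
	failed := crawl(urls, Config{Threads: 2, Delay: 10 * time.Millisecond}, fakeFetch)
	fmt.Printf("%d of %d requests failed\n", failed, len(urls))
}
```

More threads raise throughput, while the delay keeps each worker from hammering the remote site, so the two values are best tuned together.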
- Go
Published by fils about 6 years ago
gleaner - Onebucket MkIII
Notes for the screencast. The screencast video is at: https://www.youtube.com/watch?v=12figImXgDk
I made a few small changes in some directory / bucket locations and file names. So there will be a few small differences between the video and this release.
Get the files we need from the GitHub repo releases section https://github.com/earthcubearchitecture-project418/gleaner/releases
Make and set a directory for the data volume for the Docker containers if you are using those.
Examples would be:
mkdir /home/tmp/dv
export DATAVOL=/home/tmp/dv
Need to grab any context files we use
Just schema.org for now. Note, not required but highly recommended
curl -L -H "Accept: application/ld+json" -H "Content-Type: application/ld+json" https://schema.org > jsonldcontext.jsonld
Minio client (or use your web browser)
Ref: https://docs.min.io/docs/minio-client-complete-guide.html
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod 755 mc
./mc config host add minio http://0.0.0.0:9000 gleaneraccess gleanersecret --api S3v4
After running gleaner you can look for the output graphs and load the data into Jena
./mc cat minio/gleaner/results/runid/samplesearth_graph.nq | curl -X POST --header "Content-Type:application/n-quads" -d @- http://localhost:3030/demo/data
- Go
Published by fils about 6 years ago
gleaner - The great burn MkII
Missed some edits needed for the demo
Been a while... This release is the result of some effort to start REMOVING things from Gleaner. Gleaner has been a bit of a playground for me and as such it started to bloat and, worse, break because of it. I'm moving my playing elsewhere and getting Gleaner back to being focused on just harvesting and validating JSON-LD data graphs from the web.
This is a release from branch onebucket where I am changing gleaner to work with a single bucket (S3, Minio, Google Cloud, etc) and use object prefixes from there.
I have not merged this yet, as there is still work to do to remove deprecated code and resolve some code duplication that occurred during this process.
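The single bucket with object prefixes approach can be pictured with a short sketch. It is only an illustration: the `results` prefix and the file name are taken from the `mc cat` example in the Onebucket MkIII notes, the `other` prefix is a placeholder, and the helper names `objectKey` and `withPrefix` are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// objectKey builds the key for an object stored under a prefix in the
// single bucket, for example results/<runID>/<name>.
func objectKey(prefix, runID, name string) string {
	return path.Join(prefix, runID, name)
}

// withPrefix returns the keys that sit under the given prefix, sorted.
// This mimics what a prefix listing against an object store returns.
func withPrefix(keys []string, prefix string) []string {
	p := strings.TrimSuffix(prefix, "/") + "/"
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, p) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	keys := []string{
		objectKey("results", "run1", "samplesearth_graph.nq"),
		objectKey("results", "run2", "samplesearth_graph.nq"),
		objectKey("other", "run1", "notes.txt"), // placeholder prefix
	}
	fmt.Println(withPrefix(keys, "results"))
}
```

With one bucket, what used to be separate buckets become key prefixes, so access control and cleanup happen per prefix rather than per bucket.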
- Go
Published by fils about 6 years ago
gleaner - The great burn
Been a while... This release is the result of some effort to start REMOVING things from Gleaner. Gleaner has been a bit of a playground for me and as such it started to bloat and, worse, break because of it. I'm moving my playing elsewhere and getting Gleaner back to being focused on just harvesting and validating JSON-LD data graphs from the web.
This is a release from branch onebucket where I am changing gleaner to work with a single bucket (S3, Minio, Google Cloud, etc) and use object prefixes from there.
I have not merged this yet, as there is still work to do to remove deprecated code and resolve some code duplication that occurred during this process.
- Go
Published by fils about 6 years ago
gleaner - The Old Dutch Church release
This release updates the code to address some issues with index sites that dynamically place the JSON-LD into the page DOM.
These sites use JavaScript to call back to a server and obtain the JSON-LD. The DOM is then updated with this material. To process these, a service must be in place that allows Gleaner to render the page, thus processing the JS and updating the DOM with the JSON-LD.
An error in the docker compose file, an update to chrome sandboxing and a "bug" from P418 days of the code all combined to make this not work. Actually, each one did that on its own. I just had several issues at once to have redundancy in failure.
This is an updated release that I hope resolves all these.
- Go
Published by fils over 6 years ago
gleaner - Pilot take 2
Forgot to update the zip file with the new compose file. Found during filming, hence take 2.
- Go
Published by fils over 6 years ago
gleaner - Hollywood Pilot Edition
This is a roll up of some of the updates made during a CODATA meeting. It is also the basis for the first draft of the getting started documentation.
- Go
Published by fils over 6 years ago
gleaner - Been a while.. regression fix
Sorry for the long time to this point. I've been trying to do a major code reduction. I've been removing a lot of code and trying to replace some things with better community libraries. For example, I now use Viper for the config file management. I have been looking at replacing the sitemap code with a community package, but I need to verify it can address some edge cases we have with sitemaps and robots files.
I'm also looking to add back in the ability to read config files from the object store and use that to allow me to run the gleaner binary as a CLI from a docker image. You can do this now but it's a bit tricky to pass in the config file at the command line to a container. So I want to make that easier. At that point we would be able to deploy the entire system as a docker compose file.
It will NOT be long till the next release. I plan to push them out better going forward.
- Go
Published by fils over 6 years ago
gleaner - Are We There Yet?
An updated version I'm using to help build out the Gleaner demo that will be at the EarthCube Annual Meeting in Denver June 2019.
- Go
Published by fils about 7 years ago
gleaner - The "not quite ready" release
This release is "runnable" external to my setup. However, it lacks documentation to let anyone understand it. So I guess it's "not quite ready".
Testing the release process to move toward that. 2.0.4 should be a first cut.
The basic steps to running will be: 1) Use Docker to bring up the supporting images. 2) Set your environment variables for connecting to those containers. 3) Ensure the needed buckets and config file are ready and present (the code will do a sanity check and help with that). 4) Download the binary from the release and run it with the required flags.
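The sanity check in step 3 could look something like the sketch below. This is a guess at the shape of such a check, not the code in the release: the environment variable names are hypothetical placeholders, and `missingEnv` is an illustrative helper.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names from required for which lookup reports
// no value or an empty value.
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Hypothetical variable names; the real ones are not listed in these notes.
	required := []string{"OBJECT_STORE_ADDRESS", "OBJECT_STORE_KEY", "OBJECT_STORE_SECRET"}
	if missing := missingEnv(required, os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("environment looks complete")
}
```

Passing the lookup function in as a parameter keeps the check testable without touching the real process environment.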
We're almost there!
- Go
Published by fils about 7 years ago