Recent Releases of https://github.com/acdh-oeaw/arche-lib-ingest

https://github.com/acdh-oeaw/arche-lib-ingest - Indexer SKIP_SPECIAL skip mode added

The SKIP_SPECIAL mode skips all files whose name starts with a dot, as well as Thumbs.db files.
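The rule above can be sketched as a small predicate. This is an illustration only, not the library's code: the function name isSpecialFile() is hypothetical, and matching Thumbs.db by exact name is an assumption of the sketch.

```php
<?php
// Hypothetical helper illustrating the SKIP_SPECIAL rule; not part of arche-lib-ingest.
function isSpecialFile(string $path): bool {
    $name = basename($path);
    // Files whose name starts with a dot (e.g. .DS_Store, .gitignore)
    if (str_starts_with($name, '.')) {
        return true;
    }
    // Windows thumbnail cache files (exact-name match is an assumption)
    return $name === 'Thumbs.db';
}

// Filtering a list of paths the way a skip mode might
$paths = ['data/report.pdf', 'data/.DS_Store', 'data/img/Thumbs.db'];
$kept  = array_values(array_filter($paths, fn(string $p): bool => !isSpecialFile($p)));
// $kept contains only 'data/report.pdf'
```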

- PHP
Published by zozlak 12 months ago

https://github.com/acdh-oeaw/arche-lib-ingest - Indexer's automatic versioning rework

Previously the metadata handling on an automatic new version creation was hardcoded (and configurable only with a bool $pidPass parameter). Now Indexer::setVersioning(), File::upload() and File::uploadAsync() take two callables with the signatures

`function (rdfInterface\DatasetNodeInterface $oldMeta, acdhOeaw\arche\lib\Schema $repoSchema): array{rdfInterface\DatasetNodeInterface $oldMeta, rdfInterface\DatasetNodeInterface $newMeta}`

and

`function (acdhOeaw\arche\lib\RepoResource $old, acdhOeaw\arche\lib\RepoResource $new): void`

The first one should generate the old and new version metadata according to a given repository's business logic. The second one is responsible for any metadata adjustments which require both the old and the new version resources to exist (e.g. updating references which pointed to the old version so that they point to the new one).

A sample implementation of such handlers can be found in tests/IndexerTest.php::testNewVersionCreation().

- PHP
Published by zozlak about 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - Redmine class restored

For unknown reasons the acdhOeaw\arche\lib\ingest\Redmine class was silently removed in 4.0. Now it's back.

- PHP
Published by zozlak about 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

SkosVocabulary::preprocess(): do not add self-pointing parent link to the schema resource

- PHP
Published by zozlak over 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - arche-lib bumped to ^7

- PHP
Published by zozlak over 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - PHP 8.4 deprecation fixes

- PHP
Published by zozlak over 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - Various minor fixes

  • MetadataCollection: terminate ingestion on terminal errors even in error mode pass
  • File: handle versioning when a repository resource exists but lacks binary (update it with the binary without new version creation then)
  • CI tuning

- PHP
Published by zozlak over 1 year ago

https://github.com/acdh-oeaw/arche-lib-ingest - Allow PHP ^8.1

- PHP
Published by zozlak almost 2 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

- PHP
Published by zozlak almost 2 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Allow PHP 8.2

- PHP
Published by zozlak about 2 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - FileId class created

The file path to repository resource id translation code extracted as a separate class (acdhOeaw\arche\lib\ingest\util\FileId) allowing easy reuse in different libraries.

- PHP
Published by zozlak over 2 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

  • Fixed network connection recognition in acdhOeaw\arche\lib\ingest\MetadataCollection::import()
  • acdhOeaw\arche\lib\ingest\MetadataCollection::import() and acdhOeaw\arche\lib\ingest\Indexer::import(): waiting before reingestion attempt on network errors tuned

- PHP
Published by zozlak almost 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\File::uploadAsync() emits the progress message for resources skipped on SKIP_NOT_EXIST

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - MetadataCollection and Indexer retry on network errors

Until now any network error simply interrupted the ingestion. Now network errors are treated in (almost) the same way as conflicts: the operation is retried up to a given limit per resource.
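A generic sketch of such a per-resource retry limit is shown below. It does not reproduce the library's code: the NetworkError class, the withRetries() function and the $maxRetries parameter are all hypothetical names introduced for illustration.

```php
<?php
// Hypothetical stand-in for a network-level failure; the library's actual
// exception types are not shown here.
class NetworkError extends RuntimeException {}

/**
 * Calls $operation until it succeeds or the retry limit is exhausted.
 */
function withRetries(callable $operation, int $maxRetries): mixed {
    $attempt = 0;
    while (true) {
        try {
            return $operation();
        } catch (NetworkError $e) {
            $attempt++;
            if ($attempt > $maxRetries) {
                throw $e; // limit reached: give up on this resource
            }
        }
    }
}

// Usage: an operation failing twice before it succeeds
$calls  = 0;
$result = withRetries(function () use (&$calls): string {
    $calls++;
    if ($calls < 3) {
        throw new NetworkError('connection reset');
    }
    return 'ingested';
}, 5);
```

A real implementation would typically also wait between attempts; that is left out here to keep the sketch short.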

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Redmine class added

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Tuning

acdhOeaw\arche\lib\ingest\Indexer::pathToUtf8(): assume UTF-8 on Linux systems

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\Indexer::index(): assure id prefix ends with a slash for flat-structured imports.

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

The way acdhOeaw\arche\lib\ingest\Indexer::createFile() generates an id has been fixed for flat-structure ingestions (now the ids are also generated "flat", by combining just the id prefix and the filename).
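A minimal sketch of "flat" id generation, assuming the id is simply the id prefix followed by the file's base name. The flatId() function is hypothetical, and normalizing the prefix to end with exactly one slash is an assumption of this sketch rather than a description of the library's code.

```php
<?php
// Hypothetical helper; the library's own implementation may differ.
function flatId(string $idPrefix, string $filePath): string {
    // Make sure the prefix ends with exactly one slash
    $prefix = rtrim($idPrefix, '/') . '/';
    // Ignore the directory part of the path: only the filename is used
    return $prefix . basename($filePath);
}

$id = flatId('https://id.example.org/collection', '/data/sub/dir/scan_001.tif');
// $id is 'https://id.example.org/collection/scan_001.tif'
```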

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Indexer enhancements

  • acdhOeaw\arche\lib\ingest\Indexer allows combining multiple skip modes (SKIP_NOT_EXIST | SKIP_BINARY_EXIST can be useful)
  • acdhOeaw\arche\lib\ingest\Indexer::import() supports a new Indexer::ERRMODE_CONTINUE error mode which continues the ingestion regardless of errors.
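Combining skip modes with | is the usual bit-flag pattern. The sketch below uses a stand-in SkipModes class with made-up numeric values; the real constants are defined on the Indexer class and their values are not taken from the source. The hasMode() helper is likewise hypothetical.

```php
<?php
// Stand-in flags for illustration only; the numeric values are an assumption.
final class SkipModes {
    const NOT_EXIST    = 1;
    const EXIST        = 2;
    const BINARY_EXIST = 4;
}

// Checks whether $flag is part of the combined $skipMode
function hasMode(int $skipMode, int $flag): bool {
    return ($skipMode & $flag) === $flag;
}

// Skip resources missing in the repository as well as those already having a binary
$skipMode = SkipModes::NOT_EXIST | SkipModes::BINARY_EXIST;
```

Because each flag occupies its own bit, any subset of modes can be passed as a single integer.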

- PHP
Published by zozlak about 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bump uri-normalizer to v2

- PHP
Published by zozlak over 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - SkosVocabulary class tuning

acdhOeaw\arche\lib\ingest\SkosVocabulary::assureTitles() - if everything fails, create a title from URI.

- PHP
Published by zozlak over 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - SkosVocabulary class added

A new class (acdhOeaw\arche\lib\ingest\SkosVocabulary) for ingesting SKOS vocabularies has been added.

It's a specialization of acdhOeaw\arche\lib\ingest\MetadataCollection class with:

  • Additional configurable preprocessing steps added.
  • Vocabulary binary ingestion.
  • Removal of obsolete vocabulary resources from the repository.

- PHP
Published by zozlak over 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Allow arche-lib 5

- PHP
Published by zozlak over 3 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - MetadataCollection - introduce two debug levels

acdhOeaw\arche\lib\ingest\MetadataCollection::$debug can now have the following values:

  • false or 0 - no debug messages at all
  • true or 1 - basic information on preprocessing stages and detailed information on ingestion progress
  • 2 - detailed information on both preprocessing and ingestion progress

- PHP
Published by zozlak almost 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\MetadataCollection ingestion progress meter fixed

- PHP
Published by zozlak almost 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Small fixes

  • The ingestion chunk size is now never bigger than $concurrency * 100, which gives $errorMode = ERRMODE_FAIL a chance to fail early for large ingestions.
  • The 409 Transaction xyz locked ARCHE REST API response is now handled correctly (as an ordinary 409 Conflict error).
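The chunk size cap from the first item can be sketched as follows. The chunkSize() helper and the $requested parameter are hypothetical; only the $concurrency * 100 limit comes from the note above.

```php
<?php
// Hypothetical helper: caps the requested chunk size at $concurrency * 100.
function chunkSize(int $requested, int $concurrency): int {
    return min($requested, $concurrency * 100);
}

$resources = range(1, 1000); // stand-ins for the resources to ingest
$chunks    = array_chunk($resources, chunkSize(100000, 3));
// With $concurrency = 3 a chunk holds at most 300 resources,
// so 1000 resources are split into 4 chunks (300, 300, 300, 100)
```

Smaller chunks mean errors are evaluated more often, so a failing ingestion can stop after one chunk instead of after the whole batch.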

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

Required PHP version constraint fixed in composer.json

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\Indexer() - harden against slash at the end of the directory path.

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\Indexer::createFile() - top level directory recognition fixed.

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\arche\lib\ingest\MetadataCollection::import() - report ARCHE response error messages while importing in the ERRMODE_PASS mode.

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - 3.0.0

New features

  • Parallel ingestion

Backward-incompatible changes

  • TODO

Bugfixes

  • TODO

- PHP
Published by zozlak about 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Allow arche-lib v4

- PHP
Published by zozlak over 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Allow arche-lib 3.0.0

- PHP
Published by zozlak over 4 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - 2.0.0

Adjusted to arche-lib 2.0.0

- PHP
Published by zozlak almost 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Minor improvements

MetadataIndexer now normalizes all triple object URIs because in the end all of them cause repository resource creation and end up as identifiers.

- PHP
Published by zozlak about 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Adapt for arche-schema v2.0

Arche-schema v2.0 doesn't allow the filename property on the acdh:Collection class, which is typically used for directories. Therefore arche-lib-ingest stops providing the property indicated in the config by $.schema.fileName for directories.

- PHP
Published by zozlak about 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

Hardcoded property URIs removed from File::getMetadata().

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

No need for the hack from 1.6.1 as the solution has been introduced to the easyrdf library 1.14.3

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

acdhOeaw\acdhRepoIngest\MetadataCollection hardened against EasyRdf\Literal\Date and EasyRdf\Literal\DateTime issues.

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Use schema.isNewVersionOf for versioning

The \acdhOeaw\acdhRepoIngest\schema\SchemaObject::createNewVersion() now uses only schema.isNewVersionOf to denote the old<->new resource relationship.

The change follows dropping of inverse properties from the arche-schema v2.

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

Avoid truncating long repository REST API error messages while reporting import errors in MetadataCollection class.

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Minor fixes

Prolong the repository transaction during the acdhOeaw\acdhRepoIngest\MetadataCollection::filterResources()

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - MetadataCollection enhancements

  • acdhOeaw\acdhRepoIngest\MetadataCollection::preprocess() extracted from acdhOeaw\acdhRepoIngest\MetadataCollection::index(), which makes it possible to solve an issue with preprocessing taking longer than the repository's transaction timeout. To assure backward compatibility, index() calls preprocess() when needed (if it wasn't called before).
  • acdhOeaw\acdhRepoIngest\MetadataCollection::setAddTitle() added allowing to adjust the automatic title creation behaviour.
    • Automatic title creation for resources missing it turned off by default.
  • Obsolete code removed from the acdhOeaw\acdhRepoIngest\MetadataCollection class.
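The backward-compatible split of preprocess() and index() described in the first item follows a common pattern, sketched below with a hypothetical LazyCollection class. It only shows the control flow and is not the MetadataCollection implementation; the $log property exists purely to make the call order visible.

```php
<?php
// Hypothetical class showing the pattern only.
class LazyCollection {
    private bool $preprocessed = false;
    public array $log = [];

    public function preprocess(): void {
        $this->log[] = 'preprocess';
        $this->preprocessed = true;
    }

    public function index(): void {
        // Backward compatibility: run preprocessing if the caller did not
        if (!$this->preprocessed) {
            $this->preprocess();
        }
        $this->log[] = 'index';
    }
}

// New style: preprocess first, then open the repository transaction and index
$new = new LazyCollection();
$new->preprocess();
// ... the transaction would begin here ...
$new->index();

// Old style: index() alone still triggers preprocessing
$old = new LazyCollection();
$old->index();
```

In both cases preprocessing runs exactly once, but only the new style keeps it outside of the transaction's time budget.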

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - URI normalization rules from package instead of a config

Instead of being provided in the config file, URI normalization rules for resource identifiers are now read from the required acdh-oeaw/uri-norm-rules composer package.

- PHP
Published by zozlak over 5 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - MetadataIndexer error mode added

acdhOeaw\acdhRepoIngest\MetadataIndexer::index() now takes an optional third parameter allowing to set the error mode:

  • MetadataIndexer::ERRMODE_FAIL: the default mode, in which the first HTTP 400 response generated on a repository resource creation/update breaks the import.
  • MetadataIndexer::ERRMODE_PASS: in this mode HTTP 400 responses don't break the import but turn off the autocommit and cause an error to be thrown at the end of the import. This mode allows collecting metadata problems for all resources at once, which speeds up the curation process.
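The difference between the two modes can be sketched as follows. This is a simplified illustration, not the library's code: the ErrMode class, the importAll() function and the $ingest callback are hypothetical, and a RuntimeException stands in for an HTTP 400 response.

```php
<?php
// Stand-in constants; the real ones are defined on the MetadataIndexer class.
final class ErrMode {
    const FAIL = 'fail';
    const PASS = 'pass';
}

/**
 * Runs $ingest on every resource. $ingest is expected to throw
 * a RuntimeException when the repository rejects a resource.
 */
function importAll(array $resources, callable $ingest, string $errMode): void {
    $errors = [];
    foreach ($resources as $res) {
        try {
            $ingest($res);
        } catch (RuntimeException $e) {
            if ($errMode === ErrMode::FAIL) {
                throw $e; // break the import on the first error
            }
            $errors[] = $res . ': ' . $e->getMessage(); // keep going, remember the problem
        }
    }
    if (count($errors) > 0) {
        // PASS mode: report everything at once; the caller should not commit
        throw new RuntimeException(implode("\n", $errors));
    }
}

// Usage: resources with "bad" in their name are rejected
$ingest = function (string $res): void {
    if (str_contains($res, 'bad')) {
        throw new RuntimeException('HTTP 400');
    }
};
try {
    importAll(['a', 'bad1', 'b', 'bad2'], $ingest, ErrMode::PASS);
} catch (RuntimeException $e) {
    $report = $e->getMessage(); // lists both bad1 and bad2
}
```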

- PHP
Published by zozlak almost 6 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Indexer::setParent() strictLocations parameter added

An additional bool $strictLocations parameter has been added to the acdhOeaw\acdhRepoIngest\Indexer::setParent() method. It allows choosing how strictly the contained paths described in the parent resource's metadata should be checked.

- PHP
Published by zozlak almost 6 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

- PHP
Published by zozlak almost 6 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Bugfixes

- PHP
Published by zozlak almost 6 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Indexer class API adjusted

The containerDir and the containerToUriPrefix configuration properties were removed. As they are different for every ingestion, it made no sense to store them in a common configuration file. They are now taken by the acdhOeaw\acdhRepoIngest\Indexer class constructor instead.

- PHP
Published by zozlak almost 6 years ago

https://github.com/acdh-oeaw/arche-lib-ingest - Initial release

- PHP
Published by zozlak almost 6 years ago