Recent Releases of https://github.com/aliceo2group/control

https://github.com/aliceo2group/control - v1.43.0

Metrics now contain environmentId and runtype tags. Better error handling in core.

What's Changed

  • OCTRL-981: Do not print error for BASIC_TASK_TERMINATED event with no parent role by @Copilot in https://github.com/AliceO2Group/Control/pull/741
  • Fix race condition in tests by @justonedev1 in https://github.com/AliceO2Group/Control/pull/747
  • OCTRL 1042: Add run type to metrics calls by @justonedev1 in https://github.com/AliceO2Group/Control/pull/748

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.42.0...v1.43.0

- Go
Published by justonedev1 10 months ago

https://github.com/aliceo2group/control - v1.42.0

This release includes Kafka event improvements targeted at COG, a fix that retries the connection to the DCS gateway if it is unavailable at ECS startup, and a call to ODC Stop when an environment goes to ERROR. It also brings several documentation improvements.

What's Changed

  • [docs] Adding link to environment workflow diagram by @justonedev1 in https://github.com/AliceO2Group/Control/pull/735
  • OCTRL-1035 Use one version of protoc by @justonedev1 in https://github.com/AliceO2Group/Control/pull/736
  • Regenerated protofiles by @justonedev1 in https://github.com/AliceO2Group/Control/pull/737
  • OCTRL-1021 Add package documentation for all Go packages by @Copilot in https://github.com/AliceO2Group/Control/pull/733
  • OCTRL-1008 Attempt reconnecting to ecs-dcs gateway at core startup by @Copilot in https://github.com/AliceO2Group/Control/pull/734
  • Call ODC STOP when an environment goes to ERROR due to ECS-controlled task erroring by @knopers8 in https://github.com/AliceO2Group/Control/pull/732
  • Add a message for the event reporting successful environment creation by @knopers8 in https://github.com/AliceO2Group/Control/pull/739
  • Add missing transitionStatus to Environment Events by @knopers8 in https://github.com/AliceO2Group/Control/pull/738
  • OCTRL-1039 Documentation of artificial states and transitions by @justonedev1 in https://github.com/AliceO2Group/Control/pull/740
  • Simplify critical hook execution error targeted towards operators by @knopers8 in https://github.com/AliceO2Group/Control/pull/745
  • Add Copilot agent instructions by @knopers8 in https://github.com/AliceO2Group/Control/pull/744
  • Add test runcounter.txt to .gitignore by @knopers8 in https://github.com/AliceO2Group/Control/pull/743
  • [build] Bump to v1.42.0 by @knopers8 in https://github.com/AliceO2Group/Control/pull/746

New Contributors

  • @Copilot made their first contribution in https://github.com/AliceO2Group/Control/pull/733

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.41.0...v1.42.0

- Go
Published by knopers8 10 months ago

https://github.com/aliceo2group/control - v1.41.0

ECS now monitors the duration of calls to other systems (plugins) and of its own hooks.

What's Changed

  • OCTRL-1033 Enhance monitoring with call durations by @justonedev1 in https://github.com/AliceO2Group/Control/pull/730
  • No need to force ERROR when GO_ERROR succeeded by @knopers8 in https://github.com/AliceO2Group/Control/pull/731

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.40.0...v1.41.0

- Go
Published by justonedev1 11 months ago

https://github.com/aliceo2group/control - v1.40.0

This release allows controlled tasks to retrieve original_run_number for replayed runs.

What's Changed

  • Fix link in operation_order.md by @knopers8 in https://github.com/AliceO2Group/Control/pull/727
  • [core] propagate original_run_number as FairMQ property to tasks by @knopers8 in https://github.com/AliceO2Group/Control/pull/726
  • document what is safe about SafeStatus and SafeState by @knopers8 in https://github.com/AliceO2Group/Control/pull/728
  • [core] Demote cleanup script failures to Support by @knopers8 in https://github.com/AliceO2Group/Control/pull/729

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.39.1...v1.40.0

- Go
Published by knopers8 12 months ago

https://github.com/aliceo2group/control - v1.39.1

A patch release to remove setting boundaries on FairMQ versions, which apparently does not work anyway.

What's Changed

  • Avoid specifying the FairMQ version by @ktf in https://github.com/AliceO2Group/Control/pull/725

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.39.0...v1.39.1

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.39.0

This release brings sorting of active environments being used by Big Screen.

What's Changed

  • [OCTRL-932] Sort list of runs provided to Monitoring by @justonedev1 in https://github.com/AliceO2Group/Control/pull/721

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.38.0...v1.39.0

- Go
Published by justonedev1 about 1 year ago

https://github.com/aliceo2group/control - v1.38.0

This release brings fixes for issues reported by the Qodana static analyser and a change to o2control.proto which allows requesting a detailed report of integrated services in the GetEnvironments gRPC call.

What's Changed

  • [doc] A bit more doc about the DCS mock setup by @knopers8 in https://github.com/AliceO2Group/Control/pull/716
  • update release procedure - delete obsolete, add patch release by @knopers8 in https://github.com/AliceO2Group/Control/pull/718
  • More permissive margins in monitoring tests by @knopers8 in https://github.com/AliceO2Group/Control/pull/719
  • [OCTRL-1012]: GetEnvironments is missing ODC devices info even with flag set by @justonedev1 in https://github.com/AliceO2Group/Control/pull/720
  • Fixes for probable bugs reported by Qodana by @knopers8 in https://github.com/AliceO2Group/Control/pull/717
  • Bump to v1.38.0 by @justonedev1 in https://github.com/AliceO2Group/Control/pull/722

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.37.0...v1.38.0

- Go
Published by justonedev1 about 1 year ago

https://github.com/aliceo2group/control - v1.37.0

This release brings a fix for undeployable environment retries, a possibility to request state sequences from DCS mock and improved documentation.

What's Changed

  • OCTRL-1026 [core] allow to specify a test state sequence upon DCS requests by @knopers8 in https://github.com/AliceO2Group/Control/pull/713
  • [build] bump enumer to fix generation with go 1.24 by @knopers8 in https://github.com/AliceO2Group/Control/pull/711
  • Reorganize and extend documentation by @knopers8 in https://github.com/AliceO2Group/Control/pull/697
  • [doc] document ECS2DCS2ECS mock server usage by @knopers8 in https://github.com/AliceO2Group/Control/pull/715
  • [OCTRL-1010] Fix automatic Mesos retry by @justonedev1 in https://github.com/AliceO2Group/Control/pull/714

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.36.0...v1.37.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.36.0

This release includes a necessary change in the trigger plugin to allow COG to listen to trigger stop kafka events.

What's Changed

  • [core] split TRG cleanup into two calls for distinguishable kafka events by @knopers8 in https://github.com/AliceO2Group/Control/pull/709

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.35.0...v1.36.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.35.0

This release brings:

  • a reworked metrics system which allows for creating histograms and performs better batching
  • fixes to Kafka events (their content and reporting in metrics)
  • a restored ability to survive failed non-critical tasks during environment deployment (it had been reverted in v1.32.0)

What's Changed

  • [core] OCTRL-1003 by @justonedev1 in https://github.com/AliceO2Group/Control/pull/698
  • [build] remove walnut by @knopers8 in https://github.com/AliceO2Group/Control/pull/703
  • [OCTRL-1016]: Missing runNumber in KafkaEvent on topic aliecs.run by @justonedev1 in https://github.com/AliceO2Group/Control/pull/705
  • Kafka adjustments by @justonedev1 in https://github.com/AliceO2Group/Control/pull/699
  • remove the obsolete prototype code by @knopers8 in https://github.com/AliceO2Group/Control/pull/702
  • [core] less verbose log in TRG plugin + fix missing run number by @knopers8 in https://github.com/AliceO2Group/Control/pull/706
  • demote MesosCommand send error log by @knopers8 in https://github.com/AliceO2Group/Control/pull/707
  • Reapply "[core] INVARIANT status introduced" by @justonedev1 in https://github.com/AliceO2Group/Control/pull/708

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.34.2...v1.35.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.34.3

This patch release brings a fix to include the run number in Kafka events in unhealthy run stop scenarios. It is based on v1.34.2.

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.34.2...v1.34.3

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.34.2

This is a patch release with a fix for an issue affecting InfoLogger.

What's Changed

  • [core] fix malformatted fill_info_fill_number passed to controlled tasks by @knopers8 in https://github.com/AliceO2Group/Control/pull/704

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.34.1...v1.34.2

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.34.1

This patch release ensures that expendable and rmsjobid fields from ODC devices are propagated to the relevant clients.

What's Changed

  • [odc] plumbing for expendable and rmsjobid in odcdevice by @justonedev1 in https://github.com/AliceO2Group/Control/pull/701

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.34.0...v1.34.1

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.34.0

This release includes dependency and proto updates as well as minor bug fixes.

What's Changed

  • [core] fix missing number of tasks in Environment calls by @knopers8 in https://github.com/AliceO2Group/Control/pull/693
  • Bump golang.org/x/net from 0.33.0 to 0.38.0 by @dependabot in https://github.com/AliceO2Group/Control/pull/690
  • [core] fix incorrect Sprintf usage in CCDB plugin by @knopers8 in https://github.com/AliceO2Group/Control/pull/696
  • [core] put error in the Error field of events by @knopers8 in https://github.com/AliceO2Group/Control/pull/694
  • [build] Use golang 1.24.2 in CI by @knopers8 in https://github.com/AliceO2Group/Control/pull/695
  • [protos] update protos by @justonedev1 in https://github.com/AliceO2Group/Control/pull/700

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.33.0...v1.34.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.33.0

This release adds support for sending the replayed run number to the CCDB GRP object and removes one repeating log message which contributed considerably to the total volume of logs. The cause of the warning will be addressed very soon.

What's Changed

  • [core] include original run number in CCDB GRP object for synthetic runs by @knopers8 in https://github.com/AliceO2Group/Control/pull/691
  • [core] disable warning for too many metrics and increase metrics buffer size by @justonedev1 in https://github.com/AliceO2Group/Control/pull/692

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.32.1...v1.33.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.32.1

This is a minor patch which demotes a warning log.

What's Changed

  • [core] delegate too many metrics warning to devel by @justonedev1 in https://github.com/AliceO2Group/Control/pull/687

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.32.0...v1.32.1

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.32.0

This release involves optimization of sending events to Kafka by using batches and reverts the change introduced in https://github.com/AliceO2Group/Control/pull/678 until a recent issue in the production system is fully understood.

What's Changed

  • [OCTRL-1001]: Leverage kafka-go writer's ability to send messages in batches by @justonedev1 in https://github.com/AliceO2Group/Control/pull/683
  • Update GH runner from deprecated ubuntu-20.04 to ubuntu-24.04 by @knopers8 in https://github.com/AliceO2Group/Control/pull/684
  • update GH runners to use golang v1.22.2 by @knopers8 in https://github.com/AliceO2Group/Control/pull/685
  • Revert "[core] INVARIANT status introduced" by @justonedev1 in https://github.com/AliceO2Group/Control/pull/686
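
The batching idea behind OCTRL-1001 can be shown without any Kafka dependency. This is a generic sketch, assuming a hypothetical `batcher` type that flushes once a size threshold is reached; it does not reproduce the kafka-go writer or the way the core configures it.

```go
package main

import "fmt"

// batcher collects messages and hands them to send in groups of at
// most size. It is a hypothetical type used only for illustration.
type batcher struct {
	size    int
	pending []string
	send    func(batch []string)
}

// add queues a message and flushes when the batch is full.
func (b *batcher) add(msg string) {
	b.pending = append(b.pending, msg)
	if len(b.pending) >= b.size {
		b.flush()
	}
}

// flush sends whatever is pending, if anything.
func (b *batcher) flush() {
	if len(b.pending) == 0 {
		return
	}
	b.send(b.pending)
	b.pending = nil // start a fresh slice so sent batches are not overwritten
}

func main() {
	var sent [][]string
	b := &batcher{size: 3, send: func(batch []string) { sent = append(sent, batch) }}
	for i := 1; i <= 7; i++ {
		b.add(fmt.Sprintf("event-%d", i))
	}
	b.flush() // send the incomplete last batch
	fmt.Println(len(sent), len(sent[2])) // prints "3 1"
}
```

Because a single `batcher` appends and flushes in call order, messages leave in the order they were added. A production writer would typically also flush on a timer so that a quiet period does not hold events back.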

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.31.0...v1.32.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.31.0

This release fixes the AliECS core behaviour if a non-critical task fails during deployment and removes concurrency in writing events to kafka.

What's Changed

  • [OCTRL-948] Failing to deploy non-critical tasks should not prevent deployment by @justonedev1 in https://github.com/AliceO2Group/Control/pull/678
  • Added CODEOWNERS file by @justonedev1 in https://github.com/AliceO2Group/Control/pull/682
  • [OCTRL-998]: Sending kafka messages in wrong order by @justonedev1 in https://github.com/AliceO2Group/Control/pull/680

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.30.1...v1.31.0

- Go
Published by knopers8 about 1 year ago

https://github.com/aliceo2group/control - v1.30.1

This is a patch release which fixes a concurrency issue described in https://its.cern.ch/jira/browse/OCTRL-999

What's Changed

  • [core] fix a recursive mutex locking in task by @knopers8 in https://github.com/AliceO2Group/Control/pull/677
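
Go's `sync.Mutex` is not reentrant, so a method that holds the lock and then calls another method locking the same mutex blocks forever. The sketch below shows one common way to avoid that, using a hypothetical `task` type with a `...Locked` helper; it is not the actual change made in the PR above.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical type guarded by a single mutex.
type task struct {
	mu    sync.Mutex
	state string
}

// stateLocked assumes the caller already holds t.mu.
func (t *task) stateLocked() string { return t.state }

// State is the public accessor; it takes the lock itself.
func (t *task) State() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.stateLocked()
}

// Describe holds the lock and uses the ...Locked helper. Calling
// t.State() here instead would try to lock t.mu a second time from
// the same goroutine and never return.
func (t *task) Describe() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return "task in state " + t.stateLocked()
}

func main() {
	t := &task{state: "RUNNING"}
	fmt.Println(t.Describe()) // prints "task in state RUNNING"
}
```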

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.29.1

This is a patch release which fixes a concurrency issue described in https://its.cern.ch/jira/browse/OCTRL-999

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.30.0

This release brings fixes and updates to the DCS plugin.

What's Changed

  • [core] More concise logging calls in DCS plugin by @knopers8 in https://github.com/AliceO2Group/Control/pull/672
  • [core] ECS waits until DCS operation completes in case a detector fails by @knopers8 in https://github.com/AliceO2Group/Control/pull/675
  • [core] Sync DCS proto with ecs2dcsgateway by @knopers8 in https://github.com/AliceO2Group/Control/pull/676
  • [doc] Update DCS PFR documentation by @knopers8 in https://github.com/AliceO2Group/Control/pull/673
  • [core] Remove ddl_list from defaultable values in DCS plugin by @knopers8 in https://github.com/AliceO2Group/Control/pull/674

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.29.0...v1.30.0

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.29.0

This release brings improvements for ECS-GUI communication, changes for protobuf/grpc bump attempt in ALICE and go dependency security updates.

What's Changed

  • ROC deep cleanup procedure was enabled for v1.28.1 and disabled again for v1.29.0:

    • Added deep cleanup procedure (enabled by default). by @rdivia in https://github.com/AliceO2Group/Control/pull/664
    • Disable ROC deep cleanup procedure by @knopers8 in https://github.com/AliceO2Group/Control/pull/670
  • Improvements for ECS-GUI communication:

    • [core] OCTRL-989 by @justonedev1 in https://github.com/AliceO2Group/Control/pull/666
    • [OCTRL-991]: Include timestamp in state related gRPC calls by @justonedev1 in https://github.com/AliceO2Group/Control/pull/665
  • Security updates in go dependencies:

    • Bump github.com/expr-lang/expr from 1.16.1 to 1.17.0 by @dependabot in https://github.com/AliceO2Group/Control/pull/667
    • Bump github.com/go-git/go-git/v5 from 5.11.0 to 5.13.0 by @dependabot in https://github.com/AliceO2Group/Control/pull/669
  • Protobuf/grpc bump changes:

    • Support newer versions by @ktf in https://github.com/AliceO2Group/Control/pull/668
    • [occ] bump OCC to c++20 by @knopers8 in https://github.com/AliceO2Group/Control/pull/671

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.28.1...v1.29.0

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.28.1

This is a patch release which corrects the expected variable name for activating the sending of enabled links to DCS.

  • [core] sending enabled links to DCS should require a lower-case var
  • utils/helpers unit test

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.28.0

  • Support for propagating enabled detector links to DCS (OCTRL-980) and related refactoring/testing:

    • [apricot] consequently return Go data structures in Service
    • [apricot] YAML backend support for GetHost/CRU/Endpoint methods + tests
    • [apricot] Support for retrieving linkIDs by CRU endpoint and by detector
    • [configuration] extend template test package, incl. utility and inventory functions
    • [core] DCS: send enabled link IDs for selected detectors
  • DCS PFR support improvement:

    • [core] Support partial DCS PFR if some detectors not ready for PFR
  • New workflow template variable for better display of large text edit boxes in COG:

    • [core] add Rows field to the VarStack variables
  • Kafka events tuning for better event synchronization, needed for BKP:

    • [core] changes to distribution of kafka messages to partitions
    • [core] use Taskid for kafka task events instead of Environmentid
  • SOR/EOR timestamps now available for tasks on EPNs:

    • [core] ODC: Send SOR/EOR timestamps as FMQ properties
  • Core crash fix:

    • [core] Ensure doNewEnvironmentAsync gets its own copy of args

- Go
Published by knopers8 over 1 year ago

https://github.com/aliceo2group/control - v1.27.0

This release includes several fixes and improvements to error reporting, as well as the propagation of pdp_beam_type to ODC tasks.

  • Fixes:

    • [core] protect from accessing nil t.Task
    • [core] fixing and refactoring monitoring
    • [core] fix nil task access
  • Error reporting:

    • [core] improvements in error reporting during env deployment and configuration
    • [core] do not warn about tasks not in roster when it's expected
    • [core][executor] fine-tuning operator logs
    • OCTRL-951 [core][executor] improve reporting error transition
    • OCTRL-900 [core] demote getConfig warnings containing debug information
    • OCTRL-759 [core] clearer deployment failure logs for OPS
  • ODC tasks communication:

    • [core] propagate pdp_beam_type as FairMQ property to ODC tasks
  • Documentation:

    • Update operation_order.md with trg.RunStop change

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.26.0

This release includes a new metrics publishing endpoint, a new sort function in the workflow template system to sort JSON lists, and miscellaneous fixes.

  • Metrics:

    • [core] Adding http metrics endpoint
    • fixup! [core] Adding http metrics endpoint
  • JSON list sorting:

    • [core] Add json.Sort function to workflow template context
    • [core] Sort detector list before including in ODC payload
  • Miscellaneous:

    • [build] Fix make fdset command to include common.proto
    • [occ] do not call iterateCheck when in ERROR
    • Bump golang.org/x/crypto from 0.21.0 to 0.31.0
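
Sorting a JSON list before placing it in a payload makes the payload independent of the order in which items were selected. The following is a rough sketch of that idea in plain Go, assuming a hypothetical `sortJSONList` helper restricted to lists of strings; the actual signature and behaviour of the template function `json.Sort` are not shown in these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sortJSONList parses a JSON array of strings and returns it re-encoded
// in ascending order. The name and the restriction to strings are
// assumptions made for this sketch.
func sortJSONList(in string) (string, error) {
	var items []string
	if err := json.Unmarshal([]byte(in), &items); err != nil {
		return "", err
	}
	sort.Strings(items)
	out, err := json.Marshal(items)
	return string(out), err
}

func main() {
	sorted, err := sortJSONList(`["TPC","ITS","MFT"]`)
	fmt.Println(sorted, err) // prints `["ITS","MFT","TPC"] <nil>`
}
```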

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.25.0

This release includes fixes to events emitted to Kafka, improved handling of controlled node unreachable conditions, and documentation improvements.

  • Events:

    • [core] Report to Kafka source state if transition failed
    • [core] Handle gRPC code DeadlineExceeded in DCS client
  • Handling of node unreachable:

    • OCTRL-949 [core] Improve reaction to controlled nodes becoming unreachable
  • Documentation:

    • [build] Fix protoc call for o2control.proto docs generation
    • [build] Regenerate apidocs
    • [coconut] Improve role query command documentation

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.24.0

This release moves FairMQ sockets into the abstract namespace to avoid polluting /tmp, and fixes a core stuck issue.

  • FairMQ abstract namespace:

    • [core] Pass to FairMQ tasks abstract namespace endpoint paths
  • Core stuck fix:

    • [core] avoid stuck updateTaskStatus due to multiple mesos updates

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.23.2

This release includes a fix for a core stuck issue.

  • [core] removed wrong code in environment/manager.go TasksStateChangedEvent
  • format everything with gofmt -s -w .

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.23.1

This release improves logging levels for deployment error messages.

  • moved logs connected to unknown undeployable to Devel

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.23.0

This release improves timeout handling in the DD scheduler and TRG clients.

  • DD scheduler:

    • [core] Increase default DD scheduler gRPC timeout
    • [core] Make new ddsched timeout only apply to GetData status calls
    • [core] Remove RPC interceptor
  • TRG:

    • [core] Ensure TRG timeouts are obeyed + add polling timeout

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.22.1

This release includes an update to the legacy Bookkeeping interface for compatibility with its current protofile.

  • [core] Pull fresh Bookkeeping protofiles
  • [core] Regenerate Bookkeeping protofiles
  • [core] Adapt to new Bookkeeping lhcInfo parameter names

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.22.0

This release includes a fix for a deployment failure, and improves communication with the DD scheduler by adding a timeout.

  • Deployment failure:

    • fix error handling in case of no resources error
    • Revert "use two channel to communicate mesos REVIVE"
  • DD scheduler timeout:

    • [core] ddsched plugin gRPC calls have timeout by default

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.21.1

This release includes a fix for improper interaction with Mesos when deploying a new environment.

  • use two channel to communicate mesos REVIVE

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.21.0

This release includes improvements to DCS events and log messages, improvements to Kafka events consumed by Bookkeeping, and miscellaneous fixes and improvements.

  • DCS integration:

    • [core] Include detector in DCS event messages to IL
    • [core] Include state in DCS op failure events
    • removed DCS EOF kafka message in PFR and EOR
  • Kafka interface:

    • [common] Add nanosecond timestamp + workflow info to environment events
    • [common] Include nanosecond timestamp in all Kafka events
    • [core] Include workflow template info (incl public) in all env events
  • Miscellaneous:

    • [apricot] add mock source for configuration for easier mocking in tests
    • [build] Ensure fdset files are built correctly
    • [core] disable empty aggregator roles

- Go
Published by teo over 1 year ago

https://github.com/aliceo2group/control - v1.20.0

This release contains multiple improvements in Kafka messages emitted by DCS plugins, which will allow for following DCS progress in ECS GUI.

What's Changed

  • [OCTRL-928] Add kafka event for DCS SOR state SOR_PROGRESSING by @justonedev1 in https://github.com/AliceO2Group/Control/pull/616
  • added string representation of dcsEvent state to DCS by @justonedev1 in https://github.com/AliceO2Group/Control/pull/617
  • Missing kafka messages for OCTRL-927 and OCTRL-928 by @justonedev1 in https://github.com/AliceO2Group/Control/pull/618

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.19.2...v1.20.0

- Go
Published by knopers8 almost 2 years ago

https://github.com/aliceo2group/control - v1.19.2

What's Changed

  • removing kafka event in DCS plugin for EOF received from DCS by @justonedev1 in https://github.com/AliceO2Group/Control/pull/615

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.19.1...v1.19.2

- Go
Published by knopers8 almost 2 years ago

https://github.com/aliceo2group/control - v1.19.1

What's Changed

  • Fix updating JIT generated templates in case of config change by @knopers8 in https://github.com/AliceO2Group/Control/pull/614

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.19.0...v1.19.1

- Go
Published by knopers8 almost 2 years ago

https://github.com/aliceo2group/control - v1.19.0

What's Changed

  • AgentCache thread safety by @justonedev1 in https://github.com/AliceO2Group/Control/pull/608
  • OCTRL-920 Fixes for stuck calibration runs by @knopers8 in https://github.com/AliceO2Group/Control/pull/609
  • OCTRL-779 Multiple mesos executor fix by @justonedev1 in https://github.com/AliceO2Group/Control/pull/613

Full Changelog: https://github.com/AliceO2Group/Control/compare/v1.18.0...v1.19.0

- Go
Published by knopers8 almost 2 years ago

https://github.com/aliceo2group/control - v1.18.0

This release extends the workflow template context with strings.IsTruthy and strings.IsFalsy functions, and fixes an issue where the legacy CreateAutoEnvironment code path didn't emit events to Kafka.

  • AutoEnvironment events:

    • [core] Emit environment events from CreateAutoEnvironment
  • IsTruthy/IsFalsy:

    • [core] Add IsTruthy/IsFalsy to template system API
    • [docs] Document AliECS workflow/task template language
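
Truthiness helpers of this kind typically normalise a string and compare it against a fixed set of spellings. The sketch below is only an illustration: the functions `isTruthy` and `isFalsy` and the accepted values are assumptions, and the real strings.IsTruthy and strings.IsFalsy may accept a different set.

```go
package main

import (
	"fmt"
	"strings"
)

// isTruthy reports whether s looks like an affirmative value.
// The accepted spellings are an assumption for this sketch.
func isTruthy(s string) bool {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "yes", "y", "1", "on":
		return true
	}
	return false
}

// isFalsy reports whether s looks like a negative value. It is not the
// negation of isTruthy: an unrecognised string is neither.
func isFalsy(s string) bool {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "false", "no", "n", "0", "off":
		return true
	}
	return false
}

func main() {
	for _, s := range []string{"true", "Yes", "0", "off", "maybe"} {
		fmt.Printf("%q truthy=%v falsy=%v\n", s, isTruthy(s), isFalsy(s))
	}
}
```

Keeping two separate predicates lets a template distinguish "explicitly disabled" from "not set or unrecognised".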

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.17.0

This release includes a fix for an issue with workflow parameter precedence, as well as a refactor of the hierarchical key-value store.

  • KV store:

    • [common] Upgrade hierarchical KV store to Go generics + unit tests
    • [common] Improve hierarchical KV store test cases
    • [common] Fix override with empty value issue
    • [core] Use gera.Map[string, string] instead of gera.StringMap
    • [core] Remove gera.StringMap
    • [core] Use custom unmarshaler in gera.Map[string, string] instances
    • [core] Test kvStoreUnmarshalYAMLWithTags and YAML→workflow unmarshal
    • [core] Add comments on gera and defaults/vars/userVars mechanism
  • Documentation:

    • Update documentation index
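
A hierarchical key-value store with Go generics can be sketched as a map that falls back to a parent when a key is missing locally. The `layered` type below is hypothetical and far simpler than the gera package; in particular, the choice that an explicitly set empty value still overrides the parent is an assumption of this sketch, not a statement about the fix listed above.

```go
package main

import "fmt"

// layered is a key-value map that falls back to a parent when a key is
// missing locally. Hypothetical type; not the gera API.
type layered[K comparable, V any] struct {
	local  map[K]V
	parent *layered[K, V]
}

func newLayered[K comparable, V any](parent *layered[K, V]) *layered[K, V] {
	return &layered[K, V]{local: make(map[K]V), parent: parent}
}

// Set stores a value at this level only.
func (l *layered[K, V]) Set(k K, v V) { l.local[k] = v }

// Get walks up the hierarchy until the key is found. Presence of the
// key decides, not whether the value is empty.
func (l *layered[K, V]) Get(k K) (V, bool) {
	for cur := l; cur != nil; cur = cur.parent {
		if v, ok := cur.local[k]; ok {
			return v, true
		}
	}
	var zero V
	return zero, false
}

func main() {
	defaults := newLayered[string, string](nil)
	defaults.Set("detector", "TPC")
	defaults.Set("mode", "physics")

	vars := newLayered(defaults)
	vars.Set("mode", "") // explicit empty override at the child level

	d, _ := vars.Get("detector")
	m, ok := vars.Get("mode")
	fmt.Printf("%q %q %v\n", d, m, ok) // prints `"TPC" "" true`
}
```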

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.16.1

This patch release fixes an issue with integration plugin call management during environment teardown.

  • [core] OCTRL-912 Cancel calls pending await upon environment destruction

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.16.0

This release includes a fix to the DD scheduler plugin and improvements to the environment teardown sequence.

  • DD scheduler:

    • [core] Make StfB/StfS-to-FLP maps local for each request
  • Environment teardown:

    • [core] Fix missing env ID in a teardown log
    • [core] OCTRL-911 Transitions should not be performed concurrently
    • [core] OCTRL-911 do not teardown an environment in DONE

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.15.0

This release includes support for user information sent from GUI to the Bookkeeping system, support for hooks to be placed before or after run-related timestamps, and miscellaneous fixes and improvements.

  • Bookkeeping and GUI user information:

    • [build] Avoid protobuf namespace clash
    • [coconut] Include user@host in relevant coconut requests
    • [core] New protofile common.proto + User type + regenerate proto
    • [core] Include last known request user in all env and run events
    • [core] Pull and patch Bookkeeping protofiles
    • [core] Regenerate Bookkeeping proto code
    • [core] Update Bookkeeping client for latest proto
    • [core] Set proto User.externalId to explicit presence
  • Allow hooks to be placed before or after setting a run-related timestamp:

    • [core] document and test when transition can be cancelled
    • [core] delete run_number only after all STOP_ACTIVITY hooks are called
    • OCTRL-902 [core] Set run timestamps before executing triggers with weight 0
    • OCTRL-899 [core] use SOSOR and EOEOR as run duration for GRP ECS object
  • Miscellaneous:

    • added logging filters for IL
    • [executor] Chmod sandbox directory 750→755 so task can read it
  • Documentation:

    • [docs] Add documentation on DCS op behaviour

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.14.0

This release includes miscellaneous core improvements and bug fixes.

  • Miscellaneous improvements:

    • [core] test environment's FSM, handling hooks and fix discovered issues
    • [core] OCTRL-891 propagate pdp_beam_type and pdp sor override to tasks
    • [core] Useless lock is useless
  • Bug fixes:

    • [core] Publish correct state when tasks_ is done
    • [occ] avoid a leak in JsonMessage::Deserialize
  • Documentation:

    • [docs] Update information on AliECS production deployment
    • [docs] Mention usage of cron in checker script

- Go
Published by teo almost 2 years ago

https://github.com/aliceo2group/control - v1.13.0

This release includes various fixes and improvements, including limiting error payload sizes sent to the GUI, addressing a potential race with Mesos offers handling, and more.

  • Mesos offers handling:

    • [core] Improve handling of Mesos resource offers
  • Miscellaneous fixes:

    • [core] Set state to DONE in last autoEnv event - stopgap until Kafka switch
    • [core] Limit ODC error string length
  • Testing and documentation:

    • test the plugin system
    • Update kafka.md

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.12.0

This release includes ECS state translation for ODC partitions and devices, support for environment event production in auto-transitioning environments, improvements to critical trait representation, and many miscellaneous improvements and fixes.

  • Critical trait:

    • [coconut] Show critical trait of tasks in table
    • [core] Remove TaskClassInfo and add critical trait to ShortTaskInfo
    • [core] Track role criticality and return actual critical trait of tasks
    • [core] Task must return traits for non-basic tasks
  • Events in auto-transitioning environments:

    • [core] Emit environment events for Teardown and AutoEnv
  • ODC to ECS state translation:

    • [core] Convert ODC device and env states to ECS and publish
    • [core] Move state and transition into their own sm package
    • [core] Account for unlikely ODC state "OK"
    • [core] Consistent variable naming
    • [core] New ODC-ECS state mapping, based on previous ECS state
    • [core] Test ODC-ECS state mapping
    • [core] Move task state test to sm package
    • [core] Strong typing for ODC event payloads
    • [core] Send ODC device IDs as string for JS compatibility
    • [core] Account for ODC state MIXED as INVARIANT
    • [core] Fix test
    • [core] Make test more similar to real world behaviour
    • [core] Fix issue with ECS state reverting to UNKNOWN
    • [core] Ensure we don't overwrite the devices now that it's a pointer
  • Miscellaneous:

    • [core] Push CallEvents to aliecs.call topic
    • [core] log the failed JIT workflow and detector, push DPL out only to IL
    • [core] demote message about replying to GetEnvironment call
    • remove generic targets in the operation order doc, since they are not supported
    • spurious brackets
    • fix links
    • fix detail
    • [occ] Fix deserialization of ConfigEntry with empty value
    • OCTRL-893 [core] always log why the environment goes into ERROR
    • OCTRL-894 [core] a FINISHED/DONE task should have INACTIVE status
    • OCTRL-870 [core] Handle escaped config URIs in DPL commands correctly
    • OCTRL-901 Document the order of actions performed during SOR and EOR

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.11.0

This release adds support for pushing run information to a new Kafka topic, aliecs.run. It also adds a distinction between start of SOR, end of SOR, start of EOR, and end of EOR (respectively SOSOR, EOSOR, SOEOR, EOEOR), and ensures the lifetime of these values is consistent both within the core and when pushed to controlled tasks. Furthermore, it fixes an issue which prevented FairMQ devices from quitting cleanly from their ERROR state.

  • aliecs.run topic and timestamps:

    • [core] Emit run SOSOR/EOSOR/SOEOR/EOEOR events to aliecs.run topic
    • [core] Ensure SOEOR/EOEOR events are pushed in case of kill
    • [core] Push run_end_time_ms at EOR both camelCase and snake_case
    • [core] Clear all old run timestamps on SOSOR
    • [core] Clarify timestamp variables naming
  • FairMQ devices behaviour:

    • OCTRL-888 [occ] If we see FairMQ's ERROR state, we should exit

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.10.0

This release adds support for subdirectories within component configuration prefixes, and removes component configuration entry versioning. Specifically, this means that existing timestamped component configuration entry keys are no longer resolved transparently, and must be referenced explicitly if desired. The recommendation is to redeploy the full component configuration tree.

  • Component configuration subpaths:
    • [apricot] allow for arbitrary number of entry subfolders in HTTP service
    • [apricot] test the HTTP handler for Apricot
    • [common] OCTRL-805 allow to group entries in subfolders
    • [common] get rid of the concept of versioning and timestamps
    • [common] add multiple unit tests concerning configuration, include fixes

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.9.2

This patch release adds snake_case versions of all special SOR parameters (see https://github.com/AliceO2Group/Control/blob/master/docs/handbook/configuration.md#variables-pushed-to-controlled-tasks) to make task compatibility easier between FLP and EPN contexts.

  • [core] Push all AliECS-provided run parameters as snake_case+camelCase
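
As a rough illustration of pushing each parameter under both spellings, the Go sketch below duplicates every camelCase key under its snake_case form. The helper names `toSnakeCase` and `withBothCases` and the `runType` key are assumptions made for this example; the actual list of variables is in the linked configuration handbook.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase converts a simple camelCase identifier to snake_case.
// Runs of capitals (acronyms) are not treated specially in this sketch.
func toSnakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			r = unicode.ToLower(r)
		}
		b.WriteRune(r)
	}
	return b.String()
}

// withBothCases returns a copy of vars in which every key is also
// available under its snake_case spelling. Hypothetical helper.
func withBothCases(vars map[string]string) map[string]string {
	out := make(map[string]string, 2*len(vars))
	for k, v := range vars {
		out[k] = v
		out[toSnakeCase(k)] = v
	}
	return out
}

func main() {
	// "runType" is an illustrative key, not a confirmed parameter name.
	vars := withBothCases(map[string]string{"runType": "PHYSICS"})
	fmt.Println(vars["runType"], vars["run_type"])
}
```

With both keys present, a task written for one naming convention can read the same value without knowing which context launched it.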

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.9.1

This patch release includes a fix for a core freeze, as well as a feature flag to disable Kafka event production for debugging.

  • [core] added internal non mutexed method getParentRolePath to task.go
  • [core] added the ability to turn off kafka sending

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.9.0

This release fixes multiple race conditions in the AliECS core.

  • Race conditions:

    • [core] fixed race condition for getTasks in roster.go
    • [core] race condition fix in aggregator.role
    • [core] Added missing mutex around eventStream in environment
    • [core] OCTRL-886 correctly use mutex in environment/manager.go
    • [core] OCTRL-889 proper usage of mutex in environment.go
  • Miscellaneous fixes:

    • add a test for GetTasks and GetRoles
    • Bump golang.org/x/net from 0.22.0 to 0.23.0
    • [core] inform, not warn if there is one executor on a node
    • [doc] update the release documentation

- Go
Published by teo about 2 years ago

https://github.com/aliceo2group/control - v1.8.3

This patch release solves the issue reported in OCTRL-882, where 824 billion tasks were seen in GetEnvironments regardless of the architecture used to build the core.

  • golang bump to 1.22

- Go
Published by knopers8 about 2 years ago

https://github.com/aliceo2group/control -

This patch release is a fix attempt for OCTRL-881

  • [build] added coverage step to makefile
  • [core] keeping created kafka writers in global object

- Go
Published by knopers8 about 2 years ago

https://github.com/aliceo2group/control - v1.8.1

OCTRL-851 [core] do not attempt to transition inactive tasks

- Go
Published by knopers8 about 2 years ago

https://github.com/aliceo2group/control - v1.8.0

This release includes a fix to non-critical role handling, and adds production of role events to the aliecs.role topic.

  • Workflow state handling:
    • [core] non-critical roles should never update parents state
    • [core] test state propagation across role trees
    • [core] remove obsolete workaround to make ERROR state always win
  • aliecs.role topic:
    • [core] Emit role events to aliecs.role topic
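
The rule that non-critical roles never update their parent's state can be sketched as follows. The `Role` type, the state names, and especially the merge rule (any ERROR among critical children yields ERROR, otherwise differing states yield MIXED) are placeholders chosen for this example, not the actual AliECS state aggregation logic.

```go
package main

import "fmt"

type State string

const (
	Configured State = "CONFIGURED"
	Running    State = "RUNNING"
	Error      State = "ERROR"
	Mixed      State = "MIXED"
)

// Role is a hypothetical node in a role tree.
type Role struct {
	State    State
	Critical bool
	Children []*Role
}

// Aggregate returns the state a role would report, looking only at its
// critical children. Non-critical children are skipped entirely.
func Aggregate(r *Role) State {
	result, seen := r.State, false
	for _, c := range r.Children {
		if !c.Critical {
			continue // a non-critical child never influences its parent
		}
		s := Aggregate(c)
		switch {
		case !seen:
			result, seen = s, true
		case s == Error || result == Error:
			result = Error // placeholder merge rule
		case s != result:
			result = Mixed // placeholder merge rule
		}
	}
	return result
}

func main() {
	parent := &Role{
		State: Running,
		Children: []*Role{
			{State: Running, Critical: true},
			{State: Error, Critical: false}, // ignored by the parent
		},
	}
	fmt.Println(Aggregate(parent)) // prints RUNNING
}
```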

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.7.3

This patch release fixes a minor build failure due to recent build system changes.

  • [occ] make directory for protofiles

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.7.2

This patch release improves the OCC build system to avoid a race involving the protobuf code generator.

  • [occ] switched to cmake functions officially provided by libproto

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.7.1

This patch release includes bug fixes concerning environment state management.

  • [core][executor] more details in logs related to handling roles
  • [core] tasks no longer update state to their parents if they are not critical
  • [core] add safestate unit test, demonstrate two bugs
  • [core] a MIXED subrole state shouldn't always invalidate the parents' ERROR
  • [core] remove the test case in safestate_test.go regarding non-critical tasks

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.7.0

This release includes an all-new event producer feature, which enables AliECS to publish events to Kafka for consumption by other O² components. It also includes several bug fixes to configuration access by JIT subworkflows and to task state management.

  • Kafka producer:

    • [build] Generate fdset file for decoding Kafka messages with pq
    • [common] Use events.proto in o2control.proto
    • [common] Add CallEvent
    • [common] Allow event creation with specific timestamp
    • [common] Various additions to events in events.proto
    • [common] Enable AllowAutoTopicCreation in Kafka client
    • [core] Refactor serverutil in preparation for eventstream
    • [core] Events protofile
    • [core] Additional events
    • [core] Update events.proto+o2control.proto with NewEnvironmentAsync
    • [core] Kafka wrapper
    • [core] Emit environment events
    • [core] Add task traits and CallEvent to events.proto
    • [core] Emit call events to inform on plugin calls
    • [core] Send EnvId with TaskEvents
    • [core] Rename busEvent in task.go
    • [core] Add IntegratedServiceEvent and rename Envid field
    • [core] Push env vars on workflow load
    • [core] Include parent role path in task events
    • [core] Improve call information in CallEvents
    • [core] Emit IntegratedServiceEvents from DCS
    • [core] Make sure we always output ECS detector codes, not DCS ones
    • [core] Don't forget to include error in DCS ERROR events
    • [core] Better DCS event descriptions
    • [core] Emit ddscheduler events
    • [core] Remove legacy ODC handlers
    • [core] Emit ODC events
    • [core] Emit TRG events
    • [core] Correct Kafka topic
    • [core] Emit call events to aliecs.call topic and include envId
    • [core] Enable IntegratedServiceEvents
    • [core] Pass IntegratedServiceEvents by ref
    • [core] Write to Kafka asynchronously
    • [core] Nullify odc Devices list before emitting events
    • [core] Trim down ODC events some more
    • [core] Publish ODC partition state changes
    • [core] Document events.proto and change currentRunNumber field
    • [core] Document currently unused topics
    • [docs] Document Kafka producer functionality
  • Configuration access:

    • [core] Unit tests for Query and EntriesQuery
    • [core] support query parameters in apricot URIs in JIT
    • [core] approve json-like arrays as apricot query parameters
  • Task state management:

    • [core] test task.State, demonstrate a possible bug
    • [core] assure commutativity of State in case of ERROR state
    • [core] OCTRL-846 report done roles to parent roles
  • Miscellaneous:

    • [build] Bump dependencies
    • [build] Enable tests in CI
    • [coconut] Fix Protobuf generator call

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.6.0

This release includes support for using Apricot with JIT subworkflow template generation, and improves the sequence of events sent to the GUI by an auto-transitioning environment (such as calibrations).

  • Apricot JIT support:

    • [core] hash processed config payloads from apricot during JIT update checks
  • Miscellaneous:

    • [build] Bump google.golang.org/protobuf from 1.31.0 to 1.33.0
    • [core] Improve auto-transitioning environment events to GUI

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.5.0

This release improves the handling of error conditions in auto-transitioning environments (including calibrations) and fixes two rare crashes.

  • Auto-transitioning environments:

    • [core] Ensure auto-transitioning environments always end
  • Fixes:

    • [core] Make Bookkeeping client thread-safe
    • [core] Make task manager eventStream thread safe

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.4.0

This release includes API extensions for the AliECS GUI, fixes for two issues that prevented the use of the new Calibration page because of too eager DCS interaction, and miscellaneous bug fixes.

  • API extensions:

    • [apricot] implementation of GetRuntimeEntries rpc call
    • [core] implemented GetAvailableDetectors
  • Calibration page:

    • [core] Add support for dcspfr/sorgrace_period variables
    • [core] Ensure AutoEnv SOR failure goes to ERROR
  • Miscellaneous fixes:

    • [core] Ensure the CTP readout FLP is included in BK call if enabled
    • [core] Improve wording in warning message
    • [core] Make event timestamps UnixMilli
    • [core] Enable trace logs when core is run with --veryVerbose

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.3.0

This release fixes interaction with the DCS, ensuring its operation availability is reported correctly.

  • DCS integration:
    • [core] Fix DCS op availability check + unit test
    • [core] Ensure the DCS client gets op availability on STATECHANGEEVENT

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.2.1

This release fixes a critical issue in DCS-ECS communication.

  • [core] Use PFR-specific values for DCS PFR op availability

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.2.0

This release adds support for PFR and SOR operation availability as sent by DCS, and improves DCS-related output exposed to the AliECS GUI.

  • DCS improvements:
    • [core] Pull changes in DCS protofile
    • [core] DCS client now uses operation availability values from DCS
    • [core] Reformat DCS client data output

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.1.0

This release improves log output coming from Apricot and executor.

  • [apricot] Improve log output
  • [executor] Delay acquiring executor PID for IL
  • [executor] Add executorId in critical log messages

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.0.1

This patch release ensures AliECS is always built with cgo disabled to avoid issues with the current CS8 toolchain.

  • [build] Check for minimum Go version 1.20
  • [build] Report go executable path if version insufficient
  • [build] Make sure cgo is disabled
  • [docs] Improve Apricot HTTP documentation
  • [docs] Make it look right in Mkdocs

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v1.0.0

This release includes a new HTTP API for Apricot which exposes component configuration payloads, and other bug fixes and improvements.

  • Apricot HTTP API:

    • [apricot] Implement component configuration access in HTTP API
    • [apricot] Update pongo2 to v6
    • [apricot] Use pongo2 template cache, one per component/rt/role directory
    • [apricot] Implement component config template cache invalidation
    • [apricot] Correct documentation path
    • [apricot] Add http-swaggo inline documentation to Apricot HTTP API
    • [build] Add Swaggo tool for generating Apricot HTTP docs
    • [build] Add swaggo http and update dependencies
    • [build] Commit swaggo-generated Apricot HTTP API docs
    • [docs] Add swaggo to Makefile
    • [docs] Update Apricot HTTP API intro
    • [docs] Update gRPC APIdocs for AliECS and Apricot
    • [docs] Add Apricot documentation to mkdocs.yml
    • [docs] Fix mkdocs pointer to Apricot HTTP API and improve docs
  • DCS interaction fix:

    • [core] DCS actually uses SORAVAILABLE to mean PFRAVAILABLE
  • Miscellaneous:

    • [occ] Add missing include for newer compilers
    • [build] Update dependencies
    • [build] Bump Go in GH workflow

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.84.0

This release includes improvements to DCS and TRG integration clients.

  • DCS client support for PFR/SOR availability checking:

    • [core] Prevent DCS PFR and SOR if ops are declared unavailable by DCS
    • [core] React to PFR/SOR UNAVAILABLE reported by DCS on response stream
    • [core] Minor consistency fix
  • TRG client support for CTP readout enabled setting:

    • [core] Update CTP proto interface
    • [core] Add support for ctpreadoutenabled forwarding to CTP service

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.83.1

This is a retag of v0.83.0, fixing a broken OCC build.

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.83.0

This release includes improvements to the ODC and DCS clients, as well as a workaround for a known FairMQ issue in the OCClite plugin.

  • ODC client:

    • [core] Only call ODC Shutdown if Run was previously called for this env
  • DCS client:

    • [core] Update DCS proto interface
    • [core] React to DCS detector TIMEOUT state
    • [core] Expose DCS last known detector state matrix to GUI
  • OCC:

    • [occ] Try to make plugin unsubscribe from FairMQ at right time

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.82.2

This release makes an internal data structure of the AliECS core thread-safe in order to prevent a race condition from causing a crash.

  • [common] Protect stringmap operations with rwmutex
  • [core] Restrict raw access to stringmap underlying structure
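
A minimal sketch of the two ideas above, guarding map operations with a `sync.RWMutex` and handing out copies instead of the underlying map, might look like this. The `SafeStringMap` type and its methods are hypothetical and do not reproduce the actual stringmap API in the AliECS common package.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// SafeStringMap is a hypothetical string-to-string map guarded by an RWMutex.
type SafeStringMap struct {
	mu sync.RWMutex
	m  map[string]string
}

func NewSafeStringMap() *SafeStringMap {
	return &SafeStringMap{m: make(map[string]string)}
}

// Set takes the write lock, so concurrent writers are serialized.
func (s *SafeStringMap) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// Get takes the read lock, so many readers can proceed in parallel.
func (s *SafeStringMap) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

// Snapshot returns a copy, so callers never touch the underlying map.
func (s *SafeStringMap) Snapshot() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.m))
	for k, v := range s.m {
		out[k] = v
	}
	return out
}

func main() {
	sm := NewSafeStringMap()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sm.Set("key"+strconv.Itoa(i), "value")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(sm.Snapshot())) // prints 8
}
```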

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.82.1

This release increases the maximum inbound message size from ODC to 32MB to allow larger workflows on EPN.

  • [core] Double max inbound ODC message size to 32MB

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.82.0

This release includes a behaviour change to the executor. The executor will now stay alive even if no tasks are present.

  • Executor keep alive:
    • [executor] Keep executor alive even with no tasks

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.81.2

This release fixes two executor crashes.

  • [executor] Prevent crash on nil RPC connection
  • [executor] Prevent crash on bad incoming message

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.81.1

This patch release fixes a core crash in the task manager.

  • [core] Fix crash in executor/agent failed handler in task manager

- Go
Published by teo over 2 years ago

https://github.com/aliceo2group/control - v0.81.0

This release includes a fix for broken InfoLogger output in some cases of failed deployment.

  • Log output:
    • [core] Fix counters and string building issue in TasksDeploymentError

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.80.0

This release includes improvements to the API and to log messages for termination conditions.

  • ODC error logging:

    • [core] Print a fatal message to IL if ODC went to error
  • DESTROY transition information:

    • [core] Declare current transition DESTROY if tearing down env
  • Active tasks count:

    • [coconut] Display number of active/inactive/total tasks in env
    • [core] Add active/inactive task counts to proto
    • [core] Return number of active/inactive tasks in env requests

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.79.0

This release reduces the response payload size of GetEnvironments, and improves documentation.

  • GetEnvironments response:

    • [core] Implement GetEnvironmentsShortData for all plugins
    • [core] Only return short data in GetEnvironments to keep response small
  • Documentation:

    • [docs] Update coconut documentation
    • [docs] Update apidocs

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.78.80

Alpha release, testing new documentation facilities

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.78.0

This release fixes an important bug that prevented concurrent environment deployments.

  • Concurrent deployments:
    • [core] Refactor taskclass package
    • [core] Ensure task class refresh cannot happen concurrently w/ matching
    • [core] 7 day eviction policy for task class cache

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.77.0

This release improves handling of the ODC ERROR state, and disables AliECS core RPC logging.

  • RPC logging:

    • [core] Move server rpc logging to Debug severity
  • ODC ERROR handling:

    • [core] GO_ERROR if ODC is in ERROR

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.76.1

This release fixes an issue that prevented AliECS from tracking the full ODC status.

  • [core] Increase ODC call recv message size to 16MiB (was 4)

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.76.0

This release fixes a crash in the ODC plugin. It also fixes fill info variables propagation to tasks, and adds support for odc_extract_topology_resources in the ODC plugin.

  • ODC:

    • [core] Add support for odc_extract_topology_resources flag
    • [core] Prevent crash on missing partition id
  • Fill info variables propagation:

    • [core] Pull fill info from varstack of root role of env

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.75.0

This release changes AliECS behavior in order to react to spontaneous transitions to ERROR coming from ODC.

  • ODC ERROR event handling:
    • [core] New event type IntegratedServiceEvent
    • [core] ODC client notifies envMan on every state change
    • [core] React to ODC state change during RUNNING with STOP->CONFIGURED

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.74.0

This release includes support for current transition information in EnvironmentInfo payloads.

  • Current transition:
    • [coconut] Show transition field for env if not empty
    • [core] Add currentTransition field to o2control.proto + regenerate
    • [core] Fill and return currentTransition field

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.73.0

This release includes a fix in the CCDB client and improved debug output.

  • CCDB client:

    • [core] Prevent nil pointer dereference in CCDB client
  • Miscellaneous:

    • [core] Include payload size for GetEnvironment integrated services data

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.72.0

This release includes build system improvements, an overhaul of the unit testing framework, and an update of all dependencies. It also adds the gRPC health checking API to the AliECS core and Apricot, and brings minor fixes.

  • Dependencies:

    • [build] Update all dependencies
    • [core] Adapt to current looplab/fsm package
  • gRPC health checking:

    • [build] Add grpc health package and bump dependencies
    • [core] Add basic grpc health check to core and Apricot
  • Build system and testing framework:

    • [build] Relocate shmcleaner script
    • [build] Point to where the tests actually are
    • [build] Treat separately execution of Go tests vs. Ginkgo tests
    • [build] Improve make test and make debugtest output
    • [build] Update Ginkgo/Gomega testing framework
    • [build] Use new instead of deprecated ginkgo params
    • [core] Ensure the Repo structure always answers with a correct protocol
    • [core] Fix repo system tests
  • Miscellaneous:

    • [core] Enable RPC call logging in non-verbose operation
    • [OCTRL-802][core] no fill info if the last fill had no stable beams

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.71.0

This release adds integration features, including extended partition and topology information from EPN, and support for the TRG PrepareForRun operation.

  • TRG PrepareForRun:

    • [core] Update CTP protofile and convert CRLF->LF
    • [core] Regenerate CTP proto
    • [core] Implement CTP PrepareForRun call
    • [core] Log output consistency
  • ODC extended EPN partition and topology information:

    • [core] Pull ODC protofile
    • [core] Regenerate ODC proto
    • [core] Query ODC for detailed partition information
    • [core] Ensure correct JSON output for odc.GetEnvironmentData
    • [core] Output ODC task id as string
    • [core] Make sure to output ODC details
  • Miscellaneous:

    • [core] Fix crash
    • [core] Fix NumberOfHosts and implement NumberOfTasks

- Go
Published by teo almost 3 years ago

https://github.com/aliceo2group/control - v0.70.2

This patch release again fixes the setting of the autoBind property on inbound connections of FairMQ devices.

  • [occ] Special case autoBind to be pushed as bool

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.70.1

This patch release fixes the setting of the autoBind property on inbound connections of FairMQ devices.

  • [O2-2818][core] another go at disabling the autoBind

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.70.0

This release includes a mechanism that pushes run context information to all tasks, including LHC fill info, run type, LHC period, and O² start/end timestamps. It also includes a change that makes autoBind disabled for FairMQ devices, and updated documentation.

  • Push context information to tasks:

    • [core] use the parent role to propagate lhc info to the whole env
    • [core] Push LHC fill info, O2 start/end, run type, LHC period to tasks
    • [core] Download lhcFill.proto from BK in the Makefile
    • [occ] Push properties before FairMQ RUN and STOP transitions
    • [OCTRL-791] Allow to fetch LHC fill info from BK, propagate to varStack
  • Miscellaneous:

    • [O2-2818][core] Disable autoBind in FairMQ
    • [docs] Update docs on workflow/task configuration

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.69.1

This release fixes an issue with runs not being closed in O² Bookkeeping.

  • [core] Cache run number for bookkeeping cleanup use

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.69.0

This release adds support for the DCS Prepare For Run operation, fixes a crash, and improves log output.

  • DCS Prepare For Run:

    • [core] Update DCS protofile
    • [core] PrepareForRun request in DCS protofile
    • [core] Update DCS protofile
    • [core] Implement DCS PrepareForRun operation
  • Miscellaneous:

    • [build] Support fetching dcs.proto in make vendor
    • [core] Don't crash if task has no parent
    • [core] Enhance "creating Mesos task" message

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.68.1

This patch release fixes an issue with default values for resource limits.

  • [core] Explicitly default to infinite cpu/mem resource limits

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.68.0

This release includes cgroups support for CPU and memory limits, a new feature that requires a Mesos agent configuration change and ensures a misbehaving task won't block the whole FLP.

  • Resource limits:
    • [core] Support Mesos task resource limits specification for task classes
    • [core] Prevent crash in incomplete limits
    • [core] Print limits to IL
    • [core] Avoid triggering dead or inactive hooks on teardown
    • [core] Proceed with task kill even if some cannot be killed
    • [core] Explicit handling of executor/agent failed events
    • [core] Only perform a STOP transition for ACTIVE tasks
    • [core] Wait for 500ms for ERROR states to settle before GO_ERROR/STOP

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.67.2

This release includes a crash fix and a build fix for FairMQ 1.5.x.

  • [build] Compile with FairMQ 1.5.x
  • [core] Fix crash caused by map contention in Bookkeeping plugin

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.67.1

This patch release increases the command timeout for the CONFIGURE transition and fixes a run number acquisition issue in the CCDB plugin.

  • [core] do not complain when ccdb plugin cannot get a run number
  • [core] Increase CONFIGURE transition timeout to 120s

- Go
Published by teo about 3 years ago

https://github.com/aliceo2group/control - v0.67.0

This release includes support for internal task error events being raised by tasks. Such an event immediately transitions the environment to the ERROR state.

  • Task error events:
    • [core] React to TASKINTERNALERROR with STOP_ACTIVITY attempt
    • [core] Build TaskInternalError event
    • [executor] Support TASKINTERNALERROR event
    • [occ] Push TASKINTERNALERROR event
    • [occ] Only emit task internal error event once

- Go
Published by teo about 3 years ago