https://github.com/argoeu/ams-publisher

argo-nagios-ams-publisher is a component acting as bridge from Nagios to ARGO Messaging system. It's essential part of software stack running on ARGO monitoring instance and is responsible for forming and dispatching messages that wrap up results of Nagios probes/tests.

https://github.com/argoeu/ams-publisher

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

ams argo-messaging monitoring
Last synced: 5 months ago · JSON representation

Repository

argo-nagios-ams-publisher is a component acting as bridge from Nagios to ARGO Messaging system. It's essential part of software stack running on ARGO monitoring instance and is responsible for forming and dispatching messages that wrap up results of Nagios probes/tests.

Basic Info
  • Host: GitHub
  • Owner: ARGOeu
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 282 KB
Statistics
  • Stars: 0
  • Watchers: 14
  • Forks: 6
  • Open Issues: 0
  • Releases: 16
Topics
ams argo-messaging monitoring
Created about 9 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog

README.md

ams-publisher

Description

ams-publisher is a component acting as bridge from Nagios to ARGO Messaging system. It's essential part of software stack running on ARGO monitoring instance and is responsible for forming and dispatching messages that wrap up results of Nagios probes/tests. It is running as a unix daemon and it consists of two subsystems: - queueing mechanism - publishing/dispatching part

Messages are cached in local directory queue with the help of OCSP Nagios commands and each queue is being monitored and consumed by the daemon. After configurable amount of accumulated messages, publisher that is associated to queue sends them to ARGO Messaging system and drains the queue. ams-publisher is written in multiprocessing manner so there is support for multiple (consume, publish) pairs where for each, new worker process will be spawned.

Filling and draining of directory queue is asynchronous. Nagios delivers results on its own constant rate while ams-publisher consume and publish them on its own configurable constant rate. It's important to keep the two rates close enough so that the results don't pile up in the queue and leave it early. Component has a mechanism of inspection of rates and trends over time to keep the constants in sync. Also it's resilient to network issues so it will retry configurable number of times to send a messages to ARGO Messaging system. It's also important to note that consume and publish of the queue is a serial process so if publish is stopped, consume part of the worker will be also stopped. That could lead to pile up of results in the queue and since every result is represented as a one file on the file system, easily exhaustion of free inodes and therefore unusable monitoring instance.

More about Directory queue design

Features

Complete list of features are: - efficient and scalable directory based queueing - configurable number of (consume, publish) pairs - workers - two type of publishers: metric results and alarms - avro serialization of metric results - configurable watch rate for queue - configurable bulk of messages sent to ARGO Messaging system - configurable retry attempts in case of network connection problems - purger that will keep queue only with sound data - message rate inspection of each worker for monitoring purposes

Installation

Component is supported on CentOS 7 and Rocky 9 Linux. RPM packages and all needed dependencies are available in ARGO repositories so installation of component simply narrows down to installing a package:

dnf install -y ams-publisher

Component relies on: - argo-ams-library - interaction with ARGO Messaging - avro - avro serialization of messages' payload - python-argparse - ease build and parse of command line arguments - python-daemon - ease daemonizing of component - python-messaging - CERN's library for directory based caching/queueing - pytz - timezone manipulation

| File Types | Destination | |-------------------|----------------------------------------------------| | Configuration | /etc/ams-publisher/ams-publisher.conf | | Daemon component | /usr/bin/ams-publisherd | | Cache delivery | /usr/bin/ams-alarm-to-queue, ams-metric-to-queue | | SystemD Unit (R9) | /usr/lib/systemd/system/ams-publisher.service | | Local caches | /var/spool/ams-publisher/ | | Inspection socket | /run/ams-publisher/sock | | Log files | /var/log/ams-publisher/ |

Configuration

Central configuration is in ams-publisher.conf. Configuration consists of [General] section and [Queue_<workername>], [Topic_<workername>] section pairs.

General section

[General] Host = NAGIOS.FQDN.EXAMPLE.COM RunAsUser = nagios StatsEveryHour = 24 PublishMsgFile = False PublishMsgFileDir = /published PublishArgoMessaging = True TimeZone = UTC StatSocket = /var/run/ams-publisher/sock

  • Host - FQDN of ARGO Monitoring instance that will be part of formed messages dispatched to ARGO Messaging system
  • RunAsUser - component will run with effective UID and GID of given user, usually nagios
  • StatsEveryHour - write periodic report in system logs. Example is given in Running
  • PublishMsgFile, PublishMsgFileDir - "file publisher" that is actually only for testing purposes. If enabled, messages will not be dispatched to ARGO Messaging System, instead it will just be appended to plain text file
  • TimeZone - construct timestamp of messages within specified timezone
  • StatsSocket - query socket that is used for inspection of rates of each worker. It used by the ams-publisher Nagios probe.

Queue, Topic pair section

Eachs (queue, topic) section pair designates one worker process. Two sections are linked by giving the same workername so Public_Metrics is associated to Queue_Metrics but not Queue_MetricsDevel. Each worker means one new process spawned. One part of the process inspects local on-disk queue with results, forms and accumulates messages into in-memory buffer - consumer. Other part of the worker process dispatchs accumulated messages to ARGO Messaging system when the targeted number is reached - publisher. Publisher part is resilient to network connection problems and it can be configured for arbitrary number of connection retries.

Example of one such pair: ``` [Queue_Metrics] Directory = /var/spool/ams-publisher/metrics/ Rate = 10 Purge = True PurgeEverySec = 300 MaxTemp = 300 MaxLock = 0 Granularity = 60

[TopicMetrics] Host = msg.argo.grnet.gr Key = AMSTENANTPROJECTKEY Project = NAMEOFAMSTENANTPROJECT Topic = metricdata Bulksize = 100 MsgType = metricdata Avro = True AvroSchema = /etc/ams-publisher/metric_data.avsc Retry = 5 Timeout = 60 SleepRetry = 300 ```

  • [Queue_Metrics].Directory - path of directory queue on the filesystem where local cache delivery tools write results of Nagios tests/probes.
  • [Queue_Metrics].Rate - local cache inspection rate. 10 means that cache will be inspected 10 times at a second because thats the number of status results expected from Nagios that will be picked up verly early. For low volume tenants this could be a lower number.
  • [Queue_Metrics].Purge,PurgeEverySec,MaxTemp,MaxLock - purge the staled elements of directory queue every PurgeEverySec seconds. It cleans the empty intermediate directories below directory queue path, temporary results that exceeded MaxTemp time and locked results that exceeded MaxLock. > It is advisable to leave MaxLock = 0 which skips every result that have been transformed into a message and added into in-memory queue, but had not yet been dispatched.
  • [Queue_Metrics].Granularity - new intermediate directory in the toplevel directory queue path is created every Granularity seconds
  • [Topic_Metrics].Host,Key,Project,Topic - options needed for delivering of messages to ARGO Messaging system. Host designates the FQDN, Key is authorization token, Project represents a tenant name and Topic is final destination scoped to tenant
  • [Topic_Metric].Bulksize - accumulate a bulk of Bulksize messages before dispatching them to ARGO Messaging system in one request.
  • [Topic_Metric].MsgType - string indicating what type of message publisher is sending. It can be metric_data or alarm
  • [Topic_Metrics].Retry,SleepRetry - these configures how publisher is resilient to network connection problems and ARGO Messaging system downtimes. Default 5 x 300 means that the publisher worker will be resilient to a minimum of 1500 seconds (25 min) of connection problems retrying 5 times and sleeping aproximately 300 seconds before each next attempt
  • [Topic_Metrics].Timeout - defines the maximum amount of seconds within which request to ARGO Messaging system must be finalized. Consider raising it up for bigger Bulksize number.

Running

Components needs to be properly configured before starting it. At least one worker must be spawned so at least one directory queue needs to exist, Nagios should deliver results to it and publisher part of the worker needs to be associated to it. Afterward, it's just a matter of starting the service.

CentOS 7 and Rocky 9: systemctl start ams-publisher.service

Component periodically reports rates of each worker in system logs. It does so every StatsEveryHour. 2020-04-08 08:53:42 ams-publisher[963]: INFO - Periodic report (every 6.0h) 2020-04-08 08:53:42 ams-publisher[983]: INFO - ConsumerQueue metrics: consumed 45787 msgs in 6.00 hours 2020-04-08 08:53:42 ams-publisher[983]: INFO - MessagingPublisher metrics: sent 45700 msgs in 6.00 hours 2020-04-08 08:53:42 ams-publisher[1005]: INFO - ConsumerQueue metricsdevel: consumed 45787 msgs in 6.00 hours 2020-04-08 08:53:42 ams-publisher[1005]: INFO - MessagingPublisher metricsdevel: sent 45700 msgs in 6.00 hours 2020-04-08 08:53:42 ams-publisher[993]: INFO - ConsumerQueue alarms: consumed 164 msgs in 6.00 hours 2020-04-08 08:53:42 ams-publisher[993]: INFO - MessagingPublisher alarms: sent 164 msgs in 6.00 hours

Check the rates of configured workers for past minutes: % ams-publisherd -q 60 INFO - Asked for statistics for last 60 minutes INFO - worker:metrics published:8500 INFO - worker:alarms published:26 INFO - worker:metricsdevel published:8500 INFO - worker:metrics consumed:8588 INFO - worker:alarms consumed:26 INFO - worker:metricsdevel consumed:8587

Owner

  • Name: ARGOeu
  • Login: ARGOeu
  • Kind: organization

Α team working with the latest technologies about accounting, monitoring, messaging and eSeal Capabilities

GitHub Events

Total
Last Year

Dependencies

setup.py pypi