https://github.com/argoeu/ams-publisher
argo-nagios-ams-publisher is a component acting as a bridge from Nagios to the ARGO Messaging system. It is an essential part of the software stack running on an ARGO monitoring instance and is responsible for forming and dispatching messages that wrap up the results of Nagios probes/tests.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Keywords
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 14
- Forks: 6
- Open Issues: 0
- Releases: 16
Topics
Metadata Files
README.md
ams-publisher
Description
ams-publisher is a component acting as a bridge from Nagios to the ARGO Messaging system. It is an essential part of the software stack running on an ARGO monitoring instance and is responsible for forming and dispatching messages that wrap up the results of Nagios probes/tests. It runs as a Unix daemon and consists of two subsystems:
- queueing mechanism
- publishing/dispatching part
Messages are cached in a local directory queue with the help of OCSP Nagios commands, and each queue is monitored and consumed by the daemon. After a configurable number of messages has accumulated, the publisher associated with the queue sends them to the ARGO Messaging system and drains the queue. ams-publisher is written in a multiprocessing manner, so multiple (consume, publish) pairs are supported; for each pair a new worker process is spawned.
Filling and draining of the directory queue is asynchronous: Nagios delivers results at its own constant rate, while ams-publisher consumes and publishes them at its own configurable constant rate. It is important to keep the two rates close enough that results do not pile up in the queue and leave it early. The component has a mechanism for inspecting rates and trends over time to keep the two constants in sync. It is also resilient to network issues and will retry a configurable number of times to send messages to the ARGO Messaging system. Note that consuming and publishing from a queue is a serial process, so if publishing stops, the consuming part of the worker stops as well. That can lead to results piling up in the queue and, since every result is represented as one file on the file system, to exhaustion of free inodes and therefore an unusable monitoring instance.
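As an illustration of the consume/publish cycle described above, here is a minimal sketch, not the actual implementation: the directory queue is approximated with plain files under the spool path, and `send_to_ams` is a hypothetical dispatch helper.

```python
# Minimal sketch of one (consume, publish) worker cycle, for illustration only.
# The real daemon uses a directory queue (CERN's python-messaging); here the
# queue is approximated with plain files and dispatch is a hypothetical helper.
import os
import time

QUEUE_DIR = "/var/spool/ams-publisher/metrics/"    # [Queue_Metrics].Directory
RATE = 10                                          # queue inspections per second
BULKSIZE = 100                                     # [Topic_Metrics].Bulksize


def consume_publish_loop(send_to_ams):
    buffer = []                                    # in-memory message buffer
    while True:
        # consumer part: pick up results written by ams-metric-to-queue
        for name in sorted(os.listdir(QUEUE_DIR)):
            path = os.path.join(QUEUE_DIR, name)
            if not os.path.isfile(path):           # real queue nests dirs per Granularity
                continue
            with open(path) as fh:
                buffer.append(fh.read())           # form a message from the result
            os.remove(path)                        # drain the consumed result
        # publisher part: dispatch once a full bulk has accumulated
        if len(buffer) >= BULKSIZE:
            send_to_ams(buffer[:BULKSIZE])         # hypothetical dispatch helper
            buffer = buffer[BULKSIZE:]
        time.sleep(1.0 / RATE)                     # configurable watch rate
```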
More about Directory queue design
Features
The complete list of features:
- efficient and scalable directory-based queueing
- configurable number of (consume, publish) pairs - workers
- two types of publishers: metric results and alarms
- Avro serialization of metric results
- configurable watch rate for the queue
- configurable bulk of messages sent to the ARGO Messaging system
- configurable retry attempts in case of network connection problems
- purger that keeps only sound data in the queue
- message rate inspection of each worker for monitoring purposes
Installation
The component is supported on CentOS 7 and Rocky 9 Linux. RPM packages and all needed dependencies are available in the ARGO repositories, so installation of the component simply narrows down to installing a package:
dnf install -y ams-publisher
The component relies on:
- argo-ams-library - interaction with ARGO Messaging
- avro - Avro serialization of messages' payload
- python-argparse - eases building and parsing of command line arguments
- python-daemon - eases daemonizing of the component
- python-messaging - CERN's library for directory-based caching/queueing
- pytz - timezone manipulation
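As a rough illustration of how the first dependency is used, the following sketch publishes a single message with argo-ams-library; the endpoint, token, project, topic and payload are placeholder values, and the exact call signatures should be verified against the installed library version.

```python
# Sketch of dispatching one message with argo-ams-library; values are
# placeholders and signatures should be checked against the installed version.
from argo_ams_library import ArgoMessagingService, AmsMessage

ams = ArgoMessagingService(endpoint="msg.argo.grnet.gr",     # [Topic_*].Host
                           token="AMS_TENANT_PROJECT_KEY",   # [Topic_*].Key
                           project="NAME_OF_AMS_TENANT_PROJECT")

msg = AmsMessage(data="serialized metric result",            # e.g. an Avro-encoded payload
                 attributes={"type": "metric_data"})

ams.publish("metric_data", msg)                              # [Topic_*].Topic
```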
| File Types | Destination |
|-------------------|----------------------------------------------------|
| Configuration | /etc/ams-publisher/ams-publisher.conf |
| Daemon component | /usr/bin/ams-publisherd |
| Cache delivery | /usr/bin/ams-alarm-to-queue, ams-metric-to-queue |
| SystemD Unit (R9) | /usr/lib/systemd/system/ams-publisher.service |
| Local caches | /var/spool/ams-publisher/ |
| Inspection socket | /run/ams-publisher/sock |
| Log files | /var/log/ams-publisher/ |
Configuration
The central configuration is in ams-publisher.conf. It consists of a [General] section and [Queue_<workername>], [Topic_<workername>] section pairs.
General section
[General]
Host = NAGIOS.FQDN.EXAMPLE.COM
RunAsUser = nagios
StatsEveryHour = 24
PublishMsgFile = False
PublishMsgFileDir = /published
PublishArgoMessaging = True
TimeZone = UTC
StatSocket = /var/run/ams-publisher/sock
- Host - FQDN of the ARGO Monitoring instance that will be part of the formed messages dispatched to the ARGO Messaging system
- RunAsUser - the component will run with the effective UID and GID of the given user, usually nagios
- StatsEveryHour - write a periodic report in the system logs. An example is given in Running
- PublishMsgFile, PublishMsgFileDir - "file publisher" that is intended only for testing purposes. If enabled, messages will not be dispatched to the ARGO Messaging System; instead they will just be appended to a plain text file
- TimeZone - construct timestamps of messages within the specified timezone
- StatsSocket - query socket that is used for inspection of the rates of each worker. It is used by the ams-publisher Nagios probe.
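As a small illustration of how a configuration laid out this way could be read, here is a sketch using Python's configparser; it is not the component's actual parsing code, and the pairing logic (explained in the next section) is simplified.

```python
# Illustration only: read the [General] section and pair [Queue_<name>] /
# [Topic_<name>] sections by worker name. Not the component's actual parser.
import configparser

config = configparser.ConfigParser()
config.read("/etc/ams-publisher/ams-publisher.conf")

host = config.get("General", "Host")
run_as_user = config.get("General", "RunAsUser")

queues = {s.split("_", 1)[1]: s for s in config.sections() if s.startswith("Queue_")}
topics = {s.split("_", 1)[1]: s for s in config.sections() if s.startswith("Topic_")}

# each matched (queue, topic) pair would correspond to one worker process
workers = {name: (queues[name], topics[name]) for name in queues if name in topics}
print(workers)    # e.g. {'Metrics': ('Queue_Metrics', 'Topic_Metrics')}
```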
Queue, Topic pair section
Each (queue, topic) section pair designates one worker process. The two sections are linked by sharing the same workername, so Topic_Metrics is associated with Queue_Metrics but not with Queue_MetricsDevel. Each worker corresponds to one newly spawned process. One part of the process, the consumer, inspects the local on-disk queue of results and forms and accumulates messages in an in-memory buffer. The other part, the publisher, dispatches the accumulated messages to the ARGO Messaging system once the targeted number is reached. The publisher part is resilient to network connection problems and can be configured with an arbitrary number of connection retries.
Example of one such pair:

```
[Queue_Metrics]
Directory = /var/spool/ams-publisher/metrics/
Rate = 10
Purge = True
PurgeEverySec = 300
MaxTemp = 300
MaxLock = 0
Granularity = 60

[Topic_Metrics]
Host = msg.argo.grnet.gr
Key = AMS_TENANT_PROJECT_KEY
Project = NAME_OF_AMS_TENANT_PROJECT
Topic = metric_data
Bulksize = 100
MsgType = metric_data
Avro = True
AvroSchema = /etc/ams-publisher/metric_data.avsc
Retry = 5
Timeout = 60
SleepRetry = 300
```
- [Queue_Metrics].Directory - path of the directory queue on the filesystem where the local cache delivery tools write the results of Nagios tests/probes
- [Queue_Metrics].Rate - local cache inspection rate. 10 means that the cache will be inspected 10 times per second, because that is the number of status results expected from Nagios that should be picked up very early. For low volume tenants this could be a lower number.
- [Queue_Metrics].Purge, PurgeEverySec, MaxTemp, MaxLock - purge the stale elements of the directory queue every PurgeEverySec seconds. This cleans the empty intermediate directories below the directory queue path, temporary results that exceeded MaxTemp time and locked results that exceeded MaxLock. It is advisable to leave MaxLock = 0, which skips every result that has been transformed into a message and added to the in-memory queue but has not yet been dispatched.
- [Queue_Metrics].Granularity - a new intermediate directory in the top-level directory queue path is created every Granularity seconds
- [Topic_Metrics].Host, Key, Project, Topic - options needed for delivering messages to the ARGO Messaging system. Host designates the FQDN, Key is the authorization token, Project represents a tenant name and Topic is the final destination scoped to the tenant
- [Topic_Metrics].Bulksize - accumulate a bulk of Bulksize messages before dispatching them to the ARGO Messaging system in one request
- [Topic_Metrics].MsgType - string indicating what type of message the publisher is sending. It can be metric_data or alarm
- [Topic_Metrics].Retry, SleepRetry - these configure how resilient the publisher is to network connection problems and ARGO Messaging system downtimes. The default 5 x 300 means that the publisher worker will tolerate a minimum of 1500 seconds (25 min) of connection problems, retrying 5 times and sleeping approximately 300 seconds before each next attempt (see the sketch after this list)
- [Topic_Metrics].Timeout - defines the maximum number of seconds within which a request to the ARGO Messaging system must be finalized. Consider raising it for a bigger Bulksize.
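To make the Retry x SleepRetry arithmetic concrete, here is a minimal sketch of such a retry loop; publish_bulk is a hypothetical callable standing in for the actual dispatch to the ARGO Messaging system, and with Retry = 5 and SleepRetry = 300 the loop tolerates roughly 5 x 300 = 1500 seconds of connection problems.

```python
# Sketch of the retry behaviour controlled by Retry and SleepRetry; publish_bulk
# is a hypothetical callable assumed to raise ConnectionError on failure.
import time

RETRY = 5          # [Topic_Metrics].Retry
SLEEP_RETRY = 300  # [Topic_Metrics].SleepRetry, in seconds


def publish_with_retries(publish_bulk, messages):
    for attempt in range(1, RETRY + 1):
        try:
            publish_bulk(messages)            # one bulk request to AMS
            return True
        except ConnectionError as exc:        # network problem or AMS downtime
            print("attempt %d/%d failed: %s" % (attempt, RETRY, exc))
            if attempt < RETRY:
                time.sleep(SLEEP_RETRY)       # wait before the next attempt
    return False                              # gave up after roughly RETRY * SLEEP_RETRY seconds
```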
Running
The component needs to be properly configured before starting it. At least one worker must be spawned, so at least one directory queue needs to exist, Nagios should deliver results to it, and the publisher part of the worker needs to be associated with it. Afterward, it's just a matter of starting the service.
CentOS 7 and Rocky 9:
systemctl start ams-publisher.service
The component periodically reports the rates of each worker in the system logs. It does so every StatsEveryHour hours.
2020-04-08 08:53:42 ams-publisher[963]: INFO - Periodic report (every 6.0h)
2020-04-08 08:53:42 ams-publisher[983]: INFO - ConsumerQueue metrics: consumed 45787 msgs in 6.00 hours
2020-04-08 08:53:42 ams-publisher[983]: INFO - MessagingPublisher metrics: sent 45700 msgs in 6.00 hours
2020-04-08 08:53:42 ams-publisher[1005]: INFO - ConsumerQueue metricsdevel: consumed 45787 msgs in 6.00 hours
2020-04-08 08:53:42 ams-publisher[1005]: INFO - MessagingPublisher metricsdevel: sent 45700 msgs in 6.00 hours
2020-04-08 08:53:42 ams-publisher[993]: INFO - ConsumerQueue alarms: consumed 164 msgs in 6.00 hours
2020-04-08 08:53:42 ams-publisher[993]: INFO - MessagingPublisher alarms: sent 164 msgs in 6.00 hours
Check the rates of the configured workers for the past number of minutes (here, the last 60):
% ams-publisherd -q 60
INFO - Asked for statistics for last 60 minutes
INFO - worker:metrics published:8500
INFO - worker:alarms published:26
INFO - worker:metricsdevel published:8500
INFO - worker:metrics consumed:8588
INFO - worker:alarms consumed:26
INFO - worker:metricsdevel consumed:8587
Owner
- Name: ARGOeu
- Login: ARGOeu
- Kind: organization
- Repositories: 75
- Profile: https://github.com/ARGOeu
A team working with the latest technologies for accounting, monitoring, messaging and eSeal capabilities.