flowd

Network flow and packet marking service written in Python

https://github.com/scitags/flowd

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 4 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

flow labels network packet science-research tags
Last synced: 6 months ago

Repository

Network flow and packet marking service written in Python

Basic Info
  • Host: GitHub
  • Owner: scitags
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 157 KB
Statistics
  • Stars: 6
  • Watchers: 3
  • Forks: 3
  • Open Issues: 1
  • Releases: 8
Topics
flow labels network packet science-research tags
Created over 4 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation Authors

README.md

Network flow and packet marking service

flowd is a network flow and packet marking service developed in Python (based on the technical specification of the Scitags project). It provides a pluggable system for testing different flow and packet marking strategies, using plugins to retrieve flow identifiers and a set of backends to implement the actual marking. In the base case, flowd is used to tag packets or network flows for a third party system/process (storage or transfer service). It uses plugins to identify the network flows and determine which science domain and activity to use, and backends to determine how exactly these flows should be tagged.

The framework is extensible and can be used to implement other use cases. Plugins and backends can be combined to create complex functionality. There is also initial support for systemd/journal integration (Linux daemon), a Docker image and development environment, and STUN/TURN IP detection and configuration.

The following plugins are currently available:

- np_api - simple API which accepts flow identifiers via named pipe
- firefly - listens for incoming UDP firefly packets and uses information contained in the packet as flow identifier
- netlink - scans the network connections on a host (using netlink/ss) and generates a flow identifier with the science domain as provided in the configuration file (only fixed/partial tagging)
- netstat - scans the network connections on a host (using netstat) and generates a flow identifier with the science domain provided in the configuration (only fixed/partial tagging)
- iperf - clone of the netstat plugin, which only scans for iperf3 connections

The following backends are currently available:

- udp-firefly - implementation of the UDP firefly packets
- ebpf - implementation of the IPv6 packet marking (encodes science domain and activity in the IPv6 flow label, see technical spec for details)
- prometheus - exposes all network flows seen by flowd (including science domain and activity fields) via prometheus client API

This project is in beta stage. It has been tested on RHEL8 and 9 compatible systems and Ubuntu 22.04. It requires kernel 4.4+ to run packet marking (ebpf backends).

Installation

A containerised version of flowd is currently the only official distribution (work is in progress to support package distribution for RHEL8/9 compatible systems). Please check out the repository to get started.

The flowd image can be built with the following command: ```shell

docker build -t : .

```

and can be started with: ```shell

docker run --privileged --network=host -d [-v //flowd:/usr/src/app]

        [-v <config_path>:/etc/flowd] <image>:<tag>  

```

Before starting the service, an initial configuration needs to be mounted under /etc/flowd/flowd.cfg (please check the example in etc/flowd.cfg) or alternatively provided as part of the localforkpath. To start the service run: ```shell

sbin/flowd --debug [-c ]

```

To run flowd as a systemd service, you can install it as any other python package and integrate it via systemd as follows: ```shell

python setup.py install

cp etc/flowd.cfg /etc/flowd/flowd-tags.cfg  # edit the config

cp etc/flowd@.service /etc/systemd/system/  # systemd directory can be OS-specific

systemctl daemon-reload

systemctl enable flowd@tags

systemctl start flowd@tags

```

Documentation

flowd command synopsis: ```shell

sbin/flowd --help

usage: flowd [-h] [--version] [-c CONFIG] [-d] [-f]

flowd - flow and packet marking daemon

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -c CONFIG, --config CONFIG
                        Specify path of the configuration file (defaults to /etc/flowd/flowd.cfg)
  -d, --debug           Debug mode

```

The **flowd** configuration file is a simple list of key-value pairs. The following parameters are mandatory:

```python
PLUGIN='netstat'
BACKEND='udp_firefly,prometheus'
FLOWMAP_API=''
```

- PLUGIN - plugin that should be enabled (only one plugin can be specified)
- BACKEND - comma separated list of backends to enable
- FLOWMAP_API - URL with a list of science domains and activities (see technical spec for json schema and examples)
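As a rough illustration of the key-value format, the sketch below parses such a file and checks that the three mandatory parameters are present. It is not taken from flowd and flowd's own loader may work differently; `parse_config`, the use of `ast.literal_eval` for values, and the `example.org` URL are all assumptions made for this example.

```python
import ast

MANDATORY = ("PLUGIN", "BACKEND", "FLOWMAP_API")


def parse_config(text):
    """Parse KEY=value lines into a dict (hypothetical helper, not flowd's loader)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {raw!r}")
        # Assumption: values are written as Python literals ('text', 2, (...)).
        settings[key.strip()] = ast.literal_eval(value.strip())
    missing = [key for key in MANDATORY if key not in settings]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    # BACKEND is a comma separated list; PLUGIN names exactly one plugin.
    settings["BACKEND"] = [name.strip() for name in settings["BACKEND"].split(",")]
    return settings


SAMPLE = """
PLUGIN='netstat'
BACKEND='udp_firefly,prometheus'
FLOWMAP_API='https://example.org/flowmap.json'
NETSTAT_TIMEOUT=2
"""

if __name__ == "__main__":
    print(parse_config(SAMPLE))
```

Validating the mandatory keys up front gives a clear error at start-up rather than a failure later inside a plugin or backend.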

Plugins Reference

NP_API

Function

This plugin creates a named pipe which can be used to submit information about a network flow from another process. It expects a space-separated entry followed by a newline (one line per flow) with the following syntax:

    state prot src_ip src_port dst_ip dst_port exp act

Notes: The named pipe is opened as non-blocking, so it should be safe to write from multiple processes. A third-party process is expected to check that the file exists, check that it is a named pipe, and try to write a sample message before sending any production markings. Sending a sample message to a non-connected pipe should raise an exception, which can be used to determine whether flowd is running/listening.

Parameters

    NP_API_FILE - controlled via settings only

Defaults

    NP_API_FILE=/var/run/flowd

Example

```shell
# A sample interaction would look like this:

echo "start tcp 192.168.0.1 2345 192.168.0.2 5777 1 2" > /var/run/flowd

echo "end tcp 192.168.0.1 2345 192.168.0.2 5777 1 2" > /var/run/flowd

```
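The checks described in the notes (file exists, is a named pipe, has a listener) can be sketched from the client side as follows. This is an illustrative sketch rather than code from flowd: `format_flow` and `send_flow` are hypothetical helper names, and it relies on the POSIX behaviour that opening a FIFO for non-blocking write fails with `ENXIO` when no reader has it open.

```python
import errno
import os
import stat


def format_flow(state, prot, src_ip, src_port, dst_ip, dst_port, exp, act):
    """Build one space-separated, newline-terminated entry for the pipe."""
    return f"{state} {prot} {src_ip} {src_port} {dst_ip} {dst_port} {exp} {act}\n"


def send_flow(pipe_path, line):
    """Write one entry to the pipe; return False if nothing is listening."""
    try:
        if not stat.S_ISFIFO(os.stat(pipe_path).st_mode):
            return False  # path exists but is not a named pipe
    except FileNotFoundError:
        return False  # pipe has not been created
    try:
        # Non-blocking open fails with ENXIO when no reader has the pipe open.
        fd = os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            return False
        raise
    try:
        os.write(fd, line.encode("ascii"))
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    entry = format_flow("start", "tcp", "192.168.0.1", 2345, "192.168.0.2", 5777, 1, 2)
    print("sent" if send_flow("/var/run/flowd", entry) else "flowd is not listening")
```

Returning `False` instead of raising lets a transfer service continue without marking when flowd is not running.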

FIREFLY

Function

This plugin listens for incoming UDP fireflies, parses the information it contains and creates a new event which is sent to the backend(s).

Parameters

    FIREFLY_LISTENER_HOST - host to be used to open listen socket
    FIREFLY_LISTENER_PORT - port

Defaults

    FIREFLY_LISTENER_HOST="0.0.0.0"
    FIREFLY_LISTENER_PORT=10514
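To make the listener parameters concrete, here is a minimal sketch of a UDP listener bound to a host and port. It is not the plugin's implementation: `open_listener` and `receive_one` are hypothetical names, and the payload is treated as opaque bytes because the firefly packet format is defined in the technical specification rather than here.

```python
import socket


def open_listener(host="0.0.0.0", port=10514):
    """Open a UDP socket on the given host and port (defaults mirror the settings above)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=65535, timeout=1.0):
    """Wait for a single datagram; return (payload, sender) or None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(bufsize)
    except socket.timeout:
        return None


if __name__ == "__main__":
    listener = open_listener()
    try:
        received = receive_one(listener, timeout=5.0)
        # Parsing the payload into a flow identifier is left out of this sketch.
        print(received)
    finally:
        listener.close()
```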

NETSTAT

Function

This plugin scans existing network connections on the host using netstat, creates an event for each connection and assigns it the configured science domain and activity.

Parameters

    NETSTAT_EXPERIMENT - science domain id to use (for all events)
    NETSTAT_ACTIVITY - activity id (for all events)
    NETSTAT_INTERNAL_NETWORKS - list of destination networks to ignore
    NETSTAT_TIMEOUT - time to wait between scans (poll time) - only via settings

Defaults

    NETSTAT_TIMEOUT=2

Example

    NETSTAT_EXPERIMENT=1
    NETSTAT_ACTIVITY=1
    NETSTAT_INTERNAL_NETWORKS=('192.168.0.0/16')
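How a list of internal networks might be used to skip connections can be sketched with the standard `ipaddress` module. This is an illustration under assumptions, not the plugin's code: `normalise_networks` and `is_internal` are hypothetical helpers. Note that if the value is evaluated as Python, `('192.168.0.0/16')` is a plain string rather than a one-element tuple, so the sketch accepts both forms.

```python
import ipaddress


def normalise_networks(value):
    """Accept a single CIDR string or an iterable of them; return network objects."""
    if isinstance(value, str):
        value = (value,)
    return [ipaddress.ip_network(item) for item in value]


def is_internal(dst_ip, networks):
    """Return True if the destination address falls inside any ignored network."""
    address = ipaddress.ip_address(dst_ip)
    # Membership is False when the address and network IP versions differ.
    return any(address in network for network in networks)


if __name__ == "__main__":
    ignored = normalise_networks(('192.168.0.0/16'))
    for dst in ("192.168.0.2", "8.8.8.8"):
        print(dst, "ignored" if is_internal(dst, ignored) else "tagged")
```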

NETLINK

Function

This plugin scans existing network connections on the host using netlink, creates an event for each connection and assigns it the configured science domain and activity.

Parameters

    NETLINK_EXPERIMENT - science domain id to use (for all events)
    NETLINK_ACTIVITY - activity id (for all events)
    NETLINK_INTERNAL_NETWORKS - list of destination networks to ignore (see netstat)
    NETLINK_TIMEOUT - time to wait between scans (poll time) - only via settings

Defaults

    NETLINK_TIMEOUT=2

IPERF

Function

This plugin scans existing network connections on the host using netstat, filters iperf connections, creates an event for each and assigns each one a science domain and activity chosen at random from a pre-configured set. This plugin is experimental.

Parameters

    IPERF_FLOW_ID - list of tuples with science domain and activity
    IPERF_INTERNAL_NETWORKS - list of destination networks to ignore (see netstat)

Defaults

    NETSTAT_TIMEOUT=2
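The random assignment from a pre-configured set can be pictured with the short sketch below. The values in `IPERF_FLOW_ID` and the helper `pick_flow_id` are made up for illustration; the plugin's actual selection logic is not shown in this document.

```python
import random

# Hypothetical configuration: (science domain id, activity id) pairs.
IPERF_FLOW_ID = [(1, 1), (1, 2), (2, 1)]


def pick_flow_id(flow_ids, rng=random):
    """Pick one (science domain, activity) pair at random for a new connection."""
    exp, act = rng.choice(flow_ids)
    return exp, act


if __name__ == "__main__":
    # A seeded generator makes the choice repeatable when testing.
    print(pick_flow_id(IPERF_FLOW_ID, random.Random(0)))
```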

Backends Reference

UDP_FIREFLY

Function

This backend implements UDP firefly flow marking. For each event triggered by a plugin it sends a UDP packet with the information about the corresponding science domain and activity.

Parameters

    IP_DISCOVERY_ENABLED - attempts to detect source IP via STUN server (can be tuned via settings). IP discovered will be used as a source in the UDP firefly metadata
    UDP_FIREFLY_IP4_SRC - IPv4 address to be used as a source in the UDP firefly metadata
    UDP_FIREFLY_IP6_SRC - IPv6 address to be used as a source in the UDP firefly metadata
    UDP_FIREFLY_NETLINK - add netlink information (scan connections via netlink, retrieve information for particular connection and add it to the UDP packet).

EBPFEL8/EBPFEL9

Function

This backend implements packet marking. It uses eBPF-TC to encode the science domain and activity in the IPv6 packet flow label field. Note that this backend requires kernel headers, bcc (eBPF libraries) and a working compiler, as it compiles and loads an eBPF-TC program into the kernel. The ebpfel8 backend should work on RHEL8 compatible systems; the ebpfel9 backend should work on RHEL9 compatible systems. Both backends require kernel 4.4 or later.

Parameters

    NETWORK_INTERFACE - network interface on which eBPF-TC program should be loaded (required)

Example

    NETWORK_INTERFACE='eth0'

PROMETHEUS

Function

This backend exposes network flows visible to flowd to Prometheus. It will attempt to fetch all related netlink data for each network flow via ss command line tool and will show them alongside science domain and activity.

Parameters

    PROMETHEUS_SS_PATH - path to ss command
    PROMETHEUS_SRV_PORT - port where Prometheus client should listen

Example

    PROMETHEUS_SS_PATH='/usr/sbin/ss'
    PROMETHEUS_SRV_PORT=9000

Example

The following is a sample output from Prometheus client: ```shell

# All metrics start with flow_tcp followed by corresponding netlink metric
# (metrics depend on the kernel version, ss tool version and TCP options enabled)
# The following labels are populated:
#   exp - science domain
#   act - activity
#   src/dst - source/destination IPs
#   flow - flow src/dst ports
#   opts - TCP/IP options used

flowtcpskmemrmemalloc{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmemrcvbuf{act="cache",dst="",exp="cms",flow="int:int",src=""} 131072.0
flowtcpskmemwmemallow{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmemsndbuf{act="cache",dst="",exp="cms",flow="int:int",src=""} 5.778432e+06
flowtcpskmemfwdalloc{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmemwmemqueued{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmemoptmem{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmembacklog{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcpskmemsockdrop{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.0
flowtcprto{act="cache",dst="",exp="cms",flow="int:int",src=""} 201.0
flowtcprtt{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.361
flowtcprttvar{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.091
flowtcpato{act="cache",dst="",exp="cms",flow="int:int",src=""} 40.0
flowtcpmss{act="cache",dst="",exp="cms",flow="int:int",src=""} 65464.0
flowtcppmtu{act="cache",dst="",exp="cms",flow="int:int",src=""} 65536.0
flowtcprcvmss{act="cache",dst="",exp="cms",flow="int:int",src=""} 536.0
flowtcpadvmss{act="cache",dst="",exp="cms",flow="int:int",src=""} 65464.0
flowtcpcwnd{act="cache",dst="",exp="cms",flow="int:int",src=""} 47.0
flowtcpssthresh{act="cache",dst="",exp="cms",flow="int:int",src=""} 21.0
flowtcpbytessenttotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 1.61221034e+09
flowtcpbytesackedtotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 1.61221034e+09
flowtcpbytesreceivedtotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 5212.0
flowtcpsegsouttotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 26162.0
flowtcpsegsintotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 4946.0
flowtcpdatasegsouttotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 26153.0
flowtcpdatasegsintotal{act="cache",dst="",exp="cms",flow="int:int",src=""} 195.0
flowtcpsend{act="cache",dst="",exp="cms",flow="int:int",src=""} 6.8184110803e+010
flowtcplastsnd{act="cache",dst="",exp="cms",flow="int:int",src=""} 38.0
flowtcplastrcv{act="cache",dst="",exp="cms",flow="int:int",src=""} 43.0
flowtcplastack{act="cache",dst="",exp="cms",flow="int:int",src=""} 38.0
flowtcppacingrate{act="cache",dst="",exp="cms",flow="int:int",src=""} 8.1792611416e+010
flowtcpdeliveryrate{act="cache",dst="",exp="cms",flow="int:int",src=""} 2.8156559136e+010
flowtcpdelivered{act="cache",dst="",exp="cms",flow="int:int",src=""} 26154.0
flowtcpbusy{act="cache",dst="",exp="cms",flow="int:int",src=""} 645.0
flowtcprwndlimited{act="cache",dst="",exp="cms",flow="int:int",src=""} 1.0
flowtcprcvspace{act="cache",dst="",exp="cms",flow="int:int",src=""} 65464.0
flowtcprcvssthresh{act="cache",dst="",exp="cms",flow="int:int",src=""} 65464.0
flowtcpminrtt{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.007
flowtcpsndwnd{act="cache",dst="",exp="cms",flow="int:int",src=""} 2.0119552e+07
flowtcpcainfo{act="cache",dst="",exp="cms",flow="int:int",opts="ts sack cubic wscale:14,14",src=""} 1.0
```
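For readers who want to post-process this output without a Prometheus server, the following sketch splits one sample line into its metric name, labels and value. It is an illustration only: `parse_sample` is a hypothetical helper, it handles just the simple `name{labels} value` form seen in the sample, and it is not a full parser for the Prometheus text format.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one 'name{labels} value' line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised sample line: {line!r}")
    # Quoted values may contain commas (e.g. opts), so match key="value" pairs directly.
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


if __name__ == "__main__":
    line = 'flowtcprtt{act="cache",dst="",exp="cms",flow="int:int",src=""} 0.361'
    name, labels, value = parse_sample(line)
    print(name, labels["exp"], labels["act"], value)
```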

Contribution Guide

Extending flowd can be done either by implementing a new plugin or a new backend. The core plugin interface requires two methods:

```python
import scitags


def init():
    # Used to run any initialisation required (check config sanity, etc.)
    pass


def run(flow_queue, term_event, ip_config):
    # Implements a particular way to identify a network flow and assigns it a
    # science domain and activity.
    # Sends a scitags.FlowID object via flow_queue to the backend(s).
    #
    # Parameters:
    #   flow_queue - pub/sub queue used to transmit scitags.FlowID to the backends
    #   term_event - termination event object
    #   ip_config  - host IP configuration (ip4 and ip6 addresses)
    #
    while not term_event.is_set():
        # implementation to determine flow_id (see scitags.FlowID for details)
        flow_id = scitags.FlowID(flowstate, proto, src, srcport, dst, dstport,
                                 expid, activityid, starttime, endtime, netlink)
        flow_queue.put(flow_id)
```

A backend is expected to parse the information contained in the FlowID object and implement a mechanism for passing it to the network layer.

```python
import queue


def run(flow_queue, term_event, flow_map, ip_config):
    # Implements a particular network flow or packet tagging mechanism.
    # Uses the scitags.FlowID object to identify the flow, its science domain and activity.
    while not term_event.is_set():
        try:
            flow_id = flow_queue.get(block=True, timeout=0.5)
        except queue.Empty:
            continue
        # implementation
```
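The interplay between a plugin, the queue and a backend can be tried out in isolation with the self-contained sketch below. It makes several simplifying assumptions: a `namedtuple` stands in for `scitags.FlowID` (the real class has more fields), a plain `queue.Queue` with a single consumer replaces flowd's pub/sub queue, and `plugin_run`, `backend_run` and `run_demo` are hypothetical names that do not match flowd's exact signatures.

```python
import queue
import threading
from collections import namedtuple

# Stand-in for scitags.FlowID, reduced to the fields used by the np_api plugin.
FlowID = namedtuple("FlowID", "state prot src src_port dst dst_port exp act")


def plugin_run(flow_queue, term_event, flows):
    """Toy plugin: publish a fixed list of flows, then return."""
    for flow_id in flows:
        if term_event.is_set():
            break
        flow_queue.put(flow_id)


def backend_run(flow_queue, term_event, handled):
    """Toy backend: record every flow it receives until asked to stop."""
    while not term_event.is_set():
        try:
            flow_id = flow_queue.get(block=True, timeout=0.1)
        except queue.Empty:
            continue  # re-check the termination event
        handled.append(flow_id)  # a real backend would mark the flow here
        flow_queue.task_done()


def run_demo(flows):
    flow_queue = queue.Queue()
    term_event = threading.Event()
    handled = []
    worker = threading.Thread(target=backend_run, args=(flow_queue, term_event, handled))
    worker.start()
    plugin_run(flow_queue, term_event, flows)
    flow_queue.join()   # wait until the backend has processed every flow
    term_event.set()    # then ask the backend loop to exit
    worker.join()
    return handled


if __name__ == "__main__":
    demo_flows = [
        FlowID("start", "tcp", "192.168.0.1", 2345, "192.168.0.2", 5777, 1, 2),
        FlowID("end", "tcp", "192.168.0.1", 2345, "192.168.0.2", 5777, 1, 2),
    ]
    for item in run_demo(demo_flows):
        print(item)
```

The short `get` timeout is what lets the backend notice the termination event promptly instead of blocking forever on an empty queue.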

Owner

  • Name: Scientific Network Tags
  • Login: scitags
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Babik"
  given-names: "Marian"
  orcid: "https://orcid.org/0000-0003-4141-5635"
- family-names: "Sullivan"
  given-names: "Tristan"
title: "flowd"
version: 1.0.2
date-released: 2023-04-14
url: "https://github.com/scitags/flowd"

GitHub Events

Total
  • Release event: 2
  • Watch event: 1
  • Delete event: 2
  • Push event: 4
  • Create event: 3
Last Year
  • Release event: 2
  • Watch event: 1
  • Delete event: 2
  • Push event: 4
  • Create event: 3

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 103
  • Total Committers: 4
  • Avg Commits per committer: 25.75
  • Development Distribution Score (DDS): 0.379
Past Year
  • Commits: 71
  • Committers: 4
  • Avg Commits per committer: 17.75
  • Development Distribution Score (DDS): 0.493
Top Committers
Name Email Commits
Marian Babik m****k@c****h 64
Marian Babik m****k 31
Tristan Sullivan t****v@u****a 5
root r****t@e****h 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 11
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fitsie007 (1)
Pull Request Authors
  • marian-babik (12)
  • expuss2000 (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 34 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 8
  • Total maintainers: 1
pypi.org: flowd

Flow and Packet Marking Service

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 13 Last month
Rankings
Dependent packages count: 7.0%
Forks count: 17.3%
Average: 20.8%
Stargazers count: 28.4%
Dependent repos count: 30.5%
Maintainers (1)
Last synced: 6 months ago
pypi.org: scitags

Flow and Packet Marking Service

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 21 Last month
Rankings
Dependent packages count: 10.3%
Average: 34.2%
Dependent repos count: 58.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • psutil *
  • psutil ==5.6.7
  • python-daemon ==2.2.4
  • requests *
  • systemd-python *
setup.py pypi
  • python-daemon *
  • python2-psutil *
  • python2-requests *
  • systemd-python *
.github/workflows/pythonpackage.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
Dockerfile docker
  • python 3.9.6-slim build