venn

venn: set operations with a command line shell script

https://github.com/sixarm/venn

Repository

Basic Info
  • Host: GitHub
  • Owner: SixArm
  • Language: Shell
  • Default Branch: main
  • Homepage: http://sixarm.com
  • Size: 38.1 KB

README.md

venn: set operations with a command line shell script

Realm

The venn command performs set operations on the shell command line. It treats each line of each input text file as a set element, so you can compute a set union, a set intersection, and so on.

Introduction

Script: venn

Syntax

Syntax:

venn (union|intersection|...) <input> ...

Syntax example:

venn union file-1 file-2

Set operations

Set operations that venn can process:

  • union: A ∪ B (lines that are in any input stream)

  • intersection: A ∩ B (lines that are in all input streams)

  • difference: A ⊕ B (lines that are in exactly one input stream)

  • except: A - B (lines that are solely in the first input stream)

  • extra: B - A (lines that are solely in the last input stream)

  • joint: is any line in more than one input stream?

  • disjoint: is each line in exactly one input stream?

Options

Options on the command line:

  • -h --help: show help

  • -v --version: show version

Examples

Examples use these two example data files:

$ cat a
red
green

$ cat b
red
blue
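To follow along, the two data files can be recreated with `printf`. This is a small sketch; the temporary directory and the `dir` variable are illustrative choices and are not part of venn.

```sh
# Create a scratch directory so the example does not overwrite existing files.
dir=$(mktemp -d)

# Each printf argument line becomes one set element (one line per element).
printf 'red\ngreen\n' > "$dir/a"
printf 'red\nblue\n'  > "$dir/b"

# Show both files.
cat "$dir/a"
cat "$dir/b"
```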

Union:

$ venn union a b
red
green
blue

Intersection:

$ venn intersection a b
red

Difference:

$ venn difference a b
green
blue

Except:

$ venn except a b
green

Extra:

$ venn extra a b
blue

Disjoint:

$ venn disjoint a b
false

Install

Venn is one shell script, and you install it by putting the script anywhere in your path.

preflight

Verify that you have the awk command, such as:

$ awk --version

We target GNU Awk:

GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)

If you have a different Awk, then venn should still work fine.
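The `--version` flag is a GNU-style long option, and some awk implementations may not accept it. If you only need to know whether an awk is on your path, a presence check like the following sketch may be more portable. The helper name `have_cmd` is hypothetical and is not part of venn.

```sh
# have_cmd: succeed if the named command can be found by the shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd awk; then
  echo "awk found at: $(command -v awk)"
else
  echo "awk not found; install an awk before using venn" >&2
fi
```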

git

To install via git:

git clone https://github.com/SixArm/venn.git
cp venn/bin/venn /usr/local/bin/venn

curl

To install via curl:

curl -fsSL https://raw.githubusercontent.com/SixArm/venn/master/bin/venn > /usr/local/bin/venn
chmod +x /usr/local/bin/venn

We want to create typical packages, such as for Debian apt, RedHat yum, macOS brew, etc. If you're a developer, and want to create packages, then we welcome your help.

Set operations details

Union

Set theory operation (A union B).

Print lines that are in any of the input streams.

Also known as "logical or", "logical inclusive disjunction".

Synonyms:

  • union

  • u (letter u)

  • ∪ (U+222A union)

  • ∨ (U+2228 logical or)

  • + (U+002B plus sign)

  • & (U+0026 ampersand)

  • or

Example:

$ venn union a b c
$ venn or a b c
=> print lines that are in any of a, b, c
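For intuition, a union over lines can be sketched with a one-line awk filter that prints each distinct line the first time it appears. This is an illustration of the idea only; the function name `union_sketch` is hypothetical, and the actual venn script may work differently.

```sh
# union_sketch: print every distinct line found in any input, in first-seen order.
union_sketch() {
  # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is true once.
  awk '!seen[$0]++' "$@"
}
```

With the example files `a` and `b` from above, `union_sketch a b` prints `red`, `green`, `blue`.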

Intersection

Set theory operation (A intersection B).

Print lines that are in all of the input streams.

Also known as "logical and", "logical conjunction".

Synonyms:

  • intersection

  • i (letter i)

  • ∩ (U+2229 intersection)

  • ∧ (U+2227 logical and)

  • | (U+007C vertical line)

  • and

Example:

$ venn intersection a b c
$ venn and a b c
=> print lines that are in all of a, b, c
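One way to picture the intersection is to count, for each line, how many inputs contain it, and keep the lines whose count equals the number of inputs. The sketch below assumes all arguments are file names; `intersection_sketch` is a hypothetical name, and this is not the venn source.

```sh
# intersection_sketch: print lines that occur in every input file.
intersection_sketch() {
  awk '
    FNR == 1 { file++ }                    # a new non-empty input begins
    !((file, $0) in seen) {                # first time this line shows up in this input
      seen[file, $0] = 1
      if (!($0 in count)) order[++n] = $0  # remember first-appearance order
      count[$0]++                          # number of inputs containing the line
    }
    END {
      # ARGC - 1 is the number of file arguments given to awk.
      for (i = 1; i <= n; i++)
        if (count[order[i]] == ARGC - 1) print order[i]
    }
  ' "$@"
}
```

Tracking `seen` per input keeps a line that is repeated inside one file from being counted twice.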

Difference

Set theory operation (A symmetric difference B).

Print lines that are in exactly one of the input streams.

Also known as "logical xor", "logical exclusive disjunction".

Synonyms:

  • difference

  • d (letter d)

  • ⊕ (U+2295 circled plus)

  • ∆ (U+2206 increment)

  • Δ (U+0394 delta)

  • ⊻ (U+22BB logical xor)

  • xor

Examples:

$ venn difference a b c
$ venn xor a b c
=> print lines that are in exactly one of a, b, c
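A counting approach also illustrates the difference operation: keep the lines that occur in exactly one input. Note the assumption here: with three or more inputs this "exactly one" reading is not the same as chaining pairwise xor, and the sketch follows the "exactly one" wording of this README. The name `difference_sketch` is hypothetical.

```sh
# difference_sketch: print lines that occur in exactly one input file.
difference_sketch() {
  awk '
    FNR == 1 { file++ }                    # a new non-empty input begins
    !((file, $0) in seen) {                # count each line once per input
      seen[file, $0] = 1
      if (!($0 in count)) order[++n] = $0  # remember first-appearance order
      count[$0]++
    }
    END {
      for (i = 1; i <= n; i++)
        if (count[order[i]] == 1) print order[i]
    }
  ' "$@"
}
```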

Except a.k.a. First

Set operation (A except B) a.k.a. (A - B)

Print lines that are solely in the first input.

Synonyms:

  • except

  • first

  • sub

  • subtract

  • subtraction

  • − (U+2212 minus sign)

Examples:

$ venn except a b c
$ venn first a b c
=> print lines that are in a, but not in b or c
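The except operation can be sketched by remembering the lines of the first input and then discarding any that also appear in a later input. This sketch assumes the first file name differs from the other file names; `except_sketch` is a hypothetical name, not the venn implementation.

```sh
# except_sketch: print lines that occur only in the first input file.
except_sketch() {
  awk '
    FILENAME == ARGV[1] {                  # still reading the first input
      if (!($0 in first)) { first[$0] = 1; order[++n] = $0 }
      next
    }
    { other[$0] = 1 }                      # line seen in some later input
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in other)) print order[i]
    }
  ' "$@"
}
```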

Extra a.k.a. Last

Set theory operation (A extra B) a.k.a. (B - A).

Print lines that are solely in the last input.

Synonyms:

  • extra

  • last

Examples:

$ venn extra a b c
$ venn last a b c
=> print lines that are in c, but not in a or b
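Because the last input is read last, the extra operation can be sketched as a single streaming pass: record every line from the earlier inputs, then print lines of the last input that were never recorded. This assumes the last file name differs from the other file names; `extra_sketch` is a hypothetical name.

```sh
# extra_sketch: print lines that occur only in the last input file.
extra_sketch() {
  awk '
    # Any input before the last one: just record its lines.
    FILENAME != ARGV[ARGC - 1] { other[$0] = 1; next }
    # Last input: print a line once if no earlier input had it.
    !($0 in other) && !printed[$0]++
  ' "$@"
}
```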

Joint

Set operation is (A joint B).

Do any of the input streams have any overlap i.e. any lines in common?

If so, print $TRUE and exit 0, otherwise $FALSE and exit 1.

Synonyms:

  • joint

  • codependent

Examples:

$ venn joint a b c
$ venn codependent a b c
=> print "true" if any of a, b, c, have any lines in common
=> print "false" otherwise
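The joint test only needs to find one line shared by two different inputs, so a sketch can stop reading as soon as it sees one. The function name `joint_sketch` and the way the `TRUE` and `FALSE` defaults are passed to awk are assumptions for illustration, not the venn source.

```sh
# joint_sketch: print the true word and exit 0 if any line is shared by two inputs,
# otherwise print the false word and exit 1.
joint_sketch() {
  awk -v t="${TRUE:-true}" -v f="${FALSE:-false}" '
    FNR == 1 { file++ }                               # a new non-empty input begins
    ($0 in owner) && owner[$0] != file { found = 1; exit }  # shared line: stop early
    { owner[$0] = file }                              # remember which input had the line
    END {
      if (found) { print t; exit 0 }
      print f
      exit 1
    }
  ' "$@"
}
```

Since the result is also the exit status, the function can be used directly in a shell condition, for example `if joint_sketch a b >/dev/null; then ...; fi`.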

Disjoint

Set operation is (A disjoint B).

Do all of the input streams have no overlap i.e. no lines in common?

If so, print $TRUE and exit 0, otherwise $FALSE and exit 1.

Also known as "pairwise disjoint", "mutually disjoint".

Synonyms:

  • disjoint

  • independent

Examples:

$ venn disjoint a b c
$ venn independent a b c
=> print "true" if all of a, b, c, have no lines in common
=> print "false" otherwise
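The disjoint test is the logical opposite: it succeeds only if no line appears in more than one input. A minimal sketch, with the hypothetical name `disjoint_sketch` and the same assumed handling of the `TRUE` and `FALSE` defaults:

```sh
# disjoint_sketch: print the true word and exit 0 if no line is shared between inputs,
# otherwise print the false word and exit 1.
disjoint_sketch() {
  awk -v t="${TRUE:-true}" -v f="${FALSE:-false}" '
    FNR == 1 { file++ }                                  # a new non-empty input begins
    ($0 in owner) && owner[$0] != file { overlap = 1; exit }  # shared line: stop early
    { owner[$0] = file }                                 # remember which input had the line
    END {
      if (overlap) { print f; exit 1 }
      print t
      exit 0
    }
  ' "$@"
}
```

A line repeated inside a single file does not count as overlap in this sketch, because only lines shared across different inputs are flagged.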

Customization

Custom output for true or false

The joint operation and the disjoint operation produce output that is either true or false.

Example:

$ venn joint a b
true

$ venn disjoint a b
false

You can customize the output text by using environment variables:

$ TRUE=yes FALSE=no venn joint a b
yes

We like to customize the output text by using environment variables and the Unicode symbols ⊤ (U+22A4 down tack) and ⊥ (U+22A5 up tack) like this:

$ TRUE=⊤ FALSE=⊥ venn joint a b
⊤
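In a shell script, this kind of override is commonly written with the `${VAR:-default}` parameter expansion, which falls back to a default when the variable is unset or empty. The sketch below shows the pattern; the helper name `report` is hypothetical, and whether venn uses exactly this expansion is an assumption.

```sh
# report: given a status (0 = true, nonzero = false), print the matching word
# and return the same truth value as an exit status.
report() {
  if [ "$1" -eq 0 ]; then
    printf '%s\n' "${TRUE:-true}"    # use $TRUE if set and non-empty, else "true"
    return 0
  else
    printf '%s\n' "${FALSE:-false}"  # use $FALSE if set and non-empty, else "false"
    return 1
  fi
}
```

For example, `TRUE=yes FALSE=no report 0` prints `yes`, while `report 0` with neither variable set prints `true`.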

Implementation

This command is currently implemented using awk and POSIX shell features.

The goal is to maximize usability on a wide range of Unix systems, including older systems, and pure POSIX systems.

TODO

Ideas to implement:

  • Add a "--help" option?

  • Add a way to automatically do unique?

  • Add exception handling, such as if an input stream is not unique?

Want to help? We welcome help. You can open a GitHub issue, or send a GitHub pull request, or email us at sixarm@sixarm.com.

References

Documentation:

  • Benchmarks: Benchmarks of millions of lines of data, such as random unsorted data.
  • Comparisons: Comparisons to other implementations, such as Unix/POSIX shell scripts.

Tracking

  • Program: venn
  • Version: 4.3.0
  • Created: 2017-01-30
  • Updated: 2018-06-01
  • License: GPL
  • Contact: Joel Parker Henderson (joel@joelparkerhenderson.com)

Owner

  • Name: SixArm
  • Login: SixArm
  • Kind: organization
  • Email: sixarm@sixarm.com
  • Location: San Francisco

SixArm Software

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'venn: set operations with a command line shell script'
message: >-
  If you use this work and you want to cite it,
  then you can use the metadata from this file.
type: software
authors:
  - given-names: Joel Parker
    family-names: Henderson
    email: joel@joelparkerhenderson.com
    affiliation: joelparkerhenderson.com
    orcid: 'https://orcid.org/0009-0000-4681-282X'
identifiers:
  - type: url
    value: 'https://github.com/SixArm/venn/'
    description: 'venn: set operations with a command line shell script'
repository-code: 'https://github.com/SixArm/venn/'
abstract: >-
  venn: set operations with a command line shell script
license: See license file
