graphtage

A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.

https://github.com/trailofbits/graphtage

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

command-line-tool diff graph-algorithms hacktoberfest hacktoberfest2021 library python utility

Keywords from Contributors

mesh sequences interactive hacking
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: trailofbits
  • License: lgpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 8.09 MB
Statistics
  • Stars: 2,422
  • Watchers: 50
  • Forks: 49
  • Open Issues: 26
  • Releases: 14
Created almost 6 years ago · Last pushed 6 months ago
Metadata Files
Readme License Code of conduct Citation Codeowners

README.md

Graphtage

Graphtage is a command-line utility and underlying library for semantically comparing and merging tree-like structures, such as JSON, XML, HTML, YAML, plist, and CSS files. Its name is a portmanteau of “graph” and “graftage”—the latter being the horticultural practice of joining two trees together such that they grow as one.

```console
$ echo Original: && cat original.json && echo Modified: && cat modified.json
```
```json
Original:
{
    "foo": [1, 2, 3, 4],
    "bar": "testing"
}
Modified:
{
    "foo": [2, 3, 4, 5],
    "zab": "testing",
    "woo": ["foobar"]
}
```
```console
$ graphtage original.json modified.json
```
```json
{
    "z̟b̶ab̟r̶": "testing",
    "foo": [
        1̶,̶
        2,
        3,
        4,̟
        5̟
    ],̟
    "̟w̟o̟o̟"̟:̟ ̟[̟
        "̟f̟o̟o̟b̟a̟r̟"̟
    ]̟
}
```

Installation

```console
$ pip3 install graphtage
```

Command Line Usage

Output Formatting

Graphtage performs an analysis on an intermediate representation of the trees that is divorced from the filetypes of the input files. This means, for example, that you can diff a JSON file against a YAML file. Also, the output format can be different from the input format(s). By default, Graphtage will format the output diff in the same file format as the first input file. But one could, for example, diff two JSON files and format the output in YAML. There are several command-line arguments to specify these transformations, such as --format; please check the --help output for more information.
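The idea of comparing files of different formats can be illustrated with the Python standard library alone: once parsed, a JSON document and a plist document are both ordinary dicts and lists, so they can be compared without regard to the source format. This is only a minimal sketch of the concept, not Graphtage's internal representation; the sample documents and the `load_tree` helper are hypothetical.

```python
import json
import plistlib

JSON_TEXT = '{"foo": [1, 2, 3], "bar": "baz"}'

PLIST_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>foo</key>
    <array><integer>1</integer><integer>2</integer><integer>3</integer></array>
    <key>bar</key>
    <string>baz</string>
</dict>
</plist>
"""

def load_tree(data, filetype):
    """Parse input into plain dicts, lists, and scalars, whatever the format."""
    if filetype == "json":
        return json.loads(data)
    if filetype == "plist":
        return plistlib.loads(data)
    raise ValueError(f"unsupported filetype: {filetype}")

json_tree = load_tree(JSON_TEXT, "json")
plist_tree = load_tree(PLIST_BYTES, "plist")

# Both inputs now share one format-independent shape and compare directly.
same = json_tree == plist_tree
print(same)
```

A real tool needs a richer tree than plain Python objects in order to record edits and print them back in a chosen format, but the separation between parsing and comparison is the same.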

By default, Graphtage pretty-prints its output with as many line breaks and indents as possible.
```json
{
    "foo": [
        1,
        2,
        3
    ],
    "bar": "baz"
}
```
Use the --join-lists or -jl option to suppress linebreaks after list items:
```json
{
    "foo": [1, 2, 3],
    "bar": "baz"
}
```
Likewise, use the --join-dict-items or -jd option to suppress linebreaks after key/value pairs in a dict:
```json
{"foo": [
    1,
    2,
    3
], "bar": "baz"}
```
Use --condensed or -j to apply both of these options:
```json
{"foo": [1, 2, 3], "bar": "baz"}
```

The --only-edits or -e option will print out a list of edits rather than applying them to the input file in place.

The --edit-digest or -d option is like --only-edits but prints a more concise context for each edit that is more human-readable.

Matching Options

Graphtage matches the elements of two dictionaries according to a configurable strategy.

Matching two dictionaries with each other is hard. Although computationally tractable, this can sometimes be onerous for input files with huge dictionaries. Graphtage has three different strategies for matching dictionaries:

1. --dict-strategy match (the most computationally expensive) tries to match all pairs of keys and values between the two dictionaries, resulting in a match of minimum edit distance;
2. --dict-strategy none (the least computationally expensive) will not attempt to match any key/value pairs unless they have the exact same key; and
3. --dict-strategy auto (the default) will automatically match the values of any key-value pairs that have identical keys and then use the match strategy for the remainder of key/value pairs.
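The difference between the strategies can be sketched in a few lines of Python. The code below is an illustration under stated assumptions, not Graphtage's implementation: the cost metric (`pair_cost`), the brute-force search over permutations, and the restriction to equally sized leftovers are all simplifications chosen to keep the example short.

```python
import json
from itertools import permutations

def levenshtein(a, b):
    """Classic dynamic-programming string edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def pair_cost(item_a, item_b):
    """Hypothetical cost of turning one key/value pair into another."""
    (ka, va), (kb, vb) = item_a, item_b
    return levenshtein(ka, kb) + levenshtein(json.dumps(va), json.dumps(vb))

def match_all(items_a, items_b):
    """Brute-force minimum-cost pairing; only feasible for tiny inputs."""
    if len(items_a) != len(items_b):
        raise ValueError("this sketch only pairs equally sized inputs")
    best = min(permutations(items_b),
               key=lambda perm: sum(pair_cost(a, b)
                                    for a, b in zip(items_a, perm)))
    return list(zip(items_a, best))

def match_auto(dict_a, dict_b):
    """Pair identical keys first, then fully match whatever is left over."""
    pairs = [((k, dict_a[k]), (k, dict_b[k])) for k in dict_a if k in dict_b]
    rest_a = [(k, v) for k, v in dict_a.items() if k not in dict_b]
    rest_b = [(k, v) for k, v in dict_b.items() if k not in dict_a]
    return pairs + match_all(rest_a, rest_b)

original = {"foo": [1, 2, 3, 4], "bar": "testing"}
modified = {"foo": [2, 3, 4, 5], "zab": "testing"}
matches = match_auto(original, modified)
for (ka, _), (kb, _) in matches:
    print(f"{ka!r} -> {kb!r}")
```

Pairing identical keys first shrinks the set that needs the expensive all-pairs search, which is the trade-off the auto strategy describes.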

See Pull Request #51 for some examples of how these strategies affect output.

The --no-list-edits or -l option will not consider interstitial insertions and removals when comparing two lists. The --no-list-edits-when-same-length or -ll option is a less drastic version of -l that will behave normally for lists that are of different lengths but behave like -l for lists that are of the same length.
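The effect of ignoring interstitial insertions and removals can be sketched with the standard library. Here `difflib.SequenceMatcher` stands in for a sequence diff that allows insertions and removals, while a plain index-by-index comparison stands in for the behaviour described for -l. Both helper functions are hypothetical and neither is Graphtage's algorithm.

```python
from difflib import SequenceMatcher

def edits_with_insertions(old, new):
    """List edits that allow insertions and removals anywhere in the list."""
    edits = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            edits += [("remove", x) for x in old[i1:i2]]
        if tag in ("insert", "replace"):
            edits += [("insert", x) for x in new[j1:j2]]
    return edits

def edits_positional(old, new):
    """Compare index by index only; assumes equal-length lists for brevity."""
    return [("replace", a, b) for a, b in zip(old, new) if a != b]

old, new = [1, 2, 3, 4], [2, 3, 4, 5]
with_insertions = edits_with_insertions(old, new)
positional = edits_positional(old, new)
print(with_insertions)
print(positional)
```

For the lists from the example at the top of the README, the first approach reports one removal and one insertion, whereas the positional comparison reports every element as replaced.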

ANSI Color

By default, Graphtage will only use ANSI color in its output if it is run from a TTY. If, for example, you would like to have Graphtage emit colorized output from a script or pipe, use the --color or -c argument. To disable color even when running on a TTY, use --no-color.
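The decision described above can be sketched as a small function. The helper names are hypothetical, and the precedence when both flags are given (here, disabling wins) is an assumption rather than documented Graphtage behaviour.

```python
import io

def should_use_color(stream, force_color=False, no_color=False):
    """Decide whether to emit ANSI color codes (hypothetical helper)."""
    if no_color:
        return False
    if force_color:
        return True
    # Default: color only when the stream is an interactive terminal.
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())

def colorize(text, enabled):
    """Wrap text in the ANSI escape sequence for green when enabled."""
    return f"\033[32m{text}\033[0m" if enabled else text

pipe_like = io.StringIO()  # behaves like a pipe: not a TTY
print(colorize("inserted", should_use_color(pipe_like)))
print(colorize("inserted", should_use_color(pipe_like, force_color=True)))
```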

HTML Output

Graphtage can optionally emit the diff in HTML with the --html option.
```console
$ graphtage --html original.json modified.json > diff.html
```

Status and Logging

By default, Graphtage prints status messages and a progress bar to STDERR. To suppress this, use the --no-status option. To additionally suppress all but critical log messages, use --quiet. Fine-grained control of log messages is via the --log-level option.
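A comparable arrangement can be built with Python's standard `logging` module. This is a rough sketch of how such options might map onto log levels, not Graphtage's code: the `configure_logging` helper, the logger name, the default level of INFO, and the rule that an explicit level overrides quiet mode are all assumptions.

```python
import logging
import sys

def configure_logging(quiet=False, log_level=None):
    """Return a logger that writes to STDERR (hypothetical helper)."""
    if log_level is not None:
        level = getattr(logging, log_level.upper())  # e.g. "debug" -> DEBUG
    elif quiet:
        level = logging.CRITICAL                     # only critical messages
    else:
        level = logging.INFO                         # assumed default
    logger = logging.getLogger("diff-sketch")
    logger.handlers.clear()                          # avoid duplicate handlers
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger

log = configure_logging(quiet=True)
log.info("matching dictionaries")      # suppressed in quiet mode
log.critical("could not parse input")  # still written to STDERR
```

Writing status output to STDERR keeps STDOUT clean, so the diff itself can be redirected to a file or piped to another program.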

Why does Graphtage exist?

Diffing tree-like structures with unordered elements is tough. Say you want to compare two JSON files. There are limited tools available, which are effectively equivalent to canonicalizing the JSON (e.g., sorting dictionary elements by key) and performing a standard diff. This is not always sufficient. For example, if a key in a dictionary is changed but its value is not, a traditional diff will conclude that the entire key/value pair was replaced by the new one, even though the only change was the key itself. See our documentation for more information.
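The limitation of the canonicalize-then-diff approach is easy to reproduce with the standard library. The sketch below sorts the keys, pretty-prints both documents, and runs a line-based diff; the sample data is adapted from the example at the top of the README and the `canonical_lines` helper is hypothetical.

```python
import difflib
import json

original = {"foo": [1, 2, 3, 4], "bar": "testing"}
renamed = {"foo": [1, 2, 3, 4], "zab": "testing"}  # only the key changed

def canonical_lines(obj):
    """Canonicalize by sorting keys, then split into lines for a text diff."""
    return json.dumps(obj, sort_keys=True, indent=4).splitlines()

diff_lines = list(difflib.unified_diff(
    canonical_lines(original), canonical_lines(renamed),
    fromfile="original.json", tofile="modified.json", lineterm=""))
print("\n".join(diff_lines))
```

Because sorting moves the renamed key to a different position, the line-based diff reports the old pair as removed and a new pair as added elsewhere, with nothing to indicate that the value is unchanged.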

Using Graphtage as a Library

Graphtage has a complete API for programmatically operating its diffing capabilities. When using Graphtage as a library, it is also capable of diffing in-memory Python objects. This can be useful for debugging Python code, for example, to determine a differential between two objects. See our documentation for more information.

Extending Graphtage

Graphtage is designed to be extensible: New filetypes can easily be defined, as well as new node types, edit types, formatters, and printers. See our documentation for more information.

Complete API documentation is available at https://trailofbits.github.io/graphtage/.

License and Acknowledgements

This research was developed by Trail of Bits with partial funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the GNU Lesser General Public License v3.0. Contact us if you're looking for an exception to the terms. © 2020–2023, Trail of Bits.

Owner

  • Name: Trail of Bits
  • Login: trailofbits
  • Kind: organization
  • Email: opensource@trailofbits.com
  • Location: New York, New York

More code: binary lifters @lifting-bits, blockchain @crytic, forks @trail-of-forks

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Graphtage
message: >-
  Graphtage is a command-line utility and underlying library
  for semantically comparing and merging tree-like
  structures, such as JSON, XML, HTML, YAML, plist, and CSS
  files.
type: software
authors:
  - given-names: Evan
    family-names: Sultanik
    email: evan.sultanik@trailofbits.com
    affiliation: Trail of Bits
    orcid: 'https://orcid.org/0000-0002-6246-1422'
repository-code: 'https://github.com/trailofbits/graphtage'
url: 'https://trailofbits.github.io/graphtage/'
abstract: >-
  Graphtage is a command-line utility and underlying library
  for semantically comparing and merging tree-like
  structures, such as JSON, XML, HTML, YAML, plist, and CSS
  files. Its name is a portmanteau of “graph” and
  “graftage”—the latter being the horticultural practice of
  joining two trees together such that they grow as one.
keywords:
  - diffing
  - graph isomorphism
  - edit distance
license: LGPL-3.0

GitHub Events

Total
  • Issues event: 2
  • Watch event: 64
  • Delete event: 3
  • Push event: 6
  • Pull request event: 3
  • Fork event: 4
  • Create event: 4
Last Year
  • Issues event: 2
  • Watch event: 64
  • Delete event: 3
  • Push event: 6
  • Pull request event: 3
  • Fork event: 4
  • Create event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 532
  • Total Committers: 11
  • Avg Commits per committer: 48.364
  • Development Distribution Score (DDS): 0.056
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
Evan Sultanik | e****k@t****m | 502
dependabot[bot] | 4****] | 9
Nicholas Bollweg | n****g@g****m | 8
William Woodruff | w****m@t****m | 6
c00k133 | a****b@n****v | 1
Loïc Lengrand | 4****5 | 1
James Olds | j****s@t****m | 1
IroncladLandship | j****r@g****m | 1
Ernie Hershey | g****b@e****g | 1
Artem Dinaburg | a****m@d****g | 1
Brad Larsen | b****n@t****m | 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 38
  • Total pull requests: 51
  • Average time to close issues: 15 days
  • Average time to close pull requests: 16 days
  • Total issue authors: 27
  • Total pull request authors: 12
  • Average comments per issue: 0.89
  • Average comments per pull request: 0.31
  • Merged pull requests: 46
  • Bot issues: 0
  • Bot pull requests: 12
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ESultanik (7)
  • multimeric (2)
  • RCoeurjoly (2)
  • chrisfw (2)
  • technopagan (2)
  • dkasak (1)
  • SamWilsn (1)
  • jberger (1)
  • NPN (1)
  • kslgrd (1)
  • paulzhn (1)
  • martijnthe (1)
  • Vad1mo (1)
  • AdaRoseCannon (1)
  • Hideman85 (1)
Pull Request Authors
  • ESultanik (28)
  • dependabot[bot] (18)
  • woodruffw (4)
  • bollwyvl (1)
  • ZombieNub (1)
  • oldsj (1)
  • bradlarsen (1)
  • artemdinaburg (1)
  • c00k133 (1)
  • loic5 (1)
  • dguido (1)
  • ehershey (1)
  • CarlQLange (1)
Top Labels
Issue Labels
enhancement (8) bug (7) good first issue (3) help wanted (1)
Pull Request Labels
enhancement (19) dependencies (18) bug (9) github_actions (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 254 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 29
  • Total maintainers: 2
proxy.golang.org: github.com/trailofbits/graphtage
  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: graphtage

A utility to diff tree-like files such as JSON and XML.

  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 254 Last month
Rankings
Stargazers count: 1.5%
Forks count: 6.0%
Dependent packages count: 9.8%
Average: 9.8%
Downloads: 10.0%
Dependent repos count: 21.9%
Maintainers (2)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • PyYAML *
  • colorama *
  • intervaltree *
  • json5 ==0.9.5
  • numpy >=1.19.4
  • scipy >=1.4.0
  • tqdm *
  • typing_extensions >=3.7.4.3
.github/workflows/artifacts.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • softprops/action-gh-release v0.1.15 composite
.github/workflows/check_version.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/pip-audit.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pip-audit v1.0.5 composite
.github/workflows/publish_docs.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • ad-m/github-push-action master composite
.github/workflows/pythonpackage.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/pythonpublish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite