zss

Tree edit distance using the Zhang Shasha algorithm

https://github.com/timtadh/zhang-shasha

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 13 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 8 months ago · JSON representation

Repository

Tree edit distance using the Zhang Shasha algorithm

Basic Info
  • Host: GitHub
  • Owner: timtadh
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 151 KB
Statistics
  • Stars: 457
  • Watchers: 12
  • Forks: 65
  • Open Issues: 6
  • Releases: 0
Created over 15 years ago · Last pushed over 5 years ago
Metadata Files
Readme License

README.rst

Zhang-Shasha: Tree edit distance in Python
------------------------------------------

.. image:: https://travis-ci.org/timtadh/zhang-shasha.svg?branch=master
    :target: https://travis-ci.org/timtadh/zhang-shasha

The ``zss`` module provides a function (``zss.distance``) that
computes the edit distance between the two given trees, as well as a small set
of utilities to make its use convenient.

If you'd like to learn more about how it works, see References below.

Brought to you by Tim Henderson (tim.tadh@gmail.com).

`Read the full documentation for more information.
`_

Installation
------------

You can get ``zss`` and its soft requirements (
``editdist`` and ``numpy`` >= 1.7) from PyPI::

    pip install zss

Both modules are optional. ``editdist`` uses string edit distance to
compare node labels rather than a simple equal/not-equal check, and
``numpy`` significantly speeds up the library. The only reason version
1.7 of ``numpy`` is required is that earlier versions have trouble
installing on current versions of Mac OS X.

You can install ``zss`` from the source code without dependencies in the
usual way::

    python setup.py install

If you want to build the docs, you'll need to install Sphinx >= 1.0.

Usage
-----

To compare the distance between two trees, you need:

1. A tree.
2. Another tree.
3. A node-node distance function. By default, ``zss`` compares the edit
   distance between the nodes' labels. ``zss`` currently only knows how
   to handle nodes with string labels.
4. Functions to let ``zss.distance`` traverse your tree.

Here is an example using the library's built-in default node structure and edit
distance function

.. code-block:: python

    from zss import simple_distance, Node

    A = (
        Node("f")
            .addkid(Node("a")
                .addkid(Node("h"))
                .addkid(Node("c")
                    .addkid(Node("l"))))
            .addkid(Node("e"))
        )
    B = (
        Node("f")
            .addkid(Node("a")
                .addkid(Node("d"))
                .addkid(Node("c")
                    .addkid(Node("b"))))
            .addkid(Node("e"))
        )
    assert simple_distance(A, B) == 2


Specifying Custom Tree Formats
------------------------------

Specifying custom tree formats and distance metrics is easy. The
``zss.simple_distance`` function takes 3 extra parameters besides the two tree
to compare:

1. ``get_children`` - a function to retrieve a list of children from a node.
2. ``get_label`` - a function to retrieve the label object from a node.
3. ``label_dist`` - a function to compute the non-negative integer distance
   between two node labels.

Example
^^^^^^^

.. code-block:: python

    #!/usr/bin/env python

    import zss

    try:
        from editdist import distance as strdist
    except ImportError:
        def strdist(a, b):
            if a == b:
                return 0
            else:
                return 1

    def weird_dist(A, B):
        return 10*strdist(A, B)

    class WeirdNode(object):

        def __init__(self, label):
            self.my_label = label
            self.my_children = list()

        @staticmethod
        def get_children(node):
            return node.my_children

        @staticmethod
        def get_label(node):
            return node.my_label

        def addkid(self, node, before=False):
            if before:  self.my_children.insert(0, node)
            else:   self.my_children.append(node)
            return self

    A = (
    WeirdNode("f")
        .addkid(WeirdNode("d")
        .addkid(WeirdNode("a"))
        .addkid(WeirdNode("c")
            .addkid(WeirdNode("b"))
        )
        )
        .addkid(WeirdNode("e"))
    )
    B = (
    WeirdNode("f")
        .addkid(WeirdNode("c")
        .addkid(WeirdNode("d")
            .addkid(WeirdNode("a"))
            .addkid(WeirdNode("b"))
        )
        )
        .addkid(WeirdNode("e"))
    )

    dist = zss.simple_distance(
        A, B, WeirdNode.get_children, WeirdNode.get_label, weird_dist)

    print dist
    assert dist == 20


References
----------

The algorithm used by ``zss`` is taken directly from the original paper by
Zhang and Shasha. If you would like to discuss the paper, or the the tree edit
distance problem (we have implemented a few other algorithms as well) please
email the authors.

`approxlib `_ by Dr. Nikolaus Augstent
contains a good Java implementation of Zhang-Shasha as well as a number of
other useful tree distance algorithms.

`Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18:1245–1262, 1989.`__ (the original paper)

__ http://www.grantjenks.com/wiki/_media/ideas:simple_fast_algorithms_for_the_editing_distance_between_tree_and_related_problems.pdf

`Slide deck overview of Zhang-Shasha `_

`Another paper describing Zhang-Shasha `_

Owner

  • Name: Tim Henderson
  • Login: timtadh
  • Kind: user
  • Company: @google

Hacker / Software Engineering and Data Mining Researcher from CWRU. Now at Google. keybase.io/tadh

GitHub Events

Total
  • Watch event: 22
  • Fork event: 3
Last Year
  • Watch event: 22
  • Fork event: 3

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 168
  • Total Committers: 13
  • Avg Commits per committer: 12.923
  • Development Distribution Score (DDS): 0.619
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Tim Henderson t****h@g****m 64
Stephen Johnson s****5@c****u 50
Steve Johnson s****e@g****m 20
Stephen Johnson s****e@s****m 18
Erick Fonseca e****a@g****m 5
Daniel Radetsky d****y@g****m 3
Kyle E. Mitchell k****e@k****m 2
mikeazo u****e@g****m 1
Néstor Arocha Rodríguez n****o@g****m 1
Daniel Omar Vergara Pérez d****a@g****m 1
Arne Skjærholt a****t@g****m 1
Xiaojun Bao a****o@t****m 1
Gustavo Sousa g****o@y****r 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 33
  • Total pull requests: 26
  • Average time to close issues: 2 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 26
  • Total pull request authors: 11
  • Average comments per issue: 2.33
  • Average comments per pull request: 2.5
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 6 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • irskep (6)
  • Daniel63656 (2)
  • yasodajayaweera (2)
  • dsinghvi (1)
  • smola (1)
  • petrbel (1)
  • aestebanlansaque (1)
  • borgr (1)
  • SamWilsn (1)
  • any333 (1)
  • davide1995 (1)
  • guludo (1)
  • mahsa7823 (1)
  • adamalfian (1)
  • afcruzs (1)
Pull Request Authors
  • irskep (8)
  • timtadh (7)
  • erickrf (2)
  • dradetsky (2)
  • kemitchell (1)
  • danvergara (1)
  • arnsholt (1)
  • nesaro (1)
  • mikeazo (1)
  • agnesbao (1)
  • guludo (1)
Top Labels
Issue Labels
help wanted (3) good first issue (2) bug (2) feature request (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 325,420 last-month
  • Total docker downloads: 98
  • Total dependent packages: 3
  • Total dependent repositories: 35
  • Total versions: 6
  • Total maintainers: 1
pypi.org: zss

Tree edit distance using the Zhang Shasha algorithm

  • Versions: 6
  • Dependent Packages: 3
  • Dependent Repositories: 35
  • Downloads: 325,420 Last month
  • Docker Downloads: 98
Rankings
Downloads: 2.0%
Dependent packages count: 2.3%
Dependent repos count: 2.5%
Average: 3.2%
Stargazers count: 3.2%
Docker downloads count: 3.6%
Forks count: 5.4%
Maintainers (1)
Last synced: 8 months ago

Dependencies

dev-requirements.txt pypi
  • pytest * development
  • tox * development
docs/requirements.txt pypi
  • smartypants ==1.8.2
  • sphinx-better-theme ==0.1.4
requirements.txt pypi
  • editdistance *
  • nose >=1.1.2
  • numpy >=1.7
  • six >=1.8
setup.py pypi
  • six *