Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
○DOI references
-
○Academic publication links
-
✓Committers with academic emails
1 of 13 committers (7.7%) from academic institutions -
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.4%) to scientific vocabulary
Last synced: 8 months ago
·
JSON representation
Repository
Tree edit distance using the Zhang Shasha algorithm
Basic Info
Statistics
- Stars: 457
- Watchers: 12
- Forks: 65
- Open Issues: 6
- Releases: 0
Created over 15 years ago
· Last pushed over 5 years ago
Metadata Files
Readme
License
README.rst
Zhang-Shasha: Tree edit distance in Python
------------------------------------------
.. image:: https://travis-ci.org/timtadh/zhang-shasha.svg?branch=master
:target: https://travis-ci.org/timtadh/zhang-shasha
The ``zss`` module provides a function (``zss.distance``) that
computes the edit distance between the two given trees, as well as a small set
of utilities to make its use convenient.
If you'd like to learn more about how it works, see References below.
Brought to you by Tim Henderson (tim.tadh@gmail.com).
`Read the full documentation for more information.
`_
Installation
------------
You can get ``zss`` and its soft requirements (
``editdist`` and ``numpy`` >= 1.7) from PyPI::
pip install zss
Both modules are optional. ``editdist`` uses string edit distance to
compare node labels rather than a simple equal/not-equal check, and
``numpy`` significantly speeds up the library. The only reason version
1.7 of ``numpy`` is required is that earlier versions have trouble
installing on current versions of Mac OS X.
You can install ``zss`` from the source code without dependencies in the
usual way::
python setup.py install
If you want to build the docs, you'll need to install Sphinx >= 1.0.
Usage
-----
To compare the distance between two trees, you need:
1. A tree.
2. Another tree.
3. A node-node distance function. By default, ``zss`` compares the edit
distance between the nodes' labels. ``zss`` currently only knows how
to handle nodes with string labels.
4. Functions to let ``zss.distance`` traverse your tree.
Here is an example using the library's built-in default node structure and edit
distance function
.. code-block:: python
from zss import simple_distance, Node
A = (
Node("f")
.addkid(Node("a")
.addkid(Node("h"))
.addkid(Node("c")
.addkid(Node("l"))))
.addkid(Node("e"))
)
B = (
Node("f")
.addkid(Node("a")
.addkid(Node("d"))
.addkid(Node("c")
.addkid(Node("b"))))
.addkid(Node("e"))
)
assert simple_distance(A, B) == 2
Specifying Custom Tree Formats
------------------------------
Specifying custom tree formats and distance metrics is easy. The
``zss.simple_distance`` function takes 3 extra parameters besides the two tree
to compare:
1. ``get_children`` - a function to retrieve a list of children from a node.
2. ``get_label`` - a function to retrieve the label object from a node.
3. ``label_dist`` - a function to compute the non-negative integer distance
between two node labels.
Example
^^^^^^^
.. code-block:: python
#!/usr/bin/env python
import zss
try:
from editdist import distance as strdist
except ImportError:
def strdist(a, b):
if a == b:
return 0
else:
return 1
def weird_dist(A, B):
return 10*strdist(A, B)
class WeirdNode(object):
def __init__(self, label):
self.my_label = label
self.my_children = list()
@staticmethod
def get_children(node):
return node.my_children
@staticmethod
def get_label(node):
return node.my_label
def addkid(self, node, before=False):
if before: self.my_children.insert(0, node)
else: self.my_children.append(node)
return self
A = (
WeirdNode("f")
.addkid(WeirdNode("d")
.addkid(WeirdNode("a"))
.addkid(WeirdNode("c")
.addkid(WeirdNode("b"))
)
)
.addkid(WeirdNode("e"))
)
B = (
WeirdNode("f")
.addkid(WeirdNode("c")
.addkid(WeirdNode("d")
.addkid(WeirdNode("a"))
.addkid(WeirdNode("b"))
)
)
.addkid(WeirdNode("e"))
)
dist = zss.simple_distance(
A, B, WeirdNode.get_children, WeirdNode.get_label, weird_dist)
print dist
assert dist == 20
References
----------
The algorithm used by ``zss`` is taken directly from the original paper by
Zhang and Shasha. If you would like to discuss the paper, or the the tree edit
distance problem (we have implemented a few other algorithms as well) please
email the authors.
`approxlib `_ by Dr. Nikolaus Augstent
contains a good Java implementation of Zhang-Shasha as well as a number of
other useful tree distance algorithms.
`Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18:1245–1262, 1989.`__ (the original paper)
__ http://www.grantjenks.com/wiki/_media/ideas:simple_fast_algorithms_for_the_editing_distance_between_tree_and_related_problems.pdf
`Slide deck overview of Zhang-Shasha `_
`Another paper describing Zhang-Shasha `_
Owner
- Name: Tim Henderson
- Login: timtadh
- Kind: user
- Company: @google
- Website: http://hackthology.com
- Repositories: 86
- Profile: https://github.com/timtadh
Hacker / Software Engineering and Data Mining Researcher from CWRU. Now at Google. keybase.io/tadh
GitHub Events
Total
- Watch event: 22
- Fork event: 3
Last Year
- Watch event: 22
- Fork event: 3
Committers
Last synced: 11 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Tim Henderson | t****h@g****m | 64 |
| Stephen Johnson | s****5@c****u | 50 |
| Steve Johnson | s****e@g****m | 20 |
| Stephen Johnson | s****e@s****m | 18 |
| Erick Fonseca | e****a@g****m | 5 |
| Daniel Radetsky | d****y@g****m | 3 |
| Kyle E. Mitchell | k****e@k****m | 2 |
| mikeazo | u****e@g****m | 1 |
| Néstor Arocha Rodríguez | n****o@g****m | 1 |
| Daniel Omar Vergara Pérez | d****a@g****m | 1 |
| Arne Skjærholt | a****t@g****m | 1 |
| Xiaojun Bao | a****o@t****m | 1 |
| Gustavo Sousa | g****o@y****r | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 33
- Total pull requests: 26
- Average time to close issues: 2 months
- Average time to close pull requests: 11 days
- Total issue authors: 26
- Total pull request authors: 11
- Average comments per issue: 2.33
- Average comments per pull request: 2.5
- Merged pull requests: 21
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: 6 minutes
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- irskep (6)
- Daniel63656 (2)
- yasodajayaweera (2)
- dsinghvi (1)
- smola (1)
- petrbel (1)
- aestebanlansaque (1)
- borgr (1)
- SamWilsn (1)
- any333 (1)
- davide1995 (1)
- guludo (1)
- mahsa7823 (1)
- adamalfian (1)
- afcruzs (1)
Pull Request Authors
- irskep (8)
- timtadh (7)
- erickrf (2)
- dradetsky (2)
- kemitchell (1)
- danvergara (1)
- arnsholt (1)
- nesaro (1)
- mikeazo (1)
- agnesbao (1)
- guludo (1)
Top Labels
Issue Labels
help wanted (3)
good first issue (2)
bug (2)
feature request (1)
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- pypi 325,420 last-month
- Total docker downloads: 98
- Total dependent packages: 3
- Total dependent repositories: 35
- Total versions: 6
- Total maintainers: 1
pypi.org: zss
Tree edit distance using the Zhang Shasha algorithm
- Homepage: https://www.github.com/timtadh/zhang-shasha
- Documentation: https://zss.readthedocs.io/
- License: other
-
Latest release: 1.2.0
published about 8 years ago
Rankings
Downloads: 2.0%
Dependent packages count: 2.3%
Dependent repos count: 2.5%
Average: 3.2%
Stargazers count: 3.2%
Docker downloads count: 3.6%
Forks count: 5.4%
Maintainers (1)
Last synced:
8 months ago
Dependencies
dev-requirements.txt
pypi
- pytest * development
- tox * development
docs/requirements.txt
pypi
- smartypants ==1.8.2
- sphinx-better-theme ==0.1.4
requirements.txt
pypi
- editdistance *
- nose >=1.1.2
- numpy >=1.7
- six >=1.8
setup.py
pypi
- six *