alpino-query

Library for creating xpath queries by example for Alpino treebanks

https://github.com/centrefordigitalhumanities/alpino-query

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CentreForDigitalHumanities
  • License: other
  • Language: Python
  • Default Branch: develop
  • Size: 119 KB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 0
  • Open Issues: 2
  • Releases: 3
Created over 4 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Alpino Query

```bash
pip install alpino-query
```

When running locally without installing, use python -m alpino_query instead of alpino-query.

Parse

Parse a (tokenized) sentence using the Alpino instance running on gretel.hum.uu.nl.

For example:

```bash
alpino-query parse Dit is een voorbeeldzin.
```

It also works when the sentence is passed as a single argument.

```bash
alpino-query parse "Dit is een voorbeeldzin ."
```

Mark

Mark which part of the treebank should be selected for filtering. It has three inputs:

  1. Lassy/Alpino XML
  2. the tokens of the sentence
  3. for each token specify the properties which should be marked

For example:

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
```

It is also possible to mark multiple properties for a token by separating them with a comma. Each property can also be negated by prefixing it with a minus sign, as with -word in the example below. Negated properties are marked as 'exclude' in the tree.

```bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
```
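
The attribute string format described above can be illustrated with a small sketch. The helper name parse_attribute_spec and the include/exclude dictionary layout are assumptions made for illustration only; this is not how alpino-query itself represents marked properties.

```python
def parse_attribute_spec(spec):
    """Split a per-token attribute string into include/exclude sets.

    Hypothetical helper: tokens are separated by spaces, properties by
    commas, and a leading "-" marks a property as negated.
    """
    parsed = []
    for token_spec in spec.split():
        include, exclude = set(), set()
        for prop in token_spec.split(","):
            if prop.startswith("-"):
                exclude.add(prop[1:])
            elif prop:
                include.add(prop)
        parsed.append({"include": include, "exclude": exclude})
    return parsed


spec = parse_attribute_spec("pos pos,-word,rel pos pos pos")
for index, token_properties in enumerate(spec):
    print(index, token_properties)
```

Each entry corresponds to one token of the sentence, so the list should have as many entries as there are tokens.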

Subtree

Generates a subtree containing only the marked properties. It will also contain additional attributes to mark that properties should be excluded and/or case sensitive.

The second argument can be empty, cat, rel or both (i.e. catrel or cat,rel). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.

```bash
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
```
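
As a rough illustration of what removing attributes from the top node means, the sketch below uses the standard library's xml.etree.ElementTree on a hand-written tree. The function names and the sample XML are assumptions; the actual subtree generation in alpino-query does more than this (for example, it also handles the exclusion and case-sensitivity attributes mentioned above).

```python
import xml.etree.ElementTree as ET


def attributes_to_remove(argument):
    """Interpret '', 'cat', 'rel', 'catrel' or 'cat,rel' as attribute names."""
    return [name for name in ("cat", "rel") if name in argument]


def strip_top_node(subtree, argument):
    """Remove the requested attributes from the top node only (hypothetical helper)."""
    if len(subtree) == 0:
        # Only one node is left, so the argument is ignored.
        return subtree
    for name in attributes_to_remove(argument):
        subtree.attrib.pop(name, None)
    return subtree


top = ET.fromstring(
    '<node cat="smain" rel="--">'
    '<node rel="su" pos="pron"/><node rel="hd" pos="verb"/>'
    '</node>'
)
strip_top_node(top, "cat")
print(ET.tostring(top, encoding="unicode"))
```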

XPath

Generates an XPath to query a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.

```bash
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
```
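
To make the notion of order sensitivity concrete, here is a minimal sketch in plain Python rather than XPath. It assumes each child node carries a begin attribute giving its position in the sentence; the matches function and the sample tree are hypothetical and only mimic the idea, not the XPath that alpino-query generates.

```python
import xml.etree.ElementTree as ET


def matches(parent, wanted_pos, order_sensitive):
    """Check whether the children of parent carry the wanted pos values."""
    children = sorted(parent, key=lambda node: int(node.get("begin")))
    found = [node.get("pos") for node in children]
    if not order_sensitive:
        # Any order is fine as long as every wanted value occurs.
        return all(pos in found for pos in wanted_pos)
    # Order-sensitive: the wanted values must occur as a subsequence.
    remaining = iter(found)
    return all(pos in remaining for pos in wanted_pos)


sentence = ET.fromstring(
    '<node cat="smain">'
    '<node begin="0" pos="pron"/><node begin="1" pos="verb"/>'
    '<node begin="2" pos="det"/><node begin="3" pos="noun"/>'
    '</node>'
)
print(matches(sentence, ["noun", "verb"], order_sensitive=False))
print(matches(sentence, ["noun", "verb"], order_sensitive=True))
```

An order-insensitive query accepts the reversed pair, while the order-sensitive check rejects it because no verb follows the noun.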

Using as Module

```python
from alpino_query import AlpinoQuery

tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]

query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml)  # query.marked contains the lxml Element

query.generate_subtree(["rel", "cat"])
print(query.subtree_xml)  # query.subtree contains the lxml Element

query.generate_xpath(False)  # True to make order sensitive
print(query.xpath)
```

Considerations

Exclusive

When querying a node this could be exclusive in multiple ways. For example:

  • a node should not be a noun: node[@pos!="noun"]
  • it should not have a node which is a noun: not(node[@pos="noun"])

The first expression requires that a node exists, whereas the second also holds if there is no node at all. When a token has only exclusive properties (e.g. not a noun), a query of the second form is generated. If a token has both inclusive and exclusive properties, a query of the first form is generated.
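
The difference between the two forms can be checked with a small sketch that mirrors their semantics in plain Python on hand-written trees. The two helper functions are illustrative stand-ins for the XPath predicates, not part of alpino-query.

```python
import xml.etree.ElementTree as ET


def has_non_noun_child(parent):
    """Analogue of node[@pos!="noun"]: a child exists whose pos is not noun."""
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent):
    """Analogue of not(node[@pos="noun"]): no child is a noun."""
    return not any(child.get("pos") == "noun" for child in parent.findall("node"))


with_verb = ET.fromstring('<node><node pos="verb"/></node>')
with_noun = ET.fromstring('<node><node pos="noun"/></node>')
childless = ET.fromstring('<node/>')

for name, tree in [("verb", with_verb), ("noun", with_noun), ("none", childless)]:
    print(name, has_non_noun_child(tree), has_no_noun_child(tree))
```

The two checks agree for the first two trees and differ only for the childless node, which is exactly the case the paragraph above describes.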

Relations

@cat and @rel are always preserved for nodes which have children. They are only dropped when all the children are removed, which is done by specifying the na property for the child tokens.

Upload to PyPI

```bash
pip install twine
python setup.py sdist
twine upload dist/*
```

Credits

This was originally part of the GrETEL codebase and is (still) used by its Example Based Search functionality.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0). See the LICENSE file for license rights and limitations.

Owner

  • Name: Centre for Digital Humanities
  • Login: CentreForDigitalHumanities
  • Kind: organization
  • Email: cdh@uu.nl
  • Location: Netherlands

Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: alpino-query
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: >-
      Research Software Lab, Centre for Digital Humanities,
      Utrecht University
    website: >-
      https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/
    city: Utrecht
    country: NL
identifiers:
  - type: doi
    value: 10.5281/zenodo.10418666
repository-code: 'https://github.com/CentreForDigitalHumanities/alpino-query'
license: Attribution-NonCommercial-ShareAlike 4.0 International

GitHub Events

Total
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Member event: 1
  • Push event: 1
Last Year
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Member event: 1
  • Push event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 104
  • Total Committers: 9
  • Avg Commits per committer: 11.556
  • Development Distribution Score (DDS): 0.471
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sheean Spoel s****l@u****l 55
Bram Vanroy b****y@h****m 27
LiesbethA L****A 11
Ben Bonfil b****l@u****l 3
Martijn van der Klis M****s@u****l 3
Gerson Foks g****s@g****m 2
Vincent Vandeghinste v****t@c****e 1
Jelte van Boheemen j****n@g****m 1
Bram.Vanroy@UGent.be B****y@U****e 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 1
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 months
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • RuiHe1999 (1)
  • bbonf (1)
Pull Request Authors
  • oktaal (3)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 145 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 14
  • Total maintainers: 1
pypi.org: alpino-query

Generating XPATH queries based on a Dutch Alpino syntax tree and user-specified token properties.

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 145 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 21.6%
Downloads: 22.2%
Average: 24.5%
Forks count: 29.8%
Stargazers count: 38.8%
Maintainers (1)
Last synced: 7 months ago

Dependencies

requirements.txt pypi
  • certifi ==2022.6.15
  • charset-normalizer ==2.1.0
  • idna ==3.3
  • lxml ==4.9.1
  • requests ==2.28.1
  • urllib3 ==1.26.11
setup.py pypi
  • lxml *
  • requests *
.github/workflows/python-package.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite