alpino-query
Library for creating xpath queries by example for Alpino treebanks
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: CentreForDigitalHumanities
- License: other
- Language: Python
- Default Branch: develop
- Size: 119 KB
Statistics
- Stars: 1
- Watchers: 6
- Forks: 0
- Open Issues: 2
- Releases: 3
Metadata Files
README.md
Alpino Query
bash
pip install alpino-query
When running locally without installing, instead of alpino-query use python -m alpino_query.
Parse
Parse a (tokenized) sentence using the Alpino instance running on gretel.hum.uu.nl.
For example:
bash
alpino-query parse Dit is een voorbeeldzin.
It also works when the sentence is passed as a single argument.
bash
alpino-query parse "Dit is een voorbeeldzin ."
Mark
Mark which part of the treebank should be selected for filtering. It has three inputs:
- Lassy/Alpino XML
- the tokens of the sentence
- for each token specify the properties which should be marked
For example:
bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos pos pos pos"
It is also possible to mark multiple properties for a token by separating them with a comma. Each property can also be negated by prefixing it with a minus sign; negated properties are marked as 'exclude' in the tree.
bash
alpino-query mark "$(<tests/data/001.xml)" "Dit is een voorbeeldzin ." "pos pos,-word,rel pos pos pos"
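The property notation used above can be read as one whitespace-separated entry per token, where each entry is a comma-separated list and a leading `-` marks a negated property. The following is a minimal sketch of that reading; `parse_properties` is a hypothetical helper written for illustration and is not part of alpino-query.

```python
def parse_properties(spec):
    """Split a spec such as "pos pos,-word,rel" into per-token include/exclude lists.

    Hypothetical helper: it illustrates the notation only and is not
    how alpino-query itself parses its arguments.
    """
    parsed = []
    for entry in spec.split():
        include, exclude = [], []
        for prop in entry.split(","):
            if prop.startswith("-"):
                exclude.append(prop[1:])  # negated property, marked as 'exclude'
            else:
                include.append(prop)
        parsed.append({"include": include, "exclude": exclude})
    return parsed


tokens = "Dit is een voorbeeldzin .".split()
properties = parse_properties("pos pos,-word,rel pos pos pos")
for token, props in zip(tokens, properties):
    print(token, props)
```

For the second token ("is"), this yields `pos` and `rel` as included properties and `word` as an excluded one.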
Subtree
Generates a subtree containing only the marked properties. The subtree also carries additional attributes that indicate which properties should be excluded and/or matched case-sensitively.
The second argument can be empty, cat, rel or both (i.e. catrel or cat,rel). This indicates which attributes should be removed from the top node. When only one node is left in the subtree, this argument is ignored.
bash
alpino-query subtree "$(<tests/data/001.marked.xml)" cat
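The effect of the second argument on the top node can be sketched in a few lines. This is an illustration of the rule described above, using the standard library's `xml.etree.ElementTree`; `strip_top_attributes` is a hypothetical function and the XML snippets are made up, so it should not be taken as the library's implementation.

```python
import xml.etree.ElementTree as ET


def strip_top_attributes(subtree, remove):
    """Drop "cat" and/or "rel" from the top node, as named in `remove`.

    `remove` may be "", "cat", "rel", "catrel" or "cat,rel".
    Hypothetical sketch, not alpino-query's own code.
    """
    if len(subtree) == 0:
        # Only one node is left in the subtree: the argument is ignored.
        return subtree
    for name in ("cat", "rel"):
        if name in remove:
            subtree.attrib.pop(name, None)
    return subtree


with_child = ET.fromstring('<node cat="smain" rel="--"><node pos="verb" rel="hd"/></node>')
print(strip_top_attributes(with_child, "cat").attrib)

single = ET.fromstring('<node cat="np" rel="su"/>')
print(strip_top_attributes(single, "catrel").attrib)
```

In the first case only `rel` remains on the top node; in the second case both attributes are kept because the subtree consists of a single node.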
XPath
Generates an XPath for querying a treebank from the generated subtree. The second argument indicates whether the generated query should be order-sensitive.
bash
alpino-query xpath "$(<tests/data/001.subtree.xml)" 0
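To give a feel for how a subtree can be turned into a query, here is a minimal sketch that builds an order-insensitive XPath from nested `node` elements. It is an assumption-laden illustration, not the algorithm used by alpino-query: `subtree_to_xpath` is a hypothetical function, the input XML is made up, and exclusion, case sensitivity, word order and quote escaping are all ignored.

```python
import xml.etree.ElementTree as ET


def subtree_to_xpath(node):
    """Turn a subtree into a nested XPath predicate (illustrative only)."""
    # One condition per attribute, sorted for a stable result.
    conditions = [f'@{name}="{value}"' for name, value in sorted(node.attrib.items())]
    # One nested condition per child node.
    conditions += [subtree_to_xpath(child) for child in node.findall("node")]
    if not conditions:
        return "node"
    return "node[" + " and ".join(conditions) + "]"


subtree = ET.fromstring(
    '<node cat="np"><node pos="det" rel="det"/><node pos="noun" rel="hd"/></node>'
)
xpath = "//" + subtree_to_xpath(subtree)
print(xpath)
# //node[@cat="np" and node[@pos="det" and @rel="det"] and node[@pos="noun" and @rel="hd"]]
```

Because every child becomes an independent predicate joined with `and`, the resulting query does not constrain the order of the children.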
Using as Module
```python
from alpino_query import AlpinoQuery

tokens = ["Dit", "is", "een", "voorbeeldzin", "."]
attributes = ["pos", "pos,-word,rel", "pos", "pos", "pos"]

query = AlpinoQuery()
alpino_xml = query.parse(tokens)
query.mark(alpino_xml, tokens, attributes)
print(query.marked_xml)  # query.marked contains the lxml Element

query.generate_subtree(["rel", "cat"])
print(query.subtree_xml)  # query.subtree contains the lxml Element

query.generate_xpath(False)  # True to make order sensitive
print(query.xpath)
```
Considerations
Exclusive
When querying a node, an exclusive condition can be meant in more than one way. For example:
- a node should not be a noun:
  node[@pos!="noun"]
- it should not have a node which is a noun:
  not(node[@pos="noun"])
The first form requires that a node exists, whereas the second also holds if there is no node at all. When a token has only exclusive properties (e.g. not a noun), a query of the second form is generated; when a token has both inclusive and exclusive properties, a query of the first form is generated.
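The difference between the two forms can be checked on small made-up trees. The sketch below re-implements both conditions in plain Python on top of `xml.etree.ElementTree` (whose own XPath subset does not support `not(...)`); the two function names are hypothetical and only mirror the XPath expressions above.

```python
import xml.etree.ElementTree as ET


def has_non_noun_child(parent):
    # Mirrors node[@pos!="noun"]: some child has a pos attribute that is not "noun".
    return any(
        child.get("pos") is not None and child.get("pos") != "noun"
        for child in parent.findall("node")
    )


def has_no_noun_child(parent):
    # Mirrors not(node[@pos="noun"]): no child has pos="noun".
    return not any(child.get("pos") == "noun" for child in parent.findall("node"))


with_verb = ET.fromstring('<node cat="np"><node pos="verb"/></node>')
with_noun = ET.fromstring('<node cat="np"><node pos="noun"/></node>')
childless = ET.fromstring('<node cat="np"/>')

for name, tree in [("verb child", with_verb), ("noun child", with_noun), ("no child", childless)]:
    print(name, has_non_noun_child(tree), has_no_noun_child(tree))
# verb child True True
# noun child False False
# no child False True
```

Only the childless tree separates the two forms: the first condition fails because there is no node to test, while the second holds.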
Relations
@cat and @rel are always preserved for nodes which have children. They are dropped only when all the children are removed by specifying the na property for the child tokens.
Upload to PyPI
bash
pip install twine
python setup.py sdist
twine upload dist/*
Credits
This was originally part of the GrETEL codebase and is (still) used by its Example Based Search functionality.
- Liesbeth Augustinus and Vincent Vandeghinste: concept and initial implementation
- Bram Vanroy: GrETEL 3 improvements and design
- Sheean Spoel: rewrite in Python, move to a separate library, and some improvements
License
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0). See the LICENSE file for license rights and limitations.
Owner
- Name: Centre for Digital Humanities
- Login: CentreForDigitalHumanities
- Kind: organization
- Email: cdh@uu.nl
- Location: Netherlands
- Website: https://cdh.uu.nl/
- Repositories: 39
- Profile: https://github.com/CentreForDigitalHumanities
Interdisciplinary centre for research and education in computational and data-driven methods in the humanities.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: alpino-query
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - name: >-
      Research Software Lab, Centre for Digital Humanities,
      Utrecht University
    website: >-
      https://cdh.uu.nl/centre-for-digital-humanities/research-software-lab/
    city: Utrecht
    country: NL
identifiers:
  - type: doi
    value: 10.5281/zenodo.10418666
repository-code: 'https://github.com/CentreForDigitalHumanities/alpino-query'
license: Attribution-NonCommercial-ShareAlike 4.0 International
GitHub Events
Total
- Create event: 1
- Issues event: 1
- Release event: 1
- Watch event: 1
- Delete event: 1
- Member event: 1
- Push event: 1
Last Year
- Create event: 1
- Issues event: 1
- Release event: 1
- Watch event: 1
- Delete event: 1
- Member event: 1
- Push event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Sheean Spoel | s****l@u****l | 55 |
| Bram Vanroy | b****y@h****m | 27 |
| LiesbethA | L****A | 11 |
| Ben Bonfil | b****l@u****l | 3 |
| Martijn van der Klis | M****s@u****l | 3 |
| Gerson Foks | g****s@g****m | 2 |
| Vincent Vandeghinste | v****t@c****e | 1 |
| Jelte van Boheemen | j****n@g****m | 1 |
| Bram.Vanroy@UGent.be | B****y@U****e | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 1
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 3 months
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- RuiHe1999 (1)
- bbonf (1)
Pull Request Authors
- oktaal (3)
Packages
- Total packages: 1
- Total downloads: 145 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 14
- Total maintainers: 1
pypi.org: alpino-query
Generating XPATH queries based on a Dutch Alpino syntax tree and user-specified token properties.
- Homepage: https://github.com/CentreForDigitalHumanities/alpino-query
- Documentation: https://alpino-query.readthedocs.io/
- License: CC BY-NC-SA 4.0
- Latest release: 2.1.12 (published 8 months ago)
Dependencies
- certifi ==2022.6.15
- charset-normalizer ==2.1.0
- idna ==3.3
- lxml ==4.9.1
- requests ==2.28.1
- urllib3 ==1.26.11
- lxml *
- requests *
- actions/checkout v2 composite
- actions/setup-python v2 composite