semantic-tagger
A tool to add semantic tags to your text data
https://github.com/australian-text-analytics-platform/semantic-tagger
Science Score: 52.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization australian-text-analytics-platform has institutional domain (atap.edu.au)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Australian-Text-Analytics-Platform
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 70 MB
Statistics
- Stars: 7
- Watchers: 3
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Semantic-Tagger (English)
Abstract: With the Semantic Tagger, you can use the Python Multilingual UCREL Semantic Analysis System (PyMUSAS) to tag your text and extract token-level semantic tags from the tagged text. PyMUSAS is a rule-based token and Multi Word Expression (MWE) semantic tagger. The tagger can support any semantic tagset; however, the currently released tagset is for the UCREL Semantic Analysis System (USAS) semantic tags.
In addition to the USAS tags, you will also see the lemmas and Part-of-Speech (POS) tags in the text. For English, the tagger also identifies and tags Multi Word Expressions (MWEs), i.e., expressions formed by two or more words that behave as a single unit, such as 'South Australia'.
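To make the token-plus-MWE idea concrete, here is a toy rule-based lookup in plain Python. It is not PyMUSAS's implementation and does not use its API: the two lexicons, the `tag` function and the tag assignments are hypothetical, chosen only to show how a longest-match MWE rule lets 'South Australia' be tagged as one unit before the tagger falls back to single-token lookup. Tokens missing from the toy lexicon receive Z99, the USAS tag for unmatched words.

```python
from typing import Dict, List, Tuple

# Hypothetical single-word lexicon: lowercase token -> candidate USAS tags.
# These tiny lexicons are illustrative, not the real USAS lexicons.
SINGLE_LEXICON: Dict[str, List[str]] = {
    "south": ["M6"],
    "australia": ["Z2"],
}

# Hypothetical MWE lexicon: tuple of lowercase tokens -> candidate USAS tags
MWE_LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("south", "australia"): ["Z2"],
}

UNMATCHED = ["Z99"]  # USAS tag for words the lexicons do not cover


def tag(tokens: List[str], max_mwe_len: int = 3) -> List[Tuple[str, List[str], bool]]:
    """Return (token, tags, is_mwe) per token, trying the longest MWE match first."""
    lowered = [t.lower() for t in tokens]
    result: List[Tuple[str, List[str], bool]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try MWEs starting at position i, from the longest length down to 2
        for n in range(min(max_mwe_len, len(tokens) - i), 1, -1):
            key = tuple(lowered[i:i + n])
            if key in MWE_LEXICON:
                # Every token in the MWE shares the MWE's tags
                for tok in tokens[i:i + n]:
                    result.append((tok, MWE_LEXICON[key], True))
                i += n
                matched = True
                break
        if not matched:
            # Fall back to a single-token lookup
            result.append((tokens[i], SINGLE_LEXICON.get(lowered[i], UNMATCHED), False))
            i += 1
    return result


for token, tags, is_mwe in tag(["I", "love", "South", "Australia"]):
    print(token, tags, is_mwe)
# I ['Z99'] False
# love ['Z99'] False
# South ['Z2'] True
# Australia ['Z2'] True
```

In PyMUSAS itself, tagging runs as a component of a spaCy pipeline with much larger USAS lexicons and additional rules; the PyMUSAS usage guides describe the real API.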
Semantic Tagger User Guide
For instructions on how to use the Semantic Tagger, please refer to the Semantic Tagger User Guide.
Setup
This tool is designed to require minimal setup. It runs in the cloud, and any package dependencies are installed for you automatically. To launch and use the tool, click the launch icon in the repository README.
Note: CILogon authentication is required. You can use your institutional, Google or Microsoft account to log in. If you have trouble authenticating, please refer to the CILogon troubleshooting guide.
If you do not have access to any of the above accounts, you can use the Binder link in the repository README to access the tool (this is a free Binder version, limited to 2 GB of memory).
It may take a few minutes for Binder to launch the notebook and install the dependencies for the tool. Please be patient.
Languages
This Semantic Tagger supports English. For Chinese, Italian and Spanish, please visit this page; for other languages, refer to the PyMUSAS GitHub page.
Load the data
The tagger lets you tag text data in one or more text files. Alternatively, you can tag the text in a text column of an Excel spreadsheet.
Note: If you have a large number of text files (more than 10 MB in total), we suggest compressing (zipping) them and uploading the zip file instead. If you need help compressing your files, please check the user guide.
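The notebook's own upload code is not reproduced here. As a hypothetical sketch of what handling a zipped upload involves, the function below (`read_texts_from_zip`, an illustrative name) pulls every .txt file out of an in-memory zip archive using only the standard library, assuming the files are UTF-8 encoded.

```python
import io
import zipfile
from typing import Dict


def read_texts_from_zip(zip_bytes: bytes) -> Dict[str, str]:
    """Return {file name: text} for every .txt file inside an in-memory zip archive."""
    texts: Dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything that is not a .txt file
            if name.endswith("/") or not name.lower().endswith(".txt"):
                continue
            # Assumes UTF-8 text; undecodable bytes are replaced rather than raising
            texts[name] = archive.read(name).decode("utf-8", errors="replace")
    return texts


# Build a small zip in memory to demonstrate
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("corpus/a.txt", "I love South Australia.")
    archive.writestr("corpus/notes.csv", "not a text file")
print(read_texts_from_zip(buffer.getvalue()))  # {'corpus/a.txt': 'I love South Australia.'}
```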
Add Semantic Tags
Once your texts have been uploaded, you can add semantic tags to them and analyse them using the tools included in the notebook. You can display the semantic tags, POS tags and MWE indicator for each token in a particular text, and compare them side by side with those from another text.
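The side-by-side view can be approximated in plain text. This is a hypothetical sketch, not the notebook's display code: it assumes each tagged text is a list of (token, tag) pairs and uses `itertools.zip_longest` so that the shorter text is padded with blanks.

```python
from itertools import zip_longest
from typing import List, Tuple

# Hypothetical row shape: (token, semantic tag) for one token
Row = Tuple[str, str]


def side_by_side(left: List[Row], right: List[Row], width: int = 20) -> List[str]:
    """Align two tagged texts token by token; the shorter text is padded with blanks."""
    lines = []
    for (l_tok, l_tag), (r_tok, r_tag) in zip_longest(left, right, fillvalue=("", "")):
        left_cell = f"{l_tok} [{l_tag}]" if l_tok else ""
        right_cell = f"{r_tok} [{r_tag}]" if r_tok else ""
        # Left-justify the first column so the second column lines up
        lines.append(f"{left_cell:<{width}}{right_cell}")
    return lines


text_a = [("South", "Z2"), ("Australia", "Z2")]
text_b = [("Sydney", "Z2")]
for line in side_by_side(text_a, text_b):
    print(line)
```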

You can also compare top-n statistics between texts (or across all texts in the corpus) using the charts in the notebook.
Finally, you can save the tagged texts to a comma-separated values (CSV) file, or to a zip of pseudo-XML (.txt) tagged text files, and download it to your local computer.
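Top-n statistics of this kind reduce to frequency counts. A minimal sketch, assuming each text has already been reduced to one semantic tag per token (the data and the `top_n_tags` name are illustrative), uses `collections.Counter.most_common` both per text and over the whole corpus:

```python
from collections import Counter
from typing import Dict, List, Tuple

TagCounts = List[Tuple[str, int]]


def top_n_tags(tagged_texts: Dict[str, List[str]], n: int = 2) -> Tuple[Dict[str, TagCounts], TagCounts]:
    """Return the n most frequent tags for each text and for the whole corpus."""
    per_text = {name: Counter(tags).most_common(n) for name, tags in tagged_texts.items()}
    corpus = Counter(tag for tags in tagged_texts.values() for tag in tags).most_common(n)
    return per_text, corpus


# Illustrative data: one USAS tag per token, per text
texts = {
    "text1.txt": ["Z2", "Z2", "Z99", "M6"],
    "text2.txt": ["Z2", "A1.1.1", "A1.1.1", "A1.1.1"],
}
per_text, corpus = top_n_tags(texts)
print(per_text["text2.txt"])  # [('A1.1.1', 3), ('Z2', 1)]
print(corpus)                 # [('Z2', 3), ('A1.1.1', 3)]
```

`most_common` orders tied counts by first occurrence, which is why Z2 precedes A1.1.1 in the corpus result.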
Figure: CSV output format
Figure: pseudo-XML output format
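The exact columns of the notebook's CSV export are not listed here, so the sketch below assumes a hypothetical layout (text name, token, lemma, POS, USAS tags, MWE indicator) and writes it with the standard `csv` module; the `TaggedToken` and `tagged_tokens_to_csv` names are illustrative.

```python
import csv
import io
from typing import Iterable, NamedTuple


class TaggedToken(NamedTuple):
    """One output row; these column names are hypothetical."""
    text_name: str
    token: str
    lemma: str
    pos: str
    usas_tags: str
    mwe: str


def tagged_tokens_to_csv(rows: Iterable[TaggedToken]) -> str:
    """Serialise tagged tokens to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(TaggedToken._fields)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    TaggedToken("text1.txt", "South", "South", "PROPN", "Z2", "yes"),
    TaggedToken("text1.txt", "Australia", "Australia", "PROPN", "Z2", "yes"),
]
csv_text = tagged_tokens_to_csv(rows)
print(csv_text)
```

To save the result, write `csv_text` to a file opened with `newline=""` so the line endings produced by `csv.writer` are kept unchanged.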
Reference
This code has been adapted from the PyMUSAS GitHub page and modified to run in a Jupyter Notebook. PyMUSAS is an open-source project created and funded by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University. For more information about PyMUSAS, please visit the Usage Guides page.
Citation
If you find the Semantic Tagger useful in your research, please cite the following:
Jufri, Sony & Sun, Chao (2022). Semantic Tagger. v1.0. Australian Text Analytics Platform. Software. https://github.com/Australian-Text-Analytics-Platform/semantic-tagger
Owner
- Name: Australian-Text-Analytics-Platform
- Login: Australian-Text-Analytics-Platform
- Kind: organization
- Website: https://atap.edu.au
- Repositories: 9
- Profile: https://github.com/Australian-Text-Analytics-Platform
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Semantic Tagger
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Sony
    family-names: Jufri
    email: sony.jufri@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Chao
    family-names: Sun
    email: chao.sun@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
repository-code: >-
  https://github.com/Australian-Text-Analytics-Platform/semantic-tagger
abstract: >-
  The Semantic Tagger is an automatic semantic analysis tool
  that you can use to tag your text so you can extract token
  level semantic tags from the tagged text. The tagger can
  support any semantic tagset, however the currently
  released tagset is for the UCREL Semantic Analysis System
  (USAS) semantic tags. In addition to the USAS tags, you
  will also see the lemmas and Part-of-Speech (POS) tags in
  the text. For English, the tagger also identifies and tags
  Multi Word Expressions (MWE), i.e., expressions formed by
  two or more words that behave like a unit such as 'South
  Australia'.
keywords:
  - semantic tag
  - semantic
  - semantic tagger
license: Apache-2.0
version: '1.0'
date-released: '2022-11-22'
Dependencies
- pyexcelerate *