semantic-tagger

A tool to add semantic tags to your text data

https://github.com/australian-text-analytics-platform/semantic-tagger

Science Score: 52.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization australian-text-analytics-platform has institutional domain (atap.edu.au)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: Australian-Text-Analytics-Platform
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 70 MB
Statistics
  • Stars: 7
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Semantic-Tagger (English)

Abstract: With the Semantic Tagger, you can use the Python Multilingual Ucrel Semantic Analysis System (PyMUSAS) to tag your text and then extract token-level semantic tags from the tagged text. PyMUSAS is a rule-based token and Multi Word Expression (MWE) semantic tagger. The tagger can support any semantic tagset; however, the currently released tagset is for the UCREL Semantic Analysis System (USAS) semantic tags.

In addition to the USAS tags, you will also see the lemma and Part-of-Speech (POS) tag for each token in the text. For English, the tagger also identifies and tags Multi Word Expressions (MWE), i.e., expressions formed by two or more words that behave like a unit, such as 'South Australia'.
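To make the idea of rule-based token and MWE tagging concrete, here is a minimal toy sketch. It is not PyMUSAS and does not use its API: the two lexicons, the `tag_tokens` helper and the tag assignments are hypothetical, chosen only to show how a two-word expression such as 'South Australia' can receive one shared tag before single-word lookup is attempted. The codes follow USAS naming (for example `Z2` for geographical names and `Z99` for unmatched tokens).

```python
# Toy illustration only: NOT the PyMUSAS implementation.
# Both lexicons below are made up for this example.
SINGLE_WORD_LEXICON = {"she": "Z8", "visited": "M1"}
MWE_LEXICON = {("south", "australia"): "Z2"}
UNMATCHED = "Z99"  # USAS code for tokens that match no rule


def tag_tokens(tokens):
    """Return (token, tag, is_mwe) tuples, trying two-word MWEs first."""
    tagged = []
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if len(pair) == 2 and pair in MWE_LEXICON:
            # Both words of the expression share the same tag.
            tag = MWE_LEXICON[pair]
            tagged.extend((tok, tag, True) for tok in tokens[i:i + 2])
            i += 2
            continue
        tag = SINGLE_WORD_LEXICON.get(tokens[i].lower(), UNMATCHED)
        tagged.append((tokens[i], tag, False))
        i += 1
    return tagged


result = tag_tokens(["She", "visited", "South", "Australia"])
for token, tag, is_mwe in result:
    print(f"{token}\t{tag}\t{'MWE' if is_mwe else ''}")
```

The real tagger works on spaCy tokens with lemmas and POS tags and uses much larger lexicons, so treat this only as a picture of the lookup order.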

Semantic Tagger User Guide

For instructions on how to use the Semantic Tagger, please refer to the Semantic Tagger User Guide.

Setup

This tool has been designed for use with minimal setup. You can run it in the cloud, and all the packages it depends on will be installed for you automatically. To launch and use the tool, just click the icon below.

Binder

Note: CILogon authentication is required. You can use your institutional, Google or Microsoft account to log in. If you have trouble authenticating, please refer to the CILogon troubleshooting guide.

If you do not have access to any of the above accounts, you can use the link below to access the tool (this is a free Binder version, limited to 2 GB of memory).

Binder

It may take a few minutes for Binder to launch the notebook and install the dependencies for the tool. Please be patient.

Languages

This Semantic Tagger supports English. For Chinese, Italian and Spanish, please visit this page, or refer to the PyMUSAS GitHub page for other languages.

Load the data

This tagger allows you to tag text data in one or more text files. Alternatively, you can tag the text in a text column of an Excel spreadsheet.

Note: If you have a large number of text files (more than 10 MB in total), we suggest you compress (zip) them and upload the zip file instead. If you need assistance on how to compress your files, please check the user guide.
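If you want to check a zip file before uploading it, the sketch below reads every `.txt` member of an archive into a dictionary using only the Python standard library. It is an illustration, not the notebook's own loading code: the `read_texts_from_zip` helper and the sample file names are hypothetical, and it assumes the files are UTF-8 encoded.

```python
import io
import zipfile


def read_texts_from_zip(zip_bytes):
    """Return {file name: text} for every .txt member of a zip archive."""
    texts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".txt"):
                # Assumption: the text files are UTF-8 encoded.
                texts[name] = archive.read(name).decode("utf-8")
    return texts


# Build a small in-memory archive so the example runs without any files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("doc1.txt", "First text.")
    archive.writestr("doc2.txt", "Second text.")
    archive.writestr("notes.md", "Not a text file, so it is skipped.")

texts = read_texts_from_zip(buffer.getvalue())
print(sorted(texts))
```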

Add Semantic Tags

Once your texts have been uploaded, you can add semantic tags to them and analyse them using the tools included in the notebook. You can display the semantic tags, the POS tags and the MWE indicator for each token in a particular text, and compare them side by side with those from another text.


You can also compare the top-n statistics between texts (or across all texts in the corpus) in the charts below.

Lastly, you can save the tagged texts as a comma-separated values (csv) file, or as a zip of pseudo-xml (.txt) tagged text files, and download it to your local computer.

csv format

pseudo-xml format
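The sketch below shows one way tagged tokens could be serialised in these two styles. The column names, the `to_csv` and `to_pseudo_xml` helpers and the exact pseudo-xml layout (each token wrapped in an element named after its tag) are assumptions made for illustration; the files produced by the notebook may be laid out differently.

```python
import csv
import io
from xml.sax.saxutils import escape

# Made-up rows; the column names are assumptions, not the tool's schema.
ROWS = [
    {"text_name": "doc1", "token": "South", "lemma": "South",
     "pos": "PROPN", "usas_tag": "Z2", "mwe": "yes"},
    {"text_name": "doc1", "token": "Australia", "lemma": "Australia",
     "pos": "PROPN", "usas_tag": "Z2", "mwe": "yes"},
]


def to_csv(rows):
    """Serialise rows to csv text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_pseudo_xml(rows):
    """Wrap each token in an element named after its tag (assumed layout)."""
    return " ".join(
        f'<{row["usas_tag"]}>{escape(row["token"])}</{row["usas_tag"]}>'
        for row in rows
    )


csv_text = to_csv(ROWS)
xml_text = to_pseudo_xml(ROWS)
print(csv_text)
print(xml_text)
```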

Reference

This code has been adapted from the PyMUSAS GitHub page and modified to run in a Jupyter notebook. PyMUSAS is an open-source project created and funded by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University. For more information about PyMUSAS, please visit the Usage Guides page.

Citation

If you find the Semantic Tagger useful in your research, please cite the following:

Jufri, Sony & Sun, Chao (2022). Semantic Tagger. v1.0. Australian Text Analytics Platform. Software. https://github.com/Australian-Text-Analytics-Platform/semantic-tagger

Owner

  • Name: Australian-Text-Analytics-Platform
  • Login: Australian-Text-Analytics-Platform
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Semantic Tagger
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Sony
    family-names: Jufri
    email: sony.jufri@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Chao
    family-names: Sun
    email: chao.sun@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
repository-code: >-
  https://github.com/Australian-Text-Analytics-Platform/semantic-tagger
abstract: >-
  The Semantic Tagger is an automatic semantic analysis tool
  that you can use to tag your text so you can extract token
  level semantic tags from the tagged text. The tagger can
  support any semantic tagset, however the currently
  released tagset is for the UCREL Semantic Analysis System
  (USAS) semantic tags. In addition to the USAS tags, you
  will also see the lemmas and Part-of-Speech (POS) tags in
  the text. For English, the tagger also identifies and tags
  Multi Word Expressions (MWE), i.e., expressions formed by
  two or more words that behave like a unit such as 'South
  Australia'.
keywords:
  - semantic tag
  - semantic
  - semantic tagger
license: Apache-2.0
version: '1.0'
date-released: '2022-11-22'


Dependencies

environment.yml pypi
  • pyexcelerate *