pymusas

Python Multilingual Ucrel Semantic Analysis System

https://github.com/ucrel/pymusas

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
    Organization ucrel has institutional domain (ucrel.lancs.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary

Keywords

natural-language-processing nlp python spacy spacy-pipeline
Last synced: 6 months ago

Repository

Statistics
  • Stars: 31
  • Watchers: 8
  • Forks: 14
  • Open Issues: 20
  • Releases: 3
Created over 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License Citation Roadmap

README.md

PyMUSAS

The Python Multilingual Ucrel Semantic Analysis System (PyMUSAS) is a rule-based token and Multi Word Expression (MWE) semantic tagger. The tagger can support any semantic tagset; however, the tagset we have concentrated on, and released pre-configured spaCy components for, is the Ucrel Semantic Analysis System (USAS).
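
To make the idea of rule-based token and MWE tagging concrete, here is a minimal, self-contained sketch. It is not PyMUSAS's implementation or API: the lexicons, the tag labels (placeholders written in the style of USAS tags), and the `tag` function are all hypothetical and exist only for illustration. It assumes a longest-match-first strategy, where an MWE match takes priority over single-token lookups.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicons (assumptions for illustration only).
TOKEN_LEXICON: Dict[str, List[str]] = {
    "river": ["W3"],
    "visited": ["M1"],
}
MWE_LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("new", "york"): ["Z2"],
}
UNMATCHED_TAG = "Z99"  # placeholder label for tokens found in no lexicon


def tag(tokens: List[str]) -> List[Tuple[str, List[str]]]:
    """Tag each token, preferring the longest MWE match at each position."""
    lowered = [token.lower() for token in tokens]
    max_mwe_length = max((len(key) for key in MWE_LEXICON), default=1)
    tagged: List[Tuple[str, List[str]]] = []
    index = 0
    while index < len(tokens):
        matched = False
        longest = min(max_mwe_length, len(tokens) - index)
        # Try MWE candidates from longest to shortest (length >= 2).
        for length in range(longest, 1, -1):
            candidate = tuple(lowered[index:index + length])
            if candidate in MWE_LEXICON:
                mwe_tags = MWE_LEXICON[candidate]
                tagged.extend((tok, mwe_tags) for tok in tokens[index:index + length])
                index += length
                matched = True
                break
        if not matched:
            # Fall back to a single-token lookup.
            token_tags = TOKEN_LEXICON.get(lowered[index], [UNMATCHED_TAG])
            tagged.append((tokens[index], token_tags))
            index += 1
    return tagged


if __name__ == "__main__":
    for token, tags in tag(["I", "visited", "New", "York"]):
        print(token, tags)
```

In this sketch every token inside a matched MWE receives the tags of the whole expression, and anything unknown falls back to the placeholder unmatched tag.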


Documentation

  • 📚 Usage Guides - What the package is, tutorials, how-to guides, and explanations.
  • 🔎 API Reference - The docstrings of the library, with minimum working examples.
  • 🚀 Roadmap

Language support

PyMUSAS currently supports 10 languages with pre-configured spaCy components that can be downloaded. Each language has its own guide on how to tag text using PyMUSAS. The table below lists the supported languages, whether the model for each language supports Multi Word Expression (MWE) identification and tagging (all languages support token-level tagging by default), and the size of the model:

| Language (BCP 47 language code) | MWE Support | Size |
| --- | --- | --- |
| Mandarin Chinese (cmn) | ✔️ | 1.28MB |
| Welsh (cy) | ✔️ | 1.09MB |
| Spanish, Castilian (es) | ✔️ | 0.20MB |
| Finnish (fi) | ❌ | 0.63MB |
| French (fr) | ❌ | 0.08MB |
| Indonesian (id) | ❌ | 0.24MB |
| Italian (it) | ✔️ | 0.50MB |
| Dutch, Flemish (nl) | ❌ | 0.15MB |
| Portuguese (pt) | ✔️ | 0.27MB |
| English (en) | ✔️ | 0.88MB |

Install PyMUSAS

PyMUSAS can be installed on all operating systems and supports Python >= 3.7. To install it, run:

pip install pymusas
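
As a quick sanity check, the following sketch tests whether the running interpreter meets the stated minimum version and whether a package can be found. It uses only the standard library; the helper names `python_is_supported` and `package_is_installed` are hypothetical and not part of PyMUSAS.

```python
import importlib.util
import sys
from typing import Sequence

MINIMUM_PYTHON = (3, 7)  # minimum version stated in the install note


def python_is_supported(version: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= MINIMUM_PYTHON


def package_is_installed(name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("pymusas importable:", package_is_installed("pymusas"))
```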

Development

When developing on the project you will want to install the Python package locally in editable mode with all the extra requirements. This can be done like so:

pip install -e .[tests]

In a zsh shell, which is the default shell on newer Macs, you will need to escape the brackets with \:

pip install -e .\[tests\]
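
The escaping is needed because zsh treats an unquoted `[tests]` as a filename pattern. One way to sidestep shell quoting altogether, sketched below, is to pass the arguments as a list from Python, so no shell ever interprets the brackets. The helper `editable_install_command` is hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import shlex
import sys
from typing import List


def editable_install_command(extra: str = "tests") -> List[str]:
    """Build the editable-install command as an argument list."""
    return [sys.executable, "-m", "pip", "install", "-e", f".[{extra}]"]


command = editable_install_command()

# Shell-safe rendering of the same command, for copy and paste.
print(" ".join(shlex.quote(part) for part in command))

# To actually run it (no shell involved, so no escaping is required):
# import subprocess
# subprocess.run(command, check=True)
```

`shlex.quote` wraps `.[tests]` in single quotes, which is an alternative to the backslash escaping shown above.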

Running linters and tests

This code base uses flake8 and mypy to ensure that the code is consistently formatted and contains type hints. The flake8 settings can be found in ./setup.cfg and the mypy settings in ./pyproject.toml. To run these linters, together with isort for import ordering:

isort pymusas tests scripts
flake8
mypy

To run the tests with code coverage (NOTE: this is the code coverage that the Continuous Integration (CI) reports at the top of this README; the doc tests are not part of this report):

coverage run # Runs the tests (uses pytest)
coverage report # Produces a report on the test coverage

To run the doc tests, which ensure that the examples within the documentation run as expected:

coverage run -m pytest --doctest-modules pymusas/ # Runs the doc tests
coverage report # Produces a report on the doc tests coverage
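
For readers unfamiliar with doc tests, the sketch below shows the general mechanism using Python's standard `doctest` module: an example written in a docstring is executed and its output compared with what the docstring claims. The function `normalise_tag` is a hypothetical example and is not taken from the PyMUSAS code base.

```python
import doctest


def normalise_tag(tag: str) -> str:
    """Return a tag in upper case with surrounding whitespace removed.

    >>> normalise_tag("  z99 ")
    'Z99'
    """
    return tag.strip().upper()


def run_docstring_examples() -> doctest.TestResults:
    """Run the examples in normalise_tag's docstring and return the results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        normalise_tag.__doc__ or "",
        {"normalise_tag": normalise_tag},  # names visible to the examples
        "normalise_tag",
        None,
        0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


if __name__ == "__main__":
    results = run_docstring_examples()
    print(f"attempted={results.attempted} failed={results.failed}")
```

With `--doctest-modules`, pytest collects examples like the one in this docstring from every module it is pointed at.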

Team

PyMUSAS is an open-source project that has been created and funded by the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University. For more information on who has contributed to this code base see the contributions page.

Owner

  • Name: UCREL
  • Login: UCREL
  • Kind: organization
  • Email: ucrel@lancaster.ac.uk
  • Location: Lancaster, UK

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  PyMUSAS: Python Multilingual Ucrel Semantic
  Analysis System
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Andrew
    family-names: Moore
    email: a.moore@lancaster.ac.uk
    affiliation: Lancaster University
    orcid: 'https://orcid.org/0000-0002-3395-0841'
  - given-names: Paul
    family-names: Rayson
    orcid: 'https://orcid.org/0000-0002-1257-2191'
    email: p.rayson@lancaster.ac.uk
    affiliation: Lancaster University
repository-code: 'https://github.com/ucrel/pymusas'
url: 'https://ucrel.github.io/pymusas/'
license: Apache-2.0
version: 0.3.0
date-released: '2022-04-04'

GitHub Events

Total
  • Watch event: 2
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 2
Last Year
  • Watch event: 2
  • Issue comment event: 1
  • Pull request event: 1
  • Fork event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 397
  • Total Committers: 5
  • Avg Commits per committer: 79.4
  • Development Distribution Score (DDS): 0.05
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Andrew Moore a****4@g****m 377
Robin Long r****1@h****k 11
Paul Rayson p****n@l****k 7
Nathan Ellis Rasmussen e****n 1
Daisy Lal 1****1 1
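
The summary figures above can be reproduced from the per-committer commit counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, although it agrees with the values listed.

```python
from typing import Sequence


def average_commits(commits: Sequence[int]) -> float:
    """Mean number of commits per committer."""
    return sum(commits) / len(commits)


def development_distribution_score(commits: Sequence[int]) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - max(commits) / sum(commits)


all_time = [377, 11, 7, 1, 1]  # commit counts from the Top Committers list

print(sum(all_time))                                        # 397
print(round(average_commits(all_time), 1))                  # 79.4
print(round(development_distribution_score(all_time), 2))   # 0.05
```

A score near 0 means that one committer accounts for almost all commits, which matches the 377 of 397 commits shown above.
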

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 32
  • Total pull requests: 11
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 20 hours
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 1.09
  • Average comments per pull request: 1.0
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • apmoore1 (26)
  • FahdCodes (3)
  • jasp9559 (1)
  • karmalet (1)
  • MarcRoigVilamala (1)
Pull Request Authors
  • apmoore1 (8)
  • longr (3)
  • eritain (1)
Top Labels
Issue Labels
low priority (11) documentation (10) enhancement (10) Potential Future Enhancement (5) license (2) bug (1)
Pull Request Labels
documentation (2) enhancement (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 272 last-month
  • Total docker downloads: 439
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 3
  • Total maintainers: 1
pypi.org: pymusas

PYthon Multilingual Ucrel Semantic Analysis System

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 272 Last month
  • Docker Downloads: 439
Rankings
Docker downloads count: 2.4%
Dependent repos count: 9.0%
Average: 9.9%
Dependent packages count: 10.0%
Forks count: 10.5%
Stargazers count: 12.9%
Downloads: 14.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/package.json npm
  • @docusaurus/core 2.0.0-beta.14
  • @docusaurus/preset-classic 2.0.0-beta.14
  • @mdx-js/react ^1.6.21
  • clsx ^1.1.1
  • prism-react-renderer ^1.2.1
  • react ^17.0.1
  • react-dom ^17.0.1
docs/yarn.lock npm
  • 1132 dependencies
benchmarks/ud_conll/requirements.txt pypi
  • conllu ==4.4.1
  • datasets ==1.18.3
  • it_core_news_sm *
dev_requirements.txt pypi
  • coverage >=6.0.0 development
  • flake8 >=3.8.0,<3.10.0 development
  • isort >=5.5.4 development
  • mypy ==0.910 development
  • pydoc-markdown >=4.0.0,<4.6.0 development
  • pytest >=6.0.0, development
  • responses >=0.16.0 development
  • types-requests * development
requirements.txt pypi
  • click <8.1.0
  • requests >=2.13.0,<3.0.0
  • spacy >=3.0
  • spacy >=3.1.4
  • srsly >=2.4.1,<3.0.0
  • tqdm >=4.50.0,<5.0.0
.github/workflows/ci.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2.3.0 composite
  • codecov/codecov-action v2 composite
  • dieghernan/cff-validator main composite
.github/workflows/documentation.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/checkout v3 composite
  • actions/setup-node v2 composite
  • actions/setup-node v3 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
binder/environment.yml conda
  • pip
  • python 3.9.*
pyproject.toml pypi