https://github.com/alexeyev/mystem-scala

Morphological analyzer `mystem` (Russian language) wrapper for JVM languages

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

computational-linguistics java lemmatizer mystem natural-language-processing russian-morphology russian-specific scala tokenizer yandex
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alexeyev
  • License: mit
  • Language: Scala
  • Default Branch: master
  • Homepage:
  • Size: 59.6 KB
Statistics
  • Stars: 24
  • Watchers: 2
  • Forks: 16
  • Open Issues: 1
  • Releases: 0
Topics
computational-linguistics java lemmatizer mystem natural-language-processing russian-morphology russian-specific scala tokenizer yandex
Created over 10 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

A Scala wrapper for the morphological analyzer Yandex.MyStem

Introduction

Details about the algorithm can be found in I. Segalovich «A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine», MLMTA-2003, Las Vegas, Nevada, USA.

The wrapper's code is under the MIT license, but please remember that Yandex.MyStem is not open source and is licensed under the terms of the Yandex License.

System Requirements

The wrapper should work at least on Ubuntu Linux 12.04+ and Windows 7+ (users report that it also works on OS X).

Install

Maven

Available from Maven Central:

```xml
<dependency>
    <groupId>ru.stachek66.nlp</groupId>
    <artifactId>mystem-scala</artifactId>
    <version>0.1.6</version>
</dependency>
```
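For sbt builds, the same Maven Central coordinates can be declared as in the sketch below, which assumes a standard `build.sbt`. The artifactId `mystem-scala` carries no Scala binary-version suffix, so a single `%` is used rather than `%%`.

```scala
// build.sbt: same coordinates as the Maven dependency above
libraryDependencies += "ru.stachek66.nlp" % "mystem-scala" % "0.1.6"
```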

Issues

Currently only mystem 3.0 and 3.1 are supported. Please open an issue for compatibility problems and other requests.

Examples

Probably the most important thing to remember when working with mystem-scala is that your application should have only one MyStem instance per mystem (or mystem.exe) executable.
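If an application uses several mystem binaries (for example, different versions), the one-instance-per-binary rule can be enforced with a small registry keyed by the executable's canonical path. The sketch below is hypothetical and not part of mystem-scala. It is generic in the instance type: in practice `A` would be `MyStem`, and `create` would wrap the `Factory.newMyStem` call shown in the Scala example.

```scala
import java.io.File

import scala.collection.mutable

// Hypothetical helper, not part of mystem-scala: holds at most one instance
// per executable file, created lazily on first request.
final class PerBinaryRegistry[A] {

  private val instances = mutable.Map.empty[String, A]

  def getOrCreate(executable: File)(create: File => A): A = synchronized {
    // The canonical path makes "mystem" and "./mystem" resolve to the same key.
    // getOrElseUpdate evaluates create(...) only when the key is missing.
    instances.getOrElseUpdate(executable.getCanonicalPath, create(executable))
  }
}
```

Because `getOrCreate` is `synchronized`, concurrent callers asking for the same binary also receive the same instance.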

Scala

```scala
import java.io.File

import ru.stachek66.nlp.mystem.holding.{Factory, MyStem, Request}

object MystemSingletonScala {

  // One shared analyzer for the mystem binary at the given path
  val mystemAnalyzer: MyStem =
    new Factory("-igd --eng-gr --format json --weight")
      .newMyStem(
        "3.0",
        Option(new File("/home/coolguy/coolproject/3dparty/mystem")))
      .get
}

object AppExampleScala extends App {

  // Print each token's surface form and its lemma
  MystemSingletonScala
    .mystemAnalyzer
    .analyze(Request("Есть большие пассажиры мандариновой травы"))
    .info
    .foreach(info => println(info.initial + " -> " + info.lex))
}
```
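The option string passed to `Factory` above (`-igd --eng-gr --format json --weight`) consists of standard mystem command-line flags. The hypothetical helper below is not part of mystem-scala. It builds that string from named settings and documents each flag's meaning in a comment, following Yandex's mystem documentation. With its default values, it reproduces the string used in the examples.

```scala
// Hypothetical helper, not part of mystem-scala: builds a mystem option string
// from named settings. Flag meanings follow Yandex's mystem documentation.
object MystemOptions {

  final case class Flags(
      grammar: Boolean = true,          // -i: print grammatical information
      glue: Boolean = true,             // -g: glue grammatical info for the same lemma (requires -i)
      disambiguate: Boolean = true,     // -d: apply contextual disambiguation
      englishGrammemes: Boolean = true, // --eng-gr: print English grammeme names
      format: String = "json",          // --format: text, xml or json
      weight: Boolean = true            // --weight: print the context-independent lemma weight
  )

  def render(f: Flags): String = {
    require(!f.glue || f.grammar, "-g is only meaningful together with -i")
    require(Set("text", "xml", "json").contains(f.format), s"unsupported format: ${f.format}")

    // Single-letter flags are merged into one group, e.g. "-igd".
    val short = Seq(f.grammar -> "i", f.glue -> "g", f.disambiguate -> "d")
      .collect { case (true, letter) => letter }
      .mkString

    Seq(
      if (short.nonEmpty) Some("-" + short) else None,
      if (f.englishGrammemes) Some("--eng-gr") else None,
      Some("--format " + f.format),
      if (f.weight) Some("--weight") else None
    ).flatten.mkString(" ")
  }
}
```

With this helper, `new Factory(MystemOptions.render(MystemOptions.Flags()))` would be equivalent to the factory construction in the Scala example.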

Java

```java
import ru.stachek66.nlp.mystem.holding.Factory;
import ru.stachek66.nlp.mystem.holding.MyStem;
import ru.stachek66.nlp.mystem.holding.MyStemApplicationException;
import ru.stachek66.nlp.mystem.holding.Request;
import ru.stachek66.nlp.mystem.model.Info;
import scala.Option;
import scala.collection.JavaConversions;

import java.io.File;

public class MyStemJavaExample {

    private static final MyStem mystemAnalyzer =
            new Factory("-igd --eng-gr --format json --weight")
                    .newMyStem("3.0", Option.<File>empty()).get();

    public static void main(final String[] args) throws MyStemApplicationException {

        final Iterable<Info> result =
                JavaConversions.asJavaIterable(
                        mystemAnalyzer
                                .analyze(Request.apply("И вырвал грешный мой язык"))
                                .info()
                                .toIterable());

        for (final Info info : result) {
            System.out.println(info.initial() + " -> " + info.lex() + " | " + info.rawResponse());
        }
    }
}
```
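Both examples print one line per analyzed token. To aggregate the results instead, for example to count lemma frequencies, a sketch like the one below could be used. `TokenInfo` is a hypothetical stand-in for `ru.stachek66.nlp.mystem.model.Info` that models only the `initial` and `lex` fields read in the examples. It assumes that `lex` is an `Option[String]`. With the real library, you would map each `Info` to it or adapt the function to take `Info` directly.

```scala
// Hypothetical stand-in for the fields of mystem-scala's Info used above;
// assumes `lex` is an Option[String].
final case class TokenInfo(initial: String, lex: Option[String])

object LemmaCounts {

  // Counts lemmas, falling back to the lower-cased surface form
  // whenever no lemma is present for a token.
  def count(infos: Seq[TokenInfo]): Map[String, Int] =
    infos
      .map(i => i.lex.getOrElse(i.initial.toLowerCase))
      .groupBy(identity)
      .map { case (lemma, occurrences) => lemma -> occurrences.size }
}
```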

How to Cite

If you use this work, references to this repository are highly appreciated.

```bibtex
@misc{alekseev2018mystemscala,
  author = {Anton Alekseev},
  title = {mystem-scala},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/alexeyev/mystem-scala/}},
  commit = {the latest commit of the codebase you have used}
}
```

If you cite it, please also cite the original paper by the algorithm's author.

Contacts

Anton Alekseev, anton.m.alexeyev@gmail.com

Thanks for reviews, reports and contributions

  • Vladislav Dolbilov, @darl
  • Mikhail Malchevsky
  • @anton-shirikov
  • Filipp Malkovsky
  • @dizzy7

See also

  • https://tech.yandex.ru/mystem/
  • https://nlpub.ru/Mystem
  • https://github.com/Digsolab/pymystem3

Owner

  • Name: Anton Alekseev
  • Login: alexeyev
  • Kind: user

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 1
  • Total pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 year
  • Total issue authors: 1
  • Total pull request authors: 4
  • Average comments per issue: 2.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 7
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 22 minutes
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.67
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • giuliorm (1)
Pull Request Authors
  • dependabot[bot] (10)
  • merqlove (2)
  • anton-shirikov (1)
  • dizzy7 (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (10)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 17
  • Total versions: 5
repo1.maven.org: ru.stachek66.nlp:mystem-scala

A Scala wrapper for morphological analyzer Yandex.MyStem

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 17
Rankings
Dependent repos count: 5.9%
Average: 32.2%
Forks count: 34.2%
Stargazers count: 38.5%
Dependent packages count: 50.1%
Last synced: 6 months ago

Dependencies

pom.xml maven
  • ch.qos.logback:logback-classic 1.2.3
  • com.typesafe:config 1.2.1
  • commons-io:commons-io 2.7
  • org.apache.commons:commons-compress 1.21
  • org.json:json 20140107
  • org.scala-lang:scala-library 2.13.4
  • org.slf4j:slf4j-api 1.7.25
  • junit:junit 4.13.1 test
  • org.scalatest:scalatest_2.13 3.0.9 test