Recent Releases of huspacy

huspacy - huspacy-v0.9.0

Changed

  • Added support for new models (hu_core_news_md-v3.5.2, hu_core_news_lg-v3.5.2, hu_core_news_trf_xl-v3.5.2, hu_core_news_trf_xl-v3.5.2)
  • Updated documentation with benepar usage and noun chunking
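
The model identifiers in these release notes appear to follow a `<name>-v<major>.<minor>.<patch>` pattern. As a minimal sketch under that assumption, the snippet below splits an identifier into a name and a comparable version tuple. The `ModelRelease` type and `parse_release` helper are hypothetical names used for illustration only; they are not part of huspacy.

```python
import re
from typing import NamedTuple, Tuple


class ModelRelease(NamedTuple):
    """Hypothetical container for a parsed model identifier."""
    name: str
    version: Tuple[int, ...]  # e.g. (3, 5, 2)


# Assumed pattern: "<model name>-v<major>.<minor>.<patch>"
_RELEASE_RE = re.compile(r"^(?P<name>[a-z_]+)-v(?P<version>\d+\.\d+\.\d+)$")


def parse_release(identifier: str) -> ModelRelease:
    """Split an identifier such as 'hu_core_news_md-v3.5.2' into name and version."""
    match = _RELEASE_RE.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized model identifier: {identifier!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return ModelRelease(match.group("name"), version)


release = parse_release("hu_core_news_md-v3.5.2")
print(release.name, release.version)
```

Storing the version as a tuple of integers lets releases be compared numerically, so `(3, 5, 2) > (3, 5, 1)` holds without string comparison pitfalls.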

- Python
Published by oroszgy about 3 years ago

huspacy - huspacy-v0.8.1

Fixed

  • Replaced bogus transformer model versions with fixed ones (hu_core_news_trf_xl-v3.5.1, hu_core_news_trf_xl-v3.5.1)

- Python
Published by oroszgy about 3 years ago

huspacy - huspacy-v0.8.0

Fixed

  • Applied an edit-tree lemmatizer fix, based on explosion/spaCy#12017

New

  • Added support for new models (hu_core_news_md-v3.5.1, hu_core_news_lg-v3.5.1, hu_core_news_trf_xl-v3.5.0, hu_core_news_trf_xl-v3.5.1)

- Python
Published by oroszgy about 3 years ago

huspacy - huspacy-v0.7.0

New

  • Added support for new models (hu_core_news_md-v3.5.0, hu_core_news_lg-v3.5.0, hu_core_news_trf_xl-v3.4.0)
  • Updated documentation

- Python
Published by oroszgy over 3 years ago

huspacy - huspacy-v0.6.0

New

  • Added a lookup component for sentiment lexicons
  • Added integration for novakat's onpp NER model (nerpp)
  • Added support for new models (hu_core_news_trf-v3.4.0, hu_core_news_md-v3.4.2, hu_core_news_lg-v3.4.4)
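
To illustrate what a lexicon lookup for sentiment does in principle, here is a minimal pure-Python sketch. It is not the huspacy component: the toy lexicon, the polarity labels and the `lookup_sentiment` helper are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: lowercase word form -> polarity label.
LEXICON: Dict[str, str] = {
    "jó": "POS",     # "good"
    "szép": "POS",   # "beautiful"
    "rossz": "NEG",  # "bad"
}


def lookup_sentiment(
    tokens: List[str], lexicon: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each token with its lexicon polarity, or None if it is not listed."""
    return [(token, lexicon.get(token.lower())) for token in tokens]


tagged = lookup_sentiment(["Ez", "jó", ",", "az", "rossz"], LEXICON)
for token, polarity in tagged:
    print(token, polarity)
```

A real component would more likely match on lemmas than on surface forms, since Hungarian words carry many inflectional suffixes; this sketch skips that step.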

- Python
Published by oroszgy over 3 years ago

huspacy - huspacy-v0.5.1

- Python
Published by oroszgy over 3 years ago

huspacy - hu_core_ud_lg-0.3.1

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.

| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.3.1 |
| spaCy | >=2.2.1 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |

Pipeline details

|  | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based implemented in SpaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CONLL'17 training data | CONLL'17 training data | UD converted Szeged Korpusz | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.81 | UAS 76.18 LAS 66.58 | ACC 95.51 | F1 93.95 |
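
The parser column reports UAS and LAS (unlabeled and labeled attachment score). As a rough sketch of how these two metrics are defined, the snippet below scores a toy prediction against a toy gold parse. The data and the `attachment_scores` helper are hypothetical; real evaluation scripts differ in details such as punctuation handling, so this does not reproduce the figures reported here.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    # UAS counts tokens whose predicted head is correct.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    total = len(gold)
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Toy four-token sentence (made-up data).
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f} LAS {las:.2f}")  # UAS 75.00 LAS 50.00
```

Because LAS requires both the head and the label to be right, it can never exceed UAS, which matches the pattern in the reported scores.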

- Python
Published by oroszgy over 6 years ago

huspacy - hu_core_ud_lg-0.3.0

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.

| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.3.0 |
| spaCy | >=2.1.8 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |

Pipeline details

|  | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based implemented in SpaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CONLL'17 training data | CONLL'17 training data | UD converted Szeged Korpusz | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.91 | UAS 75.73 LAS 66.16 | ACC 95.49 | F1 93.95 |

- Python
Published by oroszgy over 6 years ago

huspacy - hu_core_ud_lg-0.2.0

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.

| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.2.0 |
| spaCy | >=2.1.0 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |

Pipeline details

|  | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based implemented in SpaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CONLL'17 training data | CONLL'17 training data | UD converted Szeged Korpusz |
| Test data | Hungarian analogical questions | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.82 | UAS 78.02 LAS 67.92 | ACC 95.60 |

- Python
Published by oroszgy about 7 years ago

huspacy - hu_core_ud_lg-0.1.0

Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.

| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.1.0 |
| spaCy | >=2.0.0 |
| Model size | 1350 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |

Pipeline details

|  | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based implemented in SpaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CONLL'17 training data | CONLL'17 training data | UD converted Szeged Korpusz |
| Test data | Hungarian analogical questions | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data | CONLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.88 | F1 96.64 | ACC 95.11 | UAS 77.52 LAS 68.45 | ACC 95.60 |

- Python
Published by oroszgy over 7 years ago

huspacy - Hungarian tagger and vocabulary model with vectors (medium)

Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word vector model generated from the Hungarian Webcorpus and Wikipedia.

| Feature | Description |
| ------- | ------------ |
| Tagger | 98.23 ACC trained/tested on the Szeged Corpus (Universal Morphology transcript) |
| Word vectors | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia |
| Brown clusters | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia |

- Python
Published by oroszgy almost 9 years ago

huspacy - Hungarian parser, tagger and vocabulary model with vectors (medium)

Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word vector model generated from the Hungarian Webcorpus and Wikipedia.

| Feature | Description |
| ------- | ------------ |
| Tagger | 93.95 ACC trained/tested on Universal dependencies corpus |
| Parser | 75.12 UAS and 64.85 LAS trained/tested on Universal dependencies corpus |
| Word vectors | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia |
| Brown clusters | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia |

- Python
Published by oroszgy almost 9 years ago

huspacy - Hungarian vocabulary model with vectors (medium)

Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia

| Feature | Description |
| ------- | ------------ |
| Corpora | Hungarian Webcorpus, Hungarian Wikipedia |
| Word vectors | 150 dimension, word2vec |
| Brown clusters | 1024 |

- Python
Published by oroszgy almost 9 years ago

huspacy - Hungarian vocabulary model with vectors (large)

Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia

| Feature | Description |
| ------- | ------------ |
| Corpora | Hungarian Webcorpus, Hungarian Wikipedia |
| Word vectors | 300 dimension, word2vec |
| Brown clusters | 1024 |

- Python
Published by oroszgy almost 9 years ago