Recent Releases of huspacy
huspacy - huspacy-v0.9.0
Changed
- Added support for new models (hu_core_news_md-v3.5.2, hu_core_news_lg-v3.5.2, hu_core_news_trf_xl-v3.5.2, hu_core_news_trf_xl-v3.5.2)
- Updated documentation with benepar usage and noun chunking
- Python
Published by oroszgy about 3 years ago
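The model identifiers in these notes appear to follow a `<package>-v<version>` pattern. A minimal sketch of splitting such an identifier into its parts, assuming that pattern holds; `parse_model_id` and `ModelId` are hypothetical names for illustration and are not part of huspacy:

```python
from typing import NamedTuple, Tuple


class ModelId(NamedTuple):
    name: str
    version: Tuple[int, ...]


def parse_model_id(identifier: str) -> ModelId:
    """Split e.g. 'hu_core_news_md-v3.5.2' into a name and a numeric version."""
    # rpartition splits on the last '-v', so underscores in the name are untouched.
    name, sep, version = identifier.rpartition("-v")
    if not sep or not name:
        raise ValueError(f"not a '<name>-v<version>' identifier: {identifier!r}")
    return ModelId(name, tuple(int(part) for part in version.split(".")))


# Identifiers copied from the v0.9.0 notes.
models = [parse_model_id(m) for m in ("hu_core_news_md-v3.5.2", "hu_core_news_lg-v3.5.2")]
```

Storing the version as a tuple of integers makes releases comparable with ordinary operators, for example `(3, 5, 2) > (3, 5, 1)`. Older identifiers without the `v` prefix, such as `hu_core_ud_lg-0.3.1`, are rejected by this sketch.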
huspacy - huspacy-v0.8.1
Fixed
- Replaced bogus transformer model versions with fixed ones (hu_core_news_trf_xl-v3.5.1, hu_core_news_trf_xl-v3.5.1)
Published by oroszgy about 3 years ago
huspacy - huspacy-v0.8.0
Fixed
- Applied an edit-tree lemmatizer fix, based on explosion/spaCy#12017
New
- Added support for new models (hu_core_news_md-v3.5.1, hu_core_news_lg-v3.5.1, hu_core_news_trf_xl-v3.5.0, hu_core_news_trf_xl-v3.5.1)
Published by oroszgy about 3 years ago
huspacy - huspacy-v0.7.0
New
- Added support for new models (hu_core_news_md-v3.5.0, hu_core_news_lg-v3.5.0, hu_core_news_trf_xl-v3.4.0)
- Updated documentation
Published by oroszgy over 3 years ago
huspacy - huspacy-v0.6.0
New
- Added a lookup component for sentiment lexicons
- Added integration for novakat's onpp NER model (nerpp)
- Added support for new models (hu_core_news_trf-v3.4.0, hu_core_news_md-v3.4.2, hu_core_news_lg-v3.4.4)
Published by oroszgy over 3 years ago
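The v0.6.0 notes name a lookup component for sentiment lexicons without describing it. As a rough illustration of the general technique (a dictionary lookup from word forms to polarity labels), here is a plain-Python sketch. The lexicon entries, the `lookup_sentiment` name, and the label scheme are assumptions made for illustration; they do not reflect huspacy's actual component or data:

```python
from typing import Dict, List, Optional

# Toy lexicon: hypothetical entries, not taken from any real resource.
TOY_LEXICON: Dict[str, str] = {
    "jó": "POS",     # "good"
    "szép": "POS",   # "beautiful"
    "rossz": "NEG",  # "bad"
}


def lookup_sentiment(tokens: List[str], lexicon: Dict[str, str]) -> List[Optional[str]]:
    """Return the lexicon label for each token, or None when it is not listed."""
    # Lowercasing makes the lookup case-insensitive for this toy lexicon.
    return [lexicon.get(token.lower()) for token in tokens]


labels = lookup_sentiment(["Ez", "jó", "film"], TOY_LEXICON)
```

A real pipeline component would more likely match on lemmas than on surface forms, since Hungarian is heavily inflected; that choice is left open here.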
huspacy - hu_core_ud_lg-0.3.1
Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.
| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.3.1 |
| spaCy | >=2.2.1 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |
Pipeline details
| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based, implemented in spaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Korpusz | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.81 | UAS 76.18, LAS 66.58 | ACC 95.51 | F1 93.95 |
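The parser column reports UAS and LAS. As a reminder of how these are conventionally defined (UAS: share of tokens with the correct head; LAS: share of tokens with both the correct head and the correct dependency label), here is a minimal sketch. The toy annotations are made up for illustration and are unrelated to the CoNLL'17 evaluation behind the figures in the table; `attachment_scores` is a hypothetical helper name:

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token sequences."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    # UAS counts tokens whose head matches; LAS also requires the label to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


# Toy four-token sentence: one wrong label, one wrong head.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]
uas, las = attachment_scores(gold, pred)
```

Because LAS adds a condition on top of UAS, it can never exceed UAS, which matches the pattern in every parser figure listed in these notes.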
Published by oroszgy over 6 years ago
huspacy - hu_core_ud_lg-0.3.0
Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse, named entity tags and lemmata.
| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.3.0 |
| spaCy | >=2.1.8 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer, ner |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia, Hunnerwiki, Szeged NER corpora |
| License | CC BY-NC-SA 4.0 |
Pipeline details
| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer | NER |
| -- | -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based, implemented in spaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) | CNN |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Korpusz | Hunnerwiki, Szeged NER Business & Criminal |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | Szeged NER Business & Criminal |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.91 | UAS 75.73, LAS 66.16 | ACC 95.49 | F1 93.95 |
Published by oroszgy over 6 years ago
huspacy - hu_core_ud_lg-0.2.0
Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.
| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.2.0 |
| spaCy | >=2.1.0 |
| Model size | 1360 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |
Pipeline details
| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based, implemented in spaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Korpusz |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.89 | F1 96.97 | ACC 94.82 | UAS 78.02, LAS 67.92 | ACC 95.60 |
Published by oroszgy about 7 years ago
huspacy - hu_core_ud_lg-0.1.0
Hungarian multi-task CNN trained on Universal Dependencies data. Assigns context-specific token vectors, Brown cluster IDs, word probabilities, POS tags, dependency parse and lemmata.
| Feature | Description |
| -- | -- |
| Name | hu_core_ud_lg |
| Version | 0.1.0 |
| spaCy | >=2.0.0 |
| Model size | 1350 MB |
| Pipeline | tokenizer, sentencizer, tagger, parser, lemmatizer |
| Vectors | 1140008 unique vectors (300 dimensions) |
| Sources | Universal Dependencies, Szeged Corpus, Web Corpus, Wikipedia |
| License | CC BY-NC-SA 4.0 |
Pipeline details
| | Vectors | Tokenizer | Sentencizer | Tagger | Parser | Lemmatizer |
| -- | -- | -- | -- | -- | -- | -- |
| Model | Word2Vec CBOW dim=300 minfreq=10 | Rule-based, implemented in spaCy | Rule-based | Multi-task CNN | Multi-task CNN | Lemmy (CST-like) |
| Training data | Wikipedia dump (2017-04-21) and the Hungarian Webcorpus | - | - | CoNLL'17 training data | CoNLL'17 training data | UD-converted Szeged Korpusz |
| Test data | Hungarian analogical questions | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data | CoNLL'17 test data |
| Accuracy | ACC 20.95 | F1 99.88 | F1 96.64 | ACC 95.11 | UAS 77.52, LAS 68.45 | ACC 95.60 |
Published by oroszgy over 7 years ago
huspacy - Hungarian tagger and vocabulary model with vectors (medium)
Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word-vector model generated from the Hungarian Webcorpus and Wikipedia.
| Feature | Description |
| ------- | ------------ |
| Tagger | 98.23 ACC, trained/tested on the Szeged Corpus (Universal Morphology transcript) |
| Word vectors | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia |
| Brown clusters | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia |
Published by oroszgy almost 9 years ago
huspacy - Hungarian parser, tagger and vocabulary model with vectors (medium)
Baseline tagger and parser from Universal Dependencies, plus a vocabulary and word-vector model generated from the Hungarian Webcorpus and Wikipedia.
| Feature | Description |
| ------- | ------------ |
| Tagger | 93.95 ACC, trained/tested on the Universal Dependencies corpus |
| Parser | 75.12 UAS and 64.85 LAS, trained/tested on the Universal Dependencies corpus |
| Word vectors | word2vec bow with 150 dimensions, generated from the Hungarian Webcorpus and Wikipedia |
| Brown clusters | 1024 clusters generated from the Hungarian Webcorpus and Wikipedia |
Published by oroszgy almost 9 years ago
huspacy - Hungarian vocabulary model with vectors (medium)
Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia
| Feature | Description |
| ------- | ------------ |
| Corpora | Hungarian Webcorpus, Hungarian Wikipedia |
| Word vectors | 150 dimensions, word2vec |
| Brown clusters | 1024 |
Published by oroszgy almost 9 years ago
huspacy - Hungarian vocabulary model with vectors (large)
Vocabulary and word vector model trained on the Hungarian Webcorpus and Wikipedia
| Feature | Description |
| ------- | ------------ |
| Corpora | Hungarian Webcorpus, Hungarian Wikipedia |
| Word vectors | 300 dimensions, word2vec |
| Brown clusters | 1024 |
Published by oroszgy almost 9 years ago