https://github.com/adbar/german-nlp

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary

Keywords

computational-linguistics corpus-linguistics german-language natural-language-processing nlp text-mining
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: adbar
  • Default Branch: master
  • Homepage:
  • Size: 144 KB
Statistics
  • Stars: 440
  • Watchers: 45
  • Forks: 63
  • Open Issues: 0
  • Releases: 0
Topics
computational-linguistics corpus-linguistics german-language natural-language-processing nlp text-mining
Created over 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing

README.md

German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

This list primarily includes resources and tools that are currently maintained and can be used off-the-shelf or with minor adjustments. It is deliberately biased toward usability and user-friendliness.

Community support is needed to keep this list up to date; pull requests and suggestions are welcome! See the contributing guidelines.

Table of Contents

Text corpora

General-purpose

Historical

Specialized

Swiss German

Learner and Error Corpora

Word lists

Data acquisition

Lists of corpora

Generic resources

Frameworks

Treebanks

Deep learning models and transformers

Annotation

Standards

Linguistic processing

Preprocessing

Tokenization / Sentence boundary detection

Stemming

Lemmatization

Morphological analysis

Normalization

Phonology

POS-tagging

Syntactical parsing

Named Entity Recognition

Misc

Text generation

Industry/Applications

Evaluation

Semantic analysis

Datasets

Word embeddings and senses

Sentiment analysis datasets / polarity clues

Sentiment detection

GermEval

(category to improve)

  • Official GermEval tools list
  • GermEval 2015 data (Lexical Substitution)
  • Germeval Task 2017
  • GermEval-2018 data
  • germeval-rug
  • IWGhatespeechpublic
  • jpadillamontani/germeval2018
  • uhh-lt/GermEval2017-Baseline
  • UKP embeddings for GermEval 2017

Discourse

Summarization and Simplification

Psycholinguistics

Speech NLP

Machine Translation

(category to improve)

  • Tensorflow NMT DE-EN
  • NMT English to German
  • Unsupervised Word Segmentation for NMT

Parallel corpora

Large Language Models

Teaching resources and tutorials

More lists

German

General

Comparable lists

Larger institutional GitHub groups

Contributors

See the list of contributors.

License

CC-BY

Owner

  • Name: Adrien Barbaresi
  • Login: adbar
  • Kind: user
  • Location: Berlin
  • Company: Berlin-Brandenburg Academy of Sciences (BBAW)

Research scientist – natural language processing, web scraping and text analytics. Mostly with Python.

GitHub Events

Total
  • Watch event: 41
  • Push event: 1
  • Pull request event: 1
  • Fork event: 5
Last Year
  • Watch event: 41
  • Push event: 1
  • Pull request event: 1
  • Fork event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 22
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 4
  • Total pull request authors: 15
  • Average comments per issue: 1.4
  • Average comments per pull request: 0.59
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ulf1 (2)
  • xv44586 (1)
  • thorstenMueller (1)
  • dragonnikkirocks (1)
Pull Request Authors
  • Akron (4)
  • k00ni (2)
  • wartaal (2)
  • hoffart (2)
  • D4ve-R (2)
  • susannehaaf (1)
  • oliverguhr (1)
  • thorstenMueller (1)
  • Wolkenstein (1)
  • reckart (1)
  • zesch (1)
  • heyarne (1)
  • malteos (1)
  • wrznr (1)
  • tsterbak (1)
Top Labels
Issue Labels
question (1)
Pull Request Labels