jacy

The Jacy Japanese Grammar

https://github.com/delph-in/jacy

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.1%) to scientific vocabulary

Last synced: 10 months ago

Repository

The Jacy Japanese Grammar

Basic Info

Host: GitHub
Owner: delph-in
License: other
Language: Common Lisp
Default Branch: develop
Homepage: http://moin.delph-in.net/JacyTop
Size: 59 MB

Statistics

Stars: 15
Watchers: 11
Forks: 5
Open Issues: 45
Releases: 2

Created almost 12 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog Contributing License Citation

Jacy

The Jacy Japanese grammar (Siegel, Bender, & Bond, 2016; Siegel & Bender, 2002) is a broad-coverage HPSG grammar of Japanese. In combination with a parser (such as the LKB, ACE, or agree), it can analyze Japanese sentences, yielding derivation trees and MRS semantic representations, and also generate sentences from semantic representations.

Input sentences are tokenized using a morphological analyzer like MeCab.

Quick Start

The ACE parser/generator works on Linux and Mac machines. After installing ACE, the following commands will let you parse and generate with Jacy:

```bash
~$ git clone https://github.com/delph-in/jacy.git
~$ cd jacy/
~/jacy$ ace -g ace/config.tdl -G jacy.dat
~/jacy$ echo "太郎が次郎に本を渡した" | ace -g jacy.dat
[...]
NOTE: parsed 1 / 1 sentences, avg 2837k, time 0.02782s
~/jacy$ echo "太郎が次郎に本を渡した" | ace -g jacy.dat | ace -g jacy.dat -e
[...]
太郎が次郎に本を渡した
次郎に太郎が本を渡した
次郎に本を太郎が渡した
[...]
NOTE: generated 1 / 9 sentences, avg 2653k, time 0.06851s
```
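When scripting around ACE, it can be handy to read the `NOTE:` summary line from the session above programmatically. The following is a minimal Python sketch, not part of Jacy or ACE: the names `AceSummary` and `parse_summary` are hypothetical, and the regular expression assumes the summary line always has the shape shown in this README.

```python
import re
from typing import NamedTuple, Optional


class AceSummary(NamedTuple):
    """Fields of a 'NOTE: parsed 1 / 1 sentences, ...' line (hypothetical helper)."""
    action: str     # "parsed" or "generated"
    count: int      # number printed before the slash
    total: int      # number printed after the slash
    avg_k: int      # the "avg ...k" figure as printed; the README does not state its unit
    seconds: float  # the "time ...s" figure


# Assumption: the summary line always matches the format shown in the README.
_NOTE_RE = re.compile(
    r"NOTE: (parsed|generated) (\d+) / (\d+) sentences, "
    r"avg (\d+)k, time ([\d.]+)s"
)


def parse_summary(line: str) -> Optional[AceSummary]:
    """Return the summary fields, or None if the line is not a summary line."""
    match = _NOTE_RE.search(line)
    if match is None:
        return None
    action, count, total, avg_k, seconds = match.groups()
    return AceSummary(action, int(count), int(total), int(avg_k), float(seconds))


if __name__ == "__main__":
    print(parse_summary("NOTE: parsed 1 / 1 sentences, avg 2837k, time 0.02782s"))
```

Returning `None` for non-matching lines lets a caller feed every output line through `parse_summary` and keep only the summaries.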

You can use a tokenizer such as mecab to tokenize the input:

```bash
echo "カタカナも漢字も大丈夫です。" | mecab -O wakati | ace -g ~/git/jacy/jacy.dat
SENT: カタカナも漢字も大丈夫です。
[...]
NOTE: parsed 1 / 1 sentences, avg 1733k, time 0.00666s
```

You can use a different tokenizer, but mecab is what we used. In this configuration ACE will not handle unknown words, so every word has to be in the lexicon. This is good for grammar development, but not very robust. For example, the word 平仮名 is not in the dictionary, while 片仮名 and カタカナ are, so the sentence "平仮名もカタカナも漢字も大丈夫です。" cannot be parsed.

To be more robust, you can use an input lattice (yy-mode) that also passes through part-of-speech information. The system will then handle (some) unknown words. There is a utility to do this in the jacy repository. Assuming you have all the dependencies installed, you can run:

```bash
echo 'JACYは平仮名もカタカナも漢字も大丈夫です。' | utils/jpn2yy | ace -g jacy.dat -yy
SENT: (yy mode)
[...]
NOTE: parsed 1 / 1 sentences, avg 2676k, time 0.01190s
```
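The two input configurations above (plain `mecab -O wakati` tokenization and the more robust yy-mode via `utils/jpn2yy`) can be wrapped in a small driver. This is only a sketch under stated assumptions: `build_pipeline` and `run_pipeline` are hypothetical names, the commands and flags are exactly those shown in this README, and the script is assumed to run from the root of the jacy checkout with `jacy.dat` already compiled.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pipeline(grammar: str = "jacy.dat", robust: bool = False) -> List[List[str]]:
    """Return the command stages for one of the two setups shown in the README."""
    if robust:
        # yy-mode: jpn2yy emits a token lattice with part-of-speech information.
        return [["utils/jpn2yy"], ["ace", "-g", grammar, "-yy"]]
    # Plain mode: every token must already be in the lexicon.
    return [["mecab", "-O", "wakati"], ["ace", "-g", grammar]]


def run_pipeline(sentence: str, stages: List[List[str]]) -> Optional[str]:
    """Pipe the sentence through each stage; return None if a tool is missing."""
    if any(shutil.which(stage[0]) is None for stage in stages):
        return None
    data = (sentence + "\n").encode("utf-8")
    for stage in stages:
        # Only stdout is passed along; stderr is left attached to the terminal.
        data = subprocess.run(
            stage, input=data, stdout=subprocess.PIPE, check=True
        ).stdout
    return data.decode("utf-8")


if __name__ == "__main__":
    output = run_pipeline("カタカナも漢字も大丈夫です。", build_pipeline())
    print(output if output is not None else "mecab or ace not found on PATH")
```

Keeping command construction separate from execution makes it easy to inspect or log the exact commands before running them, and to switch to the robust setup with `build_pipeline(robust=True)`.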

Owner

Name: DELPH-IN
Login: delph-in
Kind: organization

Website: http://delph-in.net
Repositories: 23
Profile: https://github.com/delph-in

Deep Linguistic Processing with HPSG

Citation (citation.bib)

@Book{Siegel:Bender:Bond:2016,
 author    = {Siegel, Melanie and Bender, Emily M. and Bond, Francis},
 title     = {Jacy: {A}n Implemented Grammar of {J}apanese},
 publisher = {CSLI Publications},
 series    = {CSLI Studies in Computational Linguistics},
 year      = {2016},
 month     = nov,
 address   = {Stanford},
 isbn      = {9781684000180},
 url       = {http://web.stanford.edu/group/cslipublications/cslipublications/site/9781684000180.shtml}
}
