Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.4%) to scientific vocabulary
Repository
Will it Unblend? (Findings of EMNLP 2020)
Basic Info
- Host: GitHub
- Owner: yuvalpinter
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.24 MB
Statistics
- Stars: 6
- Watchers: 5
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Will it Unblend?
This is the home for code and data from the paper Will it Unblend?, Findings of EMNLP, November 2020.
Contents
November 13, 2021: We released the complex words dataset of 312 novel blends and compounds. The data is in the following schema:
* class: whether the word is a blend or a compound.
* word: a word first appearing in the New York Times between November 2017 and March 2019 (taken from NYTWIT, follow link for details).
* bases: the words contributing to the complex word (space-delimited), manually annotated with help of originating NYT context.
* sequence: character-level annotation of the word reflecting each character's origin: Prefix; A/B/C, one of the bases (labeled successively according to their order in the bases column); X, more than one base; O, additional material; Suffix. See section 2 of the paper for details.
* linearity: whether the relation between the base-contributing parts of the word is linear: no O; no A preceded by a B or X; no B followed by an A or X; natural extension to words with a C. Compounds, by definition, contain no X or O and are always linear.
* semantic relation: the relationship between the bases, annotated according to the schema from Tratz and Hovy, 2010.
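The linearity rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the sequence column is a string with one single-letter label per character (`P` and `S` for prefix and suffix, plus `A`, `B`, `X`, `O`), which the schema does not state explicitly. The function name `is_linear` is hypothetical, the example strings are made up rather than taken from the dataset, and words with a third base `C` are rejected because the schema does not spell out that extension.

```python
def is_linear(sequence: str) -> bool:
    """Check the two-base linearity rule on a character-level annotation.

    Assumed encoding (not confirmed by the README): one letter per character,
    P = prefix, S = suffix, A/B = bases, X = shared by both bases,
    O = additional material.
    """
    if "C" in sequence:
        # The README only mentions a "natural extension" for a third base,
        # so this sketch does not guess at it.
        raise ValueError("three-base words are outside this sketch")

    # Prefix and suffix characters are not base-contributing; ignore them.
    labels = [c for c in sequence if c not in "PS"]

    # Rule 1: no additional material.
    if "O" in labels:
        return False

    for i, label in enumerate(labels):
        # Rule 2: no A preceded by a B or X.
        if label == "A" and any(prev in "BX" for prev in labels[:i]):
            return False
        # Rule 3: no B followed by an A or X.
        if label == "B" and any(nxt in "AX" for nxt in labels[i + 1:]):
            return False
    return True


# Made-up annotations, for illustration only.
print(is_linear("AABBBB"))  # A-part followed by B-part
print(is_linear("AAXBB"))   # shared character between the two parts
print(is_linear("BBAA"))    # B-part before A-part
```

Under this reading, a shared `X` character is only allowed between the `A` part and the `B` part, which is why `"XAB"` and `"ABX"` both fail the check.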
Stay tuned for the following releases:
- [x] Code and data for reproducing the similarity experiments in section 3, including all BERT activations and lists of smoothies. (February 16, 2021)
- [ ] Code and data for reproducing the segmentation experiments in section 4.1, including models for the character LM, ~~the character tagger and~~ the news-trained BPE table.
- [x] Code and data for reproducing the recovery experiments in section 4.2, including candidate lists. (December 2, 2021)
Citing is Caring
Please use the following citation when you use our data or methods:
@inproceedings{pinter-etal-2020-will,
title = "Will it Unblend?",
author = "Pinter, Yuval and
Jacobs, Cassandra L. and
Eisenstein, Jacob",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.138",
pages = "1525--1535",
}
Owner
- Name: Yuval Pinter
- Login: yuvalpinter
- Kind: user
- Location: Beersheba, Israel
- Company: @ben-gurion-university
- Website: www.yuvalpinter.com
- Twitter: melelbgu
- Repositories: 3
- Profile: https://github.com/yuvalpinter
Senior Lecturer at the Department of Computer Science at Ben-Gurion University, focusing on NLP. PhD in CS from Georgia Tech (2021).
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Pinter"
  given-names: "Yuval"
  orcid: "https://orcid.org/0000-0003-3174-1621"
- family-names: "Jacobs"
  given-names: "Cassandra L."
- family-names: "Eisenstein"
  given-names: "Jacob"
title: "Will it Unblend?"
booktitle: "Findings of the Association for Computational Linguistics: EMNLP 2020"
publisher: "Association for Computational Linguistics"
version: 1.0.0
doi: 10.18653/v1/2020.findings-emnlp.138
date-released: 2020-11-01
url: "https://aclanthology.org/2020.findings-emnlp.138"
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0