annotated-reference-strings
Repository
Basic Info
- Host: GitHub
- Owner: kylase
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://kylase.github.io/annotated-reference-strings/
- Size: 406 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
Annotated Reference Strings Dataset
Introduction
The annotated_reference_strings dataset consists of millions of reference strings synthesized in up to 17 CSL styles using a CSL processor (citeproc-js). Each short sequence of tokens (segment) is annotated with the variable it is derived from.
This library provides utilities to parse a raw annotated string into a sequence of (token, label) tuples.
For more information on the library and the dataset, refer to the documentation.
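To illustrate what a sequence of (token, label) tuples looks like, here is a minimal sketch. It is not the library's API: the function name `parse_annotated`, the `<label>...</label>` tag syntax, and the `other` label for text outside any segment are all assumptions made for this example, since the raw annotation format is not shown here. Check the documentation for the real format and the real parsing utilities.

```python
import re
from typing import List, Tuple

# ASSUMPTION: segments are wrapped in <label>...</label> tags.
# The dataset's actual annotation syntax may differ.
SEGMENT = re.compile(
    r"<(?P<label>[\w-]+)>(?P<text>.*?)</(?P=label)>|(?P<other>[^<]+)"
)


def parse_annotated(raw: str, outside_label: str = "other") -> List[Tuple[str, str]]:
    """Split a tagged reference string into (token, label) pairs (hypothetical helper)."""
    pairs: List[Tuple[str, str]] = []
    for match in SEGMENT.finditer(raw):
        if match.group("label") is not None:
            # Text inside a tagged segment keeps the segment's label.
            text, label = match.group("text"), match.group("label")
        else:
            # Text between segments (punctuation, spaces) gets a fallback label.
            text, label = match.group("other"), outside_label
        # Whitespace tokenization: every token inherits its segment's label.
        pairs.extend((token, label) for token in text.split())
    return pairs


example = "<author>Kee, Y. C.</author> (<issued>2021</issued>)."
for token, label in parse_annotated(example):
    print(token, label)
```

Under these assumptions, each token of the author segment is paired with `author`, the year with `issued`, and the surrounding punctuation with `other`. This flat token-level labelling is the shape that sequence-labelling models for citation parsing typically consume.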
Obtaining the dataset
The dataset was prepared at the National University of Singapore (NUS), School of Computing (SoC), Web Information Retrieval / Natural Language Processing Group (WING), as part of a Master's project.
The dataset is bundled in separate files, so you can obtain it in parts or in full in 2 ways:
If you are downloading from Google Drive, it is faster to use gdown, because Google zips the files if you download them through the web interface:
```shell
pip install gdown
gdown <url of the file>
```
If you are using Hugging Face's datasets library:
```python
from datasets import load_dataset

dataset = load_dataset('yuanchuan/annotated_reference_strings')
```
Citing
If you are using the dataset, please cite the following:
```bibtex
@techreport{kee-nus-2021,
  author = {Yuan Chuan Kee},
  title = {Synthesis of a large dataset of annotated reference strings for developing citation parsers},
  institution = {National University of Singapore},
  year = {2021}
}
```
Owner
- Name: Yuan Chuan Kee
- Login: kylase
- Kind: user
- Location: Singapore
- Twitter: kylase
- Repositories: 6
- Profile: https://github.com/kylase
Reducing entropy, managing chaos.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
  - family-names: Kee
    given-names: "Yuan Chuan"
title: "Annotated Reference Strings Dataset"
type: "dataset"
url: "https://kylase.github.io/annotated_reference_strings/"
license: MIT
preferred-citation:
  authors:
    - family-names: Kee
      given-names: "Yuan Chuan"
  title: "Synthesis of a large dataset of annotated reference strings for developing citation parsers"
  type: report
  year: 2021
  institution:
    name: "National University of Singapore"
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yuan Chuan Kee | k****e@o****m | 13 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 3
- Total pull requests: 4
- Average time to close issues: 11 months
- Average time to close pull requests: 23 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kylase (3)
Pull Request Authors
- kylase (4)
Dependencies
- furo *
- myst-parser *