Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

dataset
Last synced: 8 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Topics
dataset
Created almost 5 years ago · Last pushed over 4 years ago
Metadata Files
Readme License Citation

README.md

Annotated Reference Strings Dataset

Introduction

The annotated_reference_strings dataset consists of millions of reference strings, each synthesized in up to 17 CSL styles using a CSL processor (citeproc-js). Each short sequence of tokens (a segment) is annotated with the variable it is derived from.

This library provides utilities to parse a raw annotated string into a sequence of (token, label) tuples.
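To make the idea of token and label tuples concrete, here is a minimal sketch of such a parser. It is not the library's own implementation: it assumes, purely for illustration, that each segment is wrapped in an XML-like tag named after its CSL variable (for example `<author>...</author>`), that tokens are split on whitespace, and that text outside any tag receives a placeholder label `other`. The function name `parse_annotated` and the regular expression are hypothetical; check the documentation for the actual raw format and API.

```python
import re
from typing import List, Tuple

# ASSUMPTION: segments look like <label>text</label>. The dataset's real
# raw markup may differ; this pattern is illustrative only.
TAGGED_SEGMENT = re.compile(
    r"<(?P<label>[\w-]+)>(?P<text>.*?)</(?P=label)>|(?P<plain>[^<]+)",
    re.DOTALL,
)


def parse_annotated(raw: str, default_label: str = "other") -> List[Tuple[str, str]]:
    """Turn an annotated string into a list of (token, label) tuples.

    Hypothetical helper: tokens are produced by whitespace splitting, and
    text that sits outside any tag is labeled with `default_label`.
    """
    pairs: List[Tuple[str, str]] = []
    for match in TAGGED_SEGMENT.finditer(raw):
        if match.group("label") is not None:
            label, text = match.group("label"), match.group("text")
        else:
            label, text = default_label, match.group("plain")
        # Every whitespace-separated token inherits its segment's label.
        pairs.extend((token, label) for token in text.split())
    return pairs


example = "<author>Kee, Y. C.</author> (<issued>2021</issued>). <title>A dataset</title>."
for token, label in parse_annotated(example):
    print(token, label)
```

Whitespace splitting keeps punctuation attached to its neighboring token; a citation parser trained on this data might prefer a finer tokenizer, which is a design choice this sketch leaves open.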

For more information on the library and also the dataset, refer to the documentation.

Obtaining the dataset

The dataset was prepared at the National University of Singapore (NUS), School of Computing (SoC), Web Information Retrieval / Natural Language Processing Group (WING) as part of a Master's project.

The dataset is bundled in separate files, so you can obtain it in part or in full, in two ways:

If you are downloading from Google Drive, it is faster to use gdown, because Google zips the files when you download them through the web interface:

```shell
pip install gdown
gdown <url of the file>
```

If you are using Hugging Face's datasets library:

```python
from datasets import load_dataset

dataset = load_dataset('yuanchuan/annotated_reference_strings')
```

Citing

If you are using the dataset, please cite the following:

```bibtex
@techreport{kee-nus-2021,
  author = {Yuan Chuan Kee},
  title = {Synthesis of a large dataset of annotated reference strings for developing citation parsers},
  institution = {National University of Singapore},
  year = {2021}
}
```

Owner

  • Name: Yuan Chuan Kee
  • Login: kylase
  • Kind: user
  • Location: Singapore

Reducing entropy, managing chaos.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:
- family-names: Kee
  given-names: "Yuan Chuan"
title: "Annotated Reference Strings Dataset"
type: "dataset"
url: "https://kylase.github.io/annotated_reference_strings/"
license: MIT
preferred-citation:
  authors:
    - family-names: Kee
      given-names: "Yuan Chuan"
  title: "Synthesis of a large dataset of annotated reference strings for developing citation parsers"
  type: report
  year: 2021
  institution:
    name: "National University of Singapore"

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 13
  • Total Committers: 1
  • Avg Commits per committer: 13.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Yuan Chuan Kee k****e@o****m 13

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 3
  • Total pull requests: 4
  • Average time to close issues: 11 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kylase (3)
Pull Request Authors
  • kylase (4)

Dependencies

docs/requirements.txt pypi
  • furo *
  • myst-parser *