Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mariolpantunes
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 6.32 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme, License, Citation

README.md

semantic_dl

Installation

  • Run these commands from the git root folder.
  • Make sure a compiler with C++11 support is installed, as fastText needs it.
  • Compile fastText
    • wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
    • unzip v0.9.2.zip
    • cd fastText-0.9.2
    • make
    • return to the git root folder and move the fasttext binary into the fasttext folder
      • cd ..
      • mv fastText-0.9.2/fasttext fasttext/
  • Install python3-dev
    • sudo apt install python3-dev (for Debian-based systems)
  • Download the dataset for similarity training: https://raw.githubusercontent.com/AlexGrinch/ro_sgns/master/datasets/rg65.csv
    • put it in dataset/train_sim/
  • Download the constrained corpus: https://www.kaggle.com/datasets/mantunes/semantic-corpus-from-web-search-snippets
    • put the uncompressed files (.csv format) in dataset/train/
  • Download the dataset for similarity evaluation (IoT): https://www.kaggle.com/datasets/mantunes/semantic-iot
    • put it in dataset/test with the name en-iot-30.csv
  • Download the dataset for similarity evaluation (MC): https://www.kaggle.com/datasets/mantunes/millercharles
    • put it in dataset/test with the name en-mc-30.csv
  • Download the pretrained fastText vectors: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
    • unzip the archive and put the extracted .vec file in fasttext/pre-trained/
    • rename it pretrained.vec
  • Download the pretrained GloVe vectors: https://nlp.stanford.edu/data/glove.6B.zip
    • put it in glove/pre-trained/
  • Install the Python libraries
    • pip install -r requirements.txt
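
The steps above leave files in several fixed locations. As a quick sanity check before running anything, a minimal sketch such as the following could report which of them are still missing. The `EXPECTED_PATHS` list and the `missing_paths` helper are hypothetical names introduced here and are not part of the repository. The paths are copied from the steps above, the rg65.csv file is assumed to keep its download name, and the script is assumed to run from the git root folder.

```python
from pathlib import Path

# Locations named in the installation steps (assumed relative to the git root).
EXPECTED_PATHS = [
    "fasttext/fasttext",                    # compiled fastText binary
    "dataset/train_sim/rg65.csv",           # similarity training dataset
    "dataset/train",                        # folder with the corpus .csv files
    "dataset/test/en-mc-30.csv",            # similarity evaluation dataset
    "dataset/test/en-iot-30.csv",           # similarity evaluation dataset
    "fasttext/pre-trained/pretrained.vec",  # pretrained fastText vectors
    "glove/pre-trained",                    # folder for the GloVe download
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [p for p in expected if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing files or folders:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected files and folders are in place.")
```

The check only tests for existence; it does not verify that the files have the right contents.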

Authors

License

This project is licensed under the MIT License; see the LICENSE file for details.

Citation

Teixeira, Rafael & Antunes, Mário & Gomes, Diogo & Aguiar, Rui. (2022). Comparison of Semantic Similarity Models on Constrained Scenarios. Information Systems Frontiers. doi:10.1007/s10796-022-10350-w.

Owner

  • Name: Mário Antunes
  • Login: mariolpantunes
  • Kind: user
  • Location: Aveiro
  • Company: @ATNoG

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Antunes"
  given-names: "Mário"
  orcid: "https://orcid.org/0000-0002-6504-9441"
- family-names: "Teixeira"
  given-names: "Rafael"
  orcid: "https://orcid.org/0000-0001-7211-382X"
title: "Comparison of Semantic Similarity Models on Constrained Scenarios"
version: 1.0.0
doi: 10.1007/s10796-022-10350-w
date-released: 2022-11-10
url: "https://github.com/mariolpantunes/semantic_dl/"
preferred-citation:
  type: journal
  authors:
  - family-names: "Teixeira"
    given-names: "Rafael"
    orcid: "https://orcid.org/0000-0001-7211-382X"
  - family-names: "Antunes"
    given-names: "Mário"
    orcid: "https://orcid.org/0000-0002-6504-9441"
  - family-names: "Gomes"
    given-names: "Diogo"
    orcid: "https://orcid.org/0000-0002-5848-2802"
  - family-names: "Aguiar"
    given-names: "Rui L."
    orcid: "https://orcid.org/0000-0003-0107-6253"
  title: "Comparison of Semantic Similarity Models on Constrained Scenarios"
  doi: 10.1007/s10796-022-10350-w
  journal: "Information Systems Frontiers"
  month: 11
  year: 2022

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 47
  • Total Committers: 3
  • Avg Commits per committer: 15.667
  • Development Distribution Score (DDS): 0.17
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name             Email          Commits
Rafael Teixeira  r****a@u****t  39
Rafael Teixeira  r****a@u****t  5
Mario Antunes    m****s@g****m  3
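
The all-time committer statistics can be reproduced from the per-committer commit counts in the table. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits. The page does not state the formula, so this definition is an assumption, although it agrees with the reported values. The function name `committer_stats` is hypothetical.

```python
def committer_stats(commit_counts):
    """Return (total commits, average per committer, DDS) for a list of counts."""
    total = sum(commit_counts)
    average = total / len(commit_counts)
    # Assumed definition: DDS = 1 - (top committer's commits / total commits).
    dds = 1 - max(commit_counts) / total
    return total, average, dds


# Commit counts from the Top Committers table.
total, average, dds = committer_stats([39, 5, 3])
print(total, round(average, 3), round(dds, 2))  # 47 15.667 0.17
```

With a single committer, as in the past-year figures, the same function gives an average of 1.0 and a DDS of 0.0.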
Committer Domains (Top 20 + Academic)
ua.pt: 2

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • rgtzths (1)