mderanklib

Library for applying the MDERank model to automatic keyword extraction (AKE)

https://github.com/oeg-upm/mderanklib

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Library for applying the MDERank model to automatic keyword extraction (AKE)

Basic Info
  • Host: GitHub
  • Owner: oeg-upm
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 38.1 KB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

mderanklib

Library for applying the MDERank model to automatic keyword extraction (AKE).

It has been adapted and improved so that it also works for Spanish.

Original Paper: MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Original Repo: https://github.com/LinhanZ/mderank
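For orientation, the idea named in the paper title can be illustrated with a toy sketch: mask each candidate phrase in the document, embed the masked document, and rank candidates by how far the masked embedding moves away from the original (lower similarity means a more important phrase). This is not the code of this library. A bag-of-words count vector is swapped in for the BERT/RoBERTa encoder, and the names `embed`, `mask_phrase` and `rank_candidates` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(tokens):
    """Toy stand-in for a document encoder: a bag-of-words count vector."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mask_phrase(tokens, phrase, mask_token="[MASK]"):
    """Replace every occurrence of the token sequence `phrase` with mask tokens."""
    out, i, n = [], 0, len(phrase)
    while i < len(tokens):
        if n and tokens[i:i + n] == phrase:
            out.extend([mask_token] * n)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def rank_candidates(text, candidates):
    """Order candidates so that the phrase whose masking changes the document most comes first."""
    tokens = re.findall(r"\w+", text.lower())
    doc_vec = embed(tokens)
    scored = []
    for cand in candidates:
        cand_tokens = re.findall(r"\w+", cand.lower())
        masked_vec = embed(mask_phrase(tokens, cand_tokens))
        scored.append((cosine(doc_vec, masked_vec), cand))
    scored.sort()  # lowest similarity to the original document first
    return [cand for _, cand in scored]


if __name__ == "__main__":
    text = ("keyphrase extraction ranks phrases "
            "keyphrase extraction is useful for search")
    print(rank_candidates(text, ["search", "keyphrase extraction"]))
```

In the real model the embedding comes from a pretrained language model (the Docker example in this README uses PlanTL-GOB-ES/roberta-base-bne), so the ranking reflects semantics rather than raw token counts.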

Install

This project has been developed under Python 3.9.6.

Install the required libraries from requirements.txt and requirements-torch.txt. torch can be used with or without a GPU.

Also download the required spaCy models for English and Spanish:

```
python -m spacy download en_core_web_sm
python -m spacy download es_core_news_sm
```
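A quick way to confirm that the models were installed is to check that they can be imported, since spaCy models are ordinary Python packages named en_core_web_sm and es_core_news_sm. The sketch below is only a convenience check and is not part of this repository; `missing_models` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Package names of the English and Spanish spaCy models.
REQUIRED_MODELS = ("en_core_web_sm", "es_core_news_sm")


def missing_models(names=REQUIRED_MODELS):
    """Return the model packages that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All required spaCy models are installed.")
```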

Run

To evaluate a dataset as in the state of the art, put the dataset in a data folder (data/datasetname/docsutf8 and data/datasetname/keys). Configure the parameters and arguments of evaluate.sh and run it:

bash eval.sh

To run it over the texts in a folder, configure and execute the run.sh file:

bash run.sh
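Before launching an evaluation it can help to verify that the dataset folder follows the layout described above. The sketch below pairs each document in docsutf8 with a key file in keys. It assumes that documents end in .txt and key files share the same stem with a .key suffix; that naming is an assumption rather than something this README states, and `pair_docs_with_keys` is a hypothetical helper.

```python
from pathlib import Path


def pair_docs_with_keys(dataset_dir, doc_suffix=".txt", key_suffix=".key"):
    """Pair documents in <dataset_dir>/docsutf8 with key files in <dataset_dir>/keys.

    Returns (pairs, unmatched): pairs is a list of (doc_path, key_path) tuples,
    unmatched is a list of documents that have no key file.
    The suffixes are assumptions about the dataset's file naming.
    """
    root = Path(dataset_dir)
    docs_dir, keys_dir = root / "docsutf8", root / "keys"
    for folder in (docs_dir, keys_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")

    pairs, unmatched = [], []
    for doc in sorted(docs_dir.glob(f"*{doc_suffix}")):
        key = keys_dir / (doc.stem + key_suffix)
        if key.is_file():
            pairs.append((doc, key))
        else:
            unmatched.append(doc)
    return pairs, unmatched
```

For example, `pair_docs_with_keys("data/datasetname")` returns the matched pairs plus any documents without gold keywords, which makes a silent mismatch easier to spot before running the evaluation.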

Docker run

For a quick run, use the Dockerfile and these two commands. With them, mderank reads all the documents inside a folder named example and creates a .key file for each document with the detected keywords.

```
docker build -t mderanklib .

docker run --rm -v ./example:/app/example mderanklib \
  --dataset_dir example \
  --batch_size 1 \
  --doc_embed_mode max \
  --log_dir log_path \
  --model_name_or_path PlanTL-GOB-ES/roberta-base-bne \
  --model_type roberta \
  --dataset_name example \
  --type_execution eval \
  --k_value 15 \
  --layer_num -1 \
  --lang es \
  --no_cuda
```
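Once the container has finished, the generated .key files can be collected for further processing. The sketch below assumes that the .key files are written directly into the mounted example folder and contain one keyword per line in UTF-8; neither detail is stated in this README, so adjust the parsing to the actual output. `load_keywords` is a hypothetical helper.

```python
from pathlib import Path


def load_keywords(folder):
    """Map each .key file's stem to its list of keywords.

    Assumes one keyword per line; blank lines are skipped.
    """
    keywords = {}
    for key_file in sorted(Path(folder).glob("*.key")):
        lines = key_file.read_text(encoding="utf-8").splitlines()
        keywords[key_file.stem] = [line.strip() for line in lines if line.strip()]
    return keywords
```

Calling `load_keywords("example")` would then return a dictionary such as `{"doc1": [...], "doc2": [...]}`, keyed by document name.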

Acknowledgments

The development of this code has received funding from the INESData project (Infraestructura para la INvestigación de ESpacios de DAtos distribuidos en UPM), a project funded under the UNICO I+D CLOUD call of the Ministerio para la Transformación Digital y de la Función Pública within the framework of the PRTR, funded by the European Union (NextGenerationEU).

This code has been improved and adapted within the framework of the TeresIA project, a research project funded with European Union NextGenerationEU / PRTR funds through the Ministerio de Asuntos Económicos y Transformación Digital (now the Ministerio para la Transformación Digital y de la Función Pública).

Paper Citation

@inproceedings{Calleja2024,
  author    = {Pablo Calleja and Patricia Martín-Chozas and Elena Montiel-Ponsoda},
  title     = {Benchmark for Automatic Keyword Extraction in Spanish: Datasets and Methods},
  booktitle = {Poster Proceedings of the 40th Annual Conference of the Spanish Association for Natural Language Processing 2024 (SEPLN-P 2024)},
  series    = {CEUR Workshop Proceedings},
  volume    = {3846},
  pages     = {132--141},
  year      = {2024},
  publisher = {CEUR-WS.org},
  address   = {Valladolid, Spain},
  month     = {September 24-27},
  urn       = {urn:nbn:de:0074-3846-7},
  url       = {https://ceur-ws.org/Vol-3846/}
}

Owner

  • Name: Ontology Engineering Group (UPM)
  • Login: oeg-upm
  • Kind: organization
  • Email: oeg-dev@delicias.dia.fi.upm.es
  • Location: Boadilla del Monte, Madrid, Spain

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this code, please cite the following article:"
authors:
  - name: "Pablo Calleja"
  - name: "Patricia Martín-Chozas"
  - name: "Elena Montiel-Ponsoda"
title: "Benchmark for Automatic Keyword Extraction in Spanish: Datasets and Methods"
booktitle: "Poster Proceedings of the 40th Annual Conference of the Spanish Association for Natural Language Processing 2024 (SEPLN-P 2024)"
series: "CEUR Workshop Proceedings"
volume: "3846"
pages: "132-141"
year: "2024"
publisher: "CEUR-WS.org"
conference: 
  name: "40th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2024)"
  place: "Valladolid, Spain"
  date-start: "2024-09-24"
  date-end: "2024-09-27"
url: "https://ceur-ws.org/Vol-3846/"
identifiers:
  - type: "urn"
    value: "urn:nbn:de:0074-3846-7"
date-released: "2024-09-24"

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
Last Year
  • Watch event: 1
  • Push event: 2

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 10 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: mderanklib

Library for multilingual entity ranking.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 10 Last month
Rankings
Dependent packages count: 9.9%
Average: 32.9%
Dependent repos count: 56.0%
Maintainers (1)
Last synced: 10 months ago

Dependencies

requirements.txt pypi
  • StanfordCoreNLP ==3.9.1.1
  • accelerate ==0.5.1
  • allennlp ==3.1.0
  • nltk ==3.8.1
  • pandas ==1.3.5
  • transformers ==4.37.2
Dockerfile docker
  • python 3.9.6 build
requirements-pytorch.txt pypi
  • torch ==2.1.2
  • torchaudio ==2.1.2
  • torchvision ==0.16.2