mderanklib

Library for MDERank model to AKE

https://github.com/oeg-upm/mderanklib

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary

Last synced: 10 months ago

Repository

Library for MDERank model to AKE

Basic Info

Host: GitHub
Owner: oeg-upm
License: apache-2.0
Language: Python
Default Branch: main
Size: 38.1 KB

Statistics

Stars: 1
Watchers: 6
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

mderanklib

Library for applying the MDERank model to automatic keyword extraction (AKE).

It has been adapted and improved to also work with Spanish.

Original Paper: MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

Original Repo: https://github.com/LinhanZ/mderank
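As described in the original paper, MDERank ranks candidate keyphrases by masking each one in the document and comparing the embedding of the masked document with the embedding of the original: the lower the similarity, the more important the candidate. The sketch below illustrates only that ranking idea. It is not this library's API: the function names are hypothetical, and a bag-of-words count vector stands in for the BERT-style encoder used by the real model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a document encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"\[mask\]|\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(document, candidates):
    """Order candidates from most to least important (toy version)."""
    original = embed(document)
    scored = []
    for cand in candidates:
        # Replace every word of the candidate with one [MASK] token.
        mask = " ".join(["[MASK]"] * len(cand.split()))
        pattern = r"\b" + re.escape(cand) + r"\b"
        masked = re.sub(pattern, mask, document, flags=re.IGNORECASE)
        scored.append((cosine(original, embed(masked)), cand))
    # Lower similarity means masking removed more of the document's content.
    return [cand for _, cand in sorted(scored)]
```

For example, `rank_candidates(text, ["text", "keyword extraction"])` places the phrase whose removal changes the toy embedding the most first.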

Install

This project was developed with Python 3.9.6.

Use requirements.txt and requirements-torch.txt to install the required libraries. torch can be used with or without a GPU.

Also download the required spaCy models for Spanish and English:

```
python -m spacy download en_core_web_sm
python -m spacy download es_core_news_sm
```
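Since spaCy models are installed as ordinary Python packages, their presence can be checked before running anything heavier. The following is a minimal sketch, assuming the two models meant above are `en_core_web_sm` and `es_core_news_sm`; the helper name `missing_models` is hypothetical and not part of mderanklib.

```python
import importlib.util

# Assumed model package names (the small English and Spanish spaCy models).
REQUIRED_MODELS = ("en_core_web_sm", "es_core_news_sm")


def missing_models(names=REQUIRED_MODELS):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_models():
        print(f"Missing spaCy model, install with: python -m spacy download {name}")
```

The check only looks for the package and does not load the model, so it stays fast even in a fresh environment.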

Run

If you want to evaluate a dataset as in the state of the art, put the dataset in a data folder (data/datasetname/docsutf8 and data/datasetname/keys). Configure the parameters and arguments of evaluate.sh and run it:

bash eval.sh

If you want to run it over the texts in a folder, configure and execute the run.sh file:

bash run.sh
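The evaluation layout described above (data/datasetname/docsutf8 and data/datasetname/keys) can be verified before launching a run. This is a rough sketch under stated assumptions: the function name is hypothetical, and the `.txt` and `.key` suffixes are guesses about how documents and gold keyword files are named, so adjust them to your dataset.

```python
from pathlib import Path


def unmatched_documents(data_root, dataset_name, doc_suffix=".txt", key_suffix=".key"):
    """List document stems in docsutf8/ that have no counterpart in keys/."""
    base = Path(data_root) / dataset_name
    docs_dir, keys_dir = base / "docsutf8", base / "keys"
    for folder in (docs_dir, keys_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"Expected folder is missing: {folder}")
    # File suffixes are assumptions, not something the README specifies.
    key_stems = {path.stem for path in keys_dir.glob(f"*{key_suffix}")}
    return sorted(
        path.stem for path in docs_dir.glob(f"*{doc_suffix}") if path.stem not in key_stems
    )
```

Calling `unmatched_documents("data", "datasetname")` returns an empty list when every document has a matching keys file.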

Docker run

For a quick run, use the Dockerfile and these two commands. With them, mderank reads all the documents inside a folder named example and creates a .key file for each one with the detected keywords.

```
docker build -t mderanklib .

docker run --rm -v ./example:/app/example mderanklib --dataset_dir example --batch_size 1 --doc_embed_mode max --log_dir log_path --model_name_or_path PlanTL-GOB-ES/roberta-base-bne --model_type roberta --dataset_name example --type_execution eval --k_value 15 --layer_num -1 --lang es --no_cuda
```
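When the container is launched from a script, the long `docker run` line can be assembled programmatically. The sketch below only rebuilds the command shown above as an argument list; the function name and its parameters are hypothetical, it assumes `--dataset_dir` and `--dataset_name` both equal the folder name (as in the README command), and it resolves the host folder to an absolute path because some Docker versions may not accept a relative path in `-v`.

```python
import shlex
from pathlib import Path


def docker_run_command(folder="example", image="mderanklib", lang="es",
                       model="PlanTL-GOB-ES/roberta-base-bne", k_value=15):
    """Build the README's docker run command as a list of arguments."""
    host_dir = Path(folder).resolve()  # absolute host path for the volume mount
    name = host_dir.name
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/app/{name}",
        image,
        "--dataset_dir", name,
        "--batch_size", "1",
        "--doc_embed_mode", "max",
        "--log_dir", "log_path",
        "--model_name_or_path", model,
        "--model_type", "roberta",
        "--dataset_name", name,
        "--type_execution", "eval",
        "--k_value", str(k_value),
        "--layer_num", "-1",
        "--lang", lang,
        "--no_cuda",
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(shlex.join(docker_run_command()))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the image has been built.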

Acknowledgments

The development of this code was funded by the INESData project (Infraestructura para la INvestigación de ESpacios de DAtos distribuidos en UPM), a project funded under the UNICO I+D CLOUD call of the Ministerio para la Transformación Digital y de la Función Pública within the framework of the PRTR, funded by the European Union (NextGenerationEU).

This code was improved and adapted within the TeresIA project, a research project funded by the European Union (NextGenerationEU / PRTR) through the Ministerio de Asuntos Económicos y Transformación Digital (now the Ministerio para la Transformación Digital y de la Función Pública).

Paper Citation

@inproceedings{Calleja2024,
  author    = {Pablo Calleja and Patricia Martín-Chozas and Elena Montiel-Ponsoda},
  title     = {Benchmark for Automatic Keyword Extraction in Spanish: Datasets and Methods},
  booktitle = {Poster Proceedings of the 40th Annual Conference of the Spanish Association for Natural Language Processing 2024 (SEPLN-P 2024)},
  series    = {CEUR Workshop Proceedings},
  volume    = {3846},
  pages     = {132--141},
  year      = {2024},
  publisher = {CEUR-WS.org},
  address   = {Valladolid, Spain},
  month     = {September 24-27},
  urn       = {urn:nbn:de:0074-3846-7},
  url       = {https://ceur-ws.org/Vol-3846/}
}

Owner

Name: Ontology Engineering Group (UPM)
Login: oeg-upm
Kind: organization
Email: oeg-dev@delicias.dia.fi.upm.es
Location: Boadilla del Monte, Madrid, Spain

Website: https://oeg.fi.upm.es/
Repositories: 294
Profile: https://github.com/oeg-upm

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this code, please cite the following article:"
authors:
  - name: "Pablo Calleja"
  - name: "Patricia Martín-Chozas"
  - name: "Elena Montiel-Ponsoda"
title: "Benchmark for Automatic Keyword Extraction in Spanish: Datasets and Methods"
booktitle: "Poster Proceedings of the 40th Annual Conference of the Spanish Association for Natural Language Processing 2024 (SEPLN-P 2024)"
series: "CEUR Workshop Proceedings"
volume: "3846"
pages: "132-141"
year: "2024"
publisher: "CEUR-WS.org"
conference: 
  name: "40th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2024)"
  place: "Valladolid, Spain"
  date-start: "2024-09-24"
  date-end: "2024-09-27"
url: "https://ceur-ws.org/Vol-3846/"
identifiers:
  - type: "urn"
    value: "urn:nbn:de:0074-3846-7"
date-released: "2024-09-24"

GitHub Events

Total

Watch event: 1
Push event: 2

Last Year

Watch event: 1
Push event: 2

Packages

Total packages: 1
Total downloads:
- pypi 10 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

pypi.org: mderanklib

Library for multilingual entity ranking.

Homepage: https://github.com/oeg-upm/mderanklib
Documentation: https://mderanklib.readthedocs.io/
License: Apache
Latest release: 0.1.0
published over 1 year ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 10 Last month

Rankings

Dependent packages count: 9.9%

Average: 32.9%

Dependent repos count: 56.0%

Maintainers (1)

AlbertoGarcia

Last synced: 10 months ago

Dependencies

requirements.txt pypi

StanfordCoreNLP ==3.9.1.1
accelerate ==0.5.1
allennlp ==3.1.0
nltk ==3.8.1
pandas ==1.3.5
transformers ==4.37.2

Dockerfile docker

python 3.9.6 build

requirements-pytorch.txt pypi

torch ==2.1.2
torchaudio ==2.1.2
torchvision ==0.16.2
