https://github.com/alrobles/mammals_virus_text_class

Repository

Basic Info
  • Host: GitHub
  • Owner: alrobles
  • Language: R
  • Default Branch: main
  • Size: 1.91 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed almost 3 years ago

Mammal parasite paper recommender
================

## Methodology

This is a brief summary of our methodology:

-   From the Global Mammal Parasite Database (GMPD), we extract all the
    papers.
-   Because the database only records the title of each paper, we
    search each title in Crossref to retrieve its DOI and, when
    available, its abstract.
-   We also search PubMed by DOI and title.
-   Finally, we build a table with the title, DOI, abstract, and year of
    each paper.
-   We label these records as gmpd.
-   We draw a random sample of 1000 abstracts to form a second class,
    which we call unknown.
-   We tokenize each abstract and build a vocabulary from the words in
    the abstracts.
-   We also include bigrams and remove common English stop words.
-   We keep only terms with a minimum count of 20 across all papers.
-   We vectorize the vocabulary and create a document-term matrix (DTM)
    with one row per abstract and one column per vocabulary term.
-   Using the DTM, we train a penalized logistic regression model on the
    two classes (gmpd and unknown) and interpret the predicted
    probability as a measure of how close an abstract is to the GMPD
    literature.
-   We build a Shiny app that searches PubMed for an arbitrary query
    string and scores the abstract of each retrieved paper with our
    fitted model.
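The Crossref lookup in the second step can be sketched against the public Crossref REST API (the `/works` endpoint with the `query.bibliographic` parameter). The repository's own code is in R and is not reproduced here; this Python sketch only builds the query URL and pulls the DOI, title, and abstract out of an already-decoded JSON response. The function names are hypothetical, and taking the top hit assumes the best-scoring match is the right paper.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

# Hypothetical helpers for illustration; not the repository's R code.
def crossref_query_url(title: str, rows: int = 1) -> str:
    """Build a Crossref /works search URL that matches a paper by its title."""
    params = {"query.bibliographic": title, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

def parse_crossref_item(response: dict) -> dict:
    """Extract DOI, title and abstract from the top hit of a decoded
    Crossref /works response. Assumption: the top hit is the right paper."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return {"doi": None, "title": None, "abstract": None}
    top = items[0]
    titles = top.get("title") or [None]  # Crossref stores titles as a list
    return {
        "doi": top.get("DOI"),
        "title": titles[0],
        # Abstracts are optional; when present they are JATS XML strings.
        "abstract": top.get("abstract"),
    }
```

In practice one would fetch the URL (for example with `urllib.request.urlopen`), decode it with `json.load`, and compare the returned title with the GMPD title before trusting the DOI.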
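Both the PubMed search and the Shiny app's query step can go through NCBI's E-utilities: ESearch turns a query string into PubMed IDs (PMIDs), and EFetch returns the matching records as XML. This is a minimal Python sketch, not the repository's R code, and its function names are hypothetical; it builds the two URLs and extracts abstracts from an EFetch response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helpers for illustration; not the repository's R code.
def esearch_url(query: str, retmax: int = 20) -> str:
    """ESearch URL returning the PMIDs that match an arbitrary query (JSON)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

def efetch_url(pmids: list[str]) -> str:
    """EFetch URL returning full PubMed records (XML) for a list of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"

def abstracts_from_efetch(xml_text: str) -> dict[str, str]:
    """Map PMID -> abstract text; structured abstracts with several
    AbstractText sections are joined with spaces."""
    root = ET.fromstring(xml_text)
    out = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        parts = ["".join(t.itertext()) for t in article.iter("AbstractText")]
        if pmid and parts:  # skip records without an abstract
            out[pmid] = " ".join(parts)
    return out
```

NCBI asks clients without an API key to stay under three requests per second, so a batch lookup over every GMPD title should be throttled.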
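The tokenization, bigram, stop-word, minimum-count, and DTM steps can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: the stop-word list is a short hypothetical subset, tokens are runs of ASCII letters, stop words are dropped before bigrams are formed, and the DTM holds raw term counts. The original R pipeline may make different choices on each point.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; a real pipeline would use a full English list.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
              "is", "of", "on", "or", "the", "to", "was", "were", "with"}

def tokenize(text: str) -> list[str]:
    """Lower-case, keep runs of ASCII letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def terms(text: str) -> list[str]:
    """Unigrams plus bigrams (joined with '_') of the filtered tokens."""
    toks = tokenize(text)
    bigrams = [f"{a}_{b}" for a, b in zip(toks, toks[1:])]
    return toks + bigrams

def build_vocabulary(docs: list[str], min_count: int = 20) -> list[str]:
    """Keep terms whose total count across all documents is >= min_count."""
    counts = Counter()
    for doc in docs:
        counts.update(terms(doc))
    return sorted(t for t, c in counts.items() if c >= min_count)

def document_term_matrix(docs: list[str], vocab: list[str]) -> list[list[int]]:
    """One row per document, one column per vocabulary term, raw counts."""
    index = {t: j for j, t in enumerate(vocab)}
    dtm = []
    for doc in docs:
        row = [0] * len(vocab)
        for t in terms(doc):
            j = index.get(t)
            if j is not None:  # terms pruned from the vocabulary are ignored
                row[j] += 1
        dtm.append(row)
    return dtm
```

New abstracts scored later must be vectorized with the same vocabulary, so terms unseen during training simply contribute nothing.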
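The classification step is a penalized logistic regression. The penalty the authors used is not stated, so this sketch assumes an L2 (ridge) penalty and fits it by plain gradient descent on the mean log-loss plus (lam/2)·||w||²; in R such models are normally fitted with a dedicated package (for example glmnet) rather than by hand. Labels are 1 for gmpd and 0 for unknown, and the names `fit_ridge_logistic` and `predict_proba` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=500):
    """Assumption: L2-penalized logistic regression, full-batch gradient descent.
    X: list of DTM rows, y: 0/1 labels (1 = gmpd). The intercept is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Residual of the predicted probability against the label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Mean log-loss gradient plus the ridge term lam * w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(w, b, x) -> float:
    """Probability that an abstract's DTM row belongs to the gmpd class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

In the app, each retrieved abstract would be turned into a DTM row with the training vocabulary, scored with `predict_proba`, and the results sorted from highest to lowest probability.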

## App deploy

The app is accessible at

Owner

  • Name: Angel Luis Robles Fernández
  • Login: alrobles
  • Kind: user
  • Location: Xalapa Mexico
  • Company: Vida Analytics

PhD student at Arizona State University
