https://github.com/alrobles/mammals_virus_text_class
Repository
Basic Info
- Host: GitHub
- Owner: alrobles
- Language: R
- Default Branch: main
- Size: 1.91 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 4 years ago
· Last pushed almost 3 years ago
https://github.com/alrobles/mammals_virus_text_class/blob/main/
Mammal parasite paper recommender
================
## Methodology
This is a brief summary of the methodology we use:
- From the Global Mammal Parasite Database (GMPD) we extract all the papers.
- Because we only have the title of each paper, we search for that
  title in Crossref to retrieve the DOI and, if it is available, the
  abstract.
- We also search PubMed using the DOI and the title.
- We finally build a table with title, DOI, abstract and year.
- We label these records as gmpd.
- We draw a random sample of 1000 abstracts to form a second class,
  which we call unknown.
- We tokenize each abstract and build a vocabulary from the words in
  the abstracts.
- We also take bigrams into account and remove common English stop
  words.
- We keep only the terms with a minimum count of 20 across all the
  papers.
- We vectorize our vocabulary and create a document-term matrix (DTM),
  with one row per abstract and one column per vocabulary term.
- With the DTM we train a penalized logistic regression model. We
  model the two classes (gmpd and unknown) and interpret the predicted
  probability as a measure of how close the content of an abstract is
  to that of the GMPD papers.
- We build a Shiny app that searches PubMed for an arbitrary string
  and scores the abstract of each paper found according to our linear
  model.
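
The text-processing and modelling steps above can be sketched on a toy corpus. This is only an illustration, written in Python with the standard library rather than in the R code of this repository. The stop-word list, the six abstracts, the function names, the penalty strength and the gradient-descent fit are all assumptions made for the example. The sketch uses an L2 penalty because the README does not say which penalty the real model uses, and a minimum term count of 2 instead of 20 so that the tiny corpus keeps some terms.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "we", "with"}

def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words, then append bigrams."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams

def build_vocabulary(docs, min_count):
    """Keep terms whose total count over all documents is at least min_count."""
    totals = Counter(term for doc in docs for term in tokenize(doc))
    return sorted(t for t, c in totals.items() if c >= min_count)

def document_term_matrix(docs, vocab):
    """One row per document, one column per vocabulary term (raw counts)."""
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for term in tokenize(doc):
            if term in index:
                row[index[term]] += 1
        rows.append(row)
    return rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(X, w, b):
    """Predicted probability of class 1 for every row of X."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def fit_logistic(X, y, l2=0.1, lr=0.1, epochs=500):
    """L2-penalized logistic regression fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        errors = [pi - yi for pi, yi in zip(predict_proba(X, w, b), y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n for j in range(p)]
        # The penalty shrinks the weights only; the intercept is not penalized.
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errors) / n
    return w, b

# Toy abstracts invented for illustration: label 1 = "gmpd", 0 = "unknown".
docs = [
    "Helminth parasite prevalence in wild primates",
    "Virus and parasite infection in wild carnivores",
    "Parasite richness in wild ungulates",
    "Stock market volatility and interest rates",
    "Interest rates and market growth",
    "Market design for electricity auctions",
]
labels = [1, 1, 1, 0, 0, 0]

vocab = build_vocabulary(docs, min_count=2)  # the real corpus uses 20
dtm = document_term_matrix(docs, vocab)
w, b = fit_logistic(dtm, labels)
probs = predict_proba(dtm, w, b)
```

Each value in `probs` plays the role of the score described above: the closer it is to 1, the more the abstract resembles the gmpd class under this toy model.
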
## App deploy
The app is accessible at
Owner
- Name: Angel Luis Robles Fernández
- Login: alrobles
- Kind: user
- Location: Xalapa Mexico
- Company: Vida Analytics
- Website: https://vidaanalytics.com/
- Repositories: 60
- Profile: https://github.com/alrobles
PhD student at Arizona State University