https://github.com/3mmarand/bsc-proj-tut-1

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 14 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 3mmaRand
  • Language: R
  • Default Branch: main
  • Size: 11 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

bsc-proj-tut-1

The goal of bsc-proj-tut-1 is to demonstrate one workflow useful for a BSc project in text analysis.

These projects use methods for small or large-scale text analysis (aka text mining) which can be used to address a wide variety of problems. Some examples are as follows:

  • analysis of scientific literature to discover major trends in published studies. Previous students have:

    • characterised research in gene editing following the advent of CRISPR to discover a research trajectory from techniques through applications to ethical and safety concerns
    • characterised research in Alzheimer's disease before and after the approval of the first treatment that addresses the underlying biology of Alzheimer's and changes the course of the disease
  • analysis of social media or news stories to quantify public attitude or understanding of health conditions or biotechnology. Previous students have:

    • analysed the reporting of the COVID-19 epidemic in different types of media
    • compared discussion of the conservation of charismatic vs non-charismatic species on Twitter or in news stories

Projects could use:

  • rOpenSci R packages to access papers. https://ropensci.org/packages/
  • The free book, Welcome to Text Mining with R by Julia Silge and David Robinson: https://www.tidytextmining.com/
  • Taguette (https://app.taguette.org/), a free and open-source research tool that allows you to import files of various formats, highlight terminology, concepts, sentences, etc., and tag them with the codes you create. Taguette generates Excel/CSV files of well-organised data that are relatively easy to analyse.

This repo demonstrates the use of the easyPubMed package to access PubMed articles along with tidytext and tidyverse packages to analyse the text.

Install the required packages:

install.packages("devtools")
install.packages("easyPubMed")
install.packages("tidyverse")
install.packages("tidytext")
install.packages("textdata")
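As a rough illustration of the "access PubMed articles" step, the sketch below wraps a typical easyPubMed retrieval in a function. It is not the repo's retrieve-pubmed.R: the function name `retrieve_abstracts` and its `max_records` argument are made up for this example, and the calls assume the easyPubMed 2.x interface (version 2.13 is the one cited in this README), so newer releases may behave differently. The query and output path in the commented usage are the ones described for retrieve-pubmed.R in the Example section.

```r
# Sketch only: assumes the easyPubMed 2.x interface (version 2.13 is cited in this README).
# `retrieve_abstracts` and `max_records` are hypothetical names for illustration.
retrieve_abstracts <- function(query, max_records = 100) {
  # Ask PubMed for the IDs of the records that match the query.
  ids <- easyPubMed::get_pubmed_ids(query)

  # Download the matching records as a single XML string.
  xml <- easyPubMed::fetch_pubmed_data(ids, retmax = max_records)

  # Split the XML into one character string per article.
  articles <- easyPubMed::articles_to_list(xml)

  # Convert each article to a data frame and stack the results.
  rows <- lapply(
    articles,
    easyPubMed::article_to_df,
    max_chars = -1,     # keep the full abstract rather than truncating it
    getAuthors = FALSE  # one row per article instead of one per author
  )
  do.call(rbind, rows)
}

# Hypothetical usage (needs an internet connection, so it is left commented out):
# abstracts <- retrieve_abstracts('("virtual reality") AND (treatment)')
# dir.create("data-raw", showWarnings = FALSE)
# write.csv(abstracts, "data-raw/abstracts.csv", row.names = FALSE)
```

Wrapping the calls in a function keeps the network access in one place, so the query can be changed without touching the rest of a script.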

You can download this repo with the following command:

usethis::use_course("3mmaRand/bsc-proj-tut-1")

Example

There are two scripts:

  • retrieve-pubmed.R: uses the query '("virtual reality") AND (treatment)' to find and download abstracts of papers from PubMed. The abstracts are saved in a CSV file: data-raw/abstracts.csv

  • analysis.R: does some simple analysis
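To give a flavour of what a simple tidytext analysis can look like, here is a minimal sketch that counts the most frequent non-stop-words in a handful of abstracts. It is not the contents of analysis.R: the two abstracts and the `pmid` and `abstract` column names are invented for the example, and a real analysis would presumably read data-raw/abstracts.csv instead.

```r
library(dplyr)     # provides tibble(), %>%, anti_join() and count()
library(tidytext)  # provides unnest_tokens() and the stop_words dataset

# Made-up example data standing in for the downloaded abstracts.
abstracts <- tibble(
  pmid = c("1", "2"),
  abstract = c(
    "Virtual reality therapy reduced pain in patients.",
    "Patients reported less anxiety after virtual reality treatment."
  )
)

word_counts <- abstracts %>%
  # One row per word; text is lower-cased and punctuation is dropped.
  unnest_tokens(word, abstract) %>%
  # Remove common English words such as "in" and "after".
  anti_join(stop_words, by = "word") %>%
  # Count how often each remaining word occurs, most frequent first.
  count(word, sort = TRUE)

print(word_counts)
```

The same pipeline scales to thousands of abstracts, and the resulting counts can be passed straight to ggplot2 for plotting.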

Packages

To cite package ‘tidyverse’ in publications use:

Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.

A BibTeX entry for LaTeX users is

@Article{,
  title = {Welcome to the {tidyverse}},
  author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
  year = {2019},
  journal = {Journal of Open Source Software},
  volume = {4},
  number = {43},
  pages = {1686},
  doi = {10.21105/joss.01686},
}

To cite package ‘tidytext’ in publications use:

Silge J, Robinson D (2016). “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS, 1(3). doi:10.21105/joss.00037 https://doi.org/10.21105/joss.00037, http://dx.doi.org/10.21105/joss.00037.

A BibTeX entry for LaTeX users is

@Article{,
  title = {tidytext: Text Mining and Analysis Using Tidy Data Principles in R},
  author = {Julia Silge and David Robinson},
  doi = {10.21105/joss.00037},
  url = {http://dx.doi.org/10.21105/joss.00037},
  year = {2016},
  publisher = {The Open Journal},
  volume = {1},
  number = {3},
  journal = {JOSS},
}

To cite package ‘easyPubMed’ in publications use:

Fantini D (2019). easyPubMed: Search and Retrieve Scientific Publication Records from PubMed. R package version 2.13, https://CRAN.R-project.org/package=easyPubMed.

A BibTeX entry for LaTeX users is

@Manual{,
  title = {easyPubMed: Search and Retrieve Scientific Publication Records from PubMed},
  author = {Damiano Fantini},
  year = {2019},
  note = {R package version 2.13},
  url = {https://CRAN.R-project.org/package=easyPubMed},
}

Owner

  • Name: Emma Rand
  • Login: 3mmaRand
  • Kind: user
  • Location: York, UK
  • Company: University of York

Lecturer at @UniOfYork sharing my enthusiasm for all things data, mainly in R. Ridiculously lucky. Talks too fast, thinks too slow.

GitHub Events

Total
  • Watch event: 1
  • Push event: 5
Last Year
  • Watch event: 1
  • Push event: 5