biorecap
Retrieve and summarize bioRxiv preprints with a local LLM using ollama
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org, medrxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: stephenturner
- License: other
- Language: R
- Default Branch: main
- Homepage: https://stephenturner.github.io/biorecap/
- Size: 2.08 MB
Statistics
- Stars: 70
- Watchers: 2
- Forks: 10
- Open Issues: 1
- Releases: 3
- Created: over 1 year ago
- Last pushed: over 1 year ago
Metadata Files
Readme
Changelog
Contributing
License
Citation
README.Rmd
---
output: github_document
---
```{r, eval=FALSE, echo=FALSE}
# Run interactively
devtools::build_readme()
pkgdown::build_site()
```
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# biorecap
[R-CMD-check](https://github.com/stephenturner/biorecap/actions/workflows/R-CMD-check.yaml)
[arXiv](https://doi.org/10.48550/arXiv.2408.11707)
[r-universe](https://stephenturner.r-universe.dev/biorecap)
Retrieve and summarize [bioRxiv](https://www.biorxiv.org/) and [medRxiv](https://www.medrxiv.org/) preprints using a local LLM with [Ollama](https://ollama.com/) via [ollamar](https://cran.r-project.org/package=ollamar).
Turner, S. D. (2024). biorecap: an R package for summarizing bioRxiv preprints with a local LLM. _arXiv_, 2408.11707. https://doi.org/10.48550/arXiv.2408.11707.
## Installation
Install biorecap from GitHub (keep `dependencies=TRUE` to get Suggests packages needed to create the HTML report):
```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("stephenturner/biorecap", dependencies=TRUE)
```
## Usage
### Quick start
First, load the biorecap library.
```{r}
library(biorecap)
```
Let's make sure Ollama is running and that we can talk to it through R:
```{r, eval=FALSE}
test_connection()
```
```
#> Ollama local server running
#>
#> GET http://localhost:11434/
#> Status: 200 OK
#> Content-Type: text/plain
#> Body: In memory (17 bytes)
```
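If `test_connection()` reports a problem, it can help to check the server without going through any package. The following is a minimal base-R sketch, assuming Ollama listens on its default address `http://localhost:11434/` (as in the output above) and answers a plain GET on the root path. `ollama_is_up()` is a hypothetical helper, not part of biorecap or ollamar.

```r
# Hypothetical helper (not part of biorecap or ollamar):
# returns TRUE if an HTTP GET on `host` returns a non-empty body, else FALSE.
ollama_is_up <- function(host = "http://localhost:11434/") {
  tryCatch({
    # readLines() opens the URL, reads the first line, and closes the connection.
    body <- suppressWarnings(readLines(host, n = 1L))
    length(body) > 0L
  }, error = function(e) FALSE)
}

ollama_is_up()
```

A `FALSE` here usually means the Ollama server has not been started or is listening on a different host or port.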
Next we can list our available models:
```{r, eval=FALSE}
list_models()
```
```
             name   size parameter_size quantization_level            modified
1   gemma2:latest 5.4 GB           9.2B               Q4_0 2024-08-07T07:35:15
3    llama3.1:70b  40 GB          70.6B               Q4_0 2024-07-24T10:57:08
4 llama3.1:latest 4.7 GB           8.0B               Q4_0 2024-07-31T09:38:38
5 llama3.2:latest   2 GB           3.2B             Q4_K_M 2024-09-25T14:54:23
6     phi3:latest 2.2 GB           3.8B               Q4_0 2024-08-28T04:37:58
```
Write an HTML report containing summaries of recent preprints in selected subject areas to the current working directory. You can mix bioRxiv and medRxiv subjects in one call; biorecap picks the matching RSS feed for each subject.
```{r, eval=FALSE}
biorecap_report(output_dir=".",
                subject=c("bioinformatics", "infectious_diseases"),
                model="llama3.2")
```
Example HTML report generated from the bioinformatics (bioRxiv) and infectious diseases (medRxiv) subjects on September 25, 2024:
```{r, echo=FALSE}
knitr::include_graphics(here::here("man/figures/report_screenshot.jpg"))
```
### Details
The `get_preprints()` function retrieves preprints from the bioRxiv or medRxiv RSS feed, depending on the subject you provide. Pass one or more subjects to the `subject` argument.
```{r, eval=FALSE}
pp <- get_preprints(subject=c("bioinformatics",
                              "infectious_diseases"))
head(pp)
tail(pp)
```
```{r, echo=FALSE}
pp <- example_preprints
pp |> dplyr::select(-prompt, -summary) |> head()
pp |> dplyr::select(-prompt, -summary) |> tail()
```
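To illustrate how a subject could be mapped to the right feed, here is a minimal sketch. The short `known_subjects` list is a truncated stand-in for the package's `subjects` object, `feed_url()` is a hypothetical helper, and the URL pattern is an assumption about the public bioRxiv and medRxiv RSS endpoints rather than something taken from the biorecap source.

```r
# Illustrative, truncated stand-in for biorecap's built-in `subjects` list.
known_subjects <- list(
  biorxiv = c("bioinformatics", "genomics"),
  medrxiv = c("infectious_diseases", "epidemiology")
)

# Hypothetical helper: find the server that lists `subject`, then build the
# RSS URL. The URL pattern is an assumption, not taken from biorecap.
feed_url <- function(subject, subjects = known_subjects) {
  hit <- vapply(subjects, function(s) subject %in% s, logical(1))
  server <- names(subjects)[hit]
  if (length(server) != 1L) {
    stop("Unknown or ambiguous subject: ", subject)
  }
  sprintf("https://connect.%s.org/%s_xml.php?subject=%s", server, server, subject)
}

# One URL per requested subject, regardless of which server hosts it.
vapply(c("bioinformatics", "infectious_diseases"), feed_url, character(1))
```

Failing early on an unknown subject keeps a typo from silently producing an empty report.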
The `add_prompt()` function adds a `prompt` column holding the text that will be sent to the model for each preprint.
```{r, eval=FALSE}
pp <- pp |> add_prompt()
pp
```
```{r, echo=FALSE}
pp |> dplyr::select(-summary)
```
Let's take a look at one of these prompts:
> I am giving you a paper’s title and abstract. Summarize the paper in as many sentences as I instruct. Do not include any preamble text. Just give me the summary.
>
> Number of sentences in summary: 2
>
> Title: SeuratExtend: Streamlining Single-Cell RNA-Seq Analysis Through an Integrated and Intuitive Framework
>
> Abstract: Single-cell RNA sequencing (scRNA-seq) has revolutionized the study of cellular heterogeneity, but the rapid expansion of analytical tools has proven to be both a blessing and a curse, presenting researchers with significant challenges. Here, we present SeuratExtend, a comprehensive R package built upon the widely adopted Seurat framework, which streamlines scRNA-seq data analysis by integrating essential tools and databases. SeuratExtend offers a user-friendly and intuitive interface for performing a wide range of analyses, including functional enrichment, trajectory inference, gene regulatory network reconstruction, and denoising. The package seamlessly integrates multiple databases, such as Gene Ontology and Reactome, and incorporates popular Python tools like scVelo, Palantir, and SCENIC through a unified R interface. SeuratExtend enhances data visualization with optimized plotting functions and carefully curated color schemes, ensuring both aesthetic appeal and scientific rigor. We demonstrate SeuratExtend’s performance through case studies investigating tumor-associated high-endothelial venules and autoinflammatory diseases, and showcase its novel applications in pathway-level analysis and cluster annotation. SeuratExtend empowers researchers to harness the full potential of scRNA-seq data, making complex analyses accessible to a wider audience. The package, along with comprehensive documentation and tutorials, is freely available at GitHub, providing a valuable resource for the single-cell genomics community.
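
A prompt with the layout shown above could be assembled with plain string concatenation. This is a minimal sketch: `build_prompt()` and its `nsentences` argument are hypothetical names chosen for illustration, and biorecap's own `add_prompt()` may build the text differently.

```r
# Hypothetical helper mirroring the prompt layout shown above;
# biorecap's own add_prompt() may assemble it differently.
build_prompt <- function(title, abstract, nsentences = 2L) {
  paste0(
    "I am giving you a paper's title and abstract. ",
    "Summarize the paper in as many sentences as I instruct. ",
    "Do not include any preamble text. Just give me the summary.",
    "\n\nNumber of sentences in summary: ", nsentences,
    "\n\nTitle: ", title,
    "\n\nAbstract: ", abstract
  )
}

# paste0() is vectorized, so one call yields one prompt per preprint.
papers <- data.frame(
  title = c("Paper A", "Paper B"),
  abstract = c("First abstract.", "Second abstract.")
)
papers$prompt <- build_prompt(papers$title, papers$abstract)
cat(papers$prompt[1])
```

Keeping the instructions, sentence count, title, and abstract in fixed positions makes it easy to change one of them without touching the rest.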
The `add_summary()` function uses a locally running LLM available through Ollama to summarize the preprint. Let's add the summary. Notice that we can do this all in a single pipeline. This takes a few minutes!
```{r, eval=FALSE}
pp <-
  get_preprints(subject=c("bioinformatics", "infectious_diseases")) |>
  add_prompt() |>
  add_summary(model="llama3.2")
```
Let's take a look at the results:
```{r}
pp
```
Let's look at one of those summaries. Here's the summary for the SeuratExtend paper (abstract above):
> SeuratExtend is an R package that integrates essential tools and databases for single-cell RNA sequencing (scRNA-seq) data analysis, streamlining the process through a user-friendly interface. The package offers various analyses, including functional enrichment and gene regulatory network reconstruction, and seamlessly integrates multiple databases and popular Python tools.
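
Conceptually, the summarization step sends each prompt to the model and collects one response per preprint. The sketch below shows that loop with the model call injected as a function, so it runs without Ollama. `summarize_prompts()` and `fake_llm()` are hypothetical names, and the commented-out `ollamar::generate()` call is an assumption about how a real generator could be wired in, not a description of biorecap's internals.

```r
# Hypothetical sketch: apply a text-generation function to each prompt.
# `generate_fn` takes one prompt string and returns one summary string.
summarize_prompts <- function(prompts, generate_fn) {
  vapply(prompts, function(p) {
    # Keep going if a single request fails; record NA for that preprint.
    tryCatch(trimws(generate_fn(p)), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in generator so the sketch runs without a model.
fake_llm <- function(prompt) paste("Summary of", nchar(prompt), "characters. ")
summarize_prompts(c("first prompt", "second, longer prompt"), fake_llm)

# With a running Ollama server, a real generator might look like this
# (assumes ollamar::generate() with output = "text"):
# ollama_llm <- function(prompt) ollamar::generate("llama3.2", prompt, output = "text")
```

Passing the generator as an argument also makes it easy to swap models or to test the pipeline offline.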
The `biorecap_report()` function runs this pipeline inside an RMarkdown template and writes the resulting HTML report, plus a CSV file of the results, to `output_dir` (the current working directory in the example below).
```{r, eval=FALSE}
biorecap_report(output_dir=".",
                subject=c("bioinformatics", "infectious_diseases"),
                model="llama3.2")
```
The built-in `subjects` object is a list of vectors containing all available bioRxiv and medRxiv subjects.
```{r}
subjects$biorxiv
subjects$medrxiv
```
You could create a report for _all_ subjects like this (note, this could take some time):
```{r, eval=FALSE}
biorecap_report(output_dir=".",
                subject=c(subjects$biorxiv, subjects$medrxiv),
                model="llama3.2")
```
Owner
- Name: Stephen Turner
- Login: stephenturner
- Kind: user
- Location: Charlottesville, VA
- Company: @colossal-compsci
- Website: http://StephenTurner.us
- Twitter: strnr
- Repositories: 125
- Profile: https://github.com/stephenturner
Data scientist in biotech, former academic, Principal Scientist and Head of Genomic Strategy at Colossal Biosciences
Citation (CITATION.cff)
# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
cff-version: 1.2.0
message: 'To cite package "biorecap" in publications use:'
type: software
license: MIT
title: 'biorecap: Retrieve and summarize bioRxiv preprints with a local LLM using
  ollama'
version: 0.1.0
abstract: Retrieve and summarize bioRxiv preprints with a local LLM using ollama.
authors:
- family-names: Turner
  given-names: Stephen
  email: vustephen@gmail.com
  orcid: https://orcid.org/0000-0001-9140-9028
repository-code: https://github.com/stephenturner/biorecap
url: https://stephenturner.github.io/biorecap/
contact:
- family-names: Turner
  given-names: Stephen
  email: vustephen@gmail.com
  orcid: https://orcid.org/0000-0001-9140-9028
references:
- type: software
  title: 'R: A Language and Environment for Statistical Computing'
  notes: Depends
  url: https://www.R-project.org/
  authors:
  - name: R Core Team
  institution:
    name: R Foundation for Statistical Computing
    address: Vienna, Austria
  year: '2024'
  version: '>= 4.2.0'
- type: software
  title: dplyr
  abstract: 'dplyr: A Grammar of Data Manipulation'
  notes: Imports
  url: https://dplyr.tidyverse.org
  repository: https://CRAN.R-project.org/package=dplyr
  authors:
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
    orcid: https://orcid.org/0000-0003-4757-117X
  - family-names: François
    given-names: Romain
    orcid: https://orcid.org/0000-0002-2444-4226
  - family-names: Henry
    given-names: Lionel
  - family-names: Müller
    given-names: Kirill
    orcid: https://orcid.org/0000-0002-1416-3412
  - family-names: Vaughan
    given-names: Davis
    email: davis@posit.co
    orcid: https://orcid.org/0000-0003-4777-038X
  year: '2024'
  doi: 10.32614/CRAN.package.dplyr
- type: software
  title: ollamar
  abstract: 'ollamar: ''Ollama'' Language Models'
  notes: Imports
  url: https://hauselin.github.io/ollama-r/
  repository: https://CRAN.R-project.org/package=ollamar
  authors:
  - family-names: Lin
    given-names: Hause
    email: hauselin@gmail.com
    orcid: https://orcid.org/0000-0003-4590-7039
  year: '2024'
  doi: 10.32614/CRAN.package.ollamar
- type: software
  title: rlang
  abstract: 'rlang: Functions for Base Types and Core R and ''Tidyverse'' Features'
  notes: Imports
  url: https://rlang.r-lib.org
  repository: https://CRAN.R-project.org/package=rlang
  authors:
  - family-names: Henry
    given-names: Lionel
    email: lionel@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  year: '2024'
  doi: 10.32614/CRAN.package.rlang
- type: software
  title: rmarkdown
  abstract: 'rmarkdown: Dynamic Documents for R'
  notes: Imports
  url: https://pkgs.rstudio.com/rmarkdown/
  repository: https://CRAN.R-project.org/package=rmarkdown
  authors:
  - family-names: Allaire
    given-names: JJ
    email: jj@posit.co
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Dervieux
    given-names: Christophe
    email: cderv@posit.co
    orcid: https://orcid.org/0000-0003-4474-2498
  - family-names: McPherson
    given-names: Jonathan
    email: jonathan@posit.co
  - family-names: Luraschi
    given-names: Javier
  - family-names: Ushey
    given-names: Kevin
    email: kevin@posit.co
  - family-names: Atkins
    given-names: Aron
    email: aron@posit.co
  - family-names: Wickham
    given-names: Hadley
    email: hadley@posit.co
  - family-names: Cheng
    given-names: Joe
    email: joe@posit.co
  - family-names: Chang
    given-names: Winston
    email: winston@posit.co
  - family-names: Iannone
    given-names: Richard
    email: rich@posit.co
    orcid: https://orcid.org/0000-0003-3925-190X
  year: '2024'
  doi: 10.32614/CRAN.package.rmarkdown
- type: software
  title: tidyRSS
  abstract: 'tidyRSS: Tidy RSS for R'
  notes: Imports
  url: https://github.com/RobertMyles/tidyrss
  repository: https://CRAN.R-project.org/package=tidyRSS
  authors:
  - family-names: McDonnell
    given-names: Robert Myles
    email: robertmylesmcdonnell@gmail.com
  year: '2024'
  doi: 10.32614/CRAN.package.tidyRSS
- type: software
  title: tinytable
  abstract: 'tinytable: Simple and Configurable Tables in ''HTML'', ''LaTeX'', ''Markdown'',
    ''Word'', ''PNG'', ''PDF'', and ''Typst'' Formats'
  notes: Imports
  url: https://vincentarelbundock.github.io/tinytable/
  repository: https://CRAN.R-project.org/package=tinytable
  authors:
  - family-names: Arel-Bundock
    given-names: Vincent
    email: vincent.arel-bundock@umontreal.ca
    orcid: https://orcid.org/0000-0003-2042-7063
  year: '2024'
  doi: 10.32614/CRAN.package.tinytable
- type: software
  title: knitr
  abstract: 'knitr: A General-Purpose Package for Dynamic Report Generation in R'
  notes: Suggests
  url: https://yihui.org/knitr/
  repository: https://CRAN.R-project.org/package=knitr
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  year: '2024'
  doi: 10.32614/CRAN.package.knitr
- type: software
  title: markdown
  abstract: 'markdown: Render Markdown with ''commonmark'''
  notes: Suggests
  url: https://github.com/rstudio/markdown
  repository: https://CRAN.R-project.org/package=markdown
  authors:
  - family-names: Xie
    given-names: Yihui
    email: xie@yihui.name
    orcid: https://orcid.org/0000-0003-0645-5666
  - family-names: Allaire
    given-names: JJ
  - family-names: Horner
    given-names: Jeffrey
  year: '2024'
  doi: 10.32614/CRAN.package.markdown
GitHub Events
Total
- Watch event: 10
- Fork event: 3
Last Year
- Watch event: 10
- Fork event: 3
Committers
Top Committers
| Name | Email | Commits |
|---|---|---|
| Stephen Turner | v****n@g****m | 37 |
| VP Nagraj | p****j@s****m | 1 |
Issues and Pull Requests
All Time
- Total issues: 5
- Total pull requests: 5
- Average time to close issues: 6 days
- Average time to close pull requests: 14 minutes
- Total issue authors: 5
- Total pull request authors: 2
- Average comments per issue: 1.4
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 5
- Average time to close issues: 6 days
- Average time to close pull requests: 14 minutes
- Issue authors: 5
- Pull request authors: 2
- Average comments per issue: 1.4
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- vpnagraj (1)
- Michael-Geuenich (1)
- sunta3iouxos (1)
- huyvuong (1)
- danieljking8 (1)
Pull Request Authors
- stephenturner (8)
- vpnagraj (2)
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml
actions
- JamesIves/github-pages-deploy-action v4.5.0 composite
- actions/checkout v4 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 4.2.0 depends
- dplyr * imports
- ollamar * imports
- rlang * imports
- rmarkdown * imports
- tidyRSS * imports
- tinytable * imports
- knitr * suggests
- markdown * suggests