https://github.com/chronchi/molecular_landscape

Scripts and prose for the molecular landscape paper

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.6%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: chronchi
  • License: gpl-3.0
  • Language: HTML
  • Default Branch: main
  • Size: 2.18 GB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed about 2 years ago

README.md

EMBER creates a unified space for independent breast cancer transcriptomic datasets enabling precision oncology.

This repo contains the scripts used to generate all the figures from the EMBER paper, along with instructions on how to run the analysis yourself.

Directory structure

scripts contains all the scripts used to generate the images and associated files. It is organized into a quarto book.

results contains all the output of the analysis, including html and other files.

data contains raw data that is used in the project and is not available on the server. Note here we are using preprocessed data already available on the server. For now this folder is empty.

Docs

Once generated, the analysis is available in the docs folder. An online version is also available at chronchi.github.io/molecular_landscape.

Files that are ignored

Overall we ignore all the rds, rdb and RData files that are generated in the analysis. We also ignore the cache folders and the files folders. The data and results folders are ignored as well, since they would make the repo too big. Check the gitignore files for a complete list.

Docker

The analysis has a docker image with all the datasets available. Unfortunately we cannot use GitHub Actions to automatically generate the report and feed it to GitHub Pages: the docker image is too big (~10GB compressed), and the GitHub runners provide only up to 14GB of SSD storage. Instead, the whole docker workflow has to be run locally. To support this, before pushing to the main branch on GitHub I check whether the renv.lock file has changed. If it has, a new image is automatically generated and pushed to Docker Hub, so that up-to-date images are available for running the analysis.
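The lockfile check described above could be sketched roughly as follows. This is a minimal illustration, not the script used in this repository: the function name `lockfile_changed`, the temporary reference path, and the location of the Dockerfile are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the current lockfile differs
# from a reference copy, i.e. when a new image should be built.
lockfile_changed() {
  local reference="$1" current="$2"
  # No reference copy yet: treat it as changed so the first build happens.
  [ -f "$reference" ] || return 0
  # cmp -s prints nothing and exits non-zero when the files differ.
  ! cmp -s "$reference" "$current"
}

# Example wiring (commented out so sourcing this file has no side effects).
# The Dockerfile location is assumed to be the repository root.
# git show origin/main:renv.lock > /tmp/renv.lock.pushed
# if lockfile_changed /tmp/renv.lock.pushed renv.lock; then
#   docker build -t chronchi/ember:latest . && docker push chronchi/ember:latest
# fi
```

Comparing against the copy of renv.lock that is already on the remote main branch means the image is rebuilt only when the package set actually changed, which matters given the size of the image.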

One crucial step when creating the image was to first copy the packages from the renv cache directly into the project's private library (renv::isolate()). This ensures that all the packages are copied into the docker image.

After the image is run another Dockerfile is used to generate the report that will be used in the github pages. The report is saved in the docs folder. So if you would like to run the analysis locally the only thing that you will need to do is run the command below at the root of this repository. You will need sudo access to run it.

```bash
# clone the repo to have the latest scripts available
git clone git@github.com:chronchi/molecular_landscape.git

# run the script from the root of the repository
cd molecular_landscape
bash generate_docs.sh
```

After this you should be able to access the report on docs/index.html.

The docker image does not contain all the intermediate files necessary to run the analysis. They are generated when creating the docs.

Moreover, if you want to play with the data and the code, you can access the RStudio server available from the docker image directly using the commands below. The username is rstudio and the password is ember. The scripts in the docker image are the latest available upon the creation of the image. RStudio can be run from localhost:8000 after that.

```bash
docker run \
    -p 8000:8787 \
    --name ember \
    -e PASSWORD=ember \
    chronchi/ember:latest
```

At the beginning of each chapter (with the exception of the first one, surv_analysis_estrogen.qmd), files that were created in the previous chapters are loaded. Therefore it is important to run everything in the order given in the _quarto.yml file.
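To see the order in which the chapters need to run, the chapter list can be read straight from the quarto configuration. The helper below is a small sketch: the name `list_chapters` is made up for this example, and it assumes the chapters appear in _quarto.yml as plain `.qmd` paths, one per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every .qmd entry of a quarto config file
# in the order in which it appears (defaults to ./_quarto.yml).
list_chapters() {
  local config="${1:-_quarto.yml}"
  # Drop commented-out lines, then keep only the .qmd file names.
  grep -v '^[[:space:]]*#' "$config" | grep -oE '[A-Za-z0-9_./-]+\.qmd'
}
```

From the scripts folder, `list_chapters _quarto.yml` then prints the chapters from first to last, which is the order to follow when running them one by one.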

My suggestion is to run docker, open the container/rstudio, then go to the scripts folder and run the following:

```bash
echo "firstrun <- TRUE" > R/firstrun.R

quarto render --to=html
```

This will generate the docs and all the associated RDS files. This way you can go to any script and run the analysis there.

Owner

  • Name: Carlos
  • Login: chronchi
  • Kind: user
