https://github.com/bihealth/swibrid_paper

Code and scripts for SWIBRID publication

Repository

Basic Info

Host: GitHub
Owner: bihealth
Language: Jupyter Notebook
Default Branch: master
Size: 3.47 MB

Statistics

Stars: 0
Watchers: 4
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed 12 months ago

Metadata Files

Readme

README.md

paper repository for Vazquez-Garcia & Obermayer et al.

contents:

data: contains input data for paper figures (prepared using prepare_data.R)
paper_figures.Rmd: R code to produce paper figures (uses data in data, nothing else needed)

sessionInfo

R version 4.3.2 (2023-10-31)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5), RColorBrewer(v.1.1-3), variancePartition(v.1.32.5), BiocParallel(v.1.36.0), limma(v.3.58.1), readxl(v.1.4.3), pROC(v.1.18.5), glmnet(v.4.1-8), Matrix(v.1.6-5), car(v.3.1-2), carData(v.3.0-5), ggrepel(v.0.9.5), circlize(v.0.4.16), ComplexHeatmap(v.2.18.0), cowplot(v.1.1.3), scales(v.1.3.0), caret(v.6.0-94), lattice(v.0.21-9), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), tidyverse(v.2.0.0), dplyr(v.1.1.4), ggpubr(v.0.6.0) and ggplot2(v.3.5.1)

loaded via a namespace (and not attached): bitops(v.1.0-7), Rdpack(v.2.6), gridExtra(v.2.3), rlang(v.1.1.3), magrittr(v.2.0.3), clue(v.0.3-65), GetoptLong(v.1.0.5), matrixStats(v.1.2.0), compiler(v.4.3.2), png(v.0.1-8), vctrs(v.0.6.5), reshape2(v.1.4.4), pkgconfig(v.2.0.3), shape(v.1.4.6.1), crayon(v.1.5.2), backports(v.1.4.1), pander(v.0.6.5), caTools(v.1.18.2), utf8(v.1.2.4), prodlim(v.2023.08.28), tzdb(v.0.4.0), nloptr(v.2.0.3), xfun(v.0.42), EnvStats(v.2.8.1), recipes(v.1.0.10), remaCor(v.0.0.18), broom(v.1.0.5), parallel(v.4.3.2), cluster(v.2.1.4), R6(v.2.5.1), stringi(v.1.8.3), boot(v.1.3-28.1), parallelly(v.1.37.0), rpart(v.4.1.21), numDeriv(v.2016.8-1.1), cellranger(v.1.1.0), Rcpp(v.1.0.12), iterators(v.1.0.14), knitr(v.1.45), future.apply(v.1.11.1), IRanges(v.2.36.0), splines(v.4.3.2), nnet(v.7.3-19), timechange(v.0.3.0), tidyselect(v.1.2.0), viridis(v.0.6.5), rstudioapi(v.0.15.0), abind(v.1.4-5), timeDate(v.4032.109), gplots(v.3.1.3.1), doParallel(v.1.0.17), codetools(v.0.2-19), listenv(v.0.9.1), lmerTest(v.3.1-3), plyr(v.1.8.9), Biobase(v.2.62.0), withr(v.3.0.0), future(v.1.33.1), survival(v.3.5-7), pillar(v.1.9.0), KernSmooth(v.2.23-22), foreach(v.1.5.2), stats4(v.4.3.2), generics(v.0.1.3), S4Vectors(v.0.40.2), hms(v.1.1.3), aod(v.1.3.3), munsell(v.0.5.0), minqa(v.1.2.6), globals(v.0.16.2), RhpcBLASctl(v.0.23-42), class(v.7.3-22), glue(v.1.7.0), tools(v.4.3.2), fANCOVA(v.0.6-1), data.table(v.1.15.0), ModelMetrics(v.1.2.2.2), gower(v.1.0.1), ggsignif(v.0.6.4), mvtnorm(v.1.2-4), rbibutils(v.2.2.16), ipred(v.0.9-14), colorspace(v.2.1-0), nlme(v.3.1-163), cli(v.3.6.2), fansi(v.1.0.6), viridisLite(v.0.4.2), lava(v.1.8.0), corpcor(v.1.6.10), gtable(v.0.3.4), rstatix(v.0.7.2), digest(v.0.6.34), BiocGenerics(v.0.48.1), pbkrtest(v.0.5.2), rjson(v.0.2.21), lifecycle(v.1.0.4), hardhat(v.1.3.1), GlobalOptions(v.0.1.2), statmod(v.1.5.0) and MASS(v.7.3-60)

swibrid_runs: contains config files for various SWIBRID runs on human or mouse data, or the simulations
- benchmarks: config files for the benchmarks
  - dense: using dense MSA, can be run as is using swibrid test in that folder
  - sparse: using sparse MSA; this requires the sparsecluster package to be installed
- mouse: config files for mouse data
  - download raw fastq files from SRA (accession PRJNA1190672) into raw_data and run demultiplex_dataset.sh; this will put fastq and info.csv files for individual samples into input and make it possible to run all samples in one go
  - download mm10 genome from UCSC or elsewhere
  - download gencode M12 reference and use swibrid prepare_annotation
  - use config.yaml for running all mouse data (assumed to produce the folder output)
  - use config_noSg.yaml for running everything only on Sm + Sa (potentially restrict info files in input to reads with Sa primer; assumed to produce the folder output_no_Sg)
- human: config files for human data and various scripts to create plots
  
  raw sequencing data for human donors cannot be shared due to patient privacy legislation
  - demultiplex_dataset.sh is used to demultiplex input for each run; demultiplexed fastq and info.csv files are expected in input
  - get hg38 genome and gencode v33 reference, create LAST index
  - config.yaml for "regular" runs (assumed to produce the folder output)
  - config_reads_averaging.yaml to use averaging of features over reads not clusters (assumed to produce the folder output_reads_averaging)
  - combine_replicates.sh to pool reads from technical replicates
  - plot_bars.sh and plot_bars.py to plot isotype fractions as in Fig. 1
  - plot_circles.sh and plot_circles.py to create bubble plots of Fig. 1
  - plot_clustering.sh to create read plots for Fig. 1 and S2
  - plot_breakpoints.sh and plot_breakpoint_stats.py to create breakpoint matrix plot of Fig. 2A
  - meta_clustering.py and meta_clustering.sh are used for meta-clustering of clusters from multiple samples
  - cluster_tracing.ipynb is used to create the annotated read plots for tracing clusters across samples
  - minION_vs_pacBIO_vs_HTGTS.ipynb is used to compare different technologies in different species and regions
  - external: config files for public datasets (Vincendeau et al. and Panchakshari et al.)
    - for Vincendeau et al., download data from SRA (PRJNA831666) into the Vincendeau subfolder and run make_info.py on every sample to create dummy files with primer locations
    - for Panchakshari et al., use get_data.sh in the HTGTS folder to download data (accessions SRR2104731-47 and SRR6293456-63), collapse read mates with bbmerge and create info files
supplementary_note.ipynb: Python code to make plots for supplementary note (needs numpy, scipy, pandas, seaborn)
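
The Panchakshari et al. accessions above are given as abbreviated ranges (SRR2104731-47 and SRR6293456-63). As a minimal sketch, and not part of this repository, the following Python helper expands such a range into individual run accessions. The function name expand_accession_range is hypothetical, and the code assumes that the digits after the dash replace the trailing digits of the first accession; get_data.sh remains the authoritative download script.

```python
import re


def expand_accession_range(spec: str) -> list[str]:
    """Expand an abbreviated range such as 'SRR2104731-47' into full accessions."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"not an abbreviated accession range: {spec!r}")
    prefix, start, end_suffix = match.groups()
    if len(end_suffix) > len(start):
        raise ValueError(f"range suffix longer than accession number: {spec!r}")
    # Assumption: the digits after the dash replace the trailing digits of the start.
    end = start[: len(start) - len(end_suffix)] + end_suffix
    if int(end) < int(start):
        raise ValueError(f"range end precedes range start: {spec!r}")
    width = len(start)  # keep the original number of digits
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for spec in ("SRR2104731-47", "SRR6293456-63"):
        runs = expand_accession_range(spec)
        print(spec, "->", len(runs), "runs:", runs[0], "...", runs[-1])
```

Under this reading, the two ranges cover 17 and 8 runs respectively, which can be compared against the files that get_data.sh actually downloads.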

Owner

Name: Berlin Institute of Health
Login: bihealth
Kind: organization

Website: https://www.cubi.bihealth.org/
Repositories: 215
Profile: https://github.com/bihealth

BIH Core Unit Bioinformatics & BIH HPC IT

GitHub Events

Total

Push event: 1
Public event: 1

Last Year

Push event: 1
Public event: 1
