https://github.com/bihealth/swibrid_paper

Code and scripts for SWIBRID publication

Repository

Basic Info
  • Host: GitHub
  • Owner: bihealth
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 3.47 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 9 months ago

README.md

Paper repository for Vazquez-Garcia & Obermayer et al.

Contents:

  • data: contains input data for paper figures (prepared using prepare_data.R)

  • paper_figures.Rmd: R code to produce the paper figures (uses only the files in data; nothing else is needed)

sessionInfo: R version 4.3.2 (2023-10-31)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5), RColorBrewer(v.1.1-3), variancePartition(v.1.32.5), BiocParallel(v.1.36.0), limma(v.3.58.1), readxl(v.1.4.3), pROC(v.1.18.5), glmnet(v.4.1-8), Matrix(v.1.6-5), car(v.3.1-2), carData(v.3.0-5), ggrepel(v.0.9.5), circlize(v.0.4.16), ComplexHeatmap(v.2.18.0), cowplot(v.1.1.3), scales(v.1.3.0), caret(v.6.0-94), lattice(v.0.21-9), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), tidyverse(v.2.0.0), dplyr(v.1.1.4), ggpubr(v.0.6.0) and ggplot2(v.3.5.1)

loaded via a namespace (and not attached): bitops(v.1.0-7), Rdpack(v.2.6), gridExtra(v.2.3), rlang(v.1.1.3), magrittr(v.2.0.3), clue(v.0.3-65), GetoptLong(v.1.0.5), matrixStats(v.1.2.0), compiler(v.4.3.2), png(v.0.1-8), vctrs(v.0.6.5), reshape2(v.1.4.4), pkgconfig(v.2.0.3), shape(v.1.4.6.1), crayon(v.1.5.2), backports(v.1.4.1), pander(v.0.6.5), caTools(v.1.18.2), utf8(v.1.2.4), prodlim(v.2023.08.28), tzdb(v.0.4.0), nloptr(v.2.0.3), xfun(v.0.42), EnvStats(v.2.8.1), recipes(v.1.0.10), remaCor(v.0.0.18), broom(v.1.0.5), parallel(v.4.3.2), cluster(v.2.1.4), R6(v.2.5.1), stringi(v.1.8.3), boot(v.1.3-28.1), parallelly(v.1.37.0), rpart(v.4.1.21), numDeriv(v.2016.8-1.1), cellranger(v.1.1.0), Rcpp(v.1.0.12), iterators(v.1.0.14), knitr(v.1.45), future.apply(v.1.11.1), IRanges(v.2.36.0), splines(v.4.3.2), nnet(v.7.3-19), timechange(v.0.3.0), tidyselect(v.1.2.0), viridis(v.0.6.5), rstudioapi(v.0.15.0), abind(v.1.4-5), timeDate(v.4032.109), gplots(v.3.1.3.1), doParallel(v.1.0.17), codetools(v.0.2-19), listenv(v.0.9.1), lmerTest(v.3.1-3), plyr(v.1.8.9), Biobase(v.2.62.0), withr(v.3.0.0), future(v.1.33.1), survival(v.3.5-7), pillar(v.1.9.0), KernSmooth(v.2.23-22), foreach(v.1.5.2), stats4(v.4.3.2), generics(v.0.1.3), S4Vectors(v.0.40.2), hms(v.1.1.3), aod(v.1.3.3), munsell(v.0.5.0), minqa(v.1.2.6), globals(v.0.16.2), RhpcBLASctl(v.0.23-42), class(v.7.3-22), glue(v.1.7.0), tools(v.4.3.2), fANCOVA(v.0.6-1), data.table(v.1.15.0), ModelMetrics(v.1.2.2.2), gower(v.1.0.1), ggsignif(v.0.6.4), mvtnorm(v.1.2-4), rbibutils(v.2.2.16), ipred(v.0.9-14), colorspace(v.2.1-0), nlme(v.3.1-163), cli(v.3.6.2), fansi(v.1.0.6), viridisLite(v.0.4.2), lava(v.1.8.0), corpcor(v.1.6.10), gtable(v.0.3.4), rstatix(v.0.7.2), digest(v.0.6.34), BiocGenerics(v.0.48.1), pbkrtest(v.0.5.2), rjson(v.0.2.21), lifecycle(v.1.0.4), hardhat(v.1.3.1), GlobalOptions(v.0.1.2), statmod(v.1.5.0) and MASS(v.7.3-60)

  • swibrid_runs: contains config files for various SWIBRID runs on human or mouse data, or on simulated data

    • benchmarks: config files for the benchmarks

      • dense: uses a dense MSA; can be run as-is with swibrid test in that folder
      • sparse: uses a sparse MSA; requires the sparsecluster package to be installed
    • mouse: config files for mouse data

      • download the raw fastq files from SRA (accession PRJNA1190672) into raw_data and run demultiplex_dataset.sh; this puts fastq and info.csv files for individual samples into input, so that all samples can be run in one go
      • download the mm10 genome from UCSC or elsewhere
      • download the GENCODE M12 reference and process it with swibrid prepare_annotation
      • use config.yaml to run all mouse data (assumed to produce the folder output)
      • use config_noSg.yaml to run everything on Sm + Sa only (possibly after restricting the info files in input to reads with the Sa primer; assumed to produce the folder output_no_Sg)
    • human: config files for human data and various scripts to create plots

      Raw sequencing data for human donors cannot be shared due to patient privacy legislation.

      • demultiplex_dataset.sh is used to demultiplex the input for each run; demultiplexed fastq and info.csv files are expected in input
      • get the hg38 genome and the GENCODE v33 reference, and create a LAST index
      • config.yaml for "regular" runs (assumed to produce the folder output)
      • config_reads_averaging.yaml to average features over reads rather than clusters (assumed to produce the folder output_reads_averaging)
      • combine_replicates.sh to pool reads from technical replicates
      • plot_bars.sh and plot_bars.py to plot isotype fractions as in Fig. 1
      • plot_circles.sh and plot_circles.py to create bubble plots of Fig. 1
      • plot_clustering.sh to create read plots for Fig. 1 and S2
      • plot_breakpoints.sh and plot_breakpoint_stats.py to create breakpoint matrix plot of Fig. 2A
      • meta_clustering.py and meta_clustering.sh are used for meta-clustering of clusters from multiple samples
      • cluster_tracing.ipynb is used to create the annotated read plots for tracing clusters across samples
      • minION_vs_pacBIO_vs_HTGTS.ipynb is used to compare different technologies in different species and regions
      • external: config files for public datasets (Vincendeau et al. and Panchakshari et al.)
      • for Vincendeau et al., download data from SRA (PRJNA831666) into the Vincendeau subfolder and run make_info.py on every sample to create dummy files with primer locations
      • for Panchakshari et al., use get_data.sh in the HTGTS folder to download data (accessions SRR2104731-47 and SRR6293456-63), collapse read mates with bbmerge and create info files
  • supplementary_note.ipynb: Python code to make plots for the supplementary note (requires numpy, scipy, pandas and seaborn)
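The human config_reads_averaging.yaml run averages features over reads rather than clusters. The difference between the two can be illustrated with a short, generic Python sketch. It assumes, hypothetically, that each read carries one numeric feature value and a cluster label, and it is not SWIBRID's own implementation. "Averaging over clusters" is taken here to mean averaging the per-cluster means, so every cluster counts equally; "averaging over reads" is the plain mean, so large clusters dominate.

```python
from collections import defaultdict
from statistics import mean
from typing import Sequence


def average_over_reads(values: Sequence[float]) -> float:
    """Plain mean over all reads, so large clusters carry more weight."""
    return mean(values)


def average_over_clusters(values: Sequence[float], clusters: Sequence[str]) -> float:
    """Mean of per-cluster means, so every cluster carries equal weight."""
    if len(values) != len(clusters):
        raise ValueError("values and clusters must have the same length")
    # Group the per-read feature values by their (hypothetical) cluster label.
    by_cluster = defaultdict(list)
    for value, cluster in zip(values, clusters):
        by_cluster[cluster].append(value)
    return mean(mean(vals) for vals in by_cluster.values())


# Hypothetical data: one feature value and one cluster label per read.
# Cluster "A" is a large clone of 4 reads; "B" and "C" are singletons.
values = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
clusters = ["A", "A", "A", "A", "B", "C"]
print(average_over_reads(values))               # 0.6666666666666666
print(average_over_clusters(values, clusters))  # 0.3333333333333333
```

With one large cluster and two singletons, the read-level average is pulled toward the large cluster's value, while the cluster-level average treats all three clusters alike.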
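The accession ranges given for Panchakshari et al. use a shorthand in which only the trailing digits of the last run are written, so SRR2104731-47 presumably stands for SRR2104731 through SRR2104747. A minimal sketch of expanding such a range into individual run accessions (for example, to feed a download loop) is shown below; the helper name expand_accession_range is hypothetical and is not part of get_data.sh.

```python
import re


def expand_accession_range(spec: str) -> list[str]:
    """Expand shorthand like 'SRR2104731-47' into individual run accessions.

    The digits after '-' replace the trailing digits of the first accession.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"unrecognized accession range: {spec!r}")
    prefix, first, suffix = match.groups()
    if len(suffix) > len(first):
        raise ValueError(f"range suffix longer than accession: {spec!r}")
    # Keep the leading digits of the first accession and swap in the suffix.
    last = first[: len(first) - len(suffix)] + suffix
    start, stop = int(first), int(last)
    if stop < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    width = len(first)
    return [f"{prefix}{n:0{width}d}" for n in range(start, stop + 1)]


runs = expand_accession_range("SRR2104731-47") + expand_accession_range("SRR6293456-63")
print(len(runs))          # 25
print(runs[0], runs[-1])  # SRR2104731 SRR6293463
```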
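Before opening supplementary_note.ipynb, it can help to confirm that its Python dependencies are importable. The sketch below uses only the standard library (importlib.util.find_spec, which locates a module without importing it) and checks the four packages named above; it does not check versions, since none are pinned in this README.

```python
from importlib.util import find_spec

# Packages named as requirements for supplementary_note.ipynb.
REQUIRED = ["numpy", "scipy", "pandas", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all packages for supplementary_note.ipynb are installed")
```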

Owner

  • Name: Berlin Institute of Health
  • Login: bihealth
  • Kind: organization

BIH Core Unit Bioinformatics & BIH HPC IT

GitHub Events

Total
  • Push event: 1
  • Public event: 1
Last Year
  • Push event: 1
  • Public event: 1