https://github.com/bihealth/swibrid_paper
Code and scripts for SWIBRID publication
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 4.2%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: bihealth
- Language: Jupyter Notebook
- Default Branch: master
- Size: 3.47 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Paper repository for Vazquez-Garcia & Obermayer et al.
Contents:
- data: contains input data for the paper figures (prepared using prepare_data.R)
- paper_figures.Rmd: R code to produce the paper figures (uses only the contents of data; nothing else is needed)
sessionInfo
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: dendextend(v.1.17.1), lme4(v.1.1-35.1), gtools(v.3.9.5), RColorBrewer(v.1.1-3), variancePartition(v.1.32.5), BiocParallel(v.1.36.0), limma(v.3.58.1), readxl(v.1.4.3), pROC(v.1.18.5), glmnet(v.4.1-8), Matrix(v.1.6-5), car(v.3.1-2), carData(v.3.0-5), ggrepel(v.0.9.5), circlize(v.0.4.16), ComplexHeatmap(v.2.18.0), cowplot(v.1.1.3), scales(v.1.3.0), caret(v.6.0-94), lattice(v.0.21-9), lubridate(v.1.9.3), forcats(v.1.0.0), stringr(v.1.5.1), purrr(v.1.0.2), readr(v.2.1.5), tidyr(v.1.3.1), tibble(v.3.2.1), tidyverse(v.2.0.0), dplyr(v.1.1.4), ggpubr(v.0.6.0) and ggplot2(v.3.5.1)
loaded via a namespace (and not attached): bitops(v.1.0-7), Rdpack(v.2.6), gridExtra(v.2.3), rlang(v.1.1.3), magrittr(v.2.0.3), clue(v.0.3-65), GetoptLong(v.1.0.5), matrixStats(v.1.2.0), compiler(v.4.3.2), png(v.0.1-8), vctrs(v.0.6.5), reshape2(v.1.4.4), pkgconfig(v.2.0.3), shape(v.1.4.6.1), crayon(v.1.5.2), backports(v.1.4.1), pander(v.0.6.5), caTools(v.1.18.2), utf8(v.1.2.4), prodlim(v.2023.08.28), tzdb(v.0.4.0), nloptr(v.2.0.3), xfun(v.0.42), EnvStats(v.2.8.1), recipes(v.1.0.10), remaCor(v.0.0.18), broom(v.1.0.5), parallel(v.4.3.2), cluster(v.2.1.4), R6(v.2.5.1), stringi(v.1.8.3), boot(v.1.3-28.1), parallelly(v.1.37.0), rpart(v.4.1.21), numDeriv(v.2016.8-1.1), cellranger(v.1.1.0), Rcpp(v.1.0.12), iterators(v.1.0.14), knitr(v.1.45), future.apply(v.1.11.1), IRanges(v.2.36.0), splines(v.4.3.2), nnet(v.7.3-19), timechange(v.0.3.0), tidyselect(v.1.2.0), viridis(v.0.6.5), rstudioapi(v.0.15.0), abind(v.1.4-5), timeDate(v.4032.109), gplots(v.3.1.3.1), doParallel(v.1.0.17), codetools(v.0.2-19), listenv(v.0.9.1), lmerTest(v.3.1-3), plyr(v.1.8.9), Biobase(v.2.62.0), withr(v.3.0.0), future(v.1.33.1), survival(v.3.5-7), pillar(v.1.9.0), KernSmooth(v.2.23-22), foreach(v.1.5.2), stats4(v.4.3.2), generics(v.0.1.3), S4Vectors(v.0.40.2), hms(v.1.1.3), aod(v.1.3.3), munsell(v.0.5.0), minqa(v.1.2.6), globals(v.0.16.2), RhpcBLASctl(v.0.23-42), class(v.7.3-22), glue(v.1.7.0), tools(v.4.3.2), fANCOVA(v.0.6-1), data.table(v.1.15.0), ModelMetrics(v.1.2.2.2), gower(v.1.0.1), ggsignif(v.0.6.4), mvtnorm(v.1.2-4), rbibutils(v.2.2.16), ipred(v.0.9-14), colorspace(v.2.1-0), nlme(v.3.1-163), cli(v.3.6.2), fansi(v.1.0.6), viridisLite(v.0.4.2), lava(v.1.8.0), corpcor(v.1.6.10), gtable(v.0.3.4), rstatix(v.0.7.2), digest(v.0.6.34), BiocGenerics(v.0.48.1), pbkrtest(v.0.5.2), rjson(v.0.2.21), lifecycle(v.1.0.4), hardhat(v.1.3.1), GlobalOptions(v.0.1.2), statmod(v.1.5.0) and MASS(v.7.3-60)
- swibrid_runs: contains config files for various SWIBRID runs on human or mouse data, or the simulations
  - benchmarks: config files for the benchmarks
    - dense: using dense MSA; can be run as is using swibrid test in that folder
    - sparse: using sparse MSA; for this, the sparsecluster package needs to be installed
  - mouse: config files for mouse data
    - download raw fastq files from SRA (accession PRJNA1190672) into raw_data and run demultiplex_dataset.sh; this will put fastq and info.csv files for individual samples into input and make it possible to run all samples in one go
    - download the mm10 genome from UCSC or elsewhere
    - download the gencode M12 reference and use swibrid prepare_annotation
    - use config.yaml for running all mouse data (assumed to produce the folder output)
    - use config_noSg.yaml for running everything only on Sm + Sa (potentially restrict the info files in input to reads with the Sa primer; assumed to produce the folder output_no_Sg)
  - human: config files for human data and various scripts to create plots
    - raw sequencing data for human donors cannot be shared due to patient privacy legislation
    - demultiplex_dataset.sh is used to demultiplex the input for each run; demultiplexed fastq and info.csv files are expected in input
    - get the hg38 genome and the gencode v33 reference, and create a LAST index
    - config.yaml for "regular" runs (assumed to produce the folder output)
    - config_reads_averaging.yaml to average features over reads rather than clusters (assumed to produce the folder output_reads_averaging)
    - combine_replicates.sh to pool reads from technical replicates
    - plot_bars.sh and plot_bars.py to plot isotype fractions as in Fig. 1
    - plot_circles.sh and plot_circles.py to create the bubble plots of Fig. 1
    - plot_clustering.sh to create the read plots for Fig. 1 and S2
    - plot_breakpoints.sh and plot_breakpoint_stats.py to create the breakpoint matrix plot of Fig. 2A
    - meta_clustering.py and meta_clustering.sh are used for meta-clustering of clusters from multiple samples
    - cluster_tracing.ipynb is used to create the annotated read plots for tracing clusters across samples
    - minION_vs_pacBIO_vs_HTGTS.ipynb is used to compare different technologies in different species and regions
  - external: config files for public datasets (Vincendeau et al. and Panchakshari et al.)
    - for Vincendeau et al., download data from SRA (PRJNA831666) into the Vincendeau subfolder and run make_info.py on every sample to create dummy files with primer locations
    - for Panchakshari et al., use get_data.sh in the HTGTS folder to download data (accessions SRR2104731-47 and SRR6293456-63), collapse read mates with bbmerge and create info files
- supplementary_note.ipynb: Python code to make plots for the supplementary note (needs numpy, scipy, pandas, seaborn)
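The human runs come in two flavours: config.yaml and config_reads_averaging.yaml, the latter averaging features over reads rather than over clusters. The Python sketch below illustrates, under our own assumption about what that distinction means, how the two choices can diverge: averaging over clusters counts every cluster once, while averaging over reads weights each cluster by its read count. Cluster, mean_over_clusters and mean_over_reads are hypothetical names, the numbers are invented, and this is not SWIBRID's implementation.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical per-cluster summary: read count and one feature value."""
    n_reads: int
    feature: float


def mean_over_clusters(clusters: list[Cluster]) -> float:
    """Unweighted mean: every cluster counts once, whatever its size."""
    return sum(c.feature for c in clusters) / len(clusters)


def mean_over_reads(clusters: list[Cluster]) -> float:
    """Read-weighted mean: every read counts once, so big clusters dominate."""
    total_reads = sum(c.n_reads for c in clusters)
    return sum(c.n_reads * c.feature for c in clusters) / total_reads


if __name__ == "__main__":
    # Invented example: one large cluster and two small ones.
    clusters = [Cluster(80, 1.0), Cluster(10, 4.0), Cluster(10, 7.0)]
    print(mean_over_clusters(clusters))  # 4.0
    print(mean_over_reads(clusters))     # 1.9
```

With one dominant cluster, the read-weighted average is pulled toward that cluster's value, while the cluster average treats small and large clones alike.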
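For the human data, combine_replicates.sh pools reads from technical replicates. As a minimal sketch of one way to pool reads, assuming the replicates are gzipped FASTQ files (the helper pool_fastq_gz and all file names are hypothetical, and this is not necessarily what the script does), the compressed files can simply be concatenated, because a sequence of gzip members is itself a valid gzip stream. The sketch does not merge the accompanying info.csv files.

```python
import gzip
import shutil
from pathlib import Path


def pool_fastq_gz(replicates: list[Path], pooled: Path) -> None:
    """Append gzipped FASTQ files from technical replicates into one file.

    The compressed bytes are copied as-is: concatenated gzip members form a
    valid gzip stream, so no decompression is needed.
    """
    with pooled.open("wb") as out:
        for path in replicates:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        # Two tiny hypothetical replicate files with one read each.
        reps = []
        records = ["@r1\nACGT\n+\nIIII\n", "@r2\nTTGA\n+\nIIII\n"]
        for i, record in enumerate(records, start=1):
            path = tmp / f"sample_rep{i}.fastq.gz"
            with gzip.open(path, "wt") as fh:
                fh.write(record)
            reps.append(path)
        pooled = tmp / "sample_pooled.fastq.gz"
        pool_fastq_gz(reps, pooled)
        with gzip.open(pooled, "rt") as fh:
            print(fh.read().count("@r"))  # 2 reads in the pooled file
```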
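The Panchakshari et al. step in external gives its SRA runs in shorthand (SRR2104731-47 and SRR6293456-63). Assuming the usual convention that the digits after the dash replace the trailing digits of the first accession, the Python sketch below expands them into explicit run IDs, e.g. to feed a download loop. expand_accession_range is a hypothetical helper, not part of get_data.sh.

```python
import re


def expand_accession_range(spec: str) -> list[str]:
    """Expand shorthand such as 'SRR2104731-47' into explicit run accessions.

    Assumes the digits after the dash replace the trailing digits of the
    first accession, so 'SRR2104731-47' covers SRR2104731 to SRR2104747.
    """
    m = re.fullmatch(r"([A-Z]+)(\d+)-(\d+)", spec)
    if m is None:
        raise ValueError(f"not an accession range: {spec!r}")
    prefix, start_digits, end_suffix = m.groups()
    if len(end_suffix) > len(start_digits):
        raise ValueError(f"range suffix longer than accession: {spec!r}")
    # Swap the shortened suffix into the start accession to get the end.
    end_digits = start_digits[: len(start_digits) - len(end_suffix)] + end_suffix
    start, end = int(start_digits), int(end_digits)
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    width = len(start_digits)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


if __name__ == "__main__":
    runs = []
    for spec in ("SRR2104731-47", "SRR6293456-63"):
        runs.extend(expand_accession_range(spec))
    print(len(runs))          # 25 (17 + 8)
    print(runs[0], runs[-1])  # SRR2104731 SRR6293463
```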
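supplementary_note.ipynb needs numpy, scipy, pandas and seaborn. A small Python sketch for checking that these are importable before opening the notebook follows; it only locates the packages with importlib.util.find_spec and does not check versions, since the README does not pin any. missing_packages is a hypothetical helper.

```python
import importlib.util

# Packages the README lists as requirements for supplementary_note.ipynb.
REQUIRED = ("numpy", "scipy", "pandas", "seaborn")


def missing_packages(names: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the top-level packages that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all packages needed by supplementary_note.ipynb are installed")
```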
Owner
- Name: Berlin Institute of Health
- Login: bihealth
- Kind: organization
- Website: https://www.cubi.bihealth.org/
- Repositories: 215
- Profile: https://github.com/bihealth
BIH Core Unit Bioinformatics & BIH HPC IT
GitHub Events
Total
- Push event: 1
- Public event: 1
Last Year
- Push event: 1
- Public event: 1