https://github.com/a-slide/dna_photolitography_seq

Repository containing analyses and datasets

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
✓
DOI references
Found 2 DOI reference(s) in README
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (10.4%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Repository containing analyses and datasets

Basic Info

Host: GitHub
Owner: a-slide
License: mit
Language: HTML
Default Branch: main
Homepage: https://a-slide.github.io/DNA_photolitography_seq
Size: 75.6 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created over 5 years ago · Last pushed over 5 years ago

https://github.com/a-slide/DNA_photolitography_seq/blob/main/

#  DNA photolithography sequencing methods and results

## Overview of sequencing and bioinformatic methods

### Sequence panel design

Sequences were designed to maximise the kmer content diversity while avoiding long internal homopolymers and oligonucleotides with a probability of having a strong secondary structure. To do so we started by generating 1 million random candidate sequences of 50 bases long, flanked by an 11 base "A" homopolymer in 3' and an 6 bases "C" homopolymer in 5'. These terminal homopolymers were added for library preparation and data processing reasons. No internal homopolymer longer that 5 bases was allowed during sequence generation. We ran RNAFold (ViennaRNA_2.4.3) to calculate the minimum free energy (MFE) of each sequences, and filtered out any sequence with an MFE lower than -17.6. We obtained a pool of 948,364 valid sequences covering all possible 16356 7-mers, not counting kmers containing homopolymers which were excluded. We then randomly sampled 20,000 sequences from the pool 500,000 times and selected the set of sequencing optimizing the 7-mers content.

* Full analysis notebook: [Jupyter notebook](https://a-slide.github.io/DNA_photolitography_seq/notebooks/Sequence_panel_design.html)
* Reference sequence panel generated: [FASTA](https://github.com/a-slide/DNA_photolitography_seq/raw/main/results/reference/DNA_test_sequence_panel_selection.fa.gz)

### Library preparation and Illumina sequencing

Samples were all prepared using the Accel-NGS 1S Plus DNA Library Kit from Swift Biosciences. We used 5 ng of each sample as input into the library preparation protocol following manufacturer's instructions.

In order to multiplex the samples for sequencing we used the following barcoded :
* O1 (index 2) = Normal DNA synthesis parameters
* O2 (index 4) = Cap protecting step between each iteration
* O3 (index 5) = Increase space between synthesis clusters

Final libraries were sequenced using a MiSeq Instrument and v2 Nano cartridge following the manufacture instructions in paired-end mode (2x150bp)

### Sequencing data analysis

Fastq files were obtained and demultiplexed with Illumina Casava bcl2fastq2 Conversion Software v2.20 and adapters were trimmed off using Cutadapt (v1.1.18) to a minimal read length of 20 bases.
Reads were subsequently aligned to the sequence panel reference previously generated using Bowtie2 (v2.3.4.3), then sorted and indexed with samtools (v1.9). Finally, we performed the error rate analysis overlayed on the flowcell layout or as a function the reference base position using python programming language via a Jupyter Notebook.

* Full analysis notebook: [Jupyter notebook](https://a-slide.github.io/DNA_photolitography_seq/notebooks/Illumina_error_rate_analysis.html)
* Plots generated during the analysis: [Flowcell layout error rate](https://github.com/a-slide/DNA_photolitography_seq/tree/main/results/flowcell_layout_error_rate/) and [Reference position error rate](https://github.com/a-slide/DNA_photolitography_seq/tree/main/results/ref_position_error_rate)
* Raw illumina datasets: [ENA_PRJEB43002](https://www.ebi.ac.uk/ena/browser/view/PRJEB43002)

## Citation

* Publication: []
* Repository: Adrien Leger. (2021, February 9). a-slide/DNA_photolitography_seq: v0.1 (Version v0.1). Zenodo. http://doi.org/10.5281/zenodo.4524788

## Licence

MIT

Copyright  2020 Adrien Leger

## Authors

Adrien Leger / aleg@ebi.ac.uk / https://adrienleger.com

Owner

Name: Adrien Leger
Login: a-slide
Kind: user
Location: Oxford, UK
Company: @nanoporetech

Website: https://adrienleger.com/
Twitter: AdrienLeger2
Repositories: 50
Profile: https://github.com/a-slide

Research scientist at Oxford Nanopore Technologies

GitHub Events

Total

Last Year

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

https://github.com/a-slide/dna_photolitography_seq

Science Score: 13.0%

Repository

Basic Info

Statistics

https://github.com/a-slide/DNA_photolitography_seq/blob/main/

Owner

GitHub Events

Total

Last Year

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels