https://github.com/casper-schutte/fantastic-lamp
Repository for the FantasticLamp pipeline
Fantastic-lamp:
Project Description:
This project aims to evaluate the success of genome editing by measuring the coverage of reads mapping to edited regions compared to the corresponding reference sequence. This is accomplished by aligning reads from the edited genome to a genome graph built from the reference and intended edits. The pipeline can simultaneously calculate the coverage of multiple populations, making it an efficient tool for quantifying the success of novel editing methods or verifying multiple edits. The efficacy of the edits can be inferred from the output, which is a TSV file containing the list of intended edits, their homology coverage, and their reference coverage.
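As one possible way to read the output, the sketch below loads a coverage TSV and computes, for each edit, the fraction of coverage that supports the homology arm. The column names (`edit`, `hom_cov`, `ref_cov`) and the ratio itself are assumptions made for illustration; the header the pipeline actually writes may differ, and the example rows are made up.

```python
import csv
import io


def edit_fractions(tsv_text, name_col="edit", hom_col="hom_cov", ref_col="ref_cov"):
    """Return {edit name: hom / (hom + ref)}, or None when both coverages are zero.

    The column names are hypothetical placeholders, not the pipeline's real header.
    """
    fractions = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        hom = float(row[hom_col])
        ref = float(row[ref_col])
        total = hom + ref
        fractions[row[name_col]] = hom / total if total > 0 else None
    return fractions


# Made-up rows for illustration only; this is not real pipeline output.
example = "edit\thom_cov\tref_cov\nedit_A\t30\t10\nedit_B\t0\t0\n"
print(edit_fractions(example))
```

A fraction near 1 would suggest that most reads over the edited region carry the edit, while `None` flags an edit with no informative reads at all.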
Installation and Dependencies:
This pipeline is initiated with the "find_coverage.sh" script from the command line and requires no explicit installation. All the dependencies can be installed with Conda from the "environment.yaml" file. This pipeline was developed and tested on Ubuntu 20.04 with Python 3.10, although earlier versions of Python may also be compatible. For more information about the system configuration that has been confirmed to run this pipeline correctly, please refer to the "Test.yml" file in the /workflows directory.
Verification and testing:
The following files from the /Test folder are strictly necessary:
- DesignLibraryDetailsODD126.withEditWindow.csv
- Datanames.txt
- environment.yaml
- refandmt.fna
- simple_test.fastq.gz
Copy the following scripts from the main page of the repository into the data folder:
- find_coverage.sh
- compare_coverage_read_info.py
ODD126augmentedCB39.fasta is not strictly necessary, but there will be an error message if the pipeline does not find it. However, the pipeline will still run correctly, as this test does not include reads from a vector plasmid sequence.
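Before running the test, it can help to confirm that everything is in place. The following is a minimal sketch of such a check, not part of the pipeline itself: the function name `check_test_folder` is hypothetical, the file names are taken from this section, and the script names follow the spelling used in the `cp` and `conda run` commands.

```python
from pathlib import Path

# Files this README lists as strictly necessary, plus the two scripts
# that have to be copied into the test folder.
REQUIRED = [
    "DesignLibraryDetailsODD126.withEditWindow.csv",
    "Datanames.txt",
    "environment.yaml",
    "refandmt.fna",
    "simple_test.fastq.gz",
    "find_coverage.sh",
    "compare_coverage_read_info.py",
]
# Missing optional files only cause an error message; the test still runs.
OPTIONAL = ["ODD126augmentedCB39.fasta"]


def check_test_folder(folder):
    """Return (missing_required, missing_optional) file names for `folder`."""
    folder = Path(folder)
    missing_required = [name for name in REQUIRED if not (folder / name).is_file()]
    missing_optional = [name for name in OPTIONAL if not (folder / name).is_file()]
    return missing_required, missing_optional


# Assumes the script is run from the main folder of the repository.
missing_required, missing_optional = check_test_folder("Test")
for name in missing_required:
    print(f"missing required file: {name}")
for name in missing_optional:
    print(f"missing optional file (expect an error message): {name}")
```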
The pipeline needs to be run with Conda.
Install dependencies:
conda env update --file environment.yaml
The Python script "compare_coverage_read_info.py" and the bash script "find_coverage.sh" need to be copied to the Test folder. In the main folder, run the following command:
cp compare_coverage_read_info.py find_coverage.sh Test/
Run pipeline: (Use this exact command)
```
conda run -n fantastic-lamp bash find_coverage.sh
```
Descriptions of steps and files used by the pipeline:
1) Homology arms (homarms) and the reference sequence for each homology arm (refhomarms) are extracted from DesignLibraryDetailsODD126.withEditWindow.csv and combined into a single file: ODD126refandhomarms.fa
2) minimap2 is used to map the homarms and refhomarms to the reference sequence (refandmt.fna), which includes both the reference sequence and its mitochondrial (mtDNA) sequence. The alignment is saved as ODD126refandhomarms.paf
3) refandmt.fna and ODD126refandhomarms.fa are combined with ODD126augmentedCB39.fasta (the plasmid sequence) to make yeast+edits.fa
4) Using yeast+edits.fa and the alignment from step 2, seqwish is used to create the variation graph (yeast+edits.gfa).
5) The graph is sorted and chopped using odgi and then converted into xg format (yeast+edits.og.gfa.xg), before finally being indexed (yeast+edits.og.gfa.gcsa).
6) The file Datanames.txt contains the names of the files which contain the sequencing reads. They are in .fastq.gz format. The script can handle paired-end reads. This can be changed in the bash script (the files need to be named appropriately). Datanames.txt is iterated over, and for each line (file) steps 7 and 8 are executed:
7) The reads from both files are mapped onto the graph (yeast+edits.og.gfa.xg), creating "filename".gaf
8) The python script "comparecoverage.py" is called with the .gaf file and the yeast+edits.og file as input. From the yeast+edits.og file, a dictionary is created mapping node IDs to path names. From this dictionary, homology arm and reference homology arm paths are created, and edges are created from the nodes of each path. Edges that are shared between the refhomarms and homarms are discarded. Then, the number of reads mapping to edges within homarms and refhomarms is counted (from the .gaf file) and put into a dictionary mapping each edge to its read count. The coverage of a path is calculated as the sum of the read counts of the edges in the path divided by the number of edges in the path. These coverages are written to a .tsv file.
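The coverage calculation in step 8 can be illustrated with a small sketch. It is not the repository's script: the function names are hypothetical, edge orientation is ignored for simplicity, each read is counted at most once per edge, and the node lists and reads are toy data. It relies only on the GAF convention that the sixth column holds the path as a series of `>` or `<` prefixed node IDs.

```python
import re
from collections import Counter

# One step of a GAF path: an orientation character followed by a node ID.
STEP = re.compile(r"[<>]([^<>]+)")


def edges_from_nodes(nodes):
    """Edges between consecutive nodes, stored as sorted pairs (orientation ignored)."""
    return [tuple(sorted(pair)) for pair in zip(nodes, nodes[1:])]


def private_edges(hom_nodes, ref_nodes):
    """Drop the edges shared by a homology arm and its reference arm."""
    hom = set(edges_from_nodes(hom_nodes))
    ref = set(edges_from_nodes(ref_nodes))
    shared = hom & ref
    return hom - shared, ref - shared


def count_edge_reads(gaf_lines, wanted):
    """Count, for every wanted edge, the reads whose path traverses it."""
    counts = Counter()
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete GAF record
        nodes = STEP.findall(fields[5])  # column 6 holds the graph path
        for edge in set(edges_from_nodes(nodes)):
            if edge in wanted:
                counts[edge] += 1
    return counts


def path_coverage(edges, counts):
    """Sum of per-edge read counts divided by the number of edges."""
    return sum(counts[e] for e in edges) / len(edges) if edges else 0.0


def gaf_line(name, path):
    """Build a toy 12-column GAF record; only the path column matters here."""
    return "\t".join([name, "100", "0", "100", "+", path,
                      "300", "0", "100", "100", "100", "60"])


# Toy data: the arms share edge 1-2 and differ at nodes 3 and 4.
gaf = [gaf_line("read1", ">1>2>3>5"),
       gaf_line("read2", "<5<3<2"),
       gaf_line("read3", ">2>4")]
hom_edges, ref_edges = private_edges(["1", "2", "3", "5"], ["1", "2", "4", "5"])
counts = count_edge_reads(gaf, hom_edges | ref_edges)
print(path_coverage(hom_edges, counts), path_coverage(ref_edges, counts))  # 2.0 0.5
```

Discarding the shared edges means that only reads spanning the positions where the edit and the reference differ contribute to either coverage value.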
Compiling the paper:
- Download the /paper/ folder.
- In the folder, run:
```
make clean
make
```
This will compile main.pdf.
Owner
- Login: casper-schutte
- Kind: user
- Repositories: 1
- Profile: https://github.com/casper-schutte
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
- continuumio/miniconda3 latest build