https://github.com/barakcohenlab/crx-dms-manuscript
Code for data analysis relevant to CRX DMS project
Repository
Basic Info
- Host: GitHub
- Owner: barakcohenlab
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 5.25 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
This repository includes code for the analyses described in:
"Mutational scanning of CRX classifies clinical variants and reveals biochemical properties of the transcriptional effector domain" (10.1101/gr.279415.124)
James L. Shepherdson¹﹐², David M. Granas¹﹐², Jie Li¹﹐², Zara Shariff¹﹐², Stephen P. Plassmeyer³﹐⁴, Alex S. Holehouse³﹐⁴, Michael A. White¹﹐², Barak A. Cohen¹﹐²^
¹Department of Genetics, ²Edison Family Center for Genome Sciences & Systems Biology, ³Department of Biochemistry and Molecular Biophysics, ⁴Center for Molecular Condensates, Washington University in St. Louis School of Medicine, St. Louis, MO 63110, USA
^ Correspondence: Barak A. Cohen (cohen@wustl.edu)
Description
- DMS_analysis: code and scripts that analyze barcode abundance data from sorted fractions to compute variant activity scores and generate figures
- call_variants.py: generates a barcode-to-variant map from PacBio long-read sequencing data to associate sequence barcodes with coding variants
- tools: software tools developed for this manuscript and used by the other scripts:
  - bcbuddy extracts barcodes from sequencing reads
  - dms_tools includes a library for parsing the output of minimap2 for long-read data analysis; used by call_variants.py
  - fphd implements a parallelized Hamming distance calculation for barcode error correction
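To illustrate the idea behind Hamming-distance barcode error correction, here is a minimal, unparallelized Python sketch. It is not the fphd implementation: the function names `hamming` and `correct_barcode` and the `max_dist` threshold are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed: str, known: Iterable[str], max_dist: int = 1) -> Optional[str]:
    """Return the single closest known barcode within max_dist, else None.

    `known` is assumed to contain distinct barcodes. An observed barcode
    that is equally close to two known barcodes is treated as ambiguous.
    """
    best = None
    best_dist = max_dist + 1  # anything at or above this is too far away
    tie = False
    for candidate in known:
        if len(candidate) != len(observed):
            continue  # Hamming distance is undefined for unequal lengths
        d = hamming(observed, candidate)
        if d < best_dist:
            best, best_dist, tie = candidate, d, False
        elif d == best_dist and d <= max_dist:
            tie = True
    return None if tie else best
```

For example, `correct_barcode("ACGTAA", ["ACGTAC", "TTTTTT"])` maps the observed read to `"ACGTAC"`. This sketch compares each observed barcode against every known barcode, which scales poorly; a parallelized implementation is presumably meant to address that cost.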
Notes
crx_genomic_positions.tsv is provided to translate protein-level variant coordinates into cDNA and gDNA coordinates. To produce a VCF-compatible file, awk or a similar tool can be used. For example:
awk 'BEGIN {FS="\t"; OFS="\t"} {print 19, $5, ".", $6, $7, ".", "PASS", "var="$1$2$3}' crx_genomic_positions.tsv
Note that you may wish to first filter the crx_genomic_positions table to particular variants of interest, or join it with the DMS or computational predictor data to include scores in the VCF.
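The filtering step mentioned above could be sketched in Python as follows. This mirrors the awk command and assumes the same column layout: columns 1 to 3 concatenate into the protein-level variant label, and columns 5, 6, and 7 fill the VCF POS, REF, and ALT fields. The function name `tsv_to_vcf_rows` and the `keep` parameter are hypothetical, and the column meanings are inferred from the awk command rather than from the file itself.

```python
import csv
from typing import Iterable, Iterator, Optional, Set


def tsv_to_vcf_rows(
    lines: Iterable[str],
    keep: Optional[Set[str]] = None,
    chrom: str = "19",
) -> Iterator[str]:
    """Yield tab-separated VCF-style rows from crx_genomic_positions-style lines.

    `keep`, if given, restricts output to those variant labels
    (columns 1-3 concatenated, as in the awk example).
    """
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < 7:
            continue  # skip blank or short lines
        variant = fields[0] + fields[1] + fields[2]
        if keep is not None and variant not in keep:
            continue
        # Same field order as the awk command: CHROM POS ID REF ALT QUAL FILTER INFO
        yield "\t".join(
            [chrom, fields[4], ".", fields[5], fields[6], ".", "PASS", "var=" + variant]
        )
```

To run it on the file, open it with `open("crx_genomic_positions.tsv", newline="")` and pass the file object as `lines`. Like the awk command, this emits only data rows, and any header line in the table passes through as a row unless `keep` excludes it.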
Owner
- Name: BarakCohenLab
- Login: barakcohenlab
- Kind: organization
- Repositories: 3
- Profile: https://github.com/barakcohenlab