https://github.com/barakcohenlab/crx-dms-manuscript

Code for data analysis relevant to CRX DMS project



Basic Info
  • Host: GitHub
  • Owner: barakcohenlab
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 5.25 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed almost 2 years ago

README.md

This repository includes code for the analyses described in:

"Mutational scanning of CRX classifies clinical variants and reveals biochemical properties of the transcriptional effector domain" (10.1101/gr.279415.124)

James L. Shepherdson¹,²
David M. Granas¹,²
Jie Li¹,²
Zara Shariff¹,²
Stephen P. Plassmeyer³,⁴
Alex S. Holehouse³,⁴
Michael A. White¹,²
Barak A. Cohen¹,²^

¹Department of Genetics
²Edison Family Center for Genome Sciences & Systems Biology
³Department of Biochemistry and Molecular Biophysics
⁴Center for Molecular Condensates
Washington University in St. Louis School of Medicine, St. Louis, MO 63110, USA

^ Correspondence: Barak A. Cohen (cohen@wustl.edu)

Description

  • DMS_analysis: code and scripts for analyzing barcode abundance data from sorted fractions, computing variant activity scores, and generating figures
    • call_variants.py generates a barcode-to-variant map from PacBio long-read sequencing data to associate sequence barcodes with coding variants
  • tools: software tools developed for this manuscript and used by the other scripts:
    • bcbuddy extracts barcodes from sequencing reads
    • dms_tools provides a library for parsing minimap2 output for long-read data analysis; it is used by call_variants.py
    • fphd implements a parallelized Hamming distance calculation for barcode error correction
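The barcode error-correction step that fphd performs can be illustrated with a minimal, serial sketch. This is not fphd's code or API: the function names (hamming, correct_barcodes, corrected_counts), the list of known barcodes used as a whitelist, the one-mismatch threshold, and the rule of discarding ties are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcodes(observed, known, max_dist=1):
    """Map each distinct observed barcode to a known barcode (hypothetical helper).

    An observed barcode is assigned to the unique known barcode at the
    smallest Hamming distance, provided that distance is <= max_dist.
    Barcodes with no known barcode in range, or with a tie at the smallest
    distance, are left out of the returned dict (assumed policy).
    """
    known = list(known)
    known_set = set(known)
    mapping = {}
    for obs in set(observed):
        if obs in known_set:  # exact match, nothing to correct
            mapping[obs] = obs
            continue
        best, best_d, tied = None, max_dist + 1, False
        for ref in known:
            if len(ref) != len(obs):
                continue  # Hamming distance is only defined for equal lengths
            d = hamming(obs, ref)
            if d < best_d:
                best, best_d, tied = ref, d, False
            elif d == best_d:
                tied = True
        if best is not None and not tied:
            mapping[obs] = best
    return mapping


def corrected_counts(reads, known, max_dist=1):
    """Collapse per-read barcodes into error-corrected counts (hypothetical helper)."""
    raw = Counter(reads)
    mapping = correct_barcodes(raw, known, max_dist)
    counts = Counter()
    for barcode, n in raw.items():
        if barcode in mapping:
            counts[mapping[barcode]] += n
    return counts
```

With toy sequences, corrected_counts(["ACGTAC", "ACGTAA", "TTGCAA"], ["ACGTAC", "TTGCAA"]) folds the one-mismatch read ACGTAA into ACGTAC, giving two counts for ACGTAC and one for TTGCAA. The brute-force search compares every observed barcode against every known barcode, which is the cost a parallelized implementation such as fphd is meant to reduce.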

Notes

crx_genomic_positions.tsv is provided to translate protein-level variant coordinates into cDNA and gDNA coordinates. To produce a VCF-compatible file, awk or a similar tool can be used. For example:

awk 'BEGIN {FS="\t"; OFS="\t"} {print 19, $5, ".", $6, $7, ".", "PASS", "var="$1$2$3}' crx_genomic_positions.tsv

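You may wish to first filter crx_genomic_positions.tsv to particular variants of interest, or join it with the DMS or computational predictor data to include scores in the VCF. The command prints VCF data lines only, so prepend the required header lines (the ##fileformat line and the #CHROM column line) before passing its output to tools that expect a complete VCF.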
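A minimal Python sketch of the same conversion, which also writes VCF header lines, optionally keeps only selected variants, and optionally attaches a per-variant score, might look like this. It assumes the column layout the awk command relies on (columns 1 to 3 form the variant name, column 5 is the genomic position, columns 6 and 7 are the reference and alternate alleles) and chromosome 19 as in the awk command. Whether the TSV has a header row, the function names (read_positions, positions_to_vcf), and the score INFO field are assumptions, not part of this repository.

```python
import csv


def read_positions(path, has_header=False):
    """Read a tab-separated table into lists of strings.

    Whether crx_genomic_positions.tsv has a header row is an assumption
    left to the caller via has_header.
    """
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]
    return rows[1:] if has_header else rows


def positions_to_vcf(rows, keep=None, scores=None, chrom="19"):
    """Build VCF text from rows laid out like the awk example (hypothetical helper)."""
    header = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=var,Number=1,Type=String,Description="Protein-level variant">',
    ]
    if scores is not None:
        # Hypothetical INFO field for a per-variant score (e.g. DMS activity).
        header.append(
            '##INFO=<ID=score,Number=1,Type=Float,Description="Variant score">'
        )
    header.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO")

    records = []
    for row in rows:
        var = row[0] + row[1] + row[2]  # same as "var="$1$2$3 in the awk command
        if keep is not None and var not in keep:
            continue  # optional filter to variants of interest
        info = f"var={var}"
        if scores is not None and var in scores:
            info += f";score={scores[var]}"
        fields = [chrom, row[4], ".", row[5], row[6], ".", "PASS", info]
        records.append((int(row[4]), fields))

    # VCF records should be sorted by position within a chromosome.
    records.sort(key=lambda rec: rec[0])
    lines = header + ["\t".join(fields) for _, fields in records]
    return "\n".join(lines) + "\n"
```

For example, positions_to_vcf(read_positions("crx_genomic_positions.tsv"), keep=variants_of_interest, scores=dms_scores) would return VCF text, where variants_of_interest is a set of variant names and dms_scores is a dict from variant names to scores that you would build from the DMS or predictor tables.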

Owner

  • Name: BarakCohenLab
  • Login: barakcohenlab
  • Kind: organization
