https://github.com/broadinstitute/prism_data_processing

Data processing pipeline for PRISM MTS

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: science.org
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
    Organization broadinstitute has institutional domain (www.broadinstitute.org)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary
Last synced: 7 months ago

Repository

Data processing pipeline for PRISM MTS

Basic Info
  • Host: GitHub
  • Owner: broadinstitute
  • License: bsd-3-clause
  • Language: R
  • Default Branch: master
  • Size: 5.46 MB
Statistics
  • Stars: 8
  • Watchers: 21
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed about 4 years ago
Metadata Files
Readme License

README.md

This repository is deprecated; please see the new MTS pipeline.

Prism Data Processing

Public version of the data processing pipeline for PRISM medium throughput screens (MTS). It is intended for collaborators who want to regenerate tables and plots correlating drug response with cell line features. Public cell line data for equivalent analysis is available on the DepMap Portal.

For more general biomarker analysis, see the cdsr_biomarker package, which is based on models in the cdsr_models package.

This repository contains 3 primary scripts:

  1. make_logMFI.R
  2. MTS_Data_Processing.R
  3. MTS_Analysis.R

FIRST run setup.R either in RStudio or terminal to install packages required for analysis. For shell execute:

    cd ./prism_data_processing
    Rscript src/setup.R

For more information and FAQs see the info folder.

make_logMFI.R

Only necessary for processing raw files downloaded from clue.io.

Converts raw .gctx and .txt files downloaded from clue.io to readable logMFI.csv. This file contains raw log2 median fluorescence intensity (MFI) data for each cell line at each treatment. Requires:

  • PR300_LMFI.gctx: readout for PR300
  • PR500_LMFI.gctx: readout for PR500
  • PR300_inst_info.txt: treatment info for PR300
  • PR300_cell_info.txt: cell line info for PR300
  • PR500_inst_info.txt: treatment info for PR500
  • PR500_cell_info.txt: cell line info for PR500
  • skipped_wells.csv: file indicating which wells did not receive compound (optional)
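
Before running the script, it can help to confirm that the inputs listed above are actually in place. The following is a minimal sketch in base R, not code from this repository: the helper name check_inputs and the idea of a single download directory are assumptions made for illustration.

```r
# Hypothetical helper, not part of the repository: report which of the
# input files named in this README are missing from a download directory.
required_files <- c(
  "PR300_LMFI.gctx", "PR500_LMFI.gctx",
  "PR300_inst_info.txt", "PR300_cell_info.txt",
  "PR500_inst_info.txt", "PR500_cell_info.txt"
)
optional_files <- c("skipped_wells.csv")

check_inputs <- function(data_dir) {
  # TRUE for each file that exists inside data_dir.
  present <- function(files) file.exists(file.path(data_dir, files))
  list(
    missing_required = required_files[!present(required_files)],
    missing_optional = optional_files[!present(optional_files)]
  )
}
```

Calling check_inputs() on the download folder returns two character vectors; an empty missing_required vector means every required file was found.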

MTS_Data_Processing.R

Pre-processes the raw median fluorescence intensity (MFI) values and generates tables with data on QC, viabilities, and dose-response parameters, as well as figures showing dose-response curves.

The pre-processing steps are outlined in MTS_pipeline.md.
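
To give a rough sense of how a viability value can be derived from log2 MFI data, here is a toy sketch in base R. It is not the repository's method (MTS_pipeline.md is the authority for that). The column names, the "DMSO" control label, and the normalization against the per-cell-line control median are all assumptions chosen for illustration, and the numbers are made up.

```r
# Illustrative only: the column names, the "DMSO" control label, and the
# normalization are assumptions, not the repository's actual procedure.
logmfi <- data.frame(
  cell_line = rep(c("A", "B"), each = 3),
  treatment = rep(c("DMSO", "DMSO", "drug"), times = 2),
  logMFI    = c(10.0, 10.2, 9.1, 12.0, 11.8, 11.9)  # made-up values
)

compute_viability <- function(df, control = "DMSO") {
  ctrl <- df[df$treatment == control, ]
  # Median control signal per cell line, on the log2 scale.
  ctrl_median <- tapply(ctrl$logMFI, ctrl$cell_line, median)
  treated <- df[df$treatment != control, ]
  baseline <- as.numeric(ctrl_median[as.character(treated$cell_line)])
  # Log2 fold change versus control; 2^LFC is a viability-like ratio.
  treated$LFC <- treated$logMFI - baseline
  treated$viability <- 2^treated$LFC
  treated
}

viab <- compute_viability(logmfi)
```

Working on the log2 scale turns the ratio to control into a simple subtraction, which is why the fold change is computed before exponentiating.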

MTS_Analysis.R

Generates biomarker analyses for the processed data, including univariate and multivariate analyses. Requires a directory of expression data (RNA, mutations, etc.) and the results of MTS_Data_Processing.R. See below for more details on each analysis function. Relies on analysis_functions.R.

NOTE: in order to run biomarker analysis, files containing omics data for cell lines must be downloaded from DepMap. In particular we recommend the latest versions of:

  • Achilles_gene_effect.csv (CRISPR dependencies)
  • CCLE_expression.csv (gene expression)
  • primary-screen-replicate-collapsed-logfold-change.csv (repurposing)
  • D2_Achilles_gene_dep_scores.csv (shRNA)
  • CCLE_metabolomics_20190502.csv (metabolomics)
  • CCLE_RPPA_20181003.csv (proteomics)
  • CCLE_miRNA_20181103.gct (miRNA)
  • CCLE_gene_cn.csv (copy number)
  • CCLE_mutations.csv (mutations)
  • sample_info.csv (lineages)

Once these files are downloaded, use biomarker_tables.R to convert tables to matrix form (this may require tweaking due to changes in file structures). This R script also generates two combined datasets: x-ccle.csv and x-all.csv. These are used for multivariate models and are based on CCLE data and all DepMap data respectively.
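
As a rough illustration of what converting a table to matrix form can look like, here is a base R sketch. It is not taken from biomarker_tables.R. The function name table_to_matrix, the assumption that the first column holds cell line identifiers, and the toy file contents are all hypothetical, and real DepMap files may be laid out differently.

```r
# Assumption: the first column holds cell line identifiers and the remaining
# columns are numeric features. This is a hypothetical helper, not repo code.
table_to_matrix <- function(path) {
  tab <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  mat <- as.matrix(tab[, -1, drop = FALSE])  # drop the identifier column
  rownames(mat) <- tab[[1]]                  # identifiers become row names
  mat
}

# Toy file standing in for a downloaded table (names and values are made up).
toy <- tempfile(fileext = ".csv")
writeLines(c("id,GENE_A,GENE_B",
             "line_1,0.1,2.5",
             "line_2,-0.3,1.0"), toy)
m <- table_to_matrix(toy)
```

Setting check.names = FALSE keeps column headers exactly as they appear in the file, which matters when feature names contain spaces or parentheses.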

MTS_functions.R and analysis_functions.R

These files contain helper functions used in the scripts above and are sourced at the beginning of each (to install necessary packages and define functions). analysis_functions.R contains functions used in MTS_Analysis.R to generate biomarker analyses, while MTS_functions.R contains functions used in MTS_Data_Processing.R.

In analysis_functions.R, each function takes a matrix of features (X) and a vector of responses (y) as input. The multivariate function fits both an elastic net and a random forest to the data.
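
The X-and-y calling convention can be illustrated with a small univariate example in base R. This is a sketch under stated assumptions, not the code in analysis_functions.R: the function name univariate_cor, the use of Pearson correlation, and the returned columns are hypothetical, and the data are simulated.

```r
# Hypothetical function: the name and return format are illustrative and
# do not mirror analysis_functions.R.
univariate_cor <- function(X, y) {
  # Keep only samples present in both the features and the response.
  shared <- intersect(rownames(X), names(y))
  X <- X[shared, , drop = FALSE]
  y <- y[shared]
  res <- lapply(colnames(X), function(feature) {
    test <- cor.test(X[, feature], y)  # Pearson correlation by default
    data.frame(feature = feature,
               correlation = unname(test$estimate),
               p_value = test$p.value)
  })
  res <- do.call(rbind, res)
  res[order(res$p_value), ]  # strongest associations first
}

# Simulated data: the response depends on feature f1 only.
set.seed(1)
X <- matrix(rnorm(60), nrow = 20,
            dimnames = list(paste0("line_", 1:20), c("f1", "f2", "f3")))
y <- 2 * X[, "f1"] + rnorm(20, sd = 0.1)
top <- univariate_cor(X, y)
```

Matching rows of X to names of y before testing guards against the two inputs covering different sets of cell lines.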


This repository is maintained by Cancer Data Science at the Broad Institute of MIT and Harvard.

Owner

  • Name: Broad Institute
  • Login: broadinstitute
  • Kind: organization
  • Location: Cambridge, MA

Broad Institute of MIT and Harvard

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 74
  • Total Committers: 1
  • Avg Commits per committer: 74.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
aboghossian a****s@b****g 74

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 1.2
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hendrikweisser (4)
  • tylerhill90 (1)