https://github.com/broadinstitute/prism_data_processing
Data processing pipeline for PRISM MTS
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to science.org
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ✓ Institutional organization owner: organization broadinstitute has institutional domain (www.broadinstitute.org)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.2%) to scientific vocabulary
Repository
Data processing pipeline for PRISM MTS
Basic Info
- Host: GitHub
- Owner: broadinstitute
- License: bsd-3-clause
- Language: R
- Default Branch: master
- Size: 5.46 MB
Statistics
- Stars: 8
- Watchers: 21
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
This repository is deprecated; please see the new MTS pipeline.
Prism Data Processing
Public version of the data processing pipeline for PRISM medium throughput screens (MTS). For use by collaborators to regenerate tables and plots correlating drug response to cell line features. Public cell line data for equivalent analysis is available on the DepMap Portal.
For more general biomarker analysis see the cdsr_biomarker package which is based on models in the cdsr_models package.
This repository contains 3 primary scripts:
- make_logMFI.R
- MTS_Data_Processing.R
- MTS_Analysis.R
First, run setup.R, either in RStudio or from a terminal, to install the packages required for analysis. From a shell, execute:
```bash
cd ./prism_data_processing
Rscript src/setup.R
```
For more information and FAQs see the info folder.
make_logMFI.R
Only necessary for processing raw files downloaded from clue.io.
Converts raw .gctx and .txt files downloaded from clue.io to readable logMFI.csv. This file contains raw log2 median fluorescence intensity (MFI) data for each cell line at each treatment. Requires:
- PR300_LMFI.gctx: readout for PR300
- PR500_LMFI.gctx: readout for PR500
- PR300_inst_info.txt: treatment info for PR300
- PR300_cell_info.txt: cell line info for PR300
- PR500_inst_info.txt: treatment info for PR500
- PR500_cell_info.txt: cell line info for PR500
- skipped_wells.csv: file indicating which wells did not receive compound (optional)
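Before running the script, it can help to confirm that the required inputs are in place. The following is a minimal sketch in base R, not part of the repository: the helper name `missing_inputs` and the directory layout are assumptions made for illustration, and the optional skipped_wells.csv is deliberately left out of the check.

```r
# Required inputs listed above (skipped_wells.csv is optional and not checked)
required_files <- c(
  "PR300_LMFI.gctx", "PR500_LMFI.gctx",
  "PR300_inst_info.txt", "PR300_cell_info.txt",
  "PR500_inst_info.txt", "PR500_cell_info.txt"
)

# Hypothetical helper: return the required files that are absent from data_dir
missing_inputs <- function(data_dir, files = required_files) {
  files[!file.exists(file.path(data_dir, files))]
}

# Demonstration on a temporary directory that holds only one of the inputs
demo_dir <- tempfile("prism_raw_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "PR300_LMFI.gctx"))

missing <- missing_inputs(demo_dir)
print(missing)
```

In practice `data_dir` would point at the folder holding the files downloaded from clue.io.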
MTS_Data_Processing.R
Pre-processes the raw median fluorescence intensity (MFI) values and generates tables of QC metrics, viabilities, and dose-response parameters, as well as figures showing dose-response curves.
The pre-processing steps are outlined in MTS_pipeline.md.
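To give a rough sense of how a viability value relates to log2 MFI, here is a simplified sketch in base R. It is not the pipeline's actual procedure (see MTS_pipeline.md for that, which may include normalization and QC steps omitted here). The toy values, the column names, and the use of "DMSO" as the vehicle-control label are all assumptions.

```r
# Toy log2 MFI values for one cell line on one plate (all numbers are made up)
logMFI <- data.frame(
  treatment = c("DMSO", "DMSO", "DMSO", "drug", "drug"),
  dose      = c(NA, NA, NA, 0.1, 10),
  logMFI    = c(10.0, 10.2, 9.8, 9.0, 7.0)
)

# Reference level: median of the vehicle-control wells (assumed label "DMSO")
control_median <- median(logMFI$logMFI[logMFI$treatment == "DMSO"])

# log2 fold change relative to control; because the data are on a log2 scale,
# exponentiating the fold change gives a viability relative to control
logMFI$LFC <- logMFI$logMFI - control_median
logMFI$viability <- 2^logMFI$LFC

print(logMFI)
```

With these toy numbers, a drop of one log2 unit corresponds to half the control signal.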
MTS_Analysis.R
Runs biomarker analysis on the processed data, including univariate and multivariate analyses. Requires a directory of expression data (RNA, mutations, etc.) and the results of MTS_Data_Processing.R. See below for more details on each analysis function. Relies on analysis_functions.R.
NOTE: in order to run biomarker analysis, files containing omics data for cell lines must be downloaded from DepMap. In particular we recommend the latest versions of:
- Achilles_gene_effect.csv (CRISPR dependencies)
- CCLE_expression.csv (gene expression)
- primary-screen-replicate-collapsed-logfold-change.csv (repurposing)
- D2_Achilles_gene_dep_scores.csv (shRNA)
- CCLE_metabolomics_20190502.csv (metabolomics)
- CCLE_RPPA_20181003.csv (proteomics)
- CCLE_miRNA_20181103.gct (miRNA)
- CCLE_gene_cn.csv (copy number)
- CCLE_mutations.csv (mutations)
- sample_info.csv (lineages)
Once these files are downloaded, use biomarker_tables.R to convert tables to matrix form (this may require tweaking due to changes in file structures). This R script also generates two combined datasets: x-ccle.csv and x-all.csv. These are used for multivariate models and are based on CCLE data and all DepMap data respectively.
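As an illustration of what "matrix form" and a combined dataset can look like, the sketch below pivots a toy long-format table into a cell line by feature matrix and column-binds it with a second matrix on the shared cell lines. It uses base R only and is not taken from biomarker_tables.R; the column names and the `GE_`/`MUT_` prefixes are hypothetical.

```r
# Toy long-format table (made-up values), one row per cell line / gene pair
long <- data.frame(
  cell_line = c("A", "A", "B", "B"),
  gene      = c("G1", "G2", "G1", "G2"),
  value     = c(1.5, 0.2, 3.1, 0.0)
)

# Pivot to a cell line x feature matrix (rows: cell lines, columns: genes)
mat <- tapply(long$value, list(long$cell_line, long$gene), FUN = mean)

# Hypothetical prefix so features stay distinguishable after combining datasets
colnames(mat) <- paste0("GE_", colnames(mat))

# A second toy matrix that covers a different set of cell lines
other <- matrix(c(0, 1, 1), ncol = 1,
                dimnames = list(c("A", "B", "C"), "MUT_G1"))

# Combine on the cell lines present in both matrices
shared <- intersect(rownames(mat), rownames(other))
x_combined <- cbind(mat[shared, , drop = FALSE],
                    other[shared, , drop = FALSE])

print(x_combined)
```

Restricting to shared cell lines keeps the combined matrix free of rows that are missing an entire dataset; whether the repository's script does the same is not stated here.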
MTS_functions.R and analysis_functions.R
These files contain helper functions used in the scripts above and are sourced at the beginning of each (to install necessary packages and define functions). analysis_functions.R contains functions used in MTS_Analysis.R to generate biomarker analyses, while MTS_functions.R contains functions used in MTS_Data_Processing.R.
In analysis_functions.R, each function takes a matrix of features (X) and a vector of responses (y) as input. The multivariate function fits both an elastic net and a random forest to the data.
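The X/y calling convention can be illustrated with a small univariate example. The function below is a hypothetical sketch in base R, not one of the functions in analysis_functions.R: it computes a Pearson correlation between each feature column and the response, with Benjamini-Hochberg adjusted p-values. The function name and the output columns are assumptions.

```r
# Hypothetical sketch: correlate each column of X with the response vector y
univariate_cor_sketch <- function(X, y) {
  stopifnot(is.matrix(X), nrow(X) == length(y))
  res <- t(apply(X, 2, function(x) {
    ok <- is.finite(x) & is.finite(y)        # drop missing or infinite pairs
    ct <- cor.test(x[ok], y[ok])             # Pearson correlation test
    c(rho = unname(ct$estimate), p_value = ct$p.value)
  }))
  res <- as.data.frame(res)
  res$feature <- rownames(res)
  res$q_value <- p.adjust(res$p_value, method = "BH")
  res[order(res$p_value), c("feature", "rho", "p_value", "q_value")]
}

# Toy data: the response depends strongly on feature f1 only
set.seed(1)
X <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- 2 * X[, "f1"] + rnorm(20, sd = 0.1)

result <- univariate_cor_sketch(X, y)
print(result)
```

Rows of X are assumed to be cell lines in the same order as the entries of y; aligning the two by cell line identifier is left to the caller.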
This repository is maintained by Cancer Data Science at the Broad Institute of MIT and Harvard.
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| aboghossian | a****s@b****g | 74 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 5
- Total pull requests: 0
- Average time to close issues: about 1 month
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 1.2
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- hendrikweisser (4)
- tylerhill90 (1)