https://github.com/broadinstitute/prism_data_processing

Data processing pipeline for PRISM MTS

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to science.org
✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
✓ Institutional organization owner: Organization broadinstitute has institutional domain (www.broadinstitute.org)
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: broadinstitute
License: bsd-3-clause
Language: R
Default Branch: master
Size: 5.46 MB

Statistics

Stars: 8
Watchers: 21
Forks: 2
Open Issues: 0
Releases: 0

Created almost 7 years ago · Last pushed over 4 years ago

Metadata Files

Readme, License

README.md

This repository is deprecated; please see the new MTS pipeline.

Prism Data Processing

Public version of the data processing pipeline for PRISM medium throughput screens (MTS). For use by collaborators to regenerate tables and plots correlating drug response to cell line features. Public cell line data for equivalent analysis is available on the DepMap Portal.

For more general biomarker analysis see the cdsr_biomarker package which is based on models in the cdsr_models package.

This repository contains 3 primary scripts:

- make_logMFI.R
- MTS_Data_Processing.R
- MTS_Analysis.R

First, run setup.R, either in RStudio or from a terminal, to install the packages required for analysis. From a shell, execute:

    cd ./prism_data_processing
    Rscript src/setup.R

For more information and FAQs see the info folder.

`make_logMFI.R`

Only necessary for processing raw files downloaded from clue.io.

Converts raw .gctx and .txt files downloaded from clue.io to a readable logMFI.csv. This file contains raw log2 median fluorescence intensity (MFI) data for each cell line at each treatment. Requires:

- PR300_LMFI.gctx: readout for PR300
- PR500_LMFI.gctx: readout for PR500
- PR300_inst_info.txt: treatment info for PR300
- PR300_cell_info.txt: cell line info for PR300
- PR500_inst_info.txt: treatment info for PR500
- PR500_cell_info.txt: cell line info for PR500
- skipped_wells.csv: file indicating which wells did not receive compound (optional)
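Because make_logMFI.R depends on several input files, it can help to confirm they are all present before running it. The following is a minimal sketch in base R, not part of the repository: the helper names `missing_inputs` and `check_inputs` are hypothetical, and the file names are taken from the list above (the optional skipped_wells.csv is not checked).

```r
# Hypothetical helpers (not from the repository): check that the clue.io
# inputs listed in the README are present in a directory.
required_inputs <- c(
  "PR300_LMFI.gctx", "PR500_LMFI.gctx",
  "PR300_inst_info.txt", "PR300_cell_info.txt",
  "PR500_inst_info.txt", "PR500_cell_info.txt"
)

# Return the names of required files that are absent from `dir`.
missing_inputs <- function(dir, required = required_inputs) {
  required[!file.exists(file.path(dir, required))]
}

# Stop with a readable message if anything is missing; otherwise return TRUE invisibly.
check_inputs <- function(dir) {
  absent <- missing_inputs(dir)
  if (length(absent) > 0) {
    stop("Missing input files: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling `check_inputs()` on the download directory before sourcing make_logMFI.R surfaces a missing file as a single clear error rather than a failure partway through the run.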

`MTS_Data_Processing.R`

Pre-processes raw median fluorescence intensity (MFI) values and generates tables of QC metrics, viabilities, and dose-response parameters, as well as figures showing dose-response curves.

The pre-processing steps are outlined in MTS_pipeline.md.
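To give a rough idea of how viabilities can be derived from logMFI values, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the procedure in MTS_pipeline.md: the function name `compute_viability`, the column names (`cell_line`, `well_type`, `logMFI`), the `"vehicle"` label, and the choice to normalize to the per-cell-line median of vehicle controls are all hypothetical.

```r
# Sketch only: column names and the normalization scheme are assumptions,
# not taken from the repository.
# Viability = 2^(logMFI - median logMFI of vehicle controls), per cell line.
compute_viability <- function(df, control_label = "vehicle") {
  is_ctrl <- df$well_type == control_label
  # Median control logMFI for each cell line
  ctrl_median <- tapply(df$logMFI[is_ctrl], df$cell_line[is_ctrl], median)
  # Look up each row's control median and drop names/dims from the lookup
  baseline <- as.numeric(ctrl_median[as.character(df$cell_line)])
  # Log2 fold change relative to control, then back-transform to a ratio
  df$LFC <- df$logMFI - baseline
  df$viability <- 2^df$LFC
  df
}
```

Under this scheme a viability of 1 means the well matches its vehicle control, and 0.5 means the signal is half that of the control.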

`MTS_Analysis.R`

Generates biomarker analyses for the processed data, including univariate and multivariate analyses. Requires a directory of expression data (RNA, mutations, etc.) and the results of MTS_Data_Processing.R. See below for more details on each analysis function. Relies on analysis_functions.R.

NOTE: To run the biomarker analysis, files containing omics data for cell lines must be downloaded from DepMap. In particular, we recommend the latest versions of:

- Achilles_gene_effect.csv (CRISPR dependencies)
- CCLE_expression.csv (gene expression)
- primary-screen-replicate-collapsed-logfold-change.csv (repurposing)
- D2_Achilles_gene_dep_scores.csv (shRNA)
- CCLE_metabolomics_20190502.csv (metabolomics)
- CCLE_RPPA_20181003.csv (proteomics)
- CCLE_miRNA_20181103.gct (miRNA)
- CCLE_gene_cn.csv (copy number)
- CCLE_mutations.csv (mutations)
- sample_info.csv (lineages)

Once these files are downloaded, use biomarker_tables.R to convert the tables to matrix form (this may require tweaking due to changes in file structures). This R script also generates two combined datasets: x-ccle.csv and x-all.csv. These are used for multivariate models and are based on CCLE data and all DepMap data, respectively.
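As an illustration of what converting a table to matrix form can involve, the sketch below pivots a long table into a cell line by feature matrix using base R. It is not the code in biomarker_tables.R: the function name `long_to_matrix` and the column names (`cell_line`, `feature`, `value`) are hypothetical, and actual DepMap files differ in layout between datasets and releases.

```r
# Sketch only: assumes a long table with columns cell_line, feature, value
# (hypothetical names; real DepMap layouts vary).
long_to_matrix <- function(long) {
  rows <- sort(unique(as.character(long$cell_line)))
  cols <- sort(unique(as.character(long$feature)))
  m <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  # Fill cells by (row name, column name) pairs; absent pairs stay NA
  idx <- cbind(as.character(long$cell_line), as.character(long$feature))
  m[idx] <- long$value
  m
}
```

Cell line and feature combinations that do not appear in the input are left as `NA`, so downstream code needs to decide how to handle missing values.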

`MTS_functions.R` and `analysis_functions.R`

These files contain helper functions used in the scripts above and are sourced at the beginning of each (to install necessary packages and define functions). analysis_functions.R contains functions used in MTS_Analysis.R to generate biomarker analyses, while MTS_functions.R contains functions used in MTS_Data_Processing.R.

In analysis_functions.R, each function takes a matrix of features (X) and a vector of responses (y) as input. The multivariate analysis fits both an elastic net and a random forest to the data.
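The (X, y) calling convention can be illustrated with a small univariate example. This is a sketch in base R only: `feature_correlations` is a hypothetical helper and not a function from analysis_functions.R, and Pearson correlation stands in for whatever statistics the repository's functions compute.

```r
# Hypothetical helper, not a function from analysis_functions.R.
# X: numeric matrix (rows = cell lines, named columns = features)
# y: numeric response vector with one entry per row of X
feature_correlations <- function(X, y) {
  stopifnot(is.matrix(X), nrow(X) == length(y), !is.null(colnames(X)))
  # Pearson correlation of each feature column with y, skipping missing pairs
  r <- apply(X, 2, function(x) cor(x, y, use = "pairwise.complete.obs"))
  res <- data.frame(feature = colnames(X), correlation = unname(r),
                    stringsAsFactors = FALSE)
  # Strongest absolute associations first
  res[order(-abs(res$correlation)), , drop = FALSE]
}
```

Passing the feature matrix and response vector as separate arguments keeps the response independent of any particular omics dataset, so the same y can be tested against each feature matrix in turn.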

This repository is maintained by Cancer Data Science at the Broad Institute of MIT and Harvard.

Owner

Name: Broad Institute
Login: broadinstitute
Kind: organization
Location: Cambridge, MA

Website: http://www.broadinstitute.org/
Twitter: broadinstitute
Repositories: 1,083
Profile: https://github.com/broadinstitute

Broad Institute of MIT and Harvard

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: 11 months ago

All Time

Total Commits: 74
Total Committers: 1
Avg Commits per committer: 74.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
aboghossian	a**s@b**g	74

Committer Domains (Top 20 + Academic)

broadinstitute.org: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 5
Total pull requests: 0
Average time to close issues: about 1 month
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 1.2
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
