Science Score: 62.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 8 of 38 committers (21.1%) from academic institutions
- ✓ Institutional organization owner: Organization paynelab has institutional domain (payne.byu.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.7%) to scientific vocabulary
Repository
Python packaging for CPTAC data
Basic Info
- Host: GitHub
- Owner: PayneLab
- License: other
- Language: Jupyter Notebook
- Default Branch: master
- Size: 562 MB
Statistics
- Stars: 96
- Watchers: 7
- Forks: 27
- Open Issues: 13
- Releases: 53
Metadata Files
README.md
Note on the current release
We are having difficulty with the API on Zenodo and are working to find a better host for our data. Please be patient while we fix these issues.
Easy access to CPTAC data
This software provides easy access to cancer data from the National Cancer Institute's CPTAC program, which characterizes and studies the proteogenomic landscape of tumors. We implement the software as a Python package called cptac, but you can seamlessly use it in an R environment with the help of the reticulate package (demonstrated in Tutorial 6). Our package is installed in one step with pip:
pip install cptac
See the Installation section below if you have further questions.
The package gives you the data as pandas DataFrame objects in Python. If you are using R, reticulate converts the tables to data.frame objects. By providing the tables natively in your programming environment, we eliminate the need for parsing and formatting, allowing you to quickly feed the data into whatever analysis code you have written. Follow our walkthrough tutorials and use cases for examples of how to use the software.
Additionally, the software automatically handles data downloading, storage, and updates. You only need to tell it which datasets you want, and it retrieves them without requiring you to write any HTTP requests or database queries.
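Because the tables arrive as ordinary pandas DataFrame objects, standard pandas operations apply to them directly. The sketch below illustrates that idea only: it does not call the cptac package, and the two small tables, the sample IDs, the `Patient_ID` index name, and the `GENE_A` column are invented stand-ins, not real CPTAC data. The helper `correlate_shared_samples` is likewise hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for two sample-indexed tables.
# All IDs, names, and values here are invented for illustration.
proteomics = pd.DataFrame(
    {"GENE_A": [0.5, -1.2, 0.8, 0.1]},
    index=pd.Index(["S001", "S002", "S003", "S004"], name="Patient_ID"),
)
transcriptomics = pd.DataFrame(
    {"GENE_A": [10.1, 7.9, 11.0, 9.5]},
    index=pd.Index(["S001", "S002", "S003", "S005"], name="Patient_ID"),
)


def correlate_shared_samples(left, right, column):
    """Join two sample-indexed tables on their shared samples and
    return the joined table plus the Pearson correlation of one column."""
    joined = left[[column]].join(
        right[[column]], how="inner", lsuffix="_left", rsuffix="_right"
    )
    r = joined[f"{column}_left"].corr(joined[f"{column}_right"])
    return joined, r


joined, r = correlate_shared_samples(proteomics, transcriptomics, "GENE_A")
print(joined)
print(f"Pearson r over {len(joined)} shared samples: {r:.3f}")
```

The inner join keeps only samples present in both tables, which is usually what you want before comparing two data types.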
Installation
This package is intended to run on Python 3.6 or greater. If you plan on interfacing with it from R via reticulate, you must still have Python installed on your computer, and download the package into that Python environment.
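If you are unsure whether your interpreter meets this requirement, a quick check along the following lines may help. This is a minimal sketch; the helper name `python_is_supported` is our own and is not part of the cptac package.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version stated above


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python", sys.version.split()[0], "meets the minimum requirement.")
else:
    print("Python 3.6 or greater is required; found", sys.version.split()[0])
```

When working from R, run the same check in the Python environment that reticulate is configured to use.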
Installing Python
If you do not already have Python installed on your computer, we suggest using either the standard Python distribution or the Anaconda distribution. Follow the installation instructions at the respective links. The Anaconda distribution allows you to set up multiple distinct Python environments and comes with many useful Python packages pre-installed. For more information, see the Anaconda documentation.
Installing the cptac package
We distribute the package through the Python Package Index (PyPI), so regardless of which Python distribution you are using, you install the package using the pip program:
pip install cptac
If you are using the Anaconda distribution of Python, this will install cptac to the currently active environment as long as pip is available in that environment, which it is by default. If pip is not installed in your environment, you can install it with conda install -c anaconda pip. Then, you can use pip to install the cptac package. We plan on making cptac directly available through conda in the near future.
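When several Python environments are present, one common way to make sure the package lands in the environment you are actually running is to invoke pip through the current interpreter. The sketch below shows that general pattern; the helper names `pip_install_command` and `install` are hypothetical, and the install call itself is left commented out.

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build the command that runs pip under a specific interpreter."""
    return [python, "-m", "pip", "install", package]


def install(package):
    """Install `package` into the environment of the running interpreter."""
    subprocess.check_call(pip_install_command(package))


print(" ".join(pip_install_command("cptac")))
# Uncomment to perform the installation:
# install("cptac")
```

Running `python -m pip install cptac` from a shell with the desired environment activated has the same effect.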
The package depends on several other Python libraries including numpy, pandas, requests, and others. Normally, pip will automatically handle these dependencies when it installs cptac and you don't have to worry about any of it. However, if you have a special use case or are interested in exactly which versions of which packages are needed, you can consult the install_requires list in the setup.py file.
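As an alternative to reading setup.py, you can ask Python which version of a package is installed and which dependencies it declares. This is a minimal sketch using the standard library's `importlib.metadata` module, which requires Python 3.8 or later; the helper `describe_installed` is our own name, and the output depends entirely on what is installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, requires, version


def describe_installed(package):
    """Return the installed version and declared requirements of a
    package, or None if the package is not installed."""
    try:
        return {
            "version": version(package),
            "requires": requires(package) or [],
        }
    except PackageNotFoundError:
        return None


info = describe_installed("cptac")
if info is None:
    print("cptac is not installed in this environment.")
else:
    print("cptac", info["version"])
    for requirement in info["requires"]:
        print("  depends on:", requirement)
```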
Documentation
Our goal is for the documentation to make this software and data accessible both to people without a computer science background and to people without a biology background. We provide two types of documentation to accomplish this: tutorials and use cases. The tutorials give a basic introduction to the software as well as conventions for storing and accessing the data. The use cases are short examples focused on a biological question and show practical uses of the software and data for biological discovery. Each use case works with a different combination of data types and explores meaningful cancer research hypotheses.
You can access the tutorials and use cases as static webpages using the links below. They were originally written in Python as interactive Jupyter notebooks, so if you want to run them interactively, you can download the notebooks from the notebooks folder on the GitHub repository. If you are unfamiliar with Jupyter, follow the installation and usage instructions on the Jupyter website. You will then be able to run our tutorials as interactive, exploratory data analyses. If you want to run them interactively without installing anything, visit our Binder site, which hosts the notebooks.
Tutorials
- Tutorial 1: CPTAC data introduction
- Tutorial 2: Using pandas to work with cptac dataframes
- Tutorial 3: Joining dataframes with cptac
- Tutorial 4: Understanding multi-indexes
- Tutorial 5: How to keep up to date with new package and data releases
- Tutorial 6: Easy integration with R
Use cases
- Use Case 1: Comparing transcriptomics and proteomics
- Use Case 2: Correlation between clinical attributes
- Use Case 3: Associating clinical variables with omics data
- Use Case 4: How Do Mutations Affect Protein Abundance?
- Use Case 5: Gene Set Enrichment Analysis
- Use Case 6: Comparing Derived Molecular Data with Proteomics
- Use Case 7: Trans Genetics Effects
- Use Case 8: Outliers
- Use Case 9: Clinical Outcomes
- Use Case 10: Pathway diagram overlay
Developer documentation
Documentation for anyone wanting to understand the internal workings of the package is available on the GitHub repository in the devdocs folder.
License
See the LICENSE.md document on the GitHub repository. Please note the difference between the license as it applies to code versus data.
Contact
This package is maintained by the Payne lab at Brigham Young University.
Owner
- Name: Payne Lab, Biology Department, BYU
- Login: PayneLab
- Kind: organization
- Website: payne.byu.edu
- Repositories: 25
- Profile: https://github.com/PayneLab
Citation (CITATION.cff)
# YAML 1.2
---
authors:
  -
    family-names: Lindgren
    given-names: "Caleb M."
    orcid: "https://orcid.org/0000-0001-6484-9757"
  -
    family-names: Adams
    given-names: "David W."
  -
    family-names: Kimball
    given-names: Benjamin
  -
    family-names: Boekweg
    given-names: Hannah
  -
    family-names: Tayler
    given-names: Sadie
  -
    family-names: Pugh
    given-names: "Samuel L."
  -
    family-names: Payne
    given-names: "Samuel H."
cff-version: "1.1.0"
date-released: 2021-02-09
doi: "10.1021/acs.jproteome.0c00919"
keywords:
- Cancer
- Genetics
- Genomics
- Proteomics
- Software
message: "If you use this software, please cite it using these metadata."
title: "Simplified and Unified Access to Cancer Proteogenomic Data"
version: "1.5.14"
...
GitHub Events
Total
- Issues event: 6
- Watch event: 10
- Issue comment event: 5
- Fork event: 5
Last Year
- Issues event: 6
- Watch event: 10
- Issue comment event: 5
- Fork event: 5
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 1,900
- Total Committers: 38
- Avg Commits per committer: 50.0
- Development Distribution Score (DDS): 0.633
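The two derived figures above can be reproduced from the raw counts. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits; that definition is not stated on this page, so treat it as an assumption. The top committer's count (697) is taken from the Top Committers table.

```python
total_commits = 1900
total_committers = 38
top_committer_commits = 697  # highest count in the Top Committers table

# Average commits per committer.
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - top_committer_commits / total_commits

print(f"Avg commits per committer: {avg_commits:.1f}")
print(f"Development Distribution Score: {dds:.3f}")
```

Under this assumed definition, a score near 0 would mean one person wrote nearly all commits, and a score near 1 would mean the work is spread widely.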
Top Committers
| Name | Email | Commits |
|---|---|---|
| Caleb Lindgren | c****n@g****m | 697 |
| Robert Oldroyd | o****t@g****m | 225 |
| David | d****2@g****m | 219 |
| corbinday | r****r@g****m | 209 |
| benkk331 | k****1@g****m | 84 |
| blhmc1 | 4****1@u****m | 82 |
| Lindsey Olsen | l****5@g****m | 80 |
| seanjib | s****b@g****m | 43 |
| Sam Payne | s****e@g****m | 40 |
| blhmc1 | b****2@g****m | 36 |
| hboekweg | h****g@g****m | 20 |
| Jose H. Giraldez | j****d@u****m | 20 |
| unknown | s****r@g****m | 20 |
| polarisXD | 3****D@u****m | 19 |
| Samuel Pugh | s****4@g****m | 19 |
| Robert Oldroyd | 4****b@u****m | 18 |
| Drew Bonnett | a****t@g****m | 9 |
| thomashmolina | t****a@g****m | 8 |
| caleb-lindgren | 4****n@u****m | 6 |
| JonJarman | j****n@g****m | 6 |
| sadietayler | 3****r@u****m | 6 |
| sdsquire | s****s@g****m | 5 |
| cbminor | 4****r@u****m | 5 |
| Chelsie Minor | c****r@g****m | 3 |
| corbinday | c****y@C****l | 3 |
| Teancum Paquette | t****h@g****m | 3 |
| Sam Squires | s****s@p****n | 3 |
| Robert Oldroyd | r****d@M****u | 2 |
| HannahBoekweg | h****g@D****u | 1 |
| HannahBoekweg | h****g@S****u | 1 |
| and 8 more... | | |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 69
- Total pull requests: 6
- Average time to close issues: 10 days
- Average time to close pull requests: 1 day
- Total issue authors: 41
- Total pull request authors: 5
- Average comments per issue: 2.41
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 8
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 8
- Pull request authors: 0
- Average comments per issue: 0.63
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- zihaoxingstudy1 (13)
- sgosline (7)
- tobsecret (2)
- sky1ove (2)
- seunghun23 (2)
- Liz-m57 (2)
- Sunmile (2)
- sooheon (2)
- SAlkh90-temp (2)
- martingarridorc (2)
- CCranney (2)
- sebastianffx (2)
- DennisGankin (2)
- nieyage (1)
- smdb21 (1)
Pull Request Authors
- seanjib (2)
- awilliamson518 (1)
- jhgirald (1)
- dependabot[bot] (1)
- hsiaoyi0504 (1)
Packages
- Total packages: 1
- Total downloads: 1,112 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 60
- Total maintainers: 2
pypi.org: cptac
Python packaging for CPTAC data
- Homepage: http://github.com/PayneLab/cptac
- Documentation: https://paynelab.github.io/cptac/
- License: Apache 2.0
- Latest release: 1.5.14 (published over 1 year ago)
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite