Recent Releases of cptac
cptac -
Fixed a file-mapping error that prevented UCEC transcriptomics data from the Broad source from being downloaded.
- Jupyter Notebook
Published by bm600 about 2 years ago
cptac - v.1.5.0rc2
The release candidate for cptac v.1.5
Published by bm600 over 2 years ago
cptac - Release Candidate 1 for version 1.5
This release significantly changes the structure of the cptac package. In order to better handle the increasing quantity and complexity of data available, the package has been broken up into cancer classes that each have several different sources for their data. This means that the methods for accessing data have changed slightly. There shouldn't be any changes to the data itself, but more testing is required to ensure that this is the case. This release candidate will be used to find and fix any problems with the new structure to make this transition as smooth as possible.
The tutorials and use cases will be updated soon to reflect the changes for this release.
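As a rough illustration of the restructuring described above (cancer classes that each draw tables from several sources), the sketch below uses a hypothetical `CancerDataset` class. The class name, its methods, the `other_source` label, and the placeholder tables are all assumptions made for illustration and are not the package's actual API.

```python
class CancerDataset:
    """Hypothetical cancer class that groups tables by the source that produced them."""

    def __init__(self, name, loaders):
        # loaders maps (source, table) -> zero-argument callable returning the table
        self.name = name
        self._loaders = loaders

    def list_sources(self):
        """Map each source name to the sorted list of tables it provides."""
        listing = {}
        for source, table in self._loaders:
            listing.setdefault(source, []).append(table)
        return {source: sorted(tables) for source, tables in listing.items()}

    def get_table(self, table, source):
        """Load one table from one source, failing clearly on an unknown pair."""
        try:
            loader = self._loaders[(source, table)]
        except KeyError:
            raise KeyError(
                f"{self.name}: no {table!r} table from source {source!r}"
            ) from None
        return loader()


# Placeholder loaders stand in for real data files.
ucec = CancerDataset("ucec", {
    ("broad", "transcriptomics"): lambda: [["placeholder"]],
    ("other_source", "proteomics"): lambda: [["placeholder"]],
})
print(ucec.list_sources())
```

Keying loaders by a (source, table) pair means the same table type can exist under several sources without name clashes, which is the situation the release note describes.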
Published by old-rob over 3 years ago
cptac - 1.1.0
Added a parameter to the cptac.pancan.download function so that users can manually enter a temporary Box access token if working on a remote machine without web browser access for Box sign in. The user generates the token on a local machine with a web browser, and then pastes it in. Instructions for this can be found in the pancan tutorial: https://github.com/PayneLab/cptac/blob/master/notebooks/tutorial07_pancan.ipynb
Bugfix for PDC download function.
Updated dependencies for pandas
Published by old-rob about 4 years ago
cptac - Release Candidate for 1.0
Release candidate for cptac 1.0, which includes formal testing and CI
UcecConf small update (version 2.0.1):
- Update follow-up information and histology types in the meta table, which is the only data table changed. This makes the distribution of histology type much clearer.
Other:
- Documentation updates
- Formal tests with much greater code coverage
- Available for download with conda
Published by old-rob over 4 years ago
cptac - UcecConf 2.0, updates and fixes
UcecConf - Added version 2.0, which has the following changes:
1. Recalculated WGS-based CNV using the same pipeline
2. Updated CNV-H identification by the new CNV (1 POLE is CNV-H)
3. Updated the ABSOLUTE tumor purity by the new CNV
4. Fixed a bug for RNAseq data processing
5. Updated ESTIMATE score derived from RNAseq data
6. Added category level somatic mutation
NOTE: UcecConf 2.0 has the following minor issues that need correction and will be updated:
C3L-05848 has a tumor size of -2. C3L-02747 and C3N-00155 have a tumor size of 0.
C3L-03143 has a recorded height of 2 cm, which yields a derived BMI of 287456.75.
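To show why a height entered as 2 cm produces such an extreme BMI, and how values like these could be caught, here is a minimal sketch. The field names and plausibility thresholds are assumptions chosen for illustration; they are not taken from the UcecConf data or the package.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def flag_implausible(record: dict) -> list:
    """Return a list of problems found in one clinical record.

    Field names and thresholds are illustrative assumptions.
    """
    problems = []
    size = record.get("tumor_size_cm")
    if size is not None and size <= 0:
        problems.append("non-positive tumor size")
    height = record.get("height_cm")
    if height is not None and not (50 <= height <= 250):
        problems.append("implausible height")
    return problems


# For the same weight, a height of 2 cm instead of 170 cm inflates BMI
# by a factor of (170 / 2) ** 2 = 7225.
print(bmi(70.0, 170.0))
print(bmi(70.0, 2.0))
print(flag_implausible({"tumor_size_cm": -2, "height_cm": 2}))
```

Because height enters the formula squared, a units or data-entry error in height has a much larger effect on BMI than the same relative error in weight.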
BRCA: Column "Tumor.Stage" renamed to "Stage" in brca clinical data
Pancan:
- Function list_sources_data() renamed to list_data_sources()
- Adjust argument order for get_deconvolution() to be consistent with other functions
- Add readme files to help in choosing desired pipeline
Other:
- Bugfix for cptac.utils function get_protein_interactions_bioplex
- Use case 7 notebook updated
Published by old-rob over 4 years ago
cptac - Pancan, Use Cases, Ucecconf 1.2
- Updates to Pancan
- Most use cases have been updated to work with the latest version of this package and any packages used with them
- The endometrial confirmatory data has been updated to 1.2
Published by old-rob over 4 years ago
cptac - Bug fixes and updates
This release adds a ubiquitinomics getter function for access to that data, along with restoring the somatic mutation data to the brca dataset, which was unintentionally left out. The tutorials have also been updated to reflect the latest version of the package.
Published by old-rob over 4 years ago
cptac - Fix Errors
Patch release for bug fixes:
- Ucecconf is working again
- Brca should load correctly now
- The mygene dependency was added
Published by corbinday over 4 years ago
cptac - Fix Errors
Patch release for bugfixes
Published by old-rob over 4 years ago
cptac - Updated LSCC, Endometrial Confirmatory, and BRCA data
Updated the package to use the most recent data for three types of cancer:
- UcecConf is now 1.1
- LSCC is now 3.3
- BRCA is now 5.4
Published by old-rob over 4 years ago
cptac - Bug fixes and updates
This release includes various code updates, and fixes a compatibility issue for Windows (#22).
Published by caleb-lindgren almost 5 years ago
cptac - Functionality updates
Various updates to package functionality.
Published by caleb-lindgren almost 5 years ago
cptac - Update dependencies
This patch release updates dependency requirements to avoid issues that were encountered with certain old versions of pandas and the new versions of xlrd.
Published by caleb-lindgren about 5 years ago
cptac - GitHub pages, dependency and other updates
This release includes various small bugfixes and updates. xlrd has stopped supporting .xlsx files, so pandas now uses openpyxl; some small changes were made to accommodate this, and the dependencies in setup.py were updated. We also updated the cptac.list_datasets function to stream data availability info from Box, so it can be updated more easily. Various other small updates were made.
We also have now made the documentation available on a GitHub pages site, https://paynelab.github.io/cptac/
Published by caleb-lindgren about 5 years ago
cptac - Make BRCA data publicly accessible
This release reflects that the BRCA data is now publicly accessible, and a password is no longer required to download the files.
Published by caleb-lindgren over 5 years ago
cptac - Fix WikiPathways interface bug, updated LSCC data
This release fixes a bug in the WikiPathways data parser that had previously caused the various cptac.utils functions that access WikiPathways data to return incomplete results.
It also includes an update to the LSCC data to now use the log-ratio CNV file.
Published by caleb-lindgren over 5 years ago
cptac - Minor bug fixes, utils supports Welch's t test
This release includes additional bug fixes, and provides the option of using Welch's t test instead of Student's t test in the cptac.utils.wrap_ttest function, by passing False to the `equal_var` parameter.
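For readers unfamiliar with the difference between the two tests, the sketch below computes both statistics with the standard library only. It is not the package's implementation: the function names are made up for illustration, and only the test statistic and degrees of freedom are computed (no p value).

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Student's t statistic and degrees of freedom (assumes equal variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Two small groups with very different spreads and sizes.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
group_b = [10.0, 30.0]
print(student_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

Welch's test does not pool the variances, so it is the safer choice when the two groups have unequal variances or sample sizes; with equal sizes and equal variances the two statistics coincide.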
Published by caleb-lindgren over 5 years ago
cptac - join_omics_to_mutations bug fix
This patch release fixes a bug in the join_omics_to_mutations function that arose due to a change in the way pandas works. The bug caused an exception to be thrown when the function was executed with the mutations_filter parameter set to None. It is now fixed.
Published by caleb-lindgren over 5 years ago
cptac - Updated data access, multi_join, and small changes
This patch release handles password restrictions applied to the BRCA and GBM datasets from an external source. Additionally, the following changes and enhancements were added:
- Added Dataset.multi_join function
- Added tissue_type parameter for metadata getters
- Added clinical column to show that all GBM tumors are stage IV (by definition)
Renamed/combined utils functions:
- Combined:
  - get_protein_pathways => get_pathways_with_proteins, database="wikipathways"
  - search_reactome_pathways_with_proteins => get_pathways_with_proteins, database="reactome"
  - search_reactome_proteins_in_pathways => get_proteins_in_pathways, database="reactome"
  - get_proteins_in_pathway => get_proteins_in_pathways, database="wikipathways"
- Renamed: list_pathways => list_pathways_wikipathways
Published by caleb-lindgren over 5 years ago
cptac - Updated LSCC Data
Updated LSCC data, including circular_RNA and ubiquitinomics.
We also moved the DataSet.reduce_multiindex and DataSet.search functions to be part of the cptac.utils sub-module, instead of members of the DataSet class. You can now access those functions as cptac.utils.reduce_multiindex and cptac.utils.search, respectively. We also added an optional quiet parameter to reduce_multiindex.
We also renamed the abstract DataSet class to Dataset.
Published by benkk331 over 5 years ago
cptac - Added "tissue_type" parameter to joins, along with new utils, updated lscc version 1.0 data
Join functions now include a "tissue_type" parameter that specifies whether "tumor" or "normal" tissue data is returned. It defaults to "both".
Several utils have been added with this release, including:
"wrap_pearson_corr", which returns a dataframe with the columns comparison, correlation coefficient, and p value for a given dataframe.
"get_interacting_proteins_wikipathways", which takes a given protein and returns a list of all proteins that interact with it according to WikiPathways.
"get_protein_pathways", which takes a given protein and uses WikiPathways to find the pathways that the protein is involved in.
"list_pathways", which returns all available pathways from WikiPathways.
"get_proteins_in_pathway", which takes a pathway and lists all proteins involved in it.
Bug fixes for missing dataframe in lscc version 1.0 data.
Published by benkk331 almost 6 years ago
cptac - Add "quiet", "how", and "sample_type" parameters
We added a "quiet" parameter as part of our join functions in order to allow users to quiet the warnings produced when filling dataframes with NaNs.
We also added a "how" parameter to allow users to specify their desired join type, whether inner, outer, left or right.
Users can also now specify which sample type is returned in a dataframe: tumor, normal, or both.
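The four join types and the effect of the "quiet" option can be illustrated without any dataframe library. The sketch below joins two plain dictionaries keyed by sample ID; the function name, the use of `None` in place of NaN, and the warning text are assumptions for illustration, not the package's code.

```python
import warnings


def join_tables(left: dict, right: dict, how: str = "outer", quiet: bool = False) -> dict:
    """Join two {sample_id: value} tables; values missing on one side become None."""
    if how == "inner":
        keys = [k for k in left if k in right]
    elif how == "left":
        keys = list(left)
    elif how == "right":
        keys = list(right)
    elif how == "outer":
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown join type: {how!r}")

    # Track whether any row had to be padded on either side.
    filled = any(k not in left or k not in right for k in keys)
    if filled and not quiet:
        warnings.warn("join filled missing values with None")
    return {k: (left.get(k), right.get(k)) for k in keys}


proteomics = {"S1": 0.4, "S2": -1.2}
transcriptomics = {"S2": 3.3, "S3": 0.9}
print(join_tables(proteomics, transcriptomics, how="inner"))
print(join_tables(proteomics, transcriptomics, how="outer", quiet=True))
```

An inner join never pads, so it never warns; the other three join types may introduce missing values, which is when silencing the warning becomes useful.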
Published by benkk331 almost 6 years ago
cptac - LUAD followup data bugfix; followup data improvements
This release fixes a bug that dropped all of the LUAD followup data. It also changes the data loaders so that when the followup table includes samples that aren't anywhere else in the dataset (because they're from a different cohort), those samples are kept and not dropped like they were previously.
This release also switches the install to require pandas 1.0.0 or greater.
Published by caleb-lindgren about 6 years ago
cptac - LSCC data, get_genotype_all_vars
This release adds the LSCC dataset (password protected). It also includes the DataSet.get_genotype_all_vars function, as well as minor formatting improvements.
Published by caleb-lindgren about 6 years ago
cptac - Patient IDs instead of Sample IDs, followup data, and increased data availability
This release stops using generated Sample IDs (e.g. S001, S002, etc.) for indexing tables, and instead uses the original CPTAC Patient IDs (e.g. C3L-00078, etc.). Normal samples are indicated by a ".N" appended to the end of the Patient ID (e.g. "C3L-00078.N") in all datasets.
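Given the ".N" suffix convention described above, tumor and normal rows can be separated by inspecting the index label. The helper below is a small sketch with a hypothetical name; it is not a function provided by the package.

```python
def split_sample_id(sample_id: str):
    """Return (patient_id, tissue_type) for an index label such as 'C3L-00078.N'."""
    if sample_id.endswith(".N"):
        return sample_id[:-2], "normal"
    return sample_id, "tumor"


for label in ["C3L-00078", "C3L-00078.N"]:
    print(split_sample_id(label))
```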
This release also includes updates to the GBM, LUAD, and HNSCC datasets. We also have added a new "followup" table in all datasets besides GBM, which contains followup data for the patients in the cohort.
Finally, this release includes increased access to datasets. The BRCA dataset now has no access restrictions. The GBM and LUAD datasets are both under publication embargo, but no longer require passwords for access.
Published by caleb-lindgren about 6 years ago
cptac - GBM and CCRCC data updates
This release contains the newest GBM data freeze (2.1), and adds the sample immune classification to the CCRCC clinical table. We also fixed a small bug in utils.get_frequently_mutated that was preventing it from working properly with the HNSCC dataset.
Published by caleb-lindgren over 6 years ago
cptac - Colon and endometrial data bug fixes
This release contains some small bug fixes. We removed duplicate rows from the colon cancer dataset's somatic_mutation dataframe, and renamed a column in the endometrial cancer dataset's derived_molecular dataframe that was causing a column name overlap when joining the derived_molecular dataframe to mutations for the JAK1 gene.
Published by caleb-lindgren over 6 years ago
cptac - Prevent new version loading errors, and fix HNSCC join bugs
This release adds a safeguard so that, going forward, an older version of the package will not load data versions released after that package version. It also fixes indexing errors in joins with the HNSCC dataset.
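One way such a guard could work is to record, for each data version, the oldest package version able to read it. The sketch below assumes that design; the index format, the function names, and all version numbers in the example are made up for illustration and do not describe how the package actually implements the check.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '0.6.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def loadable_data_versions(package_version: str, min_package_for_data: dict) -> list:
    """List the data versions this package version is allowed to load.

    min_package_for_data maps each data version to the oldest package
    version that can parse it (a hypothetical index format).
    """
    current = parse_version(package_version)
    return [
        data_version
        for data_version, min_pkg in min_package_for_data.items()
        if parse_version(min_pkg) <= current
    ]


# Made-up version numbers, for illustration only.
index = {"1.0": "0.6.0", "2.0": "0.6.2", "2.1": "0.7.0"}
print(loadable_data_versions("0.6.3", index))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "0.10.0" sorting before "0.9.0".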
Published by caleb-lindgren over 6 years ago
cptac - GBM backwards compatibility
This release adds some minor edits so the package can still load the old GBM data freeze (1.0), in addition to the new data freeze (2.0).
Published by caleb-lindgren over 6 years ago
cptac - GBM data freeze 2.0
This update includes data freeze 2.0 of the glioblastoma (GBM) data, as well as other minor data cleanups in the other datasets, including dropping 28 samples from the CCRCC methylation and CNV tables that had been later excluded from analysis, and dropping 9 duplicate columns in the CCRCC transcriptomics table that were filled with zeros.
Published by caleb-lindgren over 6 years ago
cptac - GBM, HNSCC, LUAD, and BRCA data. Multi-level indexing
- Added Gbm, Hnscc, Brca, and Luad datasets
- Where needed, gave dataframes multi-level column indices to remove duplicates and provide additional identifiers. Edited join functions to handle multi-level indices, and created the reduce_multiindex function for simplifying multi-level indices.
- Package checks its own version and warns if it's out of date
- Renamed RenalCcrcc to Ccrcc
- Optimized join functions
- Changed join functions to use full outer joins, and warn you if they fill any missing values with NaNs
- Errors and warnings generated by cptac are now sent to stderr, instead of being printed to stdout
- Renamed the algorithms submodule to utils
- get_mutations renamed to get_somatic_mutation, and get_mutations_binary renamed to get_somatic_mutation_binary
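The multi-level column indices mentioned in the list above can be pictured as tuples of labels, one per level. The sketch below shows one simple way to collapse such tuples into single strings. It is an assumption-laden illustration with a hypothetical function name and example identifiers, and it does not reproduce the behavior or signature of the package's own helper.

```python
def flatten_columns(columns, sep="_"):
    """Join each multi-level column label into one string, skipping empty levels."""
    flat = []
    for levels in columns:
        parts = [str(part) for part in levels if part not in ("", None)]
        flat.append(sep.join(parts))
    return flat


# Each tuple is (gene name, extra identifier); the identifiers are placeholders.
columns = [("GENE1", "ID_A"), ("GENE1", "ID_B"), ("GENE2", "")]
print(flatten_columns(columns))
```

Keeping the extra level in the flattened name preserves the distinction between columns that share a gene name, which is the reason the additional levels were introduced.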
Published by hboekweg over 6 years ago
cptac - Updated dependency requirements
The package dependency requirements are now more specific, to make sure it can run properly.
Published by caleb-lindgren over 6 years ago
cptac - Version 0.5 bug fixes
Fixed minor bugs in the algorithm for filtering multiple mutations when merging omics and metadata dataframes with the mutations dataframe. Updated the getfreqmut function in cptac.algorithms to handle silent mutations in the renal dataset.
Published by hboekweg over 6 years ago
cptac - Algorithms and renal CCRCC data
In this release, we added the cptac.algorithms sub-module, and the new renal CCRCC dataset. We also replaced the "sync" function with an improved "download" function, and eliminated unnecessary output during dataset download and loading. Finally, we made some minor changes to the API:
- Dataframes no longer have a .name attribute. Use descriptive variable names instead.
- Standardized all CNA/CNV names to CNV. This fixed a bug in accessing the Ovarian copy number dataframe.
- Renamed merge/join functions:
  - compare_omics => join_omics_to_omics
  - append_mutations_to_omics => join_omics_to_mutations
  - append_metadata_to_omics => join_metadata_to_omics
- New join functions:
  - join_metadata_to_mutations
  - join_metadata_to_metadata
- In join_omics_to_mutations, change "mutation_genes" parameter to "mutations_genes" (plural "mutations")
- In join_omics_to_omics, change "omics_df1_name" and "omics_df2_name" to "df1_name" and "df2_name"
For details on how these changes may affect usage, consult the use cases under the "docs" folder.
Published by caleb-lindgren over 6 years ago
cptac - Remote Storage for Colon, Endometrial, and Ovarian Data
This release implements remote data storage, and includes the Colon and Ovarian datasets, as well as the Endometrial dataset. The package now just includes the source code, and the data files for each dataset are downloaded later from another location the first time the user loads each dataset. When a user loads a dataset, the package will also check that any data files already downloaded are up-to-date with the most current data files on the remote storage. If there are any discrepancies, the package gives the user the option of updating the files.
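The release note does not say how downloaded files are compared with the remote copies. One common approach is to compare checksums, and the sketch below assumes that approach with MD5 hashes. The function names and the shape of the two dictionaries are hypothetical.

```python
import hashlib
import os


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outdated_files(local_paths: dict, remote_hashes: dict) -> list:
    """Return names whose local copy is missing or differs from the remote hash.

    local_paths maps file name -> local path; remote_hashes maps file name ->
    expected MD5 hex digest (both formats are assumptions for this sketch).
    """
    stale = []
    for name, expected in remote_hashes.items():
        path = local_paths.get(name)
        if path is None or not os.path.exists(path) or file_md5(path) != expected:
            stale.append(name)
    return stale
```

A caller could pass the returned list to whatever download routine is in use and prompt the user before overwriting, matching the opt-in update behavior described above.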
Published by caleb-lindgren almost 7 years ago
cptac - Endometrial data with DataSet class abstraction
This release includes the full endometrial dataset, with all the utilities functions for working with it. We'll include the ovarian and colon cancer datasets in a future release, once we've set up remote data storage.
We have changed the dataset to be a class that the user instantiates, rather than a submodule that they load. As a result, the syntax for loading the dataset has changed slightly. Instead of running "import cptac.endometrial as en", the user would run two commands: first "import cptac", then "en = cptac.Endometrial()". However, manipulating the dataset thereafter uses the same syntax as before, working with the variable the dataset was assigned to, e.g. "clinical = en.get_clinical()" and so on.
We have changed the syntax for the three merging functions: compare_omics, append_metadata_to_omics, and append_mutations_to_omics. Instead of separately loading the dataframes you want to merge, and then passing them to the function, you just pass a string to the function containing the name of the dataframe you want to merge, e.g. "appended = en.append_metadata_to_omics(metadata_df_name="derived_molecular", omics_df_name="phosphoproteomics")". Note that the parameter names now have "_name" added to the end. These functions no longer accept dataframes; you must pass dataframe names instead.
Published by caleb-lindgren almost 7 years ago