Releases | Open Source Science

Addressed the Zenodo API update that changed the key name from filename to key in the data repository response. This ensures smooth data fetching and prevents crashes due to key mismatches.
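Code that parses repository file listings directly can tolerate both the old and the new field name. The sketch below assumes each file record is a dict holding either a "key" or a "filename" entry, as described above. That record layout is an assumption, not the exact Zenodo response schema, and `get_file_name` is a hypothetical helper, not part of cptac.

```python
from typing import Any, Mapping


def get_file_name(file_record: Mapping[str, Any]) -> str:
    """Return a file's name from a repository file record.

    Assumption: newer responses store the name under "key", while older
    responses used "filename". Prefer the new field and fall back to the old one.
    """
    for field in ("key", "filename"):
        if field in file_record:
            return file_record[field]
    # Fail with a clear message instead of an opaque KeyError later on.
    raise KeyError(f"file record has neither 'key' nor 'filename': {sorted(file_record)}")


# Usage with hypothetical records in the new and old formats.
print(get_file_name({"key": "proteomics.tsv.gz"}))       # proteomics.tsv.gz
print(get_file_name({"filename": "proteomics.tsv.gz"}))  # proteomics.tsv.gz
```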

- Jupyter Notebook
Published by sabmel over 2 years ago

cptac - v1.5.5

Updated the download functions to comply with the latest Zenodo update.

- Jupyter Notebook
Published by bm600 over 2 years ago

cptac - v1.5.4

Fixed the acetylproteomics dataframes.

- Jupyter Notebook
Published by bm600 over 2 years ago

cptac - v1.5.3

All available normal-tissue samples for the Baylor (bcm) source are now in the package.

- Jupyter Notebook
Published by bm600 over 2 years ago

cptac - 1.5.2

- Jupyter Notebook
Published by bm600 over 2 years ago

cptac - 1.5.1

Baylor data included

- Jupyter Notebook
Published by sabmel over 2 years ago

cptac - CPTAC

- Jupyter Notebook
Published by sabmel over 2 years ago

cptac - v.1.5.0rc2

The second release candidate for cptac v1.5.

- Jupyter Notebook
Published by bm600 over 2 years ago

cptac - Release Candidate 1 for version 1.5

This release significantly changes the structure of the cptac package. In order to better handle the increasing quantity and complexity of data available, the package has been broken up into cancer classes that each have several different sources for their data. This means that the methods for accessing data have changed slightly. There shouldn't be any changes to the data itself, but more testing is required to ensure that this is the case. This release candidate will be used to find and fix any problems with the new structure to make this transition as smooth as possible.
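As a rough illustration of the new layout (one class per cancer type, where each data type is requested from one of several sources), here is a toy model in plain Python. The `CancerType` class, its `list_sources` and `get_dataframe` methods, and the source names are hypothetical and are not the cptac API; the updated tutorials describe the real method names.

```python
from typing import Any, Callable, Dict, List

# A "source" maps a data type name (e.g. "proteomics") to a function that loads it.
Source = Dict[str, Callable[[], Any]]


class CancerType:
    """Toy model: one object per cancer type, aggregating several data sources."""

    def __init__(self, name: str, sources: Dict[str, Source]) -> None:
        self.name = name
        self._sources = sources

    def list_sources(self) -> Dict[str, List[str]]:
        """Show which data types each source provides."""
        return {src: sorted(loaders) for src, loaders in self._sources.items()}

    def get_dataframe(self, data_type: str, source: str) -> Any:
        """Load one data type from one named source."""
        if source not in self._sources:
            raise ValueError(f"{self.name}: unknown source {source!r}")
        loaders = self._sources[source]
        if data_type not in loaders:
            raise ValueError(f"{self.name}: source {source!r} has no {data_type!r} data")
        return loaders[data_type]()


# Usage with two made-up sources; real loaders would read downloaded files.
brca = CancerType("brca", {
    "source_a": {"proteomics": lambda: {"sample_1": 1.2}},
    "source_b": {"proteomics": lambda: {"sample_1": 0.9},
                 "transcriptomics": lambda: {"sample_1": 5.0}},
})
print(brca.list_sources())
# {'source_a': ['proteomics'], 'source_b': ['proteomics', 'transcriptomics']}
print(brca.get_dataframe("proteomics", source="source_b"))  # {'sample_1': 0.9}
```

Requiring the caller to name a source makes it explicit which pipeline a table came from, which matters once several sources provide the same data type.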

The tutorials and use cases will be updated soon to reflect the changes for this release.

- Jupyter Notebook
Published by old-rob over 3 years ago

cptac - 1.1.2

Made the PDAC data public and addressed pandas warnings.

- Jupyter Notebook
Published by old-rob almost 4 years ago

cptac - 1.1.1

Added GBM confirmatory data freeze 2.0

- Jupyter Notebook
Published by old-rob almost 4 years ago

cptac - 1.1.0

Added a parameter to the cptac.pancan.download function that lets users manually enter a temporary Box access token when working on a remote machine without a web browser for Box sign-in. The user generates the token on a local machine with a web browser and then pastes it in. Instructions are in the pancan tutorial: https://github.com/PayneLab/cptac/blob/master/notebooks/tutorial07_pancan.ipynb

Fixed a bug in the PDC download function.

Updated the pandas dependency requirements.

- Jupyter Notebook
Published by old-rob about 4 years ago

cptac - 1.0.0

Full 1.0.0 release.

The package should be installable with conda through the bioconda channel: conda install -c bioconda cptac

Additionally, version checks are now run asynchronously to reduce import lag.
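The usual technique behind a non-blocking version check is to run the lookup on a background daemon thread, so that importing the package returns immediately. The sketch below shows that pattern. `start_version_check` and the injected `fetch_latest` callable are hypothetical, not cptac's internal code, and the comparison assumes plain dotted-integer version strings.

```python
import sys
import threading
from typing import Callable, Dict, Tuple


def _parse(version: str) -> Tuple[int, ...]:
    # Assumes plain dotted-integer versions such as "1.0.0" (no "rc" suffixes).
    return tuple(int(part) for part in version.split("."))


def start_version_check(
    installed: str, fetch_latest: Callable[[], str]
) -> Tuple[threading.Thread, Dict[str, bool]]:
    """Start a version check on a daemon thread and return without waiting for it."""
    result: Dict[str, bool] = {}

    def worker() -> None:
        try:
            latest = fetch_latest()  # in real code, a network request
            outdated = _parse(installed) < _parse(latest)
        except Exception:
            return  # a failed check should never break the import
        result["outdated"] = outdated
        if outdated:
            print(f"Warning: version {installed} is installed, but {latest} is available.",
                  file=sys.stderr)

    # daemon=True: the thread will not keep the interpreter alive at exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, result


# Usage: the call returns immediately; join() here only so we can inspect the result.
thread, result = start_version_check("1.0.0", lambda: "1.1.0")
thread.join()
print(result)  # {'outdated': True}
```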

- Jupyter Notebook
Published by old-rob about 4 years ago

cptac - Release Candidate for 1.0

Release candidate for cptac 1.0, which includes formal testing and continuous integration (CI).

UcecConf small update (version 2.0.1):
- Updated the follow-up information and histology types in the meta table, which is the only data table changed. This makes the distribution of histology types much clearer.

Other:
- Documentation updates
- Formal tests with much greater code coverage
- Available for download with conda

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - UcecConf 2.0, updates and fixes

UcecConf: Added version 2.0, which has the following changes:
1. Recalculated WGS-based CNV using the same pipeline
2. Updated CNV-H identification based on the new CNV (1 POLE is CNV-H)
3. Updated the ABSOLUTE tumor purity based on the new CNV
4. Fixed a bug in RNA-seq data processing
5. Updated the ESTIMATE score derived from RNA-seq data
6. Added category-level somatic mutation data

NOTE: UcecConf 2.0 has the following minor issues, which need correction and will be updated:
- C3L-05848 has a tumor size of -2.
- C3L-02747 and C3N-00155 have a tumor size of 0.
- C3L-03143 has a height of 2 cm and a derived BMI of 287456.75.

BRCA: Renamed the "Tumor.Stage" column to "Stage" in the BRCA clinical data.
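Analysis code written against the old BRCA column name can be adapted with a small pandas shim such as the one below. The helper name is hypothetical and the table is toy data; the function simply renames "Tumor.Stage" to "Stage" when the old name is present, so the same downstream code works with either data version.

```python
import pandas as pd


def normalize_stage_column(clinical: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose stage column is named "Stage", for old or new data."""
    if "Tumor.Stage" in clinical.columns and "Stage" not in clinical.columns:
        return clinical.rename(columns={"Tumor.Stage": "Stage"})
    return clinical.copy()


# Toy clinical table using the old column name.
old = pd.DataFrame({"Tumor.Stage": ["II"]}, index=["sample_1"])
print(normalize_stage_column(old).columns.tolist())  # ['Stage']
```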

Pancan:
- Renamed the list_sources_data() function to list_data_sources()
- Adjusted the argument order of get_deconvolution() to be consistent with other functions
- Added readme files to help in choosing the desired pipeline

Other:
- Fixed a bug in the cptac.utils function get_protein_interactions_bioplex
- Updated the use case 7 notebook

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - Pancan, Use Cases, Ucecconf 1.2

This release includes updates to Pancan. Most use cases have been updated to work with the latest version of this package and of the packages they use. The endometrial confirmatory data has been updated to version 1.2.

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - Bug fixes and updates

This release adds a ubiquitinomics getter function for access to that data, along with restoring the somatic mutation data to the brca dataset, which was unintentionally left out. The tutorials have also been updated to reflect the latest version of the package.

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - Fix Errors

Patch release for bug fixes:

- UcecConf is working again
- BRCA should now load correctly
- Added the mygene dependency

- Jupyter Notebook
Published by corbinday over 4 years ago

cptac - Fix Errors

Patch release for bugfixes

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - Updated LSCC, Endometrial Confirmatory, and BRCA data

Updated the package to use the most recent data for three cancer types:
- UcecConf is now 1.1
- LSCC is now 3.3
- BRCA is now 5.4

- Jupyter Notebook
Published by old-rob over 4 years ago

cptac - Bug fixes and updates

This release includes various code updates, and fixes a compatibility issue for Windows (#22).

- Jupyter Notebook
Published by caleb-lindgren almost 5 years ago

cptac - Functionality updates

Various updates to package functionality.

- Jupyter Notebook
Published by caleb-lindgren almost 5 years ago

cptac - Update dependencies

This patch release updates dependency requirements to avoid issues that were encountered with certain old versions of pandas and the new versions of xlrd.

- Jupyter Notebook
Published by caleb-lindgren about 5 years ago

cptac - GitHub pages, dependency and other updates

This release includes various small bugfixes and updates. xlrd has stopped supporting .xlsx files, so pandas now uses openpyxl; some small changes were made to accommodate this, and the dependencies in setup.py were updated. We also updated the cptac.list_datasets function to stream data availability info from Box, so it can be updated more easily. Various other small updates were made.

We also have now made the documentation available on a GitHub pages site, https://paynelab.github.io/cptac/

- Jupyter Notebook
Published by caleb-lindgren about 5 years ago

cptac - Make BRCA data publicly accessible

This release reflects that the BRCA data is now publicly accessible, and a password is no longer required to download the files.

- Jupyter Notebook
Published by caleb-lindgren over 5 years ago

cptac - Fix WikiPathways interface bug, updated LSCC data

This release fixes a bug in the WikiPathways data parser that had previously caused the various cptac.utils functions that access WikiPathways data to return incomplete results.

It also updates the LSCC data to use the log-ratio CNV file.

- Jupyter Notebook
Published by caleb-lindgren over 5 years ago

cptac - Minor bug fixes, utils supports Welch's t test

This release includes additional bug fixes, and provides the option of using Welch's t test instead of Student's t test in the cptac.utils.wrap_ttest function, by passing False to the `equal_var` parameter.
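SciPy's `scipy.stats.ttest_ind` has a parameter with the same name and meaning: `equal_var=True` gives Student's pooled-variance test and `equal_var=False` gives Welch's test. The sketch below calls SciPy directly (not cptac) on two toy groups and checks the Welch statistic against its textbook formula, t = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b), using sample variances.

```python
import math
import statistics

from scipy import stats

# Two toy groups with clearly different spreads and sizes.
group_a = [5.1, 5.3, 4.9, 5.0, 5.2]
group_b = [6.0, 3.5, 7.9, 2.8, 8.4, 4.1]

student = stats.ttest_ind(group_a, group_b, equal_var=True)  # pooled variance
welch = stats.ttest_ind(group_a, group_b, equal_var=False)   # Welch's t test

# Welch's statistic by hand: difference in means divided by the square root
# of the summed per-group variances of the mean (statistics.variance uses n - 1).
se = math.sqrt(statistics.variance(group_a) / len(group_a)
               + statistics.variance(group_b) / len(group_b))
t_manual = (statistics.mean(group_a) - statistics.mean(group_b)) / se

print(f"Student t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

With unequal group variances like these, the two tests give different statistics and p values, which is why the choice is exposed as an option.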

- Jupyter Notebook
Published by caleb-lindgren over 5 years ago

cptac - join_omics_to_mutations bug fix

This patch release fixes a bug in the join_omics_to_mutations function that arose from a change in how pandas works. The bug caused an exception to be thrown when the function was called with the mutations_filter parameter set to None.

- Jupyter Notebook
Published by caleb-lindgren over 5 years ago

cptac - Updated data access, multi_join, and small changes

This patch release handles password restrictions applied to the BRCA and GBM datasets from an external source. Additionally, the following changes and enhancements were added:

Added Dataset.multi_join function
Added tissue_type parameter for metadata getters
Added clinical column to show that all GBM tumors are stage IV (by definition)
Renamed/combined utils functions:
- Combined:
  - get_protein_pathways => get_pathways_with_proteins, database="wikipathways"
  - search_reactome_pathways_with_proteins => get_pathways_with_proteins, database="reactome"
  - search_reactome_proteins_in_pathways => get_proteins_in_pathways, database="reactome"
  - get_proteins_in_pathway => get_proteins_in_pathways, database="wikipathways"
- Renamed: list_pathways => list_pathways_wikipathways

- Jupyter Notebook
Published by caleb-lindgren over 5 years ago

cptac - Updated LSCC Data

Updated LSCC data, including circular_RNA and ubiquitinomics.

We also moved the DataSet.reduce_multiindex and DataSet.search functions to be part of the cptac.utils sub-module, instead of members of the DataSet class. You can now access those functions as cptac.utils.reduce_multiindex and cptac.utils.search, respectively. We also added an optional quiet parameter to reduce_multiindex.

We also renamed the abstract DataSet class to Dataset.

- Jupyter Notebook
Published by benkk331 over 5 years ago

cptac - Added "tissue_type" parameter to joins, along with new utils, updated lscc version 1.0 data

Join functions now include a "tissue_type" parameter that specifies which type of tissue data to return: "tumor", "normal", or "both" (the default).
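Since an earlier release, normal samples have been indicated by a ".N" suffix on the Patient ID, so a tissue filter of this kind can be sketched as a mask on the index. This illustrates the idea in pandas and is not cptac's implementation; `filter_tissue_type` is a hypothetical helper and the table is toy data.

```python
import pandas as pd


def filter_tissue_type(df: pd.DataFrame, tissue_type: str = "both") -> pd.DataFrame:
    """Keep tumor rows, normal rows, or both, based on the ".N" index suffix."""
    is_normal = df.index.str.endswith(".N")
    if tissue_type == "tumor":
        return df[~is_normal]
    if tissue_type == "normal":
        return df[is_normal]
    if tissue_type == "both":
        return df
    raise ValueError('tissue_type must be "tumor", "normal", or "both"')


# Toy table: one patient with a tumor sample and a matched normal sample.
df = pd.DataFrame({"TP53": [1.5, 0.2]}, index=["C3L-00078", "C3L-00078.N"])
print(filter_tissue_type(df, "normal").index.tolist())  # ['C3L-00078.N']
```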

Several utils have been added with this release, including "wrap_pearson_corr", which takes a dataframe and returns a dataframe with columns for the comparison, the correlation coefficient, and the p value.

"get_interacting_proteins_wikipathways", which takes a given protein and finds a list of all proteins that interact with it according to WikiPathways.

"get_protein_pathways", which takes a given protein and uses WikiPathways to find the pathways that the protein is involved in.

"list_pathways", which returns all possible pathways from WikiPathways.

"get_proteins_in_pathway", which takes a pathway and lists all proteins involved in it.

Fixed a missing dataframe in the LSCC version 1.0 data.
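The correlation utility in this release can be approximated with `scipy.stats.pearsonr`, correlating one column against every other numeric column and collecting the results into a dataframe. The function name, its arguments, and the output column labels below are assumptions chosen to match the description, not the signature of cptac's utility; the input table is toy data.

```python
import pandas as pd
from scipy import stats


def pearson_table(df: pd.DataFrame, label_column: str) -> pd.DataFrame:
    """Correlate label_column with every other numeric column in df."""
    rows = []
    for column in df.columns:
        if column == label_column or not pd.api.types.is_numeric_dtype(df[column]):
            continue
        # Use only samples where both values are present.
        pair = df[[label_column, column]].dropna()
        if len(pair) < 3:
            continue  # too few points for a meaningful test
        r, p = stats.pearsonr(pair[label_column], pair[column])
        rows.append({"Comparison": column, "Correlation": r, "P_Value": p})
    return pd.DataFrame(rows, columns=["Comparison", "Correlation", "P_Value"])


toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.1, 9.8],
    "C": [5.0, 3.0, 4.0, 1.0, 2.0],
})
print(pearson_table(toy, "A"))
```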

- Jupyter Notebook
Published by benkk331 almost 6 years ago

cptac - Add "quiet", "how", and "sample_type" parameters

We added a "quiet" parameter to the join functions so that users can suppress the warnings produced when dataframes are filled with NaNs.

We also added a "how" parameter that lets users specify the join type: inner, outer, left, or right.

Finally, a "sample_type" parameter lets users specify which sample type is returned in a dataframe: tumor, normal, or both.
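The interplay between the join type and the NaN warning can be sketched with plain pandas: an outer join keeps every sample from both tables and fills the gaps with NaN, while an inner join keeps only the samples the tables share. The `join_with_warning` helper and its `quiet` flag are hypothetical stand-ins for the behavior described above, not cptac's code.

```python
import sys

import pandas as pd


def join_with_warning(left: pd.DataFrame, right: pd.DataFrame,
                      how: str = "outer", quiet: bool = False) -> pd.DataFrame:
    """Join two sample-indexed tables; warn (unless quiet) if NaNs were filled in."""
    joined = left.join(right, how=how)
    # Samples present in only one table get NaN for the other table's columns.
    filled = joined.index.difference(left.index).union(joined.index.difference(right.index))
    if len(filled) > 0 and not quiet:
        print(f"Warning: filled {len(filled)} sample(s) with NaN: {list(filled)}",
              file=sys.stderr)
    return joined


proteomics = pd.DataFrame({"TP53_prot": [1.0, 2.0]}, index=["S1", "S2"])
transcriptomics = pd.DataFrame({"TP53_rna": [3.0, 4.0]}, index=["S2", "S3"])

inner = join_with_warning(proteomics, transcriptomics, how="inner")
print(inner.index.tolist())  # ['S2']
outer = join_with_warning(proteomics, transcriptomics, how="outer", quiet=True)
print(outer.index.tolist())  # ['S1', 'S2', 'S3']
```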

- Jupyter Notebook
Published by benkk331 almost 6 years ago

cptac - LUAD followup data bugfix; followup data improvements

This release fixes a bug that dropped all of the LUAD followup data. It also changes the data loaders so that when the followup table includes samples that aren't anywhere else in the dataset (because they're from a different cohort), those samples are now kept instead of being dropped.

This release also switches the install to require pandas 1.0.0 or greater.

- Jupyter Notebook
Published by caleb-lindgren about 6 years ago

cptac - LSCC data, get_genotype_all_vars

This release adds the LSCC dataset (password protected). It also includes the DataSet.get_genotype_all_vars function, as well as minor formatting improvements.

- Jupyter Notebook
Published by caleb-lindgren about 6 years ago

cptac - Patient IDs instead of Sample IDs, followup data, and increased data availability

This release stops using generated Sample IDs (e.g. S001, S002, etc.) for indexing tables, and instead uses the original CPTAC Patient IDs (e.g. C3L-00078, etc.). Normal samples are indicated by a ".N" appended to the end of the Patient ID (e.g. "C3L-00078.N") in all datasets.

This release also includes updates to the GBM, LUAD, and HNSCC datasets. We have also added a new "followup" table, containing follow-up data for the patients in each cohort, to all datasets except GBM.

Finally, this release includes increased access to datasets. The BRCA dataset now has no access restrictions. The GBM and LUAD datasets are both under publication embargo, but no longer require passwords for access.

- Jupyter Notebook
Published by caleb-lindgren about 6 years ago

cptac - GBM and CCRCC data updates

This release contains the newest GBM data freeze (2.1), and adds the sample immune classification to the CCRCC clinical table. We also fixed a small bug in utils.get_frequently_mutated that was preventing it from working properly with the HNSCC dataset.

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - Colon and endometrial data bug fixes

This release contains some small bug fixes. We removed duplicate rows from the colon cancer dataset's somatic_mutation dataframe, and renamed a column in the endometrial cancer dataset's derived_molecular dataframe that caused a column name overlap when joining the derived molecular dataframe to mutations for the JAK1 gene.

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - Prevent new version loading errors, and fix HNSCC join bugs

This release adds a safeguard so that, going forward, an older version of the package will not load data versions released after that package version. It also fixes indexing errors in joins with the HNSCC dataset.
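One way to build such a safeguard is for each package release to record the newest data version it knows how to load, and to refuse anything newer. The sketch below shows that idea. `MAX_SUPPORTED` and `check_data_version` are hypothetical names, the table entry is illustrative, and versions are assumed to be dotted integers.

```python
from typing import Dict, Tuple

# Hypothetical table shipped with each package release: for every dataset,
# the newest data version that this release was written to handle.
MAX_SUPPORTED: Dict[str, str] = {"gbm": "2.0"}


def _parse(version: str) -> Tuple[int, ...]:
    # Assumes dotted-integer versions, compared numerically ("10.0" > "2.0").
    return tuple(int(part) for part in version.split("."))


def check_data_version(dataset: str, data_version: str) -> None:
    """Raise an error if data_version is newer than this release supports."""
    newest = MAX_SUPPORTED.get(dataset)
    if newest is None:
        raise ValueError(f"unknown dataset {dataset!r}")
    if _parse(data_version) > _parse(newest):
        raise RuntimeError(
            f"{dataset} data version {data_version} is newer than this package "
            f"release supports (up to {newest}); please update the package."
        )


check_data_version("gbm", "1.0")  # older data freeze: allowed
check_data_version("gbm", "2.0")  # newest known freeze: allowed
try:
    check_data_version("gbm", "2.1")  # released after this package: refused
except RuntimeError as err:
    print(err)
```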

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - GBM backwards compatibility

This release adds some minor edits so the package can still load the old GBM data freeze (1.0), in addition to the new data freeze (2.0).

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - GBM data freeze 2.0

This update includes data freeze 2.0 of the glioblastoma (GBM) data, as well as other minor data cleanups in the other datasets, including dropping 28 samples from the CCRCC methylation and CNV tables that had been later excluded from analysis, and dropping 9 duplicate columns in the CCRCC transcriptomics table that were filled with zeros.

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - GBM, HNSCC, LUAD, and BRCA data. Multi-level indexing

- Added Gbm, Hnscc, Brca, and Luad datasets
- Where needed, gave dataframes multi-level column indices to remove duplicates and provide additional identifiers. Edited join functions to handle multi-level indices, and created the reduce_multiindex function for simplifying multi-level indices.
- Package now checks its own version and warns if it is out of date
- Renamed RenalCcrcc to Ccrcc
- Optimized join functions
- Changed join functions to use full outer joins and to warn you if they fill any missing values with NaNs
- Errors and warnings generated by cptac are now sent to stderr, instead of being printed to stdout
- Renamed the algorithms submodule to utils
- Renamed get_mutations to get_somatic_mutation, and get_mutations_binary to get_somatic_mutation_binary
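To see why multi-level column indices help, consider two columns for the same gene that come from different database IDs: a second level keeps them distinct, and a reduce step can later drop or flatten levels that are not needed. The pandas sketch below illustrates this with toy data. The level names, the IDs, and the `flatten_columns` helper are hypothetical and do not reproduce the signature of cptac's reduce_multiindex.

```python
import pandas as pd

# Two measurements share the gene name "TP53"; a second column level
# (here a made-up database ID) keeps them from colliding.
columns = pd.MultiIndex.from_tuples(
    [("TP53", "ID_1"), ("TP53", "ID_2"), ("EGFR", "ID_3")],
    names=["Name", "Database_ID"],
)
df = pd.DataFrame([[1.0, 2.0, 3.0]], index=["sample_1"], columns=columns)


def flatten_columns(frame: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Join all column levels into single strings, e.g. ("TP53", "ID_1") -> "TP53_ID_1"."""
    flat = frame.copy()
    flat.columns = [sep.join(map(str, col)) for col in frame.columns]
    return flat


# Dropping the ID level is simple but reintroduces duplicate names...
print(df.columns.droplevel("Database_ID").tolist())  # ['TP53', 'TP53', 'EGFR']
# ...while flattening keeps every column unique.
print(flatten_columns(df).columns.tolist())  # ['TP53_ID_1', 'TP53_ID_2', 'EGFR_ID_3']
```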

- Jupyter Notebook
Published by hboekweg over 6 years ago

cptac - Updated dependency requirements

The package dependency requirements are now more specific, to ensure that it runs properly.

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - Version 0.5 bug fixes

Fixed minor bugs in the algorithm that filters multiple mutations when merging omics and metadata dataframes with the mutations dataframe. Updated the getfreqmut function in cptac.algorithms to handle silent mutations in the renal dataset.

- Jupyter Notebook
Published by hboekweg over 6 years ago

cptac - Algorithms and renal CCRCC data

In this release, we added the cptac.algorithms sub-module and the new renal CCRCC dataset. We also replaced the "sync" function with an improved "download" function, and eliminated unnecessary output during dataset download and loading. Finally, we made some minor changes to the API:
- Dataframes no longer have a .name attribute. Use descriptive variable names instead.
- Standardized all CNA/CNV names to CNV. This fixed a bug in accessing the Ovarian copy number dataframe.
- Renamed merge/join functions:
  - compare_omics => join_omics_to_omics
  - append_mutations_to_omics => join_omics_to_mutations
  - append_metadata_to_omics => join_metadata_to_omics
- New join functions:
  - join_metadata_to_mutations
  - join_metadata_to_metadata
- In join_omics_to_mutations, changed the "mutation_genes" parameter to "mutations_genes" (plural "mutations")
- In join_omics_to_omics, changed "omics_df1_name" and "omics_df2_name" to "df1_name" and "df2_name"

For details on how these changes may affect usage, consult the use cases under the "docs" folder.

- Jupyter Notebook
Published by caleb-lindgren over 6 years ago

cptac - Remote Storage for Colon, Endometrial, and Ovarian Data

This release implements remote data storage and includes the Colon and Ovarian datasets, as well as the Endometrial dataset. The package now includes only the source code; the data files for each dataset are downloaded from another location the first time the user loads that dataset. When a user loads a dataset, the package also checks that any data files already downloaded are up to date with the current data files in remote storage. If there are any discrepancies, the package gives the user the option of updating the files.
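The up-to-date check can be sketched as a comparison between a hash of each local file and a hash published in a remote index. Everything here is an assumption for illustration: the index format (file name mapped to a SHA-256 hex digest), the `find_outdated` helper, and the choice of SHA-256 are not taken from cptac.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_outdated(data_dir: Path, remote_index: Dict[str, str]) -> List[str]:
    """Return names of files that are missing locally or differ from the remote copy."""
    outdated = []
    for name, remote_hash in sorted(remote_index.items()):
        local = data_dir / name
        if not local.exists() or sha256_of(local) != remote_hash:
            outdated.append(name)
    return outdated


# Usage with a temporary directory standing in for the local data folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "clinical.tsv").write_bytes(b"old contents")
    remote_index = {
        "clinical.tsv": hashlib.sha256(b"new contents").hexdigest(),  # changed remotely
        "proteomics.tsv": hashlib.sha256(b"proteomics").hexdigest(),  # never downloaded
    }
    print(find_outdated(data_dir, remote_index))  # ['clinical.tsv', 'proteomics.tsv']
```

A loader built this way would then offer to re-download only the files in the returned list.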

- Jupyter Notebook
Published by caleb-lindgren almost 7 years ago

cptac - Endometrial data with DataSet class abstraction

This release includes the full endometrial dataset, with all the utilities functions for working with it. We'll include the ovarian and colon cancer datasets in a future release, once we've set up remote data storage.

We have changed the dataset to be a class that the user instantiates, rather than a submodule that they load. As a result, the syntax for loading the dataset has changed slightly. Instead of running "import cptac.endometrial as en", the user runs two commands: first "import cptac", then "en = cptac.Endometrial()". However, manipulating the dataset thereafter uses the same syntax as before, working with the variable the dataset was assigned to, e.g. "clinical = en.get_clinical()" and so on.

We have changed the syntax for the three merging functions: compare_omics, append_metadata_to_omics, and append_mutations_to_omics. Instead of separately loading the dataframes you want to merge and then passing them to the function, you now pass the function a string containing the name of each dataframe you want to merge, e.g. "appended = en.append_metadata_to_omics(metadata_df_name="derived_molecular", omics_df_name="phosphoproteomics")". Note that the parameter names now have "_name" added to the end. These functions no longer accept dataframes; you must pass dataframe names instead.

- Jupyter Notebook
Published by caleb-lindgren almost 7 years ago

Recent Releases of cptac

cptac - v1.5.14

cptac - v1.5.13

cptac - v1.5.11

cptac -

cptac - v1.5.9

cptac -

cptac -

cptac - v1.5.6

cptac - v1.5.5

cptac - v1.5.4

cptac - v1.5.3

cptac - 1.5.2

cptac - 1.5.1

cptac - CPTAC

cptac - v.1.5.0rc2

cptac - Release Candidate 1 for version 1.5

cptac - 1.1.2

cptac - 1.1.1

cptac - 1.1.0

cptac - 1.0.0

cptac - Release Candidate for 1.0

cptac - UcecConf 2.0, updates and fixes

cptac - Pancan, Use Cases, Ucecconf 1.2

cptac - Bug fixes and updates

cptac - Fix Errors

cptac - Fix Errors

cptac - Updated LSCC, Endometrial Confirmatory, and BRCA data

cptac - Bug fixes and updates

cptac - Functionality updates

cptac - Update dependencies

cptac - GitHub pages, dependency and other updates

cptac - Make BRCA data publicly accessible

cptac - Fix WikiPathways interface bug, updated LSCC data

cptac - Minor bug fixes, utils supports Welch's t test

cptac - join_omics_to_mutations bug fix

cptac - Updated data access, multi_join, and small changes

cptac - Updated LSCC Data

cptac - Added "tissue_type" parameter to joins, along with new utils, updated lscc version 1.0 data

cptac - Add "quiet", "how", and "sample_type" parameters

cptac - LUAD followup data bugfix; followup data improvements

cptac - LSCC data, get_genotype_all_vars

cptac - Patient IDs instead of Sample IDs, followup data, and increased data availability

cptac - GBM and CCRCC data updates

cptac - Colon and endometrial data bug fixes

cptac - Prevent new version loading errors, and fix HNSCC join bugs

cptac - GBM backwards compatibility

cptac - GBM data freeze 2.0

cptac - GBM, HNSCC, LUAD, and BRCA data. Multi-level indexing

cptac - Updated dependency requirements

cptac - Version 0.5 bug fixes

cptac - Algorithms and renal CCRCC data

cptac - Remote Storage for Colon, Endometrial, and Ovarian Data

cptac - Endometrial data with DataSet class abstraction