Recent Releases of hypertools

hypertools - v0.8.1 (April, 2025)

This is a minor release, written by first-time contributor @terrafying(!!), that updates hypertools for use with NumPy 2.0+.

Notes:

  • Removed support for Python <= 3.8
  • Added support for Python 3.10, 3.11, and 3.12
  • Updated dependency requirements for compatibility with NumPy 2.0+

This release also brings back support for running hypertools in Colaboratory notebooks, which now use NumPy 2.0+.

- Python
Published by jeremymanning about 1 year ago

hypertools - v0.8.0 (February, 2022)

updates to .geo file format

Hypertools now saves DataGeometry objects using the pickle file format internally, rather than HDF5. With improvements made to the built-in pickle module since Hypertools's initial release, this now generally results in smaller files that save and load more quickly. It also allows us to no longer depend on deepdish, which has compatibility issues with various pandas objects, doesn't offer pre-built wheels for more recent Python versions, and is largely no longer maintained.

If you need to load .geo files from the old format, hypertools.load now accepts a keyword-only argument, legacy. Install deepdish if necessary, and pass legacy=True to load older DataGeometry objects. You can then .save() them to convert them to the new format.
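The conversion described above could be scripted roughly as follows. This is a minimal sketch, not an official migration tool: `converted_path` and `convert_legacy_geo` are hypothetical helper names, the `_converted` suffix is an arbitrary choice, and the sketch assumes deepdish is installed and that `.save()` takes the destination path as its first argument.

```python
from pathlib import Path


def converted_path(old_path):
    """Hypothetical naming scheme: 'old.geo' -> 'old_converted.geo'."""
    old_path = Path(old_path)
    return old_path.with_name(old_path.stem + "_converted" + old_path.suffix)


def convert_legacy_geo(old_path):
    """Load an old-format .geo file and re-save it in the new format."""
    # Imported lazily so the path helper works without hypertools installed.
    import hypertools as hyp

    geo = hyp.load(str(old_path), legacy=True)  # legacy is keyword-only
    new_path = converted_path(old_path)
    geo.save(str(new_path))  # assumption: first argument is the output path
    return new_path
```

Writing to a new file rather than overwriting keeps the original available until the converted copy has been checked.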

improvements to example datasets

All example data files have been upgraded to the new file format. Additionally, the three pre-trained scikit-learn Pipelines Hypertools provides (wiki_model, nips_model, and sotus_model) have been recreated from scratch using a newer scikit-learn version, better text preprocessing, and updated CountVectorizer and LatentDirichletAllocation parameters that result in overall better models.

The example DataGeometry objects associated with these three models (wiki, nips, and sotus) have been updated accordingly, and additionally now use IncrementalPCA as their default reducers, resulting in faster, deterministic transform outputs.

To use the new models and datasets, upgrade Hypertools to v0.8.0 (pip install -U hypertools) and remove the local cache of old versions ([[ -d ~/hypertools_data ]] && rm ~/hypertools_data/*). Older versions of Hypertools will continue to use the old example data.
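For platforms without a POSIX shell, the cache-clearing step can be approximated in Python. This is a sketch under stated assumptions: `clear_cache` is a hypothetical helper, the default location mirrors the `~/hypertools_data` path mentioned above, and only regular files are deleted. Unlike the shell glob, it also removes hidden files.

```python
from pathlib import Path


def clear_cache(cache_dir=Path.home() / "hypertools_data"):
    """Delete the regular files inside cache_dir; return how many were removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0  # nothing has been cached yet
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```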

Other improvements

  • Hypertools is now compatible with Python 3.9! This release is also compatible in principle with Python 3.10, but numba does not yet support Python 3.10, so certain dependencies will fail to install.
  • Hypertools now works with newer scikit-learn versions! The updates above to the example datasets make Hypertools fully compatible with recent scikit-learn releases (>=0.24). This should make Hypertools easier to use in Colaboratory notebooks and more flexible in general. If you need to use an older scikit-learn version, install an earlier release with pip install "hypertools<0.8.0" (the quotes keep the shell from treating < as a redirect).
  • Hypertools now works with newer Matplotlib versions! Recent updates to matplotlib's plotting backends were causing Hypertools's plotting interface to fail on import. We've fixed these bugs and maintained backwards compatibility with older (deprecated) interactive plotting backends as well.

Other assorted changes

  • failures when loading example datasets and .geo files now raise HypertoolsIOError with clearer error messages
  • specifying a compression when saving a DataGeometry object raises a FutureWarning
  • CI tests now run with Python 3.6 through 3.9, use mamba for faster environment setup, and generate more verbose output
  • dependencies and code required for Python 2/3 compatibility have been removed
  • various code causing RuntimeWarnings has been fixed

- Python
Published by paxtonfitzpatrick over 4 years ago

hypertools - v0.7.0 (June 2021)

Control over matplotlib backend & various bug fixes

New features:

  • Create animated plots in an environment with a non-interactive matplotlib plotting backend set, without disrupting the global plotting backend
  • Create non-animated, interactive plots for easy inspection of data using the new interactive keyword argument
  • Set the plotting backend for a single plot using the new mpl_backend keyword argument, and easily switch between backends within a single Python interpreter session, IPython kernel, and even Jupyter notebook cell
  • Use the new hypertools.set_interactive_backend function to change the backend for all future plots, or use it as a context manager to temporarily switch to a different backend. You can also use this to create multiple animated/interactive plots simultaneously.
  • Use hypertools's backend adjustments to control the behavior of other plotting libraries
  • Set the $HYPERTOOLS_BACKEND environment variable to permanently set your preferred plotting backend for non-IPython environments

NB: Currently supported backends include TkInter, GTK, wxPython, Qt4, Qt5, Cocoa (aka MacOSX; MacOS only), notebook/nbAgg (Jupyter notebooks only), and ipympl/widget (Jupyter notebooks only). 3D and interactive plots may not render properly in Colab notebooks due to security restrictions imposed by the Colaboratory platform.
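The backend controls listed above might be combined as in the sketch below. The keyword arguments `mpl_backend` and `interactive` and the function `hypertools.set_interactive_backend` come from these notes; the wrapper name `plot_with_backend_controls` is hypothetical, and the backend string "TkAgg" is an assumption about how the TkInter backend is spelled.

```python
def plot_with_backend_controls(data, backend="TkAgg"):
    """Show three ways of choosing a backend; returns the three plot results."""
    import hypertools as hyp  # imported lazily; needs hypertools >= 0.7.0

    # 1. One-off: use a specific backend for this plot only.
    single = hyp.plot(data, mpl_backend=backend)

    # 2. Temporary: switch backends for everything inside the with-block.
    with hyp.set_interactive_backend(backend):
        animated = hyp.plot(data, animate=True)

    # 3. Non-animated but interactive plot for inspecting the data.
    inspectable = hyp.plot(data, interactive=True)
    return single, animated, inspectable
```

Calling `hyp.set_interactive_backend(backend)` outside a with-block would instead change the backend for all later plots, as described in the list above.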

Bug fixes

  • importing hypertools in a notebook no longer creates phantom Python processes, issues warnings when TkInter isn't installed, fails if matplotlib.pyplot was imported first, or silently changes the plotting backend (fixes #242)
  • creating 3D plots with hypertools no longer alters the global matplotlib.rcParams object (fixes #243)
  • hypertools can now be imported for non-plotting-related uses in environments without a compatible GUI without throwing an error
  • IPython's TAB-completion no longer triggers a full import of hypertools or improperly sets the plotting backend based on the subprocess's environment
  • require scikit-learn<0.24 (full spec: scikit-learn>=0.19.1,!=0.22,<0.24) to avoid bug when loading pre-trained DataGeometry objects due to renamed sklearn module

- Python
Published by paxtonfitzpatrick almost 5 years ago

hypertools - v0.6.3 (October 2020)

dependency-related updates

  • allow scikit-learn>0.22. scikit-learn==0.22.0 contains a bug that affects the CountVectorizer vocabulary. This has been fixed in 0.23.0.
  • require umap-learn>=0.4.6. We previously avoided a bug in umap-learn<=0.4.5 by installing a pre-release version from GitHub. This has now been fixed in umap-learn==0.4.6
  • Beginning with seaborn==0.11.0, "dark" color palettes are returned in reverse order from how they were previously. This difference in behavior will be reflected in hypertools, but we've changed the default cmap in hypertools._shared.helpers.vals2colors to a non-dark palette for consistent default behavior.
  • Added tests for Python 3.8

- Python
Published by paxtonfitzpatrick over 5 years ago

hypertools - v0.6.2 (December 2019)

minor patch that enables dependencies not hosted on PyPI to install properly

  • setup.py's setup command is now a custom class that inherits from setuptools.command.install.install, runs the regular installation process, then pip-installs UMAP from its GitHub URL at a pre-release commit hash. This is completely equivalent to manually running pip install git+<URL>, but takes the burden of having to do so off of end-users.
  • removed URL from requirements.txt, added a comment in its place
  • added MANIFEST.IN file to include requirements.txt
  • updated minimum Python version listed on PyPI page to 3.5 to reflect that Python 3.4 support was dropped in v0.5.1 (August 2018)

This version is tagged as 0.6.2 to keep the versioning here and on PyPI consistent. The fix intended to be 0.6.1 was unsuccessful on TestPyPI, and PyPI does not allow removing and reuploading an existing version.

- Python
Published by paxtonfitzpatrick over 6 years ago

hypertools - v0.6.0 (December 2019)

Updates to hypertools.reduce

  • fixed bug when passing a dictionary of parameters to the reduce argument that would result in those parameters being overwritten
  • added some basic support for passing custom embedding models
  • added a warning when resolving conflicts between hypertools arguments and model-specific arguments

Other changes

  • dropped support for Python 2.7
  • fixed bug in Travis tests
  • replaced deprecated pandas.DataFrame method in hypertools.tools.df2mat
  • require installing UMAP from the GitHub repository due to bug fix not released yet.
  • updated setup.py to comply with PEP 508 guidelines for installing external dependencies
  • added unit test for hypertools.reduce bug fix
  • removed some unused imports and commented-out code
  • removed outdated pages from readthedocs
  • readthedocs build is now Python 3-based
  • build folder is ignored by default when installing from GitHub repository in editable mode

- Python
Published by paxtonfitzpatrick over 6 years ago

hypertools - v0.5.1 (August 2018)

  • added flake8 to travis tests
  • refactored some of procrustes function code
  • removed support for python 3.4
  • removed hdbscan from dependencies (still can be used if installed manually)

Code cleanup (thanks @dwillmer!):

  • Changed string comparisons from if x is 'str' to if x == 'str'; the former is an identity comparison, not equality. It happens to be true for some strings because of string interning, but == should always be used for normal comparisons.
  • Removed unused arguments from draw function - returndata and others weren't used in the function body.
  • Removed unreachable code in normalize function (branch criteria could never be True).
  • Separated out the multiply-nested function calls in DataGeometry class for clarity.
  • Changed comparisons of the form if type(x) is list to if isinstance(x, list); the former doesn't return True for subclasses, so isinstance should always be used.
  • Set unused loop variables to _.
  • Removed unused imports.
  • Ensured all imports are at the top of the file (except lazy / circular ones).
  • Ensured 2 blank lines above functions/classes (PEP8); the code looks a bit weird without this.
  • Fixed typo repect -> respect, which was copy-pasted in multiple docstrings.
  • Removed redundant pass before error raise.
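The two comparison pitfalls mentioned in the cleanup notes can be reproduced in a few lines. This is an illustrative sketch, not code from hypertools; `TaggedList` is a made-up subclass.

```python
# Build the string at runtime so it is a distinct object from the literal.
literal = "str"
built = "".join(["s", "t", "r"])

equal = built == literal        # compares contents: True
same_object = built is literal  # compares identity: depends on interning


class TaggedList(list):
    """A trivial list subclass, used only for this illustration."""


x = TaggedList([1, 2, 3])
exact_type_match = type(x) is list    # False: x is a subclass instance
subclass_aware = isinstance(x, list)  # True: subclasses are accepted
```

The value of `same_object` is an implementation detail of the interpreter, which is why the identity check is unreliable for strings.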

- Python
Published by andrewheusser almost 8 years ago

hypertools - v0.5.0 (April 2018)

Enhancements:

Plotting and transforming text data

  • hyp.plot now supports plotting text data. Simply pass a string, list of strings or list of lists of strings and the text will be transformed using a semantic model and plotted. By default, the text will be transformed using a topic model (LDA) fit to a selection of Wikipedia pages.
  • A new vectorizer argument in hyp.plot to specify a text vectorizer. Currently supports CountVectorizer, TfidfVectorizer, or class instances (fit or unfit) of these models.
  • A new semantic argument in hyp.plot that specifies the semantic model to use to transform text. Currently supports LatentDirichletAllocation, NMF, or class instances (fit or unfit) of these models.
  • A new corpus argument in hyp.plot that allows the user to specify text to fit a semantic model. Can be 'wiki', 'nips', 'sotus' or a custom list of text.
  • Enhanced hyp.format_data function that takes data in various forms (numpy array, dataframe, str, or list of str, or mixed list) and returns them in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.
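A call using the text-related arguments described above might look like the following sketch. The wrapper `plot_text` is hypothetical, and passing the model names as strings is an assumption based on the supported models named in these notes.

```python
def plot_text(documents, corpus="wiki"):
    """Transform raw text with a topic model and plot it as points."""
    import hypertools as hyp  # imported lazily; needs hypertools >= 0.5.0

    return hyp.plot(
        documents,
        "o",                                   # format string: draw points
        vectorizer="CountVectorizer",          # text -> word counts
        semantic="LatentDirichletAllocation",  # word counts -> topic weights
        corpus=corpus,                         # text used to fit the model
    )
```

For example, `plot_text(["first document", "second document"], corpus="nips")` would fit the semantic model to the NIPS corpus instead of the default Wikipedia selection.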

New algorithms

  • A new clustering algorithm HDBSCAN (thanks @lmcinnes!) e.g. hyp.plot(data, cluster='HDBSCAN')
  • A new dimensionality reduction algorithm UMAP (thanks @lmcinnes!) e.g. hyp.plot(data, reduce='UMAP')

New parameters

  • A new size param to resize figure e.g. hyp.plot(data, size=[10,8])
  • A new ax param to add figure to existing axis e.g. hyp.plot(data, ax=ax)

New text examples

  • A new dataset of NIPS papers e.g. hyp.load('nips') (from kaggle)
  • A new dataset of selected wikipedia pages e.g. hyp.load('wiki')
  • A new dataset of State of the Union text from 1989-2017. Can be loaded as hyp.load('sotus') (from kaggle)

API changes

  • In hyp.plot, changed the group arg to hue (group will still be supported but will be deprecated in a coming release).
  • Removed the deprecated describe_pca function. Please use the more general function, describe.

Bugs fixed

  • When using chemtrails in hyp.plot, the entire timeseries would appear for the first few seconds of an animation and then disappear.
  • The legend colors did not align with the data when using the fmt or color args.
  • When grouping with group/hue arg, labels were not reshuffled.
  • Fixed bug in describe function where correlations between data and reduced data would asymptote < 1.

NOTE: If you have been using the development version of 0.5.0, please clear your data cache (/Users/yourusername/hypertools_data).

- Python
Published by andrewheusser about 8 years ago

hypertools - v0.4.2 (December 2017)

  • fixed bug in plot function where software would crash if reduce was specified as dict
  • added tutorials to readthedocs

- Python
Published by andrewheusser over 8 years ago

hypertools - v0.4.1 (November 2017)

  • exposed format_data, which formats a numpy array, pandas df or mixed list into a list of numpy arrays (hypertools.tools.format_data)
  • added tests for the format_data function
  • added documentation to format_data

- Python
Published by andrewheusser over 8 years ago

hypertools - v0.4.0 (October 2017)

Enhancements -

  • A new class: DataGeometry with methods for plotting, transforming new data and saving
  • Support for loading *.geo objects
  • A new function: analyze to perform combinations of transformations
  • A new function: describe for characterizing the loss of information due to dimensionality reduction algs
  • In-memory caching of time-intensive reduce, align and describe operations
  • New syntax for reduce function: model and model_params are now passed as a dictionary using the reduce arg
  • New clustering models added to the cluster function: MiniBatchKMeans, AgglomerativeClustering, Birch, FeatureAgglomeration, and SpectralClustering
  • Moved major functions (normalize, align, reduce, cluster, load) to main level (i.e. hyp.load instead of hyp.tools.load, but the latter will still work)
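The dictionary-based reduce syntax mentioned in the list above could be used as sketched here. The helper names are hypothetical, and the dictionary keys `model` and `params` are an assumption that should be checked against the hypertools documentation; `whiten` is a parameter of scikit-learn's PCA.

```python
def make_reduce_spec(model, **params):
    """Bundle a model name and its parameters into one dictionary."""
    # Assumption: hypertools expects the keys 'model' and 'params'.
    return {"model": model, "params": params}


def reduce_with_spec(data, ndims=3):
    """Reduce data to ndims dimensions using a dictionary spec."""
    import hypertools as hyp  # imported lazily; needs hypertools >= 0.4.0

    spec = make_reduce_spec("PCA", whiten=True)
    return hyp.reduce(data, reduce=spec, ndims=ndims)
```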

Deprecations -

  • A deprecation warning is thrown for the following align arguments: normalize, ndims, and method
  • A deprecation warning is thrown for the following reduce arguments: model, model_params, align, and normalize
  • A deprecation warning is thrown for the following cluster arguments: ndims
  • A deprecation warning is thrown for the describe_pca function (replaced by describe)

Bugs -

  • fixed #148 bug in hyp.plot where figure would be rendered despite setting show=False (thanks @chaseWilliams !)
  • fixed a bug where n_clusters would not override group, even though a warning message said it would
  • fixed a bug where hyp.plot would quit if any kwargs were not the same length as the number of arrays in the list of input data.

Minor -

  • added brainiak toolbox citation and github link to align.py docstring
  • added additional details and fixed typos in align.py docstring
  • Upgraded seaborn requirement to 8.1
  • updated all examples/docs with new syntax changes
  • added new tests for new features

- Python
Published by andrewheusser over 8 years ago

hypertools - v0.3.1 (August 2017)

  • suppress warning when attempting to switch to TkAgg backend

- Python
Published by andrewheusser almost 9 years ago

hypertools - v0.3.0 (June 2017)

This release extends hypertools to support the following dimensionality reduction / manifold learning models:

  • PCA
  • FastICA
  • IncrementalPCA
  • KernelPCA
  • FactorAnalysis
  • TruncatedSVD
  • SparsePCA
  • MiniBatchSparsePCA
  • DictionaryLearning
  • MiniBatchDictionaryLearning
  • TSNE
  • MDS
  • SpectralEmbedding
  • LocallyLinearEmbedding
  • Isomap

The default reduction algorithm was switched from PCA to IncrementalPCA for better handling of large datasets.

Bugs squashed:

  • fixed plot_procrustes example so that rotation matrix is orthonormal

- Python
Published by andrewheusser almost 9 years ago

hypertools - v0.2.1 (June 2017)

The work for this update was done during the Mozilla Global Sprint 2017. Thank you @alysivji and @rarredon for your contributions! Thanks @stephwright and the Mozilla Open Science Team for organizing an awesome event!

New Features:

  • If legend is not explicitly given, it can be computed implicitly by passing legend=True
  • Align flag added to hyp.plot function
  • Align flag added to hyp.tools.reduce function
  • Reduce flag added to hyp.tools.align and hyp.tools.procrustes functions
  • Align and reduce flags added to hyp.tools.load function
  • Updated examples with new syntax

Bugs Squashed:

  • Fixed import bug for saving animations
  • Fixed bug in align function where extra column(s) of zeros were appended to data before alignment if ndims<=2

- Python
Published by andrewheusser almost 9 years ago

hypertools - v0.2.0 (May 2017)

new features, new code organization and also some style changes!

Key changes:

  • The way we handle args. All keywords are now handled explicitly, instead of being unpacked using the **kwargs syntax. This makes the code much cleaner and the arguments easier to parse.

  • The organization of the plotting code. I eliminated the separate static and animate code bases, because there was a lot of redundant code and handling args was a mess. It's now organized into a plot function, which is the main plotting function that also handles data manipulation prior to plotting, and a draw function, which handles all static and animated drawing.

  • Plot styling. All styles are now consistent, the static plots are now the same as the animated plots. Also, all lines/points are thinner.

  • New keyword arguments to the plot function:
      • fmt: in 0.1.0, format strings were handled as arguments, but now they are passed as a kwarg. Since the fmt kwarg is the first param after the data, the API in 0.1 and 0.2 is the same, except that in 0.1 format strings could be passed in any position, and now they must be passed immediately after the data.
      • title can be passed to add a title to the plot.
      • elev can be passed to change the elevation of the plot. Useful for static plots in Jupyter notebooks.
      • azim can be passed to change the azimuth of a plot. Useful for static plots in Jupyter notebooks.
      • precog is an animation-only feature which plots a low-opacity trace ahead of the data (similar to chemtrails but in the opposite direction).
      • bullettime (animation only) is the same as the combination of precog and chemtrails.
      • animate='spin' will create a "static" plot (i.e. all the data is plotted at once) that rotates.
      • return_data has been eliminated. The data is now always returned by default.

Minor changes:

  • changed 'weights' example data from 3D numpy array to list of 2D numpy arrays
  • updated examples for new 'weights' data format
  • remove docs _build folder from repo
  • added requests to setup.py file
  • remove rogue class from init file

Bugs fixed:

If the rank of the input matrix is smaller than the number of dimensions requested, the reduce function will now pad the reduced data matrix with ndims-rank columns of zeros.
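The padding behavior described above can be illustrated with plain Python lists. This is a sketch of the idea only, not the code used inside the reduce function; `pad_with_zero_columns` is a hypothetical name.

```python
def pad_with_zero_columns(rows, ndims):
    """Right-pad each row with zeros until it has ndims entries."""
    padded = []
    for row in rows:
        missing = max(ndims - len(row), 0)  # never truncate wider rows
        padded.append(list(row) + [0.0] * missing)
    return padded


# A rank-2 result padded out to the 3 requested dimensions.
reduced = [[1.5, -0.5], [0.2, 0.3]]
padded = pad_with_zero_columns(reduced, ndims=3)
```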

- Python
Published by andrewheusser about 9 years ago

hypertools - v0.1.7 (April 2017)

Enhancements:

  • added load function to load in example data
  • moved example data out of repo to google drive

Bugs squashed:

  • added missing import sys statement to helpers.py
  • fixed bug in example scripts dealing with missing data
  • fixed bug that caused software to crash when using PPCA
  • fixed bug in readthedocs build caused by empty toctree

- Python
Published by andrewheusser about 9 years ago

hypertools - v0.1.6 (February 2017)

  • fixed bug where group kwarg caused code to crash on some systems
  • fixed bug where requirements still listed matplotlib <2.0, even though it is supported
  • patched bug where tail_duration would crash if not an integer value
  • added warning to align function when n features exceeds n samples

- Python
Published by andrewheusser over 9 years ago

hypertools - v0.1.5 (February 2017)

  • Added support for matplotlib 2.0
  • Added future package to setup.py so it's automatically installed

- Python
Published by andrewheusser over 9 years ago

hypertools - v0.1.4 (February 2017)

  • Support for Python 3.4+

- Python
Published by andrewheusser over 9 years ago

hypertools - v0.1.3 (February 2017)

Bugs fixed:

  • patched bug where future division was not being imported into align, procrustes and srm functions
  • fixed bug where category labels could be returned out of order because of set
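The label-ordering problem comes from the fact that a set does not preserve insertion order. One common way to de-duplicate while keeping first-appearance order is shown below. This is a general Python idiom offered as an illustration, not necessarily the fix applied in hypertools, and `unique_in_order` is a hypothetical name.

```python
def unique_in_order(labels):
    """Return the distinct labels in order of first appearance."""
    # dict keys keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(labels))


labels = ["b", "a", "b", "c", "a"]
ordered = unique_in_order(labels)  # order of first appearance is kept
unordered = set(labels)            # iteration order is not guaranteed
```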

- Python
Published by andrewheusser over 9 years ago

hypertools - v0.1.2 (February 2017)

changes:

  • added docstrings to all public functions
  • added sphinx-generated documentation of API with examples
  • modified the examples so they are compatible with sphinx-gallery
  • simplified readme

bugs fixed:

  • patched bug where matplotlib 2.0 could be used (working on this for next release)
  • patched bug where procrustes function uses dimensionality reduction during alignment by default

- Python
Published by andrewheusser over 9 years ago

hypertools - v0.1 (January 2017)

First version! API documented below:

Main function

  • plot - Plots high dimensional data in 1, 2, or 3 dimensions as static image, 3d interactive plot or animated plot.

Sub functions

  • tools.align - align multidimensional data (See here for details)
  • tools.reduce - implements PCA to reduce dimensionality of data
  • tools.cluster - runs k-means clustering and returns cluster labels
  • tools.describe_pca - plotting tool to evaluate how well the principal components describe the data
  • tools.missing_inds - returns indices of missing data (nans)
  • tools.normalize - z-scores the columns/rows of a matrix or list of matrices
  • tools.procrustes - projects from one space to another using Procrustean transformation (shift + scaling + rotation) (Adapted from pyMVPA implementation)
  • tools.df2mat - converts single-level pandas dataframe to numpy matrix

Plot

Plot example

Inputs:

A numpy array, list of arrays, or pandas dataframe or list of dataframes

NOTE: HyperTools currently only supports single-level indexing for pandas dataframes, but we plan to support multi-level indices in the future. Additionally, be aware that if columns containing text are passed to HyperTools, those columns will be automatically converted into dummy variables (see pandas.get_dummies for details).

Arguments:

Format strings can be passed as a string, or tuple/list of length x. See matplotlib API for more styling options

Keyword arguments:

color(s) (list): A list of colors for each line to be plotted. Can be named colors, RGB values (e.g. (.3, .4, .1)) or hex codes. If defined, overrides palette. See http://matplotlib.org/examples/color/named_colors.html for list of named colors. Note: must be the same length as X.

group (list of str, floats or ints): A list of group labels. Length must match the number of rows in your dataset. If the data type is numerical, the values will be mapped to rgb values in the specified palette. If the data type is strings, the points will be labeled categorically. To label a subset of points, use None (i.e. ['a', None, 'b','a'])

linestyle(s) (list): a list of line styles

marker(s) (list): a list of marker types

palette (str): A matplotlib or seaborn color palette

labels (list): A list of labels for each point. Must match the dimensionality of the data (X). If no label is wanted for a particular point, input None

legend (list): A list of string labels to be plotted in a legend (one for each list item)

ndims (int): an int representing the number of dims to plot in. Must be 1,2, or 3. NOTE: Currently only works with static plots.

normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists (default). If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned (default is False).

n_clusters (int): If n_clusters is passed, HyperTools will perform k-means clustering with the k parameter set to n_clusters. The resulting clusters will be plotted in different colors according to the color palette.

animate (bool): If True, plots the data as an animated trajectory (default: False)

show (bool): If set to False, the figure will not be displayed, but the figure, axis and data objects will still be returned (see Outputs) (default: True).

savepath (str): Path to save the image/movie. Must include the file extension in the save path (i.e. savepath='/path/to/file/image.png'). NOTE: If saving an animation, FFMPEG must be installed (this is a matplotlib req). FFMPEG can be easily installed on a Mac via Homebrew (brew install ffmpeg) or on Linux via apt-get (apt-get install ffmpeg). If you don't have Homebrew (Mac only), you can install it like this: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)".

explore (bool): If True, user-defined labels will appear on hover. If no labels are passed, the point index and coordinate will be shown instead. To use, set explore=True.

Note: Explore mode is currently only supported for 3D static plots.

Animation-specific keyword arguments:

duration (int): Length of the animation in seconds (default: 30 seconds)

tail_duration (int): Sets the length of the tail of the data (default: 2 seconds)

rotations (int): Number of rotations around the box (default: 2)

zoom (int): Zoom, positive numbers will zoom in (default: 0)

chem_trails (bool): Added trail with change in opacity (default: False)

Outputs:

-By default, the plot function outputs a figure handle (matplotlib.figure.Figure), axis handle (matplotlib.axes._axes.Axes) and data (list of numpy arrays), e.g. fig,axis,data = hyp.plot(x)

-If animate=True, the plot function additionally outputs an animation handle (matplotlib.animation.FuncAnimation) e.g. fig,axis,data,line_ani = hyp.plot(x, animate=True)

Example uses:

Please see the examples folder for many more implementation examples.

Import the library: import hypertools as hyp

Plot with default color palette: hyp.plot(data)

Plot as movie: hyp.plot(data, animate=True)

Change color palette: hyp.plot(data,palette='Reds')

Specify colors using unlabeled list of format strings: hyp.plot([data[0],data[1]],['r:','b--'])

Plot data as points: hyp.plot([data[0],data[1]],'o')

Specify colors using keyword list of colors (color codes, rgb values, hex codes or a mix): hyp.plot([data[0],data[1],data[2]],color=['r', (.5,.2,.9), '#101010'])

Specify linestyles using keyword list: hyp.plot([data[0],data[1],data[2]],linestyle=[':','--','-'])

Specify markers using keyword list: hyp.plot([data[0],data[1],data[2]],marker=['o','*','^'])

Specify markers with format string and colors with keyword argument: hyp.plot([data[0],data[1],data[2]], 'o', color=['r','g','b'])

Specify labels:

```
# Label first point of each list
labels=[]
for idx,i in enumerate(data):
    tmp=[]
    for iidx,ii in enumerate(i):
        if iidx==0:
            tmp.append('Point ' + str(idx))
        else:
            tmp.append(None)
    labels.append(tmp)

hyp.plot(data, 'o', labels=labels)
```

Specify group:

```
# Assign a random group value to every point
import numpy as np

group=[]
for idx,i in enumerate(data):
    tmp=[]
    for iidx,ii in enumerate(i):
        tmp.append(np.random.rand())
    group.append(tmp)

hyp.plot(data, 'o', group=group)
```

Plot in 2d: hyp.plot(data, ndims=2)

Group clusters by color: hyp.plot(data, n_clusters=10)

Create a legend: hyp.plot([data[0],data[1]], legend=['Group A', 'Group B'])

Turn on explore mode (experimental): hyp.plot(data, 'o', explore=True)

Align

BEFORE

Align before example

AFTER

Align after example

Inputs:

A list of numpy arrays

Outputs:

An aligned list of numpy arrays

Example use:

align a list of arrays: aligned_data = hyp.tools.align(data)

Reduce

Inputs:

A numpy array or list of numpy arrays

Keyword arguments:

  • ndims - dimensionality of output data
  • normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists. If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned. (default is 'across').

Outputs

An array or list of arrays with reduced dimensionality

Example uses

Reduce n-dimensional array to 3d: reduced_data = hyp.tools.reduce(data, ndims=3)

Cluster

Inputs:

A numpy array or list of numpy arrays

Keyword arguments:

  • n_clusters (int) - number of clusters to fit (default=8)
  • ndims (int) - reduce data to ndims before running k-means (optional)

Outputs

A list of cluster labels corresponding to each data point. NOTE: During the cluster fitting, the data are stacked across lists, so if multiple lists are passed, the returned list of cluster labels will need to be reshaped.

Example use:

cluster_labels = hyp.tools.cluster(data, n_clusters=10)
hyp.plot(data, 'o', group = cluster_labels)
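When several arrays are passed, the flat list of labels has to be split back into one list per array, as the note above says. A minimal sketch, assuming the labels come back in the same order in which the arrays were stacked (`split_labels` is a hypothetical helper, not part of hypertools):

```python
def split_labels(flat_labels, arrays):
    """Split a flat label list into one list per input array."""
    grouped, start = [], 0
    for arr in arrays:
        stop = start + len(arr)  # len(arr) is the number of rows
        grouped.append(list(flat_labels[start:stop]))
        start = stop
    return grouped


# Two arrays with 2 rows and 1 row, and three stacked labels.
arrays = [[[0.0, 0.0], [1.0, 1.0]], [[2.0, 2.0]]]
per_array = split_labels([0, 1, 1], arrays)
```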

Cluster Example

Describe PCA

Inputs:

A numpy array or list of numpy arrays

Keyword arguments:

  • show (bool) - if true, returns figure handle, axis handle and dictionary containing the plotted data. If false, the function just returns a dictionary containing the data

Outputs

A plot summarizing the correlation of the covariance matrices for the raw input data and PCA reduced data

Example use:

hyp.tools.describe_pca(data)

Describe Example

Missing inds

Inputs:

A numpy array or list of numpy arrays

Outputs

A list of indices representing rows with missing data. If a list of numpy arrays is passed, a list of lists will be returned.

Example use:

missing_data_inds = hyp.tools.missing_inds(data)
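To make the output format concrete, the sketch below finds the rows that contain a NaN in a single matrix given as a list of rows. It only illustrates the idea of "indices of rows with missing data" and is not the implementation in hypertools; `rows_with_nan` is a hypothetical name.

```python
import math


def rows_with_nan(matrix):
    """Return the indices of rows that contain at least one NaN."""
    return [
        i for i, row in enumerate(matrix)
        if any(math.isnan(value) for value in row)
    ]


nan = float("nan")
example = [[1.0, 2.0], [nan, 3.0], [4.0, nan]]
missing = rows_with_nan(example)
```

For a list of matrices, applying the helper to each one would give the list of lists described above.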

Normalize

Inputs:

A numpy array or list of numpy arrays

Keyword arguments:

  • normalize (str or False) - If set to 'across', the columns of the input data will be z-scored across lists. If set to 'within', the columns will be z-scored within each list that is passed. If set to 'row', each row of the input data will be z-scored. If set to False, the input data will be returned. Note: you MUST set the normalize flag equal to 'across', 'within' or 'row', or else you will get the same data back that you put in!

Outputs

An array or list of normalized data

Example use:

normalized_data = hyp.tools.normalize(data, normalize='within')
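The difference between 'within' and 'across' can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the function names are hypothetical, the population standard deviation is used, and columns are assumed to be non-constant so the division is defined.

```python
from statistics import mean, pstdev


def zscore_columns(rows, reference=None):
    """Z-score the columns of rows using the column stats of reference."""
    reference = rows if reference is None else reference
    columns = list(zip(*reference))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]  # assumption: population std
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def normalize_within(arrays):
    """Each array is z-scored using its own column statistics."""
    return [zscore_columns(a) for a in arrays]


def normalize_across(arrays):
    """All arrays share the column statistics of the stacked data."""
    stacked = [row for a in arrays for row in a]
    return [zscore_columns(a, reference=stacked) for a in arrays]
```

With 'within', every array ends up with zero-mean columns on its own; with 'across', only the stacked data does, so offsets between arrays are preserved.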

Procrustes

Inputs:

  • source - a numpy array to be transformed
  • target - a numpy array to serve as template

Outputs

A (shifted + scaled + rotated) version of source that best matches target

Example use:

source_aligned_to_target = hyp.tools.procrustes(source, target)

df2mat

Inputs:

A single-level pandas dataframe

Outputs

A numpy matrix built from the dataframe with text columns replaced with dummy variables (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html)

Example use:

matrix = hyp.tools.df2mat(df)

- Python
Published by andrewheusser over 9 years ago