LFSpy

LFSpy: A Python Implementation of Local Feature Selection for Data Classification with scikit-learn Compatibility - Published in JOSS (2020)

https://github.com/mcmasterrs/lfspy

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in JOSS metadata
○ Academic publication links
✓ Committers with academic emails: 1 of 9 committers (11.1%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 43% confidence

Earth and Environmental Sciences Physical Sciences - 40% confidence

Last synced: 6 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: McMasterRS
License: bsd-3-clause
Language: Python
Default Branch: master
Size: 19.8 MB

Statistics

Stars: 7
Watchers: 4
Forks: 2
Open Issues: 0
Releases: 1

Created about 6 years ago · Last pushed almost 6 years ago

Metadata Files

Readme License

Localized Feature Selection (LFS)

Full documentation can be found at: lfspy.readthedocs.io

Localized feature selection (LFS) is a supervised machine learning approach for embedding localized feature selection in classification. The sample space is partitioned into overlapping regions, and subsets of features are selected that are optimal for classification within each local region. As the size and membership of the feature subsets can vary across regions, LFS is able to adapt to local variation across the entire sample space.
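To make the idea of region-specific feature subsets concrete, here is a toy sketch in plain Python. It is not the LFS algorithm: the regions are fixed in advance and do not overlap, only one feature is picked per region, and the score (the gap between class means) is a stand-in chosen for illustration. The data and the helper names `class_mean_gap` and `best_feature_per_region` are hypothetical.

```python
from statistics import mean

# Toy samples: ((feature_0, feature_1), class_label, region_name).
# The values are invented purely for illustration.
samples = [
    # Region "A": feature 0 separates the classes, feature 1 does not.
    ((0.1, 0.5), 0, "A"), ((0.2, 0.4), 0, "A"),
    ((0.9, 0.5), 1, "A"), ((0.8, 0.4), 1, "A"),
    # Region "B": feature 1 separates the classes, feature 0 does not.
    ((0.5, 0.1), 0, "B"), ((0.4, 0.2), 0, "B"),
    ((0.5, 0.9), 1, "B"), ((0.4, 0.8), 1, "B"),
]


def class_mean_gap(rows, feature):
    """Absolute difference between the two class means of one feature."""
    mean_0 = mean(x[feature] for x, y, _ in rows if y == 0)
    mean_1 = mean(x[feature] for x, y, _ in rows if y == 1)
    return abs(mean_0 - mean_1)


def best_feature_per_region(samples, n_features=2):
    """Pick, for every region, the single feature with the largest class gap."""
    result = {}
    for region in sorted({r for _, _, r in samples}):
        rows = [s for s in samples if s[2] == region]
        result[region] = max(
            range(n_features), key=lambda f: class_mean_gap(rows, f)
        )
    return result


print(best_feature_per_region(samples))
```

In this toy data a single global choice would have to compromise, because each feature is informative in only one of the two regions; choosing per region avoids that.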

This repository contains a Python implementation of this method that is compatible with scikit-learn pipelines. For a MATLAB version, refer to https://github.com/armanfn/LFS

Statement of Need

LFSpy offers an implementation of the Local Feature Selection (LFS) algorithm that is compatible with scikit-learn, one of the most widely used machine learning packages today. LFS combines classification with feature selection, and distinguishes itself by its flexibility in selecting a different subset of features for different data points based on what is most discriminative in local regions of the feature space. This means LFS overcomes a well-known weakness of many classification algorithms, i.e., classification for non-stationary data where the number of features is high relative to the number of samples.

Installation

```bash
pip install lfspy
```

Dependencies

LFS requires:

* Python 3
* NumPy>=1.14
* SciPy>=1.1
* Scikit-learn>=0.18.2
* pytest>=5.0.0
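As a quick sanity check, the minimum versions above can be compared against what is installed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper names `parse`, `meets_minimum`, and `check_requirements` are made up for this example, and the version parser is deliberately simple (it only looks at the leading numeric components).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list (runtime packages only).
MINIMUMS = {"numpy": "1.14", "scipy": "1.1", "scikit-learn": "0.18.2"}


def parse(version_string):
    """Turn '1.14.3' into (1, 14, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return parse(installed) >= parse(minimum)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for package, status in check_requirements().items():
    print(f"{package}: {status}")
```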

Testing

We recommend running the provided test after installing LFSpy to ensure the results obtained match expected outputs.

pytest may be installed either directly through pip (pip install pytest) or using the test extra (pip install LFSpy[test]).

```bash
pytest --pyargs LFSpy
```

This will output to console whether or not the results of LFSpy on two datasets (the sample dataset provided in this repository, and scikit-learn's Fisher Iris dataset) are exactly as expected.

So far, LFSpy has been tested on Windows 10 with and without Conda, and on Ubuntu. In all cases, the results matched the expected outputs exactly.

Usage

To use LFSpy on its own:

```python
from LFSpy import LocalFeatureSelection

lfs = LocalFeatureSelection()
lfs.fit(training_data, training_labels)
predicted_labels = lfs.predict(testing_data)
total_error, class_error = lfs.score(testing_data, testing_labels)
```
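The usage snippet above unpacks a total error and a per-class error from `score`. As a rough illustration of what such quantities mean, the sketch below computes them from true and predicted labels in plain Python. This is an assumption about their meaning, not LFSpy's own implementation; the function name `error_rates` and the label lists are invented for the example.

```python
from collections import Counter


def error_rates(y_true, y_pred):
    """Return (overall error rate, {class label: error rate within that class})."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    per_class_total = Counter(y_true)
    # Count misclassified samples, keyed by their true class.
    per_class_wrong = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    total_error = sum(per_class_wrong.values()) / len(y_true)
    class_error = {
        label: per_class_wrong[label] / count
        for label, count in per_class_total.items()
    }
    return total_error, class_error


# Invented labels, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 0, 1]

total_error, class_error = error_rates(y_true, y_pred)
print(total_error)
print(class_error)
```

Reporting the error per class alongside the total is useful when classes are imbalanced, since a low overall error can hide a poorly classified minority class.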

To use LFSpy as part of an sklearn pipeline:

```python
from LFSpy import LocalFeatureSelection
from sklearn.pipeline import Pipeline

lfs = LocalFeatureSelection()
pipeline = Pipeline([('lfs', lfs)])
pipeline.fit(training_data, training_labels)
predicted_labels = pipeline.predict(testing_data)
total_error, class_error = pipeline.score(testing_data, testing_labels)
```

Tunable Parameters

alpha: (default: 19) the maximum number of selected features for each representative point
gamma: (default: 0.2) impurity level tolerance; controls the proportion of out-of-class samples that can be in a local region
tau: (default: 2) number of passes through the training set
sigma: (default: 1) adjusts the weighting of observations based on their distance; values greater than 1 result in lower weighting
n_beta: (default: 20) number of beta values to test, controls the relative weighting of intra-class vs. inter-class distance in the objective function
nrrp: (default: 2000) number of iterations for randomized rounding process
knn: (default: 1) number of nearest neighbours to compare for classification
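When tuning, it can help to enumerate parameter combinations explicitly. The sketch below builds such combinations from the defaults listed above using only the standard library. The helper `parameter_grid` and the candidate values are hypothetical, and passing each resulting dictionary to `LocalFeatureSelection(**params)` assumes the constructor accepts these names as keyword arguments.

```python
from itertools import product

# Defaults copied from the parameter list above.
DEFAULTS = {
    "alpha": 19,
    "gamma": 0.2,
    "tau": 2,
    "sigma": 1,
    "n_beta": 20,
    "nrrp": 2000,
    "knn": 1,
}


def parameter_grid(overrides, defaults=DEFAULTS):
    """Yield one full parameter dict per combination of the override values."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    names = list(overrides)
    for values in product(*(overrides[name] for name in names)):
        params = dict(defaults)           # start from the documented defaults
        params.update(zip(names, values))  # apply this combination
        yield params


# Candidate values chosen arbitrarily for illustration.
grid = list(parameter_grid({"alpha": [5, 19], "gamma": [0.1, 0.2, 0.3]}))
print(len(grid))
print(grid[0])
```

Since fitting with the defaults may already take a few minutes, a small grid like this one is a more realistic starting point than an exhaustive search.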

Example

This example uses the sample data (matlab_Data.mat) available in the LFSpy/tests folder. The full example can be found in example.py. On our test system, the final output prints the statement, "LFS test accuracy: 0.7962962962962963".

The code provided in [comparisons.py](https://github.com/McMasterRS/LFSpy/blob/master/LFSpy/comparisons/comparisons.py) serves as an additional example of how to use LFSpy.

```python
import numpy as np
from scipy.io import loadmat
from LFSpy import LocalFeatureSelection
from sklearn.pipeline import Pipeline

mat = loadmat('LFSpy/tests/matlab_Data')
x_train = mat['Train'].T
y_train = mat['TrainLables'][0]
x_test = mat['Test'].T
y_test = mat['TestLables'][0]

print('Training and testing an LFS model with default parameters.\nThis may take a few minutes...')
lfs = LocalFeatureSelection(rr_seed=777)
pipeline = Pipeline([('classifier', lfs)])
pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)
score = pipeline.score(x_test, y_test)
print('LFS test accuracy: {}'.format(score))
```

Contribution Guidelines

Please see our Contribution Guidelines page.

Authors

Oliver Cook
Kiret Dhindsa
Areeb Khawaja
Ron Harwood
Thomas Mudway

Acknowledgments

N. Armanfard, J. P. Reilly, and M. Komeili, "Local Feature Selection for Data Classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 6, pp. 1217-1227, 2016.
N. Armanfard, J. P. Reilly, and M. Komeili, "Logistic Localized Modeling of the Sample Space for Feature Selection and Classification", IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1396-1413, 2018.

Owner

Name: McMaster University Research Software
Login: McMasterRS
Kind: organization
Location: Hamilton, ON Canada

Website: https://www.rhpcs.mcmaster.ca
Repositories: 4
Profile: https://github.com/McMasterRS

JOSS Publication

LFSpy: A Python Implementation of Local Feature Selection for Data Classification with scikit-learn Compatibility

Published

May 10, 2020

DOI

10.21105/joss.01958

Volume 5, Issue 49, Page 1958

Authors

Kiret Dhindsa

Research and High Performance Computing, McMaster University; Vector Institute; Department of Surgery, McMaster University

Oliver Cook

Research and High Performance Computing, McMaster University

Thomas Mudway

Research and High Performance Computing, McMaster University

Areeb Khawaja

Research and High Performance Computing, McMaster University

Ron Harwood

Research and High Performance Computing, McMaster University

Ranil Sonnadara

Research and High Performance Computing, McMaster University; Vector Institute; Department of Surgery, McMaster University

Editor

Dan Foreman-Mackey

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 142
Total Committers: 9
Avg Commits per committer: 15.778
Development Distribution Score (DDS): 0.627

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
kiretd	k**d@g**m	53
Oliver Cook	c**o@m**a	41
Areeb Khawaja	k**1@m**a	23
mudwayt	m**t@m**a	9
Christopher J. Markiewicz	m**z@s**u	5
Dan F-M	f**y@g**m	4
Nathaniel Rivera Saul	n**l@n**m	3
Ron Harwood	h**r@g**m	2
BrainModes	b**s@B**l	2
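The All Time summary figures can be reproduced from the committer table. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, but it is consistent with the reported value of 0.627.

```python
# Commit counts copied from the Top Committers table above.
commits = {
    "kiretd": 53,
    "Oliver Cook": 41,
    "Areeb Khawaja": 23,
    "mudwayt": 9,
    "Christopher J. Markiewicz": 5,
    "Dan F-M": 4,
    "Nathaniel Rivera Saul": 3,
    "Ron Harwood": 2,
    "BrainModes": 2,
}

total_commits = sum(commits.values())
total_committers = len(commits)
avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total_commits

print(total_commits, total_committers)
print(round(avg_commits, 3))
print(round(dds, 3))
```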

Committer Domains (Top 20 + Academic)

mcmaster.ca: 3, newrelic.com: 1, stanford.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 3
Total pull requests: 7
Average time to close issues: 7 days
Average time to close pull requests: 5 days
Total issue authors: 1
Total pull request authors: 3
Average comments per issue: 1.33
Average comments per pull request: 0.14
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

effigies (3)

Pull Request Authors

dfm (3)
sauln (2)
effigies (2)

Top Labels

Issue Labels

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- pypi 17 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 5
Total maintainers: 2

pypi.org: lfspy

Homepage: https://github.com/McMasterRS/LFSpy/
Documentation: https://lfspy.readthedocs.io/
License: BSD License
Latest release: 1.0.4
published almost 6 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 17 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 19.1%

Stargazers count: 21.5%

Dependent repos count: 21.7%

Average: 22.2%

Downloads: 38.7%

Maintainers (2)

McMasterRSE, tmudway

Last synced: 6 months ago

Dependencies

setup.py pypi

numpy >=1.14
scikit-learn >=0.18.2
scipy >=1.1
