MOAFS

MOAFS: A Massive Online Analysis library for feature selection in data streams - Published in JOSS (2020)

https://github.com/mbdemoraes/moafs

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

MOAFS: A Massive Online Analysis library for feature selection in data streams

Basic Info
  • Host: GitHub
  • Owner: mbdemoraes
  • License: gpl-3.0
  • Language: Java
  • Default Branch: master
  • Size: 15.8 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 6 years ago · Last pushed about 6 years ago
Metadata Files
Readme License

README.md

Abstract

MOAFS is a library for the Massive Online Analysis (MOA) framework. It is based on the MOAReduction extension and implements seven feature selection algorithms for dimensionality reduction in data stream classification problems, especially in the text domain. MOAFS uses an incremental version of Naïve Bayes as the base classifier.

Available algorithms

Information Gain and Gain Ratio

  • QUINLAN, J. R. Induction of Decision Trees. Machine Learning, v. 1, n. 1, p. 81–106, 1986. ISSN 1573-0565. DOI: 10.1023/A:1022643204877.

Symmetrical Uncertainty

  • YU, L.; LIU, H. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In: Twentieth International Conference on Machine Learning, 2003. Edited by T. Fawcett and N. Mishra. v. 2, p. 856–863. ISBN 1577351894.
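
Information Gain, Gain Ratio and Symmetrical Uncertainty are all built from the same entropy terms. The following is a minimal sketch of the textbook definitions, assuming a discrete feature X and class Y summarized as a table of co-occurrence counts. The class and method names are hypothetical and are not taken from the MOAFS source, which may compute these scores differently (for example, incrementally over a window).

```java
// Hypothetical helper, not taken from MOAFS: entropy-based feature scores
// computed from a contingency table counts[x][y], where x indexes the
// feature's values and y indexes the class labels.
class EntropyScores {

    // Shannon entropy in bits of a vector of non-negative counts.
    static double entropy(double[] counts) {
        double total = 0.0;
        for (double c : counts) total += c;
        if (total == 0.0) return 0.0;
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    // Row sums: how often each feature value occurs.
    static double[] featureTotals(double[][] counts) {
        double[] totals = new double[counts.length];
        for (int x = 0; x < counts.length; x++)
            for (double c : counts[x]) totals[x] += c;
        return totals;
    }

    // Column sums: how often each class label occurs.
    static double[] classTotals(double[][] counts) {
        double[] totals = new double[counts[0].length];
        for (double[] row : counts)
            for (int y = 0; y < row.length; y++) totals[y] += row[y];
        return totals;
    }

    // Information gain: IG(Y; X) = H(Y) - H(Y | X).
    static double informationGain(double[][] counts) {
        double[] rows = featureTotals(counts);
        double n = 0.0;
        for (double r : rows) n += r;
        if (n == 0.0) return 0.0;
        double conditional = 0.0;
        for (int x = 0; x < counts.length; x++)
            conditional += (rows[x] / n) * entropy(counts[x]);
        return entropy(classTotals(counts)) - conditional;
    }

    // Gain ratio: information gain normalized by the feature entropy H(X).
    static double gainRatio(double[][] counts) {
        double hx = entropy(featureTotals(counts));
        return hx == 0.0 ? 0.0 : informationGain(counts) / hx;
    }

    // Symmetrical uncertainty: 2 * IG / (H(X) + H(Y)).
    static double symmetricalUncertainty(double[][] counts) {
        double denom = entropy(featureTotals(counts)) + entropy(classTotals(counts));
        return denom == 0.0 ? 0.0 : 2.0 * informationGain(counts) / denom;
    }

    public static void main(String[] args) {
        double[][] counts = {{8, 2}, {1, 9}}; // illustrative counts only
        System.out.println("IG = " + informationGain(counts));
        System.out.println("GR = " + gainRatio(counts));
        System.out.println("SU = " + symmetricalUncertainty(counts));
    }
}
```

A feature that is independent of the class scores 0 on all three measures, while a feature that determines the class exactly scores the maximum.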

Online Feature Selection

  • J. Wang, P. Zhao, S. Hoi, R. Jin, Online feature selection and its applications, IEEE Transactions on Knowledge and Data Engineering 26 (3) (2014) 698–710.

Chi-Squared

  • PEARSON, K. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling. In: Breakthroughs in Statistics: Methodology and Distribution. Edited by Samuel Kotz and Norman L. Johnson. New York, NY: Springer New York, 1992. p. 11–28. ISBN 978-1-4612-4380-9. DOI: 10.1007/978-1-4612-4380-9_2.

Cramér's V-Test

  • Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton: Princeton University Press, page 282 (Chapter 21. The two-dimensional case). ISBN 0-691-08004-6
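
Chi-Squared and Cramér's V are closely related: the second is a normalized form of the first. The sketch below follows the standard definitions for a contingency table of feature values against class labels. It is an illustration only; the class and method names are hypothetical and do not come from the MOAFS source.

```java
// Hypothetical helper, not taken from MOAFS: association statistics for a
// contingency table counts[i][j] (feature value i, class label j).
class ContingencyStats {

    // Sum of all cells in the table.
    static double total(double[][] counts) {
        double n = 0.0;
        for (double[] row : counts)
            for (double c : row) n += c;
        return n;
    }

    // Pearson's chi-squared statistic: sum over cells of (O - E)^2 / E,
    // with expected count E = rowTotal * colTotal / n.
    static double chiSquared(double[][] counts) {
        int r = counts.length;
        int c = counts[0].length;
        double[] rowTotals = new double[r];
        double[] colTotals = new double[c];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                rowTotals[i] += counts[i][j];
                colTotals[j] += counts[i][j];
            }
        }
        double n = total(counts);
        if (n == 0.0) return 0.0;
        double chi2 = 0.0;
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                double expected = rowTotals[i] * colTotals[j] / n;
                if (expected > 0.0) { // skip cells in empty rows or columns
                    double d = counts[i][j] - expected;
                    chi2 += d * d / expected;
                }
            }
        }
        return chi2;
    }

    // Cramer's V = sqrt(chi2 / (n * min(r - 1, c - 1))), a value in [0, 1].
    static double cramersV(double[][] counts) {
        double n = total(counts);
        int k = Math.min(counts.length - 1, counts[0].length - 1);
        if (n == 0.0 || k == 0) return 0.0;
        return Math.sqrt(chiSquared(counts) / (n * k));
    }

    public static void main(String[] args) {
        double[][] counts = {{8, 2}, {1, 9}}; // illustrative counts only
        System.out.println("chi2 = " + chiSquared(counts));
        System.out.println("V    = " + cramersV(counts));
    }
}
```

Because Cramér's V divides by the sample size, it stays comparable across windows of different lengths, whereas the raw chi-squared statistic grows with the number of instances.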

Extremal Feature Selection

  • CARVALHO, V. R.; COHEN, W. W. Single-pass Online Learning: Performance, Voting Schemes and Online Feature Selection. In: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA, USA: ACM, 2006. (KDD ’06), p. 548–553. ISBN 1-59593-339-5. DOI: 10.1145/1150402.1150466.

Documentation

For further documentation, please refer to the Docs.

Installation and requirements

Download moafs.jar from the lib directory of this repository and add it to the "lib" folder of your MOA installation. Then, from that folder, run the following command:

Example on Windows

From the lib folder where your MOA is installed:

java -cp .;moafs.jar;moa.jar -javaagent:sizeofag-1.0.4.jar moa.gui.GUI

Or using the jars' full location:

java -cp .;moa-release-2018.6.0/lib/moafs.jar;moa-release-2018.6.0/lib/moa.jar -javaagent:moa-release-2018.6.0/lib/sizeofag-1.0.4.jar moa.gui.GUI

Example on Linux/Mac

From the lib folder where your MOA is installed:

java -cp moafs.jar:moa.jar -javaagent:sizeofag-1.0.4.jar moa.gui.GUI

Or using the jars' full location:

java -cp moa-release-2018.6.0/lib/moafs.jar:moa-release-2018.6.0/lib/moa.jar -javaagent:moa-release-2018.6.0/lib/sizeofag-1.0.4.jar moa.gui.GUI
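
The Windows and Linux/Mac commands differ mainly in the classpath separator (`;` versus `:`). As a rough sketch, the launch command can be assembled for either platform from the jar names used above. The `MoaLaunchCommand` class is a hypothetical helper written for this illustration; `File.pathSeparator` is the standard Java constant holding the current platform's separator.

```java
import java.io.File;

// Hypothetical helper: assembles the MOA GUI launch command shown above.
class MoaLaunchCommand {

    // libDir is the folder holding the jars ("" means the current folder);
    // separator is the classpath separator (";" on Windows, ":" elsewhere).
    static String build(String libDir, String separator) {
        String prefix = libDir.isEmpty() ? "" : libDir + "/";
        String classpath = String.join(separator,
                prefix + "moafs.jar", prefix + "moa.jar");
        return "java -cp " + classpath
                + " -javaagent:" + prefix + "sizeofag-1.0.4.jar moa.gui.GUI";
    }

    public static void main(String[] args) {
        // Uses the separator of the platform this program runs on.
        System.out.println(build("", File.pathSeparator));
    }
}
```

Passing the separator explicitly keeps the helper testable on any platform, while `main` picks the right one automatically.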

Requirements

How to test this library

  1. Download the datasets in ARFF format that you want to experiment with. Sample datasets are available in this repository;
  2. Run MOA with MOAFS, from the command line or using the GUI;
  3. Follow the examples demonstrated in the Docs.

Parameters

MOAFS uses a set of different parameters:

  • -f: Reduction rate - The number of features to select (default = 10)
  • -w: Processing window - The number of instances to process using the specified reduction rate (default = 1)
  • -m: Feature Selection Method - Feature selection method to be used. Options:
    • 0 = No method;
    • 1 = Information Gain;
    • 2 = Symmetrical Uncertainty;
    • 3 = Chi-Squared;
    • 4 = Cramér's V-Test;
    • 5 = Gain Ratio;
    • 6 = Extremal Feature Selection;
    • 7 = Online Feature Selection.
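
The parameters above are passed to the MOAFS classifier inside a MOA task string. The sketch below assembles that learner fragment and rejects method codes outside 0 to 7. The learner class name is the one used in the command-line example of this README; the `MoafsLearnerOptions` helper itself is hypothetical, and it assumes that `-w` is passed in the same position as `-f` and `-m`.

```java
// Hypothetical helper: builds the learner part of a MOA task string from
// the MOAFS parameters -f, -w and -m described above.
class MoafsLearnerOptions {

    static final String LEARNER = "moa.featureselection.classifiers.NaiveBayes";

    // Index in this array = value of the -m option.
    static final String[] METHODS = {
        "No method", "Information Gain", "Symmetrical Uncertainty",
        "Chi-Squared", "Cram\u00e9r's V-Test", "Gain Ratio",
        "Extremal Feature Selection", "Online Feature Selection"
    };

    // Returns the human-readable name for a -m code, validating its range.
    static String methodName(int method) {
        if (method < 0 || method >= METHODS.length) {
            throw new IllegalArgumentException("-m must be between 0 and 7, got " + method);
        }
        return METHODS[method];
    }

    // Assumption: -w sits alongside -f and -m inside the parentheses.
    static String learner(int numFeatures, int window, int method) {
        methodName(method); // throws if the code is out of range
        return "(" + LEARNER + " -f " + numFeatures
                + " -w " + window + " -m " + method + ")";
    }

    public static void main(String[] args) {
        System.out.println(methodName(3) + ": " + learner(20, 10, 3));
    }
}
```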

Sample datasets

This repository contains sample datasets in the sample datasets folder, which may be used for experiments. They were obtained from different sources:

  • usenet1, usenet2, usenet3, spam-data and emailing_list were collected by the Machine Learning and Knowledge Discovery (MLKD) group and can be found at Concept Drifting Datasets in Weka;
  • spambase, gassensor, semeion, enron and syntheticcontrol were obtained from UCI;
  • usenet_recurrent was collected by Dr. Gama and can be found at Datasets for Concept Drift.

Sample outputs

The sample outputs folder of this repository contains sample outputs of the presented feature selection algorithms on the usenet1 dataset, using a window size of 10 and 20 selected features.

Examples from the command line (Linux)

Here is an example using the Interleaved-Test-Then-Train approach with the Chi-Squared algorithm on the Usenet1 data set, selecting 20 features:

java -cp moafs.jar:moa.jar -javaagent:sizeofag-1.0.4.jar moa.DoTask "EvaluateInterleavedTestThenTrain -l (moa.featureselection.classifiers.NaiveBayes -f 20 -m 3) -s (ArffFileStream -f /home/athos/Documentos/datasets/usenet1.arff) -f 100"

This should produce a results screen similar to the following:

MOAFSscreen
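
For readers unfamiliar with the Interleaved-Test-Then-Train approach used in the command above: each instance is first used to test the current model and only then to train it, so accuracy is always measured on unseen data. The sketch below illustrates the loop with a trivial majority-class learner standing in for the classifier. It is a self-contained illustration, not MOA's or MOAFS's evaluator, and all names in it are hypothetical.

```java
// Hypothetical illustration of interleaved test-then-train evaluation.
class PrequentialSketch {

    // Stand-in learner: predicts the most frequent label seen so far.
    static final class MajorityClass {
        private final int[] counts;

        MajorityClass(int numClasses) {
            counts = new int[numClasses];
        }

        int predict() {
            int best = 0;
            for (int i = 1; i < counts.length; i++) {
                if (counts[i] > counts[best]) best = i;
            }
            return best; // ties resolve to the lowest label
        }

        void train(int label) {
            counts[label]++;
        }
    }

    // Returns the fraction of instances predicted correctly before
    // the model was trained on them.
    static double evaluate(int[] labels, int numClasses) {
        MajorityClass model = new MajorityClass(numClasses);
        int correct = 0;
        for (int label : labels) {
            if (model.predict() == label) correct++; // test first...
            model.train(label);                      // ...then train
        }
        return labels.length == 0 ? 0.0 : (double) correct / labels.length;
    }

    public static void main(String[] args) {
        int[] stream = {1, 1, 0, 1}; // illustrative label stream
        System.out.println("accuracy = " + evaluate(stream, 2));
    }
}
```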

License

Distributed under the GNU General Public License v3.0. See LICENSE for more information.

Contact

If you wish to contribute to the software, report issues, or seek support, feel free to use the issue tracker of this repository or to contact us.

Matheus Bernardelli de Moraes -- matheuzmoraes@gmail.com

André Leon S. Gradvohl -- gradvohl@ft.unicamp.br

Owner

  • Login: mbdemoraes
  • Kind: user
  • Location: Piracicaba, SP - Brazil
  • Company: University of Campinas

Ph.D. in Technology. Research interests include bio-inspired algorithms, machine learning, decision-making and multi-objective optimization.

JOSS Publication

MOAFS: A Massive Online Analysis library for feature selection in data streams
Published
January 23, 2020
Volume 5, Issue 45, Page 1970
Authors
Matheus Bernardelli de Moraes ORCID
Faculty of Technology, University of Campinas
André Leon Sampaio Gradvohl ORCID
Faculty of Technology, University of Campinas
Editor
Viviane Pons ORCID
Tags
feature selection data streams concept drift moa

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 66
  • Total Committers: 2
  • Avg Commits per committer: 33.0
  • Development Distribution Score (DDS): 0.015
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
mbdemoraes m****s@g****m 65
Andre Leon S. Gradvohl g****l@f****r 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

pom.xml maven
  • nz.ac.waikato.cms.moa:moa 2017.06
  • nz.ac.waikato.cms.weka:weka-stable 3.8.0