https://github.com/tbonewmy/online-feature-screening-for-datastream-with-sparsity-concept-drifting

https://github.com/tbonewmy/online-feature-screening-for-datastream-with-sparsity-concept-drifting

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: tbonewmy
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Size: 197 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed 11 months ago

https://github.com/tbonewmy/Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting/blob/main/

# Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting

This is a Python implementation by the authors of the paper **"Online Feature Screening for Data Streams With Concept Drift"** from Dr. Mingyuan Wang and Dr. Adrian Barbu.

Please cite this paper if you use or build on our method. [doi.org/10.1109/TKDE.2022.3232752](https://doi.org/10.1109/TKDE.2022.3232752)

This project enabled well-known feature screening methods, including gini index, chi-square score, mutual information, fisher-score, T-score to handle streaming data, batch data, data with drifting, and sparse data. It currently only works on binary classification data.

online feature selection

## Installation

### Prerequisites

* `Python` 3.10 or newer
* `pip`
* `numpy` 2.2.4 or newer

### Note
Although the package is designed OS independent, it was only tested on Windows. You might need to use methods listed below other than `pip install pyscreeningfs`.
   \
   \
**For users installing from source (e.g., if no pre-built wheels are available for your system):**
You will need a C++ compiler compatible with your Python installation:
* **Windows:** Microsoft Visual C++ Build Tools (part of Visual Studio, or standalone).
* **Linux:** `gcc` and `g++` (usually included or easily installed via your package manager, e.g., `sudo apt-get install build-essential`).
* **macOS:** Xcode Command Line Tools (install with `xcode-select --install`).

### Install via git clone
1. Clone repository
``` bash
git clone https://github.com/yourusername/repo_name.git
```
2. Navigate into the cloned repository directory
```
cd repo_name 
```
3. Install
```
pip install .
```

### Install via download
1. Download the repository
2. Unpack to your own folder your_folder/repo_name
3. Navigate into the unpacked repository directory
``` bash
cd repo_name  
```
4. Install
``` bash
pip install .
```
### Install via pip

If pre-built wheels are available for your system (Windows) on PyPI, you can install directly:
```
pip install pyscreeningfs
```

## Data
For .svm sparse data, visit [https://www.sysnet.ucsd.edu/projects/url/](https://www.sysnet.ucsd.edu/projects/url/) \
Download and put into `data/url_svmlight/`

For any input data/data files, the Y/label/class vector can only contain numeric value and one of the label must be 1.

## Demo
For a demo, see testing.py in the root directory.

Owner

  • Login: tbonewmy
  • Kind: user

GitHub Events

Total
  • Push event: 11
Last Year
  • Push event: 11

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: pyscreeningfs

This is a Python implementation by the authors of the paper 'Online Feature Screening for Data Streams With Concept Drift' from Dr. Mingyuan Wang and Dr. Adrian Barbu. Contain various feature selection methods.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 8.7%
Forks count: 31.0%
Average: 32.5%
Stargazers count: 40.9%
Dependent repos count: 49.2%
Maintainers (1)
Last synced: 11 months ago