https://github.com/tbonewmy/online-feature-screening-for-datastream-with-sparsity-concept-drifting


Repository

Basic Info

Host: GitHub
Owner: tbonewmy
License: apache-2.0
Language: C++
Default Branch: main
Size: 197 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed 11 months ago


# Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting

This is a Python implementation, by the authors, of the paper **"Online Feature Screening for Data Streams With Concept Drift"** by Dr. Mingyuan Wang and Dr. Adrian Barbu.

Please cite this paper if you use or build on our method. [doi.org/10.1109/TKDE.2022.3232752](https://doi.org/10.1109/TKDE.2022.3232752)

This project extends well-known feature screening methods, including the Gini index, chi-square score, mutual information, Fisher score, and T-score, to handle streaming data, batch data, data with concept drift, and sparse data. It currently supports binary classification data only.
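To illustrate what "online" screening means, here is a minimal sketch of a streaming Fisher score. It is not the pyscreeningfs API and not necessarily the paper's exact formulation: the class name `OnlineFisherScore`, the `decay` forgetting factor used to follow drift, and treating every label other than 1 as the negative class are all assumptions. The idea is that per-class weighted sums of x and x² are sufficient statistics, so each new chunk updates them without revisiting earlier data.

```python
import numpy as np


class OnlineFisherScore:
    """Hypothetical sketch: Fisher score computed from streaming per-class sums."""

    def __init__(self, n_features, decay=1.0):
        # decay < 1.0 shrinks old statistics before each new sample (a simple
        # forgetting factor for concept drift); decay = 1.0 weights all samples equally.
        self.decay = decay
        self.w = np.zeros(2)                 # effective sample weight per class
        self.s1 = np.zeros((2, n_features))  # weighted sum of x per class
        self.s2 = np.zeros((2, n_features))  # weighted sum of x**2 per class

    def partial_fit(self, X, y):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        y = np.asarray(y).ravel()
        for x, label in zip(X, y):
            # Age the accumulated statistics, then add the new sample.
            self.w *= self.decay
            self.s1 *= self.decay
            self.s2 *= self.decay
            c = 1 if label == 1 else 0       # label 1 is the positive class
            self.w[c] += 1.0
            self.s1[c] += x
            self.s2[c] += x * x
        return self

    def scores(self):
        # Fisher score: sum_k n_k (mu_k - mu)^2 / sum_k n_k sigma_k^2, per feature.
        w = np.maximum(self.w, 1e-12)[:, None]
        mean = self.s1 / w
        var = np.maximum(self.s2 / w - mean ** 2, 0.0)
        overall = self.s1.sum(axis=0) / max(self.w.sum(), 1e-12)
        between = (self.w[:, None] * (mean - overall) ** 2).sum(axis=0)
        within = (self.w[:, None] * var).sum(axis=0)
        return between / np.maximum(within, 1e-12)


# Usage: feed the stream in chunks of 50; feature 0 is made informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 0.5).astype(int)
X[y == 1, 0] += 3.0
model = OnlineFisherScore(n_features=3)
for start in range(0, 200, 50):
    model.partial_fit(X[start:start + 50], y[start:start + 50])
print(model.scores())  # feature 0 should have by far the largest score
```

With `decay=1.0` the chunked result matches a single pass over all the data, which is the property that makes screening on a stream possible; a `decay` slightly below 1 lets the ranking follow features whose relevance changes over time.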

*Figure: online feature selection*

## Installation

### Prerequisites

* `Python` 3.10 or newer
* `pip`
* `numpy` 2.2.4 or newer

### Note
Although the package is designed to be OS-independent, it has only been tested on Windows. On other systems you may need one of the installation methods below instead of `pip install pyscreeningfs`.

**For users installing from source (e.g., if no pre-built wheels are available for your system):**
You will need a C++ compiler compatible with your Python installation:
* **Windows:** Microsoft Visual C++ Build Tools (part of Visual Studio, or standalone).
* **Linux:** `gcc` and `g++` (usually included or easily installed via your package manager, e.g., `sudo apt-get install build-essential`).
* **macOS:** Xcode Command Line Tools (install with `xcode-select --install`).

### Install via git clone
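1. Clone the repository
``` bash
git clone https://github.com/tbonewmy/Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting.git
```
2. Navigate into the cloned repository directory
``` bash
cd Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting
```
3. Install
``` bash
pip install .
```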

### Install via download
1. Download the repository (e.g., as a ZIP archive from GitHub)
2. Unpack it into a folder of your choice, e.g. `your_folder/repo_name`
3. Navigate into the unpacked repository directory
``` bash
cd repo_name
```
4. Install
``` bash
pip install .
```

### Install via pip

If a pre-built wheel for your system is available on PyPI (currently Windows), you can install directly:
``` bash
pip install pyscreeningfs
```

## Data
For the sparse `.svm` data (SVMlight format), download the URL dataset from [https://www.sysnet.ucsd.edu/projects/url/](https://www.sysnet.ucsd.edu/projects/url/) and put it into `data/url_svmlight/`.
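The `.svm` files use the SVMlight text format: each line holds a label followed by sparse `index:value` pairs, and indices that do not appear are zero. A minimal streaming reader is sketched below; the function names are hypothetical and this is not the package's own loader. If scikit-learn is available, `sklearn.datasets.load_svmlight_file` reads the same format into a SciPy sparse matrix instead.

```python
def parse_svmlight_line(line):
    """Parse one SVMlight line into (label, {feature_index: value}), or None."""
    # Drop trailing comments, which SVMlight marks with '#'.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":", 1)
        if idx == "qid":
            continue  # query ids (used for ranking data) are not features
        features[int(idx)] = float(val)
    return label, features


def read_svmlight(path):
    """Yield (label, features) one sample at a time, so a file can be streamed."""
    with open(path) as fh:
        for line in fh:
            parsed = parse_svmlight_line(line)
            if parsed is not None:
                yield parsed
```

Yielding one sample at a time keeps memory use flat, which matches the streaming setting; only the non-zero features of each sample are ever materialized.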

For any input data, the label (class) vector Y must contain only numeric values, and one of the two labels must be 1.
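If your labels are strings or use a different coding, recode them first. The helper below is a hypothetical sketch, not part of the package: it maps a chosen positive class to 1 and the other class to `negative_value`. The default of 0 is an assumption, since the requirement above only says the labels must be numeric with one class equal to 1.

```python
import numpy as np


def to_binary_labels(y, positive, negative_value=0):
    """Recode a two-class label vector so that `positive` becomes 1."""
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError(f"expected exactly two classes, found {classes.size}")
    if positive not in classes:
        raise ValueError(f"positive label {positive!r} not present in y")
    if negative_value == 1:
        raise ValueError("negative_value must differ from the positive label 1")
    # Positive class -> 1, the other class -> negative_value, as floats.
    return np.where(y == positive, 1, negative_value).astype(float)


print(to_binary_labels(["spam", "ham", "spam"], positive="spam"))
```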

## Demo
For a demo, see `testing.py` in the repository root.

Owner

Login: tbonewmy
Kind: user

Repositories: 1
Profile: https://github.com/tbonewmy

GitHub Events

Total

Push event: 11

Last Year

Push event: 11

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

This is a Python implementation, by the authors, of the paper 'Online Feature Screening for Data Streams With Concept Drift' by Dr. Mingyuan Wang and Dr. Adrian Barbu. It contains various feature selection methods.

Homepage: https://github.com/tbonewmy/Online-Feature-Screening-for-Datastream-with-Sparsity-Concept-Drifting
Documentation: https://pyscreeningfs.readthedocs.io/
License: Apache-2.0
Latest release: 0.1.1
published 11 months ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Maintainers (1)

worldkeeping
