featransform
Featransform: Automated Feature Engineering for Machine Learning
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Featransform: Automated Feature Engineering Framework for Supervised Machine Learning
Framework Contextualization
The Featransform project is an integrated proposal to automate feature engineering by combining several input pattern recognition approaches from Machine Learning: dimensionality reduction, anomaly detection, clustering, and datetime feature construction. The package provides an ensemble of diverse applications of each approach and appends their outputs as engineered features alongside the original input columns.
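As a rough illustration of what such an ensemble of engineered features can look like, the sketch below appends one anomaly score, one cluster label, two principal components, and two calendar parts to a small synthetic table. It uses scikit-learn and pandas directly. The function name `add_engineered_features`, the column names, and the chosen estimators and settings are assumptions made for this example and do not describe Featransform's internal implementation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest


def add_engineered_features(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Append anomaly, cluster, PCA and datetime features to a copy of df."""
    out = df.copy()
    numeric = df.select_dtypes(include="number")

    # Anomaly detection: lower scores mark rows that look more anomalous.
    forest = IsolationForest(n_estimators=100, random_state=seed).fit(numeric)
    out["iforest_score"] = forest.decision_function(numeric)

    # Clustering: the assigned cluster id becomes a new column.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=seed)
    out["kmeans_cluster"] = kmeans.fit_predict(numeric)

    # Dimensionality reduction: keep the first two principal components.
    components = PCA(n_components=2, random_state=seed).fit_transform(numeric)
    out["pca_0"], out["pca_1"] = components[:, 0], components[:, 1]

    # Datetime construction: expand each datetime column into calendar parts.
    for col in df.select_dtypes(include="datetime").columns:
        out[f"{col}_month"] = df[col].dt.month
        out[f"{col}_dayofweek"] = df[col].dt.dayofweek
    return out


# Synthetic demo data (hypothetical columns).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=60),
    "x2": rng.normal(size=60),
    "x3": rng.normal(size=60),
    "signup": pd.date_range("2024-01-01", periods=60, freq="D"),
})
engineered = add_engineered_features(demo)
print(engineered.columns.tolist())
```

The original columns are kept and the new ones are added next to them, which mirrors the "added engineered features" idea described above. In practice the numeric inputs would usually be scaled and encoded first.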
To avoid generating noisy data for predictive consumption, once the engineered feature ensembles are concatenated with the original features, a backward wrapper feature selection (also known as backward elimination) iteratively removes features based on their evaluated relevance, keeping only the columns that are valuable for improving the performance of future models.
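The general idea of backward elimination can be sketched in a few lines. The version below is a generic, minimal illustration and not Featransform's actual selection routine: `backward_elimination`, `toy_score`, and the `CONTRIBUTION` table are hypothetical names, and the scoring function is a stand-in for whatever validation metric a real pipeline would compute by refitting a model on each candidate subset.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    max_iters: int = 10,
) -> Tuple[List[str], float]:
    """Greedily drop one feature per iteration while the score does not drop."""
    selected = list(features)
    best_score = score_fn(selected)
    for _ in range(max_iters):
        if len(selected) <= 1:
            break
        # Score every subset obtained by removing exactly one feature.
        trials = [
            (score_fn([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        trial_score, drop = max(trials, key=lambda t: t[0])
        if trial_score < best_score:
            break  # every possible removal hurts, so stop
        selected.remove(drop)
        best_score = trial_score
    return selected, best_score


# Hypothetical per-feature contributions standing in for a validation metric.
CONTRIBUTION = {"age": 30, "income": 25, "kmeans_cluster": 10,
                "noise_a": -5, "noise_b": -2}


def toy_score(cols: List[str]) -> float:
    return sum(CONTRIBUTION[c] for c in cols)


kept, score = backward_elimination(list(CONTRIBUTION), toy_score, max_iters=10)
print(kept, score)  # ['age', 'income', 'kmeans_cluster'] 65
```

Here `max_iters` caps the number of removal rounds. Ties are resolved in favour of the smaller feature set, which is one possible design choice among several.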
The architecture has three main sections: data preprocessing, diverse feature engineering ensembles, and optimized feature selection validation.
This project aims to provide the following application capabilities:
* General applicability on tabular datasets: The feature engineering procedures apply to any tabular dataset within a Supervised ML scope, building on the input data columns.
* Improvement of predictive results: Featransform aims to improve the predictive performance of Machine Learning models applied afterwards through added feature construction, increased pattern recognition, and optimization of the existing input features.
* Continuous integration: After the training data is fitted, the created object can be saved and applied to future data with the same structure.
Main Development Tools
Major frameworks used to build this project:
Where to get it
Binary installers for the latest released version are available at the Python Package Index (PyPI).
Installation
To install this package from the PyPI repository, run the following command:
pip install featransform
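After installing, a quick check from Python can confirm that the package is visible to the interpreter in use. The snippet below relies only on the standard library. The helper names `installed_version` and `is_importable` are made up for this example.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module: str) -> bool:
    """Check whether a top-level module can be found without importing it."""
    return util.find_spec(module) is not None


print("featransform version:", installed_version("featransform"))
print("featransform importable:", is_importable("featransform"))
```

If the version prints as `None`, the package was most likely installed into a different environment than the one running the script.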
Usage Example
Featransform - Automated Feature Engineering Pipeline
To apply the automated Featransform feature engineering pipeline, first import the package.
Next, load a dataset and define the name of the column to be predicted as target.
You can customize the fit_engineering method by altering the following pipeline parameters:
* configs: Nested dictionary containing the parameter configurations specific to each method. Feel free to customize each method as you see fit (customization example shown below);
* optimize_iters: Number of iterations run for the backward feature selection optimization.
* validation_split: Ratio of the loaded dataset held out to evaluate the feature engineering methods (range: [0.05, 0.45]).
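To make the role of a holdout ratio concrete, the sketch below checks that a value lies in the documented range and then reserves that share of row positions for validation. It is a standard-library illustration only: `holdout_indices` is a hypothetical helper, and how Featransform performs its own split internally is not specified here.

```python
import random
from typing import List, Tuple


def holdout_indices(
    n_rows: int, validation_split: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle row positions and reserve a validation share."""
    if not 0.05 <= validation_split <= 0.45:
        raise ValueError("validation_split must lie in [0.05, 0.45]")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    n_val = max(1, round(n_rows * validation_split))
    # The first n_val shuffled positions form the validation set.
    return order[n_val:], order[:n_val]


train_idx, val_idx = holdout_indices(n_rows=80, validation_split=0.25)
print(len(train_idx), len(val_idx))  # 60 20
```

A larger ratio gives a more stable evaluation of the engineered features but leaves fewer rows for fitting them.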
Relevant Note:
* Although functional, the Featransform pipeline is not yet optimized for big data purposes.
```py
import pickle
import warnings

import pandas as pd
from sklearn.model_selection import train_test_split
from featransform.pipeline import Featransform, configurations

warnings.filterwarnings("ignore", category=Warning)  # -> For a clean console

# Dataframe Loading Example
data = pd.read_csv('csv_directory_path', encoding='latin', delimiter=',')

train, test = train_test_split(data, train_size=0.8)
train, test = train.reset_index(drop=True), test.reset_index(drop=True)  # -> Required

# Load and Customize Parameters
configs = configurations()
print(configs)  # Check the printed dictionary for the exact method and parameter keys

configs['Unsupervised']['IsolationForest']['n_estimators'] = 300
configs['Clustering']['KMeans']['n_clusters'] = 3
configs['DimensionalityReduction']['UMAP']['n_components'] = 6

# Fit Data
ft = Featransform(configs=configs,  # validation_split:float, optimize_iters:int
                  optimize_iters=10,
                  validation_split=0.30)

ft.fit_engineering(X=train,  # X:pd.DataFrame, target:str="Target_Column"
                   target="Target_Column_Name")

# Transform Data
train = ft.transform(X=train)
test = ft.transform(X=test)

# Export Featransform Metadata
with open("ft_eng.pkl", 'wb') as output:
    pickle.dump(ft, output)
```
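Once a fitted object has been exported, it can be reloaded later and applied to new data with the same structure. The sketch below shows the save and load round trip with `pickle`. The helpers `save_object` and `load_object` are hypothetical, and a plain dictionary stands in for the fitted pipeline so that the example runs without featransform installed.

```python
import os
import pickle
import tempfile
from typing import Any


def save_object(obj: Any, path: str) -> None:
    """Serialize an object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path: str) -> Any:
    """Load a pickled object. Only unpickle files you created or trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted pipeline (assumption made for this demo only).
state = {"selected_columns": ["age", "income"], "validation_split": 0.30}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "ft_demo.pkl")
    save_object(state, demo_path)
    restored = load_object(demo_path)

print(restored == state)  # True

# With a real fitted object this would read (requires featransform installed):
# ft = load_object("ft_eng.pkl")
# new_data = ft.transform(X=new_data)
```

Pickled objects are generally tied to the library versions that produced them, so it is advisable to reload them in an environment with the same package versions.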
Further Implementations
Further applications of the automated and customizable feature engineering ensemble methods can be found in the Featransform Examples.
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Owner
- Name: Luís Santos
- Login: TsLu1s
- Kind: user
- Location: Braga, Portugal
- Repositories: 4
- Profile: https://github.com/TsLu1s
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this Python package, please cite it as below."
authors:
  - family-names: "Santos"
    given-names: "Luís"
    orcid: "https://orcid:0000-0002-4121-1133"
title: "Featransform - Automated Feature Engineering Framework for Supervised Machine Learning"
version: 0.1.10
doi: ""
date-released: 2024-02-06
url: "https://pypi.org/project/featransform/"