Feature-engine
Feature-engine: A Python package for feature engineering for machine learning - Published in JOSS (2021)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Feature engineering package with sklearn-like functionality
Basic Info
- Host: GitHub
- Owner: feature-engine
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://feature-engine.trainindata.com/
- Size: 14.3 MB
Statistics
- Stars: 2,118
- Watchers: 35
- Forks: 328
- Open Issues: 71
- Releases: 5
Metadata Files
README.md
Feature-engine
Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's functionality with fit() and transform() methods to learn the transforming parameters from the data and then transform it.
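The fit() and transform() pattern described above can be sketched in plain Python. This is only an illustration of the idea: the class `MedianImputerSketch` and its `median_` attribute are hypothetical names and are not part of Feature-engine. The point is that fit() learns a parameter from the training data and transform() reuses that stored parameter on any dataset.

```python
from statistics import median


class MedianImputerSketch:
    """Hypothetical illustration of the fit/transform pattern (not Feature-engine code)."""

    def fit(self, values):
        # Learn the parameter (the median) from the observed, non-missing values.
        observed = [v for v in values if v is not None]
        self.median_ = median(observed)
        return self

    def transform(self, values):
        # Apply the learned parameter: replace missing entries with the stored median.
        return [self.median_ if v is None else v for v in values]


train = [1.0, None, 3.0, 5.0]
test = [None, 2.0]

# The median is learned from the training data only...
imputer = MedianImputerSketch().fit(train)
# ...and then reused to fill gaps in both the training and the test data.
train_imputed = imputer.transform(train)
test_imputed = imputer.transform(test)
```

Because the median is stored at fit time, the test data is filled with the value learned from the training data rather than with a statistic computed on the test data itself.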
Feature-engine features in the following resources
Blogs about Feature-engine
- Feature-engine: A new open-source Python package for feature engineering
- Practical Code Implementations of Feature Engineering for Machine Learning with Python
Documentation
Pst! How did you find us?
We want to share Feature-engine with more people. It'd help us loads if you tell us how you discovered us.
Then we'd know what we are doing right and which channels to use to share the love.
Please share your story by answering 1 quick question at this link. 😃
Feature-engine's transformers currently include functionality for:
- Missing Data Imputation
- Categorical Encoding
- Discretisation
- Outlier Capping or Removal
- Variable Transformation
- Variable Creation
- Variable Selection
- Datetime Features
- Time Series
- Preprocessing
- Scaling
- Scikit-learn Wrappers
Imputation Methods
- MeanMedianImputer
- ArbitraryNumberImputer
- RandomSampleImputer
- EndTailImputer
- CategoricalImputer
- AddMissingIndicator
- DropMissingData
Encoding Methods
- OneHotEncoder
- OrdinalEncoder
- CountFrequencyEncoder
- MeanEncoder
- WoEEncoder
- RareLabelEncoder
- DecisionTreeEncoder
- StringSimilarityEncoder
Discretisation methods
- EqualFrequencyDiscretiser
- EqualWidthDiscretiser
- GeometricWidthDiscretiser
- DecisionTreeDiscretiser
- ArbitraryDiscretiser
Outlier Handling methods
- Winsorizer
- ArbitraryOutlierCapper
- OutlierTrimmer
Variable Transformation methods
- LogTransformer
- LogCpTransformer
- ReciprocalTransformer
- ArcsinTransformer
- PowerTransformer
- BoxCoxTransformer
- YeoJohnsonTransformer
Variable Scaling methods
- MeanNormalizationScaler
Variable Creation:
- MathFeatures
- RelativeFeatures
- CyclicalFeatures
- DecisionTreeFeatures
Feature Selection:
- DropFeatures
- DropConstantFeatures
- DropDuplicateFeatures
- DropCorrelatedFeatures
- SmartCorrelationSelection
- ShuffleFeaturesSelector
- SelectBySingleFeaturePerformance
- SelectByTargetMeanPerformance
- RecursiveFeatureElimination
- RecursiveFeatureAddition
- DropHighPSIFeatures
- SelectByInformationValue
- ProbeFeatureSelection
- MRMR
Datetime
- DatetimeFeatures
- DatetimeSubtraction
Time Series
- LagFeatures
- WindowFeatures
- ExpandingWindowFeatures
Pipelines
- Pipeline
- make_pipeline
Preprocessing
- MatchCategories
- MatchVariables
Wrappers:
- SklearnTransformerWrapper
Installation
From PyPI using pip:
pip install feature_engine
From Anaconda:
conda install -c conda-forge feature_engine
Or simply clone it:
git clone https://github.com/feature-engine/feature_engine.git
Example Usage
```python
import pandas as pd
from feature_engine.encoding import RareLabelEncoder

# Toy dataset: 'A' and 'B' are frequent, 'C' and 'D' are infrequent
data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
data = pd.DataFrame(data)
data['var_A'].value_counts()
```
Out[1]:
A 10
B 10
C 2
D 1
Name: var_A, dtype: int64
```python
# Group categories seen in less than 10% of the rows under the label 'Rare'
rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
data_encoded = rare_encoder.fit_transform(data)
data_encoded['var_A'].value_counts()
```
Out[2]:
A 10
B 10
Rare 3
Name: var_A, dtype: int64
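To make the grouping logic of the example above more concrete, here is a minimal pure-Python sketch of rare-label grouping. It is not Feature-engine's implementation: the function `group_rare_labels` is a hypothetical helper, and the exact threshold comparison (`>=`) is an assumption made for illustration.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, n_categories=3, replace_with="Rare"):
    """Hypothetical sketch: replace infrequent labels with a single label."""
    counts = Counter(values)
    # Only group when the variable has more than n_categories distinct labels.
    if len(counts) <= n_categories:
        return list(values)
    total = len(values)
    # Labels whose relative frequency reaches tol are kept unchanged (assumed rule).
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [v if v in frequent else replace_with for v in values]


# Same toy data as the example: 10 'A', 10 'B', 2 'C', 1 'D'
var_a = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1
encoded = group_rare_labels(var_a)
print(Counter(encoded))
```

With this data, 'C' (2 of 23 rows) and 'D' (1 of 23 rows) both fall below the 10% threshold, so the sketch yields 10 'A', 10 'B' and 3 'Rare', the same counts as Out[2].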
Find more examples in our Jupyter Notebook Gallery or in the documentation.
Contribute
Details about how to contribute can be found on the Contribute page.
Briefly:
- Fork the repo
- Clone your fork into your local computer:
  git clone https://github.com/<YOURUSERNAME>/feature_engine.git
- Navigate into the repo folder:
  cd feature_engine
- Install Feature-engine as a developer:
  pip install -e .
- Optional: Create and activate a virtual environment with any tool of choice
- Install Feature-engine developer dependencies:
  pip install -e ".[tests]"
- Create a feature branch with a meaningful name for your feature:
  git checkout -b myfeaturebranch
- Develop your feature, tests and documentation
- Make sure the tests pass
- Make a PR
Thank you!!
Documentation
Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.
To build the documentation, first install its dependencies from the root directory:
pip install -r docs/requirements.txt
Now you can build the docs using:
sphinx-build -b html docs build
License
The content of this repository is licensed under a BSD 3-Clause license.
Sponsor us
Sponsor us to further support our mission to democratize machine learning and programming tools through open-source software.
Owner
- Name: Feature-engine
- Login: feature-engine
- Kind: organization
- Repositories: 4
- Profile: https://github.com/feature-engine
JOSS Publication
Feature-engine: A Python package for feature engineering for machine learning
Authors
Train in Data
Tags
python, feature engineering, feature selection, machine learning, data science
GitHub Events
Total
- Issues event: 42
- Watch event: 194
- Delete event: 18
- Issue comment event: 130
- Push event: 79
- Pull request review comment event: 21
- Pull request review event: 18
- Pull request event: 51
- Fork event: 19
- Create event: 19
Last Year
- Issues event: 42
- Watch event: 194
- Delete event: 18
- Issue comment event: 130
- Push event: 79
- Pull request review comment event: 21
- Pull request review event: 18
- Pull request event: 51
- Fork event: 19
- Create event: 19
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Soledad Galli | s****i@p****m | 236 |
| Soledad Galli | s****1@g****m | 22 |
| Alfonso Tobar | 4****R | 10 |
| Gleb Levitski | 3****v | 7 |
| Gleb Levitski | 3****V | 7 |
| david-cortes | d****a@g****m | 6 |
| Edoardo Argiolas | e****6@g****m | 6 |
| Michał Gromiec | m****c@p****l | 5 |
| Morgan Sell | M****l@g****m | 5 |
| Cainã Max Couto da Silva | c****a@g****m | 4 |
| ChristopherGS | c****h@g****m | 3 |
| Kishan Manani | 3****i | 3 |
| Gurjinder Kaur | 3****i | 3 |
| Claudio Salvatore Arcidiacono | 2****o | 3 |
| Ashok kumar | 7****3 | 2 |
| Karthik Kothareddy | k****p@g****m | 2 |
| Miguel Trejo Marrufo | 4****l | 2 |
| Noah Green | n****5@g****m | 2 |
| Nodar Okroshiashvili | n****i@g****m | 2 |
| Luis Seabra | 7****s | 2 |
| Sana Ben Driss | b****a@g****m | 2 |
| Sangam | 3****K | 2 |
| Surya Krishnamurthy | s****1@g****m | 2 |
| hectorpatino | 6****o | 2 |
| pradumna123 | p****5@g****m | 2 |
| Andrew Tan | 5****B | 1 |
| tomtom-95 | 7****5 | 1 |
| px39n | 5****n | 1 |
| piecot | p****t | 1 |
| olkr | 1****a | 1 |
| and 23 more... | | |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 129
- Total pull requests: 208
- Average time to close issues: 8 months
- Average time to close pull requests: 29 days
- Total issue authors: 49
- Total pull request authors: 31
- Average comments per issue: 3.7
- Average comments per pull request: 2.63
- Merged pull requests: 155
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 29
- Pull requests: 54
- Average time to close issues: about 2 months
- Average time to close pull requests: about 1 month
- Issue authors: 16
- Pull request authors: 8
- Average comments per issue: 2.21
- Average comments per pull request: 1.8
- Merged pull requests: 40
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- solegalli (58)
- ClaudioSalvatoreArcidiacono (5)
- Morgan-Sell (5)
- fkiraly (4)
- david-cortes (4)
- TremaMiguel (3)
- jolespin (3)
- dlaprins (2)
- Okroshiashvili (2)
- NicoGalli (2)
- PeterPirog (2)
- michaelrussell4 (2)
- kylegilde (2)
- lukaspistelak (2)
- darigovresearch (2)
Pull Request Authors
- solegalli (166)
- ClaudioSalvatoreArcidiacono (13)
- olikra (12)
- Morgan-Sell (10)
- cmcouto-silva (9)
- gurjinderbassi (8)
- glevv (7)
- datacubeR (6)
- ranja-sarkar (5)
- kylegilde (3)
- dlaprins (3)
- david-cortes (3)
- VascoSch92 (3)
- michaelrussell4 (2)
- sergiobemar (2)
Packages
- Total packages: 3
- Total downloads:
  - pypi 297,991 last-month
- Total docker downloads: 148
- Total dependent packages: 89 (may contain duplicates)
- Total dependent repositories: 440 (may contain duplicates)
- Total versions: 65
- Total maintainers: 1
pypi.org: feature-engine
Feature engineering and selection package with Scikit-learn's fit transform functionality
- Homepage: http://github.com/feature-engine/feature_engine
- Documentation: https://feature-engine.readthedocs.io/
- License: BSD 3 clause
- Latest release: 1.9.3 (published 4 months ago)
Maintainers (1)
proxy.golang.org: github.com/feature-engine/feature_engine
- Documentation: https://pkg.go.dev/github.com/feature-engine/feature_engine#section-documentation
- License: bsd-3-clause
- Latest release: v1.2.0 (published almost 4 years ago)
conda-forge.org: feature_engine
- Homepage: https://github.com/feature-engine/feature_engine
- License: BSD-3-Clause
- Latest release: 1.5.1 (published about 3 years ago)
Dependencies
- Sphinx >=4.3.2
- docutils ==0.16
- numpy >=1.18.2
- numpydoc >=0.9.2
- pandas >=1.0.3
- pydata_sphinx_theme >=0.7.2
- scikit-learn >=1.0.0
- scipy >=1.4.1
- sphinx_autodoc_typehints >=1.11.1,<=1.21.3
- statsmodels >=0.11.1
- numpy >=1.18.2
- pandas >=1.0.3
- scikit-learn >=1.0.0
- scipy >=1.4.1
- statsmodels >=0.11.1
- black >=21.5b1 test
- coverage >=6.4.4 test
- flake8 >=3.9.2 test
- isort >=5.8.0 test
- mypy >=0.740 test
- pytest >=5.4.1 test

