Feature-engine

Feature-engine: A Python package for feature engineering for machine learning - Published in JOSS (2021)

https://github.com/feature-engine/feature_engine

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

data-science feature-engineering feature-extraction feature-selection machine-learning python scikit-learn

Scientific Fields

Mathematics Computer Science - 88% confidence
Economics Social Sciences - 63% confidence
Last synced: 4 months ago

Repository

Feature engineering package with sklearn like functionality

Basic Info
Statistics
  • Stars: 2,118
  • Watchers: 35
  • Forks: 328
  • Open Issues: 71
  • Releases: 5
Topics
data-science feature-engineering feature-extraction feature-selection machine-learning python scikit-learn
Created almost 7 years ago · Last pushed 4 months ago
Metadata Files
Readme Contributing Funding License Code of conduct

README.md

Feature-engine


| | |
| --- | --- |
| Open Source | GitHub GC.OS Sponsored |
| Tutorials | YouTube |
| Code | PyPI - Python Version PyPI Conda |
| Downloads | Monthly Downloads Downloads |
| Meta | GitHub contributors first-timers-only Sponsorship |
| Documentation | Read the Docs |
| Citation | DOI JOSS |
| Testing | CircleCI Codecov Code style: black |

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's functionality with fit() and transform() methods to learn the transforming parameters from the data and then transform it.
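To make the fit() / transform() convention concrete, here is a minimal, dependency-free sketch. `ToyMedianImputer` is a hypothetical class written for illustration only; it is not part of Feature-engine and does not reflect how the library is implemented. It only mirrors the two-step idea: learn parameters from one dataset, then apply them to any dataset.

```python
from statistics import median


class ToyMedianImputer:
    """Hypothetical illustration of the fit/transform pattern (not Feature-engine code)."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, rows):
        # Learn one parameter per variable: the median of the observed values.
        self.medians_ = {
            var: median(row[var] for row in rows if row[var] is not None)
            for var in self.variables
        }
        return self

    def transform(self, rows):
        # Apply the learned parameters; the input rows are left unmodified.
        return [
            {
                key: self.medians_[key]
                if key in self.medians_ and value is None
                else value
                for key, value in row.items()
            }
            for row in rows
        ]


train = [{"age": 20}, {"age": None}, {"age": 40}, {"age": 30}]
test = [{"age": None}, {"age": 55}]

# Parameters are learned from the training rows only ...
imputer = ToyMedianImputer(variables=["age"]).fit(train)

# ... and then reused to transform both training and unseen rows.
train_filled = imputer.transform(train)
test_filled = imputer.transform(test)
```

Separating the two steps means the value used to fill the test rows comes from the training data, which helps avoid leaking information from unseen data into the learned parameters.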

Feature-engine features in the following resources

Blogs about Feature-engine

Documentation

Psst! How did you find us?

We want to share Feature-engine with more people. It'd help us loads if you tell us how you discovered us.

Then we'd know what we are doing right and which channels to use to share the love.

Please share your story by answering 1 quick question at this link. 😃

Feature-engine's transformers currently include functionality for:

  • Missing Data Imputation
  • Categorical Encoding
  • Discretisation
  • Outlier Capping or Removal
  • Variable Transformation
  • Variable Creation
  • Variable Selection
  • Datetime Features
  • Time Series
  • Preprocessing
  • Scaling
  • Scikit-learn Wrappers

Imputation Methods

  • MeanMedianImputer
  • ArbitraryNumberImputer
  • RandomSampleImputer
  • EndTailImputer
  • CategoricalImputer
  • AddMissingIndicator
  • DropMissingData

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder
  • StringSimilarityEncoder

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • GeometricWidthDiscretiser
  • DecisionTreeDiscretiser
  • ArbitraryDiscretiser

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Variable Transformation methods

  • LogTransformer
  • LogCpTransformer
  • ReciprocalTransformer
  • ArcsinTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Variable Scaling methods

  • MeanNormalizationScaler

Variable Creation:

  • MathFeatures
  • RelativeFeatures
  • CyclicalFeatures
  • DecisionTreeFeatures

Feature Selection:

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition
  • DropHighPSIFeatures
  • SelectByInformationValue
  • ProbeFeatureSelection
  • MRMR

Datetime

  • DatetimeFeatures
  • DatetimeSubtraction

Time Series

  • LagFeatures
  • WindowFeatures
  • ExpandingWindowFeatures

Pipelines

  • Pipeline
  • make_pipeline

Preprocessing

  • MatchCategories
  • MatchVariables

Wrappers:

  • SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

```python
import pandas as pd
from feature_engine.encoding import RareLabelEncoder

# Toy data: categories 'C' and 'D' are infrequent
data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
data = pd.DataFrame(data)
data['var_A'].value_counts()
```

Out[1]:

    A    10
    B    10
    C     2
    D     1
    Name: var_A, dtype: int64

```python
# Group categories present in less than 10% of the observations under 'Rare'
rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
data_encoded = rare_encoder.fit_transform(data)
data_encoded['var_A'].value_counts()
```

Out[2]:

    A       10
    B       10
    Rare     3
    Name: var_A, dtype: int64
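The grouping in the example above can be approximated in plain Python. The sketch below is an assumption about the idea behind rare-label encoding (relabel every category whose relative frequency falls below `tol`), not the library's actual implementation. The helper `group_rare_labels` and its `replace_with` argument are hypothetical names, and the encoder's `n_categories` check is not modeled.

```python
from collections import Counter


def group_rare_labels(values, tol=0.10, replace_with="Rare"):
    """Toy sketch: relabel categories whose relative frequency is below `tol`."""
    counts = Counter(values)
    total = len(values)
    # Assumption: a category is kept when its share of observations is >= tol.
    frequent = {label for label, n in counts.items() if n / total >= tol}
    return [value if value in frequent else replace_with for value in values]


# Same toy data as in the example above: 23 observations in total.
var_a = ["A"] * 10 + ["B"] * 10 + ["C"] * 2 + ["D"] * 1

encoded = group_rare_labels(var_a, tol=0.10)
encoded_counts = Counter(encoded)
print(encoded_counts)
```

Here 'C' (2 of 23 observations, about 8.7%) and 'D' (1 of 23, about 4.3%) both fall below the 10% threshold, so they are merged into a single 'Rare' category with 3 observations, matching the counts shown in the second output.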

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found in the Contribute Page

Briefly:

  • Fork the repo
  • Clone your fork to your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • Navigate into the repo folder: cd feature_engine
  • Install Feature-engine as a developer: pip install -e .
  • Optional: Create and activate a virtual environment with any tool of choice
  • Install Feature-engine developer dependencies: pip install -e ".[tests]"
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation, first install its dependencies from the root directory: pip install -r docs/requirements.txt

Now you can build the docs using: sphinx-build -b html docs build

License

The content of this repository is licensed under a BSD 3-Clause license.

Sponsor us

Sponsor us and further support our mission to democratize machine learning and programming tools through open-source software.

Owner

  • Name: Feature-engine
  • Login: feature-engine
  • Kind: organization

JOSS Publication

Feature-engine: A Python package for feature engineering for machine learning
Published
September 22, 2021
Volume 6, Issue 65, Page 3642
Authors
Soledad Galli
Train in Data
Editor
Øystein Sørensen ORCID
Tags
python feature engineering feature selection machine learning data science

GitHub Events

Total
  • Issues event: 42
  • Watch event: 194
  • Delete event: 18
  • Issue comment event: 130
  • Push event: 79
  • Pull request review comment event: 21
  • Pull request review event: 18
  • Pull request event: 51
  • Fork event: 19
  • Create event: 19
Last Year
  • Issues event: 42
  • Watch event: 194
  • Delete event: 18
  • Issue comment event: 130
  • Push event: 79
  • Pull request review comment event: 21
  • Pull request review event: 18
  • Pull request event: 51
  • Fork event: 19
  • Create event: 19

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 370
  • Total Committers: 53
  • Avg Commits per committer: 6.981
  • Development Distribution Score (DDS): 0.362
Past Year
  • Commits: 39
  • Committers: 8
  • Avg Commits per committer: 4.875
  • Development Distribution Score (DDS): 0.205
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Soledad Galli | s****i@p****m | 236 |
| Soledad Galli | s****1@g****m | 22 |
| Alfonso Tobar | 4****R | 10 |
| Gleb Levitski | 3****v | 7 |
| Gleb Levitski | 3****V | 7 |
| david-cortes | d****a@g****m | 6 |
| Edoardo Argiolas | e****6@g****m | 6 |
| Michał Gromiec | m****c@p****l | 5 |
| Morgan Sell | M****l@g****m | 5 |
| Cainã Max Couto da Silva | c****a@g****m | 4 |
| ChristopherGS | c****h@g****m | 3 |
| Kishan Manani | 3****i | 3 |
| Gurjinder Kaur | 3****i | 3 |
| Claudio Salvatore Arcidiacono | 2****o | 3 |
| Ashok kumar | 7****3 | 2 |
| Karthik Kothareddy | k****p@g****m | 2 |
| Miguel Trejo Marrufo | 4****l | 2 |
| Noah Green | n****5@g****m | 2 |
| Nodar Okroshiashvili | n****i@g****m | 2 |
| Luis Seabra | 7****s | 2 |
| Sana Ben Driss | b****a@g****m | 2 |
| Sangam | 3****K | 2 |
| Surya Krishnamurthy | s****1@g****m | 2 |
| hectorpatino | 6****o | 2 |
| pradumna123 | p****5@g****m | 2 |
| Andrew Tan | 5****B | 1 |
| tomtom-95 | 7****5 | 1 |
| px39n | 5****n | 1 |
| piecot | p****t | 1 |
| olkr | 1****a | 1 |

and 23 more...
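The all-time averages and DDS reported above can be reproduced from the commit counts. The sketch below assumes that the Development Distribution Score is defined as 1 minus the share of commits made by the single most active committer entry; that definition is an assumption, although it is consistent with the all-time figures listed here. The function names are hypothetical.

```python
def avg_commits_per_committer(total_commits, total_committers):
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    return 1 - top_committer_commits / total_commits


# All-time figures from the Committers section.
total_commits = 370
total_committers = 53
top_committer_commits = 236  # largest single entry in the Top Committers list

all_time_avg = round(avg_commits_per_committer(total_commits, total_committers), 3)
all_time_dds = round(development_distribution_score(top_committer_commits, total_commits), 3)

# Past-year average from the same section (39 commits, 8 committers).
past_year_avg = round(avg_commits_per_committer(39, 8), 3)

print(all_time_avg, all_time_dds, past_year_avg)
```

Under this definition, a score near 0 means one committer authored almost all commits, while a score closer to 1 indicates that work is spread across many committers.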
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 129
  • Total pull requests: 208
  • Average time to close issues: 8 months
  • Average time to close pull requests: 29 days
  • Total issue authors: 49
  • Total pull request authors: 31
  • Average comments per issue: 3.7
  • Average comments per pull request: 2.63
  • Merged pull requests: 155
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 29
  • Pull requests: 54
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 month
  • Issue authors: 16
  • Pull request authors: 8
  • Average comments per issue: 2.21
  • Average comments per pull request: 1.8
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • solegalli (58)
  • ClaudioSalvatoreArcidiacono (5)
  • Morgan-Sell (5)
  • fkiraly (4)
  • david-cortes (4)
  • TremaMiguel (3)
  • jolespin (3)
  • dlaprins (2)
  • Okroshiashvili (2)
  • NicoGalli (2)
  • PeterPirog (2)
  • michaelrussell4 (2)
  • kylegilde (2)
  • lukaspistelak (2)
  • darigovresearch (2)
Pull Request Authors
  • solegalli (166)
  • ClaudioSalvatoreArcidiacono (13)
  • olikra (12)
  • Morgan-Sell (10)
  • cmcouto-silva (9)
  • gurjinderbassi (8)
  • glevv (7)
  • datacubeR (6)
  • ranja-sarkar (5)
  • kylegilde (3)
  • dlaprins (3)
  • david-cortes (3)
  • VascoSch92 (3)
  • michaelrussell4 (2)
  • sergiobemar (2)
Top Labels
Issue Labels
  • priority (8)
  • good first issue (7)
  • enhancement (7)
  • urgent (6)
  • new transformer (6)
  • docs (6)
  • easy (5)
  • wontfix (3)
  • code quality (3)
  • question (3)
  • jupyter notebook (1)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 297,991 last-month
  • Total docker downloads: 148
  • Total dependent packages: 89
    (may contain duplicates)
  • Total dependent repositories: 440
    (may contain duplicates)
  • Total versions: 65
  • Total maintainers: 1
pypi.org: feature-engine

Feature engineering and selection package with Scikit-learn's fit transform functionality

  • Versions: 46
  • Dependent Packages: 89
  • Dependent Repositories: 437
  • Downloads: 297,991 Last month
  • Docker Downloads: 148
Rankings
Dependent packages count: 0.2%
Dependent repos count: 0.7%
Downloads: 0.9%
Average: 1.7%
Stargazers count: 1.7%
Forks count: 3.2%
Docker downloads count: 3.5%
Maintainers (1)
Last synced: 4 months ago
proxy.golang.org: github.com/feature-engine/feature_engine
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 4 months ago
conda-forge.org: feature_engine
  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 3
Rankings
Stargazers count: 11.0%
Forks count: 11.2%
Dependent repos count: 18.1%
Average: 23.0%
Dependent packages count: 51.6%
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • Sphinx >=4.3.2
  • docutils ==0.16
  • numpy >=1.18.2
  • numpydoc >=0.9.2
  • pandas >=1.0.3
  • pydata_sphinx_theme >=0.7.2
  • scikit-learn >=1.0.0
  • scipy >=1.4.1
  • sphinx_autodoc_typehints >=1.11.1,<=1.21.3
  • statsmodels >=0.11.1
requirements.txt pypi
  • numpy >=1.18.2
  • pandas >=1.0.3
  • scikit-learn >=1.0.0
  • scipy >=1.4.1
  • statsmodels >=0.11.1
setup.py pypi
test_requirements.txt pypi
  • black >=21.5b1 test
  • coverage >=6.4.4 test
  • flake8 >=3.9.2 test
  • isort >=5.8.0 test
  • mypy >=0.740 test
  • pytest >=5.4.1 test