pytorch-widedeep

pytorch-widedeep: A flexible package for multimodal deep learning - Published in JOSS (2023)

https://github.com/jrzaurin/pytorch-widedeep

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org
  • Committers with academic emails
    1 of 12 committers (8.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

deep-learning images model-hub multimodal-deep-learning python pytorch pytorch-cv pytorch-nlp pytorch-tabular-data pytorch-transformers tabular-data text

Scientific Fields

Mathematics Computer Science - 84% confidence
Engineering Computer Science - 60% confidence
Artificial Intelligence and Machine Learning Computer Science - 45% confidence
Last synced: 4 months ago

Repository

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

Basic Info
  • Host: GitHub
  • Owner: jrzaurin
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 100 MB
Statistics
  • Stars: 1,367
  • Watchers: 24
  • Forks: 195
  • Open Issues: 7
  • Releases: 28
Topics
deep-learning images model-hub multimodal-deep-learning python pytorch pytorch-cv pytorch-nlp pytorch-tabular-data pytorch-transformers tabular-data text
Created about 8 years ago · Last pushed 5 months ago
Metadata Files
Readme Contributing License Citation

README.md


pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

Documentation: https://pytorch-widedeep.readthedocs.io

Companion posts and tutorials: infinitoml

Experiments and comparison with LightGBM: TabularDL vs LightGBM

Slack: if you want to contribute or just want to chat with us, join slack


Introduction

pytorch-widedeep is based on Google's Wide and Deep Algorithm, adjusted for multi-modal datasets.

In general terms, pytorch-widedeep is a package to use deep learning with tabular data. In particular, it is intended to facilitate the combination of text and images with corresponding tabular data using wide and deep models. With that in mind, there are a number of architectures that can be implemented with the library. The main components of those architectures are shown in the figure below:

In math terms, and following the notation in the paper, the expression for the architecture without a deephead component can be formulated as:

$$
pred = \sigma\left(W_{wide}^{T}[x, \phi(x)] + W_{deeptabular}^{T} a_{deeptabular}^{l_f} + W_{deeptext}^{T} a_{deeptext}^{l_f} + W_{deepimage}^{T} a_{deepimage}^{l_f} + b\right)
$$

Where σ is the sigmoid function, 'W' are the weight matrices applied to the wide model and to the final activations of the deep models, 'a' are these final activations (l_f denotes the final layer), φ(x) are the cross-product transformations of the original features 'x', and 'b' is the bias term. In case you are wondering what "cross-product transformations" are, here is a quote taken directly from the paper: "For binary features, a cross-product transformation (e.g., “AND(gender=female, language=en)”) is 1 if and only if the constituent features (“gender=female” and “language=en”) are all 1, and 0 otherwise".
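To make the quoted definition concrete, here is a minimal sketch of a cross-product transformation over categorical features. The helper name `cross_product` and the dictionary-based row format are illustrative assumptions and are not part of the library's API.

```python
def cross_product(row, constituents):
    """Return 1 if every (column, value) pair in `constituents` holds for `row`, else 0."""
    return int(all(row.get(column) == value for column, value in constituents))


# AND(gender=female, language=en)
crossed = [("gender", "female"), ("language", "en")]

rows = [
    {"gender": "female", "language": "en"},
    {"gender": "female", "language": "es"},
    {"gender": "male", "language": "en"},
]

features = [cross_product(row, crossed) for row in rows]
print(features)  # [1, 0, 0]
```

Only the first row satisfies both constituent conditions, so it is the only one for which the crossed feature is 1.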

It is perfectly possible to use custom models (and not necessarily those in the library) as long as the custom models have a property called output_dim with the size of the last layer of activations, so that WideDeep can be constructed. Examples of how to use custom components can be found in the Examples folder and the section below.

Architectures

The pytorch-widedeep library offers a number of different architectures. In this section we will show some of them in their simplest form (i.e. with default param values in most cases) with their corresponding code snippets. Note that all the snippets below should run locally. For a more detailed explanation of the different components and their parameters, please refer to the documentation.

For the examples below we will be using a toy dataset generated as follows:

```python
import os
import random

import numpy as np
import pandas as pd
from PIL import Image
from faker import Faker


def create_and_save_random_image(image_number, size=(32, 32)):

    if not os.path.exists("images"):
        os.makedirs("images")

    array = np.random.randint(0, 256, (size[0], size[1], 3), dtype=np.uint8)

    image = Image.fromarray(array)

    image_name = f"image_{image_number}.png"
    image.save(os.path.join("images", image_name))

    return image_name


fake = Faker()

cities = ["New York", "Los Angeles", "Chicago", "Houston"]
names = ["Alice", "Bob", "Charlie", "David", "Eva"]

data = {
    "city": [random.choice(cities) for _ in range(100)],
    "name": [random.choice(names) for _ in range(100)],
    "age": [random.uniform(18, 70) for _ in range(100)],
    "height": [random.uniform(150, 200) for _ in range(100)],
    "sentence": [fake.sentence() for _ in range(100)],
    "other_sentence": [fake.sentence() for _ in range(100)],
    "image_name": [create_and_save_random_image(i) for i in range(100)],
    "target": [random.choice([0, 1]) for _ in range(100)],
}

df = pd.DataFrame(data)
```

This will create a 100-row dataframe and a directory called images in your working directory, containing 100 random images (that is, images of pure noise).

Perhaps the simplest architecture would be just one component, wide, deeptabular, deeptext or deepimage on their own, which is also possible, but let's start the examples with a standard Wide and Deep architecture. From there, how to build a model comprised only of one component will be straightforward.

Note that the examples shown below would be almost identical using any of the models available in the library. For example, TabMlp can be replaced by TabResnet, TabNet, TabTransformer, etc. Similarly, BasicRNN can be replaced by AttentiveRNN, StackedAttentiveRNN, or HFModel with their corresponding parameters and preprocessor in the case of the Hugging Face models.

1. Wide and Tabular component (aka deeptabular)

```python
import numpy as np

from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.training import Trainer

# Wide
wide_cols = ["city"]
crossed_cols = [("city", "name")]
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
wide = Wide(input_dim=np.unique(X_wide).shape[0])

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# WideDeep
model = WideDeep(wide=wide, deeptabular=tab_mlp)

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```
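The `input_dim` passed to `Wide` in the snippet above is the number of unique values in the encoded wide input. The sketch below is a simplified, pure-Python illustration of why that count is meaningful. It assumes that each (column, value) pair, including crossed columns, is mapped to its own integer index; `encode_wide` is a hypothetical helper and not the library's `WidePreprocessor`.

```python
def encode_wide(rows, wide_cols, crossed_cols):
    """Map every distinct (column, value) pair to a unique integer index, starting at 1."""
    vocab = {}
    encoded = []
    for row in rows:
        # Plain wide columns
        pairs = [(col, row[col]) for col in wide_cols]
        # Crossed columns: join the column names and the values of the constituents
        for cols in crossed_cols:
            pairs.append(("_".join(cols), "-".join(str(row[c]) for c in cols)))
        # Unseen pairs get the next free index; seen pairs reuse theirs
        encoded.append([vocab.setdefault(pair, len(vocab) + 1) for pair in pairs])
    return encoded, vocab


rows = [
    {"city": "Chicago", "name": "Alice"},
    {"city": "Chicago", "name": "Bob"},
    {"city": "Houston", "name": "Alice"},
]

X_wide_toy, vocab = encode_wide(rows, ["city"], [("city", "name")])
n_unique = len({index for row in X_wide_toy for index in row})
print(X_wide_toy)  # [[1, 2], [1, 3], [4, 5]]
print(n_unique)    # 5
```

Under this assumption, the number of distinct indices equals the number of weights a linear model over these one-hot features would need, which is what the unique count supplies.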

2. Tabular and Text data

```python
from pytorch_widedeep.preprocessing import TabPreprocessor, TextPreprocessor
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep
from pytorch_widedeep.training import Trainer

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# Text
text_preprocessor = TextPreprocessor(
    text_col="sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text = text_preprocessor.fit_transform(df)
rnn = BasicRNN(
    vocab_size=len(text_preprocessor.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)

# WideDeep
model = WideDeep(deeptabular=tab_mlp, deeptext=rnn)

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_tab=X_tab,
    X_text=X_text,
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```

3. Tabular and text with a FC head on top via the head_hidden_dims param in WideDeep

```python
from pytorch_widedeep.preprocessing import TabPreprocessor, TextPreprocessor
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep
from pytorch_widedeep.training import Trainer

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# Text
text_preprocessor = TextPreprocessor(
    text_col="sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text = text_preprocessor.fit_transform(df)
rnn = BasicRNN(
    vocab_size=len(text_preprocessor.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)

# WideDeep
model = WideDeep(deeptabular=tab_mlp, deeptext=rnn, head_hidden_dims=[32, 16])

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_tab=X_tab,
    X_text=X_text,
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```

4. Tabular and multiple text columns that are passed directly to WideDeep

```python
from pytorch_widedeep.preprocessing import TabPreprocessor, TextPreprocessor
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep
from pytorch_widedeep.training import Trainer

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# Text
text_preprocessor_1 = TextPreprocessor(
    text_col="sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_1 = text_preprocessor_1.fit_transform(df)
text_preprocessor_2 = TextPreprocessor(
    text_col="other_sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_2 = text_preprocessor_2.fit_transform(df)
rnn_1 = BasicRNN(
    vocab_size=len(text_preprocessor_1.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)
rnn_2 = BasicRNN(
    vocab_size=len(text_preprocessor_2.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)

# WideDeep
model = WideDeep(deeptabular=tab_mlp, deeptext=[rnn_1, rnn_2])

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_tab=X_tab,
    X_text=[X_text_1, X_text_2],
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```

5. Tabular data and multiple text columns that are fused via the library's ModelFuser class

```python
from pytorch_widedeep.preprocessing import TabPreprocessor, TextPreprocessor
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep, ModelFuser
from pytorch_widedeep import Trainer

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# Text
text_preprocessor_1 = TextPreprocessor(
    text_col="sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_1 = text_preprocessor_1.fit_transform(df)
text_preprocessor_2 = TextPreprocessor(
    text_col="other_sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_2 = text_preprocessor_2.fit_transform(df)

rnn_1 = BasicRNN(
    vocab_size=len(text_preprocessor_1.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)
rnn_2 = BasicRNN(
    vocab_size=len(text_preprocessor_2.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)

models_fuser = ModelFuser(models=[rnn_1, rnn_2], fusion_method="mult")

# WideDeep
model = WideDeep(deeptabular=tab_mlp, deeptext=models_fuser)

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_tab=X_tab,
    X_text=[X_text_1, X_text_2],
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```
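As a rough intuition for the "mult" fusion used above, the sketch below fuses two equally sized output vectors element-wise and contrasts that with concatenation. It is a conceptual, pure-Python illustration with made-up numbers; `fuse` is a hypothetical helper, not the implementation of `ModelFuser`.

```python
def fuse(outputs, method):
    """Fuse a list of equally long vectors element-wise, or concatenate them."""
    if method == "concatenate":
        return [value for vector in outputs for value in vector]
    if method == "mult":
        fused = [1.0] * len(outputs[0])
        for vector in outputs:
            fused = [a * b for a, b in zip(fused, vector)]
        return fused
    raise ValueError(f"unsupported method: {method}")


# Made-up outputs of two text models for a single row
rnn_1_out = [0.5, 2.0, -1.0]
rnn_2_out = [4.0, 0.5, 3.0]

mult = fuse([rnn_1_out, rnn_2_out], "mult")
concat = fuse([rnn_1_out, rnn_2_out], "concatenate")
print(mult)         # [2.0, 1.0, -3.0]
print(len(concat))  # 6
```

An element-wise fusion keeps the output size of a single model and only makes sense when all fused outputs have the same size, which is consistent with both RNNs in the example using the same hidden dimension.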

6. Tabular and multiple text columns, with an image column. The text columns are fused via the library's ModelFuser and then all fused via the deephead parameter in WideDeep which is a custom ModelFuser coded by the user

This is perhaps the least elegant solution as it involves a custom component written by the user and slicing the 'incoming' tensor. In the future, we will include a TextAndImageModelFuser to make this process more straightforward. Still, it is not really complicated and it is a good example of how to use custom components in pytorch-widedeep.

Note that the only requirement for the custom component is that it has a property called output_dim that returns the size of the last layer of activations. In other words, it does not need to inherit from BaseWDModelComponent. This base class simply checks the existence of such property and avoids some typing errors internally.

```python
import torch

from pytorch_widedeep.preprocessing import TabPreprocessor, TextPreprocessor, ImagePreprocessor
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep, ModelFuser, Vision
from pytorch_widedeep.models._base_wd_model_component import BaseWDModelComponent
from pytorch_widedeep import Trainer

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[16, 8],
)

# Text
text_preprocessor_1 = TextPreprocessor(
    text_col="sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_1 = text_preprocessor_1.fit_transform(df)
text_preprocessor_2 = TextPreprocessor(
    text_col="other_sentence", maxlen=20, max_vocab=100, n_cpus=1
)
X_text_2 = text_preprocessor_2.fit_transform(df)
rnn_1 = BasicRNN(
    vocab_size=len(text_preprocessor_1.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)
rnn_2 = BasicRNN(
    vocab_size=len(text_preprocessor_2.vocab.itos),
    embed_dim=16,
    hidden_dim=8,
    n_layers=1,
)
models_fuser = ModelFuser(
    models=[rnn_1, rnn_2],
    fusion_method="mult",
)

# Image
image_preprocessor = ImagePreprocessor(img_col="image_name", img_path="images")
X_img = image_preprocessor.fit_transform(df)
vision = Vision(pretrained_model_setup="resnet18", head_hidden_dims=[16, 8])


# deephead (custom model fuser)
class MyModelFuser(BaseWDModelComponent):
    """
    Simply a Linear + Relu sequence on top of the text + images followed by a
    Linear -> Relu -> Linear for the concatenation of tabular slice of the
    tensor and the output of the text and image sequential model
    """

    def __init__(
        self,
        tab_incoming_dim: int,
        text_incoming_dim: int,
        image_incoming_dim: int,
        output_units: int,
    ):

        super(MyModelFuser, self).__init__()

        self.tab_incoming_dim = tab_incoming_dim
        self.text_incoming_dim = text_incoming_dim
        self.image_incoming_dim = image_incoming_dim
        self.output_units = output_units
        self.text_and_image_fuser = torch.nn.Sequential(
            torch.nn.Linear(text_incoming_dim + image_incoming_dim, output_units),
            torch.nn.ReLU(),
        )
        self.out = torch.nn.Sequential(
            torch.nn.Linear(output_units + tab_incoming_dim, output_units * 4),
            torch.nn.ReLU(),
            torch.nn.Linear(output_units * 4, output_units),
        )

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        tab_slice = slice(0, self.tab_incoming_dim)
        text_slice = slice(
            self.tab_incoming_dim, self.tab_incoming_dim + self.text_incoming_dim
        )
        image_slice = slice(
            self.tab_incoming_dim + self.text_incoming_dim,
            self.tab_incoming_dim + self.text_incoming_dim + self.image_incoming_dim,
        )
        X_tab = X[:, tab_slice]
        X_text = X[:, text_slice]
        X_img = X[:, image_slice]
        X_text_and_image = self.text_and_image_fuser(torch.cat([X_text, X_img], dim=1))
        return self.out(torch.cat([X_tab, X_text_and_image], dim=1))

    @property
    def output_dim(self):
        return self.output_units


deephead = MyModelFuser(
    tab_incoming_dim=tab_mlp.output_dim,
    text_incoming_dim=models_fuser.output_dim,
    image_incoming_dim=vision.output_dim,
    output_units=8,
)

# WideDeep
model = WideDeep(
    deeptabular=tab_mlp,
    deeptext=models_fuser,
    deepimage=vision,
    deephead=deephead,
)

# Train
trainer = Trainer(model, objective="binary")

trainer.fit(
    X_tab=X_tab,
    X_text=[X_text_1, X_text_2],
    X_img=X_img,
    target=df["target"].values,
    n_epochs=1,
    batch_size=32,
)
```
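The slicing in the forward method of MyModelFuser relies on the incoming tensor being the column-wise concatenation of the tabular, text and image outputs, in that order, which is what the example assumes. The index arithmetic can be checked in isolation with plain Python; `component_slices` is a hypothetical helper and the sizes are illustrative only.

```python
def component_slices(dims):
    """Return one slice per component for a row built by concatenating
    the component outputs in the order given by `dims`."""
    slices, start = {}, 0
    for name, dim in dims.items():
        slices[name] = slice(start, start + dim)
        start += dim
    return slices


# Illustrative output sizes for the three components
dims = {"tab": 2, "text": 3, "image": 4}
slices = component_slices(dims)

# A fake "incoming" row whose values are simply their own positions
row = list(range(sum(dims.values())))

print(row[slices["tab"]])    # [0, 1]
print(row[slices["text"]])   # [2, 3, 4]
print(row[slices["image"]])  # [5, 6, 7, 8]
```

If the components were concatenated in a different order, the slices would pick up the wrong activations without raising an error, so the order is worth verifying when writing a custom head.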

7. A two-tower model

This is a popular model in the context of recommendation systems. Let's say we have a tabular dataset formed by triples (user features, item features, target). We can create a two-tower model where the user and item features are passed through two separate models and then "fused" via a dot product.

```python
import numpy as np
import pandas as pd

from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.models import TabMlp, WideDeep, ModelFuser

# Let's create the interaction dataset
# user_features dataframe
np.random.seed(42)
user_ids = np.arange(1, 101)
ages = np.random.randint(18, 60, size=100)
genders = np.random.choice(["male", "female"], size=100)
locations = np.random.choice(["city_a", "city_b", "city_c", "city_d"], size=100)
user_features = pd.DataFrame(
    {"id": user_ids, "age": ages, "gender": genders, "location": locations}
)

# item_features dataframe
item_ids = np.arange(1, 101)
prices = np.random.uniform(10, 500, size=100).round(2)
colors = np.random.choice(["red", "blue", "green", "black"], size=100)
categories = np.random.choice(["electronics", "clothing", "home", "toys"], size=100)

item_features = pd.DataFrame(
    {"id": item_ids, "price": prices, "color": colors, "category": categories}
)

# Interactions dataframe
interaction_user_ids = np.random.choice(user_ids, size=1000)
interaction_item_ids = np.random.choice(item_ids, size=1000)
purchased = np.random.choice([0, 1], size=1000, p=[0.7, 0.3])
interactions = pd.DataFrame(
    {
        "user_id": interaction_user_ids,
        "item_id": interaction_item_ids,
        "purchased": purchased,
    }
)
user_item_purchased = interactions.merge(
    user_features, left_on="user_id", right_on="id"
).merge(item_features, left_on="item_id", right_on="id")

# Users
tab_preprocessor_user = TabPreprocessor(
    cat_embed_cols=["gender", "location"],
    continuous_cols=["age"],
)
X_user = tab_preprocessor_user.fit_transform(user_item_purchased)
tab_mlp_user = TabMlp(
    column_idx=tab_preprocessor_user.column_idx,
    cat_embed_input=tab_preprocessor_user.cat_embed_input,
    continuous_cols=["age"],
    mlp_hidden_dims=[16, 8],
    mlp_dropout=[0.2, 0.2],
)

# Items
tab_preprocessor_item = TabPreprocessor(
    cat_embed_cols=["color", "category"],
    continuous_cols=["price"],
)
X_item = tab_preprocessor_item.fit_transform(user_item_purchased)
tab_mlp_item = TabMlp(
    column_idx=tab_preprocessor_item.column_idx,
    cat_embed_input=tab_preprocessor_item.cat_embed_input,
    continuous_cols=["price"],
    mlp_hidden_dims=[16, 8],
    mlp_dropout=[0.2, 0.2],
)

two_tower_model = ModelFuser([tab_mlp_user, tab_mlp_item], fusion_method="dot")

model = WideDeep(deeptabular=two_tower_model)

trainer = Trainer(model, objective="binary")

# Take the target from the merged dataframe so that it stays row-aligned
# with X_user and X_item, which were built from that same dataframe
trainer.fit(
    X_tab=[X_user, X_item],
    target=user_item_purchased["purchased"].values,
    n_epochs=1,
    batch_size=32,
)
```
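Conceptually, a "dot" fusion scores a user-item pair by the dot product of the two tower outputs. The sketch below shows only that arithmetic with made-up vectors, and applies a sigmoid to turn the score into a probability as one would for a binary objective. It is an illustration of the idea, not the library's implementation.

```python
import math


def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up outputs of the user tower and the item tower for one interaction
user_vec = [0.5, -1.0, 2.0]
item_vec = [1.0, 0.5, 0.25]

score = dot(user_vec, item_vec)  # 0.5 - 0.5 + 0.5
prob = sigmoid(score)
print(score)  # 0.5
```

Because the dot product requires both vectors to have the same length, the two towers in the example end with the same number of units in their last hidden layer.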

8. Tabular with a multi-target loss

This one is "a bonus" to illustrate the use of multi-target losses, more than actually a different architecture.

```python
import random

from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.models import TabMlp, WideDeep
from pytorch_widedeep.losses_multitarget import MultiTargetClassificationLoss
from pytorch_widedeep import Trainer

# let's add a second target to the dataframe
df["target2"] = [random.choice([0, 1]) for _ in range(100)]

# Tabular
tab_preprocessor = TabPreprocessor(
    embed_cols=["city", "name"], continuous_cols=["age", "height"]
)
X_tab = tab_preprocessor.fit_transform(df)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    mlp_hidden_dims=[64, 32],
)

# 'pred_dim=2' because we have two binary targets. For other types of targets,
# please, see the documentation
model = WideDeep(deeptabular=tab_mlp, pred_dim=2)

loss = MultiTargetClassificationLoss(binary_config=[0, 1], reduction="mean")

# When a multi-target loss is used, 'custom_loss_function' must not be None.
# See the docs
trainer = Trainer(model, objective="multitarget", custom_loss_function=loss)

trainer.fit(
    X_tab=X_tab,
    target=df[["target", "target2"]].values,
    n_epochs=1,
    batch_size=32,
)
```
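For intuition, a multi-target binary loss can be thought of as a binary cross-entropy computed per target column and then reduced. The pure-Python sketch below assumes raw logits as inputs and a mean reduction over all rows and targets. Both function names are hypothetical, and this is an illustration of the idea rather than the code behind `MultiTargetClassificationLoss`.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multi_target_bce(logits, targets):
    """Mean binary cross-entropy over all rows and all target columns."""
    losses = [
        bce_with_logits(z, y)
        for row_z, row_y in zip(logits, targets)
        for z, y in zip(row_z, row_y)
    ]
    return sum(losses) / len(losses)


# Two rows, two binary targets each; zero logits mean "no information"
logits = [[0.0, 0.0], [0.0, 0.0]]
targets = [[1, 0], [0, 1]]

loss = multi_target_bce(logits, targets)
print(math.isclose(loss, math.log(2)))  # True
```

With uninformative logits every prediction is 0.5, so each term equals log(2) and so does the mean, which is a handy sanity check for any binary cross-entropy implementation.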

The deeptabular component

It is important to emphasize again that each individual component, wide, deeptabular, deeptext and deepimage, can be used independently and in isolation. For example, one could use only wide, which is simply a linear model. In fact, one of the most interesting functionalities in pytorch-widedeep would be the use of the deeptabular component on its own, i.e. what one might normally refer to as Deep Learning for Tabular Data. Currently, pytorch-widedeep offers the following different models for that component:

  1. Wide: a simple linear model where the nonlinearities are captured via cross-product transformations, as explained before.
  2. TabMlp: a simple MLP that receives embeddings representing the categorical features, concatenated with the continuous features, which can also be embedded.
  3. TabResnet: similar to the previous model but the embeddings are passed through a series of ResNet blocks built with dense layers.
  4. TabNet: details on TabNet can be found in TabNet: Attentive Interpretable Tabular Learning

Two simpler attention based models that we call:

  1. ContextAttentionMLP: MLP with an attention mechanism "on top" that is based on Hierarchical Attention Networks for Document Classification
  2. SelfAttentionMLP: MLP with an attention mechanism that is a simplified version of a transformer block that we refer to as "query-key self-attention".

The Tabformer family, i.e. Transformers for Tabular data:

  1. TabTransformer: details on the TabTransformer can be found in TabTransformer: Tabular Data Modeling Using Contextual Embeddings.
  2. SAINT: Details on SAINT can be found in SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training.
  3. FT-Transformer: details on the FT-Transformer can be found in Revisiting Deep Learning Models for Tabular Data.
  4. TabFastFormer: adaptation of the FastFormer for tabular data. Details on the FastFormer can be found in FastFormers: Highly Efficient Transformer Models for Natural Language Understanding
  5. TabPerceiver: adaptation of the Perceiver for tabular data. Details on the Perceiver can be found in Perceiver: General Perception with Iterative Attention

And probabilistic DL models for tabular data based on Weight Uncertainty in Neural Networks:

  1. BayesianWide: Probabilistic adaptation of the Wide model.
  2. BayesianTabMlp: Probabilistic adaptation of the TabMlp model

Note that while there are scientific publications for the TabTransformer, SAINT and FT-Transformer, the TabFastFormer and TabPerceiver are our own adaptations of those algorithms for tabular data.

In addition, Self-Supervised pre-training can be used for all deeptabular models, with the exception of the TabPerceiver. Self-Supervised pre-training can be used via two methods or routines which we refer to as: the encoder-decoder method and the contrastive-denoising method. Please, see the documentation and the examples for details on this functionality, and all other options in the library.

The rec module

This module was introduced as an extension to the existing components in the library, addressing questions and issues related to recommendation systems. While still under active development, it currently includes a select number of powerful recommendation models.

It's worth noting that this library already supported the implementation of various recommendation algorithms using existing components. For example, models like Wide and Deep, Two-Tower, or Neural Collaborative Filtering could be constructed using the library's core functionalities.

The recommendation algorithms in the rec module are:

  1. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks
  2. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
  3. (Deep) Field Aware Factorization Machine (FFM): a Deep Learning version of the algorithm presented in Field-aware Factorization Machines in a Real-world Online Advertising System
  4. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
  5. Deep Interest Network for Click-Through Rate Prediction
  6. Deep and Cross Network for Ad Click Predictions
  7. DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems
  8. Towards Deeper, Lighter and Interpretable Click-through Rate Prediction
  9. A basic Transformer-based model for recommendation where the problem is faced as a sequence.

See the examples for details on how to use these models.

Text and Images

For the text component, deeptext, the library offers the following models:

  1. BasicRNN: a simple RNN
  2. AttentiveRNN: an RNN with an attention mechanism based on Hierarchical Attention Networks for Document Classification
  3. StackedAttentiveRNN: a stack of AttentiveRNNs
  4. HFModel: a wrapper around Hugging Face Transformer-based models. At the moment only models from the families BERT, RoBERTa, DistilBERT, ALBERT and ELECTRA are supported. This is because this library is designed to address classification and regression tasks and these are the most 'popular' encoder-only models, which have proved to be those that work best for these tasks. If there is demand for other models, they will be included in the future.

For the image component, deepimage, the library supports models from the following families: 'resnet', 'shufflenet', 'resnext', 'wide_resnet', 'regnet', 'densenet', 'mobilenetv3', 'mobilenetv2', 'mnasnet', 'efficientnet' and 'squeezenet'. These are offered via torchvision and wrapped up in the Vision class.

Installation

Install using pip:

```bash
pip install pytorch-widedeep
```

Or install directly from GitHub:

```bash
pip install git+https://github.com/jrzaurin/pytorch-widedeep.git
```

Developer Install

```bash
# Clone the repository
git clone https://github.com/jrzaurin/pytorch-widedeep
cd pytorch-widedeep

# Install in dev mode
pip install -e .
```
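After any of the install routes above, a quick way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, and it assumes the distribution name matches the one used with pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None
print(installed_version("pytorch-widedeep"))
```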

Quick start

Here is an end-to-end example of binary classification with the adult dataset, using the Wide and TabMlp components with default settings.

Building a wide (linear) and deep model with pytorch-widedeep:

```python
import numpy as np
import torch
from sklearn.model_selection import train_test_split

from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.datasets import load_adult


df = load_adult(as_frame=True)
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label)

# Define the 'column set up'
wide_cols = [
    "education",
    "relationship",
    "workclass",
    "occupation",
    "native-country",
    "gender",
]
crossed_cols = [("education", "occupation"), ("native-country", "occupation")]

cat_embed_cols = [
    "workclass",
    "education",
    "marital-status",
    "occupation",
    "relationship",
    "race",
    "gender",
    "capital-gain",
    "capital-loss",
    "native-country",
]
continuous_cols = ["age", "hours-per-week"]
target = "income_label"
target = df_train[target].values

# prepare the data
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df_train)

tab_preprocessor = TabPreprocessor(
    cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols  # type: ignore[arg-type]
)
X_tab = tab_preprocessor.fit_transform(df_train)

# build the model
wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=continuous_cols,
)
model = WideDeep(wide=wide, deeptabular=tab_mlp)

# train and validate
trainer = Trainer(model, objective="binary", metrics=[Accuracy])
trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    target=target,
    n_epochs=5,
    batch_size=256,
)

# predict on test
X_wide_te = wide_preprocessor.transform(df_test)
X_tab_te = tab_preprocessor.transform(df_test)
preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)

# Save and load

# Option 1: this will also save training history and lr history if the
# LRHistory callback is used
trainer.save(path="model_weights", save_state_dict=True)

# Option 2: save as any other torch model
torch.save(model.state_dict(), "model_weights/wd_model.pt")

# From here onwards, Option 1 or 2 are the same. I assume the user has
# prepared the data and defined the new model components:
# 1. Build the model
model_new = WideDeep(wide=wide, deeptabular=tab_mlp)
model_new.load_state_dict(torch.load("model_weights/wd_model.pt"))

# 2. Instantiate the trainer
trainer_new = Trainer(model_new, objective="binary")

# 3. Either start the fit or directly predict
preds = trainer_new.predict(X_wide=X_wide, X_tab=X_tab, batch_size=32)
```

Of course, one can do much more. See the Examples folder, the documentation or the companion posts for a better understanding of the content of the package and its functionalities.

Testing

pytest tests

How to Contribute

Check the CONTRIBUTING page.

Acknowledgments

This library draws on a series of other libraries, so I think it is only fair to mention them here in the README (specific mentions are also included in the code).

The Callbacks and Initializers structure and code is inspired by the torchsample library, which is itself partially inspired by Keras.

The TextPreprocessor class in this library uses fastai's Tokenizer and Vocab. The code at utils.fastai_transforms is a minor adaptation of their code so it functions within this library. In my experience their Tokenizer is the best in class.

The ImagePreprocessor class in this library uses code from the fantastic Deep Learning for Computer Vision (DL4CV) book by Adrian Rosebrock.

License

This work is dual-licensed under Apache 2.0 and MIT (or any later version). You can choose between one of them if you use this work.

SPDX-License-Identifier: Apache-2.0 AND MIT

Cite

BibTeX

@article{Zaurin_pytorch-widedeep_A_flexible_2023,
  author = {Zaurin, Javier Rodriguez and Mulinka, Pavol},
  doi = {10.21105/joss.05027},
  journal = {Journal of Open Source Software},
  month = jun,
  number = {86},
  pages = {5027},
  title = {{pytorch-widedeep: A flexible package for multimodal deep learning}},
  url = {https://joss.theoj.org/papers/10.21105/joss.05027},
  volume = {8},
  year = {2023}
}

APA

Zaurin, J. R., & Mulinka, P. (2023). pytorch-widedeep: A flexible package for multimodal deep learning. Journal of Open Source Software, 8(86), 5027. https://doi.org/10.21105/joss.05027

Owner

  • Name: Javier
  • Login: jrzaurin
  • Kind: user
  • Location: London

JOSS Publication

pytorch-widedeep: A flexible package for multimodal deep learning
Published
June 24, 2023
Volume 8, Issue 86, Page 5027
Authors
Javier Rodriguez Zaurin ORCID
Independent Researcher, Spain
Pavol Mulinka ORCID
Centre Tecnologic de Telecomunicacions de Catalunya (CTTC/CERCA), Catalunya, Spain
Editor
Øystein Sørensen ORCID
Tags
Pytorch Deep learning

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Zaurin
  given-names: Javier Rodriguez
  orcid: "https://orcid.org/0000-0002-1082-1107"
- family-names: Mulinka
  given-names: Pavol
  orcid: "https://orcid.org/0000-0002-9394-8794"
doi: 10.5281/zenodo.7908172
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Zaurin
    given-names: Javier Rodriguez
    orcid: "https://orcid.org/0000-0002-1082-1107"
  - family-names: Mulinka
    given-names: Pavol
    orcid: "https://orcid.org/0000-0002-9394-8794"
  date-published: 2023-06-24
  doi: 10.21105/joss.05027
  issn: 2475-9066
  issue: 86
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5027
  title: "pytorch-widedeep: A flexible package for multimodal deep
    learning"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05027"
  volume: 8
title: "pytorch-widedeep: A flexible package for multimodal deep
  learning"

GitHub Events

Total
  • Create event: 4
  • Issues event: 6
  • Release event: 1
  • Watch event: 83
  • Issue comment event: 19
  • Push event: 22
  • Pull request review event: 1
  • Pull request event: 12
  • Fork event: 10
Last Year
  • Create event: 4
  • Issues event: 6
  • Release event: 1
  • Watch event: 83
  • Issue comment event: 19
  • Push event: 22
  • Pull request review event: 1
  • Pull request event: 12
  • Fork event: 10

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 884
  • Total Committers: 12
  • Avg Commits per committer: 73.667
  • Development Distribution Score (DDS): 0.199
Past Year
  • Commits: 84
  • Committers: 3
  • Avg Commits per committer: 28.0
  • Development Distribution Score (DDS): 0.06
Top Committers
Name Email Commits
jrzaurin j****n@g****m 708
Pavol Mulinka m****l@g****m 101
Javier Rodriguez Zaurin j****n@J****l 48
Krish k****2@g****m 11
Hyo-kyun Park g****8@n****m 4
Javier Rodriguez Zaurin j****r@b****m 3
SuperThickHearter 5****r 2
LuoXueling s****n@s****n 2
Pavol Mulinka p****o@a****m 2
Minjin Choi z****d@g****m 1
Bruno b****a@d****u 1
Alexander Shirkov a****u@a****m 1
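The All Time figures can be reproduced from the commit counts in the Top Committers table. The sketch below assumes that the Development Distribution Score is 1 minus the top committer's share of all commits. This page does not state the definition, but the assumption matches the reported value of 0.199. The function names are illustrative and do not come from any library.

```python
# Commit counts copied from the "Top Committers" table, one entry per listed identity.
commits = [708, 101, 48, 11, 4, 3, 2, 2, 2, 1, 1, 1]


def average_commits(counts):
    """Mean number of commits per committer identity."""
    return sum(counts) / len(counts)


def development_distribution_score(counts):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    return 1 - max(counts) / sum(counts)


print(sum(commits))                                        # 884
print(round(average_commits(commits), 3))                  # 73.667
print(round(development_distribution_score(commits), 3))   # 0.199
```

A score close to 0 means that a single identity accounts for most commits. Identities are counted as they appear in the table, so one person committing under several emails is treated as several committers.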

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 127
  • Total pull requests: 59
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 21 days
  • Total issue authors: 69
  • Total pull request authors: 9
  • Average comments per issue: 3.2
  • Average comments per pull request: 1.05
  • Merged pull requests: 41
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 14
  • Average time to close issues: 1 day
  • Average time to close pull requests: 3 days
  • Issue authors: 7
  • Pull request authors: 3
  • Average comments per issue: 1.63
  • Average comments per pull request: 1.36
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • osorensen (25)
  • xylovezxy (11)
  • davidfstein (4)
  • LinXin04 (4)
  • jrzaurin (4)
  • 5uperpalo (3)
  • Yiming-Deng (3)
  • zhang-HZAU (2)
  • TheLegendAli (2)
  • rv-iiita (2)
  • thatalfredh (2)
  • makoeppel (2)
  • altar31 (2)
  • gradientsky (2)
  • aendk (2)
Pull Request Authors
  • jrzaurin (37)
  • 5uperpalo (19)
  • kd1510 (3)
  • weishi-deng (2)
  • gradientsky (1)
  • thatalfredh (1)
  • BrunoBelucci (1)
  • changbiHub (1)
  • LuoXueling (1)
Top Labels
Issue Labels
bug (7) feature (2) enhancement (1)
Pull Request Labels
bug (1) feature (1) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 3,389 last-month
  • Total dependent packages: 3
  • Total dependent repositories: 4
  • Total versions: 27
  • Total maintainers: 1
pypi.org: pytorch-widedeep

Combine tabular data with text and images using Wide and Deep models in Pytorch

  • Versions: 27
  • Dependent Packages: 3
  • Dependent Repositories: 4
  • Downloads: 3,389 Last month
Rankings
Stargazers count: 1.9%
Dependent packages count: 2.4%
Forks count: 3.8%
Average: 4.2%
Downloads: 5.6%
Dependent repos count: 7.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
  • codecov/codecov-action v1 composite
docs/requirements.txt pypi
  • einops *
  • gensim *
  • imutils *
  • numpy *
  • opencv-contrib-python *
  • pandas *
  • recommonmark *
  • scikit-learn *
  • scipy *
  • spacy *
  • sphinx *
  • sphinx-autodoc-typehints *
  • sphinx-copybutton *
  • sphinx-markdown-tables *
  • sphinx_rtd_theme *
  • torch *
  • torchmetrics *
  • torchvision *
  • tqdm *
  • wrapt *
mkdocs/requirements.txt pypi
  • mkdocs *
  • mkdocs-autolinks-plugin *
  • mkdocs-git-authors-plugin *
  • mkdocs-jupyter *
  • mkdocs-material *
  • mkdocstrings *
  • mkdocstrings-python *
requirements.txt pypi
  • einops *
  • fastparquet >=0.8.1
  • gensim *
  • imutils *
  • numpy >=1.21.6
  • opencv-contrib-python *
  • pandas >=1.3.5
  • pyarrow *
  • scikit-learn >=1.0.2
  • scipy >=1.7.3
  • spacy *
  • torch *
  • torchmetrics *
  • torchvision *
  • tqdm *
  • wrapt *
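The minimum versions pinned in requirements.txt can be checked against an installed environment. The following is a minimal sketch using only the standard library. The helper names (`parse`, `meets_minimum`, `check_environment`) are hypothetical, and the parser handles only plain dotted numeric versions, which is a simplification; the `packaging` library would be a more robust choice for real version comparison.

```python
from importlib import metadata

# Minimum versions copied from the requirements.txt listing above.
MINIMUMS = {
    "fastparquet": "0.8.1",
    "numpy": "1.21.6",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.2",
    "scipy": "1.7.3",
}


def parse(version):
    """Turn '1.21.6' into (1, 21, 6), ignoring any non-numeric suffix such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that 1.10.0 counts as newer than 1.9.2."""
    return parse(installed) >= parse(minimum)


def check_environment(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, ok)
```

Comparing tuples of integers rather than raw strings avoids the case where "1.10.0" sorts before "1.9.2" lexicographically.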
setup.py pypi