Combine Methods

This page explains the methods that are supported by multimodal_transformers.tabular_combiner.TabularFeatCombiner. See the table for details.

If you have rich categorical and numerical features, any of the attention, gating, or weighted sum methods are worth trying.

The following describes each supported method and whether or not it requires both categorical and numerical features.

| Combine Feat Method | Description | Requires both cat and num features |
| --- | --- | --- |
| text_only | Uses just the text columns as processed by the transformer before the final classifier layer(s). Essentially equivalent to HuggingFace's ForSequenceClassification models. | False |
| concat | Concatenates transformer output, numerical feats, and categorical feats all at once before the final classifier layer(s). | False |
| mlp_on_categorical_then_concat | MLP on categorical feats, then concatenation of transformer output, numerical feats, and processed categorical feats before the final classifier layer(s). | False (requires cat feats) |
| individual_mlps_on_cat_and_numerical_feats_then_concat | Separate MLPs on categorical feats and numerical feats, then concatenation of transformer output, processed numerical feats, and processed categorical feats before the final classifier layer(s). | False |
| mlp_on_concatenated_cat_and_numerical_feats_then_concat | MLP on concatenated categorical and numerical feats, then concatenation with transformer output before the final classifier layer(s). | True |
| attention_on_cat_and_numerical_feats | Attention-based summation of transformer outputs, numerical feats, and categorical feats, queried by transformer outputs, before the final classifier layer(s). | False |
| gating_on_cat_and_num_feats_then_sum | Gated summation of transformer outputs, numerical feats, and categorical feats before the final classifier layer(s). Inspired by "Integrating Multimodal Information in Large Pretrained Transformers", which performs the mechanism for each token. | False |
| weighted_feature_sum_on_transformer_cat_and_numerical_feats | Learnable weighted feature-wise sum of transformer outputs, numerical feats, and categorical feats for each feature dimension before the final classifier layer(s). | False |
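The last column of the table can be turned into a small sanity check before training. The sketch below is only an illustration: `REQUIREMENTS` and `check_combine_method` are hypothetical names that are not part of multimodal_transformers, and only the method names and requirements are taken from the table.

```python
# Hypothetical lookup mirroring the table: method name -> (needs_cat, needs_num).
# These names are illustrative and are not part of multimodal_transformers.
REQUIREMENTS = {
    "text_only": (False, False),
    "concat": (False, False),
    "mlp_on_categorical_then_concat": (True, False),
    "individual_mlps_on_cat_and_numerical_feats_then_concat": (False, False),
    "mlp_on_concatenated_cat_and_numerical_feats_then_concat": (True, True),
    "attention_on_cat_and_numerical_feats": (False, False),
    "gating_on_cat_and_num_feats_then_sum": (False, False),
    "weighted_feature_sum_on_transformer_cat_and_numerical_feats": (False, False),
}


def check_combine_method(method, has_cat, has_num):
    """Return `method` if it is usable with the available features, else raise ValueError."""
    if method not in REQUIREMENTS:
        raise ValueError(f"unknown combine method: {method!r}")
    needs_cat, needs_num = REQUIREMENTS[method]
    missing = []
    if needs_cat and not has_cat:
        missing.append("categorical")
    if needs_num and not has_num:
        missing.append("numerical")
    if missing:
        raise ValueError(f"{method} requires {' and '.join(missing)} features")
    return method
```

For example, `check_combine_method("concat", has_cat=False, has_num=True)` passes, while asking for `mlp_on_categorical_then_concat` without categorical features raises a `ValueError`.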

The following table shows the equations involved in each method. First, we define some notation:

  • $\mathbf{m}$ denotes the combined multimodal features

  • $\mathbf{x}$ denotes the output text features from the transformer

  • $\mathbf{c}$ denotes the categorical features

  • $\mathbf{n}$ denotes the numerical features

  • $h_{\mathbf{\Theta}}$ denotes an MLP parameterized by $\mathbf{\Theta}$

  • $\mathbf{W}$ denotes a weight matrix

  • $b$ denotes a scalar bias
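As a rough illustration of this notation, the concatenation-style methods described earlier can be sketched in plain Python. This is a sketch under assumptions, not the library's implementation: `concat` and `mlp` are hypothetical helpers, the MLP is reduced to a single ReLU layer, the ordering of the concatenated parts (text, then categorical, then numerical) is assumed, and all weights are made-up values.

```python
def concat(*vectors):
    """Join feature vectors end to end (the concatenation used by the *_concat methods)."""
    return [value for vec in vectors for value in vec]


def mlp(weights, biases, inputs):
    """One-layer stand-in for h_Theta: ReLU(W @ inputs + b). A real MLP may be deeper."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x = [0.5, -1.0]      # text features from the transformer
c = [1.0, 0.0, 0.0]  # categorical features (one-hot here)
n = [2.0]            # numerical features

# concat: all three feature vectors joined as-is
m_concat = concat(x, c, n)

# mlp_on_categorical_then_concat: only c goes through the MLP
W_c, b_c = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.5]
m_mlp_cat = concat(x, mlp(W_c, b_c, c), n)

# mlp_on_concatenated_cat_and_numerical_feats_then_concat: MLP on (c joined with n)
W_cn, b_cn = [[1.0, 0.0, 0.0, 1.0]], [0.0]
m_mlp_cn = concat(x, mlp(W_cn, b_cn, concat(c, n)))
```

Note how the size of the combined vector changes with the method: plain concatenation keeps every input dimension, while the MLP variants replace the tabular part with the MLP's output dimension.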

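| Combine Feat Method | Equation |
| --- | --- |
| text_only | $\mathbf{m} = \mathbf{x}$ |
| concat | $\mathbf{m} = \mathbf{x} \, \Vert \, \mathbf{c} \, \Vert \, \mathbf{n}$ |
| mlp_on_categorical_then_concat | $\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}(\mathbf{c}) \, \Vert \, \mathbf{n}$ |
| individual_mlps_on_cat_and_numerical_feats_then_concat | $\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta_c}}(\mathbf{c}) \, \Vert \, h_{\mathbf{\Theta_n}}(\mathbf{n})$ |
| mlp_on_concatenated_cat_and_numerical_feats_then_concat | $\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}(\mathbf{c} \, \Vert \, \mathbf{n})$ |
| attention_on_cat_and_numerical_feats | $\mathbf{m} = \alpha_{x,x}\mathbf{W}_x\mathbf{x} + \alpha_{x,c}\mathbf{W}_c\mathbf{c} + \alpha_{x,n}\mathbf{W}_n\mathbf{n}$, where $\alpha_{i,j} = \frac{\exp(\mathrm{LeakyReLU}(\mathbf{a}^T [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_j\mathbf{x}_j]))}{\sum_{k \in \{x, c, n\}} \exp(\mathrm{LeakyReLU}(\mathbf{a}^T [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_k\mathbf{x}_k]))}$ |
| gating_on_cat_and_num_feats_then_sum | $\mathbf{m} = \mathbf{x} + \alpha\mathbf{h}$; $\mathbf{h} = \mathbf{g}_c \odot (\mathbf{W}_c\mathbf{c}) + \mathbf{g}_n \odot (\mathbf{W}_n\mathbf{n}) + b_h$; $\alpha = \min\left(\frac{\lVert \mathbf{x} \rVert_2}{\lVert \mathbf{h} \rVert_2} \beta, 1\right)$; $\mathbf{g}_i = R(\mathbf{W}_{gi}[\mathbf{i} \, \Vert \, \mathbf{x}] + b_i)$, where $\beta$ is a hyperparameter and $R$ is an activation function |
| weighted_feature_sum_on_transformer_cat_and_numerical_feats | $\mathbf{m} = \mathbf{x} + \mathbf{W}_{c'} \odot \mathbf{W}_c\mathbf{c} + \mathbf{W}_{n'} \odot \mathbf{W}_n\mathbf{n}$ |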
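To make the gating method more concrete, here is a minimal plain-Python sketch for a single example. It assumes the formulation of the cited paper applied to one vector per example: each gate is an activation over a linear map of the tabular features joined with the text features, the gated projections of the categorical and numerical features are summed into a displacement vector h, and h is scaled by alpha = min(||x|| / ||h|| * beta, 1) before being added to x. The function names, the choice of ReLU as the activation, and the zero-norm fallback are assumptions made for this illustration, not the library's code.

```python
import math


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def gate(W_g, b, feats, x):
    """Gate over (feats joined with x); the activation is assumed to be ReLU."""
    return [max(0.0, z + b) for z in matvec(W_g, feats + x)]


def gated_sum(x, c, n, W_c, W_n, W_gc, W_gn, b_c, b_n, b_h, beta):
    """Combine text features x with gated categorical (c) and numerical (n) features."""
    g_c = gate(W_gc, b_c, c, x)
    g_n = gate(W_gn, b_n, n, x)
    proj_c = matvec(W_c, c)  # project c to the dimension of x
    proj_n = matvec(W_n, n)  # project n to the dimension of x
    h = [gc * pc + gn * pn + b_h
         for gc, pc, gn, pn in zip(g_c, proj_c, g_n, proj_n)]
    h_norm = l2_norm(h)
    # Scale h so it never exceeds beta times the norm of x (assumed fallback if h is zero).
    alpha = min(l2_norm(x) / h_norm * beta, 1.0) if h_norm > 0 else 1.0
    return [xi + alpha * hi for xi, hi in zip(x, h)]
```

The scaling factor keeps the tabular contribution from overwhelming the text representation: with a small beta, the update added to x has norm at most beta times the norm of x.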