Combine Methods

This page explains the combine methods supported by `multimodal_transformers.tabular_combiner.TabularFeatCombiner`. See the tables below for details.

If you have rich categorical and numerical features, any of the attention, gating, or weighted sum methods are worth trying.

The following table describes each supported method and whether it requires both categorical and numerical features.

Combine Feat Method	Description	Requires both cat and num features
text_only	Uses just the text columns as processed by transformer before final classifier layer(s). Essentially equivalent to HuggingFace's `ForSequenceClassification` models	False
concat	Concatenate transformer output, numerical feats, and categorical feats all at once before final classifier layer(s)	False
mlp_on_categorical_then_concat	MLP on categorical feats then concat transformer output, numerical feats, and processed categorical feats before final classifier layer(s)	False (Requires cat feats)
individual_mlps_on_cat_and_numerical_feats_then_concat	Separate MLPs on categorical feats and numerical feats, then concatenation of transformer output, processed numerical feats, and processed categorical feats before final classifier layer(s)	False
mlp_on_concatenated_cat_and_numerical_feats_then_concat	MLP on concatenated categorical and numerical feats, then concatenation with transformer output before final classifier layer(s)	True
attention_on_cat_and_numerical_feats	Attention based summation of transformer outputs, numerical feats, and categorical feats queried by transformer outputs before final classifier layer(s).	False
gating_on_cat_and_num_feats_then_sum	Gated summation of transformer outputs, numerical feats, and categorical feats before final classifier layer(s). Inspired by Integrating Multimodal Information in Large Pretrained Transformers, which performs the mechanism for each token.	False
weighted_feature_sum_on_transformer_cat_and_numerical_feats	Learnable weighted feature-wise sum of transformer outputs, numerical feats and categorical feats for each feature dimension before final classifier layer(s)	False
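
The requirements column can be transcribed into a small lookup for filtering methods by the features a dataset actually has. The sketch below is illustrative only: `REQUIREMENTS` and `usable_methods` are hypothetical names, not part of `multimodal_transformers`, and the flags simply restate the table, reading "False (Requires cat feats)" as needing categorical features only.

```python
# Hypothetical lookup transcribed from the table: method -> (needs_cat, needs_num).
REQUIREMENTS = {
    "text_only": (False, False),
    "concat": (False, False),
    "mlp_on_categorical_then_concat": (True, False),
    "individual_mlps_on_cat_and_numerical_feats_then_concat": (False, False),
    "mlp_on_concatenated_cat_and_numerical_feats_then_concat": (True, True),
    "attention_on_cat_and_numerical_feats": (False, False),
    "gating_on_cat_and_num_feats_then_sum": (False, False),
    "weighted_feature_sum_on_transformer_cat_and_numerical_feats": (False, False),
}


def usable_methods(has_cat, has_num):
    """Return the method names whose stated feature requirements are met."""
    return [
        name
        for name, (needs_cat, needs_num) in REQUIREMENTS.items()
        if (has_cat or not needs_cat) and (has_num or not needs_num)
    ]


# A dataset with numerical features but no categorical ones.
numeric_only = usable_methods(has_cat=False, has_num=True)
print(numeric_only)
```

For a dataset with only numerical features, this filter drops the two methods that need categorical input and keeps the remaining six.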

The table below shows the equations involved in each method. First, we define some notation:

$\mathbf{m}$ denotes the combined multimodal features
$\mathbf{x}$ denotes the output text features from the transformer
$\mathbf{c}$ denotes the categorical features
$\mathbf{n}$ denotes the numerical features
$h_{\mathbf{\Theta}}$ denotes an MLP parameterized by $\mathbf{\Theta}$
$\mathbf{W}$ denotes a weight matrix
$b$ denotes a scalar bias

Combine Feat Method	Equation
text_only	$\mathbf{m} = \mathbf{x}$
concat	$\mathbf{m} = \mathbf{x} \, \Vert \, \mathbf{c} \, \Vert \, \mathbf{n}$
mlp_on_categorical_then_concat	$\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}( \mathbf{c}) \, \Vert \, \mathbf{n}$
individual_mlps_on_cat_and_numerical_feats_then_concat	$\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta_c}}( \mathbf{c}) \, \Vert \, h_{\mathbf{\Theta_n}}(\mathbf{n})$
mlp_on_concatenated_cat_and_numerical_feats_then_concat	$\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}( \mathbf{c} \, \Vert \, \mathbf{n})$
attention_on_cat_and_numerical_feats	$\mathbf{m} = \alpha_{x,x}\mathbf{W}_x\mathbf{x} + \alpha_{x,c}\mathbf{W}_c\mathbf{c} + \alpha_{x,n}\mathbf{W}_n\mathbf{n}$ where $\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_j\mathbf{x}_j] \right)\right)} {\sum_{k \in \{ x, c, n \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_k\mathbf{x}_k] \right)\right)}.$
gating_on_cat_and_num_feats_then_sum	$\mathbf{m} = \mathbf{x} + \alpha\mathbf{h}$, $\mathbf{h} = \mathbf{g}_c \odot (\mathbf{W}_c\mathbf{c}) + \mathbf{g}_n \odot (\mathbf{W}_n\mathbf{n}) + b_h$, $\alpha = \mathrm{min}\left( \frac{\| \mathbf{x} \|_2}{\| \mathbf{h} \|_2} \beta, 1 \right)$, $\mathbf{g}_i = R(\mathbf{W}_{gi}[\mathbf{i} \, \Vert \, \mathbf{x}] + b_i)$ for $i \in \{c, n\}$, where $\beta$ is a hyperparameter and $R$ is an activation function
weighted_feature_sum_on_transformer_cat_and_numerical_feats	$\mathbf{m} = \mathbf{x} + \mathbf{W}_{c'} \odot \mathbf{W}_c \mathbf{c} + \mathbf{W}_{n'} \odot \mathbf{W}_n \mathbf{n}$
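
To make the gating equations concrete, here is a minimal sketch of `gating_on_cat_and_num_feats_then_sum` for a single example. It is not the library's implementation: plain Python lists stand in for tensors, $R$ is assumed to be ReLU, all parameter values are made up, and the helper names (`matvec`, `l2_norm`, `gate`, `gated_sum`) are hypothetical. The handling of an all-zero $\mathbf{h}$ is also an assumption of this sketch.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (a list of rows) by a vector v."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def l2_norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def gate(W_g, feat, x, b):
    """g_i = R(W_gi [i || x] + b_i); R is assumed to be ReLU in this sketch."""
    z = matvec(W_g, feat + x)  # list concatenation plays the role of ||
    return [max(0.0, zi + b) for zi in z]


def gated_sum(x, c, n, p, beta):
    """Return (m, alpha) for m = x + alpha * h."""
    g_c = gate(p["W_gc"], c, x, p["b_c"])
    g_n = gate(p["W_gn"], n, x, p["b_n"])
    proj_c = matvec(p["W_c"], c)
    proj_n = matvec(p["W_n"], n)
    h = [gc * pc + gn * pn + p["b_h"]
         for gc, pc, gn, pn in zip(g_c, proj_c, g_n, proj_n)]
    h_norm = l2_norm(h)
    # Assumption: if h is all zeros its scale is irrelevant, so use alpha = 1.
    alpha = min(l2_norm(x) / h_norm * beta, 1.0) if h_norm > 0 else 1.0
    m = [xi + alpha * hi for xi, hi in zip(x, h)]
    return m, alpha


# Made-up toy parameters: text dim 2, one categorical and one numerical feature.
params = {
    "W_c": [[1.0], [0.0]],
    "W_n": [[0.0], [1.0]],
    "W_gc": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "W_gn": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "b_c": 0.0,
    "b_n": -1.0,
    "b_h": 0.0,
}
x, c, n = [3.0, 4.0], [1.0], [2.0]

m_small, alpha_small = gated_sum(x, c, n, params, beta=0.2)
m_large, alpha_large = gated_sum(x, c, n, params, beta=10.0)
print(alpha_small, m_small)
print(alpha_large, m_large)
```

With these toy values both gates equal 1 in every dimension, so $\mathbf{h} = [1, 2]$. With $\beta = 0.2$ the scale is $\alpha = 1/\sqrt{5} \approx 0.447$, which shrinks the tabular contribution relative to the text features; with $\beta = 10$ the cap applies, $\alpha = 1$, and $\mathbf{m} = [4, 6]$.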