multimodal_transformers.model¶
Tabular Feature Combiner¶
-
class
TabularFeatCombiner
(tabular_config)[source]¶ Bases:
torch.nn.modules.module.Module
Combiner module for combining text features with categorical and numerical features. The supported combining methods, specified by tabular_config.combine_feat_method, are shown below. \(\mathbf{m}\) denotes the combined multimodal features, \(\mathbf{x}\) denotes the output text features from the transformer, \(\mathbf{c}\) denotes the categorical features, \(\mathbf{n}\) denotes the numerical features, \(h_{\mathbf{\Theta}}\) denotes an MLP parameterized by \(\Theta\), \(\mathbf{W}\) denotes a weight matrix, and \(b\) denotes a scalar bias.
text_only
\[\mathbf{m} = \mathbf{x}\]
concat
\[\mathbf{m} = \mathbf{x} \, \Vert \, \mathbf{c} \, \Vert \, \mathbf{n}\]
mlp_on_categorical_then_concat
\[\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}(\mathbf{c}) \, \Vert \, \mathbf{n}\]
individual_mlps_on_cat_and_numerical_feats_then_concat
\[\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta_c}}(\mathbf{c}) \, \Vert \, h_{\mathbf{\Theta_n}}(\mathbf{n})\]
mlp_on_concatenated_cat_and_numerical_feats_then_concat
\[\mathbf{m} = \mathbf{x} \, \Vert \, h_{\mathbf{\Theta}}(\mathbf{c} \, \Vert \, \mathbf{n})\]
attention_on_cat_and_numerical_feats: self-attention on the text features
\[\mathbf{m} = \alpha_{x,x}\mathbf{W}_x\mathbf{x} + \alpha_{x,c}\mathbf{W}_c\mathbf{c} + \alpha_{x,n}\mathbf{W}_n\mathbf{n}\]
where \(\mathbf{W}_x\) is of shape (out_dim, text_feat_dim), \(\mathbf{W}_c\) is of shape (out_dim, cat_feat_dim), \(\mathbf{W}_n\) is of shape (out_dim, num_feat_dim), and the attention coefficients \(\alpha_{i,j}\) are computed as
\[\alpha_{i,j} = \frac{ \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_j\mathbf{x}_j] \right)\right)} {\sum_{k \in \{ x, c, n \}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top} [\mathbf{W}_i\mathbf{x}_i \, \Vert \, \mathbf{W}_k\mathbf{x}_k] \right)\right)}.\]
gating_on_cat_and_num_feats_then_sum: sum of features gated by the text features, inspired by the gating mechanism introduced in Integrating Multimodal Information in Large Pretrained Transformers
\[\mathbf{m} = \mathbf{x} + \alpha\mathbf{h}\]
\[\mathbf{h} = \mathbf{g_c} \odot (\mathbf{W}_c\mathbf{c}) + \mathbf{g_n} \odot (\mathbf{W}_n\mathbf{n}) + b_h\]
\[\alpha = \min\left( \frac{\| \mathbf{x} \|_2}{\| \mathbf{h} \|_2} \cdot \beta, \, 1 \right)\]
where \(\beta\) is a hyperparameter, \(\mathbf{W}_c\) is of shape (out_dim, cat_feat_dim), \(\mathbf{W}_n\) is of shape (out_dim, num_feat_dim), and the gating vector \(\mathbf{g}_i\) for modality \(i\) with activation function \(R\) is defined as
\[\mathbf{g}_i = R(\mathbf{W}_{gi}[\mathbf{i} \, \Vert \, \mathbf{x}] + b_i)\]
where \(\mathbf{W}_{gi}\) is of shape (out_dim, i_feat_dim + text_feat_dim).
weighted_feature_sum_on_transformer_cat_and_numerical_feats
\[\mathbf{m} = \mathbf{x} + \mathbf{W}_{c'} \odot \mathbf{W}_c \mathbf{c} + \mathbf{W}_{n'} \odot \mathbf{W}_n \mathbf{n}\]
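The sketch below re-implements the gating combination in plain PyTorch to make the formulas concrete. It is an illustrative approximation, not the library's internal code: the class name and dimension names are assumptions, ReLU is assumed as the activation \(R\), and out_dim is assumed equal to the text feature dimension so that the final sum \(\mathbf{x} + \alpha\mathbf{h}\) is well defined.

import torch
import torch.nn as nn

class GatingCombinerSketch(nn.Module):
    # Hypothetical re-implementation of gating_on_cat_and_num_feats_then_sum (illustration only).
    def __init__(self, text_feat_dim, cat_feat_dim, num_feat_dim, out_dim, beta=0.2):
        super().__init__()
        self.beta = beta
        self.W_c = nn.Linear(cat_feat_dim, out_dim, bias=False)       # W_c
        self.W_n = nn.Linear(num_feat_dim, out_dim, bias=False)       # W_n
        self.W_gc = nn.Linear(cat_feat_dim + text_feat_dim, out_dim)  # W_gc with bias b_c
        self.W_gn = nn.Linear(num_feat_dim + text_feat_dim, out_dim)  # W_gn with bias b_n
        self.b_h = nn.Parameter(torch.zeros(out_dim))                 # b_h

    def forward(self, x, c, n):
        g_c = torch.relu(self.W_gc(torch.cat([c, x], dim=-1)))  # g_c = R(W_gc [c || x] + b_c)
        g_n = torch.relu(self.W_gn(torch.cat([n, x], dim=-1)))  # g_n = R(W_gn [n || x] + b_n)
        h = g_c * self.W_c(c) + g_n * self.W_n(n) + self.b_h    # h = g_c ⊙ W_c c + g_n ⊙ W_n n + b_h
        # alpha = min(||x||_2 / ||h||_2 * beta, 1), computed per example
        alpha = torch.clamp(
            x.norm(dim=-1, keepdim=True) / (h.norm(dim=-1, keepdim=True) + 1e-6) * self.beta,
            max=1.0,
        )
        return x + alpha * h                                     # m = x + alpha * h

combiner = GatingCombinerSketch(text_feat_dim=768, cat_feat_dim=9, num_feat_dim=5, out_dim=768)
m = combiner(torch.rand(4, 768), torch.rand(4, 9), torch.rand(4, 5))  # shape (4, 768)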
- Parameters
tabular_config (
TabularConfig
) – Tabular model configuration class with all the parameters of the model.
-
forward
(text_feats, cat_feats=None, numerical_feats=None)[source]¶
- Parameters
text_feats (torch.FloatTensor of shape (batch_size, text_out_dim)) – The tensor of text features. This is assumed to be the output from a HuggingFace transformer model.
cat_feats (torch.FloatTensor of shape (batch_size, cat_feat_dim), optional, defaults to None) – The tensor of categorical features.
numerical_feats (torch.FloatTensor of shape (batch_size, numerical_feat_dim), optional, defaults to None) – The tensor of numerical features.
- Returns
A tensor representing the combined features
- Return type
torch.FloatTensor of shape (batch_size, final_out_dim)
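Example (a minimal usage sketch; the dimension values are placeholders, the import path is assumed to follow this module's name, and setting text_feat_dim directly on the config is an assumption, since in normal use the wrapping *WithTabular model supplies the transformer's output dimension):

import torch
from multimodal_transformers.model import TabularConfig, TabularFeatCombiner

tabular_config = TabularConfig(
    num_labels=2,
    cat_feat_dim=9,
    numerical_feat_dim=5,
    combine_feat_method='individual_mlps_on_cat_and_numerical_feats_then_concat',
)
tabular_config.text_feat_dim = 768  # assumption: normally set by the wrapping transformer model

combiner = TabularFeatCombiner(tabular_config)
text_feats = torch.rand(8, 768)      # e.g. pooled output of a HuggingFace transformer
cat_feats = torch.rand(8, 9)
numerical_feats = torch.rand(8, 5)
combined = combiner(text_feats, cat_feats=cat_feats, numerical_feats=numerical_feats)
print(combined.shape)                # (batch_size, final_out_dim)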
Tabular Config¶
-
class
TabularConfig
(num_labels, mlp_division=4, combine_feat_method='text_only', mlp_dropout=0.1, numerical_bn=True, use_simple_classifier=True, mlp_act='relu', gating_beta=0.2, numerical_feat_dim=0, cat_feat_dim=0, **kwargs)[source]¶ Bases:
object
Config used for tabular combiner
- Parameters
num_labels (int) – the number of classification labels (if 1, a regression task is assumed)
mlp_division (int) – how much to decrease each MLP dim for each additional layer
combine_feat_method (str) – the method to combine categorical and numerical features; see TabularFeatCombiner for details on the supported methods
mlp_dropout (float) – dropout ratio used for MLP layers
numerical_bn (bool) – whether to use batchnorm on numerical features
use_simple_classifier (bool) – whether to use a single layer or an MLP as the final classifier
mlp_act (str) – the activation function to use for the finetuning layers
gating_beta (float) – the beta hyperparameter used for gating tabular data; see the paper Integrating Multimodal Information in Large Pretrained Transformers for details
numerical_feat_dim (int) – the dimension of the numerical features
cat_feat_dim (int) – the dimension of the categorical features
AutoModel with Tabular¶
-
class
AutoModelWithTabular
[source]¶ Bases:
object
-
classmethod
from_config
(config)[source]¶ Instantiates one of the base model classes of the library from a configuration.
Note
Only the models in multimodal_transformers.py are implemented
- Parameters
config (PretrainedConfig) – The model class to instantiate is selected based on the configuration class; see multimodal_transformers.py for the supported transformer models.
Examples:
config = BertConfig.from_pretrained('bert-base-uncased')  # Download configuration from S3 and cache.
model = AutoModelWithTabular.from_config(config)
-
classmethod
from_pretrained
(pretrained_model_name_or_path, *model_args, **kwargs)[source]¶ Instantiates one of the sequence classification model classes of the library from a pre-trained model configuration. See multimodal_transformers.py for the supported transformer models.
The from_pretrained() method takes care of returning the correct model class instance based on the model_type property of the config object, or, when it is missing, by falling back to pattern matching on the pretrained_model_name_or_path string.
The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().
- Parameters
pretrained_model_name_or_path –
either:
a string with the shortcut name of a pre-trained model to load from cache or download, e.g.: bert-base-uncased.
a string with the identifier name of a pre-trained model that was user-uploaded to our S3, e.g.: dbmdz/bert-base-german-cased.
a path to a directory containing model weights saved using save_pretrained(), e.g.: ./my_model_directory/.
a path or url to a TensorFlow index checkpoint file (e.g. ./tf_model/model.ckpt.index). In this case, from_tf should be set to True and a configuration object should be provided as the config argument. This loading path is slower than converting the TensorFlow checkpoint to a PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
model_args – (optional) Sequence of positional arguments: all remaining positional arguments will be passed to the underlying model's __init__ method.
config – (optional) instance of a class derived from PretrainedConfig: configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
the model is a model provided by the library (loaded with the shortcut-name string of a pretrained model), or
the model was saved using save_pretrained() and is reloaded by supplying the save directory, or
the model is loaded by supplying a local directory as pretrained_model_name_or_path and a configuration JSON file named config.json is found in the directory.
state_dict – (optional) dict: an optional state dictionary for the model to use instead of a state dictionary loaded from the saved weights file. This option can be used if you want to create a model from a pretrained configuration but load your own weights. In this case though, you should check whether using save_pretrained() and from_pretrained() is not a simpler option.
cache_dir – (optional) string: path to a directory in which a downloaded pre-trained model configuration should be cached if the standard cache should not be used.
force_download – (optional) boolean, default False: force a (re-)download of the model weights and configuration files, overriding the cached versions if they exist.
resume_download – (optional) boolean, default False: do not delete an incompletely received file; attempt to resume the download if such a file exists.
proxies – (optional) dict, default None: a dictionary of proxy servers to use by protocol or endpoint, e.g.: {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
output_loading_info – (optional) boolean: set to True to also return a dictionary containing missing keys, unexpected keys and error messages.
kwargs – (optional) remaining dictionary of keyword arguments: these arguments will be passed to the configuration and the model.
Examples:
model = AutoModelWithTabular.from_pretrained('bert-base-uncased')  # Download model and configuration from S3 and cache.
model = AutoModelWithTabular.from_pretrained('./test/bert_model/')  # E.g. model was saved using `save_pretrained('./test/saved_model/')`
model = AutoModelWithTabular.from_pretrained('bert-base-uncased', output_attentions=True)  # Update configuration during loading
assert model.config.output_attentions == True
# Loading from a TF checkpoint file instead of a PyTorch model (slower)
config = AutoConfig.from_json_file('./tf_model/bert_tf_model_config.json')
model = AutoModelWithTabular.from_pretrained('./tf_model/bert_tf_checkpoint.ckpt.index', from_tf=True, config=config)
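Since the *WithTabular models documented below expect the HuggingFace config to carry a tabular_config member, a typical loading pattern looks like the following sketch (the checkpoint name and dimension values are placeholders, and the import path is assumed to follow this module's name):

from transformers import AutoConfig
from multimodal_transformers.model import AutoModelWithTabular, TabularConfig

hf_config = AutoConfig.from_pretrained('bert-base-uncased')
hf_config.tabular_config = TabularConfig(
    num_labels=2,
    cat_feat_dim=9,
    numerical_feat_dim=5,
    combine_feat_method='gating_on_cat_and_num_feats_then_sum',
)
model = AutoModelWithTabular.from_pretrained('bert-base-uncased', config=hf_config)
# from_pretrained returns the model in evaluation mode; switch back before finetuning:
model.train()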
Transformers with Tabular¶
-
class
AlbertWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_albert.AlbertForSequenceClassification
ALBERT Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the ALBERT pooled output
- Parameters
hf_model_config (
AlbertConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None, class_weights=None, cat_feats=None, numerical_feats=None)[source]¶ The
AlbertWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.AlbertTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
and transformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Segment token indices to indicate first and second portions of the inputs. Indices are selected in
[0, 1]
:0
corresponds to a sentence A token,1
corresponds to a sentence B tokenposition_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
[0, config.max_position_embeddings - 1]
.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.labels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Ifconfig.num_labels == 1
a regression loss is computed (Mean-Square loss), Ifconfig.num_labels > 1
a classification loss is computed (Cross-Entropy).
-
class
BertWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_bert.BertForSequenceClassification
Bert Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the Bert pooled output
- Parameters
hf_model_config (
BertConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, class_weights=None, output_attentions=None, output_hidden_states=None, cat_feats=None, numerical_feats=None)[source]¶ The
BertWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.BertTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
andtransformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Segment token indices to indicate first and second portions of the inputs. Indices are selected in
[0, 1]
:0
corresponds to a sentence A token,1
corresponds to a sentence B tokenposition_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
[0, config.max_position_embeddings - 1]
.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.encoder_hidden_states (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if the model is configured as a decoder.encoder_attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) – Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in the cross-attention if the model is configured as a decoder. Mask values selected in[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.class_weights (
torch.FloatTensor
of shape(tabular_config.num_labels,)
, optional, defaults toNone
) – Class weights to be used for cross entropy loss function for classification tasklabels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Iftabular_config.num_labels == 1
a regression loss is computed (Mean-Square loss), Iftabular_config.num_labels > 1
a classification loss is computed (Cross-Entropy).cat_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.cat_feat_dim)
, optional, defaults toNone
) – Categorical features to be passed in to the TabularFeatCombinernumerical_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.numerical_feat_dim)
, optional, defaults toNone
) – Numerical features to be passed in to the TabularFeatCombiner
- Returns
- loss (
torch.FloatTensor
of shape(1,)
, optional, returned when labels
is provided): Classification (or regression if tabular_config.num_labels==1) loss.
- logits (
torch.FloatTensor
of shape(batch_size, tabular_config.num_labels)
): Classification (or regression if tabular_config.num_labels==1) scores (before SoftMax).
- classifier_layer_outputs(
list
oftorch.FloatTensor
): The outputs of each layer of the final classification layers. The 0th index of this list is the combining module’s output
- loss (
- Return type
tuple
comprising various elements depending on configuration and inputs
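The following sketch shows a forward pass through BertWithTabular with both text and tabular inputs. The checkpoint name and feature dimensions are placeholders, and the import path is assumed to follow this module's name; the unpacking follows the documented return tuple (loss, logits, classifier_layer_outputs).

import torch
from transformers import BertConfig, BertTokenizer
from multimodal_transformers.model import BertWithTabular, TabularConfig

hf_config = BertConfig.from_pretrained('bert-base-uncased')
hf_config.tabular_config = TabularConfig(
    num_labels=2, cat_feat_dim=4, numerical_feat_dim=3, combine_feat_method='concat'
)
model = BertWithTabular.from_pretrained('bert-base-uncased', config=hf_config)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoded = tokenizer(["the quick brown fox jumps over the lazy dog"],
                    padding=True, return_tensors='pt')

with torch.no_grad():
    loss, logits, classifier_layer_outputs = model(
        encoded['input_ids'],
        attention_mask=encoded['attention_mask'],
        token_type_ids=encoded['token_type_ids'],
        labels=torch.tensor([1]),
        cat_feats=torch.rand(1, 4),
        numerical_feats=torch.rand(1, 3),
    )
print(logits.shape)  # (batch_size, tabular_config.num_labels)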
-
class
DistilBertWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_distilbert.DistilBertForSequenceClassification
DistilBert Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the DistilBERT pooled output
- Parameters
hf_model_config (
DistilBertConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, class_weights=None, cat_feats=None, numerical_feats=None)[source]¶ The
DistilBertWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.DistilBertTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
andtransformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.class_weights (
torch.FloatTensor
of shape(tabular_config.num_labels,)
, optional, defaults to None
) – Class weights to be used for cross entropy loss function for classification tasklabels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Iftabular_config.num_labels == 1
a regression loss is computed (Mean-Square loss), Iftabular_config.num_labels > 1
a classification loss is computed (Cross-Entropy).cat_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.cat_feat_dim)
, optional, defaults to None
) – Categorical features to be passed in to the TabularFeatCombinernumerical_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.numerical_feat_dim)
, optional, defaults to None
) – Numerical features to be passed in to the TabularFeatCombiner
- Returns
- loss (
torch.FloatTensor
of shape(1,)
, optional, returned when labels
is provided): Classification (or regression if tabular_config.num_labels==1) loss.
- logits (
torch.FloatTensor
of shape(batch_size, tabular_config.num_labels)
): Classification (or regression if tabular_config.num_labels==1) scores (before SoftMax).
- classifier_layer_outputs(
list
oftorch.FloatTensor
): The outputs of each layer of the final classification layers. The 0th index of this list is the combining module’s output
- loss (
- Return type
tuple
comprising various elements depending on configuration and inputs
-
class
RobertaWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_roberta.RobertaForSequenceClassification
Roberta Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the Roberta pooled output
- Parameters
hf_model_config (
RobertaConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, class_weights=None, cat_feats=None, numerical_feats=None)[source]¶ The
RobertaWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.RobertaTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
andtransformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Segment token indices to indicate first and second portions of the inputs. Indices are selected in
[0, 1]
:0
corresponds to a sentence A token,1
corresponds to a sentence B tokenposition_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
[0, config.max_position_embeddings - 1]
.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.class_weights (
torch.FloatTensor
of shape(tabular_config.num_labels,)
, optional, defaults toNone
) – Class weights to be used for cross entropy loss function for classification tasklabels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Iftabular_config.num_labels == 1
a regression loss is computed (Mean-Square loss), Iftabular_config.num_labels > 1
a classification loss is computed (Cross-Entropy).cat_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.cat_feat_dim)
, optional, defaults toNone
) – Categorical features to be passed in to the TabularFeatCombinernumerical_feats (
torch.FloatTensor
of shape(batch_size, tabular_config.numerical_feat_dim)
, optional, defaults toNone
) – Numerical features to be passed in to the TabularFeatCombiner
- Returns
- loss (
torch.FloatTensor
of shape(1,)
, optional, returned when labels
is provided): Classification (or regression if tabular_config.num_labels==1) loss.
- logits (
torch.FloatTensor
of shape(batch_size, tabular_config.num_labels)
): Classification (or regression if tabular_config.num_labels==1) scores (before SoftMax).
- classifier_layer_outputs(
list
oftorch.FloatTensor
): The outputs of each layer of the final classification layers. The 0th index of this list is the combining module’s output
- loss (
- Return type
tuple
comprising various elements depending on configuration and inputs
-
class
XLMRobertaWithTabular
(hf_model_config)[source]¶ Bases:
multimodal_transformers.model.tabular_transformers.RobertaWithTabular
This class overrides
RobertaWithTabular
. Please check the superclass for the appropriate documentation alongside usage examples.-
config_class
¶ alias of
transformers.configuration_xlm_roberta.XLMRobertaConfig
-
-
class
XLMWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_xlm.XLMForSequenceClassification
XLM Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the XLM pooled output
- Parameters
hf_model_config (
XLMConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, langs=None, token_type_ids=None, position_ids=None, lengths=None, cache=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None, class_weights=None, cat_feats=None, numerical_feats=None)[source]¶ The
XLMWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.XLMTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
andtransformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.langs (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –A parallel sequence of tokens to be used to indicate the language of each token in the input. Indices are languages ids which can be obtained from the language names by using two conversion mappings provided in the configuration of the model (only provided for multilingual models). More precisely, the language name -> language id mapping is in model.config.lang2id (dict str -> int) and the language id -> language name mapping is model.config.id2lang (dict int -> str).
See usage examples detailed in the multilingual documentation.
token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Segment token indices to indicate first and second portions of the inputs. Indices are selected in
[0, 1]
:0
corresponds to a sentence A token,1
corresponds to a sentence B tokenposition_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Indices of positions of each input sequence tokens in the position embeddings. Selected in the range
[0, config.max_position_embeddings - 1]
.lengths (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Length of each sentence that can be used to avoid performing attention on padding token indices. You can also use attention_mask for the same result (see above), kept here for compatibility. Indices selected in[0, ..., input_ids.size(-1)]
:cache (
Dict[str, torch.FloatTensor]
, optional, defaults toNone
) – dictionary withtorch.FloatTensor
that contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model (see cache output below). Can be used to speed up sequential decoding. The dictionary object will be modified in-place during the forward pass to add newly computed hidden-states.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.labels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Ifconfig.num_labels == 1
a regression loss is computed (Mean-Square loss), Ifconfig.num_labels > 1
a classification loss is computed (Cross-Entropy).
-
class
XLNetWithTabular
(hf_model_config)[source]¶ Bases:
transformers.modeling_xlnet.XLNetForSequenceClassification
XLNet Model transformer with a sequence classification/regression head as well as a TabularFeatCombiner module to combine categorical and numerical features with the XLNet pooled output
- Parameters
hf_model_config (
XLNetConfig
) – Model configuration class with all the parameters of the model. This object must also have a tabular_config member variable that is aTabularConfig
instance specifying the configs forTabularFeatCombiner
-
forward
(input_ids=None, attention_mask=None, mems=None, perm_mask=None, target_mapping=None, token_type_ids=None, input_mask=None, head_mask=None, inputs_embeds=None, labels=None, use_cache=None, output_attentions=None, output_hidden_states=None, return_dict=None, class_weights=None, cat_feats=None, numerical_feats=None)[source]¶ The
XLNetWithTabular
forward method, overrides the__call__()
special method.Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them.- Parameters
input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) –Indices of input sequence tokens in the vocabulary.
Indices can be obtained using
transformers.XLNetTokenizer
. Seetransformers.PreTrainedTokenizer.encode()
andtransformers.PreTrainedTokenizer.__call__()
for details.attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Mask to avoid performing attention on padding token indices. Mask values selected in
[0, 1]
:1
for tokens that are NOT MASKED,0
for MASKED tokens.mems (
List[torch.FloatTensor]
of lengthconfig.n_layers
) – Contains pre-computed hidden-states (key and values in the attention blocks) as computed by the model (see mems output below). Can be used to speed up sequential decoding. The token ids which have their mems given to this model should not be passed as input ids as they have already been computed. use_cache has to be set to True to make use of mems.perm_mask (
torch.FloatTensor
of shape(batch_size, sequence_length, sequence_length)
, optional, defaults toNone
) – Mask to indicate the attention pattern for each input token with values selected in[0, 1]
: Ifperm_mask[k, i, j] = 0
, i attend to j in batch k; ifperm_mask[k, i, j] = 1
, i does not attend to j in batch k. If None, each token attends to all the others (full bidirectional attention). Only used during pretraining (to define factorization order) or for sequential decoding (generation).target_mapping (
torch.FloatTensor
of shape(batch_size, num_predict, sequence_length)
, optional, defaults toNone
) – Mask to indicate the output tokens to use. Iftarget_mapping[k, i, j] = 1
, the i-th predict in batch k is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding (generation).token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) –Segment token indices to indicate first and second portions of the inputs. Indices are selected in
[0, 1]
:0
corresponds to a sentence A token,1
corresponds to a sentence B token. The classifier token should be represented by a2
.input_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional, defaults toNone
) – Mask to avoid performing attention on padding token indices. Negative of attention_mask, i.e. with 0 for real tokens and 1 for padding. Kept for compatibility with the original code base. You can only use one of input_mask and attention_mask. Mask values selected in[0, 1]
:1
for tokens that are MASKED,0
for tokens that are NOT MASKED.head_mask (
torch.FloatTensor
of shape(num_heads,)
or(num_layers, num_heads)
, optional, defaults toNone
) – Mask to nullify selected heads of the self-attention modules. Mask values selected in[0, 1]
:1
indicates the head is not masked,0
indicates the head is masked.inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional, defaults toNone
) – Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.use_cache (
bool
) – If use_cache is True, mems are returned and can be used to speed up decoding (see mems). Defaults to True.output_attentions (
bool
, optional, defaults toNone
) – If set toTrue
, the attentions tensors of all attention layers are returned. Seeattentions
under returned tensors for more detail.labels (
torch.LongTensor
of shape(batch_size,)
, optional, defaults toNone
) – Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Ifconfig.num_labels == 1
a regression loss is computed (Mean-Square loss), Ifconfig.num_labels > 1
a classification loss is computed (Cross-Entropy).