Part 9 (5 points, coding task)
Part 9.1
Define your own collate function.
-
The function name is
my_collate_fn. -
Padding
-
For text data, let the longest sample be with
Ktokens. -
Consider another text sample with
Ltokens satisfyingL < K. Then, in addition to thoseLtokens, this sample is padded withK-Lpadding tokens whose values are 0.
-
-
Outputs
-
token_id_batch. If the batch size isBand the longest sample in the text data hasKtokens, thentoken_id_batchis a tensor with shape(B,K). -
attention_mask_batch. This is a tensor that has shape(B,K). If a position is occupied by a non-padding token, its value is 1. Otherwise, if it is occupied by a padding token, its value is 0. Data types areint64. -
image_batch. This is a tensor that has shape(B,3,224,224).
-
Part 9.2
Define a DataLoader object called CLIP_dataloader.
-
Set
batch_size = 16. -
Set
shuffle = True. -
Use the collate function defined in Part 10.