Part 9 (5 points, coding task)
Part 9.1
Define your own collate function.
-
The function name is
my_collate_fn
. -
Padding
-
For text data, let the longest sample be with
K
tokens. -
Consider another text sample with
L
tokens satisfyingL < K
. Then, in addition to thoseL
tokens, this sample is padded withK-L
padding tokens whose values are 0.
-
-
Outputs
-
token_id_batch
. If the batch size isB
and the longest sample in the text data hasK
tokens, thentoken_id_batch
is a tensor with shape(B,K)
. -
attention_mask_batch
. This is a tensor that has shape(B,K)
. If a position is occupied by a non-padding token, its value is 1. Otherwise, if it is occupied by a padding token, its value is 0. Data types areint64
. -
image_batch
. This is a tensor that has shape(B,3,224,224)
.
-
Part 9.2
Define a DataLoader object called CLIP_dataloader
.
-
Set
batch_size = 16
. -
Set
shuffle = True
. -
Use the collate function defined in Part 10.