2025 USA-NA-AIO Round 2, Problem 3, Part 8

Part 8 (5 points, coding task)

In this part, we prepare our CLIP dataset.

  1. Define class MyDataset that subclasses Dataset.

    • __init__

      • Inputs: images_pt, token_id_list.

      • Attributes: Same as inputs.

    • __len__

      • Output: total number of samples.

    • __getitem__

      • Input: sample index idx

      • Outputs: images_pt[idx], token_id_list[idx]

  2. Define the dataset CLIP_dataset as an instance of MyDataset.

### WRITE YOUR SOLUTION HERE ###

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, images_pt, token_id_list):
        self.images_pt = images_pt
        self.token_id_list = token_id_list

    def __len__(self):
        return len(self.token_id_list)

    def __getitem__(self, idx):
        return self.images_pt[idx], self.token_id_list[idx]

CLIP_dataset = MyDataset(images_pt, token_id_list)
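
A quick way to sanity-check a dataset like this is to index one sample and pull one batch through a DataLoader. The sketch below is only illustrative: it repeats the class so it runs on its own, and the names demo_images, demo_token_ids, demo_dataset and demo_loader, along with the tensor shapes, are hypothetical stand-ins rather than the real images_pt and token_id_list from earlier parts. It also assumes every token-id sequence is a tensor of the same length, so the default collation can stack them.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    """Same interface as the solution above, repeated so this sketch is self-contained."""

    def __init__(self, images_pt, token_id_list):
        self.images_pt = images_pt
        self.token_id_list = token_id_list

    def __len__(self):
        return len(self.token_id_list)

    def __getitem__(self, idx):
        return self.images_pt[idx], self.token_id_list[idx]


# Hypothetical stand-in data (assumption): 4 RGB images of 8x8 pixels
# and 4 token-id sequences, each a LongTensor of length 5.
demo_images = torch.zeros(4, 3, 8, 8)
demo_token_ids = [torch.arange(5) + i for i in range(4)]

demo_dataset = MyDataset(demo_images, demo_token_ids)
demo_loader = DataLoader(demo_dataset, batch_size=2, shuffle=False)

# Indexing returns one (image, token ids) pair.
first_image, first_tokens = demo_dataset[0]

# The default collate function stacks the pairs into batched tensors.
batch_images, batch_tokens = next(iter(demo_loader))

print(len(demo_dataset), tuple(first_image.shape), first_tokens.tolist())
print(tuple(batch_images.shape), tuple(batch_tokens.shape))
```

If the token-id sequences were plain Python lists or had different lengths, the default collation would not produce a single stacked tensor, and a custom collate_fn with padding would likely be needed.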


""" END OF THIS PART """