Problem 3 (100 points)
In this problem, you are asked to study Contrastive Language-Image Pre-Training (CLIP), a powerful tool in multimodal AI.
# Run code in this cell
"""
DO NOT MAKE ANY CHANGE IN THIS CELL.
HINT: If something is not correctly installed, simply run this cell a few more times.
"""
!pip install datasets transformers
$\color{red}{\text{WARNING !!!}}$
- Beyond the libraries/modules/classes/functions imported in the following cell, you are NOT allowed to import anything else for the following purposes:
  - As a part of your final solution. For instance, if a problem asks you to build a model without using sklearn but you use it, then you will not earn points.
  - Temporarily, to help you get a solution. For instance, if a problem asks you to manually compute eigenvalues but you temporarily use `np.linalg.eig` to get an answer and then delete your code, then you violate the rule.
- Rule of thumb: Each part is intentionally designed to test something specific. Do not attempt to find a shortcut to circumvent the rule.
# Run code in this cell
"""
DO NOT MAKE ANY CHANGE IN THIS CELL.
"""
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from transformers import BertTokenizer, BertModel, ViTModel
We will use the Flickr30k dataset for image-language matching.
# Run code in this cell
"""
DO NOT MAKE ANY CHANGE IN THIS CELL.
"""
from datasets import load_dataset
dataset_train = load_dataset("USAAIO/2025-Round2-Problem3", split='train')
Part 1 (5 points, coding task)
Do the following tasks to explore the properties of `dataset_train`:
- `dataset_train` is a list-like object. Print the number of elements in it.
- Consider index `idx = 2025`. Print the type of `dataset_train[idx]`.
- Print all keys in `dataset_train[idx]`.
- Name the value associated with the key `image` as `image_PIL`. Print it.
- Convert `image_PIL` to a NumPy array object, called `image_np`. Print `image_np` and its shape.
- Display this image by using `plt.imshow`.
- Print the value associated with the key `alt_text`. Print its type.
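The steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the helper `explore_record` and the `toy_dataset` stand-in are hypothetical names introduced here so that the snippet runs without downloading anything. The sketch assumes each record behaves like a dict with the keys `image` (a PIL image) and `alt_text`, as the task list suggests. The toy `alt_text` is a plain string purely for convenience; the real type is what the last step asks you to find out. It uses only libraries already imported in the cell above.

```python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt


def explore_record(dataset, idx):
    """Hypothetical helper: run the Part 1 exploration steps on one record."""
    print(len(dataset))             # number of elements in the dataset

    record = dataset[idx]
    print(type(record))             # type of a single record
    print(list(record.keys()))      # all keys in the record

    image_PIL = record["image"]
    print(image_PIL)

    # A PIL image converts to an array of shape (height, width, channels).
    image_np = np.array(image_PIL)
    print(image_np)
    print(image_np.shape)

    # In a notebook the figure is rendered when the cell finishes.
    plt.imshow(image_np)

    alt_text = record["alt_text"]
    print(alt_text)
    print(type(alt_text))
    return image_np, alt_text


# Toy stand-in (assumption): one record holding a 4x3 red image and a caption.
toy_dataset = [
    {"image": Image.new("RGB", (4, 3), color=(255, 0, 0)),
     "alt_text": "a red rectangle"}
]
image_np, alt_text = explore_record(toy_dataset, idx=0)
```

With the real data, the same calls would be applied to `dataset_train` and `idx = 2025`. Note that PIL reports image size as (width, height), while the NumPy array shape is ordered (height, width, channels).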