2025 USA-NA-AIO Round 2, Problem 3, Part 6

USAAIO · May 14, 2025, 10:51pm

Part 6 (5 points, coding task)

In this part, we preprocess the text data stored in text_list.

Tokenize the text with

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Call

token_id_list = tokenizer(text_list)['input_ids']
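To see the shape of what this call returns without downloading the pretrained vocabulary, one can build a `BertTokenizer` from a tiny hand-written vocabulary file. This is a minimal sketch under stated assumptions: the seven-token vocabulary and the two example sentences are hypothetical, so the IDs differ from those of `bert-base-uncased`. Only the structure carries over: a list with one list of Python ints per sentence, each wrapped in `[CLS]` ... `[SEP]`, with lengths that vary from sentence to sentence.

```python
import os
import tempfile

from transformers import BertTokenizer

# Hypothetical toy vocabulary: the line index in vocab.txt is the token ID.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_path = os.path.join(tmp_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")

    # do_lower_case=True mirrors an "uncased" model: "Hello" -> "hello".
    tokenizer = BertTokenizer(vocab_file=vocab_path, do_lower_case=True)

    # Hypothetical stand-in for text_list.
    text_list = ["Hello world", "hello"]
    encoding = tokenizer(text_list)
    token_id_list = encoding["input_ids"]

print(sorted(encoding.keys()))  # ['attention_mask', 'input_ids', 'token_type_ids']
print(token_id_list)            # [[2, 5, 6, 3], [2, 5, 3]]
print(type(token_id_list), type(token_id_list[0]), type(token_id_list[0][0]))
```

Without `return_tensors`, the tokenizer keeps plain Python lists, which is why the inspection steps that follow report `list`, `list`, and `int`.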

Print token_id_list.
Print the type of token_id_list.
Print the length of token_id_list.
Print token_id_list[5].
Print the type of token_id_list[5].
Print the type of token_id_list[5][0].
For each idx, convert token_id_list[idx] from a list of ints to a 1-dim tensor, so that after this step token_id_list is a list consisting entirely of 1-dim tensors.
Print token_id_list[5:7].
Print the data type of token_id_list[5][0].
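The conversion has to be done sequence by sequence because the token lists are ragged: sentences of different lengths give ID lists of different lengths, so `torch.tensor` cannot stack them into one rectangular 2-dim tensor. The sketch below uses hypothetical ID lists (not real `bert-base-uncased` output) to show this, and to show that `torch.tensor` infers `torch.int64` from Python ints, which is the dtype the final step should report.

```python
import torch

# Hypothetical ragged token-ID lists standing in for tokenizer output.
token_id_list = [[101, 7, 8, 102], [101, 9, 102]]

# Converting the whole ragged list at once fails.
try:
    torch.tensor(token_id_list)
    ragged_ok = True
except ValueError as err:
    ragged_ok = False
    print("ValueError:", err)

# Converting each sequence separately yields a list of 1-dim tensors.
token_id_list = [torch.tensor(ids) for ids in token_id_list]

print([t.shape for t in token_id_list])  # [torch.Size([4]), torch.Size([3])]
print(token_id_list[0][0].dtype)         # torch.int64
```

Indexing a 1-dim tensor, as in `token_id_list[5][0]`, gives a 0-dim tensor that keeps the `int64` dtype, rather than a Python `int`.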

USAAIO · May 14, 2025, 10:52pm

### WRITE YOUR SOLUTION HERE ###

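import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# input_ids: one list of token IDs (Python ints) per string in text_list
token_id_list = tokenizer(text_list)['input_ids']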

print(token_id_list)
print(type(token_id_list))
print(len(token_id_list))

print(token_id_list[5])
print(type(token_id_list[5]))
print(type(token_id_list[5][0]))

# Convert each variable-length list of ints to a 1-dim int64 tensor
token_id_list = [torch.tensor(ids) for ids in token_id_list]
print(token_id_list[5:7])
print(token_id_list[5][0].dtype)

""" END OF THIS PART """
