Part 6 (5 points, coding task)
In this part, we preprocess the text data stored in `text_list`.
- Do tokenization with `tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')`.
- Call `token_id_list = tokenizer(text_list)['input_ids']`.
- Print `token_id_list`.
- Print the type of `token_id_list`.
- Print the length of `token_id_list`.
- Print `token_id_list[5]`.
- Print the type of `token_id_list[5]`.
- Print the type of `token_id_list[5][0]`.
- For each `idx`, convert `token_id_list[idx]` from the above type to a 1-dim tensor. That is, after this step, `token_id_list` is a list that consists of all 1-dim tensors.
- Print `token_id_list[5:7]`.
- Print the data type of `token_id_list[5][0]`.
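
The steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes PyTorch is the intended tensor library, and the helper names `load_token_ids` and `to_tensor_list` as well as the sample IDs in `example_ids` are hypothetical and not part of the assignment. Called without padding or `return_tensors`, the tokenizer returns `'input_ids'` as a plain Python list of lists of ints whose inner lists can differ in length, which is why each one is converted to its own 1-dim tensor.

```python
import torch


def load_token_ids(text_list):
    """Tokenize text_list with bert-base-uncased (hypothetical helper)."""
    # Imported lazily so the conversion helper below works without transformers.
    # from_pretrained fetches the vocabulary on first use.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    # Without return_tensors, 'input_ids' is a list of lists of Python ints.
    return tokenizer(text_list)['input_ids']


def to_tensor_list(token_id_list):
    """Convert each inner list of ints into a 1-dim tensor (hypothetical helper)."""
    return [torch.tensor(ids) for ids in token_id_list]


# Hypothetical stand-in for tokenizer output, so no download is needed here.
# 101 and 102 are the [CLS] and [SEP] IDs in bert-base-uncased;
# the remaining IDs are arbitrary placeholders.
example_ids = [[101, 7592, 102], [101, 2088, 999, 102]]

# Types before conversion: list, list, int.
print(type(example_ids), type(example_ids[0]), type(example_ids[0][0]))

tensor_list = to_tensor_list(example_ids)

# After conversion: a list of 1-dim tensors with integer dtype.
print(tensor_list[0:2])
print(tensor_list[0].dim(), tensor_list[0][0].dtype)
```

In the assignment itself, `example_ids` would be replaced by `load_token_ids(text_list)`, and the slices would use the indices requested above (`[5]`, `[5:7]`). Note that `type(...)` of an element of a tensor reports the tensor class, while the `.dtype` attribute reports the element data type.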