2025 USA-NA-AIO Round 1, Problem 3, Part 6

USAAIO · March 28, 2025, 5:34am

Part 6 (5 points, coding and conceptual reasoning task)

In df_5, columns Sex and Embarked are categorical data.

Do the following tasks to process these categorical data.

1. To run logistic regression on this dataset, we need to one-hot encode these two columns. Explain why.
2. One-hot encode these two columns. Set drop_first = True and dtype = np.int8. Save the new dataframe object as df_6.
3. Explain what drop_first = True means and why we use it.
4. Print the first five rows of df_5 and df_6.
5. Print the shapes of df_5 and df_6.

USAAIO · March 28, 2025, 5:34am

### WRITE YOUR SOLUTION HERE ###

# Question 1
"""
Answer:

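Logistic regression computes a weighted sum of the input features, so every feature must be numerical, not categorical.

Sex and Embarked hold category labels with no natural order. Mapping them to integer codes would impose an arbitrary order and spacing on the categories.

One-hot encoding gives each category its own 0/1 column and avoids this.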

"""

# Question 2
# Answer: (put your code here)

df_6 = pd.get_dummies(df_5, columns=['Sex', 'Embarked'], drop_first = True, dtype = np.int8)

# Question 3
"""
Answer:

Suppose a categorical variable takes value k chosen from K categories, indexed as 0, 1, ..., K-1.

By setting drop_first = True, it is replaced by a vector of length K-1.

If k = 0, then all entries of this vector are 0.

If k is not 0, then the (k-1)th entry of this vector (entry indices start from 0 and end at K-2) is 1 and all other entries are 0.

Setting drop_first = True avoids multicollinearity. With all K indicator columns, the columns always sum to 1, which duplicates the intercept term, so the coefficients are not uniquely determined.

"""

# Question 4
# Answer: (put your code here)

print(df_5.head())
print(df_6.head())

# Question 5
# Answer: (put your code here)

print(df_5.shape)
print(df_6.shape)

""" END OF THIS PART """

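To make the answer to Question 3 concrete, here is a minimal sketch on a small made-up dataframe. The dataframe `toy`, its values, and the helper `design_rank` are hypothetical and only stand in for df_5, which is not shown in this thread. The sketch compares the columns produced with and without drop_first = True, then checks the rank of the design matrix (intercept plus Embarked indicators) in both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df_5 (not the actual contest dataframe).
toy = pd.DataFrame({
    "Age": [22, 38, 26, 35],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# One indicator column per category (K columns per variable).
full = pd.get_dummies(toy, columns=["Sex", "Embarked"], dtype=np.int8)

# drop_first=True drops the first category of each variable (K-1 columns).
reduced = pd.get_dummies(
    toy, columns=["Sex", "Embarked"], drop_first=True, dtype=np.int8
)

print(list(full.columns))
print(list(reduced.columns))


def design_rank(df, cols):
    """Return (rank, number of columns) of [intercept | df[cols]]."""
    X = np.column_stack([np.ones(len(df)), df[cols].to_numpy(dtype=float)])
    return int(np.linalg.matrix_rank(X)), X.shape[1]


# With all three Embarked indicators, the indicators sum to the intercept
# column, so the rank is lower than the number of columns.
print(design_rank(full, ["Embarked_C", "Embarked_Q", "Embarked_S"]))

# With the first indicator dropped, the matrix has full column rank.
print(design_rank(reduced, ["Embarked_Q", "Embarked_S"]))
```

On this toy data, the dropped categories are female and C, so a row with Sex_male = 0 means female and a row with Embarked_Q = Embarked_S = 0 means C. No information is lost by dropping the first column.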