### WRITE YOUR SOLUTION HERE ###
# Question 1
"""
Answer:
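Logistic regression requires numerical inputs. The model computes a weighted sum of the features, so string-valued categorical columns such as 'Sex' and 'Embarked' must first be converted to numbers.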
"""
# Question 2
# Answer: (put your code here)
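# One-hot encode 'Sex' and 'Embarked', dropping the first category of each,
# and store the indicator columns as 8-bit integers
df_6 = pd.get_dummies(df_5, columns=['Sex', 'Embarked'], drop_first=True, dtype=np.int8)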
# Question 3
"""
Answer:
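Suppose a categorical variable takes the value k out of K categories, indexed 0, 1, ..., K-1 (for a non-categorical column, pandas orders the categories by sorting its unique values).
With drop_first=True, the variable is replaced by K-1 indicator (0/1) columns, i.e. a vector of length K-1.
If k = 0, all entries of this vector are 0.
If k is not 0, entry k-1 is 1 and all other entries are 0 (entry indices run from 0 to K-2).
Dropping the first category avoids perfect multicollinearity (the "dummy variable trap"): with all K indicators, the indicators in every row sum to 1, so together they duplicate the intercept column and the coefficients are not uniquely determined.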
"""
# Question 4
# Answer: (put your code here)
print(df_5.head())
print(df_6.head())
# Question 5
# Answer: (put your code here)
print(df_5.shape)
print(df_6.shape)
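"""
Illustration (a sketch on a made-up four-row frame; df, encoded, df_dummies and expected_cols are hypothetical names): the shapes printed above can be predicted. get_dummies keeps the number of rows, and with drop_first=True each encoded column with K categories is replaced by K-1 columns, so the new column count is n_columns - len(encoded) + sum(K_c - 1). nunique() skips missing values, matching get_dummies' default dummy_na=False.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
df = pd.DataFrame({
    'Survived': [0, 1, 1, 0],
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', 'Q', 'S'],
    'Fare': [7.25, 71.28, 7.92, 8.05],
})
encoded = ['Sex', 'Embarked']
df_dummies = pd.get_dummies(df, columns=encoded, drop_first=True, dtype=np.int8)

# Predicted width: remove the encoded columns, add K-1 indicators for each
expected_cols = df.shape[1] - len(encoded) + sum(df[c].nunique() - 1 for c in encoded)
print(df.shape)                  # (4, 4)
print(df_dummies.shape)          # (4, 5)
print(expected_cols)             # 5
print(list(df_dummies.columns))  # ['Survived', 'Fare', 'Sex_male', 'Embarked_Q', 'Embarked_S']
```
"""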
""" END OF THIS PART """