Part 8 (5 points, coding task)
Do the following tasks in this part.
- Define a function called `my_train_test_split` that splits the whole dataset into the training component and the test/validation component.
  - The split is random.
  - Inputs
    - `X`: A DataFrame object of features of all sample data.
    - `y`: A Series object of labels of all sample data.
    - `test_size`: It takes a value between 0 and 1 that denotes the fraction of samples used for testing. That is, the number of samples used for testing is `int(total number of samples * test_size)`.
  - Outputs
    - `X_train`: It keeps samples in `X` for training.
    - `X_test`: It keeps samples in `X` for testing.
    - `y_train`: It keeps samples in `y` for training.
    - `y_test`: It keeps samples in `y` for testing.
- Call this function with inputs
  - `X = X`
  - `y = y`
  - `test_size = 0.2`
- Print the object types and shapes of `X_train`, `X_test`, `y_train`, and `y_test`.
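
One possible way to approach the tasks above is sketched below. It is a minimal illustration, not a reference solution. It assumes NumPy and pandas are available, that `X` and `y` have the same number of rows, and that shuffling row positions is an acceptable way to make the split random. The `seed` argument and the small stand-in `X` and `y` are hypothetical additions for illustration only; the task statement does not mention them, and the real `X` and `y` come from earlier parts of the assignment.

```python
import numpy as np
import pandas as pd


def my_train_test_split(X, y, test_size, seed=None):
    """Randomly split X (DataFrame) and y (Series) into train and test parts.

    `seed` is a hypothetical extra argument, not part of the task statement;
    it only makes the shuffle reproducible when a value is given.
    """
    # Assumption: treat "between 0 and 1" as strictly between.
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")

    n_samples = len(X)
    # Number of test samples, as specified in the task statement.
    n_test = int(n_samples * test_size)

    # Shuffle row positions, then cut the shuffled order into two parts.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]

    # Use the same positions for X and y so features and labels stay aligned.
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    return X_train, X_test, y_train, y_test


# Stand-in data, made up for this sketch; replace with the assignment's X and y.
X = pd.DataFrame({"f1": range(10), "f2": range(10, 20)})
y = pd.Series(range(10), name="label")

X_train, X_test, y_train, y_test = my_train_test_split(X=X, y=y, test_size=0.2)

# Print the object type and shape of each returned piece.
for name, obj in [("X_train", X_train), ("X_test", X_test),
                  ("y_train", y_train), ("y_test", y_test)]:
    print(name, type(obj), obj.shape)
```

Indexing `X` and `y` with the same shuffled positions keeps each feature row paired with its label, and `.iloc` is used so the split does not depend on what the index labels happen to be.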