2024 IAIO Question 3.1

Consider the following 2D dataset:

Data Point #    x       y
1               1.90    0.97
2               1.76    0.84
3               2.32    1.63
4               2.31    2.09
5               1.14    2.11
6               5.02    3.02
7               5.74    3.84
8               2.25    3.47
9               4.71    3.60
10              3.17    4.96

Suppose the initial cluster centers, given as (x, y) coordinates, are:

\theta_{A,0}: (1.90, 0.97), \quad \theta_{B,0}: (3.17, 4.96)

Assuming k-means uses Euclidean distance,

d(p, q) = \| p - q \|_2 = \sqrt{\sum_{i=1}^{d} (p_i - q_i)^2}

Simulate the cluster assignment step of the k-means (k=2) algorithm. For each data point, which of the initial centers \theta_{A,0} and \theta_{B,0} is nearer, and what is the point's distance from that center?

Data # Cluster Assignment Distance from the Cluster Centre
1
2
3
4
5
6
7
8
9
10

For the i-th data point, denote z^{(i)} = \left( x^{(i)} , y^{(i)} \right), and let C^{(i)}_0 be the cluster whose initial center is nearest to it:

C^{(i)}_0 \in \arg\min_{P \in \left\{ A, B \right\}} \left\| z^{(i)} - \theta_{P, 0} \right\|_2 .
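
As an illustration of this rule, take data point 8, z^{(8)} = (2.25, 3.47). Its distances to the two initial centers work out to:

\left\| z^{(8)} - \theta_{A,0} \right\|_2 = \sqrt{(2.25 - 1.90)^2 + (3.47 - 0.97)^2} = \sqrt{0.1225 + 6.2500} = \sqrt{6.3725} \approx 2.524

\left\| z^{(8)} - \theta_{B,0} \right\|_2 = \sqrt{(2.25 - 3.17)^2 + (3.47 - 4.96)^2} = \sqrt{0.8464 + 2.2201} = \sqrt{3.0665} \approx 1.751

Since 1.751 < 2.524, C^{(8)}_0 = B. The other nine points are handled the same way.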

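Thus, the answer is as follows, where cluster 1 corresponds to \theta_{A,0}, cluster 2 corresponds to \theta_{B,0}, and distances are rounded to three decimal places: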

Data #    Cluster Assignment    Distance from the Cluster Centre
1         1                     0.000
2         1                     0.191
3         1                     0.782
4         1                     1.193
5         1                     1.370
6         2                     2.681
7         2                     2.803
8         2                     1.751
9         2                     2.055
10        2                     0.000
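
The table can be checked with a short script. The following is a minimal Python sketch of the assignment step only, not a full k-means implementation; the names `points`, `centers`, and `assign` are illustrative and not part of the original question. It labels clusters "A" and "B", which correspond to clusters 1 and 2 in the table.

```python
import math

# The ten data points from the question, in order.
points = [
    (1.90, 0.97), (1.76, 0.84), (2.32, 1.63), (2.31, 2.09), (1.14, 2.11),
    (5.02, 3.02), (5.74, 3.84), (2.25, 3.47), (4.71, 3.60), (3.17, 4.96),
]

# Initial centers theta_{A,0} and theta_{B,0}.
centers = {"A": (1.90, 0.97), "B": (3.17, 4.96)}


def assign(points, centers):
    """Return (label, distance) of the nearest center for each point."""
    result = []
    for p in points:
        # Euclidean distance from p to every center; keep the smallest.
        label, dist = min(
            ((name, math.dist(p, c)) for name, c in centers.items()),
            key=lambda pair: pair[1],
        )
        result.append((label, round(dist, 3)))
    return result


assignments = assign(points, centers)
for i, (label, dist) in enumerate(assignments, start=1):
    print(f"{i:>2}  {label}  {dist:.3f}")
```

Running it should give points 1 to 5 assigned to A and points 6 to 10 assigned to B, with distances matching the table above.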