Let a configuration of the k-means algorithm correspond to the k-way partition (on the set of instances to be clustered) generated by the clustering at the end of each iteration. Is it possible for the k-means algorithm to revisit a configuration? Justify your answer and show why this proves that the k-means algorithm converges in a finite number of steps.
It is not possible for the k-means algorithm to revisit a configuration (assuming a fixed, deterministic tie-breaking rule for assignments).
This is because the objective function of the k-means algorithm (the total squared distance) weakly decreases after every phase, and strictly decreases whenever the configuration changes; if a configuration were revisited, the objective would have to return to an earlier, larger value, a contradiction.
Since there are only finitely many K-way partitions of N points (at most K^N), no configuration can recur, so the algorithm must reach a fixed point in a finite number of steps.
We make a rigorous proof below.
Consider a dataset \{x^{(n)} \}_{n=0}^{N-1} with x^{(n)} \in \Bbb R^d.
Denote by C_k the collection of data points that are in cluster k \in \left\{ 1 , \cdots , K \right\}.
Denote by \mu_k \in \Bbb R^d the centroid of cluster k.
We solve the following optimization problem, which minimizes the total squared distance:

\min_{C, \mu} J \left( C, \mu \right) = \sum_{k=1}^{K} \sum_{x^{(n)} \in C_k} \left\| x^{(n)} - \mu_k \right\|^2.
Consider the \left( j+1 \right) th iteration, which maps \left( C^{(j)}, \mu^{(j)} \right) to \left( C^{(j+1)}, \mu^{(j+1)} \right).
In Phase 1 (cluster optimization phase), each point is reassigned to its nearest centroid, so we have

J \left( C^{(j+1)}, \mu^{(j)} \right) = \sum_{n=0}^{N-1} \min_{k} \left\| x^{(n)} - \mu_k^{(j)} \right\|^2 \le \sum_{k=1}^{K} \sum_{x^{(n)} \in C_k^{(j)}} \left\| x^{(n)} - \mu_k^{(j)} \right\|^2 = J \left( C^{(j)}, \mu^{(j)} \right),

where the first equality follows from the cluster optimization rule in this algorithm: each point attains the minimum over clusters.
Thus, after the Phase 1 update, the value of the objective function (weakly) decreases.
Next, we analyze Phase 2 (centroid optimization phase).
Consider the following optimization problem for each cluster k:

\min_{\mu} f_k \left( \mu \right) = \sum_{x^{(n)} \in C_k^{(j+1)}} \left\| x^{(n)} - \mu \right\|^2.
Taking the first-order derivative w.r.t. \mu, we get

\nabla_\mu f_k \left( \mu \right) = -2 \sum_{x^{(n)} \in C_k^{(j+1)}} \left( x^{(n)} - \mu \right).
Thus, the first-order condition \nabla_\mu f_k \left( \mu \right) = 0 implies

\mu_k^{(j+1)} = \frac{1}{\left| C_k^{(j+1)} \right|} \sum_{x^{(n)} \in C_k^{(j+1)}} x^{(n)},

which is exactly the centroid update rule of the algorithm.
Taking the second-order derivative w.r.t. \mu, we get

\nabla_\mu^2 f_k \left( \mu \right) = 2 \left| C_k^{(j+1)} \right| I \succ 0,

so f_k is strictly convex. Therefore, \mu_k^{(j+1)} is the unique minimizer of f_k \left( \mu \right).
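As a numerical sanity check of this derivation, the following sketch (with illustrative random data standing in for a cluster C_k) verifies that the cluster mean achieves a value of f_k no larger than that of nearby perturbed candidates:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))  # illustrative data for one cluster C_k

def f_k(mu):
    # total squared distance from the cluster's points to mu
    return np.sum((points - mu) ** 2)

mu_star = points.mean(axis=0)  # the centroid update from the first-order condition

# any perturbation of the centroid cannot decrease f_k
for _ in range(100):
    mu_other = mu_star + rng.normal(scale=0.5, size=3)
    assert f_k(mu_star) <= f_k(mu_other)
```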
Therefore, we have

J \left( C^{(j+1)}, \mu^{(j+1)} \right) = \sum_{k=1}^{K} f_k \left( \mu_k^{(j+1)} \right) \le \sum_{k=1}^{K} f_k \left( \mu_k^{(j)} \right) = J \left( C^{(j+1)}, \mu^{(j)} \right).

Thus, after the Phase 2 update, the value of the objective function (weakly) decreases.
Putting the above analysis together, we have

J \left( C^{(j+1)}, \mu^{(j+1)} \right) \le J \left( C^{(j+1)}, \mu^{(j)} \right) \le J \left( C^{(j)}, \mu^{(j)} \right).
Because J \left( C^{(j)}, \mu^{(j)} \right) is bounded below by 0 and non-increasing in j, it must converge. Moreover, under a consistent tie-breaking rule, the objective strictly decreases whenever the configuration changes, so no configuration can be visited twice; since there are at most K^N distinct configurations, the algorithm must terminate at a fixed point after a finite number of iterations.
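To illustrate the two-phase monotonicity argument empirically, here is a minimal k-means sketch (the helper kmeans and the two-blob data are illustrative, not part of the proof) that records the objective J after each phase and checks that it never increases:

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]  # initial centroids
    J_history = []
    for _ in range(iters):
        # Phase 1: assign each point to its nearest centroid
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        z = d2.argmin(axis=1)
        J_history.append(d2[np.arange(len(X)), z].sum())  # J(C^{(j+1)}, mu^{(j)})
        # Phase 2: move each centroid to the mean of its cluster
        for k in range(K):
            if np.any(z == k):  # keep old centroid if cluster is empty
                mu[k] = X[z == k].mean(axis=0)
        J_history.append(((X - mu[z]) ** 2).sum())  # J(C^{(j+1)}, mu^{(j+1)})
    return z, mu, J_history

X = np.random.default_rng(1).normal(size=(200, 2))
X[:100] += 5.0  # two well-separated blobs
z, mu, J = kmeans(X, K=2)

# the objective is non-increasing after every phase of every iteration
assert all(a >= b - 1e-9 for a, b in zip(J, J[1:]))
```

Each iteration appends the objective twice, once after each phase, so the monotonicity chain J(C^{(j+1)}, \mu^{(j+1)}) \le J(C^{(j+1)}, \mu^{(j)}) \le J(C^{(j)}, \mu^{(j)}) is checked at every step.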