2024 IAIO Question 3.3

Let a configuration of the k-means algorithm correspond to the k-way partition (on the set of instances to be clustered) generated by the clustering at the end of each iteration. Is it possible for the k-means algorithm to revisit a configuration? Justify your answer and show why this proves that the k-means algorithm converges in a finite number of steps.

It is not possible for the k-means algorithm to revisit a configuration (assuming ties in both phases are broken by a fixed deterministic rule, e.g. a point is reassigned only when some centroid is strictly closer).
This is because the objective function of the k-means algorithm (the total squared distance) never increases across iterations, and under such tie-breaking it strictly decreases whenever the configuration changes; revisiting a configuration would therefore force the objective to return to an earlier, strictly larger value, which is impossible.

We give a rigorous proof below.

Consider dataset \{x^{(n)} \}_{n=0}^{N-1} with x^{(n)} \in \Bbb R^d.

Denote by C_k \subseteq \left\{ 0 , \cdots , N-1 \right\} the set of indices of the data points assigned to cluster k \in \left\{ 1 , \cdots , K \right\}.
Denote by \mu_k \in \Bbb R^d the centroid of cluster k.

We solve the following optimization problem that minimizes the total squared distance:

\min_{\left\{ C_k, \mu_k \right\}_{k=1}^K} J \left( C, \mu \right) = \sum_{n=0}^{N-1} \sum_{k=1}^K \left| \left| x^{(n)} - \mu_k \right| \right|_2^2 \textbf{1} \left\{ n \in C_k \right\} .
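As a concrete check of this definition, the objective can be evaluated with a short Python sketch (the names `objective`, `X`, `assign`, and `mu` are illustrative choices, not part of the problem statement); `assign[n]` plays the role of the indicator \textbf{1} \left\{ n \in C_k \right\}:

```python
def objective(X, assign, mu):
    """Total squared distance J(C, mu): each point x^{(n)} contributes
    its squared Euclidean distance to the centroid of its own cluster."""
    return sum(
        sum((xi - mi) ** 2 for xi, mi in zip(X[n], mu[assign[n]]))
        for n in range(len(X))
    )
```

For example, two points [0, 0] and [2, 0] sharing the single centroid [1, 0] give J = 1 + 1 = 2.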

Consider the \left( j+1 \right)-th iteration.

In Phase 1 (cluster optimization phase), we have

\begin{align*} J \left( C^{(j)}, \mu^{(j)} \right) & = \sum_{n=0}^{N-1} \sum_{k=1}^K \left| \left| x^{(n)} - \mu_k^{(j)} \right| \right|_2^2 \textbf{1} \left\{ n \in C_k^{(j)} \right\} \\ & \geq \sum_{n=0}^{N-1} \min_k \left| \left| x^{(n)} - \mu_k^{(j)} \right| \right|_2^2 \\ & = J \left( C^{(j+1)}, \mu^{(j)} \right) , \end{align*}

where the last equality follows from the cluster optimization rule in this algorithm.

Thus, after Phase 1 update, the value of the objective function (weakly) decreases.
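The Phase 1 rule can be sketched as follows (an illustrative `assign_step`; ties are broken by the lowest cluster index, a fixed deterministic rule of the kind the argument assumes):

```python
def assign_step(X, mu):
    """Phase 1: assign each point to its nearest centroid.
    min(...) breaks distance ties by the lowest cluster index,
    so the update is deterministic."""
    def sqdist(x, m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return [min(range(len(mu)), key=lambda k: sqdist(x, mu[k])) for x in X]
```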

Next, we analyze Phase 2 (centroid optimization phase).

Consider the following optimization problem:

\begin{align*} \min_{\mu \in \Bbb R^d} & \ f_k \left( \mu \right) = \sum_{n \in C_k^{(j+1)}} \left| \left| x^{(n)} - \mu \right| \right|_2^2 . \end{align*}

Taking the gradient w.r.t. \mu, we get

\begin{align*} \nabla_\mu f_k \left( \mu \right) & = 2 \sum_{n \in C_k^{(j+1)}} \left( \mu - x^{(n)} \right) . \end{align*}

Thus, provided C_k^{(j+1)} \neq \emptyset, the first-order condition \nabla_\mu f_k \left( \mu \right) = 0 implies

\mu_k^{(j+1)} = \frac{\sum_{n \in C_k^{(j+1)}} x^{(n)}}{| C_k^{(j+1)} |} .

Taking the Hessian w.r.t. \mu, we get

\begin{align*} \nabla^2_\mu f_k \left( \mu \right) & = 2 | C_k^{(j+1)} | I_{d \times d} \\ & \succ 0 . \end{align*}

Therefore, since f_k is strictly convex, \mu_k^{(j+1)} is its unique minimizer, and we have

\begin{align*} J \left( C^{(j+1)}, \mu^{(j)} \right) & = \sum_{n=0}^{N-1} \sum_{k=1}^K \left| \left| x^{(n)} - \mu_k^{(j)} \right| \right|_2^2 \textbf{1} \left\{ n \in C_k^{(j+1)} \right\} \\ & = \sum_{k=1}^K \sum_{n \in C_k^{(j+1)}} \left| \left| x^{(n)} - \mu_k^{(j)} \right| \right|_2^2 \\ & \geq \sum_{k=1}^K \sum_{n \in C_k^{(j+1)}} \left| \left| x^{(n)} - \mu_k^{(j+1)} \right| \right|_2^2 \\ & = J \left( C^{(j+1)}, \mu^{(j+1)} \right) . \end{align*}

Thus, after Phase 2 update, the value of the objective function (weakly) decreases.
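The Phase 2 rule admits an equally short sketch (an illustrative `update_step`; the derivation above assumes every cluster is non-empty, and the placeholder used for an empty cluster here is just one possible convention):

```python
def update_step(X, assign, K):
    """Phase 2: recompute each centroid as the mean of its cluster,
    i.e. the unique minimizer of f_k derived above."""
    mu = []
    for k in range(K):
        members = [X[n] for n in range(len(X)) if assign[n] == k]
        if members:
            mu.append([sum(coord) / len(members) for coord in zip(*members)])
        else:
            # empty cluster: keep a placeholder (one possible convention)
            mu.append([0.0] * len(X[0]))
    return mu
```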

Putting the above analysis together, we have

\begin{align*} J \left( C^{(j)}, \mu^{(j)} \right) & \geq J \left( C^{(j+1)}, \mu^{(j)} \right) \\ & \geq J \left( C^{(j+1)}, \mu^{(j+1)} \right) . \end{align*}

Because J \left( C^{(j)}, \mu^{(j)} \right) is non-increasing and bounded below by 0, the sequence of objective values converges. Moreover, under the deterministic tie-breaking rule above, any iteration that changes the configuration strictly decreases J, so no configuration can ever be revisited. Since there are only finitely many K-way partitions of the N instances (at most K^N), the algorithm must reach a fixed point, and hence converges, in a finite number of steps.
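The whole argument can be checked numerically with a minimal self-contained sketch (illustrative names; ties broken by the lowest cluster index): it records the partition and the objective after every Phase 1 update, and on any run the recorded partitions are pairwise distinct while the objective values never increase.

```python
def kmeans(X, mu):
    """Alternate Phase 1 and Phase 2 until the partition stops changing.
    Returns a list of (partition, J) pairs, one per Phase 1 update."""
    def sq(x, m):
        return sum((a - b) ** 2 for a, b in zip(x, m))

    def J(assign):
        return sum(sq(X[n], mu[assign[n]]) for n in range(len(X)))

    history, assign = [], None
    while True:
        # Phase 1: nearest-centroid assignment, ties to the lowest index.
        new = [min(range(len(mu)), key=lambda k: sq(x, mu[k])) for x in X]
        if new == assign:
            return history  # fixed point reached: nothing changed
        assign = new
        history.append((tuple(assign), J(assign)))
        # Phase 2: move each non-empty cluster's centroid to its mean.
        for k in range(len(mu)):
            pts = [X[n] for n in range(len(X)) if assign[n] == k]
            if pts:
                mu[k] = [sum(c) / len(pts) for c in zip(*pts)]
```

On the toy run with points 0, 1, 9, 10 and initial centroids 0 and 1.5, the partition changes exactly once and the objective drops from 128.75 to 158/9 \approx 17.56, illustrating the strict decrease.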