- Introduction to Clustering
- Formalized Definition of k-Means Clustering
- 1-of-K Coding Scheme
- The Expectation and Maximization Algorithm

**Introduction to Clustering**

Clustering is a class of unsupervised machine learning whose goal is to identify groups, or clusters, of data points in a multi-dimensional space.

Suppose we have a data set *{x_{1}, x_{2}, …, x_{N}}* consisting of N observations of a random D-dimensional Euclidean variable x. The goal is to partition the data into K clusters, where the value of K is known in advance.

We can define a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster.

**Formalization of k-Means Clustering**

Let’s introduce a set of D-dimensional vectors μ_{k}, where k = 1, 2, …, K and μ_{k} represents the centroid of cluster k. Take the centroid to be the mean of the data points in the cluster.

Our goal is to find the assignment of data points to clusters as well as a set of vectors { μ_{k}} such that the sum of squares of distances of each data point to its closest vector μ_{k} is at a minimum.

**1-of-K Coding Scheme**

The 1-of-K coding scheme is a coding scheme applied to a K-dimensional binary vector in which exactly one of the elements x_{k} equals 1 and all the remaining elements equal 0. This means that a data point cannot belong to two clusters at the same time.

In this case, for each data point x_{n}, we introduce a corresponding set of binary indicator variables *r_{nk} ∈ {0, 1}*.

If x_{n} is assigned to cluster k, then *r_{nk}* = 1 and *r_{nj}* = 0 for j ≠ k.
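As a minimal sketch, the indicator variables r_{nk} can be stored as a binary matrix with one row per data point; the assignment values below are hypothetical.

```python
import numpy as np

# Hypothetical assignments of N = 5 data points to K = 3 clusters
# (cluster indices are 0-based here).
assignments = np.array([0, 2, 1, 0, 2])

N, K = len(assignments), 3
r = np.zeros((N, K), dtype=int)   # r[n, k] holds the indicator r_nk
r[np.arange(N), assignments] = 1  # exactly one 1 per row: the 1-of-K scheme

print(r)
```

Each row contains a single 1, so no data point is assigned to two clusters at once.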

We can then define an objective function J, known as the distortion measure:

J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_{nk} ‖x_{n} − μ_{k}‖²

This represents the sum of the squares of the distances of each of the data points from its assigned vector μ_{k}.
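The distortion measure can be sketched directly in code. This is a minimal illustration, assuming data points `X` of shape (N, D), centroids `mu` of shape (K, D), and the indicator matrix `r` of shape (N, K); the sample values are made up.

```python
import numpy as np

def distortion(X, r, mu):
    # Squared distance from every point to every centroid: shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # r zeroes out all but each point's assigned centroid, so this sums
    # ||x_n - mu_k||^2 over the assigned pairs only.
    return (r * sq_dists).sum()

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
mu = np.array([[0.5, 0.0], [10.0, 10.0]])
r = np.array([[1, 0], [1, 0], [0, 1]])
print(distortion(X, r, mu))  # 0.25 + 0.25 + 0.0 = 0.5
```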

The goal is to choose values for {r_{nk}} and {μ_{k}} so as to minimize J. This brings us to the next section, where we examine the algorithm.

**The Expectation and Maximization Algorithm (k-Means)**

The algorithm is iterative: in each iteration, two steps are taken that optimize the variables *r_{nk}* and *μ_{k}* in turn.

The algorithm goes this way:

First, choose an initial value for μ_{k}.

In the first phase, we minimize J with respect to *r_{nk}* while keeping *μ_{k}* fixed.

In the second phase, we minimize J with respect to *μ_{k}* while keeping *r_{nk}* fixed.

Repeat the two steps until convergence.
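The two-phase iteration can be sketched as follows. This is a minimal implementation under stated assumptions: centroids are initialized by sampling data points at random, the function name `k_means` and the toy data are illustrative, and empty clusters simply keep their previous centroid.

```python
import numpy as np

def k_means(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize mu_k by picking K distinct data points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Phase 1: minimize J w.r.t. r_nk with mu fixed,
        # i.e. assign each point to its nearest centroid.
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        assign = sq_dists.argmin(axis=1)
        # Phase 2: minimize J w.r.t. mu_k with r fixed,
        # i.e. move each centroid to the mean of its assigned points.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                           else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return assign, mu

# Two well-separated blobs of identical points as toy data.
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
assign, mu = k_means(X, K=2)
print(assign, mu)
```

Phase 1 minimizes J for fixed centroids because each point independently contributes least when matched to its nearest μ_{k}; Phase 2 minimizes J for fixed assignments because the mean is the least-squares minimizer for each cluster.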