Introducing K-means clustering in 200 words or less

Vicky
Published in 8bitDS
2 min read · Jan 29, 2022


K-Means clustering is one of the most popular unsupervised machine learning algorithms. It is used to find intrinsic groups within an unlabelled dataset and to draw inferences from them.

The algorithm partitions a given data set into a fixed number of clusters, k, chosen a priori.

It alternates between two steps:

  • assigning each data point to the closest cluster center, and
  • setting each cluster center to the mean of the data points assigned to it.

The algorithm is finished when the assignment of instances to clusters no longer changes.
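
The two alternating steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `kmeans_sketch`, the random choice of initial centers, and the `max_iter` safety cap are assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Toy k-means; kmeans_sketch is a hypothetical helper, not a library function."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to its closest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return labels, centers
```

Different starting centers can lead to different final clusters, which is why library implementations usually run several initialisations and keep the best result.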

K-Means Algorithm

We describe the algorithm with respect to the Euclidean distance function d(x,y) = ||x − y||.
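
With this distance, the two steps can be written compactly. The symbols below are notation introduced here for illustration: x_i are the data points, c_j is the center of cluster j, and S_j is the set of points assigned to it.

```latex
\begin{align*}
\text{Assignment:}\quad & S_j = \{\, x_i : \|x_i - c_j\| \le \|x_i - c_l\| \ \text{for all } l = 1, \dots, k \,\} \\
\text{Update:}\quad & c_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i
\end{align*}
```

A point that is equally close to two centers is assigned to just one of them, so the sets S_j remain disjoint.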

K-Means algorithm

The following is how you apply k-means with scikit-learn:
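
A minimal sketch of such a call, assuming a synthetic two-dimensional dataset from `make_blobs`. The dataset, the `random_state` values, and `n_init=10` are assumptions chosen for illustration; only `n_clusters=3` and the names `X` and `kmeans` come from the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic 2-D data with three blobs (100 samples by default).
X, y = make_blobs(random_state=1)

# Build the clustering model with three clusters and fit it to the data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

# Each training point gets a cluster label, stored in labels_.
print("Cluster memberships:\n{}".format(kmeans.labels_))
```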

The Output:

Because n_clusters=3, the cluster labels run from 0 to 2.

The following plot shows the cluster assignments (circles) and the cluster centers (triangles) found by k-means with three clusters:

import mglearn
mglearn.discrete_scatter(X[:, 0], X[:, 1], kmeans.labels_, markers='o')
mglearn.discrete_scatter(
    kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], [0, 1, 2],
    markers='^', markeredgewidth=2)
k-means with three clusters

Read other articles about machine learning at: 8bitDS
