KMeans

E97072

KMeans is a popular unsupervised machine learning algorithm used for partitioning data into a specified number of clusters based on feature similarity.


Statements (50)
Predicate Object
instanceOf clustering algorithm
iterative optimization algorithm
partition-based clustering method
unsupervised learning algorithm
advantage computationally efficient for large datasets
scales linearly with number of samples and clusters in practice
simple to implement
alsoKnownAs Lloyd’s algorithm
k-means clustering
assumes Euclidean feature space in standard form
clusters are roughly spherical
clusters have similar size
basedOn minimization of within-cluster sum of squares
canUseDistanceMetric other Lp distances with modifications
commonlyUsedIn customer segmentation
document clustering
image compression
pattern recognition
convergesWhen change in objective function is below a threshold
cluster assignments no longer change
distanceMetric Euclidean distance (standard)
implementedIn Apache Spark MLlib
MATLAB Statistics and Machine Learning Toolbox
R stats and cluster packages
scikit-learn
input number of clusters k
set of data points
limitation cannot automatically determine optimal number of clusters
may converge to local minima
not robust to noise and outliers
performs poorly on non-spherical clusters
objectiveFunction minimize sum of squared distances between points and their assigned cluster centroid
optimizationProblem NP-hard in general
relatedAlgorithm Gaussian mixture models
fuzzy c-means
k-medoids
requires numerical feature representation
predefined number of clusters k
sensitiveTo feature scaling
initialization
outliers
step assign each point to nearest centroid
iterate assignment and update until convergence
recompute centroids as mean of assigned points
typicalInitialization k-means++ initialization
random selection of initial centroids
usedFor data compression
partitioning data into k clusters
prototype-based clustering
vector quantization

Referenced by (1)
Subject (surface form when different) Predicate
scikit-learn
hasConcept

Please wait…