ml-kmeans

K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

Maintained by Zakodium

Installation

npm i ml-kmeans

API Documentation

Usage

import { kmeans } from 'ml-kmeans';

const data = [
  [1, 1, 1],
  [1, 2, 1],
  [-1, -1, -1],
  [-1, -1, -1.5],
];
const centers = [
  [1, 2, 1],
  [-1, -1, -1],
];

const ans = kmeans(data, 2, { initialization: centers });
console.log(ans);
/*
KMeansResult {
  clusters: [ 0, 0, 1, 1 ],
  centroids: [ [ 1, 1.5, 1 ], [ -1, -1, -1.25 ] ],
  converged: true,
  iterations: 2,
  distance: [Function: squaredEuclidean]
}
*/

// Compute the mean error and size of each cluster.
console.log(ans.computeInformation(data));
/*
[
  { centroid: [ 1, 1.5, 1 ], error: 0.25, size: 2 },
  { centroid: [ -1, -1, -1.25 ], error: 0.0625, size: 2 }
]
*/

// Assign new points to the clusters found above.
console.log(ans.nearest([[1, 2, 1]]));
// [ 0 ]

API

`kmeans(data, k, options)`

Runs the K-means algorithm and returns a KMeansResult.

data: array of points to cluster, each in the format [x, y, z, ...].
k: number of clusters.
options: an optional object with the following properties.

Option	Type	Default	Description
`initialization`	`string` or `number[][]`	`'kmeans++'`	Either custom start centroids (`[x, y, z, ...]`), or one of the methods `'kmeans++'`, `'random'`, or `'mostDistant'`.
`maxIterations`	`number`	`100`	Maximum number of iterations allowed. Set to `0` to iterate until convergence.
`tolerance`	`number`	`1e-6`	Error tolerance used as the convergence criterion.
`distanceFunction`	`(p, q) => number`	`squaredEuclidean`	Distance function to use between two points.
`seed`	`number`	(none)	Seed for the random number generator, used by the `'random'` and `'mostDistant'` initialization methods to make results reproducible.

Initialization methods

'kmeans++': uses the kmeans++ seeding method.
'random': chooses k random distinct points.
'mostDistant': chooses the most distant points starting from a first random pick.

`KMeansResult`

The object returned by kmeans.

clusters: array with the cluster index of each input point.
centroids: array with the resulting centroids.
converged: whether the convergence criterion was satisfied.
iterations: number of iterations performed.
result.nearest(points): returns the cluster index of each of the given new points.
result.computeInformation(data): returns the centroid, mean error, and size of each cluster.

`kmeansGenerator(data, k, options)`

Generator variant of kmeans that yields the KMeansResult of each iteration, which is useful to observe the algorithm step by step.

Authors

Miguel Asencio

Sources

D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035. Link to article

License

MIT

ml-kmeans

ml-kmeans

Maintained by Zakodium

Installation

API Documentation

Usage

API

`kmeans(data, k, options)`

Initialization methods

`KMeansResult`

`kmeansGenerator(data, k, options)`

Authors

Sources

License

Settings

On This Page