ml-kmeans

    K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

    Maintained by Zakodium

    npm i ml-kmeans

    import { kmeans } from 'ml-kmeans';

    const data = [
      [1, 1, 1],
      [1, 2, 1],
      [-1, -1, -1],
      [-1, -1, -1.5],
    ];
    const centers = [
      [1, 2, 1],
      [-1, -1, -1],
    ];

    const ans = kmeans(data, 2, { initialization: centers });
    console.log(ans);
    /*
    KMeansResult {
      clusters: [ 0, 0, 1, 1 ],
      centroids: [ [ 1, 1.5, 1 ], [ -1, -1, -1.25 ] ],
      converged: true,
      iterations: 2,
      distance: [Function: squaredEuclidean]
    }
    */

    // Compute the mean error and size of each cluster.
    console.log(ans.computeInformation(data));
    /*
    [
      { centroid: [ 1, 1.5, 1 ], error: 0.25, size: 2 },
      { centroid: [ -1, -1, -1.25 ], error: 0.0625, size: 2 }
    ]
    */

    // Assign new points to the clusters found above.
    console.log(ans.nearest([[1, 2, 1]]));
    // [ 0 ]
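The error values in the output above are consistent with a mean squared Euclidean distance from each cluster's points to its centroid. The following standalone sketch reproduces them by hand. It does not call ml-kmeans, and `meanErrors` is a hypothetical helper written for illustration, not the library's implementation.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredEuclidean(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) {
    const diff = p[i] - q[i];
    sum += diff * diff;
  }
  return sum;
}

// Hypothetical helper (not part of ml-kmeans): for each cluster, average the
// distance between its assigned points and its centroid, and count the points.
function meanErrors(data, clusters, centroids, distance = squaredEuclidean) {
  const sums = new Array(centroids.length).fill(0);
  const sizes = new Array(centroids.length).fill(0);
  data.forEach((point, i) => {
    const c = clusters[i];
    sums[c] += distance(point, centroids[c]);
    sizes[c] += 1;
  });
  return centroids.map((centroid, c) => ({
    centroid,
    error: sizes[c] > 0 ? sums[c] / sizes[c] : 0,
    size: sizes[c],
  }));
}

// Same data, assignments, and centroids as in the example above.
const points = [
  [1, 1, 1],
  [1, 2, 1],
  [-1, -1, -1],
  [-1, -1, -1.5],
];
const assignments = [0, 0, 1, 1];
const finalCentroids = [
  [1, 1.5, 1],
  [-1, -1, -1.25],
];

console.log(meanErrors(points, assignments, finalCentroids));
// errors: 0.25 and 0.0625, sizes: 2 and 2
```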

    kmeans(data, k, options) runs the K-means algorithm and returns a KMeansResult.

    • data: array of points to cluster, each in the format [x, y, z, ...].
    • k: number of clusters.
    • options: an optional object with the following properties.
    Option | Type | Default | Description
    --- | --- | --- | ---
    initialization | string or number[][] | 'kmeans++' | Either custom start centroids ([x, y, z, ...]), or one of the methods 'kmeans++', 'random', or 'mostDistant'.
    maxIterations | number | 100 | Maximum number of iterations allowed. Set to 0 to iterate until convergence.
    tolerance | number | 1e-6 | Error tolerance used as the convergence criterion.
    distanceFunction | (p, q) => number | squaredEuclidean | Distance function to use between two points.
    seed | number | (none) | Seed for the random number generator, used by the 'random' and 'mostDistant' initialization methods to make results reproducible.
    The built-in initialization methods are:

    • 'kmeans++': uses the kmeans++ seeding method.
    • 'random': chooses k random distinct points.
    • 'mostDistant': chooses the most distant points starting from a first random pick.
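To make the kmeans++ seeding idea concrete, here is a minimal standalone sketch of the method described by Arthur and Vassilvitskii: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance to the nearest center already chosen. The names `mulberry32` and `kmeansppSeeds` are assumptions made for this illustration. The sketch does not call ml-kmeans and says nothing about how the library implements its 'kmeans++' method or wires its seed option.

```javascript
// Small deterministic pseudo-random generator returning values in [0, 1).
// Used here only so the sketch is reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Squared Euclidean distance between two points of equal dimension.
function squaredEuclidean(p, q) {
  return p.reduce((sum, value, i) => sum + (value - q[i]) ** 2, 0);
}

// Hypothetical helper: pick k start centers from data, kmeans++ style.
function kmeansppSeeds(data, k, random) {
  // First center: uniform random pick.
  const centers = [data[Math.floor(random() * data.length)]];
  while (centers.length < k) {
    // Weight of each point: squared distance to its nearest chosen center.
    const weights = data.map((point) =>
      Math.min(...centers.map((center) => squaredEuclidean(point, center))),
    );
    const total = weights.reduce((sum, w) => sum + w, 0);
    if (total === 0) break; // every point coincides with a chosen center
    // Sample an index with probability weights[index] / total.
    let r = random() * total;
    let index = 0;
    for (; index < weights.length - 1; index++) {
      r -= weights[index];
      if (r < 0) break;
    }
    centers.push(data[index]);
  }
  return centers;
}

const samples = [
  [1, 1, 1],
  [1, 2, 1],
  [-1, -1, -1],
  [-1, -1, -1.5],
];
// The same seed always yields the same start centers.
console.log(kmeansppSeeds(samples, 2, mulberry32(42)));
```

Because already chosen points have weight 0, they cannot be drawn again, so the returned centers are distinct as long as the data points are.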

    KMeansResult is the object returned by kmeans. It exposes the following properties and methods:

    • clusters: array with the cluster index of each input point.
    • centroids: array with the resulting centroids.
    • converged: whether the convergence criterion was satisfied.
    • iterations: number of iterations performed.
    • result.nearest(points): returns the cluster index of each of the given new points.
    • result.computeInformation(data): returns the centroid, mean error, and size of each cluster.
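Conceptually, assigning new points to clusters means finding, for each point, the centroid at the smallest distance. The sketch below shows that idea with a pluggable distance function of the (p, q) => number shape listed in the options. `nearestIndices` and `manhattan` are hypothetical names for this illustration. The code does not call ml-kmeans and is not the library's implementation of nearest.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredEuclidean(p, q) {
  return p.reduce((sum, value, i) => sum + (value - q[i]) ** 2, 0);
}

// Alternative distance with the same (p, q) => number shape.
function manhattan(p, q) {
  return p.reduce((sum, value, i) => sum + Math.abs(value - q[i]), 0);
}

// Hypothetical helper: index of the closest centroid for each point.
function nearestIndices(newPoints, centroids, distance = squaredEuclidean) {
  return newPoints.map((point) => {
    let best = 0;
    let bestDistance = Infinity;
    centroids.forEach((centroid, index) => {
      const d = distance(point, centroid);
      if (d < bestDistance) {
        bestDistance = d;
        best = index;
      }
    });
    return best;
  });
}

// Centroids taken from the example output earlier in this document.
const foundCentroids = [
  [1, 1.5, 1],
  [-1, -1, -1.25],
];

console.log(nearestIndices([[1, 2, 1]], foundCentroids));
// [ 0 ]
console.log(nearestIndices([[1, 2, 1], [-2, 0, -1]], foundCentroids, manhattan));
// [ 0, 1 ]
```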

    A generator variant of kmeans yields the KMeansResult of each iteration, which is useful for observing the algorithm step by step.
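The step-by-step idea can be illustrated with a plain JavaScript generator that runs the classic assignment and update steps and yields the state after each iteration. This is a sketch under stated assumptions: `lloydSteps` is a hypothetical name, the convergence test (each centroid moves by at most the tolerance, measured as squared distance) is a choice made for this example, and nothing here is taken from the library's source.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredEuclidean(p, q) {
  return p.reduce((sum, value, i) => sum + (value - q[i]) ** 2, 0);
}

// Hypothetical generator: yields { clusters, centroids, converged, iterations }
// after every iteration, stopping once converged or at maxIterations.
function* lloydSteps(data, initialCentroids, maxIterations = 100, tolerance = 1e-6) {
  let centroids = initialCentroids.map((c) => c.slice());
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    // Assignment step: each point goes to its nearest centroid.
    const clusters = data.map((point) => {
      let best = 0;
      for (let c = 1; c < centroids.length; c++) {
        if (squaredEuclidean(point, centroids[c]) < squaredEuclidean(point, centroids[best])) {
          best = c;
        }
      }
      return best;
    });
    // Update step: each centroid becomes the mean of its assigned points.
    const next = centroids.map((old, c) => {
      const members = data.filter((_, i) => clusters[i] === c);
      if (members.length === 0) return old.slice(); // keep an empty cluster's centroid
      return old.map((_, dim) => members.reduce((sum, p) => sum + p[dim], 0) / members.length);
    });
    // Assumed criterion: no centroid moved by more than the tolerance.
    const converged = next.every((c, i) => squaredEuclidean(c, centroids[i]) <= tolerance);
    centroids = next;
    yield { clusters, centroids, converged, iterations: iteration };
    if (converged) return;
  }
}

const observations = [
  [1, 1, 1],
  [1, 2, 1],
  [-1, -1, -1],
  [-1, -1, -1.5],
];
const startCentroids = [
  [1, 2, 1],
  [-1, -1, -1],
];

for (const step of lloydSteps(observations, startCentroids)) {
  console.log(step.iterations, step.converged, step.centroids);
}
// 1 false [ [ 1, 1.5, 1 ], [ -1, -1, -1.25 ] ]
// 2 true [ [ 1, 1.5, 1 ], [ -1, -1, -1.25 ] ]
```

With this data the centroids settle after the first update, and the second iteration only confirms that nothing moved.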

    D. Arthur, S. Vassilvitskii, k-means++: The Advantages of Careful Seeding, in: Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027–1035.

    License: MIT