Cluster - Core SciPy

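Cluster (scipy.cluster) is a SciPy package for clustering. It contains two submodules: scipy.cluster.vq, which provides K-Means clustering and vector quantization, and scipy.cluster.hierarchy, which provides hierarchical (agglomerative) clustering. Algorithms such as Spectral Clustering and DBSCAN are not part of SciPy; they are available in Scikit-Learn. Clustering is widely used for data analysis, pattern recognition, and machine learning tasks.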

Syntax

The basic syntax to perform K-Means clustering using the Cluster module in SciPy is as follows:

from scipy.cluster.vq import kmeans, vq

# k-means clustering
centroids, _ = kmeans(data, k)
cluster, _ = vq(data, centroids)

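Here, data is the array of observations to be clustered (one row per observation), k is the number of clusters, and centroids is the array of centroids computed by kmeans. The second value returned by kmeans, discarded here, is the mean distance between the observations and their nearest centroids. cluster holds the index of the nearest centroid for each observation; the second value returned by vq, also discarded, is the distance from each observation to that centroid.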
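The snippet above leaves data and k undefined. The following is a minimal, self-contained sketch of the same calls on made-up data. The two synthetic blobs and all variable names are assumptions chosen for illustration. It also rescales the features with whiten first, a step the SciPy documentation recommends before running kmeans.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

# Rescale each feature to unit variance before clustering.
scaled = whiten(data)

# kmeans returns the centroids and the mean distance of the
# observations to their nearest centroid (the distortion).
centroids, distortion = kmeans(scaled, 2)

# vq returns, for each observation, the index of its nearest
# centroid and the distance to that centroid.
labels, distances = vq(scaled, centroids)

print(centroids)       # one row per centroid
print(distortion)      # a single float
print(labels.shape)    # one label per observation
```

Keeping the second return values instead of discarding them with _ makes it possible to inspect how tightly the observations sit around their centroids.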

Example

Consider the following example where we will perform K-Means clustering on the Iris dataset using the Cluster module in SciPy.

from scipy.cluster.vq import kmeans, vq
from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()
data = iris.data

# K-Means Clustering
centroids, _ = kmeans(data, 3)
cluster, _ = vq(data, centroids)

print(cluster)

In this example, we load the Iris dataset using Scikit-Learn and store its feature matrix in the data variable. We then use the kmeans function from the Cluster module to compute the centroids of 3 clusters and store them in the centroids variable. Finally, we use the vq function to assign each observation to its nearest centroid and store the resulting labels in the cluster variable. The assigned cluster for each observation is printed to the console.

Output

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0 2 0 2 0 2 2 0 2
 0 0 0 0 2 2 2 0 0 2 0 2 2 2 2 0 0 2 2 2 0 2 2 2 2 2 0 2 2 2 0 2 2 2 2 2 2 0 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 2 0 0 0 0 2 0 0 0 2 2 0 0 0 0 2 0 0 0 0 2 0 0
 0 2]

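The output is an array containing the cluster index assigned to each observation in the dataset. The label numbers (0, 1, 2) are arbitrary, and because kmeans chooses its starting centroids at random, the exact output can differ from run to run.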
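Since kmeans starts from randomly chosen centroids, repeated runs may number or split the clusters differently. The sketch below shows one way to make a run repeatable and to summarize the labels. It assumes SciPy 1.7 or later, where kmeans accepts a seed keyword, and it uses made-up blob data rather than Iris so that it needs only NumPy and SciPy.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Hypothetical data: three 2-D blobs of 30 points each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in centers])

# Assumption: SciPy 1.7+, where kmeans accepts a `seed` keyword.
centroids_1, _ = kmeans(data, 3, seed=0)
centroids_2, _ = kmeans(data, 3, seed=0)

# The same seed gives the same centroids, hence the same labels.
labels_1, _ = vq(data, centroids_1)
labels_2, _ = vq(data, centroids_2)
print(np.array_equal(labels_1, labels_2))

# Count how many observations fall in each cluster.
print(np.bincount(labels_1))
```

Counting labels with np.bincount is often easier to read than the raw label array, since it does not depend on which number each cluster happened to receive.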

Explanation

The Cluster module provides K-Means clustering and vector quantization in scipy.cluster.vq, and hierarchical clustering in scipy.cluster.hierarchy. It does not include Spectral Clustering or DBSCAN. In this example, we have demonstrated how to perform K-Means clustering using the kmeans and vq functions.

The kmeans function computes the centroids. The vq function assigns each observation to the cluster whose centroid is nearest to it, measured by Euclidean distance.
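To make the assignment step concrete, the following sketch computes the nearest centroid by hand with NumPy and compares the result with vq. The four observations and two centroids are made-up values for illustration only.

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical observations and centroids.
observations = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [3.9, 4.2]])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

# Assignment with vq: nearest-centroid index and distance to it.
labels, distances = vq(observations, centroids)

# The same assignment by hand: Euclidean distance from every
# observation to every centroid, then pick the smallest.
diffs = observations[:, None, :] - centroids[None, :, :]
all_distances = np.linalg.norm(diffs, axis=2)
manual_labels = all_distances.argmin(axis=1)
manual_distances = all_distances.min(axis=1)

print(labels)                                      # [0 0 1 1]
print(np.array_equal(labels, manual_labels))       # True
print(np.allclose(distances, manual_distances))    # True
```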

Use

The Cluster module in SciPy is used to group similar observations in applications such as data analysis, pattern recognition, and machine learning.

Important Points

  • The Cluster module in SciPy provides K-Means clustering (scipy.cluster.vq) and hierarchical clustering (scipy.cluster.hierarchy).
  • K-Means clustering is one of the most widely used clustering algorithms.
  • The kmeans function computes the centroids, and the vq function assigns each observation to its nearest centroid.
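For hierarchical clustering, the module offers the scipy.cluster.hierarchy submodule. The following is a minimal sketch on made-up data, assuming Ward linkage and a cut into two flat clusters. Both choices are illustrative and not prescribed by this tutorial.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
data = np.vstack([blob_a, blob_b])

# Build the merge tree using Ward linkage.
Z = linkage(data, method="ward")

# Cut the tree so that at most 2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

print(Z.shape)            # (39, 4): one row per merge
print(np.unique(labels))  # fcluster labels start at 1, not 0
```

Unlike kmeans, this approach does not depend on a random starting point, so repeated runs on the same data give the same labels.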

Summary

The Cluster module in SciPy provides K-Means clustering and hierarchical clustering. K-Means, one of the most widely used clustering algorithms, computes a set of centroids and assigns each observation to the nearest one. It is a useful technique for data analysis, pattern recognition, and machine learning tasks.
