Fuzzy Clustering

Clustering of numerical data forms the basis of many classification and system-modeling algorithms. The purpose of clustering is to identify natural groupings of data from a large data set to produce a concise representation of system behavior. Fuzzy Logic Toolbox™ tools allow you to find clusters in input-output training data using:

Fuzzy c-means clustering — Each data point belongs to each cluster to a degree that is specified by a fuzzy membership grade.
Subtractive clustering — A fast nonfuzzy algorithm for estimating the number of clusters in a data set.

Fuzzy C-Means Clustering

Fuzzy c-means (FCM) is a data clustering technique where each data point belongs to a cluster to a degree that is specified by a membership grade.

The FCM algorithm starts with an initial guess for the cluster centers, which represent the mean location of each cluster. The initial guess for these cluster centers is most likely incorrect. Additionally, FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, the algorithm iteratively moves the cluster centers to the optimal location within a data set. This iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center weighted by the data point membership grade.

To cluster data using FCM clustering, use the fcm function. To specify the clustering algorithm options, use an fcmOptions object. The fcm function outputs a list of cluster centers and cluster membership grades for each data point.

You can use cluster information to generate a fuzzy inference system that best models the data behavior using a minimum number of rules. The rules partition themselves according to the fuzzy qualities associated with each of the data clusters. To automatically generate this type of FIS, use the genfis function. You can generate an initial fuzzy system using clustering results from either FCM or subtractive clustering.

To cluster data, specify an array of data points, x_i, with N rows. The number of columns for each data point is equal to the data dimensionality.

$x_{j} = {[\begin{matrix} x_{j 1} & x_{j 2} & \dots & x_{j n} \end{matrix}]}^{⊤}, 1 \leq j \leq N$

The FCM algorithm computes the cluster centers, c_i. This array contains one row for each cluster center and the number of columns matches the number of columns in x_i.

$c_{i} = {[\begin{matrix} c_{i 1} & c_{i 2} & \dots & c_{i n} \end{matrix}]}^{⊤}, 1 \leq i \leq C$

To specify the number of clusters C, use the NumClusters option.

The FCM algorithm minimizes the following objective function.

$J_{m} = \sum_{i = 1}^{C} \sum_{j = 1}^{N} μ_{i j}^{m} D_{i j}^{2}$

Here:

m is the fuzzy partition matrix exponent for controlling the degree of fuzzy overlap, with m > 1. Fuzzy overlap refers to how fuzzy the boundaries between clusters are, that is, the number of data points that have significant membership in more than one cluster. To specify fuzzy partition matrix exponent, use the Exponent option.
D_ij is the distance from the jth data point to the ith cluster.
μ_ij is the degree of membership of the jth data point in the ith cluster. For a given data point, the sum of the membership values for all clusters is one.

The fcm function supports three types of FCM clustering. These methods differ based on the distance metric used for computing D_ij. To select an FCM algorithm, set the DistanceMetric option.

FCM Algorithm	`DistanceMetric` Value	Description
Classical FCM	`"euclidean"`	Compute distances using the Euclidean distance norm, which assumes a spherical shape for all clusters. [1]
Gustafson-Kessel FCM	`"mahalanobis"`	Compute distances using a Mahalanobis distance norm, where cluster covariance is weighted by cluster membership values. This method is useful for detecting nonspherical clusters. [4]
Gath-Geva FCM	`"fmle"`	Compute distances using an exponential distance measure based on fuzzy maximum likelihood estimation. This method is useful for detecting nonshperical clusters in the presence of variable cluster densities and unequal numbers of data points in each cluster. [5]

FCM clustering performs the following steps.

Set the initial cluster centers. By default, the initial cluster centers are random. You can specify initial cluster centers using the ClusterCenters option.
Randomly initialize the cluster membership values μ_ij.
Calculate the cluster centers.
$c_{i} = \frac{\sum_{j = 1}^{N} μ_{i j}^{m} x_{i}}{\sum_{j = 1}^{N} μ_{i j}^{m}}, 1 \leq i \leq C$
Compute the distance metric for each data point based on the clustering algorithm.
Update the membership values for each data point.
$μ_{i j} = \frac{1}{\sum_{k = 1}^{N} {(\frac{D_{i j}}{D_{i k}})}^{\frac{2}{m - 1}}}, 1 \leq i \leq C, 1 \leq j \leq N$
Calculate the objective function J_m.
Repeat steps 3–6 until one of the following conditions is met.
- J_m improves by less than a specified minimum threshold, which you specify using the MinImprovement option.
- The algorithm performs the specified maximum number of iterations, which you specify using the MaxNumIteration option.

Classical FCM Distance Calculation

The classical FCM algorithm computes the Euclidean distance from each data point to each cluster center, as shown in the following equation.

$D_{i j} = \sqrt{{(x_{j} - c_{i})}^{⊤} (x_{j} - c_{i})}, 1 \leq i \leq C, 1 \leq j \leq N$

Gustafson-Kessel Distance Calculation

Since R2023a

The Gustafson-Kessel (GK) FCM algorithm first computes covariance matrices F_i for each cluster center.

$\begin{array}{l} F_{i} = \frac{\sum_{j = 1}^{N} μ_{i j}^{m} (x_{j} - c_{i}) {(x_{j} - c_{i})}^{⊤}}{S_{i}}, 1 \leq i \leq C \\ S_{i} = \sum_{j = 1}^{N} μ_{i j}^{m} \end{array}$

Then, it computes the Mahalanobis distance from each data point to each cluster using the covariance matrices.

$D_{i j} = \sqrt{{(x_{j} - c_{i})}^{⊤} [\det {(F_{i})}^{1 / N} F_{i}^{- 1}] (x_{j} - c_{i})}, 1 \leq i \leq C, 1 \leq j \leq N$

Gath-Geva Distance Calculation

Since R2023b

The Gath-Geva (GG) FCM algorithm first computes covariance matrices F_i for each cluster center.

$\begin{array}{l} F_{i} = \frac{\sum_{j = 1}^{N} μ_{i j}^{m} (x_{j} - c_{i}) {(x_{j} - c_{i})}^{⊤}}{S_{i}}, 1 \leq i \leq C \\ S_{i} = \sum_{j = 1}^{N} μ_{i j}^{m} \end{array}$

Then, it computes the prior probability for selecting each cluster [6].

$P_{i} = \frac{S_{i}}{\sum_{i = 1}^{C} S_{i}}, 1 \leq i \leq C$

Finally, it computes the distance from each data point to each cluster using an exponential distance measure.

$\begin{array}{l} D_{i j} = A_{i} \cdot \exp (0.5 \sum_{j = 1}^{N} {(x_{j} - c_{i})}^{⊤} F_{i}^{- 1} (x_{j} - c_{i})), 1 \leq i \leq C, 1 \leq j \leq N \\ A_{i} = \frac{\sqrt{\det (F_{i})}}{P_{i}} \end{array}$

Subtractive Clustering

If you do not have a clear idea how many clusters there should be for a given set of data, subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for a set of data [2]. To obtain the cluster estimates, use the subclust function. You can use the cluster estimates to initialize iterative optimization-based clustering methods, such as FCM, and model identification methods, such as ANFIS. The subclust function finds the clusters using the subtractive clustering method.

Subtractive clustering assumes that each data point is a potential cluster center. Based on that assumption, the clustering algorithm includes the following steps.

Calculate the likelihood that each data point would define a cluster center, based on the density of surrounding data points.
Choose the data point with the highest potential to be the first cluster center.
Remove all data points near the first cluster center based on a specified cluster influence range.
Choose the remaining point with the highest potential as the next cluster center.
Repeat steps 3 and 4 until all the data is within the influence range of a cluster center.

The subtractive clustering method is an extension of the mountain clustering method proposed in [3].

References

[1] Bezdek, James C. Pattern Recognition with Fuzzy Objective Function Algorithms. Boston, MA: Springer US, 1981. https://doi.org/10.1007/978-1-4757-0450-1.

[2] Chiu, Stephen L. “Fuzzy Model Identification Based on Cluster Estimation.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 267–78. https://doi.org/10.3233/IFS-1994-2306.

[3] Yager, Ronald R., and Dimitar P. Filev. “Generation of Fuzzy Rules by Mountain Clustering.” Journal of Intelligent and Fuzzy Systems 2, no. 3 (1994): 209–19. https://doi.org/10.3233/IFS-1994-2306.

[4] Gustafson, Donald, and William Kessel. “Fuzzy Clustering with a Fuzzy Covariance Matrix.” In 1978 IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes, 761–66. San Diego, CA, USA: IEEE, 1978. https://doi.org/10.1109/CDC.1978.268028.

[5] Gath, I., and A.B. Geva. “Unsupervised Optimal Fuzzy Clustering.” IEEE Transactions on Pattern Analysis and Machine Intelligence 11, no. 7 (July 1989): 773–80. https://doi.org/10.1109/34.192473.

[6] Höppner, Frank, ed. Fuzzy Cluster Analysis: Methods for Classification, Data Analysis, and Image Recognition. Chichester ; New York: J. Wiley, 1999: 49–53.