Spatial Clustering
Spatial clustering refers to clustering methods that group data based on spatial information, such as density, actual location, and relative path.
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a spatial clustering algorithm based on the density of data points. The following link visualizes how the algorithm proceeds. I recommend trying the smiley face example to see its advantages and the density bars example to see its drawbacks.
Visualizing DBSCAN Clustering (naftaliharris.com)
The algorithm has two important parameters: epsilon and minPoints. If you have watched the visualization, you probably know that epsilon is the radius of the search circle around each point, and minPoints is the minimum number of points that must fall inside that circle for the point to start or extend a cluster.
The algorithm works like this:
1. Randomly select an unvisited point and find its neighbors within the radius. If there are at least minPoints of them, start a cluster and propagate the search to those neighbors, and then to their neighbors, until no new points fall within the circles.
2. Select a point that has not been visited yet and repeat the first step, until all of the points have been visited. Points that end up in no cluster are labeled as noise.
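The two steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `region_query` and `dbscan`, the `NOISE` label of -1, and the toy points are all assumptions made for this example, and the points are visited in index order rather than at random (which cluster gets which number depends on that order).

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label for points that belong to no cluster (assumed convention)
UNVISITED = None  # marker for points not processed yet


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_points):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may still become a border point later
            continue
        # Step 1: start a cluster and propagate through the neighbors.
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not dense
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # dense point: keep expanding
        # Step 2: move on to the next unvisited point with a new cluster id.
        cluster_id += 1
    return labels


# Toy data: two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_points=3)
print(labels)
```

With these toy points the two squares come out as two clusters and the isolated point is labeled as noise. Each neighbor search here scans every point, so the sketch is only suitable for small inputs.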
Evaluating clustering performance
Silhouette Coefficient
The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters. — Wikipedia
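Following the definition above, the silhouette value of a point i is defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the smallest mean distance from i to the points of any other cluster.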
As a(i) is a measure of how dissimilar i is to its own cluster, a small value means it is well matched. Furthermore, a large b(i) implies that i is badly matched to its neighbouring cluster. Thus an s(i) close to 1 means that the data is appropriately clustered. If s(i) is close to -1, then by the same logic we see that i would be more appropriate if it was clustered in its neighbouring cluster. An s(i) near zero means that the datum is on the border of two natural clusters.
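To make a(i), b(i), and s(i) concrete, here is a small sketch that computes them by hand using the standard silhouette formula s(i) = (b(i) - a(i)) / max(a(i), b(i)). The function name `silhouette_values` and the four toy points are assumptions for illustration; the code also assumes at least two clusters are present and uses the common convention that a point alone in its cluster gets s(i) = 0.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_values(points, labels):
    """Return s(i) for every point, given one cluster label per point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # convention for a single-point cluster
            continue
        # a(i): mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        values.append((b - a) / max(a, b))
    return values


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_values(points, [0, 0, 1, 1])  # left pair vs. right pair
bad = silhouette_values(points, [0, 1, 0, 1])   # pairs split across the gap
print(good)
print(bad)
```

With the sensible labeling every s(i) is close to 1, while the labeling that pairs points across the gap gives negative values, matching the interpretation in the paragraph above.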
Sklearn.metrics
The sklearn.metrics module includes score functions, performance metrics, pairwise metrics, and distance computations. The documentation pages below describe its usage.
3.3. Metrics and scoring: quantifying the quality of predictions — scikit-learn 1.1.2 documentation
6.8. Pairwise metrics, Affinities and Kernels — scikit-learn 1.1.2 documentation
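As a short illustration of how these pieces fit together, the sketch below runs scikit-learn's DBSCAN on a toy dataset and then uses two functions from sklearn.metrics: silhouette_score and pairwise_distances. The data and parameter values are made up for this example. Dropping the noise points before scoring is a choice made here, not a requirement: silhouette_score treats the label -1 like any other cluster label.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances, silhouette_score

# Toy data: two tight squares and one isolated point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)

# eps plays the role of epsilon and min_samples the role of minPoints.
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(X)
print(labels)  # noise points are labeled -1

# Score only the clustered points (assumption: noise is excluded).
mask = labels != -1
score = silhouette_score(X[mask], labels[mask])
print(score)

# Full matrix of pairwise Euclidean distances between all points.
D = pairwise_distances(X, metric="euclidean")
print(D.shape)
```

silhouette_score returns the mean silhouette value over all samples it is given, so a value near 1 here indicates two well-separated clusters.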