Description

The main clustering tools for practical applications are K-means (scikit-learn or Faiss), agglomerative clustering with a cosine metric (scikit-learn), and density-based methods such as DBSCAN and HDBSCAN. For choosing the number of clusters, the silhouette score is generally preferred over inertia-based visual heuristics such as the elbow method, and scikit-learn's implementation accepts precomputed distance matrices via metric="precomputed".
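
As a rough illustration of the silhouette-based selection described above, here is a minimal sketch using scikit-learn. The synthetic "embeddings", the candidate range of k, and the helper name pick_k_by_silhouette are all assumptions made for this example and do not come from the episode. It L2-normalizes the vectors so that Euclidean K-means approximates clustering by cosine distance, then scores each k against a precomputed cosine distance matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score
from sklearn.preprocessing import normalize


def pick_k_by_silhouette(X, D, k_values, random_state=0):
    """Hypothetical helper: fit K-means for each k and score it on D.

    X is the feature matrix used for fitting; D is a precomputed
    pairwise distance matrix over the same rows, used only for scoring.
    Returns (best_k, scores), where scores maps k to its silhouette score.
    """
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        # metric="precomputed" tells scikit-learn that D already holds distances.
        scores[k] = silhouette_score(D, labels, metric="precomputed")
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for embeddings (an assumption for this sketch):
# 4 groups of 50 points, each centered on a different axis in 16 dimensions.
rng = np.random.default_rng(0)
centers = np.eye(16)[:4]
X = np.vstack([c + rng.normal(scale=0.05, size=(50, 16)) for c in centers])

# Unit-length rows make Euclidean K-means behave much like cosine clustering.
X = normalize(X)

# Precompute cosine distances once and reuse them for every candidate k.
D = pairwise_distances(X, metric="cosine")
np.fill_diagonal(D, 0.0)  # guard against floating-point noise on the diagonal

best_k, scores = pick_k_by_silhouette(X, D, range(2, 8))
for k, s in sorted(scores.items()):
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

Precomputing the distance matrix once avoids recomputing pairwise distances for every candidate k, at the cost of O(n^2) memory, so for large collections it may be preferable to score a random subsample.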

Links

K-means Clustering

Alternatives to K-means for High Dimensions

Semantic Search and Vector Indexing

Determining the Optimal Number of Clusters

Density-Based Clustering: DBSCAN and HDBSCAN

Summary Recommendations and Links