Abstract:
Hubs are data instances that appear frequently in the nearest-neighbour lists of other instances. Because the hubs of a high-dimensional dataset lie close to the centres of clusters or sub-clusters, hub-based clustering algorithms select some of them as cluster centres. When selecting hubs, these algorithms rank data instances by global hubness scores computed from the nearest-neighbour lists, ignoring cluster-related information such as the instances' labels and the clustering quality of the instances and of their related instances. As a result, some suitable hubs may be neglected. To solve this problem, we suggest evaluating instances by their relative hubness scores. Moreover, we propose a weighted relative hubness score computed from nearest-neighbour lists and silhouette information. We also suggest selecting the instance with the highest silhouette information when two or more instances tie for first place. Experimental results on real and synthetic datasets suggest that both the relative hubness score and the weighted relative hubness score can improve hub-based clustering, and that the weighted relative hubness score often performs better.
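As a rough illustration of the difference between a global and a cluster-aware hubness score, the sketch below counts how often each instance appears in the k-nearest-neighbour lists. The global score is the usual k-occurrence count. The relative score shown here is only one plausible reading and is an assumption, not the paper's definition: it counts an appearance only when the list owner carries the same cluster label. The function names (`knn_lists`, `global_hubness`, `relative_hubness`) are hypothetical, and the silhouette-weighted variant is left out because the abstract does not state its formula.

```python
import numpy as np

def knn_lists(X, k):
    """Return, for each row of X, the indices of its k nearest neighbours (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                                 # an instance is not its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]

def global_hubness(nn, n):
    """k-occurrence count: how often each instance appears in any k-NN list."""
    return np.bincount(nn.ravel(), minlength=n)

def relative_hubness(nn, labels):
    """ASSUMPTION (not from the paper): count an appearance only when the
    instance and the owner of the k-NN list share the same cluster label."""
    labels = np.asarray(labels)
    same = labels[nn] == labels[:, None]   # same[i, j]: j-th neighbour of i has i's label
    return np.bincount(nn[same], minlength=len(labels))

# Toy data: two groups on a line; instances 2 and 3 sit at the boundary.
X = np.array([[0.0], [1.0], [2.0], [3.5], [4.5], [5.5]])
labels = np.array([0, 0, 0, 1, 1, 1])
nn = knn_lists(X, k=2)
g = global_hubness(nn, len(X))
r = relative_hubness(nn, labels)
print(g, r)
```

On this toy data the two boundary instances gain part of their global score from neighbours in the other group, so a label-aware count ranks them lower than the global count does.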
Source :
2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD)
Year: 2014
Page: 479-484
Language: English
Cited Count:
WoS CC Cited Count: 2
ESI Highly Cited Papers on the List: 0