Comparison of cluster validation indices with missing data

Publication year

2018

Authors

Niemelä, Marko; Äyrämö, Sami; Kärkkäinen, Tommi

Abstract

Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and various cluster validation indices have been suggested for determining it. Some recent studies have reviewed validation indices, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices is measured on ten synthetic data sets with various ratios of missing values, using clustering based on squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. The experiments illustrate the differing degrees of stability of the indices with respect to missing data.

Organizations and authors

University of Jyväskylä

Niemelä Marko

Äyrämö Sami

Kärkkäinen Tommi

Publication type

Publication format

Article

Parent publication type

Conference

Article type

Other article

Audience

Scientific

Peer-reviewed

Ministry of Education and Culture publication type

A4 Article in conference proceedings

Publication channel information

Pages

461-466

ISBN

978-2-87587-047-6

Publication Forum

55877

Publication Forum level

Open access

Open access in the publisher's service

No

Self-archived

Other information

Fields of science

Computer and information sciences

Keywords

Country of publication

Belgium

Internationality of the publisher

International

Language

English

International co-publication

No

Co-publication with a company

No