Abstract
An overview of cluster analysis techniques from a data mining point of view is given. The overview strictly separates the questions of similarity and distance measures, and the related optimization criteria for clusterings, from the methods used to create and modify the clusterings themselves. Beyond this general setting, a second focus is a discussion of the essential ingredients of the demographic cluster algorithm of IBM's Intelligent Miner, which is based on Condorcet's criterion.
Cite this article
Grabmeier, J., Rudolph, A. Techniques of Cluster Algorithms in Data Mining. Data Mining and Knowledge Discovery 6, 303–360 (2002). https://doi.org/10.1023/A:1016308404627