Learning classification rules from data

https://doi.org/10.1016/S0898-1221(03)00034-8
Under an Elsevier user license
open archive

Abstract

We present ELEM2, a machine learning system that induces classification rules from a set of data by heuristic search over a hypothesis space. ELEM2 differs from other rule induction systems in three respects. First, it uses a new heuristic function to guide the search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to the selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region serves as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe the features of ELEM2, its rule induction algorithm, and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.
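To make the general idea of inducing rules by greedily selecting attribute-value pairs concrete, the following is a minimal Python sketch of a generic sequential-covering learner. It is not ELEM2 itself: the abstract does not give the heuristic function, the definition of the unlearnable region, or the rule quality formula, so the `score` function below is a simple stand-in (the gain in class probability from adding a pair), the stopping test is a plain "no pair helps" check, and post-pruning is omitted. All names (`covers`, `score`, `learn_rule`, `learn_rules`) and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]   # (attribute -> value, class label)
Pair = Tuple[str, str]                 # one attribute-value condition
Rule = List[Pair]                      # conjunction of conditions


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    """True if every condition of the rule holds for the example."""
    return all(attrs.get(a) == v for a, v in rule)


def score(pair: Pair, pool: List[Example], target: str) -> float:
    """Stand-in relevance score (an assumption, not ELEM2's function):
    P(target | pair) - P(target), estimated on the current pool."""
    a, v = pair
    matched = [label for attrs, label in pool if attrs.get(a) == v]
    if not matched:
        return float("-inf")
    p_cond = sum(label == target for label in matched) / len(matched)
    p_prior = sum(label == target for _, label in pool) / len(pool)
    return p_cond - p_prior


def learn_rule(examples: List[Example], target: str) -> Rule:
    """Greedily add the best-scoring pair until the covered examples are
    pure or no pair raises the target probability any further."""
    rule: Rule = []
    pool = list(examples)
    while any(label != target for _, label in pool):
        candidates = sorted(
            {(a, v) for attrs, _ in pool for a, v in attrs.items()} - set(rule)
        )
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(p, pool, target))
        if score(best, pool, target) <= 0:
            break  # remaining conflicts cannot be separated by any pair
        rule.append(best)
        pool = [(attrs, label) for attrs, label in pool
                if attrs.get(best[0]) == best[1]]
    return rule


def learn_rules(examples: List[Example], target: str) -> List[Rule]:
    """Learn rules for one class, removing covered positives after each."""
    rules: List[Rule] = []
    remaining = list(examples)
    while any(label == target for _, label in remaining):
        rule = learn_rule(remaining, target)
        if not rule:
            break  # leftover positives are indistinguishable from negatives
        rules.append(rule)
        remaining = [(attrs, label) for attrs, label in remaining
                     if not (label == target and covers(rule, attrs))]
    return rules


toy_data: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
]
toy_rules = learn_rules(toy_data, "play")
```

In this sketch, inconsistent examples are never deleted; the learner simply stops specializing a rule once no attribute-value pair improves it, which is only a rough analogue of the stopping criterion the abstract describes.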

Keywords

Machine learning
Rule induction
Classification
Data mining
Artificial intelligence

This research was supported by grants from the Natural Sciences and Engineering Research Council (NSERC) and the Institute for Robotics and Intelligent Systems (IRIS). I would also like to thank N. Cercone of the University of Waterloo for his suggestions on this work.