Optimal techniques for class-dependent attribute discretization

N. Bryson, Anito Joseph

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Preprocessing of raw data has been shown to improve performance of knowledge discovery processes. Discretization of quantitative attributes is a key component of preprocessing and has the potential to greatly impact the efficiency of the process and the quality of its outcomes. In attribute discretization, the value domain of an attribute is partitioned into a finite set of intervals so that the attribute can be described using a small number of discrete representations. Discretization therefore involves two decisions, on the number of intervals and the placement of interval boundaries. Previous approaches for quantitative attribute discretization have used heuristic algorithms to identify partitions of the attribute value domain. Therefore, these approaches cannot be guaranteed to provide the optimal solution for the given discretization criterion and number of intervals. In this paper, we use linear programming (LP) methods to formulate the attribute discretization problem. The LP formulation allows the discretization criterion and the number of intervals to be integral considerations of the problem. We conduct experiments and identify optimal solutions for various discretization criteria and numbers of intervals.
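The two decisions named in the abstract (how many intervals, and where the boundaries go) can be illustrated with a small sketch. This is not the authors' linear programming formulation, which the abstract does not spell out. It substitutes a plain dynamic program that, for a fixed number of intervals `k`, finds the cut points minimizing the size-weighted class entropy of the intervals. The entropy criterion is an assumption taken from the keyword list, and the names `interval_entropy` and `optimal_discretization` are hypothetical.

```python
from math import log2
from collections import Counter


def interval_entropy(labels):
    """Size-weighted class entropy of one interval: n * H(class | interval)."""
    n = len(labels)
    return sum(c * log2(n / c) for c in Counter(labels).values())


def optimal_discretization(values, labels, k):
    """Best partition of a numeric attribute into exactly k intervals.

    Illustrative dynamic program (NOT the paper's LP method).
    Returns (total_weighted_entropy, sorted_cut_points).
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(xs)
    inf = float("inf")
    # best[j][i]: minimum cost of covering the first i points with j intervals
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            if i < n and xs[i - 1] == xs[i]:
                continue  # a boundary cannot separate two equal values
            for s in range(j - 1, i):
                if best[j - 1][s] == inf:
                    continue
                cost = best[j - 1][s] + interval_entropy(ys[s:i])
                if cost < best[j][i]:
                    best[j][i] = cost
                    back[j][i] = s
    if best[k][n] == inf:
        raise ValueError("not enough distinct values for k intervals")
    # Walk the back-pointers to recover the boundaries (midpoints between neighbours)
    cuts, i = [], n
    for j in range(k, 0, -1):
        s = back[j][i]
        if s > 0:
            cuts.append((xs[s - 1] + xs[s]) / 2)
        i = s
    return best[k][n], sorted(cuts)


print(optimal_discretization([1, 2, 3, 10, 11, 12], list("aaabbb"), 2))
# (0.0, [6.5])
```

Running the function for several values of `k` and comparing the resulting costs gives a simple way to explore the trade-off between the number of intervals and the quality of the partition, which a heuristic that fixes boundaries greedily cannot guarantee to do optimally.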

Original language: English (US)
Pages (from-to): 1130-1143
Number of pages: 14
Journal: Journal of the Operational Research Society
Volume: 52
Issue number: 10
DOIs: 10.1057/palgrave.jors.2601174
State: Published - Jan 1 2001

Fingerprint

Linear programming
Heuristic algorithms
Data mining
Experiments
Discretization

Keywords

  • Attribute discretization
  • Data mining
  • Decision trees
  • Entropy
  • Machine learning
  • Parametric linear programming

ASJC Scopus subject areas

  • Management Information Systems
  • Strategy and Management
  • Management Science and Operations Research
  • Marketing

Cite this

Optimal techniques for class-dependent attribute discretization. / Bryson, N.; Joseph, Anito.

In: Journal of the Operational Research Society, Vol. 52, No. 10, 01.01.2001, p. 1130-1143.

@article{494a1b31d7594d8a86258f6709cd58e9,
title = "Optimal techniques for class-dependent attribute discretization",
keywords = "Attribute discretization, Data mining, Decision trees, Entropy, Machine learning, Parametric linear programming",
author = "N. Bryson and Anito Joseph",
year = "2001",
month = "1",
day = "1",
doi = "10.1057/palgrave.jors.2601174",
language = "English (US)",
volume = "52",
pages = "1130--1143",
journal = "Journal of the Operational Research Society",
issn = "0160-5682",
publisher = "Palgrave Macmillan Ltd.",
number = "10",

}

TY - JOUR
T1 - Optimal techniques for class-dependent attribute discretization
AU - Bryson, N.
AU - Joseph, Anito
PY - 2001/1/1
Y1 - 2001/1/1
AB - Preprocessing of raw data has been shown to improve performance of knowledge discovery processes. Discretization of quantitative attributes is a key component of preprocessing and has the potential to greatly impact the efficiency of the process and the quality of its outcomes. In attribute discretization, the value domain of an attribute is partitioned into a finite set of intervals so that the attribute can be described using a small number of discrete representations. Discretization therefore involves two decisions, on the number of intervals and the placement of interval boundaries. Previous approaches for quantitative attribute discretization have used heuristic algorithms to identify partitions of the attribute value domain. Therefore, these approaches cannot be guaranteed to provide the optimal solution for the given discretization criterion and number of intervals. In this paper, we use linear programming (LP) methods to formulate the attribute discretization problem. The LP formulation allows the discretization criterion and the number of intervals to be integral considerations of the problem. We conduct experiments and identify optimal solutions for various discretization criteria and numbers of intervals.

KW - Attribute discretization
KW - Data mining
KW - Decision trees
KW - Entropy
KW - Machine learning
KW - Parametric linear programming
UR - http://www.scopus.com/inward/record.url?scp=0035493965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035493965&partnerID=8YFLogxK
U2 - 10.1057/palgrave.jors.2601174
DO - 10.1057/palgrave.jors.2601174
M3 - Article
AN - SCOPUS:0035493965
VL - 52
SP - 1130
EP - 1143
JO - Journal of the Operational Research Society
JF - Journal of the Operational Research Society
SN - 0160-5682
IS - 10
ER -