Handling ambiguous values in instance-based classifiers

Hans Holland, Miroslav Kubat, Jan Žižka

Research output: Contribution to journal › Article

Abstract

In an attempt to automate evaluation of network intrusion detection systems, we encountered the problem of ambiguously described learning examples. For instance, an attribute's value, or a class label, in a given example was known to be a or b but definitely not c or d. Previous research in machine learning usually either "disambiguated" the value (by giving preference to a or b), or replaced it with a "don't-know" symbol. Neither approach is satisfactory: while the former distorts the available information by pretending precise knowledge, the latter ignores the fact that at least something is known. Our experiments confirm the intuition that classification performance is indeed impaired if the ambiguities are not handled properly. In the research reported here, we limited ourselves to the realm of the relatively simple nearest-neighbor classifiers and investigated a few alternative solutions. The paper describes the techniques we used and describes their behavior in experimental domains.
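The situation the abstract describes can be illustrated with a small Python sketch. This is only an assumption-laden illustration, not the technique from the paper, which the abstract does not specify: each attribute value is modeled as a set of still-possible symbols, and the hypothetical helper `attribute_distance` uses one minus the Jaccard overlap of two such sets, so that "a or b" is treated differently from both a precise value and a full "don't-know". The names `attribute_distance`, `example_distance`, and `nearest_neighbor` are invented for this example.

```python
from typing import FrozenSet, Sequence, Tuple

# Assumption: a value is the set of symbols it may still take.
# {"a"} is precise, {"a", "b"} is ambiguous, the whole domain is "don't know".
Value = FrozenSet[str]
Example = Tuple[Sequence[Value], str]  # (attribute values, class label)


def attribute_distance(x: Value, y: Value) -> float:
    """Hypothetical distance: 1 minus the Jaccard overlap of the candidate sets."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def example_distance(u: Sequence[Value], v: Sequence[Value]) -> float:
    """Sum the per-attribute distances of two examples."""
    return sum(attribute_distance(a, b) for a, b in zip(u, v))


def nearest_neighbor(query: Sequence[Value], training: Sequence[Example]) -> str:
    """Return the label of the training example closest to the query (1-NN)."""
    return min(training, key=lambda ex: example_distance(query, ex[0]))[1]


domain = frozenset("abcd")      # "don't know": any symbol is possible
ambiguous = frozenset("ab")     # known to be a or b, definitely not c or d
precise_a = frozenset("a")

training = [((ambiguous,), "attack"), ((frozenset("cd"),), "normal")]
label = nearest_neighbor((precise_a,), training)
```

Under this assumed measure, the precise value a lies closer to the ambiguous value "a or b" than to a full "don't-know", which is the kind of partial knowledge that both disambiguation and a "don't-know" symbol would discard.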

Original language: English
Pages (from-to): 449-463
Number of pages: 15
Journal: International Journal on Artificial Intelligence Tools
Volume: 17
Issue number: 3
DOIs: https://doi.org/10.1142/S0218213008003996
State: Published - Jun 1 2008

Fingerprint

Classifiers
Intrusion detection
Learning systems
Labels
Experiments

Keywords

  • Ambiguous attributes
  • Instance-based classifiers

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Handling ambiguous values in instance-based classifiers. / Holland, Hans; Kubat, Miroslav; Žižka, Jan.

In: International Journal on Artificial Intelligence Tools, Vol. 17, No. 3, 01.06.2008, p. 449-463.

@article{623b1a8efade4674a0598e16146584db,
title = "Handling ambiguous values in instance-based classifiers",
abstract = "In an attempt to automate evaluation of network intrusion detection systems, we encountered the problem of ambiguously described learning examples. For instance, an attribute's value, or a class label, in a given example was known to be a or b but definitely not c or d. Previous research in machine learning usually either {"}disambiguated{"} the value (by giving preference to a or b), or replaced it with a {"}don't-know{"} symbol. Neither approach is satisfactory: while the former distorts the available information by pretending precise knowledge, the latter ignores the fact that at least something is known. Our experiments confirm the intuition that classification performance is indeed impaired if the ambiguities are not handled properly. In the research reported here, we limited ourselves to the realm of the relatively simple nearest-neighbor classifiers and investigated a few alternative solutions. The paper describes the techniques we used and describes their behavior in experimental domains.",
keywords = "Ambiguous attributes, Instance-based classifiers",
author = "Hans Holland and Miroslav Kubat and Jan Žižka",
year = "2008",
month = "6",
day = "1",
doi = "10.1142/S0218213008003996",
language = "English",
volume = "17",
pages = "449--463",
journal = "International Journal on Artificial Intelligence Tools",
issn = "0218-2130",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "3",

}

TY - JOUR

T1 - Handling ambiguous values in instance-based classifiers

AU - Holland, Hans

AU - Kubat, Miroslav

AU - Žižka, Jan

PY - 2008/6/1

Y1 - 2008/6/1

N2 - In an attempt to automate evaluation of network intrusion detection systems, we encountered the problem of ambiguously described learning examples. For instance, an attribute's value, or a class label, in a given example was known to be a or b but definitely not c or d. Previous research in machine learning usually either "disambiguated" the value (by giving preference to a or b), or replaced it with a "don't-know" symbol. Neither approach is satisfactory: while the former distorts the available information by pretending precise knowledge, the latter ignores the fact that at least something is known. Our experiments confirm the intuition that classification performance is indeed impaired if the ambiguities are not handled properly. In the research reported here, we limited ourselves to the realm of the relatively simple nearest-neighbor classifiers and investigated a few alternative solutions. The paper describes the techniques we used and describes their behavior in experimental domains.

AB - In an attempt to automate evaluation of network intrusion detection systems, we encountered the problem of ambiguously described learning examples. For instance, an attribute's value, or a class label, in a given example was known to be a or b but definitely not c or d. Previous research in machine learning usually either "disambiguated" the value (by giving preference to a or b), or replaced it with a "don't-know" symbol. Neither approach is satisfactory: while the former distorts the available information by pretending precise knowledge, the latter ignores the fact that at least something is known. Our experiments confirm the intuition that classification performance is indeed impaired if the ambiguities are not handled properly. In the research reported here, we limited ourselves to the realm of the relatively simple nearest-neighbor classifiers and investigated a few alternative solutions. The paper describes the techniques we used and describes their behavior in experimental domains.

KW - Ambiguous attributes

KW - Instance-based classifiers

UR - http://www.scopus.com/inward/record.url?scp=46249088095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=46249088095&partnerID=8YFLogxK

U2 - 10.1142/S0218213008003996

DO - 10.1142/S0218213008003996

M3 - Article

AN - SCOPUS:46249088095

VL - 17

SP - 449

EP - 463

JO - International Journal on Artificial Intelligence Tools

JF - International Journal on Artificial Intelligence Tools

SN - 0218-2130

IS - 3

ER -