Handling missing values via decomposition of the conditioned set

Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage, Shu Ching Chen, Li Wu Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

In this paper, a framework for replacing missing values in a database is proposed since a real-world database is seldom complete. Good data quality in a database can directly improve the performance of any data mining algorithm in various applications. Our proposed framework adopts the basic concepts from conditional probability theories and further develops an algorithm to facilitate the capability of handling both nominal and numerical values, which addresses the problem of the inability of handling both nominal and numerical values with a high degree of accuracy in the existing algorithms. Several experiments are conducted and the experimental results demonstrate that our framework provides a high accuracy when compared with most of the commonly used algorithms such as using the average value, using the maximum value, and using the minimum value to replace missing values.

Original language: English
Title of host publication: Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005
Pages: 199-204
Number of pages: 6
Volume: 2005
DOIs: https://doi.org/10.1109/IRI-05.2005.1506473
State: Published - Dec 1 2005
Event: 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005 - Las Vegas, NV, United States
Duration: Aug 15 2005 to Aug 17 2005

Other

Other: 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005
Country: United States
City: Las Vegas, NV
Period: 8/15/05 to 8/17/05

Fingerprint

Decomposition
Data mining
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Shyu, M-L., Kuruppu-Appuhamilage, I. P., Chen, S. C., & Chang, L. W. (2005). Handling missing values via decomposition of the conditioned set. In Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005 (Vol. 2005, pp. 199-204). [1506473] https://doi.org/10.1109/IRI-05.2005.1506473

Handling missing values via decomposition of the conditioned set. / Shyu, Mei-Ling; Kuruppu-Appuhamilage, Indika Priyantha; Chen, Shu Ching; Chang, Li Wu.

Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005. Vol. 2005. 2005. p. 199-204, 1506473.

Shyu, M-L, Kuruppu-Appuhamilage, IP, Chen, SC & Chang, LW 2005, Handling missing values via decomposition of the conditioned set. in Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005. vol. 2005, 1506473, pp. 199-204, 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005, Las Vegas, NV, United States, 8/15/05. https://doi.org/10.1109/IRI-05.2005.1506473
Shyu M-L, Kuruppu-Appuhamilage IP, Chen SC, Chang LW. Handling missing values via decomposition of the conditioned set. In Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005. Vol. 2005. 2005. p. 199-204. 1506473 https://doi.org/10.1109/IRI-05.2005.1506473
Shyu, Mei-Ling; Kuruppu-Appuhamilage, Indika Priyantha; Chen, Shu Ching; Chang, Li Wu. / Handling missing values via decomposition of the conditioned set. Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005. Vol. 2005. 2005. pp. 199-204.
@inproceedings{134c7430b50f468a95fd97ce897ea3ee,
title = "Handling missing values via decomposition of the conditioned set",
abstract = "In this paper, a framework for replacing missing values in a database is proposed since a real-world database is seldom complete. Good data quality in a database can directly improve the performance of any data mining algorithm in various applications. Our proposed framework adopts the basic concepts from conditional probability theories and further develops an algorithm to facilitate the capability of handling both nominal and numerical values, which addresses the problem of the inability of handling both nominal and numerical values with a high degree of accuracy in the existing algorithms. Several experiments are conducted and the experimental results demonstrate that our framework provides a high accuracy when compared with most of the commonly used algorithms such as using the average value, using the maximum value, and using the minimum value to replace missing values.",
author = "Shyu, Mei-Ling and Kuruppu-Appuhamilage, {Indika Priyantha} and Chen, {Shu Ching} and Chang, {Li Wu}",
year = "2005",
month = "12",
day = "1",
doi = "10.1109/IRI-05.2005.1506473",
language = "English",
isbn = "0780390938",
volume = "2005",
pages = "199--204",
booktitle = "Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005"
}

TY - GEN

T1 - Handling missing values via decomposition of the conditioned set

AU - Shyu, Mei-Ling

AU - Kuruppu-Appuhamilage, Indika Priyantha

AU - Chen, Shu Ching

AU - Chang, Li Wu

PY - 2005/12/1

Y1 - 2005/12/1

AB - In this paper, a framework for replacing missing values in a database is proposed since a real-world database is seldom complete. Good data quality in a database can directly improve the performance of any data mining algorithm in various applications. Our proposed framework adopts the basic concepts from conditional probability theories and further develops an algorithm to facilitate the capability of handling both nominal and numerical values, which addresses the problem of the inability of handling both nominal and numerical values with a high degree of accuracy in the existing algorithms. Several experiments are conducted and the experimental results demonstrate that our framework provides a high accuracy when compared with most of the commonly used algorithms such as using the average value, using the maximum value, and using the minimum value to replace missing values.

UR - http://www.scopus.com/inward/record.url?scp=33745697521&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745697521&partnerID=8YFLogxK

U2 - 10.1109/IRI-05.2005.1506473

DO - 10.1109/IRI-05.2005.1506473

M3 - Conference contribution

SN - 0780390938

SN - 9780780390935

VL - 2005

SP - 199

EP - 204

BT - Proceedings of the 2005 IEEE International Conference on Information Reuse and Integration, IRI - 2005

ER -