TY - GEN
T1 - Enhancing multimedia imbalanced concept detection using VIMP in Random Forests
AU - Sadiq, Saad
AU - Yan, Yilin
AU - Shyu, Mei Ling
AU - Chen, Shu Ching
AU - Ishwaran, Hemant
N1 - Publisher Copyright: © 2016 IEEE. Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016
Y1 - 2016
AB - Recent developments in social media and cloud storage have led to exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches fall short of bridging the semantic gap. To address this problem, a multi-stage random forest framework is proposed that generates predictor variables from multivariate regressions using variable importance (VIMP). By fine-tuning the forests and significantly reducing the number of predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., having little collaboration with other high-level concepts. Using classical multivariate statistics, estimating the value of one coordinate from the other coordinates standardizes the covariates, and the estimate depends on the variance of the correlations instead of the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms the compared approaches in terms of Mean Average Precision (MAP).
KW - Multimedia imbalanced concept detection
KW - Multivariate regression
KW - Random forests
KW - Variable importance (VIMP)
UR - http://www.scopus.com/inward/record.url?scp=84991216144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991216144&partnerID=8YFLogxK
U2 - 10.1109/IRI.2016.87
DO - 10.1109/IRI.2016.87
M3 - Conference contribution
AN - SCOPUS:84991216144
T3 - Proceedings - 2016 IEEE 17th International Conference on Information Reuse and Integration, IRI 2016
SP - 601
EP - 608
BT - Proceedings - 2016 IEEE 17th International Conference on Information Reuse and Integration, IRI 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE International Conference on Information Reuse and Integration, IRI 2016
Y2 - 28 July 2016 through 30 July 2016
ER -