TY - GEN
T1 - Predicting key recognition difficulty in polyphonic audio
AU - Chuan, Ching Hua
AU - Charapko, Aleksey
PY - 2013/12/1
Y1 - 2013/12/1
N2 - In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on a data set, i.e., how challenging it is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between experts' annotations and discuss their consistency.
AB - In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on a data set, i.e., how challenging it is, directly comparing the effectiveness of these methods is not meaningful or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording, represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the difficulty level of the recording, annotated by experts as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between experts' annotations and discuss their consistency.
KW - audio key finding
KW - key difficulty recognition
KW - multiple linear regression
KW - proportional odds model
UR - http://www.scopus.com/inward/record.url?scp=84900645154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84900645154&partnerID=8YFLogxK
U2 - 10.1109/ISM.2013.82
DO - 10.1109/ISM.2013.82
M3 - Conference contribution
AN - SCOPUS:84900645154
SN - 9780769551401
T3 - Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013
SP - 421
EP - 426
BT - Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013
T2 - 15th IEEE International Symposium on Multimedia, ISM 2013
Y2 - 9 December 2013 through 11 December 2013
ER -