Predicting key recognition difficulty in polyphonic audio

Ching-Hua Chuan, Aleksey Charapko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we present statistical models to predict the difficulty of recognizing musical keys from polyphonic audio signals. Automatic audio key finding has been studied for many years, and various approaches have been proposed and reported. Reports of these methods' performance are usually based on the proposers' own data sets. Without details on how challenging each data set is, directly comparing the effectiveness of these methods is not meaningful, or even possible. Thus, in this study we focus on predicting the difficulty level of key recognition as perceived by human experts. Given an audio recording represented by its extracted acoustic features, we apply multiple linear regression and a proportional odds model to predict the recording's difficulty level, which experts annotated as an integer on a 5-point Likert scale. We use four metrics to evaluate our prediction results: root mean square error, Pearson correlation coefficient, exact accuracy, and adjacent accuracy. We also examine the differences between the experts' annotations and discuss their consistency.
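
The four evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that real-valued predictions are rounded and clipped to the 1-5 scale before computing the accuracies, and that "adjacent accuracy" counts a prediction as correct when it falls within one level of the expert annotation. The function names and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between annotations and raw predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between annotations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (norm_t * norm_p)


def to_likert(value, low=1, high=5):
    """Round a real-valued prediction to the nearest level and clip it to
    the scale (an assumption; the abstract does not state the rule)."""
    return min(high, max(low, int(math.floor(value + 0.5))))


def exact_accuracy(y_true, y_pred):
    """Fraction of recordings whose rounded prediction equals the annotation."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if to_likert(p) == t)
    return hits / len(y_true)


def adjacent_accuracy(y_true, y_pred):
    """Fraction of recordings whose rounded prediction is within one level
    of the annotation (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(to_likert(p) - t) <= 1)
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical expert annotations and model outputs, for illustration only.
    annotations = [1, 2, 3, 4, 5]
    predictions = [1.2, 2.6, 3.0, 2.4, 4.9]
    print("RMSE:", rmse(annotations, predictions))
    print("Pearson r:", pearson(annotations, predictions))
    print("Exact accuracy:", exact_accuracy(annotations, predictions))
    print("Adjacent accuracy:", adjacent_accuracy(annotations, predictions))
```

Root mean square error and Pearson correlation are computed on the raw predictions, while the two accuracies treat the task as a classification over the five levels, so the pairs can rank models differently.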

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013
Pages: 421-426
Number of pages: 6
DOIs: https://doi.org/10.1109/ISM.2013.82
State: Published - Dec 1 2013
Externally published: Yes
Event: 15th IEEE International Symposium on Multimedia, ISM 2013 - Anaheim, CA, United States
Duration: Dec 9 2013 to Dec 11 2013

Other

Other: 15th IEEE International Symposium on Multimedia, ISM 2013
Country: United States
City: Anaheim, CA
Period: 12/9/13 to 12/11/13

Fingerprint

Audio recordings
Linear regression
Mean square error
Acoustics
Statistical models

Keywords

  • audio key finding
  • key difficulty recognition
  • multiple linear regression
  • proportional odds model

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software

Cite this

Chuan, C-H., & Charapko, A. (2013). Predicting key recognition difficulty in polyphonic audio. In Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013 (pp. 421-426). [6746834] https://doi.org/10.1109/ISM.2013.82

@inproceedings{3c7cefd603864bc5bd7c5d09162458c0,
title = "Predicting key recognition difficulty in polyphonic audio",
keywords = "audio key finding, key difficulty recognition, multiple linear regression, proportional odds model",
author = "Ching-Hua Chuan and Aleksey Charapko",
year = "2013",
month = "12",
day = "1",
doi = "10.1109/ISM.2013.82",
language = "English (US)",
isbn = "9780769551401",
pages = "421--426",
booktitle = "Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013",

}

TY - GEN

T1 - Predicting key recognition difficulty in polyphonic audio

AU - Chuan, Ching-Hua

AU - Charapko, Aleksey

PY - 2013/12/1

Y1 - 2013/12/1

KW - audio key finding

KW - key difficulty recognition

KW - multiple linear regression

KW - proportional odds model

UR - http://www.scopus.com/inward/record.url?scp=84900645154&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900645154&partnerID=8YFLogxK

U2 - 10.1109/ISM.2013.82

DO - 10.1109/ISM.2013.82

M3 - Conference contribution

AN - SCOPUS:84900645154

SN - 9780769551401

SP - 421

EP - 426

BT - Proceedings - 2013 IEEE International Symposium on Multimedia, ISM 2013

ER -