TY - GEN
T1 - Audio scene segmentation for video with generic content
AU - Niu, Feng
AU - Goela, Naveen
AU - Divakaran, Ajay
AU - Abdel-Mottaleb, Mohamed
PY - 2008/5/15
Y1 - 2008/5/15
AB - In this paper, we present a content-adaptive, audio-texture-based method to segment video into audio scenes. An audio scene is modeled as a semantically consistent segment of audio data. Our algorithm is based on "semantic audio texture analysis." First, we train Gaussian mixture models (GMMs) for basic audio classes such as speech and music. We then define the semantic audio texture in terms of those classes. We study two types of scene changes: those corresponding to an overall change in audio texture, and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work that relies on genre-specific heuristics, such as some methods proposed for detecting commercials, we adaptively determine whether such transition markers are used and, if so, which of the base classes serve as markers, without any prior knowledge of the content. Our experimental results show that the proposed audio scene segmentation method works well across a wide variety of broadcast content genres.
KW - Audio scene
KW - Segmentation
KW - Semantic texture
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=43249112529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43249112529&partnerID=8YFLogxK
U2 - 10.1117/12.760267
DO - 10.1117/12.760267
M3 - Conference contribution
AN - SCOPUS:43249112529
SN - 9780819469922
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Proceedings of SPIE-IS&T Electronic Imaging - Multimedia Content Access
T2 - Multimedia Content Access: Algorithms and Systems II
Y2 - 30 January 2008 through 31 January 2008
ER -