Audio scene segmentation for video with generic content

Feng Niu, Naveen Goela, Ajay Divakaran, Mohamed Abdel-Mottaleb

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We present a content-adaptive, audio-texture-based method for segmenting video into audio scenes, where an audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis": we first train GMM models for basic audio classes such as speech and music, and then define the semantic audio texture in terms of those classes. We study two types of scene changes: those corresponding to an overall change in audio texture, and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work that relies on genre-specific heuristics, such as methods for detecting commercials, we adaptively determine whether such transition markers are in use and, if so, which of the base classes serve as markers, without any prior knowledge about the content. Experimental results show that the proposed audio scene segmentation works well across a wide variety of broadcast content genres.
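
As a rough illustration of the approach described in the abstract, the sketch below assumes per-class GMMs (via scikit-learn's GaussianMixture), a class-label histogram over a sliding window as the "audio texture", and an L1 distance between adjacent windows as the scene-change score. The base-class list, window sizes, distance measure, and threshold are illustrative assumptions, not the authors' implementation, and the paper's adaptive transition-marker detection is omitted.

# Minimal sketch: classify short audio frames into base classes with
# per-class GMMs, summarize a sliding window of labels as a histogram
# ("audio texture"), and flag boundaries where adjacent textures diverge.
import numpy as np
from sklearn.mixture import GaussianMixture

CLASSES = ["speech", "music", "silence", "noise"]  # assumed base classes

def train_class_gmms(features_by_class, n_components=8):
    """Fit one GMM per base audio class on labeled frame-level features."""
    return {
        name: GaussianMixture(n_components=n_components,
                              covariance_type="diag").fit(feats)
        for name, feats in features_by_class.items()
    }

def classify_frames(gmms, frames):
    """Assign each frame-level feature vector to its max-likelihood class."""
    log_liks = np.stack([gmms[c].score_samples(frames) for c in CLASSES], axis=1)
    return log_liks.argmax(axis=1)

def texture_histogram(labels):
    """Normalized class-label histogram over a window: a simple 'audio texture'."""
    hist = np.bincount(labels, minlength=len(CLASSES)).astype(float)
    return hist / max(hist.sum(), 1.0)

def segment_scenes(labels, win=200, hop=50, threshold=0.5):
    """Mark a scene boundary where adjacent window textures differ (L1 distance)."""
    boundaries = []
    for t in range(win, len(labels) - win, hop):
        left = texture_histogram(labels[t - win:t])
        right = texture_histogram(labels[t:t + win])
        if np.abs(left - right).sum() > threshold:
            boundaries.append(t)  # frame index of a candidate scene change
    return boundaries

In practice, the frame inputs would be low-level audio features such as MFCCs computed over short windows, and nearby boundary candidates would be merged into a single scene change.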

Original language: English
Title of host publication: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 6820
DOIs: https://doi.org/10.1117/12.760267
ISBN (Print): 9780819469922
State: Published - May 15 2008
Event: Multimedia Content Access: Algorithms and Systems II - San Jose, CA, United States
Duration: Jan 30 2008 - Jan 31 2008

Other

Other: Multimedia Content Access: Algorithms and Systems II
Country: United States
City: San Jose, CA
Period: 1/30/08 - 1/31/08

Keywords

  • Audio scene
  • Segmentation
  • Semantic texture
  • SVM

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Condensed Matter Physics

Cite this

Niu, F., Goela, N., Divakaran, A., & Abdel-Mottaleb, M. (2008). Audio scene segmentation for video with generic content. In Proceedings of SPIE - The International Society for Optical Engineering (Vol. 6820). [68200S] https://doi.org/10.1117/12.760267
