TY - JOUR
T1 - A Novel Collaborative Optimization Framework for Web Video Event Mining Based on the Combination of Inaccurate Visual Similarity Detection Information and Sparse Textual Information
AU - Zhang, Chengde
AU - Jin, Dandan
AU - Xiao, Xia
AU - Chen, Gao
AU - Shyu, Mei-Ling
N1 - Funding Information:
This work was supported in part by the Fundamental Research Funds for the Central Universities Program under Grant 2722019PY062, in part by the Humanities and Social Sciences Research Project of Hubei Education Department under Grant 18G012, in part by the Program for Excellent Project of Student Work in Colleges and Universities of Hubei Province under Grant 2018XGJPB3017, in part by the Laboratory Research Projects of Colleges and Universities in Hubei Province under Grant HBSY2018-48, in part by the Fundamental Research Funds for the Central Universities, Zhongnan University of Economics and Law, under Grant 2722019JCT037 and Grant 201911417, and in part by the Humanities and Social Sciences Planning Project of the China Ministry of Education under Grant 19YJAZH099.
PY - 2020
Y1 - 2020
N2 - The high speed and low latency of 5G mobile networks have accelerated both the speed and the volume of information transmission. With its richer information and more convenient dissemination, web video is likely to become the main medium of news production and dissemination, subverting the traditional mode of event mining. Event mining based on web videos has therefore become a new research hotspot. However, web videos are susceptible to video editing, lighting, shooting perspective, shooting angle, and other factors, which leads to the problem of inaccurate visual similarity detection. Generally speaking, effectively integrating huge volumes of cross-modal information would be of great help; however, web videos are typically described with only a few terms, so sparse textual information poses a challenge for cross-modal information combination. To address this issue, this paper proposes a new collaborative optimization framework that combines inaccurate visual similarity detection information with sparse textual information. The framework consists of three steps. First, after the distribution characteristics of each word across all Near-Duplicate Keyframes (NDKs) are computed, the high-level semantic cross-correlations among NDKs are mined with the help of textual features, forming a new set of semantically relevant NDKs with different visual expressions. Next, the textual distribution features are enriched by finding more semantically related words through the new NDK set, alleviating the sparse distribution problem of each word across all NDKs. Finally, Multiple Correspondence Analysis (MCA) is used to mine the events. Experimental results on a large amount of real-world data demonstrate that the proposed model outperforms existing methods for web video event mining.
AB - The high speed and low latency of 5G mobile networks have accelerated both the speed and the volume of information transmission. With its richer information and more convenient dissemination, web video is likely to become the main medium of news production and dissemination, subverting the traditional mode of event mining. Event mining based on web videos has therefore become a new research hotspot. However, web videos are susceptible to video editing, lighting, shooting perspective, shooting angle, and other factors, which leads to the problem of inaccurate visual similarity detection. Generally speaking, effectively integrating huge volumes of cross-modal information would be of great help; however, web videos are typically described with only a few terms, so sparse textual information poses a challenge for cross-modal information combination. To address this issue, this paper proposes a new collaborative optimization framework that combines inaccurate visual similarity detection information with sparse textual information. The framework consists of three steps. First, after the distribution characteristics of each word across all Near-Duplicate Keyframes (NDKs) are computed, the high-level semantic cross-correlations among NDKs are mined with the help of textual features, forming a new set of semantically relevant NDKs with different visual expressions. Next, the textual distribution features are enriched by finding more semantically related words through the new NDK set, alleviating the sparse distribution problem of each word across all NDKs. Finally, Multiple Correspondence Analysis (MCA) is used to mine the events. Experimental results on a large amount of real-world data demonstrate that the proposed model outperforms existing methods for web video event mining.
KW - Event mining
KW - near-duplicate keyframes (NDKs)
KW - topic detection and tracking (TDT)
KW - web video
UR - http://www.scopus.com/inward/record.url?scp=85078700463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078700463&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.2964714
DO - 10.1109/ACCESS.2020.2964714
M3 - Article
AN - SCOPUS:85078700463
VL - 8
SP - 10516
EP - 10527
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 8951020
ER -
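
The abstract's final step applies Multiple Correspondence Analysis (MCA) to mine events. As an illustrative sketch only, and not the authors' implementation, the Python code below runs MCA (correspondence analysis of an indicator matrix) on a hypothetical binary video-by-feature matrix and clusters the resulting row coordinates into candidate events; the matrix contents, the number of components, and the k-means clustering step are all assumptions.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical binary indicator matrix: rows are web videos, columns are
# enriched term/NDK features (1 = feature present). Real input would come
# from the NDK-based text-enrichment steps described in the abstract.
rng = np.random.default_rng(0)
Z = (rng.random((60, 20)) < 0.2).astype(float)
# Guarantee every row and column is non-empty so the masses below are nonzero.
Z[np.arange(Z.shape[0]), rng.integers(0, Z.shape[1], Z.shape[0])] = 1.0
Z[rng.integers(0, Z.shape[0], Z.shape[1]), np.arange(Z.shape[1])] = 1.0

def mca_row_coordinates(Z, n_components=2):
    """Correspondence analysis of an indicator matrix (the core of MCA);
    returns the principal coordinates of the rows (videos)."""
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U Sigma
    return (U[:, :n_components] * sigma[:n_components]) / np.sqrt(r)[:, None]

coords = mca_row_coordinates(Z)
# Group videos with similar MCA coordinates into candidate events
# (k = 5 is an arbitrary choice for this sketch).
events = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
print(events)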