Author:

Zhou, Mingyao [1] | Chen, Wenjing [2] | Sun, Hao [3] | Xie, Wei [4] | Dong, Ming [5] | Lu, Xiaoqiang [6] (Scholars: 卢孝强)

Indexed by:

EI

Abstract:

Recently, weakly supervised temporal sentence grounding in videos (TSGV) has attracted extensive attention because it does not require precise start-end time annotations during training and can quickly retrieve segments of interest according to user needs. In weakly supervised TSGV, query reconstruction (QR)-based methods are the current mainstream, and the quality of their proposals determines their performance. QR-based methods have two problems with proposal quality. First, a multi-modal global token is usually mapped to proposals with limited duration diversity, making it difficult to capture relevant segments of varying durations in real scenarios. Second, Gaussian functions are typically used to generate relatively fixed weights for the frames within a proposal, and these weights are applied to the original video features to generate proposal-specific features. As a result, query-irrelevant frames reduce the discriminability of the proposal features. In this study, we propose a query-aware multi-scale proposal network (QMN). First, pre-trained encoders are used to extract video and query features. Next, a multi-scale proposal generation module is designed to refine the video features under query guidance and to diversify proposal durations. This module performs multi-modal interaction and multi-scale modeling to obtain proposals of different durations. Furthermore, to extract discriminative proposal features and enhance the modeling of correlations among proposal frames, a query-aware weight generator is constructed that learns frame weights through contrastive learning to suppress query-irrelevant frame representations. Finally, the masked query is reconstructed from the proposal features to select the best proposal. The effectiveness of the proposed QMN is verified through experiments on the Charades-STA and ActivityNet-Captions datasets. © 2024 Elsevier B.V.
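
The Gaussian frame weighting that the abstract identifies as a limitation of earlier QR-based methods can be illustrated with a minimal sketch. This is not the authors' QMN code; the function names, the normalized `center` and `width` parameters, and the `sharpness` hyperparameter are assumptions made for illustration. The point it shows is that the weights depend only on the proposal's position and duration, never on the query, so query-irrelevant frames inside the proposal still contribute to the pooled feature.

```python
import math


def gaussian_frame_weights(num_frames, center, width, sharpness=4.0):
    """Return one weight per frame for a Gaussian-shaped proposal.

    center and width are normalized to [0, 1] (hypothetical convention).
    sharpness is a hypothetical hyperparameter: sigma = width / sharpness.
    """
    sigma = max(width / sharpness, 1e-6)  # guard against a zero-width proposal
    # Normalized midpoint of each frame on the video timeline.
    positions = [(i + 0.5) / num_frames for i in range(num_frames)]
    return [math.exp(-((p - center) ** 2) / (2 * sigma ** 2)) for p in positions]


def proposal_feature(frame_features, weights):
    """Pool per-frame feature vectors into one proposal feature
    using a weighted average over frames."""
    total = sum(weights)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features)) / total
        for d in range(dim)
    ]


# Example: 8 frames, proposal centered mid-video and covering half of it.
weights = gaussian_frame_weights(num_frames=8, center=0.5, width=0.5)
features = [[float(i), 1.0] for i in range(8)]  # toy 2-dimensional frame features
pooled = proposal_feature(features, weights)
```

A query-aware weight generator, as proposed in the abstract, would replace `gaussian_frame_weights` with weights learned from both the frames and the query.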

Keyword:

Contrastive Learning; Self-supervised learning; Structured Query Language; Supervised learning

Community:

  • [ 1 ] [Zhou, Mingyao]Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 2 ] [Zhou, Mingyao]School of Computer Science, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 3 ] [Zhou, Mingyao]National Language Resources Monitoring and Research Center for Network Media, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 4 ] [Chen, Wenjing]School of Computer Science, Hubei University of Technology, NO.28 Nanli Road, Hubei, Wuhan; 430068, China
  • [ 5 ] [Sun, Hao]Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 6 ] [Sun, Hao]School of Computer Science, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 7 ] [Sun, Hao]National Language Resources Monitoring and Research Center for Network Media, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 8 ] [Xie, Wei]Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 9 ] [Xie, Wei]School of Computer Science, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 10 ] [Xie, Wei]National Language Resources Monitoring and Research Center for Network Media, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 11 ] [Dong, Ming]Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 12 ] [Dong, Ming]School of Computer Science, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 13 ] [Dong, Ming]National Language Resources Monitoring and Research Center for Network Media, Central China Normal University, NO.152 Luoyu Road, Hubei, Wuhan; 430079, China
  • [ 14 ] [Lu, Xiaoqiang]College of Physics and Information Engineering, Fuzhou University, No.2 Wulong Jiangbei Avenue, Fujian, Fuzhou; 350002, China

Source:

Knowledge-Based Systems

ISSN: 0950-7051

Year: 2024

Volume: 304

Impact Factor: 7.200 (JCR@2023)

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0
