Query:
Scholar name: Guo Wenzhong
Abstract :
Action quality assessment (AQA) is a challenging vision task that requires discerning and quantifying subtle differences in actions from the same class. While recent research has made strides in creating fine-grained annotations for more precise analysis, existing methods primarily focus on coarse action segmentation, leading to limited identification of discriminative action frames. To address this issue, we propose a Vision-Language Action Knowledge Learning approach for action quality assessment, along with a multi-grained alignment framework to understand different levels of action knowledge. In our framework, prior knowledge, such as specialized terminology, is embedded into video-level, stage-level, and frame-level representations via CLIP. We further propose a new semantic-aware collaborative attention module to prevent confusing interactions and preserve textual knowledge in cross-modal and cross-semantic spaces. Specifically, we leverage the powerful cross-modal knowledge of CLIP to embed textual semantics into image features, which then guide action spatial-temporal representations. Our approach can be plugged into existing AQA methods, with or without frame-wise annotations. Extensive experiments and ablation studies show that our approach achieves state-of-the-art performance on four public short- and long-term AQA benchmarks: FineDiving, MTL-AQA, JIGSAWS, and Fis-V.
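A minimal sketch of the semantic-aware guidance described above: CLIP-encoded terminology embeddings steer visual tokens through cross-attention, with a residual connection preserving the visual stream. This is an illustration, not the paper's implementation; the module name, shapes, and head count are assumptions.

```python
import torch
import torch.nn as nn

class SemanticGuidedAttention(nn.Module):
    """Illustrative cross-modal block: visual tokens attend to text knowledge."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats: torch.Tensor, text_feats: torch.Tensor):
        # visual_feats: (B, T, D) frame/stage/video tokens
        # text_feats:   (B, K, D) CLIP-encoded terminology prompts
        attended, _ = self.cross_attn(visual_feats, text_feats, text_feats)
        # Residual keeps the visual stream intact while injecting text semantics.
        return self.norm(visual_feats + attended)
```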
Keyword :
Action quality assessment; Semantic-aware learning; Vision-language pre-training
Cite:
GB/T 7714 | Xu, Huangbiao, Ke, Xiao, Li, Yuezhou, et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment [J]. | COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100: 423-440. |
MLA | Xu, Huangbiao, et al. "Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment". | COMPUTER VISION - ECCV 2024, PT XLII 15100 (2025): 423-440. |
APA | Xu, Huangbiao, Ke, Xiao, Li, Yuezhou, Xu, Rui, Wu, Huanqi, Lin, Xiaofeng, et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment. | COMPUTER VISION - ECCV 2024, PT XLII, 2025, 15100, 423-440. |
Abstract :
As a promising area in machine learning, multi-view learning enhances model performance by integrating data from various views. With the rise of graph convolutional networks, many studies have explored incorporating them into multi-view learning frameworks. However, these methods often require storing the entire graph topology, leading to significant memory demands. Additionally, iterative update operations in graph convolutions lead to longer inference times, making it difficult to deploy existing multi-view learning models on large graphs. To overcome these challenges, we introduce an efficient multi-view graph convolutional network via local aggregation and global propagation. In the local aggregation module, we use a structure-aware matrix for feature aggregation, which significantly reduces computational complexity compared to traditional graph convolutions. After that, we design a global propagation module that allows the model to be trained in batches, enabling deployment on large-scale graphs. Finally, we introduce the attention mechanism into multi-view feature fusion to more effectively explore the consistency and complementarity between views. The proposed method is employed to perform multi-view semi-supervised classification, and comprehensive experimental results on benchmark datasets validate its effectiveness.
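If the structure-aware local aggregation amounts to multiplying features by a fixed matrix, it can be precomputed once so that training never touches the graph and runs in mini-batches, and a light attention score can then fuse the per-view representations. A minimal sketch under those assumptions, in the spirit of decoupled GCNs such as SGC; all names are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

def precompute_aggregation(adj_norm: torch.Tensor, features: torch.Tensor, hops: int = 2):
    # One-off offline step: apply the (sparse) structure-aware matrix `hops`
    # times; afterwards training can be batched over rows of the result.
    for _ in range(hops):
        features = torch.sparse.mm(adj_norm, features)
    return features

class AttentionFusion(nn.Module):
    """Per-sample soft weighting of views before classification."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, view_feats: list):
        stacked = torch.stack(view_feats, dim=1)             # (N, V, D)
        weights = torch.softmax(self.score(stacked), dim=1)  # (N, V, 1)
        return (weights * stacked).sum(dim=1)                # (N, D)
```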
Keyword :
Graph neural networks; Local aggregation; Multi-view learning; Representation learning; Semi-supervised classification
Cite:
GB/T 7714 | Liu, Lu, Shi, Yongquan, Pi, Yueyang, et al. Efficient multi-view graph convolutional networks via local aggregation and global propagation [J]. | EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266. |
MLA | Liu, Lu, et al. "Efficient multi-view graph convolutional networks via local aggregation and global propagation". | EXPERT SYSTEMS WITH APPLICATIONS 266 (2025). |
APA | Liu, Lu, Shi, Yongquan, Pi, Yueyang, Guo, Wenzhong, Wang, Shiping. Efficient multi-view graph convolutional networks via local aggregation and global propagation. | EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266. |
Abstract :
3D anomaly detection aims to solve the problem that image anomaly detection is greatly affected by lighting conditions. As commercial confidentiality and personal privacy become increasingly paramount, access to training samples is often restricted. To address these challenges, we propose a zero-shot 3D anomaly detection method. Unlike previous CLIP-based methods, the proposed method does not require any prompt and is capable of detecting anomalies on the depth modality. Furthermore, we also propose a pre-trained structural rerouting strategy, which modifies the transformer without retraining or fine-tuning for the anomaly detection task. Most importantly, this paper proposes an online voter mechanism that registers voters and performs majority voter scoring in a one-stage, zero-start and growth-oriented manner, enabling direct anomaly detection on unlabeled test sets. Finally, we also propose a confirmatory judge credibility assessment mechanism, which provides an efficient adaptation for possible few-shot conditions. Results on datasets such as MVTec3D-AD demonstrate that the proposed method can achieve superior zero-shot 3D anomaly detection performance, indicating its pioneering contributions within the pertinent domain.
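The online voter mechanism is specific to the paper; a loose reading — an initially empty bank of "voters", nearest-voter distance scoring, and online registration of confident normals — might look like the sketch below. The threshold, k, and all names are assumptions.

```python
import torch

class OnlineVoterBank:
    """Loose sketch: zero-start, growth-oriented scoring on unlabeled data."""
    def __init__(self, threshold: float = 0.2, k: int = 5):
        self.voters: list[torch.Tensor] = []    # features assumed normal
        self.threshold = threshold
        self.k = k

    def score(self, feat: torch.Tensor) -> float:
        # feat: (D,) feature; score = mean distance to the k nearest voters.
        if not self.voters:
            return 0.0                          # zero-start: nothing to compare against
        bank = torch.stack(self.voters)         # (M, D)
        d = torch.cdist(feat[None], bank).squeeze(0)
        k = min(self.k, len(self.voters))
        return d.topk(k, largest=False).values.mean().item()

    def update(self, feat: torch.Tensor, score: float) -> None:
        # Growth-oriented: low-scoring (likely normal) samples become voters.
        if score < self.threshold:
            self.voters.append(feat.detach())
```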
Keyword :
Anomaly detection; Multimodal; Online voter mechanism; Pretrained model; Zero-shot
Cite:
GB/T 7714 | Zheng, Wukun, Ke, Xiao, Guo, Wenzhong. Zero-shot 3D anomaly detection via online voter mechanism [J]. | NEURAL NETWORKS, 2025, 187. |
MLA | Zheng, Wukun, et al. "Zero-shot 3D anomaly detection via online voter mechanism". | NEURAL NETWORKS 187 (2025). |
APA | Zheng, Wukun, Ke, Xiao, Guo, Wenzhong. Zero-shot 3D anomaly detection via online voter mechanism. | NEURAL NETWORKS, 2025, 187. |
Abstract :
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) model overfitting and weak generalization due to the multi-center distribution of data; (ii) inter-class ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) intra-class inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules: the plug-and-play Mask Enhancement Module (MEG), the Separable Path Attention Enhancement Module (SPAE), and the Dynamic Global Attention Pool Module (DGAP). First, the MEG module masks high-energy regions of the environment and the polyps, guiding the model to distinguish polyps from background features using only a small amount of information; this prevents the model from overfitting to environmental information and improves its robustness. The module also effectively counteracts the 'dark corner phenomenon' in the dataset, further improving generalization. Next, the SPAE module alleviates the inter-class ambiguity problem by strengthening feature expression. Then, the DGAP module addresses the intra-class inconsistency problem by extracting invariance to scale, shape, and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating segmentation performance on five datasets from different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. Code is available at https://github.com/847001315/MEFA-Net.
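A rough sketch of the masking idea attributed to the MEG module: suppress the highest-energy regions during training so the model cannot lean on dominant environmental cues. The energy definition (channel-wise norm), mask ratio, and function name are assumptions rather than the paper's design.

```python
import torch
import torch.nn.functional as F

def mask_high_energy(images: torch.Tensor, feats: torch.Tensor, mask_ratio: float = 0.3):
    # images: (B, 3, H, W); feats: (B, C, h, w) backbone features.
    energy = feats.norm(dim=1)                       # (B, h, w) per-cell energy
    b, h, w = energy.shape
    flat = energy.view(b, -1)
    k = int(mask_ratio * flat.shape[1])
    idx = flat.topk(k, dim=1).indices                # highest-energy cells
    keep = torch.ones_like(flat).scatter_(1, idx, 0.0).view(b, 1, h, w)
    # Upsample the keep-mask to image resolution and zero out masked regions.
    keep_img = F.interpolate(keep, size=images.shape[-2:], mode="nearest")
    return images * keep_img
```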
Keyword :
Endoscopy; Image coding; Image segmentation; Risk assessment
Cite:
GB/T 7714 | Ke, Xiao, Chen, Guanhong, Liu, Hao, et al. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation [J]. | Computers in Biology and Medicine, 2025, 186. |
MLA | Ke, Xiao, et al. "MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation". | Computers in Biology and Medicine 186 (2025). |
APA | Ke, Xiao, Chen, Guanhong, Liu, Hao, Guo, Wenzhong. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation. | Computers in Biology and Medicine, 2025, 186. |
Abstract :
Multi-view learning has demonstrated strong potential in processing data from different sources or viewpoints. Despite the significant progress made by Multi-view Graph Neural Networks (MvGNNs) in exploiting graph structures, features, and representations, existing research generally lacks architectures specifically designed for the intrinsic properties of multi-view data. This leads to models that still fall short of fully utilizing the consistent and complementary information in multi-view data. Most current research simply extends the single-view GNN framework to multi-view data, lacking in-depth strategies to handle and leverage the unique properties of these data. To address this issue, we propose a simple yet effective MvGNN framework called Multi-view Representation Learning with Decoupled private and shared Propagation (MvRL-DP). This framework enables multi-view data to be processed effectively as a whole by alternating private and shared operations to integrate cross-view information. In addition, to address possible inconsistencies between views, we present a discriminative loss that promotes class separability and prevents the model from being misled by noise hidden in multi-view data. Experiments demonstrate that the proposed framework is superior to current state-of-the-art methods in the multi-view semi-supervised classification task.
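One illustrative reading of the alternating scheme: each view gets its own (private) transform, then a shared transform integrates cross-view information before the next round. The layer choice and fusion rule below are assumptions, not MvRL-DP's actual propagation.

```python
import torch
import torch.nn as nn

class PrivateSharedBlock(nn.Module):
    """One private step per view followed by one shared, cross-view step."""
    def __init__(self, dim: int, num_views: int):
        super().__init__()
        self.private = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_views)])
        self.shared = nn.Linear(dim, dim)

    def forward(self, view_feats: list):
        # Private: view-specific parameters preserve complementary information.
        private = [torch.relu(p(x)) for p, x in zip(self.private, view_feats)]
        # Shared: a common transform over the fused signal enforces consistency.
        fused = torch.stack(private, dim=0).mean(dim=0)
        return [torch.relu(self.shared(x + fused)) for x in private]
```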
Keyword :
Multi-view learning; Propagation decoupling; Representation learning; Semi-supervised classification; Tensor operation
Cite:
GB/T 7714 | Wang, Xuzheng, Lan, Shiyang, Wu, Zhihao, et al. Multi-view Representation Learning with Decoupled private and shared Propagation [J]. | KNOWLEDGE-BASED SYSTEMS, 2025, 310. |
MLA | Wang, Xuzheng, et al. "Multi-view Representation Learning with Decoupled private and shared Propagation". | KNOWLEDGE-BASED SYSTEMS 310 (2025). |
APA | Wang, Xuzheng, Lan, Shiyang, Wu, Zhihao, Guo, Wenzhong, Wang, Shiping. Multi-view Representation Learning with Decoupled private and shared Propagation. | KNOWLEDGE-BASED SYSTEMS, 2025, 310. |
Abstract :
Camera-based stereo 3D object detection estimates 3D properties of objects from binocular images only, which is a cost-effective solution for autonomous driving. State-of-the-art methods mainly improve the detection accuracy of general objects by designing ingenious stereo matching algorithms or complex pipeline modules. Moreover, additional fine-grained annotations, such as masks or LiDAR point clouds, are often introduced to deal with occlusion, which incurs high manual costs for this task. To address the detection bottleneck caused by occlusion in a more cost-effective manner, we develop a novel stereo 3D object detection method named DSC3D, which achieves significant improvements for occluded objects without introducing additional supervision. Specifically, we first report the ambiguity in feature sampling, which refers to the presence of noisy features in the sampling for occluded objects. Then, we propose the Epipolar Constraint Deform-Attention (ECDA) module to address the unreliable left-right correspondence computation in stereo matching caused by occlusion; it reweights epipolar features by adaptively aggregating local neighbor information. Furthermore, to ensure that 3D property estimation is based on robust object features, we propose a visible-regions-guided constraint to explicitly guide the offset learning for feature sampling. Extensive experiments conducted on the KITTI benchmark demonstrate that the proposed DSC3D outperforms state-of-the-art camera-based methods.
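For rectified stereo, the epipolar constraint restricts left-right correspondence to the same image row; a heavily simplified sketch of row-wise reweighted aggregation in that spirit is shown below. The disparity handling via torch.roll is a crude stand-in, and all names and shapes are assumptions rather than the ECDA module itself.

```python
import torch
import torch.nn as nn

class EpipolarAttention(nn.Module):
    """Toy row-wise aggregation for rectified stereo (not the ECDA module)."""
    def __init__(self, dim: int, num_disp: int = 48):
        super().__init__()
        self.num_disp = num_disp
        self.weight = nn.Conv2d(dim, num_disp, kernel_size=1)  # per-candidate scores

    def forward(self, left: torch.Tensor, right: torch.Tensor):
        # left, right: (B, C, H, W) features; candidates lie on the same row.
        scores = torch.softmax(self.weight(left), dim=1)       # (B, D, H, W)
        out = torch.zeros_like(left)
        for d in range(self.num_disp):
            shifted = torch.roll(right, shifts=d, dims=-1)     # crude disparity shift
            out = out + scores[:, d:d + 1] * shifted
        return out
```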
Keyword :
3D object detection; autonomous driving; binocular images; occluded object; stereo matching
Cite:
GB/T 7714 | Chen, Jiawei, Song, Qi, Guo, Wenzhong, et al. DSC3D: Deformable Sampling Constraints in Stereo 3D Object Detection for Autonomous Driving [J]. | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (3): 2794-2805. |
MLA | Chen, Jiawei, et al. "DSC3D: Deformable Sampling Constraints in Stereo 3D Object Detection for Autonomous Driving". | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 35.3 (2025): 2794-2805. |
APA | Chen, Jiawei, Song, Qi, Guo, Wenzhong, Huang, Rui. DSC3D: Deformable Sampling Constraints in Stereo 3D Object Detection for Autonomous Driving. | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (3), 2794-2805. |
Abstract :
Fine-grained visual classification (FGVC) is a highly challenging task that aims to learn subtle differences between visually similar objects. Most existing methods for FGVC rely on deep convolutional neural networks to mine local fine-grained features, neglecting the relationships between global and local semantics. Moreover, the feature encoding stage inevitably constructs complex feature representations, leading to overfitting to specific feature patterns, which is not beneficial for fine-grained visual classification. To address these issues, we propose a Transformer-based FGVC model, the Multi-Granularity Interaction and Feature Recombination Network (MGIFR-Net), which consists of three modules. Firstly, a self-attention guided localization module is designed to locate and amplify discriminative local regions, enabling sufficient learning of local detail information. Secondly, to enhance the perception of multi-granularity semantic interaction information, we construct a multi-granularity feature interaction learning module to jointly learn local and global feature representations. Finally, a dynamic feature recombination enhancement method is proposed, which explores diverse feature pattern combinations while retaining invariant features, effectively alleviating the overfitting problem caused by complex feature representations. Our method achieves state-of-the-art performance on four benchmark FGVC datasets (CUB-200-2011, Stanford Cars, FGVC-Aircraft, and NABirds), and experimental results demonstrate the superiority of our method on different visual classification benchmarks.
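One common way to realize self-attention-guided localization, which may approximate the first module here, is to threshold a ViT's CLS-to-patch attention map and crop-and-zoom the resulting region. The patch size, threshold, and function name below are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_crop(image: torch.Tensor, attn_map: torch.Tensor,
                   patch: int = 16, keep: float = 0.5) -> torch.Tensor:
    # image: (C, H, W); attn_map: (h, w) CLS-to-patch attention (averaged).
    thresh = attn_map.flatten().quantile(1 - keep)
    ys, xs = (attn_map >= thresh).nonzero(as_tuple=True)
    # Bounding box of high-attention patches, mapped back to pixels.
    y0, y1 = ys.min().item() * patch, (ys.max().item() + 1) * patch
    x0, x1 = xs.min().item() * patch, (xs.max().item() + 1) * patch
    crop = image[:, y0:y1, x0:x1]
    # Zoom the discriminative region back to the full input size.
    return F.interpolate(crop[None], size=image.shape[-2:], mode="bilinear",
                         align_corners=False)[0]
```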
Keyword :
Feature recombination; Fine-grained visual classification; Multi-granularity feature interaction; Vision transformer
Cite:
GB/T 7714 | Ke, Xiao, Cai, Yuhang, Chen, Baitao, et al. Multi-granularity interaction and feature recombination network for fine-grained visual classification [J]. | PATTERN RECOGNITION, 2025, 166. |
MLA | Ke, Xiao, et al. "Multi-granularity interaction and feature recombination network for fine-grained visual classification". | PATTERN RECOGNITION 166 (2025). |
APA | Ke, Xiao, Cai, Yuhang, Chen, Baitao, Liu, Hao, Guo, Wenzhong. Multi-granularity interaction and feature recombination network for fine-grained visual classification. | PATTERN RECOGNITION, 2025, 166. |
Abstract :
The fair and objective assessment of performances and competitions is a common pursuit and challenge in human society. Computer vision technology offers hope for this purpose, but it still faces obstacles such as occlusion and motion blur. To address these hindrances, we propose DanceFix, which introduces a bidirectional spatial-temporal context optical flow correction (BOFC) method. This approach leverages the consistency and complementarity of motion information between two modalities: optical flow, which excels at pixel capture, and lightweight skeleton data. It enables the extraction of pixel-level motion changes and the correction of abnormal skeleton data. Furthermore, we propose a part-level dance dataset (Dancer Parts) and part-level motion feature extraction based on task decoupling (PETD), which decouples complex whole-body part tracking into fine-grained limb-level motion extraction, enhancing the confidence of temporal information and the accuracy of correction for abnormal data. Finally, we present the DNV dataset, which simulates fully neat group dance scenes and provides reliable labels and validation methods for the newly introduced group dance neatness assessment (GDNA). To the best of our knowledge, this is the first work to develop quantitative criteria for assessing limb and joint neatness in group dance. We conduct experiments on the DNV and the public video-based JHMDB datasets. Our method effectively corrects abnormal skeleton points, embeds flexibly into existing pipelines, and improves the accuracy of existing pose estimation algorithms.
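A plausible minimal reading of the flow-based correction: propagate each joint forward with the optical flow sampled at its previous position and flag joints whose detected positions disagree. The tolerance and names are illustrative; BOFC's bidirectional, context-aware formulation is richer than this.

```python
import numpy as np

def correct_joints(prev_joints: np.ndarray, cur_joints: np.ndarray,
                   flow: np.ndarray, tol: float = 5.0) -> np.ndarray:
    # prev_joints, cur_joints: (J, 2) pixel coords (x, y); flow: (H, W, 2).
    corrected = cur_joints.copy()
    for j, (x, y) in enumerate(prev_joints.astype(int)):
        predicted = prev_joints[j] + flow[y, x]   # flow-propagated position
        if np.linalg.norm(cur_joints[j] - predicted) > tol:
            corrected[j] = predicted              # replace the abnormal joint
    return corrected
```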
Cite:
GB/T 7714 | Xu, Huangbiao, Ke, Xiao, Wu, Huanqi, et al. DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose [C]. 2025: 8869-8877. |
MLA | Xu, Huangbiao, et al. "DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose". (2025): 8869-8877. |
APA | Xu, Huangbiao, Ke, Xiao, Wu, Huanqi, Xu, Rui, Li, Yuezhou, Xu, Peirong, et al. DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose. (2025): 8869-8877. |
Abstract :
Multi-view learning methods leverage multiple data sources to enhance perception by mining correlations across views, typically relying on predefined categories. However, deploying these models in real-world scenarios presents two primary openness challenges. 1) Lack of interpretability: the integration mechanisms of multi-view data in existing black-box models remain poorly explained; 2) Insufficient generalization: most models are not adapted to multi-view scenarios involving unknown categories. To address these challenges, we propose OpenViewer, an openness-aware multi-view learning framework with theoretical support. This framework begins with a Pseudo-Unknown Sample Generation Mechanism that efficiently simulates open multi-view environments and adapts the model in advance to potential unknown samples. Subsequently, we introduce an Expression-Enhanced Deep Unfolding Network that promotes interpretability by systematically constructing functional prior-mapping modules, providing a more transparent integration mechanism for multi-view data. Additionally, we establish a Perception-Augmented Open-Set Training Regime that enhances generalization by boosting confidences for known categories and suppressing inappropriate confidences for unknown ones. Experimental results demonstrate that OpenViewer effectively addresses openness challenges while ensuring recognition performance for both known and unknown samples.
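The pseudo-unknown generation could be approximated by mixing samples across classes and training the model toward uniform (low-confidence) predictions on the mixtures, as sketched below. The mixing rule, loss, and names are assumptions rather than OpenViewer's actual mechanism.

```python
import torch
import torch.nn.functional as F

def pseudo_unknown_loss(model, x: torch.Tensor, y: torch.Tensor,
                        num_classes: int, alpha: float = 0.5) -> torch.Tensor:
    # Mix pairs of samples from *different* classes to mimic unknowns.
    perm = torch.randperm(x.size(0))
    diff = y != y[perm]
    if diff.sum() == 0:
        return x.new_zeros(())
    mixed = alpha * x[diff] + (1 - alpha) * x[perm][diff]
    logits = model(mixed)
    # Push predictions on pseudo-unknowns toward the uniform distribution,
    # i.e., suppress confidence on every known category.
    uniform = torch.full_like(logits, 1.0 / num_classes)
    return F.kl_div(F.log_softmax(logits, dim=1), uniform, reduction="batchmean")
```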
Keyword :
Deep learning; Multi-task learning
Cite:
GB/T 7714 | Du, Shide, Fang, Zihan, Tan, Yanchao, et al. OpenViewer: Openness-Aware Multi-View Learning [C]. 2025: 16389-16397. |
MLA | Du, Shide, et al. "OpenViewer: Openness-Aware Multi-View Learning". (2025): 16389-16397. |
APA | Du, Shide, Fang, Zihan, Tan, Yanchao, Wang, Changwei, Wang, Shiping, Guo, Wenzhong. OpenViewer: Openness-Aware Multi-View Learning. (2025): 16389-16397. |
Abstract :
In real-world scenarios, multi-view data comprise heterogeneous features, with each feature set corresponding to a specific view. The objective of multi-view semi-supervised classification is to enhance classification performance by leveraging the inherent complementary and consistent information present within diverse views. Nevertheless, many existing frameworks focus primarily on assigning suitable weights to different views while neglecting the importance of consistent information. In this paper, a multi-view semi-supervised classification framework called joint diversity and consistency graph convolutional network (JDC-GCN) is proposed. Firstly, the graph convolutional network structure is introduced into multi-view semi-supervised classification, enabling label information to propagate over the topological structure of multi-view data. Secondly, the proposed JDC-GCN captures complementary and consistent information from multiple views through two indispensable sub-modules, Diversity-GCN and Consistency-GCN, respectively. Finally, an attention mechanism dynamically adjusts the weights of the views, measuring the significance of heterogeneous features and the consistent graph without introducing additional parameters. Comprehensive experiments on eight multi-view datasets validate the effectiveness of the JDC-GCN algorithm. The results show that the proposed method exhibits superior classification performance compared to other state-of-the-art methods.
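A condensed sketch of the two-branch idea: per-view GCNs capture diversity while a GCN over an aggregated consensus adjacency captures consistency. The mean-adjacency consensus and the layer definition below are illustrative simplifications, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Single propagation step X' = relu(A X W), dense for simplicity."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.lin(adj @ x))

def consensus_adjacency(adjs: list) -> torch.Tensor:
    # Consistency branch input: element-wise mean of the normalized per-view
    # adjacency matrices (one simple way to build a shared graph).
    return torch.stack(adjs).mean(dim=0)
```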
Keyword :
Adversarial machine learning; Contrastive learning; Convolutional neural networks; Graph algorithms; Network theory (graphs); Self-supervised learning; Semi-supervised learning
Cite:
GB/T 7714 | Lin, Renjie, Yao, Jie, Wang, Shiping, et al. JDC-GCN: joint diversity and consistency graph convolutional network [J]. | Neural Computing and Applications, 2025, 37 (16): 10407-10423. |
MLA | Lin, Renjie, et al. "JDC-GCN: joint diversity and consistency graph convolutional network". | Neural Computing and Applications 37.16 (2025): 10407-10423. |
APA | Lin, Renjie, Yao, Jie, Wang, Shiping, Guo, Wenzhong. JDC-GCN: joint diversity and consistency graph convolutional network. | Neural Computing and Applications, 2025, 37 (16), 10407-10423. |