Query:
Scholar name: 柯逍 (Ke, Xiao)
Abstract :
Fine-grained visual classification (FGVC) is a highly challenging task that aims to learn subtle differences between visually similar objects. Most existing methods for FGVC rely on deep convolutional neural networks to mine local fine-grained features, which neglect the learning of relationships between global and local semantics. Moreover, the feature encoding stage inevitably constructs complex feature representations, leading to overfitting to specific feature patterns, which is not beneficial for fine-grained visual classification. To address these issues, we propose a Transformer-based FGVC model, called the Multi-Granularity Interaction and Feature Recombination Network (MGIFR-Net), which consists of three modules. Firstly, a self-attention guided localization module is designed to locate and amplify discriminative local regions, enabling sufficient learning of local detail information. Secondly, to enhance the perception of multi-granularity semantic interaction information, we construct a multi-granularity feature interaction learning module to jointly learn local and global feature representations. Finally, a dynamic feature recombination enhancement method is proposed, which explores diverse feature pattern combinations while retaining invariant features, effectively alleviating the overfitting problem caused by complex feature representations. Our method achieves state-of-the-art performance on four benchmark FGVC datasets (CUB-200-2011, Stanford Cars, FGVC-Aircraft, and NABirds), and experimental results demonstrate the superiority of our method on different visual classification benchmarks.
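As a rough illustration of the attention-guided localization idea described in this abstract, the sketch below crops and enlarges the image region receiving the highest [CLS]-to-patch attention in a ViT. The function names and the simple attention-averaging rule are assumptions made for illustration only, not the authors' implementation.

```python
# Hypothetical sketch: locate a discriminative region from ViT attention and zoom in.
import torch
import torch.nn.functional as F

def attn_rollout(attn_layers):
    """Fuse per-layer [CLS]->patch attention maps by simple averaging (illustrative)."""
    # attn_layers: list of (heads, tokens, tokens) attention tensors
    cls_to_patch = torch.stack([a.mean(0)[0, 1:] for a in attn_layers])  # (L, N)
    return cls_to_patch.mean(0)                                          # (N,)

def locate_and_zoom(image, attn_layers, patch_grid=14, out_size=448):
    """Crop the image around the highest-attention patches and upsample the crop."""
    scores = attn_rollout(attn_layers).reshape(patch_grid, patch_grid)
    mask = scores > scores.mean()                 # rough discriminative mask
    ys, xs = mask.nonzero(as_tuple=True)
    h, w = image.shape[-2:]
    y0, y1 = ys.min().item() * h // patch_grid, (ys.max().item() + 1) * h // patch_grid
    x0, x1 = xs.min().item() * w // patch_grid, (xs.max().item() + 1) * w // patch_grid
    crop = image[..., y0:y1, x0:x1]
    return F.interpolate(crop, size=(out_size, out_size), mode="bilinear", align_corners=False)

# Toy usage with random tensors standing in for a ViT's attention maps
image = torch.rand(1, 3, 448, 448)
attn_layers = [torch.rand(12, 197, 197) for _ in range(12)]
print(locate_and_zoom(image, attn_layers).shape)  # torch.Size([1, 3, 448, 448])
```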
Keyword :
Feature recombination; Fine-grained visual classification; Multi-granularity feature interaction; Vision transformer
Cite:
GB/T 7714 | Ke, Xiao , Cai, Yuhang , Chen, Baitao et al. Multi-granularity interaction and feature recombination network for fine-grained visual classification [J]. | PATTERN RECOGNITION , 2025 , 166 . |
MLA | Ke, Xiao et al. "Multi-granularity interaction and feature recombination network for fine-grained visual classification" . | PATTERN RECOGNITION 166 (2025) . |
APA | Ke, Xiao , Cai, Yuhang , Chen, Baitao , Liu, Hao , Guo, Wenzhong . Multi-granularity interaction and feature recombination network for fine-grained visual classification . | PATTERN RECOGNITION , 2025 , 166 . |
Abstract :
Accurate polyp segmentation is crucial for early diagnosis and treatment of colorectal cancer. This is a challenging task for three main reasons: (i) the problem of model overfitting and weak generalization due to the multi-center distribution of data; (ii) the problem of inter-class ambiguity caused by motion blur and overexposure to endoscopic light; and (iii) the problem of intra-class inconsistency caused by the variety of morphologies and sizes of the same type of polyps. To address these challenges, we propose a new high-precision polyp segmentation framework, MEFA-Net, which consists of three modules: the plug-and-play Mask Enhancement Module (MEG), the Separable Path Attention Enhancement Module (SPAE), and the Dynamic Global Attention Pool Module (DGAP). Specifically, the MEG module regionally masks the high-energy regions of the environment and polyps, which guides the model to rely on only a small amount of information to distinguish between polyp and background features, preventing the model from overfitting to environmental information and improving its robustness. At the same time, this module can effectively counteract the 'dark corner phenomenon' in the dataset and further improve the generalization performance of the model. Next, the SPAE module effectively alleviates the inter-class ambiguity problem by strengthening the feature expression. Then, the DGAP module solves the intra-class inconsistency problem by extracting invariance of scale, shape and position. Finally, we propose a new evaluation metric, MultiColoScore, for comprehensively evaluating the segmentation performance of the model on five datasets from different domains. We evaluated the new method quantitatively and qualitatively on five datasets using four metrics. Experimental results show that MEFA-Net significantly improves the accuracy of polyp segmentation and outperforms current state-of-the-art algorithms. Code is available at https://github.com/847001315/MEFA-Net.
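A minimal sketch of a mask-style augmentation in the spirit of the mask-enhancement idea described above: the brightest (high-energy) pixels are suppressed so the model cannot lean on exposure or environment cues. The function name and the brightness-quantile rule are assumptions for illustration, not the authors' code.

```python
# Illustrative high-energy masking augmentation for endoscopic images.
import torch

def mask_high_energy(img, keep_ratio=0.7):
    """Zero out the brightest (1 - keep_ratio) fraction of pixels per image."""
    # img: (B, C, H, W) in [0, 1]
    energy = img.mean(dim=1, keepdim=True)                         # per-pixel brightness
    thresh = torch.quantile(energy.flatten(1), keep_ratio, dim=1)  # per-image threshold, (B,)
    mask = (energy <= thresh.view(-1, 1, 1, 1)).float()
    return img * mask

batch = torch.rand(2, 3, 256, 256)
masked = mask_high_energy(batch)
print(masked.shape, (masked == 0).float().mean().item())  # ~30% of pixels suppressed
```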
Keyword :
Endoscopy; Image coding; Image segmentation; Risk assessment
Cite:
GB/T 7714 | Ke, Xiao , Chen, Guanhong , Liu, Hao et al. MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation [J]. | Computers in Biology and Medicine , 2025 , 186 . |
MLA | Ke, Xiao et al. "MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation" . | Computers in Biology and Medicine 186 (2025) . |
APA | Ke, Xiao , Chen, Guanhong , Liu, Hao , Guo, Wenzhong . MEFA-Net: A mask enhanced feature aggregation network for polyp segmentation . | Computers in Biology and Medicine , 2025 , 186 . |
Abstract :
3D anomaly detection aims to solve the problem that image anomaly detection is greatly affected by lighting conditions. As commercial confidentiality and personal privacy become increasingly paramount, access to training samples is often restricted. To address these challenges, we propose a zero-shot 3D anomaly detection method. Unlike previous CLIP-based methods, the proposed method does not require any prompt and is capable of detecting anomalies on the depth modality. Furthermore, we also propose a pre-trained structural rerouting strategy, which modifies the transformer without retraining or fine-tuning for the anomaly detection task. Most importantly, this paper proposes an online voter mechanism that registers voters and performs majority voter scoring in a one-stage, zero-start and growth-oriented manner, enabling direct anomaly detection on unlabeled test sets. Finally, we also propose a confirmatory judge credibility assessment mechanism, which provides an efficient adaptation for possible few-shot conditions. Results on datasets such as MVTec3D-AD demonstrate that the proposed method can achieve superior zero-shot 3D anomaly detection performance, indicating its pioneering contributions within the pertinent domain.
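The sketch below only illustrates the general flavor of an online, zero-start, growing "voter" memory: features from already-seen test samples are registered as voters, and each new sample is scored by its distance to its nearest voters. The class name, the k-NN distance scoring, and the unconditional registration step are assumptions for illustration and are not the paper's exact mechanism.

```python
# Hypothetical online voter bank for label-free anomaly scoring on a test stream.
import torch

class OnlineVoterBank:
    def __init__(self, k=3):
        self.voters = None  # (M, D) registered feature vectors, grows over time
        self.k = k

    def score(self, feats):
        """feats: (N, D) patch features of one sample -> scalar anomaly score."""
        if self.voters is None:
            return torch.tensor(0.0)                    # nothing to compare against yet
        d = torch.cdist(feats, self.voters)             # (N, M) pairwise distances
        k = min(self.k, self.voters.shape[0])
        knn = d.topk(k, dim=1, largest=False).values    # distance to k nearest voters
        return knn.mean(dim=1).max()                    # worst patch drives the score

    def register(self, feats):
        self.voters = feats if self.voters is None else torch.cat([self.voters, feats])

bank = OnlineVoterBank()
for _ in range(5):
    sample = torch.randn(64, 128)   # stand-in for depth/point-cloud patch features
    print(float(bank.score(sample)))
    bank.register(sample)
```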
Keyword :
Anomaly detection; Multimodal; Online voter mechanism; Pretrained model; Zero-shot
Cite:
GB/T 7714 | Zheng, Wukun , Ke, Xiao , Guo, Wenzhong . Zero-shot 3D anomaly detection via online voter mechanism [J]. | NEURAL NETWORKS , 2025 , 187 . |
MLA | Zheng, Wukun et al. "Zero-shot 3D anomaly detection via online voter mechanism" . | NEURAL NETWORKS 187 (2025) . |
APA | Zheng, Wukun , Ke, Xiao , Guo, Wenzhong . Zero-shot 3D anomaly detection via online voter mechanism . | NEURAL NETWORKS , 2025 , 187 . |
Abstract :
Action quality assessment (AQA) is a challenging vision task that requires discerning and quantifying subtle differences in actions from the same class. While recent research has made strides in creating fine-grained annotations for more precise analysis, existing methods primarily focus on coarse action segmentation, leading to limited identification of discriminative action frames. To address this issue, we propose a Vision-Language Action Knowledge Learning approach for action quality assessment, along with a multi-grained alignment framework to understand different levels of action knowledge. In our framework, prior knowledge, such as specialized terminology, is embedded into video-level, stage-level, and frame-level representations via CLIP. We further propose a new semantic-aware collaborative attention module to prevent confusing interactions and preserve textual knowledge in cross-modal and cross-semantic spaces. Specifically, we leverage the powerful cross-modal knowledge of CLIP to embed textual semantics into image features, which then guide action spatial-temporal representations. Our approach can be plugged into existing AQA methods, with or without frame-wise annotations. Extensive experiments and ablation studies show that our approach achieves state-of-the-art results on four public short- and long-term AQA benchmarks: FineDiving, MTL-AQA, JIGSAWS, and Fis-V.
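A hedged sketch of the general text-guides-vision pattern described above: text embeddings (e.g., action terminology encoded by CLIP's text encoder) attend over per-frame features so that language semantics steer the spatio-temporal representation. The module name and the residual cross-attention design are illustrative assumptions, not the paper's semantic-aware collaborative attention.

```python
# Illustrative cross-modal guidance: text features as keys/values over frame queries.
import torch
import torch.nn as nn

class TextGuidedAttention(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats, text_feats):
        # frame_feats: (B, T, D) per-frame video features
        # text_feats:  (B, K, D) embeddings of K action-related phrases
        guided, _ = self.attn(query=frame_feats, key=text_feats, value=text_feats)
        return self.norm(frame_feats + guided)   # residual keeps the visual stream intact

module = TextGuidedAttention()
frames = torch.randn(2, 96, 512)   # stand-in for CLIP image features over 96 frames
phrases = torch.randn(2, 5, 512)   # stand-in for CLIP text features of 5 phrases
print(module(frames, phrases).shape)  # torch.Size([2, 96, 512])
```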
Keyword :
Action quality assessment; Semantic-aware learning; Vision-language pre-training
Cite:
GB/T 7714 | Xu, Huangbiao , Ke, Xiao , Li, Yuezhou et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment [J]. | COMPUTER VISION - ECCV 2024, PT XLII , 2025 , 15100 : 423-440 . |
MLA | Xu, Huangbiao et al. "Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment" . | COMPUTER VISION - ECCV 2024, PT XLII 15100 (2025) : 423-440 . |
APA | Xu, Huangbiao , Ke, Xiao , Li, Yuezhou , Xu, Rui , Wu, Huanqi , Lin, Xiaofeng et al. Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment . | COMPUTER VISION - ECCV 2024, PT XLII , 2025 , 15100 , 423-440 . |
Abstract :
Enhancing the quality of underwater images is of great significance to the development of underwater operations. Existing underwater image enhancement methods are usually trained on paired underwater and reference images; however, obtaining reference images corresponding to real underwater images is difficult in practice, whereas obtaining unpaired high-quality underwater images or on-land images is comparatively easy. In addition, existing underwater image enhancement methods struggle to handle diverse distortion types simultaneously. To avoid the dependence on paired training data, further reduce the difficulty of obtaining training data, and cope with the diverse distortion types of underwater images, this paper proposes an unpaired underwater image enhancement method based on a Frequency-Decomposed Generative Adversarial Network (FD-GAN), and on this basis designs a high/low-frequency dual-branch generator to reconstruct high-quality enhanced underwater images. Specifically, a feature-level wavelet transform is introduced to separate features into low-frequency and high-frequency parts, which are processed separately within a cycle-consistency generative adversarial network. The low-frequency branch adopts an encoder-decoder structure combined with a low-frequency attention mechanism to enhance image color and brightness, while the high-frequency branch uses parallel high-frequency attention mechanisms to enhance each high-frequency component, thereby restoring image details. Experimental results on multiple standard underwater image datasets show that, whether using unpaired high-quality underwater images alone or additionally introducing some on-land images, the proposed method can effectively generate high-quality enhanced underwater images, and its effectiveness and generalization are superior to current mainstream underwater image enhancement methods.
Keyword :
Wavelet transform; Underwater image enhancement; Attention mechanism; Generative adversarial network; High/low-frequency dual-branch generator
Cite:
GB/T 7714 | 牛玉贞 , 张凌昕 , 兰杰 et al. 基于分频式生成对抗网络的非成对水下图像增强 [J]. | 电子学报 , 2025 . |
MLA | 牛玉贞 et al. "基于分频式生成对抗网络的非成对水下图像增强" . | 电子学报 (2025) . |
APA | 牛玉贞 , 张凌昕 , 兰杰 , 许瑞 , 柯逍 . 基于分频式生成对抗网络的非成对水下图像增强 . | 电子学报 , 2025 . |
Abstract :
Enhancing the quality of underwater images is crucial for advancements in the fields of underwater exploration and underwater rescue. Existing underwater image enhancement methods typically rely on paired underwater images and reference images for training. However, obtaining corresponding reference images for underwater images is challenging in practice; in contrast, acquiring unpaired high-quality underwater images or images captured on land is relatively more straightforward. Furthermore, existing techniques for underwater image enhancement often struggle to address a variety of distortion types simultaneously. To avoid the reliance on paired training data, reduce the difficulty of acquiring training data, and effectively handle diverse types of underwater image distortions, in this paper we propose a novel unpaired underwater image enhancement method based on the frequency-decomposed generative adversarial network (FD-GAN). We design a dual-branch generator based on high and low frequencies to reconstruct high-quality underwater images. Specifically, a feature-level wavelet transform is introduced to separate the features into low-frequency and high-frequency parts. The separated features are then processed by a cycle-consistent generative adversarial network, so as to simultaneously enhance the color and luminance in the low-frequency component and the details in the high-frequency part. More specifically, the low-frequency branch employs an encoder-decoder structure with a low-frequency attention mechanism to enhance the color and brightness of the image. The high-frequency branch utilizes parallel high-frequency attention mechanisms to enhance the various high-frequency components, thereby restoring image details. Experimental results on multiple datasets show that the proposed method, trained with unpaired high-quality underwater images either alone or together with on-land images, can effectively generate high-quality enhanced underwater images, and that it is superior to state-of-the-art underwater image enhancement methods in terms of effectiveness and generalization.
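As a minimal illustration of the feature-level wavelet split that the dual-branch generator described above operates on, the sketch below performs a single-level Haar decomposition of a feature map into one low-frequency band (LL) and three high-frequency bands (LH, HL, HH). This is a generic Haar DWT, not the paper's implementation.

```python
# Feature-level Haar wavelet split into low- and high-frequency bands (illustrative).
import torch

def haar_dwt(feat):
    """feat: (B, C, H, W) with even H, W -> (LL, LH, HL, HH), each (B, C, H/2, W/2)."""
    a = feat[..., 0::2, 0::2]
    b = feat[..., 0::2, 1::2]
    c = feat[..., 1::2, 0::2]
    d = feat[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency content -> color/brightness branch
    lh = (-a - b + c + d) / 2  # horizontal details
    hl = (-a + b - c + d) / 2  # vertical details
    hh = (a - b - c + d) / 2   # diagonal details -> detail-restoration branch
    return ll, lh, hl, hh

x = torch.randn(1, 64, 128, 128)
ll, lh, hl, hh = haar_dwt(x)
print(ll.shape)  # torch.Size([1, 64, 64, 64])
```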
Keyword :
Color image processing; Image coding; Image compression; Image enhancement; Photointerpretation; Underwater photography; Wavelet decomposition
Cite:
GB/T 7714 | Niu, Yu-Zhen , Zhang, Ling-Xin , Lan, Jie et al. FD-GAN: Frequency-Decomposed Generative Adversarial Network for Unpaired Underwater Image Enhancement [J]. | Acta Electronica Sinica , 2025 , 53 (2) : 527-544 . |
MLA | Niu, Yu-Zhen et al. "FD-GAN: Frequency-Decomposed Generative Adversarial Network for Unpaired Underwater Image Enhancement" . | Acta Electronica Sinica 53 . 2 (2025) : 527-544 . |
APA | Niu, Yu-Zhen , Zhang, Ling-Xin , Lan, Jie , Xu, Rui , Ke, Xiao . FD-GAN: Frequency-Decomposed Generative Adversarial Network for Unpaired Underwater Image Enhancement . | Acta Electronica Sinica , 2025 , 53 (2) , 527-544 . |
Abstract :
The fair and objective assessment of performances and competitions is a common pursuit and challenge in human society. The application of computer vision technology offers hope for this purpose, but it still faces obstacles such as occlusion and motion blur. To address these hindrances, our DanceFix framework introduces a bidirectional spatial-temporal context optical flow correction (BOFC) method. This approach leverages the consistency and complementarity of motion information between two modalities: optical flow, which excels at pixel capture, and lightweight skeleton data. It enables the extraction of pixel-level motion changes and the correction of abnormal skeleton data. Furthermore, we propose a part-level dance dataset (Dancer Parts) and part-level motion feature extraction based on task decoupling (PETD), which decouples complex whole-body part tracking into fine-grained limb-level motion extraction, enhancing the confidence of temporal information and the accuracy of correction for abnormal data. Finally, we present the DNV dataset, which simulates fully neat group dance scenes and provides reliable labels and validation methods for the newly introduced group dance neatness assessment (GDNA). To the best of our knowledge, this is the first work to develop quantitative criteria for assessing limb and joint neatness in group dance. We conduct experiments on DNV and the video-based public JHMDB dataset. Our method effectively corrects abnormal skeleton points, can be flexibly embedded into existing pose estimation algorithms, and improves their accuracy.
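The sketch below shows only the generic idea of repairing unreliable keypoints with optical flow: joints from the previous frame are propagated along the flow field and substituted where the current detection has low confidence. Function and variable names, and the confidence-threshold rule, are hypothetical and not the BOFC method itself.

```python
# Illustrative flow-based correction of low-confidence skeleton keypoints.
import numpy as np

def correct_keypoints(prev_kps, cur_kps, cur_conf, flow, conf_thresh=0.3):
    """prev_kps, cur_kps: (J, 2) in (x, y); cur_conf: (J,); flow: (H, W, 2)."""
    corrected = cur_kps.copy()
    h, w = flow.shape[:2]
    for j, (x, y) in enumerate(prev_kps):
        if cur_conf[j] >= conf_thresh:
            continue                               # keep trusted detections
        xi, yi = int(np.clip(x, 0, w - 1)), int(np.clip(y, 0, h - 1))
        dx, dy = flow[yi, xi]                      # flow at the previous joint location
        corrected[j] = [x + dx, y + dy]            # propagate the joint along the flow
    return corrected

flow = np.random.randn(240, 320, 2).astype(np.float32)
prev_kps = np.random.rand(17, 2) * [320, 240]
cur_kps = np.random.rand(17, 2) * [320, 240]
cur_conf = np.random.rand(17)
print(correct_keypoints(prev_kps, cur_kps, cur_conf, flow).shape)  # (17, 2)
```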
Cite:
GB/T 7714 | Xu, Huangbiao , Ke, Xiao , Wu, Huanqi et al. DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose [C] . 2025 : 8869-8877 . |
MLA | Xu, Huangbiao et al. "DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose" . (2025) : 8869-8877 . |
APA | Xu, Huangbiao , Ke, Xiao , Wu, Huanqi , Xu, Rui , Li, Yuezhou , Xu, Peirong et al. DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose . (2025) : 8869-8877 . |
Abstract :
Image hiding aims to conceal one or more secret images within a cover image of the same resolution. Due to strict capacity requirements, image hiding is commonly called large-capacity steganography. In this paper, we propose StegFormer, a novel autoencoder-based image-hiding model. StegFormer can conceal one or multiple secret images within a cover image of the same resolution while preserving the high visual quality of the stego image. In addition, to mitigate the limitations of current steganographic models in real-world scenarios, we propose a normalizing training strategy and a restrict loss to improve the reliability of steganographic models under realistic conditions. Furthermore, we propose an efficient steganographic capacity expansion method to increase the capacity of steganography and enhance the efficiency of secret communication. Through this approach, we can increase the relative payload of StegFormer to 96 bits per pixel without any training strategy modifications. Experiments demonstrate that our StegFormer outperforms existing state-of-the-art (SOTA) models. In the case of single-image steganography, there is an improvement of more than 3 dB and 5 dB in PSNR for secret/recovery image pairs and cover/stego image pairs, respectively.
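A toy sketch of the generic autoencoder image-hiding setup the abstract describes: a hiding network maps a cover image plus N secret images to a stego image, and a reveal network recovers the secrets. With 8-bit RGB secrets, each hidden image corresponds to 24 bits per pixel, so four hidden images give the 96 bpp relative payload mentioned above. The tiny convolutional networks here are placeholders, not StegFormer.

```python
# Placeholder hide/reveal networks for autoencoder-based image hiding.
import torch
import torch.nn as nn

class TinyHider(nn.Module):
    def __init__(self, n_secrets=4):
        super().__init__()
        in_ch = 3 * (1 + n_secrets)                      # cover + N secrets, concatenated
        self.net = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, cover, secrets):
        return self.net(torch.cat([cover] + secrets, dim=1))  # stego image

class TinyRevealer(nn.Module):
    def __init__(self, n_secrets=4):
        super().__init__()
        self.n_secrets = n_secrets
        self.net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 3 * n_secrets, 3, padding=1), nn.Sigmoid())

    def forward(self, stego):
        return self.net(stego).chunk(self.n_secrets, dim=1)   # recovered secrets

cover = torch.rand(1, 3, 256, 256)
secrets = [torch.rand(1, 3, 256, 256) for _ in range(4)]
stego = TinyHider()(cover, secrets)
recovered = TinyRevealer()(stego)
print(stego.shape, len(recovered))  # torch.Size([1, 3, 256, 256]) 4
```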
Keyword :
Artificial intelligence; Image enhancement; Learning systems; Steganography
Cite:
GB/T 7714 | Ke, Xiao , Wu, Huanqi , Guo, Wenzhong . StegFormer: Rebuilding the Glory of Autoencoder-Based Steganography [C] . 2024 : 2723-2731 . |
MLA | Ke, Xiao et al. "StegFormer: Rebuilding the Glory of Autoencoder-Based Steganography" . (2024) : 2723-2731 . |
APA | Ke, Xiao , Wu, Huanqi , Guo, Wenzhong . StegFormer: Rebuilding the Glory of Autoencoder-Based Steganography . (2024) : 2723-2731 . |
Abstract :
Spatio-temporal action detection relies on learning both the spatial and temporal information of videos. Current state-of-the-art CNN-based action detectors adopt 2D CNN or 3D CNN architectures and have achieved remarkable results; however, due to the complexity of these network structures and the need to perceive spatio-temporal information, such methods typically run in a non-real-time, offline manner. The main challenges of spatio-temporal action detection lie in designing an efficient detection network architecture and effectively perceiving and fusing spatio-temporal features. Considering these issues, this paper proposes a real-time action detection method based on spatio-temporal cross-perception. The method first enhances temporal information by shuffling the frame order of the input video. Since a 2D or 3D backbone alone cannot effectively model spatio-temporal features, a multi-branch feature extraction network based on spatio-temporal cross-perception is proposed. To address the limited descriptive power of single-scale spatio-temporal features, a multi-scale attention network is proposed to learn long-term temporal dependencies and spatial context. For fusing features from the two different sources, temporal and spatial, a new motion-saliency enhanced fusion strategy is proposed, which encodes and cross-maps spatio-temporal information to guide the fusion between temporal and spatial features and highlight more discriminative spatio-temporal representations. Finally, action tubes are linked online from the frame-level detection results. The proposed method achieves 84.71% and 78.4% accuracy on the two spatio-temporal action datasets UCF101-24 and JHMDB-21, respectively, outperforming existing state-of-the-art methods while running at 73 frames per second. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper further proposes a key-frame optical-flow action detection method based on action representations, which avoids generating redundant optical flow and further improves action detection accuracy.
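The following sketch illustrates only the broad idea of fusing a temporal (clip-level) feature map with a spatial (key-frame) feature map through a motion-saliency gate, loosely in the spirit of the cross-perception fusion described above. The module name and the sigmoid-gated mixing rule are illustrative assumptions, not the paper's fusion strategy.

```python
# Illustrative motion-gated fusion of temporal (3D branch) and spatial (2D branch) features.
import torch
import torch.nn as nn

class MotionGatedFusion(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, temporal_feat, spatial_feat):
        # temporal_feat, spatial_feat: (B, C, H, W) from the 3D and 2D branches
        saliency = self.gate(temporal_feat)                       # where motion is salient
        return saliency * temporal_feat + (1 - saliency) * spatial_feat

fusion = MotionGatedFusion()
t_feat = torch.randn(2, 256, 14, 14)
s_feat = torch.randn(2, 256, 14, 14)
print(fusion(t_feat, s_feat).shape)  # torch.Size([2, 256, 14, 14])
```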
Keyword :
Multi-scale attention; Real-time action detection; Spatio-temporal cross-perception
Cite:
GB/T 7714 | 柯逍 , 缪欣 , 郭文忠 . 基于时空交叉感知的实时动作检测方法 [J]. | 电子学报 , 2024 . |
MLA | 柯逍 et al. "基于时空交叉感知的实时动作检测方法" . | 电子学报 (2024) . |
APA | 柯逍 , 缪欣 , 郭文忠 . 基于时空交叉感知的实时动作检测方法 . | 电子学报 , 2024 . |
Abstract :
The rising popularity of light field imaging underscores the pivotal role of image quality in user experience. However, evaluating the quality of light field images presents significant challenges owing to their high-dimensional nature. Current quality assessment methods for light field images predominantly rely on machine learning or statistical analysis, often overlooking the interdependence among pixels. To overcome this limitation, we propose an innovative approach that employs a universal backbone network and introduces a dual-task framework for feature extraction. Specifically, we integrate a staged "primary-secondary" hierarchical evaluation mode into the universal backbone network, enabling accurate quality score inference while preserving the intrinsic information of the original image. Our proposed approach reduces inference time by over 75% compared to existing methods, while simultaneously achieving state-of-the-art results on the evaluation metrics. By harnessing the efficiency of neural networks, our framework offers an effective solution for the quality assessment of light field images, providing superior accuracy and speed compared to current methodologies.
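A hedged sketch of a generic dual-task "primary-secondary" setup on a shared backbone: one head regresses the quality score while an auxiliary head reconstructs the input, so the shared features retain the original image information. The backbone, the reconstruction auxiliary task, and all names are assumptions for illustration, not the paper's exact design.

```python
# Illustrative dual-task quality assessment model with a shared backbone.
import torch
import torch.nn as nn

class DualTaskIQA(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.quality_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(64, 1))                           # primary task
        self.recon_head = nn.Sequential(nn.Conv2d(64, 3, 3, padding=1),
                                        nn.Upsample(scale_factor=4, mode="bilinear"))  # secondary task

    def forward(self, x):
        feat = self.backbone(x)
        return self.quality_head(feat), self.recon_head(feat)

model = DualTaskIQA()
sai = torch.rand(2, 3, 128, 128)   # e.g., a sub-aperture view of the light field
score, recon = model(sai)
print(score.shape, recon.shape)     # torch.Size([2, 1]) torch.Size([2, 3, 128, 128])
```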
Keyword :
Deep learning; Image quality assessment; Light field images; Multitasking mode
Cite:
GB/T 7714 | Guo, Wenzhong , Wang, Hanling , Ke, Xiao . Splitting the backbone: A novel hierarchical method for assessing light field image quality [J]. | OPTICS AND LASERS IN ENGINEERING , 2024 , 178 . |
MLA | Guo, Wenzhong et al. "Splitting the backbone: A novel hierarchical method for assessing light field image quality" . | OPTICS AND LASERS IN ENGINEERING 178 (2024) . |
APA | Guo, Wenzhong , Wang, Hanling , Ke, Xiao . Splitting the backbone: A novel hierarchical method for assessing light field image quality . | OPTICS AND LASERS IN ENGINEERING , 2024 , 178 . |