Research Output Search

Query: scholar name 陈飞 (Chen, Fei)
Parathyroid Gland Detection Based on Multi-Scale Weighted Fusion Attention Mechanism SCIE
Journal article | 2025, 14 (6) | ELECTRONICS

Abstract :

While deep learning techniques, such as Convolutional neural networks (CNNs), show significant potential in medical applications, real-time detection of parathyroid glands (PGs) during complex surgeries remains insufficiently explored, posing challenges for surgical accuracy and outcomes. Previous studies highlight the importance of leveraging prior knowledge, such as shape, for feature extraction in detection tasks. However, they fail to address the critical multi-scale variability of PG objects, resulting in suboptimal performance and efficiency. In this paper, we propose an end-to-end framework, MSWF-PGD, for Multi-Scale Weighted Fusion Parathyroid Gland Detection. To improve accuracy and efficiency, our approach extracts feature maps from convolutional layers at multiple scales and re-weights them using cluster-aware multi-scale alignment, considering diverse attributes such as the size, color, and position of PGs. Additionally, we introduce Multi-Scale Aggregation to enhance scale interactions and enable adaptive multi-scale feature fusion, providing precise and informative locality information for detection. Extensive comparative experiments and ablation studies on the parathyroid dataset (PGsdata) demonstrate the proposed framework's superiority in accuracy and real-time efficiency, outperforming state-of-the-art models such as RetinaNet, FCOS, and YOLOv8.

Keyword :

feature fusion; multi-scale features; object detection; parathyroid glands; prior information

Cite:

GB/T 7714: Liu, Wanling; Lu, Wenhuan; Li, Yijian et al. Parathyroid Gland Detection Based on Multi-Scale Weighted Fusion Attention Mechanism [J]. ELECTRONICS, 2025, 14 (6).
MLA: Liu, Wanling et al. "Parathyroid Gland Detection Based on Multi-Scale Weighted Fusion Attention Mechanism." ELECTRONICS 14.6 (2025).
APA: Liu, Wanling; Lu, Wenhuan; Li, Yijian; Chen, Fei; Jiang, Fan; Wei, Jianguo et al. Parathyroid Gland Detection Based on Multi-Scale Weighted Fusion Attention Mechanism. ELECTRONICS, 2025, 14 (6).

Version:

Parathyroid Gland Detection Based on Multi-Scale Weighted Fusion Attention Mechanism Scopus
Journal article | 2025, 14 (6) | Electronics (Switzerland)
EAN: Edge-Aware Network for Image Manipulation Localization SCIE
Journal article | 2025, 35 (2), 1591-1601 | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Abstract :

Image manipulation has sparked widespread concern due to its potential security threats on the Internet. The boundary between the authentic and manipulated region exhibits artifacts in image manipulation localization (IML). These artifacts are more pronounced in heterogeneous image splicing and homogeneous image copy-move manipulation, while they are more subtle in removal and inpainting manipulated images. However, existing methods for image manipulation detection tend to capture boundary artifacts via explicit edge features and have limitations in effectively addressing subtle artifacts. Besides, feature redundancy caused by the powerful feature extraction capability of large models may prevent accurate identification of manipulated artifacts, exhibiting a high false-positive rate. To solve these problems, we propose a novel edge-aware network (EAN) to capture boundary artifacts effectively. This network treats the image manipulation localization problem as a segmentation problem inside and outside the boundary. In EAN, we develop an edge-aware mechanism to refine implicit and explicit edge features by the interaction of adjacent features. This approach directs the encoder to prioritize the desired edge information. Also, we design a multi-feature fusion strategy combined with an improved attention mechanism to enhance key feature representation significantly for mitigating the effects of feature redundancy. We perform thorough experiments on diverse datasets, and the outcomes confirm the efficacy of the suggested approach, surpassing leading manipulation localization techniques in the majority of scenarios.

Keyword :

attention mechanism; Attention mechanisms; convolutional neural network; Discrete wavelet transforms; Feature extraction; feature fusion; Image edge detection; Image manipulation localization; Location awareness; Neural networks; Noise; Semantics; Splicing; Transformers

Cite:

GB/T 7714: Chen, Yun; Cheng, Hang; Wang, Haichou et al. EAN: Edge-Aware Network for Image Manipulation Localization [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (2): 1591-1601.
MLA: Chen, Yun et al. "EAN: Edge-Aware Network for Image Manipulation Localization." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 35.2 (2025): 1591-1601.
APA: Chen, Yun; Cheng, Hang; Wang, Haichou; Liu, Ximeng; Chen, Fei; Li, Fengyong et al. EAN: Edge-Aware Network for Image Manipulation Localization. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (2), 1591-1601.

Version:

EAN: Edge-Aware Network for Image Manipulation Localization EI
Journal article | 2025, 35 (2), 1591-1601 | IEEE Transactions on Circuits and Systems for Video Technology
EAN: Edge-Aware Network for Image Manipulation Localization Scopus
Journal article | 2024, 35 (2), 1591-1601 | IEEE Transactions on Circuits and Systems for Video Technology
Vision-language pre-training via modal interaction Scopus
Journal article | 2024, 156 | Pattern Recognition

Abstract :

Existing vision-language pre-training models typically extract region features and conduct fine-grained local alignment based on masked image/text completion or object detection methods. However, these models often design independent subtasks for different modalities, which may not adequately leverage interactions between modalities, requiring large datasets to achieve optimal performance. To address these limitations, this paper introduces a novel pre-training approach that facilitates fine-grained vision-language interaction. We propose two new subtasks — image filling and text filling — that utilize data from one modality to complete missing parts in another, enhancing the model's ability to integrate multi-modal information. A selector mechanism is also developed to minimize semantic overlap between modalities, thereby improving the efficiency and effectiveness of the pre-trained model. Our comprehensive experimental results demonstrate that our approach not only fosters better semantic associations among different modalities but also achieves state-of-the-art performance on downstream vision-language tasks with significantly smaller datasets. © 2024 Elsevier Ltd

Keyword :

Cross-modal; Image captioning; Partial auxiliary; Pre-training

Cite:

GB/T 7714: Cheng, H.; Ye, H.; Zhou, X. et al. Vision-language pre-training via modal interaction [J]. Pattern Recognition, 2024, 156.
MLA: Cheng, H. et al. "Vision-language pre-training via modal interaction." Pattern Recognition 156 (2024).
APA: Cheng, H.; Ye, H.; Zhou, X.; Liu, X.; Chen, F.; Wang, M. Vision-language pre-training via modal interaction. Pattern Recognition, 2024, 156.

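Fine-Grained Image Classification Fusing Object Localization and Heterogeneous Local Interactive Learning (融合目标定位与异构局部交互学习的细粒度图像分类)
Journal article | 2024, 50 (11), 2219-2230 | 自动化学报 (Acta Automatica Sinica)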

Abstract :

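Because fine-grained images exhibit small inter-class variance and large intra-class variance, existing classification algorithms focus only on extracting and learning representations of salient local features from a single image and ignore the heterogeneous local semantic discriminative information across multiple images. They therefore struggle to attend to the subtle details that distinguish categories, so the learned features lack sufficient discriminative power. This paper proposes a progressive network that learns information at different granularity levels of an image in a weakly supervised manner. First, an attention accumulation object localization module (AAOLM) is constructed, which performs integrated semantic object localization on a single image using attention information from different training epochs and feature extraction stages. Second, a multi-image heterogeneous local interactive graph module (HLIGM) is designed, which extracts the salient local region features of each image and, guided by category labels, builds a graph network among the local region features of multiple images, aggregating local features to strengthen the discriminative power of the representation. Finally, knowledge distillation feeds the optimization information produced by the HLIGM back to the backbone network, so that the backbone can directly extract highly discriminative features, avoiding the computational overhead of graph construction at test time. Experiments on multiple datasets demonstrate the effectiveness of the proposed method and show that it improves fine-grained classification accuracy.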

Keyword :

graph neural network; weakly supervised object localization; deep learning; knowledge distillation; fine-grained image classification

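Cite:

GB/T 7714: 陈权; 陈飞; 王衍根 et al. Fine-Grained Image Classification Fusing Object Localization and Heterogeneous Local Interactive Learning [J]. 自动化学报 (Acta Automatica Sinica), 2024, 50 (11): 2219-2230.
MLA: 陈权 et al. "Fine-Grained Image Classification Fusing Object Localization and Heterogeneous Local Interactive Learning." 自动化学报 (Acta Automatica Sinica) 50.11 (2024): 2219-2230.
APA: 陈权; 陈飞; 王衍根; 程航; 王美清. Fine-Grained Image Classification Fusing Object Localization and Heterogeneous Local Interactive Learning. 自动化学报 (Acta Automatica Sinica), 2024, 50 (11), 2219-2230.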

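Version:

Fine-Grained Image Classification Fusing Object Localization and Heterogeneous Local Interactive Learning (融合目标定位与异构局部交互学习的细粒度图像分类)
Journal article | 2024, 50 (11), 2219-2230 | 自动化学报 (Acta Automatica Sinica)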
Edge-Preserving Image Restoration Based on Pixel-Wise Reinforcement Learning (基于逐像素强化学习的边缘保持图像复原)
Journal article | 2024, 50 (12), 224-232 | 计算机工程 (Computer Engineering)

Abstract :

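High-intensity Gaussian noise often blurs or destroys the details and structure of an image, causing loss of edge information. To address this, an edge-preserving image restoration algorithm based on pixel-wise reinforcement learning is proposed. First, a pixel-level agent is constructed for each pixel, and a side window mean filter targeting edges is designed and added to the action space. All pixel-level agents share the parameters of the advantage actor-critic algorithm, so the model can simultaneously output the state transition probabilities at all positions and select an appropriate policy for state transition, thereby restoring the image. Second, coordinate attention is incorporated into the shared feature extraction network to focus on the global information of all pixel positions across feature channels while retaining positional embedding information. Then, to alleviate the sparse reward problem, an auxiliary loss based on graph Laplacian regularization is designed, which attends to the local smoothness of the image and penalizes locally non-smooth regions, helping the pixel-level agents learn the correct policy more effectively and achieve edge preservation. Experimental results show that the proposed algorithm achieves peak signal-to-noise ratios (PSNR) of 32.97 dB and 28.26 dB on the Middlebury2005 and MNIST datasets, respectively, improvements of 0.23 dB and 0.75 dB over the Pixel-RL algorithm, while the number of parameters and the total training time are reduced by 44.9% and 18.2%, respectively, effectively lowering model complexity while preserving edges.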

Keyword :

coordinate attention; image restoration; graph Laplacian; deep reinforcement learning; edge preservation; pixel-wise reinforcement learning

Cite:

GB/T 7714: 江敏; 陈飞; 程航 et al. Edge-Preserving Image Restoration Based on Pixel-Wise Reinforcement Learning [J]. 计算机工程 (Computer Engineering), 2024, 50 (12): 224-232.
MLA: 江敏 et al. "Edge-Preserving Image Restoration Based on Pixel-Wise Reinforcement Learning." 计算机工程 (Computer Engineering) 50.12 (2024): 224-232.
APA: 江敏; 陈飞; 程航; 王美清. Edge-Preserving Image Restoration Based on Pixel-Wise Reinforcement Learning. 计算机工程 (Computer Engineering), 2024, 50 (12), 224-232.

Intraoperative AI-assisted early prediction of parathyroid and ischemia alert in endoscopic thyroid surgery SCIE
Journal article | 2024, 46 (8), 1975-1987 | HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK

Abstract :

Background: The preservation of parathyroid glands is crucial in endoscopic thyroid surgery to prevent hypocalcemia and related complications. However, current methods for identifying and protecting these glands have limitations. We propose a novel technique that has the potential to improve the safety and efficacy of endoscopic thyroid surgery. Purpose: Our study aims to develop a deep learning model called PTAIR 2.0 (Parathyroid gland Artificial Intelligence Recognition) to enhance parathyroid gland recognition during endoscopic thyroidectomy. We compare its performance against traditional surgeon-based identification methods. Materials and methods: Parathyroid tissues were annotated in 32 428 images extracted from 838 endoscopic thyroidectomy videos, forming the internal training cohort. An external validation cohort comprised 54 full-length videos. Six candidate algorithms were evaluated to select the optimal one. We assessed the model's performance in terms of initial recognition time, identification duration, and recognition rate and compared it with the performance of surgeons. Results: Utilizing the YOLOX algorithm, we developed PTAIR 2.0, which demonstrated superior performance with an AP50 score of 92.1%. The YOLOX algorithm achieved a frame rate of 25.14 Hz, meeting real-time requirements. In the internal training cohort, PTAIR 2.0 achieved AP50 values of 94.1%, 98.9%, and 92.1% for parathyroid gland early prediction, identification, and ischemia alert, respectively. Additionally, in the external validation cohort, PTAIR outperformed both junior and senior surgeons in identifying and tracking parathyroid glands (p < 0.001). Conclusion: The AI-driven PTAIR 2.0 model significantly outperforms both senior and junior surgeons in parathyroid gland identification and ischemia alert during endoscopic thyroid surgery, offering potential for enhanced surgical precision and patient outcomes.

Keyword :

artificial intelligence; computer vision model; deep learning; endoscopic thyroid surgery; ischemia alert; parathyroid gland recognition

Cite:

GB/T 7714: Wang, Bo; Yu, Jia-Fan; Lin, Si-Ying et al. Intraoperative AI-assisted early prediction of parathyroid and ischemia alert in endoscopic thyroid surgery [J]. HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK, 2024, 46 (8): 1975-1987.
MLA: Wang, Bo et al. "Intraoperative AI-assisted early prediction of parathyroid and ischemia alert in endoscopic thyroid surgery." HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK 46.8 (2024): 1975-1987.
APA: Wang, Bo; Yu, Jia-Fan; Lin, Si-Ying; Li, Yi-Jian; Huang, Wen-Yu; Yan, Shou-Yi et al. Intraoperative AI-assisted early prediction of parathyroid and ischemia alert in endoscopic thyroid surgery. HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK, 2024, 46 (8), 1975-1987.

Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer SCIE
Journal article | 2024, 72, 744-761 | IEEE TRANSACTIONS ON SIGNAL PROCESSING

Abstract :

In the graph signal processing (GSP) literature, graph Laplacian regularizer (GLR) was used for signal restoration to promote piecewise smooth / constant reconstruction with respect to an underlying graph. However, for signals slowly varying across graph kernels, GLR suffers from an undesirable "staircase" effect. In this paper, focusing on manifold graphs (collections of uniform discrete samples on low-dimensional continuous manifolds), we generalize GLR to gradient graph Laplacian regularizer (GGLR) that promotes planar / piecewise planar (PWP) signal reconstruction. Specifically, for a graph endowed with sampling coordinates (e.g., 2D images, 3D point clouds), we first define a gradient operator, using which we construct a gradient graph for nodes' gradients in the sampling manifold space. This maps to a gradient-induced nodal graph (GNG) and a positive semi-definite (PSD) Laplacian matrix with planar signals as the 0 frequencies. For manifold graphs without explicit sampling coordinates, we propose a graph embedding method to obtain node coordinates via fast eigenvector computation. We derive the mean-square-error minimizing weight parameter for GGLR efficiently, trading off bias and variance of the signal estimate. Experimental results show that GGLR outperformed previous graph signal priors like GLR and graph total variation (GTV) in a range of graph signal restoration tasks.

Keyword :

graph embedding; Graph signal processing; graph smoothness priors; quadratic programming

Cite:

GB/T 7714: Chen, Fei; Cheung, Gene; Zhang, Xue. Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72: 744-761.
MLA: Chen, Fei et al. "Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer." IEEE TRANSACTIONS ON SIGNAL PROCESSING 72 (2024): 744-761.
APA: Chen, Fei; Cheung, Gene; Zhang, Xue. Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72, 744-761.

Version:

Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer EI
Journal article | 2024, 72, 744-761 | IEEE Transactions on Signal Processing
Manifold Graph Signal Restoration Using Gradient Graph Laplacian Regularizer Scopus
Journal article | 2024, 72, 744-761 | IEEE Transactions on Signal Processing
Lossless image steganography: Regard steganography as super-resolution SCIE SSCI
Journal article | 2024, 61 (4) | INFORMATION PROCESSING & MANAGEMENT

Abstract :

Image steganography attempts to imperceptibly hide the secret image within the cover image. Most of the existing deep learning-based steganography approaches have excelled in payload capacity, visual quality, and steganographic security. However, they are difficult to losslessly reconstruct secret images from stego images with relatively large payload capacity. Recently, although some studies have introduced invertible neural networks (INNs) to achieve large-capacity image steganography, these methods still cannot reconstruct the secret image losslessly due to the existence of lost information on the output side of the concealing network. We present an INN-based framework in this paper for lossless image steganography. Specifically, we regard image steganography as an image super-resolution task that converts low-resolution cover images to high-resolution stego images while hiding secret images. The feature dimension of the generated stego image matches the total dimension of the input secret and cover images, thereby eliminating the lost information. Besides, a bijective secret projection module is designed to transform various secret images into a latent variable that follows a simple distribution, improving the imperceptibility of the secret image. Comprehensive experiments indicate that the proposed framework achieves secure hiding and lossless extraction of the secret image.

Keyword :

Covert communication; Information security; Invertible neural networks; Lossless steganography

Cite:

GB/T 7714: Wang, Tingqiang; Cheng, Hang; Liu, Ximeng et al. Lossless image steganography: Regard steganography as super-resolution [J]. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (4).
MLA: Wang, Tingqiang et al. "Lossless image steganography: Regard steganography as super-resolution." INFORMATION PROCESSING & MANAGEMENT 61.4 (2024).
APA: Wang, Tingqiang; Cheng, Hang; Liu, Ximeng; Xu, Yongliang; Chen, Fei; Wang, Meiqing et al. Lossless image steganography: Regard steganography as super-resolution. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (4).

Version:

Lossless image steganography: Regard steganography as super-resolution Scopus
Journal article | 2024, 61 (4) | Information Processing and Management
Lossless image steganography: Regard steganography as super-resolution EI
Journal article | 2024, 61 (4) | Information Processing and Management
Lightweight Privacy-Preserving Feature Extraction for EEG Signals Under Edge Computing SCIE
Journal article | 2024, 11 (2), 2520-2533 | IEEE INTERNET OF THINGS JOURNAL
WoS CC Cited Count: 2

Abstract :

The health-related Internet of Things (IoT) plays an irreplaceable role in the collection, analysis, and transmission of medical data. As a device of the health-related IoT, the electroencephalogram (EEG) has long been a powerful tool for physiological and clinical brain research, which contains a wealth of personal information. Due to its rich computational/storage resources, cloud computing is a promising solution to extract the sophisticated feature of massive EEG signals in the age of big data. However, it needs to solve both response latency and privacy leakage. To reduce latency between users and servers while ensuring data privacy, we propose a privacy-preserving feature extraction scheme, called LightPyFE, for EEG signals in the edge computing environment. In this scheme, we design an outsourced computing toolkit, which allows the users to achieve a series of secure integer and floating-point computing operations. During the implementation, LightPyFE can ensure that the users just perform the encryption and decryption operations, where all computing tasks are outsourced to edge servers for specific processing. Theoretical analysis and experimental results have demonstrated that our scheme can successfully achieve privacy-preserving feature extraction for EEG signals, and is practical yet effective.

Keyword :

Additive secret sharing; edge computing; electroencephalogram (EEG) signal; Internet of Things (IoT); privacy-preserving

Cite:

GB/T 7714: Yan, Nazhao; Cheng, Hang; Liu, Ximeng et al. Lightweight Privacy-Preserving Feature Extraction for EEG Signals Under Edge Computing [J]. IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (2): 2520-2533.
MLA: Yan, Nazhao et al. "Lightweight Privacy-Preserving Feature Extraction for EEG Signals Under Edge Computing." IEEE INTERNET OF THINGS JOURNAL 11.2 (2024): 2520-2533.
APA: Yan, Nazhao; Cheng, Hang; Liu, Ximeng; Chen, Fei; Wang, Meiqing. Lightweight Privacy-Preserving Feature Extraction for EEG Signals Under Edge Computing. IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (2), 2520-2533.

Version:

Lightweight Privacy-Preserving Feature Extraction for EEG Signals under Edge Computing EI
Journal article | 2024, 11 (2), 2520-2533 | IEEE Internet of Things Journal
Lightweight Privacy-Preserving Feature Extraction for EEG Signals under Edge Computing Scopus
Journal article | 2023, 11 (2), 1-1 | IEEE Internet of Things Journal
Vision-language pre-training via modal interaction SCIE
Journal article | 2024, 156 | PATTERN RECOGNITION

Abstract :

Existing vision-language pre-training models typically extract region features and conduct fine-grained local alignment based on masked image/text completion or object detection methods. However, these models often design independent subtasks for different modalities, which may not adequately leverage interactions between modalities, requiring large datasets to achieve optimal performance. To address these limitations, this paper introduces a novel pre-training approach that facilitates fine-grained vision-language interaction. We propose two new subtasks - image filling and text filling - that utilize data from one modality to complete missing parts in another, enhancing the model's ability to integrate multi-modal information. A selector mechanism is also developed to minimize semantic overlap between modalities, thereby improving the efficiency and effectiveness of the pre-trained model. Our comprehensive experimental results demonstrate that our approach not only fosters better semantic associations among different modalities but also achieves state-of-the-art performance on downstream vision-language tasks with significantly smaller datasets.

Keyword :

Cross-modal; Image captioning; Partial auxiliary; Pre-training

Cite:

GB/T 7714: Cheng, Hang; Ye, Hehui; Zhou, Xiaofei et al. Vision-language pre-training via modal interaction [J]. PATTERN RECOGNITION, 2024, 156.
MLA: Cheng, Hang et al. "Vision-language pre-training via modal interaction." PATTERN RECOGNITION 156 (2024).
APA: Cheng, Hang; Ye, Hehui; Zhou, Xiaofei; Liu, Ximeng; Chen, Fei; Wang, Meiqing. Vision-language pre-training via modal interaction. PATTERN RECOGNITION, 2024, 156.

Version:

Vision-language pre-training via modal interaction EI
Journal article | 2024, 156 | Pattern Recognition
Vision-language pre-training via modal interaction Scopus
Journal article | 2024, 156 | Pattern Recognition