Query:
Scholar name: Lan Chengdong (兰诚栋)
Abstract :
The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods that skip intermediate viewpoints during compression and delivery and ultimately reconstruct them using Side Information (SInfo). Typically, depth maps are used to construct SInfo; however, these methods suffer from reconstruction inaccuracies and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of a Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SInfo. Additionally, we incorporate information from adjacent temporal and spatial viewpoints to further reduce SInfo redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and use a convolutional network to extract the latent code of a GAN as SInfo. At the decoder, we combine the SInfo and adjacent viewpoints to reconstruct intermediate views with the GAN generator. Specifically, we establish a joint encoder constraint on reconstruction cost and SInfo entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate a significant improvement in Rate-Distortion (RD) performance over state-of-the-art methods.
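The joint encoder constraint can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch rendering of the idea, assuming an MSE distortion term and a factorized-Gaussian entropy proxy for the SInfo rate; the network shapes, the entropy model, and the weight `lam` are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the joint "reconstruction cost + SInfo entropy"
# objective; architecture and entropy proxy are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EPIEncoder(nn.Module):
    """Toy CNN mapping a spatio-temporal EPI stack to a GAN latent code."""
    def __init__(self, in_ch=3, latent_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, epi):
        return self.fc(self.features(epi).flatten(1))

def joint_rd_loss(x_rec, x_gt, z, lam=0.01):
    """Distortion + lam * rate. The bitrate of the latent code (SInfo) is
    approximated by the differential entropy of a factorized Gaussian
    fitted over the batch of codes z (shape: batch x latent_dim)."""
    distortion = F.mse_loss(x_rec, x_gt)
    rate = 0.5 * torch.log(2 * torch.pi * torch.e * z.var(dim=0) + 1e-9).sum()
    return distortion + lam * rate
```

At the decoder, the code produced by such an encoder would be fed, together with the adjacent views, to the GAN generator that reconstructs the skipped viewpoints.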
Keyword :
Epipolar plane image; Generative adversarial network; Latent code learning; Multi-view video coding
Cite:
GB/T 7714: Lan, Chengdong, Yan, Hao, Luo, Cheng, et al. GAN-based multi-view video coding with spatio-temporal EPI reconstruction [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2025, 132.
MLA: Lan, Chengdong, et al. "GAN-based multi-view video coding with spatio-temporal EPI reconstruction." SIGNAL PROCESSING-IMAGE COMMUNICATION 132 (2025).
APA: Lan, Chengdong, Yan, Hao, Luo, Cheng, Zhao, Tiesong. GAN-based multi-view video coding with spatio-temporal EPI reconstruction. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2025, 132.
Abstract :
Learning-based point cloud compression has achieved great success in Rate-Distortion (RD) efficiency. Existing methods usually adopt a Variational AutoEncoder (VAE) network, which can lead to poor detail reconstruction and high computational complexity. To address these issues, we propose a Scale-adaptive Asymmetric Sparse Variational AutoEncoder (SAS-VAE). First, we develop an Asymmetric Multiscale Sparse Convolution (AMSC), which exploits multi-resolution branches to aggregate multiscale features at the encoder and excludes the symmetric feature-fusion branches at the decoder to control model complexity. Second, we design a Scale Adaptive Feature Refinement Structure (SAFRS) that adaptively adjusts the number of Feature Refinement Modules (FRMs), improving RD performance with acceptable computational overhead. Third, we implement our framework with AMSC and SAFRS and train it with an RD loss based on a Fine-grained Weighted Binary Cross-Entropy (FWBCE) function. Experimental results on the 8iVFB, Owlii, and MVUV datasets show that our method outperforms several popular methods, with a 90.0% time reduction and a 51.8% BD-BR saving compared with V-PCC. The code will be available soon at https://github.com/fancj2017/SAS-VAE.
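The encoder/decoder asymmetry of AMSC can be sketched as follows. The paper operates on sparse 3D tensors; for a self-contained illustration, plain dense `Conv3d` layers stand in for sparse convolutions, and the branch counts and kernel sizes are assumptions.

```python
# Dense-tensor sketch of the asymmetry idea in AMSC. The real model uses
# sparse 3D convolutions on voxelized point clouds; Conv3d stands in here.
import torch
import torch.nn as nn

class MultiscaleEncoderBlock(nn.Module):
    """Encoder side: parallel branches with different receptive fields,
    aggregated by addition to capture multiscale geometry."""
    def __init__(self, ch):
        super().__init__()
        self.b3 = nn.Conv3d(ch, ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv3d(ch, ch, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.relu(self.b3(x) + self.b5(x))

class AsymmetricDecoderBlock(nn.Module):
    """Decoder side: the symmetric fusion branch is deliberately dropped,
    trading a little capacity for much lower decoding complexity."""
    def __init__(self, ch):
        super().__init__()
        self.b3 = nn.Conv3d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.b3(x))
```

The design intuition is that multiscale aggregation pays off most at analysis time, while a single-branch decoder keeps reconstruction cheap.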
Keyword :
asymmetric multiscale sparse convolution; Convolution; Decoding; Feature extraction; Octrees; Point cloud compression; Rate-distortion; scale adaptive feature refinement structure; Three-dimensional displays; variational autoencoder
Cite:
GB/T 7714: Chen, Jian, Zhu, Yingtao, Huang, Wei, et al. Scale-Adaptive Asymmetric Sparse Variational AutoEncoder for Point Cloud Compression [J]. IEEE TRANSACTIONS ON BROADCASTING, 2024, 70 (3): 884-894.
MLA: Chen, Jian, et al. "Scale-Adaptive Asymmetric Sparse Variational AutoEncoder for Point Cloud Compression." IEEE TRANSACTIONS ON BROADCASTING 70.3 (2024): 884-894.
APA: Chen, Jian, Zhu, Yingtao, Huang, Wei, Lan, Chengdong, Zhao, Tiesong. Scale-Adaptive Asymmetric Sparse Variational AutoEncoder for Point Cloud Compression. IEEE TRANSACTIONS ON BROADCASTING, 2024, 70 (3), 884-894.
Abstract :
Convolutional neural networks are limited in adaptively capturing information because they use fixed-size kernels. Although decomposed large kernels provide a wide receptive field and achieve competitive performance with fewer parameters, they lack adaptability. We therefore propose the Dynamic Large Kernel Network (DLKN) for lightweight image super-resolution. Specifically, we design a basic convolutional block of feature aggregation groups, akin to the transformer architecture, comprising a dynamic large kernel attention block and a local feature enhancement block that can adaptively exploit information. In the dynamic large kernel attention block, we decompose the large kernel convolution into kernels with different sizes and dilation rates, then fuse their information for weight selection, dynamically adjusting the proportion of information drawn from different receptive fields. The local feature enhancement block significantly improves local feature extraction at a low parameter count; it encourages interaction between local spatial features by decomposing the convolution into horizontally and vertically cascaded kernels. Experimental results on benchmark datasets demonstrate that the proposed model achieves excellent performance in lightweight and performance-oriented super-resolution tasks, successfully balancing performance against model complexity. The code is available at https://github.com/LyTinGiu/DLKN_SR.
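A minimal sketch of the dynamic large-kernel attention idea: two decomposed depth-wise branches (one small kernel, one dilated kernel) whose outputs are mixed by a learned, input-dependent gate. The branch configuration below is an assumption; DLKN's exact decomposition may differ.

```python
# Hypothetical dynamic large-kernel attention: a softmax gate decides, per
# channel, how much each decomposed receptive field contributes.
import torch
import torch.nn as nn

class DynamicLargeKernelAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.small = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)
        # 7x7 kernel with dilation 3 covers a 19x19 field at depth-wise cost.
        self.large = nn.Conv2d(ch, ch, 7, padding=9, dilation=3, groups=ch)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, 2 * ch, 1),
        )

    def forward(self, x):
        s, l = self.small(x), self.large(x)
        # Gate over the two branches, per channel, from global context.
        w = self.gate(x).view(x.size(0), 2, -1, 1, 1).softmax(dim=1)
        return x * (w[:, 0] * s + w[:, 1] * l)  # attention re-weights input
```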
Keyword :
CNN; Image processing; Large kernel convolution; Super-resolution
Cite:
GB/T 7714: Liu, YaTing, Lan, ChengDong, Feng, Wanjian. DLKN: enhanced lightweight image super-resolution with dynamic large kernel network [J]. VISUAL COMPUTER, 2024, 41 (5): 3627-3644.
MLA: Liu, YaTing, et al. "DLKN: enhanced lightweight image super-resolution with dynamic large kernel network." VISUAL COMPUTER 41.5 (2024): 3627-3644.
APA: Liu, YaTing, Lan, ChengDong, Feng, Wanjian. DLKN: enhanced lightweight image super-resolution with dynamic large kernel network. VISUAL COMPUTER, 2024, 41 (5), 3627-3644.
Abstract :
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade visual quality. Subjective and objective measures capable of identifying and quantifying the various types of PEAs are critical for improving visual quality. In this letter, we investigate the influence on video quality of four spatial PEAs (blurring, blocking, bleeding, and ringing) and two temporal PEAs (flickering and floating). For spatial artifacts, we propose a visual saliency model with low computational cost and higher consistency with human visual perception. For temporal artifacts, the self-attention-based TimeSformer is improved to detect them. Based on these six types of PEAs, we propose a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe SSTAM will be beneficial for optimizing video coding techniques.
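The abstract does not give the fusion rule, but the saliency-aware weighting can be illustrated with a hypothetical pooling step: spatial artifact maps are pooled under a normalized saliency map and combined linearly with the temporal artifact scores. Everything below (array layouts, the linear combiner) is an assumption for illustration only.

```python
# Hypothetical saliency-weighted fusion of per-artifact scores.
import numpy as np

def sstam_score(spatial_maps, saliency, temporal_scores, weights):
    """spatial_maps: (4, H, W) per-pixel strengths for blurring, blocking,
    bleeding, ringing; saliency: (H, W) non-negative saliency map;
    temporal_scores: (2,) flickering and floating scores;
    weights: (6,) assumed per-artifact importances."""
    sal = saliency / (saliency.sum() + 1e-9)        # normalize to a pmf
    pooled = (spatial_maps * sal).sum(axis=(1, 2))  # saliency-weighted pool
    features = np.concatenate([pooled, temporal_scores])
    return float(weights @ features)                # higher = more artifacts
```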
Keyword :
compression artifact; Perceivable Encoding Artifacts (PEAs); saliency detection; Video quality assessment
Cite:
GB/T 7714: Lin, Liqun, Zheng, Yang, Chen, Weiling, et al. Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30: 693-697.
MLA: Lin, Liqun, et al. "Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment." IEEE SIGNAL PROCESSING LETTERS 30 (2023): 693-697.
APA: Lin, Liqun, Zheng, Yang, Chen, Weiling, Lan, Chengdong, Zhao, Tiesong. Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment. IEEE SIGNAL PROCESSING LETTERS, 2023, 30, 693-697.
Abstract :
The Feature Pyramid Network (FPN) is a typical detector component commonly used to address object detection at different scales. However, the lateral connections in FPN lose feature information because they reduce the number of feature channels, and top-down fusion weakens the feature representation during feature delivery because the fused features carry different semantic information. In this paper, we propose a feature pyramid network with a channel and content adaptive feature enhancement module (CCA-FPN), which uses a Channel Adaptive Guided Mechanism module (CAGM) and a Multi-scale Content Adaptive Feature Enhancement Module (MCAFEM) to alleviate these problems. We conduct comprehensive experiments on the MS COCO dataset. By replacing FPN with CCA-FPN in ATSS, our model achieves 1.3 percentage points higher Average Precision (AP) with a ResNet50 backbone. Furthermore, CCA-FPN achieves 0.3 percentage points higher AP than AugFPN, the state-of-the-art FPN-based detector.
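One plausible reading of CAGM, sketched below under that assumption: global channel statistics re-weight the backbone features before the 1x1 lateral reduction, so informative channels survive the channel cut. This is an illustration of the guidance idea, not the paper's exact module.

```python
# Hypothetical channel-adaptive guided lateral connection.
import torch.nn as nn

class CAGM(nn.Module):
    def __init__(self, c_in, c_out=256):
        super().__init__()
        # Squeeze global context into per-channel gates in [0, 1].
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_in, c_in, 1),
            nn.Sigmoid(),
        )
        # Standard FPN lateral 1x1 reduction to the pyramid width.
        self.lateral = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.lateral(x * self.attn(x))
```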
Keyword :
Channel and content adaptive; Feature enhancement module; Feature pyramid network; Object detection
Cite:
GB/T 7714: Ye, Zhiyang, Lan, Chengdong, Zou, Min, et al. CCA-FPN: Channel and content adaptive object detection [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95.
MLA: Ye, Zhiyang, et al. "CCA-FPN: Channel and content adaptive object detection." JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION 95 (2023).
APA: Ye, Zhiyang, Lan, Chengdong, Zou, Min, Qiu, Xu, Chen, Jian. CCA-FPN: Channel and content adaptive object detection. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95.
Abstract :
Panoramic video technology has advanced significantly in recent years, providing users with an immersive experience by displaying the entire 360° spherical scene centered on their virtual location. However, because its data volume is larger than that of traditional video formats, transmitting high-quality panoramic video requires more bandwidth. Notably, users do not see the whole 360° content at once, but only the portion within their viewport. To save bandwidth, viewport-based adaptive streaming, which transmits in high quality only the viewports of interest to the user, has become an important technology, making the accuracy of viewport prediction crucial. However, viewport prediction performance degrades significantly as the prediction window grows. To address this issue, we propose an effective self-attention viewport prediction model based on a distance constraint. First, by analyzing existing viewport trajectory datasets, we observe both randomness and continuity in viewport trajectories. Second, to handle the randomness, we design a viewport prediction model based on a self-attention mechanism that provides more trajectory information for long inputs. Third, to preserve the continuity of the predicted trajectory, we modify the loss function with a distance constraint that reduces discontinuities in the prediction results. Finally, experimental results on real viewport trajectory datasets show that the proposed algorithm achieves higher prediction accuracy and stability than state-of-the-art models.
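The distance-constrained loss can be sketched directly. Assuming viewport positions as (yaw, pitch) pairs and an MSE prediction term, the continuity term below penalizes large frame-to-frame jumps in the predicted trajectory; the weight `alpha` and the exact form of the constraint are assumptions, not the paper's formulation.

```python
# Hypothetical distance-constrained trajectory loss.
import torch
import torch.nn.functional as F

def distance_constrained_loss(pred, target, alpha=0.1):
    """pred, target: (batch, steps, 2) predicted / ground-truth viewport
    coordinates. Adds a continuity penalty on consecutive displacements."""
    mse = F.mse_loss(pred, target)
    step = pred[:, 1:] - pred[:, :-1]       # frame-to-frame displacement
    continuity = step.norm(dim=-1).mean()   # mean jump distance
    return mse + alpha * continuity
```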
Keyword :
Distance constraints; Panoramic video; Self-attention; Viewport prediction
Cite:
GB/T 7714: Lan, ChengDong, Qiu, Xu, Miao, Chenqi, et al. A self-attention model for viewport prediction based on distance constraint [J]. VISUAL COMPUTER, 2023, 40 (9): 5997-6014.
MLA: Lan, ChengDong, et al. "A self-attention model for viewport prediction based on distance constraint." VISUAL COMPUTER 40.9 (2023): 5997-6014.
APA: Lan, ChengDong, Qiu, Xu, Miao, Chenqi, Zheng, MengTing. A self-attention model for viewport prediction based on distance constraint. VISUAL COMPUTER, 2023, 40 (9), 5997-6014.
Abstract :
An effective stream adaptation method for stereoscopic panoramic video transmission is currently missing: applying the traditional panoramic adaptive streaming strategy to binocular stereoscopic panoramic video doubles the transmitted data and requires huge bandwidth. This paper proposes a multi-agent reinforcement learning based asymmetric-transmission adaptive streaming method for stereoscopic panoramic video that copes with limited and fluctuating network bandwidth in real time. First, because the human eye prefers the salient regions of a video, each tile in the left and right views of a stereoscopic video contributes differently to perceptual quality, so a tile-based method is proposed for predicting the viewing probability of the left and right views. Second, a multi-agent reinforcement learning framework based on the Actor-Critic (policy-value) architecture is designed for joint rate control of the left and right views. Finally, a reward function is designed around the model structure and the principle of binocular suppression. Experimental results show that the proposed method is better suited to tile-based stereoscopic panoramic video transmission than traditional adaptive streaming strategies, offering a novel approach to joint rate control and improved user Quality of Experience (QoE) under limited bandwidth.
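A hypothetical shape for the reward, assuming per-chunk quality values for the two views: binocular suppression implies perception is dominated by the higher-quality view, so the blend below favors max(qL, qR), with rebuffering and quality-switch penalties as in standard ABR reward designs. All names and coefficients are illustrative, not the paper's.

```python
# Assumed binocular-suppression-aware reward for the two rate-control agents.
def binocular_reward(q_left, q_right, rebuffer_s, quality_switch,
                     beta=0.7, mu=4.0, nu=1.0):
    """q_left, q_right: per-chunk quality of each view; rebuffer_s: stall
    time in seconds; quality_switch: magnitude of quality change between
    consecutive chunks. beta > 0.5 lets the better eye dominate."""
    perceived = beta * max(q_left, q_right) + (1 - beta) * min(q_left, q_right)
    return perceived - mu * rebuffer_s - nu * quality_switch
```

Under this shape, asymmetric transmission is rewarded: lowering one view's rate costs little perceived quality while freeing bandwidth for the other.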
Keyword :
Bandwidth; Image communication systems; Multi agent systems; Quality control; Quality of service; Reinforcement learning; Stereo image processing; Video streaming
Cite:
GB/T 7714: Lan, Chengdong, Rao, Yingjie, Song, Caixia, et al. Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning [J]. Journal of Electronics and Information Technology, 2022, 44 (4): 1461-1468.
MLA: Lan, Chengdong, et al. "Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning." Journal of Electronics and Information Technology 44.4 (2022): 1461-1468.
APA: Lan, Chengdong, Rao, Yingjie, Song, Caixia, Chen, Jian. Adaptive Streaming of Stereoscopic Panoramic Video Based on Reinforcement Learning. Journal of Electronics and Information Technology, 2022, 44 (4), 1461-1468.
Abstract :
To reduce the amount of video data that must be captured, state-of-the-art Image-Based Rendering (IBR) methods map dense viewpoint information to the original signal of a compressed sensing framework and treat sparse viewpoint images as random measurements. However, the low-dimensional measurement signal is a linear combination of all dense viewpoint information, whereas the sparse viewpoint images originate from only part of the viewpoints, so the captured sparse-view images are inconsistent with the low-dimensional measurements. This paper proposes using an interval sampling matrix to eliminate the positional discrepancy between the measurement signal and the sparse viewpoint images, and further constrains the sensing matrix, composed of the measurement matrix and the basis functions, to satisfy the Restricted Isometry Property (RIP) as closely as possible, so that a unique and accurate solution of the original signal can be obtained. Simulation results show that, compared with state-of-the-art methods, the proposed method improves both subjective and objective reconstruction quality for scenes of varying complexity.
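A toy version of the interval sampling idea: the sampling matrix selects every k-th dense viewpoint, so the measurement vector coincides with the views that were actually captured, and the sensing matrix is its product with a sparsifying basis. The viewpoint counts and the DCT basis below are assumptions for illustration.

```python
# Interval sampling matrix and sensing matrix for a compressed-sensing view
# of multi-view capture; the paper's basis and sizes may differ.
import numpy as np
from scipy.fft import dct

def interval_sampling_matrix(n_dense, n_sparse):
    """Selection matrix keeping one dense viewpoint per interval of k."""
    k = n_dense // n_sparse
    phi = np.zeros((n_sparse, n_dense))
    for i in range(n_sparse):
        phi[i, i * k] = 1.0
    return phi

psi = dct(np.eye(64), norm='ortho')          # sparsifying basis (DCT)
A = interval_sampling_matrix(64, 16) @ psi   # sensing matrix Phi @ Psi
# Recovery then seeks the sparsest coefficients c with A @ c matching the
# 16 captured views; the paper constrains A to approximately satisfy RIP.
```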
Keyword :
Compressed sensing; Image-based rendering; Multi-view image reconstruction; Epipolar plane image
Cite:
GB/T 7714: 兰诚栋, 林宇鹏, 方大锐, et al. 多视点稀疏测量的图像绘制方法 [J]. 自动化学报, 2021, 47 (4): 882-890.
MLA: 兰诚栋, et al. "多视点稀疏测量的图像绘制方法." 自动化学报 47.4 (2021): 882-890.
APA: 兰诚栋, 林宇鹏, 方大锐, 陈建. 多视点稀疏测量的图像绘制方法. 自动化学报, 2021, 47 (4), 882-890.
Abstract :
Multi-view video contains a huge amount of data, which poses enormous challenges to its compression, storage, and transmission. Transmitting only part of the viewpoint information and reconstructing the original multi-viewpoint information from it is a common solution. Existing approaches rely on pixel matching to obtain the correlation between adjacent viewpoint images; however, pixels cannot express the invariance of image features and are susceptible to noise. To overcome these problems, a VGG network is used to extract high-dimensional features from the images, capturing the relevance of adjacent views, and a GAN is further used to generate virtual viewpoint images more accurately. We extract the lines at the same positions across viewpoints as local areas for image merging and input the merged local images into the network. At the reconstruction stage, local images of dense viewpoints are generated with the GAN. Experiments on multiple test sequences show that the proposed method achieves a 0.2-0.8 dB PSNR gain and a 0.15-0.61 MOS improvement over the traditional method.
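The line-merging step is straightforward to sketch: the same scan-line is taken from every sparse viewpoint and stacked into an epipolar-plane image (EPI), which is the local region the GAN then densifies. The array layout below is an assumption.

```python
# Build an epipolar-plane image from a stack of sparse viewpoints.
import numpy as np

def build_epi(views, row):
    """views: (n_views, H, W, 3) images from sparse viewpoints;
    returns the EPI at scan-line `row`, shape (n_views, W, 3)."""
    return np.stack([v[row] for v in views])
```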
Keyword :
EPI; Hybrid resolution; Multi-view video; SRGAN; Virtual view reconstruction
Cite:
GB/T 7714: Li, Song, Lan, Chengdong, Zhao, Tiesong. Reconstruction of Multi-view Video Based on GAN [C]. 2018: 618-629.
MLA: Li, Song, et al. "Reconstruction of Multi-view Video Based on GAN." (2018): 618-629.
APA: Li, Song, Lan, Chengdong, Zhao, Tiesong. Reconstruction of Multi-view Video Based on GAN. (2018): 618-629.