Publication Search

Query:

Scholar name: 黄世震

SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA SCIE
Journal Article | 2024, 32(4), 2310-2322 | ELECTRONIC RESEARCH ARCHIVE

Abstract :

Graph convolution networks (GCN) have demonstrated success in learning graph structures; however, they are limited in inductive tasks. Graph attention networks (GAT) were proposed to address the limitations of GCN and have shown high performance in graph-based tasks. Despite this success, GAT faces challenges in hardware acceleration, including: 1) the GAT algorithm has difficulty adapting to hardware; 2) sparse matrix multiplication (SPMM) is difficult to implement efficiently; and 3) irregular memory accesses cause complex addressing and pipeline stalls. To this end, this paper proposed SH-GAT, an FPGA-based GAT accelerator that achieves more efficient GAT inference. The proposed approach employed several optimizations to enhance GAT performance. First, this work optimized the GAT algorithm using split weights and softmax approximation to make it more hardware-friendly. Second, a load-balanced SPMM kernel was designed to fully leverage potential parallelism and improve data throughput. Lastly, data preprocessing was performed by pre-fetching the source node and its neighbor nodes, effectively addressing the pipeline stall and complex addressing issues arising from irregular memory access. SH-GAT was evaluated on the Xilinx FPGA Alveo U280 accelerator card with three popular datasets. Compared to existing CPU, GPU, and state-of-the-art (SOTA) FPGA-based accelerators, SH-GAT achieves speedups of up to 3283x, 13x, and 2.3x, respectively.
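
The abstract does not spell out the softmax approximation it uses; as a hedged illustration of the kind of hardware-friendly softmax it alludes to, the sketch below (an assumption, not the paper's method) replaces e^x with a power of two so the exponential maps onto shift logic:

```python
import numpy as np

def softmax_base2(scores):
    """Hardware-friendly softmax sketch: subtract the row max for numerical
    stability, then use 2**x instead of e**x so the exponential can be
    realized with shifts on an FPGA. (Illustrative only; the exact
    approximation used in SH-GAT is not given in the abstract.)"""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    pow2 = np.exp2(shifted)                      # 2**x instead of e**x
    return pow2 / pow2.sum(axis=-1, keepdims=True)

# Attention scores of one destination node over four neighbours
print(softmax_base2(np.array([2.0, 1.0, 0.5, -1.0])))
```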

Keyword :

accelerator; co-design; FPGA; graph; graph attention networks

Cite:


GB/T 7714 Wang, Renping, Li, Shun, Tang, Enhao, et al. SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA [J]. ELECTRONIC RESEARCH ARCHIVE, 2024, 32(4): 2310-2322.
MLA Wang, Renping, et al. "SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA." ELECTRONIC RESEARCH ARCHIVE 32.4 (2024): 2310-2322.
APA Wang, Renping, Li, Shun, Tang, Enhao, Lan, Sen, Liu, Yajing, Yang, Jing, et al. SH-GAT: Software-hardware co-design for accelerating graph attention networks on FPGA. ELECTRONIC RESEARCH ARCHIVE, 2024, 32(4), 2310-2322.

Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors SCIE
Journal Article | 2023, 20(2), 2588-2608 | MATHEMATICAL BIOSCIENCES AND ENGINEERING

Abstract :

G protein-coupled receptors (GPCRs) have been the targets for more than 40% of the currently approved drugs. Although neural networks can effectively improve the accuracy of biological activity prediction, the results are undesirable on the limited orphan GPCR (oGPCR) datasets. To this end, we proposed Multi-source Transfer Learning with Graph Neural Network, called MSTL-GNN, to bridge this gap. Firstly, there are three ideal sources of data for transfer learning: oGPCRs, experimentally validated GPCRs, and invalidated GPCRs similar to the former. Secondly, GPCR ligands in SMILES format are converted to graphs, which serve as the input of the Graph Neural Network (GNN) and ensemble learning for improving prediction accuracy. Finally, our experiments show that MSTL-GNN remarkably improves the prediction of GPCR ligand activity values compared with previous studies. On average, the two evaluation indexes we adopted, R2 and root-mean-square error (RMSE), improved by up to 67.13% and 17.22%, respectively, compared with the state-of-the-art work. The effectiveness of MSTL-GNN in the field of GPCR drug discovery with limited data also paves the way for other similar application scenarios.
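
For reference, the two evaluation indexes named above can be computed as follows; the activity values in the example are hypothetical, not taken from the paper:

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination R^2 and root-mean-square error (RMSE),
    the two metrics the abstract reports."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r2, rmse

# Hypothetical measured vs. predicted ligand activity values
print(r2_and_rmse([6.1, 7.4, 5.2, 8.0], [6.0, 7.1, 5.6, 7.8]))
```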

Keyword :

biological activity; G protein-coupled receptors (GPCRs); Graph Neural Network; multi-source transfer learning

Cite:


GB/T 7714 Huang, Shizhen, Zheng, ShaoDong, Chen, Ruiqi. Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors [J]. MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20(2): 2588-2608.
MLA Huang, Shizhen, et al. "Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors." MATHEMATICAL BIOSCIENCES AND ENGINEERING 20.2 (2023): 2588-2608.
APA Huang, Shizhen, Zheng, ShaoDong, Chen, Ruiqi. Multi-source transfer learning with Graph Neural Network for excellent modelling the bioactivities of ligands targeting orphan G protein-coupled receptors. MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20(2), 2588-2608.

H-GAT: A Hardware-Efficient Accelerator For Graph Attention Networks ESCI
Journal Article | 2023, 27(3), 2233-2240 | JOURNAL OF APPLIED SCIENCE AND ENGINEERING

Abstract :

Recently, Graph Attention Networks (GATs) have shown good performance for representation learning on graphs. Furthermore, GAT leverages the masked self-attention mechanism to obtain a more advanced feature representation than graph convolution networks (GCNs). However, GAT incurs large amounts of irregularity in computation and memory access, which prevents the efficient use of traditional neural network accelerators. Moreover, existing dedicated GAT accelerators demand high memory volumes and are difficult to implement onto resource-limited edge devices. Therefore, this paper proposes an FPGA-based accelerator, called H-GAT, which achieves excellent acceleration and energy efficiency in GAT inference. H-GAT decomposes the GAT operation into matrix multiplication and an activation function unit. We first design an effective and fully-pipelined PE for sparse matrix multiplication (SpMM) and dense matrix-vector multiplication (DMVM). Moreover, we optimize the softmax data flow so that the computational efficiency of softmax can be improved dramatically. We evaluate our design on a Xilinx Kintex-7 FPGA with three popular datasets. Compared to existing CPU, GPU, and state-of-the-art FPGA-based GAT accelerators, H-GAT achieves speedups of up to 585x, 2.7x, and 11x and increases power efficiency by up to 2095x, 173x, and 65x, respectively.
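
As a software reference for the SpMM kernel that such a PE pipelines, a minimal CSR-based sketch is shown below; sizes and values are illustrative and the hardware dataflow itself is not reproduced:

```python
import numpy as np
from scipy.sparse import csr_matrix

def spmm_csr(indptr, indices, data, dense):
    """Row-by-row sparse-matrix x dense-matrix product (SpMM) over a CSR
    adjacency. Software reference model only."""
    out = np.zeros((len(indptr) - 1, dense.shape[1]))
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            out[row] += data[k] * dense[indices[k]]  # accumulate neighbour features
    return out

adj = csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))
feats = np.arange(6, dtype=float).reshape(3, 2)
assert np.allclose(spmm_csr(adj.indptr, adj.indices, adj.data, feats), adj @ feats)
```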

Keyword :

FPGA; graph neural network; sparse-matrix-vector

Cite:


GB/T 7714 Huang, Shizhen, Tang, Enhao, Li, Shun. H-GAT: A Hardware-Efficient Accelerator For Graph Attention Networks [J]. JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2023, 27(3): 2233-2240.
MLA Huang, Shizhen, et al. "H-GAT: A Hardware-Efficient Accelerator For Graph Attention Networks." JOURNAL OF APPLIED SCIENCE AND ENGINEERING 27.3 (2023): 2233-2240.
APA Huang, Shizhen, Tang, Enhao, Li, Shun. H-GAT: A Hardware-Efficient Accelerator For Graph Attention Networks. JOURNAL OF APPLIED SCIENCE AND ENGINEERING, 2023, 27(3), 2233-2240.

Semantic segmentation of remote sensing images based on attention mechanism and feature fusion Scopus
Other | 2023, 12645

Abstract :

In this article, we design an algorithm that performs the segmentation task on remote sensing images by using an attention module and feature fusion to deal with the challenges of features with different dimensions, small objects, and uncertain boundary segmentation in remote sensing images. The encoder uses a PVTv2 model to increase the feature-information acquisition capability of the model, and the several convolutional paths of the feature aggregation module can acquire image information at different sizes. The deep network uses attention mechanisms to obtain features that are effective for the segmentation task. In the decoding phase, the feature fusion module fuses the shallow-network detail information with the deep-network semantic information. An experimental comparison on the ISPRS Potsdam dataset shows that the pixel accuracy of the proposed model reaches 88.59%, and its IoU, mIoU, and FWIoU are higher than those of several classical semantic segmentation networks, which effectively increases the accuracy of remote sensing image segmentation. © 2023 SPIE.
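
For reference, the four reported metrics can be computed from a class confusion matrix as follows; the numbers below are toy values, not the Potsdam results:

```python
import numpy as np

def segmentation_metrics(conf):
    """Pixel accuracy, per-class IoU, mIoU and FWIoU from a confusion matrix
    whose rows are ground-truth classes and columns are predictions."""
    conf = np.asarray(conf, float)
    tp = np.diag(conf)
    pa = tp.sum() / conf.sum()
    iou = tp / (conf.sum(1) + conf.sum(0) - tp)
    miou = iou.mean()
    freq = conf.sum(1) / conf.sum()
    fwiou = (freq * iou).sum()
    return pa, iou, miou, fwiou

# Toy 3-class confusion matrix (e.g., building / road / vegetation)
print(segmentation_metrics([[50, 2, 3], [4, 40, 1], [2, 2, 46]]))
```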

Keyword :

attention mechanism; feature fusion; remote sensing images; semantic segmentation

Cite:


GB/T 7714 Pan, J., Huang, S. Semantic segmentation of remote sensing images based on attention mechanism and feature fusion [unknown].
MLA Pan, J., et al. "Semantic segmentation of remote sensing images based on attention mechanism and feature fusion" [unknown].
APA Pan, J., Huang, S. Semantic segmentation of remote sensing images based on attention mechanism and feature fusion [unknown].

Real-time high definition license plate localization and recognition accelerator for IoT endpoint system on chip EI
Journal Article | 2022, 25(1), 1-11 | Journal of Applied Science and Engineering (Taiwan)

Abstract :

Automatic License Plate Recognition (ALPR) systems have become a popular application area of the Internet of Things (IoT). A typical ALPR system usually needs powerful processors such as the Cortex-A7. However, most known systems designed for Standard Definition (SD) images are not suitable for real-time High Definition (HD) image processing or the low-power requirements of IoT. An HD ALPR accelerator for an IoT endpoint System on Chip (SoC) is proposed in this paper to meet these computational needs. Based on the programming flexibility of the IoT endpoint SoC, it can switch between HD and SD resolutions, which avoids a dedicated resolution-switching algorithm. A Cortex-M0 soft core is ported onto a Field Programmable Gate Array (FPGA) chip as the IoT endpoint SoC, and data communication between the ALPR accelerator and the Cortex-M0 is achieved through First-In, First-Out (FIFO) buffers on the AMBA High-performance Bus (AHB) interface. The heterogeneous implementation of the ALPR system shows that this HD ALPR algorithm can recognize a license plate in 12.5 ms with a success rate of 95.5%. The system uses 41,763 Look-Up Tables (LUTs) without special FPGA IP cores. The comparison shows that the proposed Cortex-M0-based SoC performs two times better than a Cortex-A72 SoC while consuming 39% of the power of the Zynq-7000, a typical heterogeneous ALPR platform. © The Author(s).
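
A quick back-of-the-envelope check of the quoted latency (not a figure from the paper): 12.5 ms per plate corresponds to roughly 80 plates per second, comfortably above the 25-30 fps of an HD video stream.

```python
latency_ms = 12.5                            # per-plate recognition latency quoted above
plates_per_second = 1000.0 / latency_ms
print(f"{plates_per_second:.0f} plates/s")   # -> 80 plates/s
```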

Keyword :

Application specific integrated circuits; Automatic vehicle identification; Digital television; Field programmable gate arrays (FPGA); Image processing; Internet of things; License plates (automobile); Optical character recognition; Programmable logic controllers; System-on-chip; Table lookup

Cite:


GB/T 7714 Huang, Shizhen, Lin, Mengru, Yu, Fan, et al. Real-time high definition license plate localization and recognition accelerator for IoT endpoint system on chip [J]. Journal of Applied Science and Engineering (Taiwan), 2022, 25(1): 1-11.
MLA Huang, Shizhen, et al. "Real-time high definition license plate localization and recognition accelerator for IoT endpoint system on chip." Journal of Applied Science and Engineering (Taiwan) 25.1 (2022): 1-11.
APA Huang, Shizhen, Lin, Mengru, Yu, Fan, Chen, Ruiqi, Zhang, Lei, Zhu, Yanxiang. Real-time high definition license plate localization and recognition accelerator for IoT endpoint system on chip. Journal of Applied Science and Engineering (Taiwan), 2022, 25(1), 1-11.

Implementation of quasi-Newton algorithm on FPGA for IoT endpoint devices EI
Journal Article | 2022, 17(2), 124-134 | International Journal of Security and Networks

Abstract :

With the recent developments in the Internet of Things (IoT), there has been a significant and rapid generation of data. Theoretically, machine learning can help edge devices by providing better analysis and processing of data near the data source. However, solving the nonlinear optimisation problem is time-consuming for IoT edge devices. A standard method for solving the nonlinear optimisation problems in machine learning models is the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton (BFGS-QN) method. Since field-programmable gate arrays (FPGAs) are customisable, reconfigurable, highly parallel and cost-effective, the present study envisaged the implementation of the BFGS-QN algorithm on an FPGA platform. Half-precision and single-precision floating-point numbers were used to save FPGA resources when implementing the BFGS-QN algorithm. The results indicate that, compared to the single-precision implementation, the mixed-precision BFGS-QN algorithm reduced look-up tables by 27.1%, flip-flops by 18.2% and distributed RAM by 17.9%. Copyright © 2022 Inderscience Enterprises Ltd.
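
The BFGS quasi-Newton method referred to above is the standard inverse-Hessian update; a minimal float64 sketch on a small quadratic is given below for orientation (illustrative only; the FPGA design uses mixed half/single precision and targets general nonlinear objectives):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Standard BFGS update of the inverse Hessian approximation H,
    with s = x_{k+1} - x_k and y = grad_{k+1} - grad_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Minimise the quadratic f(x) = 0.5 x^T A x - b^T x; an exact line search
# keeps the demo tiny.
A, b = np.array([[3.0, 0.5], [0.5, 2.0]]), np.array([1.0, 1.0])
x, H = np.zeros(2), np.eye(2)
for _ in range(10):
    g = A @ x - b
    if np.linalg.norm(g) < 1e-10:
        break
    d = -H @ g
    alpha = -(g @ d) / (d @ A @ d)        # exact step length for a quadratic
    s = alpha * d
    y = A @ (x + s) - b - g               # gradient difference
    x, H = x + s, bfgs_inverse_update(H, s, y)
print(x, np.linalg.solve(A, b))           # both give the analytic minimiser
```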

Keyword :

Cost effectiveness; Digital arithmetic; Edge computing; Field programmable gate arrays (FPGA); Flip flop circuits; Fluorine compounds; Internet of things; Learning systems; Logic gates; Machine learning; Nonlinear programming; Table lookup

Cite:


GB/T 7714 Huang, Shizhen, Guo, Anhua, Su, Kaikai, et al. Implementation of quasi-Newton algorithm on FPGA for IoT endpoint devices [J]. International Journal of Security and Networks, 2022, 17(2): 124-134.
MLA Huang, Shizhen, et al. "Implementation of quasi-Newton algorithm on FPGA for IoT endpoint devices." International Journal of Security and Networks 17.2 (2022): 124-134.
APA Huang, Shizhen, Guo, Anhua, Su, Kaikai, Chen, Siyu, Chen, Ruiqi. Implementation of quasi-Newton algorithm on FPGA for IoT endpoint devices. International Journal of Security and Networks, 2022, 17(2), 124-134.

Biological Activity Prediction of GPCR-targeting Ligands on Heterogeneous FPGA-based Accelerators CPCI-S
Journal Article | 2022, 237-237 | 2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022)

Abstract :

In the drug discovery process, the biological activity value (BAV) of G Protein-Coupled Receptor (GPCR)-targeting ligands is a major consideration. Past BAV prediction on CPUs consumes tremendous time and power, yet there is rarely any related acceleration research. Therefore, this paper proposes a series of heterogeneous FPGA-based accelerators for well-performing algorithms that predict the BAV of GPCR ligands. Communication delay is reduced by compressing the sparse matrix and directly coupling the accelerators to the system bus. Computation is accelerated by remapping during weight storage. Experimental results show that our FPGA accelerator implemented on a Xilinx XCZU7EV performs 54.5x faster than a CPU and is 35.2x more energy-efficient than a GPU.
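
To illustrate why compressing the sparse matrix cuts bus traffic, a hedged sketch comparing dense and CSR payload sizes is shown below; the matrix shape and sparsity are made-up numbers, not the paper's:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Bytes moved over the bus for a dense layout versus a CSR layout
# (illustrative sizes only).
rng = np.random.default_rng(0)
dense = rng.random((1024, 1024)) * (rng.random((1024, 1024)) < 0.05)  # ~5% non-zero
sparse = csr_matrix(dense)

dense_bytes = dense.astype(np.float32).nbytes
csr_bytes = (sparse.data.astype(np.float32).nbytes
             + sparse.indices.nbytes + sparse.indptr.nbytes)
print(f"dense: {dense_bytes / 2**20:.1f} MiB, CSR: {csr_bytes / 2**20:.1f} MiB")
```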

Cite:


GB/T 7714 Chen, Ruiqi, Ma, Yuhanxiao, Zheng, Shaodong, et al. Biological Activity Prediction of GPCR-targeting Ligands on Heterogeneous FPGA-based Accelerators [J]. 2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022: 237-237.
MLA Chen, Ruiqi, et al. "Biological Activity Prediction of GPCR-targeting Ligands on Heterogeneous FPGA-based Accelerators." 2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022) (2022): 237-237.
APA Chen, Ruiqi, Ma, Yuhanxiao, Zheng, Shaodong, Huang, Shizhen, Chen, Chao, Yu, Jun, et al. Biological Activity Prediction of GPCR-targeting Ligands on Heterogeneous FPGA-based Accelerators. 2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022, 237-237.

Hardware-friendly compression and hardware acceleration for transformer: A survey SCIE
Journal Article | 2022, 30(10), 3755-3785 | ELECTRONIC RESEARCH ARCHIVE
WoS CC Cited Count: 1

Abstract :

The transformer model has recently become a milestone in artificial intelligence. The algorithm has enhanced the performance of tasks such as machine translation and computer vision to a level previously unattainable. However, the transformer model delivers strong performance at the cost of high memory overhead and enormous computing power, which significantly hinders the deployment of energy-efficient transformer systems. Due to their high parallelism, low latency, and low power consumption, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) demonstrate higher energy efficiency than graphics processing units (GPUs) and central processing units (CPUs). Therefore, FPGAs and ASICs are widely used to accelerate deep learning algorithms. Several papers have addressed the issue of deploying the transformer on dedicated hardware for acceleration, but there is a lack of comprehensive studies in this area. Therefore, we summarize transformer model compression algorithms oriented toward hardware accelerators and their implementations to provide a comprehensive overview of this research domain. This paper first introduces the transformer model framework and computation process. Secondly, a discussion of hardware-friendly compression algorithms based on self-attention and the transformer is provided, along with a review of state-of-the-art hardware accelerator frameworks. Finally, we consider some promising topics in transformer hardware acceleration, such as high-level design frameworks and selecting the optimum device using reinforcement learning.
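
For orientation, the computation that the surveyed compression and acceleration work targets is scaled dot-product self-attention; a minimal single-head sketch (random weights, purely illustrative) is:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention: softmax(QK^T / sqrt(d)) V."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                      # 4 tokens, model dimension 8
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)           # (4, 8)
```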

Keyword :

compression; FPGA; hardware accelerators; self-attention; transformer

Cite:


GB/T 7714 Huang, Shizhen, Tang, Enhao, Li, Shun, et al. Hardware-friendly compression and hardware acceleration for transformer: A survey [J]. ELECTRONIC RESEARCH ARCHIVE, 2022, 30(10): 3755-3785.
MLA Huang, Shizhen, et al. "Hardware-friendly compression and hardware acceleration for transformer: A survey." ELECTRONIC RESEARCH ARCHIVE 30.10 (2022): 3755-3785.
APA Huang, Shizhen, Tang, Enhao, Li, Shun, Ping, Xiangzhan, Chen, Ruiqi. Hardware-friendly compression and hardware acceleration for transformer: A survey. ELECTRONIC RESEARCH ARCHIVE, 2022, 30(10), 3755-3785.

基于ARM和深度学习的智能行人预警系统 (Intelligent pedestrian warning system based on ARM and deep learning)
Journal Article | 2021, 40(12), 60-64 | 信息技术与网络安全

Abstract :

To address pedestrian traffic safety, a pedestrian detection system was developed to warn pedestrians and drivers of impending danger. After analyzing and comparing object detection neural network models, the YOLO-fastest model built on the darknet framework was selected, improved, and optimized, and it was trained on classified and labeled real-time traffic data. The trained model was finally deployed on a development board for real-time detection, and the system can feed a danger signal back to pedestrians according to vehicle speed. Experimental results show that the YOLO-fastest model achieves an average detection accuracy of 96.1%, a detection speed of 33 f/s, and a model size of 1.2 MB, satisfying both accuracy and speed requirements and enabling real-time detection in real traffic scenes.
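
A darknet-format YOLO model such as the one described can be run from Python with OpenCV's DNN module, as in the hedged sketch below; the file names are hypothetical, since the trained YOLO-fastest cfg/weights from the paper are not distributed with this listing:

```python
import cv2

# Hypothetical file names standing in for the paper's trained model.
net = cv2.dnn.readNetFromDarknet("yolo-fastest.cfg", "yolo-fastest.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(320, 320), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("street.jpg")                 # one frame from the camera
classes, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for cls, score, box in zip(classes, scores, boxes):
    print(cls, float(score), box)                # detected pedestrian/vehicle boxes
```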

Keyword :

YOLO-fastest algorithm; object detection; neural network; pedestrian safety

Cite:


GB/T 7714 刘佳丽, 黄世震, 何恩德. 基于ARM和深度学习的智能行人预警系统 [J]. 信息技术与网络安全, 2021, 40(12): 60-64.
MLA 刘佳丽, et al. "基于ARM和深度学习的智能行人预警系统." 信息技术与网络安全 40.12 (2021): 60-64.
APA 刘佳丽, 黄世震, 何恩德. 基于ARM和深度学习的智能行人预警系统. 信息技术与网络安全, 2021, 40(12), 60-64.

基于多特征神经网络的便携式地沟油快速检测仪 (Portable rapid gutter-oil detector based on a multi-feature neural network) PKU
Journal Article | 2021, 37(05), 47-52, 232 | 食品与机械

Abstract :

A portable rapid gutter-oil detector was designed on a heterogeneous system-on-programmable-chip (SoPC) architecture. The SoPC platform was built by porting a Cortex-M0 soft core onto a field programmable gate array (FPGA). Using the FPGA hardware resources, the order of operations was optimized with a serial-parallel pipeline design, and a parameter-calibration accelerator and a neural-network accelerator were designed and attached to the bus system. Verification tests show that the device can perform qualitative analysis of gutter oil quickly and accurately and can distinguish oil types; compared with conventional terminal gutter-oil detection equipment of the same kind, the detection time is shortened by 89%, and the data processing speed of the accelerated SoPC is more than twice that of a Cortex-A9.
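
As a rough software analogue of what the neural-network accelerator computes, the sketch below classifies an oil sample from a small calibrated feature vector with a tiny fully-connected network; the feature count, layer sizes, and class count are assumptions, not values from the paper:

```python
import numpy as np

def mlp_classify(features, w1, b1, w2, b2):
    """Tiny fully-connected classifier: one hidden ReLU layer followed by a
    softmax over oil classes. (Sizes and weights are hypothetical.)"""
    h = np.maximum(0.0, features @ w1 + b1)
    logits = h @ w2 + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()

rng = np.random.default_rng(0)
x = rng.random(6)                          # e.g. 6 calibrated sensor features
w1, b1 = rng.standard_normal((6, 8)), np.zeros(8)
w2, b2 = rng.standard_normal((8, 3)), np.zeros(3)
print(mlp_classify(x, w1, b1, w2, b2))     # probabilities over 3 oil classes
```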

Keyword :

system on programmable chip (SoPC); gutter oil; heterogeneous architecture; detection; field programmable gate array (FPGA); neural network

Cite:


GB/T 7714 张志鹏, 黄世震, 林梦如, et al. 基于多特征神经网络的便携式地沟油快速检测仪 [J]. 食品与机械, 2021, 37(05): 47-52, 232.
MLA 张志鹏, et al. "基于多特征神经网络的便携式地沟油快速检测仪." 食品与机械 37.05 (2021): 47-52, 232.
APA 张志鹏, 黄世震, 林梦如, 陆清茹, 林彦, 陈睿祺. 基于多特征神经网络的便携式地沟油快速检测仪. 食品与机械, 2021, 37(05), 47-52, 232.
