
author:

Chen, Ruiqi [1] | Zhang, Haoyang [2] | Li, Shun [3] | Tang, Enhao [4] | Yu, Jun [5] | Wang, Kun [6]

Indexed by:

EI

Abstract:

Field-programmable gate arrays (FPGAs) are ideal candidates for accelerating graph neural networks (GNNs). However, FPGA reconfiguration is a time-consuming process when updating or switching between diverse GNN models across different applications. This paper proposes Graph-OPU, a highly integrated FPGA-based overlay processor for GNN acceleration. Graph-OPU provides excellent flexibility and software-like programmability for GNN end-users, as the executable code of GNN models is automatically compiled and reloaded without requiring FPGA reconfiguration. First, we customize the instruction sets for the inference process of different GNN models. Second, we propose a microarchitecture ensuring a fully pipelined process for GNN inference. Third, we design a unified matrix multiplication unit that processes both sparse-dense matrix multiplication and general matrix multiplication to increase the performance of Graph-OPU. Finally, we implement a hardware prototype on the Xilinx Alveo U50 and test the mainstream GNN models using various datasets. Graph-OPU takes an average of only 2 minutes to switch between different GNN models, exhibiting an average 128× speedup compared to related works. In addition, Graph-OPU outperforms state-of-the-art end-to-end overlay accelerators for GNNs, reducing latency by an average of 1.36× and improving energy efficiency by an average of 1.41×. Moreover, Graph-OPU achieves up to 1654× and 63× speedup, as well as up to 5305× and 422× energy efficiency boosts, compared to implementations on CPU and GPU, respectively. To the best of our knowledge, Graph-OPU represents the first in-depth study of an FPGA-based overlay processor for GNNs, offering high flexibility, speedup, and energy efficiency. © 2023 IEEE.
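The abstract's "unified matrix multiplication" idea — one kernel serving both sparse-dense multiplication (SpMM, e.g. adjacency × features) and general dense multiplication (GEMM, e.g. features × weights) — can be sketched in software. The snippet below is an illustrative sketch only, not the paper's hardware design: it unifies the two cases by storing every left-hand matrix in CSR form, so a dense operand is simply a CSR matrix with no zero entries. The function names `to_csr` and `unified_matmul` are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): one CSR-based kernel
# that handles both SpMM and GEMM by putting every left operand in CSR form.

def to_csr(mat):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in mat:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # running count of nonzeros per row
    return values, col_idx, row_ptr

def unified_matmul(csr_a, b, n_cols):
    """Multiply a CSR matrix A by a dense matrix B (list of lists)."""
    values, col_idx, row_ptr = csr_a
    n_rows = len(row_ptr) - 1
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Iterate only over the stored (nonzero) entries of row i of A.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            v, j = values[k], col_idx[k]
            for c in range(n_cols):
                out[i][c] += v * b[j][c]
    return out

# SpMM case: sparse adjacency matrix times dense feature matrix.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1, 2], [3, 4], [5, 6]]
print(unified_matmul(to_csr(A), X, 2))  # -> [[3, 4], [6, 8], [3, 4]]

# GEMM case: the same kernel works unchanged when the left matrix is dense.
W = [[1, 1, 1], [2, 2, 2]]
print(unified_matmul(to_csr(W), X, 2))  # -> [[9, 12], [18, 24]]
```

In hardware, the benefit of such unification is that one datapath serves both the aggregation (SpMM) and transformation (GEMM) phases of GNN inference, instead of dedicating separate engines to each.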

Keyword:

Energy efficiency; Field programmable gate arrays (FPGA); Graph neural networks; Matrix algebra; Neural network models; Reconfigurable hardware

Community:

  • [ 1 ] [Chen, Ruiqi]Fudan University, State Key Lab of Asic & System, Shanghai, China
  • [ 2 ] [Zhang, Haoyang]Fudan University, State Key Lab of Asic & System, Shanghai, China
  • [ 3 ] [Li, Shun]College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • [ 4 ] [Tang, Enhao]College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • [ 5 ] [Yu, Jun]Fudan University, State Key Lab of Asic & System, Shanghai, China
  • [ 6 ] [Wang, Kun]Fudan University, State Key Lab of Asic & System, Shanghai, China

Reprint Author's Address:

Email:


Related Keywords:

Related Article:

Source :

Year: 2023

Page: 228-234

Language: English

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count: 8

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

30 Days PV: 4

Affiliated Colleges:
