Abstract:
Multi-object counting in visual question answering (VQA) remains a challenging problem. Existing VQA models mainly use an object detection network to extract image features and combine it with a soft attention mechanism to further improve model accuracy. However, the same object may be counted repeatedly when the object detection network extracts image features. In addition, the attention weights that a soft attention mechanism assigns to all objects sum to 1, so the quantity information they carry is always 1. We propose a new counting attention mechanism based on classification confidence. The main idea is to calculate the initial attention with a sigmoid function and to calculate similarity from the object locations generated by the object detection network; we introduce classification confidence to calculate a more accurate similarity and to solve the problem that the quantity information under existing soft attention mechanisms is always 1. Experiments compare the proposed counting attention mechanism with the baseline model and related work on the VQA v2 dataset. The results show that the counting attention mechanism improves counting accuracy by 6.4% over the baseline model and surpasses most VQA models. © 2019 IEEE.
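The two problems named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the function names, the use of IoU as the location similarity, the product of confidences as the similarity weight, and the de-duplication rule in `soft_count` are all assumptions chosen only to show why softmax weights lose the count while sigmoid weights with a confidence-weighted similarity can keep it.

```python
import math

def softmax(scores):
    # Standard softmax: the weights always sum to 1, whatever the object count.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Independent per-object weight in (0, 1); the weights need not sum to 1.
    return 1.0 / (1.0 + math.exp(-x))

def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_count(scores, boxes, confidences):
    # ASSUMPTION: illustrative rule, not the paper's equations.
    # Each proposal contributes its sigmoid attention, divided among the
    # proposals that overlap it; overlap is IoU weighted by the product of
    # the two classification confidences.
    att = [sigmoid(s) for s in scores]
    n = len(att)
    count = 0.0
    for i in range(n):
        denom = att[i]
        for j in range(n):
            if j != i:
                sim = iou(boxes[i], boxes[j]) * confidences[i] * confidences[j]
                denom += sim * att[j]
        if denom > 0:
            count += att[i] * (att[i] / denom)
    return count

# Three confident proposals; the first two cover the same object.
scores = [6.0, 6.0, 6.0]
boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
confidences = [1.0, 1.0, 1.0]

softmax_total = sum(softmax(scores))
sigmoid_total = sum(sigmoid(s) for s in scores)
estimated = soft_count(scores, boxes, confidences)
```

With these inputs, `softmax_total` is 1 regardless of how many objects are present, `sigmoid_total` is close to 3 because the duplicate proposal is counted twice, and `estimated` is close to 2 once the overlapping pair shares a single unit of count.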
Source: SocialCom 2019
Year: 2019
Page: 1173-1179
Language: English
Cited Count:
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0