Authors:

Chen, Q.-L. (Chen, Q.-L.) [1] | Liao, X.-W. (Liao, X.-W.) [2] (Scholars: 廖祥文) | Wei, J.-J. (Wei, J.-J.) [3] | Chen, G.-L. (Chen, G.-L.) [4] (Scholars: 陈国龙)

Indexed by:

Scopus PKU CSCD

Abstract:

Existing methods for extracting data from multirecord webpages usually analyze the document object model (DOM) tree longitudinally as a whole. As a result, the computed structural similarity between records is low, and record regions cannot be identified correctly. Unlike previous work, the proposed method, data record extraction based on DOM tree hierarchical feature (DEBHF), analyzes the DOM tree transversely by distinguishing the different roles that nodes play at different levels. In this way, the problem of searching for similar subtrees is converted into the problem of searching for similar sub-blocks within data blocks. Finally, a two-way search for non-overlapping, repeated sub-blocks is used to segment the record regions. Experimental results show that the proposed approach can handle webpages that existing methods fail to extract, and the extraction results on different data sources demonstrate its effectiveness. ©, 2015, Journal of Pattern Recognition and Artificial Intelligence. All rights reserved.
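To make the final step of the abstract more concrete, here is a minimal sketch of a two-way search for non-overlapping, repeated sub-blocks. It is not the authors' DEBHF implementation. It assumes that one DOM level has already been reduced to a flat sequence of child tag signatures, and it uses exact signature equality in place of the paper's similarity measure. The function names `extend_two_way` and `segment_records` and the `min_repeats` threshold are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def extend_two_way(seq: Sequence[str], start: int, k: int) -> Tuple[int, int]:
    """Starting from the seed block seq[start:start+k], extend backward and
    forward over adjacent, non-overlapping copies of that block.
    Returns (first, end) with end exclusive. (Hypothetical helper.)"""
    seed = list(seq[start:start + k])
    first = start
    # Backward pass: step one block length at a time while the block matches.
    while first - k >= 0 and list(seq[first - k:first]) == seed:
        first -= k
    end = start + k
    # Forward pass: the same check in the other direction.
    while end + k <= len(seq) and list(seq[end:end + k]) == seed:
        end += k
    return first, end


def segment_records(seq: Sequence[str], min_repeats: int = 2) -> List[List[str]]:
    """Try every block length k and seed position, and keep the run of repeated
    sub-blocks that covers the most siblings (ties go to the shorter block).
    Each sub-block in that run is returned as one record. (Hypothetical.)"""
    n = len(seq)
    best: Optional[Tuple[Tuple[int, int], int, int, int]] = None
    for k in range(1, n // min_repeats + 1):
        for start in range(0, n - k + 1):
            first, end = extend_two_way(seq, start, k)
            if (end - first) // k < min_repeats:
                continue  # too few copies to count as a record region
            key = (end - first, -k)  # maximize coverage, then prefer smaller k
            if best is None or key > best[0]:
                best = (key, first, end, k)
    if best is None:
        return []
    _, first, end, k = best
    return [list(seq[i:i + k]) for i in range(first, end, k)]


if __name__ == "__main__":
    children = ["div", "h3", "p", "span", "h3", "p", "span", "h3", "p", "span", "footer"]
    print(segment_records(children))
```

For the example sequence `div, h3, p, span, h3, p, span, h3, p, span, footer`, the sketch returns three `[h3, p, span]` records and leaves out the non-repeating `div` and `footer`. The exhaustive search is cubic in the number of siblings. That cost is acceptable for an illustration but would need tuning for large pages.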

Keyword:

Extraction algorithm; Information extraction; Multirecord webpage

Affiliations:

  • [ 1 ] [Chen, Q.-L.]College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China
  • [ 2 ] [Liao, X.-W.]College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China
  • [ 3 ] [Wei, J.-J.]College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China
  • [ 4 ] [Chen, G.-L.]College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China

Reprint Author's Address:

  • 廖祥文

    [Liao, X.-W.]College of Mathematics and Computer Science, Fuzhou University, China

Source:

Pattern Recognition and Artificial Intelligence

ISSN: 1003-6059

CN: 34-1089/TP

Year: 2015

Issue: 2

Volume: 28

Page: 125-131

ESI Highly Cited Papers on the List: 0