Abstract:
Web pages contain a large amount of valuable information and resources, and they may be updated at any time. However, current Web data extraction algorithms are generally designed for a specific page structure. When a page is updated, the resulting structural changes can prevent these algorithms from extracting the target information, or cause them to extract the wrong information. To solve this problem, this paper proposes a new method that renders the page to extract feature values for each of its areas, and then combines these features with the page's DOM tree structure, semantic similarity, and other information, so that the target data can still be extracted correctly after the structure of the web page changes. © 2019 IEEE.
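The abstract describes relocating target data after a page update by combining rendered features, DOM structure, and semantic similarity. A minimal sketch of that general idea follows, under our own assumptions rather than the paper's actual method: each page area is a hypothetical `Region` holding its DOM tag path, a rendered bounding box, and its text. The weights, the page-size normalization, and the use of token Jaccard overlap as a crude stand-in for semantic similarity are all illustrative choices, not taken from the source.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Region:
    # Hypothetical representation of one rendered page area.
    dom_path: list  # tag names from root to node, e.g. ["html", "body", "div"]
    box: tuple      # rendered (x, y, width, height) in pixels
    text: str


# Assumed page size used to normalize box distances (illustrative only).
PAGE_W, PAGE_H = 1280.0, 2000.0
# Hypothetical weights for the DOM, visual and text components.
WEIGHTS = (0.3, 0.3, 0.4)


def dom_similarity(a, b):
    # Similarity of the two tag paths in [0, 1], via difflib's matching ratio.
    return SequenceMatcher(None, a.dom_path, b.dom_path).ratio()


def visual_similarity(a, b):
    # 1 minus the mean normalized difference of the rendered boxes, clamped at 0.
    ax, ay, aw, ah = a.box
    bx, by, bw, bh = b.box
    dist = (abs(ax - bx) / PAGE_W + abs(ay - by) / PAGE_H
            + abs(aw - bw) / PAGE_W + abs(ah - bh) / PAGE_H) / 4
    return max(0.0, 1.0 - dist)


def text_similarity(a, b):
    # Jaccard overlap of lowercase tokens: a crude stand-in for semantic similarity.
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def score(target, candidate):
    # Weighted combination of the three similarity signals.
    wd, wv, wt = WEIGHTS
    return (wd * dom_similarity(target, candidate)
            + wv * visual_similarity(target, candidate)
            + wt * text_similarity(target, candidate))


def relocate(target, candidates):
    # Pick the area of the updated page most similar to the stored target area.
    return max(candidates, key=lambda c: score(target, c))


# Example: the heading moved from a <div> into a <main><section>, but keeps
# roughly the same rendered position and the same text.
old_title = Region(["html", "body", "div", "h1"], (100, 50, 600, 40),
                   "Product price list 2019")
new_page = [
    Region(["html", "body", "header", "nav"], (0, 0, 1280, 60), "Home About Contact"),
    Region(["html", "body", "main", "section", "h2"], (110, 90, 600, 40),
           "Product price list 2019"),
    Region(["html", "body", "footer"], (0, 1900, 1280, 100), "Copyright"),
]
best = relocate(old_title, new_page)
print(best.dom_path)
```

Here the DOM path alone would no longer match, but the rendered position and the text still identify the same area, which is the intuition behind combining several signals instead of relying on a fixed structural path.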
Year: 2019
Page: 1524-1525
Language: English
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0