Adaptively extracting structured data from web pages

author：

Guo, Yingnan ^[1] | Zhang, Jiajun ^[2] | Chen, Xing ^[3] (Scholars：陈星)

Indexed by：

EI Scopus

Abstract：

Web pages contain a large amount of valuable information and resources, but they may be updated at any time. Current Web data extraction algorithms, however, generally target a specific web page structure. When a page is updated, the resulting structural changes can cause extraction to fail or to return wrong information. To solve this problem, this paper proposes a new method that extracts the feature values of each area in the web page through page rendering and then combines them with the DOM tree structure of the page, semantic similarity, and other information, so that the target data can still be extracted correctly after the structure of the web page changes. © 2019 IEEE.
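The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea of re-locating a target field by combining rendered-layout features, DOM structure, and label similarity. It is not the authors' algorithm. The `Region` type, the three similarity functions, the weights, and the sample data are all hypothetical, and plain string similarity from `difflib` stands in for real semantic similarity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Region:
    """One rendered page area (hypothetical structure, for illustration only)."""
    dom_path: str   # tag path from the root, e.g. "html/body/div/span"
    label: str      # nearby visible text, e.g. "Price"
    box: tuple      # (x, y, width, height) in pixels after rendering
    text: str = ""  # the value that would be extracted from this area


def path_similarity(a: str, b: str) -> float:
    # Compare DOM paths tag by tag, so an inserted or renamed wrapper costs only a little.
    return SequenceMatcher(None, a.split("/"), b.split("/")).ratio()


def label_similarity(a: str, b: str) -> float:
    # Assumption: string similarity on lowercase labels stands in for semantic similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def box_similarity(a: tuple, b: tuple) -> float:
    # 1.0 for identical boxes, falling toward 0.0 as position and size diverge.
    diff = sum(abs(p - q) for p, q in zip(a, b))
    scale = (sum(a) + sum(b)) or 1
    return 1.0 - diff / scale


def score(ref: Region, cand: Region, weights=(0.4, 0.4, 0.2)) -> float:
    # Weighted sum of the three signals; the weights are arbitrary illustrative values.
    w_path, w_label, w_box = weights
    return (w_path * path_similarity(ref.dom_path, cand.dom_path)
            + w_label * label_similarity(ref.label, cand.label)
            + w_box * box_similarity(ref.box, cand.box))


def best_match(ref: Region, candidates: list) -> Region:
    # Pick the area of the updated page that most resembles the remembered target.
    return max(candidates, key=lambda c: score(ref, c))


# Profile of the target field as it looked before the page changed.
reference = Region("html/body/div/table/tr/td", "Price", (600, 200, 80, 20))

# Areas found on the updated page, whose markup no longer uses a table.
candidates = [
    Region("html/body/main/div/span", "Price:", (610, 230, 80, 20), "$19.99"),
    Region("html/body/footer/p", "Copyright", (0, 900, 1000, 30), "Example Inc."),
    Region("html/body/main/div/h1", "Product name", (100, 100, 400, 40), "Widget"),
]

print(best_match(reference, candidates).text)  # $19.99
```

In this sketch the old DOM path no longer exists on the updated page, yet the price area still scores highest because its label and rendered position stay close to the stored profile. A fixed path-based rule would fail in the same situation.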

Keyword：

Big data; Cloud computing; Data mining; Semantics; Social networking (online); Trees (mathematics); Websites

Affiliations：

[ 1 ] [Guo, Yingnan] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
[ 2 ] [Guo, Yingnan] Fujian Key Laboratory of Network Computing, Intelligent Information Processing, Fuzhou, China
[ 3 ] [Zhang, Jiajun] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
[ 4 ] [Zhang, Jiajun] Fujian Key Laboratory of Network Computing, Intelligent Information Processing, Fuzhou, China
[ 5 ] [Chen, Xing] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
[ 6 ] [Chen, Xing] Fujian Key Laboratory of Network Computing, Intelligent Information Processing, Fuzhou, China

Version：

Adaptively extracting structured data from web pages
2019，SocialCom 2019

Related Keywords：

Mixed word embedding method based on knowledge graph augment for text classification
2019，17th IEEE International Conference on Parallel and Distributed Processing with Applications, 9th IEEE International Conference on Big Data and Cloud Computing, 9th IEEE International Conference on Sustainable Computing and Communications, 12th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SustainCom/SocialCom 2019
Automatic text summarization based on transformer and switchable normalization
2019，17th IEEE International Conference on Parallel and Distributed Processing with Applications, 9th IEEE International Conference on Big Data and Cloud Computing, 9th IEEE International Conference on Sustainable Computing and Communications, 12th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SustainCom/SocialCom 2019
Sequence data enhancement method based on knowledge graph
2019，17th IEEE International Conference on Parallel and Distributed Processing with Applications, 9th IEEE International Conference on Big Data and Cloud Computing, 9th IEEE International Conference on Sustainable Computing and Communications, 12th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SustainCom/SocialCom 2019
Identification and prediction of key nucleotide sites using machine learning in bioinformatics: A brief overview
2019，17th IEEE International Conference on Parallel and Distributed Processing with Applications, 9th IEEE International Conference on Big Data and Cloud Computing, 9th IEEE International Conference on Sustainable Computing and Communications, 12th IEEE International Conference on Social Computing and Networking, ISPA/BDCloud/SustainCom/SocialCom 2019

Source ：

Year： 2019

Page： 1524-1525

Language： English

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count：

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 24

Affiliated Colleges：

School of Mathematics and Statistics (record not explicitly attributed to a unit within the school/department)
