
Inventor:

陈星 | 张佳俊 | 王一洲

Indexed by:

incoPat

Abstract:

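The invention discloses an information extraction method for structured web page data. The web page code is first preprocessed to remove noise. Layout tags are then taken as nodes, and a DOM tree is constructed from their nesting and hierarchical relationships and stored in a List. The DOM tree is pruned by checking whether branches are identical, which yields a reconstructed DOM tree. Next, nodes are labeled by their node paths, and the reconstructed DOM trees of two web pages are compared to determine the feature path where the target object is located. A corresponding wrapper is then generated to perform automatic extraction. The invention can process large volumes of web content automatically and quickly and extract the correct information.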
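
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tag sets `LAYOUT_TAGS` and `NOISE_TAGS`, the rule that identical sibling branches are kept only once, and the rule that a feature path is a path present in both pages whose text differs are all assumptions made for illustration. The names `TreeBuilder`, `prune`, `path_map`, `learn_wrapper`, and `extract` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed tag sets; the abstract does not list which tags count as layout or noise.
LAYOUT_TAGS = {"html", "body", "div", "table", "tr", "td", "ul", "li", "p", "span", "h1"}
NOISE_TAGS = {"script", "style"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child nodes are kept in a plain list
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of layout tags and skips content inside noise tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in LAYOUT_TAGS:
            node = Node(tag, self.current)
            self.current.children.append(node)
            self.current = node

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in LAYOUT_TAGS and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.text += data.strip()


def signature(node):
    # Two branches are treated as "the same" if tag, text, and subtree all match (assumption).
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def prune(node):
    # Keep only the first of any identical sibling branches.
    seen, kept = set(), []
    for child in node.children:
        prune(child)
        sig = signature(child)
        if sig not in seen:
            seen.add(sig)
            kept.append(child)
    node.children = kept


def path_map(node, prefix="", out=None):
    # Label every node with its path, e.g. /html[0]/body[0]/div[0]/p[0].
    out = {} if out is None else out
    counts = {}
    for child in node.children:
        idx = counts.get(child.tag, 0)
        counts[child.tag] = idx + 1
        path = f"{prefix}/{child.tag}[{idx}]"
        out[path] = child.text
        path_map(child, path, out)
    return out


def rebuild(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return path_map(builder.root)


def learn_wrapper(html_a, html_b):
    # Assumption: feature paths exist in both pages but carry different text.
    a, b = rebuild(html_a), rebuild(html_b)
    return sorted(p for p in a if p in b and a[p] != b[p])


def extract(html, wrapper):
    paths = rebuild(html)
    return {p: paths.get(p, "") for p in wrapper}
```

A wrapper learned from two pages that share a template can then be applied to a third page, for example `extract(page_c, learn_wrapper(page_a, page_b))`. Comparing only two pages keeps the sketch short; paths whose text happens to match in both samples would be missed.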

Patent Info :

Type: Granted invention patent

Patent No.: CN201710605031.3

Filing Date: 2017/7/24

Publication Date: 2020/11/3

Pub. No.: CN107423391B

Publication Country: CN

Applicants: 福州大学 (Fuzhou University)

Legal Status: Granted
