Author:

Li, F. [1]

Indexed by:

Scopus

Abstract:

Building on the strengths of existing web text extraction algorithms, this paper proposes a new method that combines regular expressions with the text density of a page. The method first uses regular expressions, tailored to the characteristics of web page source code, to remove the HTML tags, and then extracts the main text of the page according to the distribution density of the remaining text. The algorithm is simple and efficient, and tests show that the method extracts the main text with higher accuracy. © 2011 IEEE.
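
The two steps described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract gives neither the regular expressions nor the density measure, so the patterns below, the line-block density (text characters in a window of consecutive lines), and the hypothetical parameters `window` and `min_block_chars` are illustrative choices only.

```python
import re
from html import unescape

# Assumed patterns; the abstract does not publish the paper's own expressions.
# Elements whose contents are never visible body text are dropped whole.
_INVISIBLE = re.compile(
    r"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL
)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")
_SPACES = re.compile(r"\s+")


def strip_tags(page: str) -> list[str]:
    """Step 1: clear HTML markup with regular expressions, one entry per source line."""
    text = _INVISIBLE.sub("", page)
    text = _COMMENT.sub("", text)
    text = _TAG.sub("", text)
    text = unescape(text)  # decode entities such as &amp; and &nbsp;
    return [_SPACES.sub(" ", line).strip() for line in text.split("\n")]


def extract_main_text(page: str, window: int = 3, min_block_chars: int = 100) -> str:
    """Step 2: keep the densest contiguous run of text lines.

    `window` and `min_block_chars` are hypothetical parameters chosen for
    illustration, not values taken from the paper.
    """
    lines = strip_tags(page)
    lengths = [len(line) for line in lines]
    # Block density of line i: text characters in lines i .. i + window - 1.
    density = [sum(lengths[i:i + window]) for i in range(len(lines))]

    best_start, best_end, best_total = 0, 0, 0
    i = 0
    while i < len(density):
        if density[i] < min_block_chars:
            i += 1
            continue
        start = i
        while i < len(density) and density[i] >= min_block_chars:
            i += 1
        # The run spans every line inside its dense windows, i.e. through the
        # last line of the window that starts at line i - 1.
        end = min(i - 1 + window, len(lines))
        total = sum(lengths[start:end])
        if total > best_total:  # keep the run carrying the most text
            best_start, best_end, best_total = start, end, total
    return "\n".join(line for line in lines[best_start:best_end] if line)
```

On a page where the navigation bar and footer are separated from the article paragraphs by tag-only lines, only the paragraph lines survive. Measuring density over a window of lines rather than a single line is what lets short gaps inside the body stay in the selected run.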

Keyword:

Regular expressions; Text density; Text extraction; Web page

Affiliation:

  • [1] Li, F., Public Management School, Fuzhou University, Fuzhou, China

Reprint Address:

  • Li, F., Public Management School, Fuzhou University, Fuzhou, China

Source:

Proceedings - 2011 4th International Conference on Information Management, Innovation Management and Industrial Engineering, ICIII 2011

Year: 2011

Volume: 1

Page: 287-290

Language: English