Author:

Li, Fayun [1] (Scholars: 李法运)

Indexed by:

EI, Scopus

Abstract:

Drawing on the strengths of several existing web text extraction algorithms, this paper proposes a new method that combines regular expressions with the text density of a web page. The method first uses regular expressions, based on the characteristics of web page source code, to remove the HTML tags, and then extracts the main text of the page according to the distribution density of its text. The algorithm is simple and efficient, and tests indicate that it achieves higher extraction accuracy. © 2011 IEEE.
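The abstract describes a two-stage pipeline (regex tag cleaning, then selection by text density), but this record does not give the paper's regular expressions, density measure, or parameters. The Python sketch below therefore assumes a line-block distribution heuristic: after tags are stripped, each line is scored by the number of characters in a block of `window` consecutive lines, and the dense run of lines holding the most text is kept. The function names `strip_html`, `block_lengths` and `extract_main_text`, and the `window=3` and `threshold=40` defaults, are hypothetical illustrations, not the author's method.

```python
import re
from typing import List

# Hypothetical cleaning patterns (assumption): the paper's own regexes are not given.
SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def strip_html(html: str) -> List[str]:
    """Stage 1: clear scripts, styles, comments and tags with regexes, keeping source lines."""
    text = SCRIPT_STYLE.sub("", html)
    text = COMMENT.sub("", text)
    text = TAG.sub("", text)
    # Collapse runs of whitespace inside each line; tag-only lines become "".
    return [" ".join(line.split()) for line in text.split("\n")]


def block_lengths(lines: List[str], window: int = 3) -> List[int]:
    """Assumed density measure: characters in the block of `window` lines starting at each line."""
    counts = [len(line) for line in lines]
    return [sum(counts[i:i + window]) for i in range(len(counts))]


def extract_main_text(html: str, window: int = 3, threshold: int = 40) -> str:
    """Stage 2 (assumed heuristic): keep the dense run of lines holding the most text."""
    lines = strip_html(html)
    blocks = block_lengths(lines, window)
    best_start, best_end, best_size = 0, 0, 0
    i, n = 0, len(blocks)
    while i < n:
        if blocks[i] >= threshold:
            # A dense block opens a run; extend it until a block of only empty lines.
            j = i
            while j < n and blocks[j] > 0:
                j += 1
            size = sum(len(line) for line in lines[i:j])
            if size > best_size:
                best_start, best_end, best_size = i, j, size
            i = max(j, i + 1)  # always advance, even if threshold <= 0
        else:
            i += 1
    return "\n".join(line for line in lines[best_start:best_end] if line)
```

For example, `extract_main_text(page_html)` returns the non-empty text lines of the densest run. A larger `window` bridges short gaps inside an article body, while a higher `threshold` keeps short navigation or footer lines from opening a run; both would likely need tuning on real pages, and pages whose source puts all markup on a single line would need block-level tags converted to line breaks before this heuristic is meaningful.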

Keyword:

Extraction; Information management; Pattern matching; Websites

Affiliation:

  • [1] Li, Fayun: Public Management School, Fuzhou University, Fuzhou, China

Reprint Author:

  • 李法运 (Li, Fayun)

Year: 2011

Volume: 1

Page: 287-290

Language: English
