A web text extraction method based on regular expressions and text density

Author：

Li, F. [1]

Indexed by：

Scopus

Abstract：

Building on the advantages of several existing web text extraction algorithms, this paper proposes a new method that combines regular expressions with the density of page text. The method first uses regular expressions to remove HTML tags, exploiting characteristics of the web page source code, and then extracts the main text of the page from the distribution density of the text. The algorithm is simple and efficient, and tests show that the method achieves higher extraction accuracy. © 2011 IEEE.
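The two-stage pipeline described in the abstract (regular-expression cleaning, then density-based selection) can be sketched in Python. This is a minimal sketch under stated assumptions: the abstract does not give the paper's regular expressions or its density formula, so the patterns, the line-block definition of density (text characters in a window of `block_size` consecutive lines), and the names `strip_html`, `extract_main_text`, `block_size`, `threshold`, and `SAMPLE_PAGE` are all hypothetical. It follows a generic line-block density heuristic, not the author's exact method.

```python
import html
import re

# Cleaning stage: hypothetical patterns, not taken from the paper.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
SPACES_RE = re.compile(r"[ \t\r\f\v]+")


def strip_html(source: str) -> list[str]:
    """Drop scripts, styles, comments and tags; return one cleaned string per line."""
    text = SCRIPT_STYLE_RE.sub("", source)
    text = COMMENT_RE.sub("", text)
    text = TAG_RE.sub("", text)
    text = html.unescape(text)
    return [SPACES_RE.sub(" ", line).strip() for line in text.split("\n")]


def extract_main_text(source: str, block_size: int = 3, threshold: int = 40) -> str:
    """Among regions opened by a dense block, return the one with the most text."""
    lines = strip_html(source)
    lengths = [len(line) for line in lines]
    # Assumed density of block i: text characters in lines i .. i + block_size - 1.
    density = [sum(lengths[i:i + block_size]) for i in range(len(lines))]

    best_start, best_end, best_total = 0, 0, 0
    i = 0
    while i < len(lines):
        if density[i] > threshold:
            # A dense block opens a region; it closes at the first empty block.
            j = i
            while j < len(lines) and density[j] > 0:
                j += 1
            total = sum(lengths[i:j])
            if total > best_total:
                best_start, best_end, best_total = i, j, total
            i = j
        else:
            i += 1
    return "\n".join(line for line in lines[best_start:best_end] if line)


# Hypothetical sample page: navigation, body text and footer separated by blank lines.
SAMPLE_PAGE = """<html><head><title>T</title>
<style>body { color: red; }</style>
<script>var x = "<p>not text</p>";</script>
</head><body>
<div class="nav"><a href="/">Home</a> | <a href="/news">News</a></div>



<div class="content">
<p>Regular expressions first strip the markup from the page source.</p>
<p>Text density then picks out the region where the prose is concentrated.</p>
</div>



<div class="footer">Copyright</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_PAGE))
```

On the sample page this keeps only the two paragraphs inside the content `div`; the navigation links, the script string, and the footer fall outside the selected region. Measuring density over a window of several lines, rather than a single line, lets a region survive an isolated blank line inside the body text, while a run of `block_size` empty lines ends it.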

Keywords：

Regular expressions; Text density; Text extraction; Web page

Affiliation：

[1] Li, F.: Public Management School, Fuzhou University, Fuzhou, China

Reprint Address：

Li, F.: Public Management School, Fuzhou University, Fuzhou, China

Email：

fayunli2002@yahoo.com.cn

Source：

Proceedings - 2011 4th International Conference on Information Management, Innovation Management and Industrial Engineering, ICIII 2011

Year： 2011

Volume： 1

Page： 287-290

Language： English
