Converting PDF files to XML files

Author：

Zhang, W. [1]

Indexed by：

Scopus

Abstract：

Purpose - The purpose of this paper is to develop a system that can convert PDF files to XML files. Design/methodology/approach - The system uses XML as the information display model and XSLT to express the information extraction rules. The process is illustrated by converting a scientific and technological paper in PDF to a valid XML file. Findings - Because the PDF format uses a self-descriptive definition, its content information and display information exist in different objects; therefore, it is not easy to extract information directly from the PDF source file. The indirect solution adopted in the system design was to convert the PDF source file to an intermediate format that is relatively easy to process, which can then be converted automatically to the target file in accordance with the relevant rules. Originality/value - It is important to be able to extract information from PDF files easily and conveniently, and this paper shows how it can be done. The design ideas contained in the paper can also be applied to information extraction from other types of files.
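The second stage described in the abstract (intermediate format to target XML, driven by rules) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the paper expresses its rules in XSLT, whereas this sketch substitutes a plain Python mapping with the standard-library `xml.etree.ElementTree`. The intermediate format (`<page>`/`<block size="...">`), the `RULES` table, and the target element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate format: one <block> per text run, carrying the
# display information (font size) next to the content it applies to.
INTERMEDIATE = """\
<page>
  <block size="18">Converting PDF files to XML files</block>
  <block size="11">Zhang, W.</block>
  <block size="10">The purpose of this paper is to develop a system.</block>
  <block size="10">The system uses XML and XSLT.</block>
</page>
"""

# Hypothetical extraction rules: a font size selects the target element name.
# In the paper this role is played by XSLT templates, not a Python dict.
RULES = {18: "title", 11: "author"}
DEFAULT_TAG = "para"


def convert(intermediate_xml: str) -> ET.Element:
    """Map each intermediate <block> to a target element chosen by RULES."""
    source = ET.fromstring(intermediate_xml)
    article = ET.Element("article")
    for block in source.iter("block"):
        size = int(block.get("size", "0"))
        tag = RULES.get(size, DEFAULT_TAG)
        ET.SubElement(article, tag).text = (block.text or "").strip()
    return article


if __name__ == "__main__":
    print(ET.tostring(convert(INTERMEDIATE), encoding="unicode"))
```

The point of the sketch is the separation the abstract describes: once display information and content sit side by side in an intermediate file, the extraction step reduces to applying declarative rules, and those rules can be changed without touching the PDF parsing stage.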

Keywords：

Extensible Markup Language; Information exchange; Portable document format

Affiliation：

[1] Zhang, W., Fuzhou University Library, Fuzhou, Fujian, China

Reprint Address：

Zhang, W., Fuzhou University Library, Fuzhou, Fujian, China

Email：

zhangwd@fzu.edu.cn


Source：

Electronic Library

ISSN： 0264-0473

Year： 2008

Issue： 1

Volume： 26

Page： 68-74

JCR@2008： 0.393

JCR@2023： 1.500

JCR Journal Grade：3

Cited Count：

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0
