Author:

Zhang, Wende [1] (Scholars: 张文德)

Indexed by:

SSCI; Scopus

Abstract:

Purpose - The purpose of this paper is to develop a system that converts PDF files to XML files.

Design/methodology/approach - The system uses XML as the information display model and XSLT as the information extraction rules. The process is illustrated by converting a scientific and technological paper in PDF to a valid XML file.

Findings - Because a PDF file is self-descriptive, its content information and display information exist in different objects; it is therefore not easy to extract information directly from the PDF source file. The system design solves this problem indirectly: the PDF source file is first converted to an intermediate format that is relatively easy to process, and this intermediate file is then converted automatically to the target file according to the relevant rules.

Originality/value - Being able to extract information from PDF files easily and conveniently is important, and this paper shows how it can be done. The design ideas in the paper can also be applied to information extraction from other types of files.
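The second step of the indirect route described in the abstract (intermediate format to target XML, driven by rules) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's system: the intermediate `<page>`/`<line>` format, the `size` attribute, the `RULES` table, and the sample text are all hypothetical. The paper expresses its extraction rules in XSLT; because the Python standard library has no XSLT processor, a plain dictionary of rules stands in for the stylesheet here.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate format: one <line> per text run, carrying a
# layout attribute (font size) recovered from the PDF. Sample data is invented.
INTERMEDIATE = """\
<page>
  <line size="18">Sample Paper Title</line>
  <line size="12">A. Author</line>
  <line size="10">First paragraph of body text.</line>
  <line size="10">Second paragraph of body text.</line>
</page>
"""

# Hypothetical extraction rules: font size -> target element name.
# In the paper these rules are written in XSLT; a dict stands in for them here.
RULES = {"18": "title", "12": "author", "10": "para"}


def convert(intermediate_xml: str) -> ET.Element:
    """Map each intermediate <line> to a target element chosen by RULES."""
    source = ET.fromstring(intermediate_xml)
    article = ET.Element("article")
    for line in source.iter("line"):
        tag = RULES.get(line.get("size"))
        if tag is None:
            continue  # no rule matches this layout, so the line is skipped
        ET.SubElement(article, tag).text = (line.text or "").strip()
    return article


def to_string(element: ET.Element) -> str:
    """Serialize the target tree as a Unicode XML string."""
    return ET.tostring(element, encoding="unicode")
```

Calling `to_string(convert(INTERMEDIATE))` yields an `<article>` element whose children follow the rule table. Keeping the rules separate from the conversion loop mirrors the abstract's point that the same design can be reused for other file types by swapping the rules.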

Keyword:

extensible markup language; information exchange; portable document format

Community:

  • [ 1 ] Fuzhou Univ Lib, Fuzhou, Fujian, Peoples R China

Reprint Address:

  • 张文德

    [Zhang, Wende] Fuzhou Univ Lib, Fuzhou, Fujian, Peoples R China

Source:

ELECTRONIC LIBRARY

ISSN: 0264-0473

Year: 2008

Issue: 1

Volume: 26

Page: 68-74

JCR@2008: 0.393

JCR@2023: 1.500

ESI Discipline: SOCIAL SCIENCES, GENERAL

JCR Journal Grade: 3

Cited Count:

WoS CC Cited Count: 4

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0