
Epub Opener

Array (4) | Regex (3) | RevZip (3) | Xml (3) | HtmlText (2)
General
  • Author: James Hale
  • Rating: ★★★★★
  • Type: Stack
  • Revision: 8
  • Downloaded: 1313 times
  • Updated: 11 August 2018
ePub Opener V1.04 - use of revZip, XML and a little regex
1.04: adjusted a "repeat" construct for the stricter parsing introduced in LC 8
1.03: added a version check to work with LC 7's new textDecode function
1.02: tidied comments and added support for flat ePub archives, i.e. those with no "images" folder

The purpose of this stack is to provide a way to unpack an ePub publication and transfer its contents to LiveCode arrays.
Image files are instead transferred to an "Images" folder for access by LiveCode.
The script assumes an ePub that adheres to the ePub 2 standard (all those I have been interested in comply, more or less, with it).
To convert a publication to this standard, the open source Calibre (http://calibre-ebook.com) can be used.

At present the script will:
a) Unpack the ePub (really a zip file) - use of the revZip library.
b) Check its contents against ePub 2 - use of the XML library.
c) Extract the major metadata and place it in an array, aMD - use of the XML library.
d) Read in the OPF, extract the manifest and populate an array, aMan - use of the XML and revZip libraries.
e) Read the table of contents (the navigation or .ncx file, in conjunction with the "spine" items of the OPF) into an array, aNav - use of the XML and revZip libraries.
f) Transfer text contents to a content array, aContent, and images to a local folder.
g) Clean up the HTML or XHTML to remove non-body sections and CSS styles, and make a few other modifications so as not to cause too many problems for LiveCode's htmlText - use of regex.
h) Process each content entry to extract or create the anchor points for the table of contents - use of regex.
i) Convert the XHTML or HTML to LiveCode's styled text representation.
j) Update the navigation array to ensure any anchored hyperlinks point to the appropriate line in the corresponding content.
k) Remove the now-surplus anchor points from the content.
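
For readers who want to see the shape of steps a) to e) outside LiveCode, here is a minimal Python sketch. It is not the stack's code (which uses revZip and the XML library); the function name read_package and the returned dictionary layout are illustrative assumptions. The container.xml location and the XML namespaces are those defined for ePub 2.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OCF container file and the OPF 2.0 package file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_package(epub):
    """Return (metadata, manifest, spine) for an ePub 2 file.

    `epub` may be a file path or a binary file object.
    The name and return layout are hypothetical, chosen for this sketch.
    """
    with zipfile.ZipFile(epub) as zf:
        # META-INF/container.xml names the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

    # Manifest hrefs are relative to the folder that holds the OPF.
    opf_dir = posixpath.dirname(opf_path)

    # Metadata: keep the first value of each Dublin Core element.
    dc_prefix = "{%s}" % NS["dc"]
    metadata = {}
    meta_el = opf.find("opf:metadata", NS)
    for el in (meta_el if meta_el is not None else []):
        if el.tag.startswith(dc_prefix):
            name = el.tag[len(dc_prefix):]
            metadata.setdefault(name, (el.text or "").strip())

    # Manifest: item id -> archive path and media type.
    manifest = {}
    for item in opf.findall("opf:manifest/opf:item", NS):
        manifest[item.get("id")] = {
            "href": posixpath.join(opf_dir, item.get("href")),
            "media_type": item.get("media-type"),
        }

    # Spine: reading order, as a list of manifest ids.
    spine = [ref.get("idref") for ref in opf.findall("opf:spine/opf:itemref", NS)]
    return metadata, manifest, spine
```

Calling read_package("book.epub") on a conforming file would give three plain Python structures playing roughly the roles of aMD, aMan and the spine half of aNav; reading the .ncx file itself is left out of this sketch.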

You are then left with three arrays: aMD, which holds the metadata for the publication, and aNav and aContent, which hold the navigation and the text of the ePub.
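
To illustrate how entries in aNav can end up pointing at lines in aContent (steps h, j and k), here is a small Python sketch. It is not taken from the stack: the [[anchor:...]] marker syntax, the function names and the dictionary keys are all assumptions made for the example.

```python
import re

# Hypothetical marker syntax for an anchor point left in the converted text.
MARKER = re.compile(r"\[\[anchor:([^\]]+)\]\]")


def resolve_anchors(text):
    """Return (clean_text, {anchor_id: 1-based line number})."""
    positions = {}
    clean_lines = []
    for number, line in enumerate(text.split("\n"), start=1):
        for match in MARKER.finditer(line):
            # Keep the first occurrence if an anchor id is repeated.
            positions.setdefault(match.group(1), number)
        # Strip the markers once their positions are recorded.
        clean_lines.append(MARKER.sub("", line))
    return "\n".join(clean_lines), positions


def link_navigation(a_nav, a_content):
    """Add a 'line' key to each navigation entry and strip the markers.

    a_nav:     list of dicts with 'label', 'file' and 'anchor' keys (assumed layout)
    a_content: dict mapping file name -> text containing markers (assumed layout)
    """
    cleaned = {}
    lines = {}
    for key, text in a_content.items():
        cleaned[key], lines[key] = resolve_anchors(text)

    linked = []
    for entry in a_nav:
        known = lines.get(entry["file"], {})
        # Entries with no anchor, or an unknown one, point at line 1.
        line = known.get(entry.get("anchor"), 1)
        linked.append({**entry, "line": line})
    return linked, cleaned
```

Recording line numbers before stripping the markers means the table of contents only needs a file key and a line number to scroll to, and the displayed text carries no leftover markup.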

Currently, embedded hyperlinks such as an embedded table of contents, endnotes, lists of illustrations or other internal links are not processed.

The code is liberally commented, so hopefully you will be able to follow it easily.
The comments also describe the structure of a typical ePub 2 package.
For now it is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license (http://creativecommons.org/licenses/by-sa/4.0/).

The first card of this stack demonstrates a simple clickable table of contents for accessing the content of any ePub you load.

Please enjoy, and let me know if you find any improvements or are able to use it in your own apps.

James Hale
