Jeff Duntemann's Contrapositive Diary

EPub and Word Processors

Well. Got your heart medicine handy? Jeff is considering a Mac. Well, not exactly. (Put down that nitroglycerine.) I’m strongly considering getting an iPad. And I’ll bet you didn’t know that I already have an iPod, thanks to Jim Strickland, who may in fact persuade me to get a Mac someday. I worry about some of Apple’s cultural issues (like not providing clear guidelines on what you can sell in their stores and what you can’t, and changing your &!$#*% mind about it every other week) but their engineering is extremely good. I spent some quality time with an iPad at a recent Enclave Meetup, and basically, I’m sold. Those guys pretty much nailed the ebook experience, or at the very least came up with the best possible compromise between fixed-page and reflowable presentation that anyone might strike. And I want my books out there in the iBooks marketplace.

This means that I need to be able to create EPub files, and good ones. What boggles me is the scarcity of visual tools for that purpose. Among the mainline desktop publishing apps, only InDesign CS4 and CS5 can export finished EPub files, and some people think the feature itself isn’t finished yet. (I don’t have either version so I can’t do my own testing–and at $700 for the app, I don’t expect to get it.) Some odd comments I’ve seen online suggest that the Scribus developers don’t think that reflowable document export is a suitable task for a fixed-layout desktop pubber, and that they’re not going to do it. There are lots of converter programs for taking various types of files and turning them into EPubs. As best I can tell, most people code their EPubs up manually, as though they were writing a C++ program. Gakkh. But also as best I can tell, affordable WYSIWYG EPub editors begin and end with Sigil.

The format itself is not a skullcracker. You’ve got one or more XHTML files expressing content (plus image files, if present), one or more CSS files defining styles, and one or more XML files describing document structure and metadata, all placed in a container file that’s not much more than a .zip with a different extension. There’s an optional DRM layer in the spec, but it’s technology-agnostic and not much used. The spec is simple enough so that people write the damned things by hand. I can’t imagine that parsing and generating the XML/XHTML/CSS would strain any sort of editor.
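To make that concrete, here is a minimal sketch in Python of what goes into the container. The file names (OEBPS/content.opf, chapter1.xhtml, style.css, toc.ncx), the sample title, and the placeholder UUID are all assumptions made up for illustration. The layout follows the EPub 2 container and package conventions as I understand them, and the result has not been run through a validator.

```python
import io
import zipfile

# Every file name and all sample content below is an illustrative assumption.
BOOK_ID = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder only

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package file: metadata, a manifest of every file, and the reading order (spine).
CONTENT_OPF = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Story</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{BOOK_ID}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# Navigation file used by EPub 2 readers for the table of contents.
TOC_NCX = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{BOOK_ID}"/></head>
  <docTitle><text>Sample Story</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body>
    <h1>Chapter 1</h1>
    <p>It was a dark and stormy night.</p>
  </body>
</html>
"""

STYLE_CSS = "p { text-indent: 1.5em; margin: 0; }\n"


def build_epub() -> bytes:
    """Return the bytes of a one-chapter EPub built entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        files = {
            "META-INF/container.xml": CONTAINER_XML,
            "OEBPS/content.opf": CONTENT_OPF,
            "OEBPS/toc.ncx": TOC_NCX,
            "OEBPS/chapter1.xhtml": CHAPTER_XHTML,
            "OEBPS/style.css": STYLE_CSS,
        }
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file with an .epub extension gives you the whole package: five small text files and a zip wrapper. Whether a given reader accepts it is a separate question that only a validator and real devices can answer.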

My point here is that you don’t need a fixed-layout desktop publishing program like InDesign or Quark to create and maintain EPub ebooks. In a sense, EPub is a modern XML-based word processor file spec, and even a middling WYSIWYG word processor could be twisted a little bit to read, render, edit, and write EPub files that could be loaded right into iBooks without further processing.
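Loading a file is not much harder than that description suggests. Here is a rough sketch in Python of the first step any EPub-aware editor would take on opening a book: find the package file via META-INF/container.xml, then walk its spine to get the content documents in reading order. The function name reading_order and the tiny in-memory sample book are hypothetical, made up for illustration; a real editor would also need error handling for missing or malformed entries.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the container file and the package (OPF) file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def reading_order(epub: zipfile.ZipFile) -> list:
    """Return the archive paths of the content documents, in spine order."""
    # Step 1: container.xml names the package file.
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]

    # Step 2: the manifest maps item ids to hrefs relative to the package file.
    package = ET.fromstring(epub.read(opf_path))
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }

    # Step 3: the spine lists item ids in the order they should be read.
    base = posixpath.dirname(opf_path)
    return [
        posixpath.join(base, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]


def sample_epub() -> zipfile.ZipFile:
    """Build a hypothetical two-chapter book in memory, for demonstration only."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<manifest>'
        '<item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>'
        '<item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>'
        '</manifest>'
        '<spine><itemref idref="ch1"/><itemref idref="ch2"/></spine>'
        '</package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", opf)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    print(reading_order(sample_epub()))
```

Everything here is standard-library zip and XML handling, which is the point: nothing about the load step calls for a desktop publishing engine.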

Sigil comes close. I’m using it and I’m reasonably impressed, considering that the team is basically writing a brand-new word processor from scratch. What boggles me is that it’s the only WYSIWYG EPub editor in the universe. And as a word processor, well, it’s pretty spare.

There’s no reason for this. Existing word processing apps like OpenOffice Writer and AbiWord could easily be extended to import and export EPub files, or forked to create a ramcharged ebook development system using EPub as its primary file format. Fork or not, I’m convinced of this: All word processors will eventually become ebook editors. The ebook market is closing in on reality. We now have the file format we need. The software will follow.

But sheesh guys, how about picking up the pace a little!


  1. Carrington Dixon says:

    Have you seen what C. J. Cherryh and Jane Fancher are doing at ?? They are using Calibre to convert word processor files to EPUB and the other popular ebook formats. This sounds clunky at first, but remember that they are supporting a half-dozen or so file formats. Even if their word processor of choice could create EPUB directly, they would still need to convert to the other formats.

    And even if the wp would write all the necessary formats, there has to be one ‘master’ file. .doc, .html, .rtf, .epub, .mobi, … ???

    1. I should have been more explicit about that: I think we’ve got a winner in the ebook file format wars, and EPub is it. EPub was designed from scratch to be a reflowable ebook format, so there are no compromises associated with the format, and since ordinary WP documents don’t need everything that an ebook needs, an EPub-based word processor could easily export a data subset to common filetypes like .doc, .rtf, and so on. (Don’t know the .mobi format well enough to speak to that yet.) The EPub file would be the master file, everything else comes out of just another data meatgrinder, and we know how to do those.

      Maintaining both fixed-layout and reflowable files from the same file in the same app is another issue, and a far more difficult one, but sooner or later it’s going to happen. There are special higher-level issues that I’ve thought about, like how to reference a passage in a reflowable ebook that does not have unchanging page numbers, and I will have to come back to that at some point.

  2. Jeff, the first hit is always free. 😉

    Also, I have indesign cs4. Next time you’re up visiting, we can poke at its epub generation a little. Having built epubs by hand (well, by hand, with word processor macros, a makefile, and various bits of checking software) I wasn’t especially impressed, but as an easy workflow “Save indesign file as an epub” has big advantages over the way I do it.


    1. I should add that the vast majority of my scripting is cleaning up after apple’s rtf -> html conversion, and the greatest number of headaches in the process have been glitches in this part. If Sigil can export to xhtml cleanly, that alone would make it worthwhile in the tool chain.


      1. You mentioned that the other day, and we got distracted by other things. My view is that we need to eliminate the whole notion of a tool chain for something as simple as a single EPub file. (There’s some value in allowing batch processing of more than one file at a time.) There’s nothing that should stop one (not so complex) editor from loading, rendering, and writing perfect EPub files. XML and XHTML are rigorous systems, really intended for machine parsing and generation. They’re a lot less “sloppy” than RTF, .doc, and most other formats. In a way it’s easier to deal with XHTML than HTML, because the ambiguities of the now-ancient HTML markup system have been designed out.

        I’m going to do up a few free stories in Sigil to give it a good thorough test as time allows.

  3. Erbo says:

    That packaging system for EPub files sounds a hell of a lot like the one for OpenDocument files. It sounds like this needs to go on the “feature requests” list for Writer Real Soon Now. OpenOffice can already export to PDF; exporting to EPub should be, comparatively, a walk in the park.

  4. Aki says:

    “Apple’s Worst Security Breach: 114,000 iPad Owners Exposed”

  5. […] be, and creating ebooks in the EPub format is particularly–and inexplicably–hard. In my June 9, 2010 entry, I spoke about the EPub format itself, and how it’s not a great deal different from a […]
