Jeff Duntemann's Contrapositive Diary

ebooks

EPub and Word Processors

Well. Got your heart medicine handy? Jeff is considering a Mac. Well, not exactly. (Put down that nitroglycerine.) I’m strongly considering getting an iPad. And I’ll bet you didn’t know that I already have an iPod, thanks to Jim Strickland, who may in fact persuade me to get a Mac someday. I worry about some of Apple’s cultural issues (like not providing clear guidelines on what you can sell in their stores and what you can’t, and changing your &!$#*% mind about it every other week) but their engineering is extremely good. I spent some quality time with an iPad at a recent Enclave Meetup, and basically, I’m sold. Those guys pretty much nailed the ebook experience, or at the very least came up with the best possible compromise between fixed-page and reflowable presentation that anyone might strike. And I want my books out there in the iBooks marketplace.

This means that I need to be able to create EPub files, and good ones. What boggles me is the scarcity of visual tools for that purpose. Among the mainline desktop publishing apps, only InDesign CS4 and CS5 can export finished EPub files, and some people think the feature itself isn’t finished yet. (I don’t have either version so I can’t do my own testing–and at $700 for the app, I don’t expect to get it.) Some odd comments I’ve seen online suggest that the Scribus developers don’t think that reflowable document export is a suitable task for a fixed-layout desktop pubber, and that they’re not going to do it. There are lots of converter programs for taking various types of files and turning them into EPubs. As best I can tell, most people code their EPubs up manually, as though they were writing a C++ program. Gakkh. But also as best I can tell, affordable WYSIWYG EPub editors begin and end with Sigil.

The format itself is not a skullcracker. You’ve got one or more XHTML files expressing content (plus image files, if present), one or more CSS files defining styles, and one or more XML files describing document structure and metadata, all placed in a container file that’s not much more than a .zip with a different extension. There’s an optional DRM layer in the spec, but it’s technology-agnostic and not much used. The spec is simple enough so that people write the damned things by hand. I can’t imagine that parsing and generating the XML/XHTML/CSS would strain any sort of editor.
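To make the container layout concrete, here is a minimal Python sketch that assembles the pieces described above using only the standard zipfile module. Treat it as an illustration under stated assumptions rather than a validated EPub: the function name build_minimal_epub, the file names (content.opf, chapter1.xhtml, style.css) and the placeholder identifier are all made up for the example, and it leaves out the NCX table of contents that EPub 2 reading systems expect, so it may not load in any given reader.

```python
import io
import zipfile

# XML file that tells a reader where the package (structure/metadata) file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# XML file describing metadata, the list of files, and the reading order.
# The identifier below is a placeholder, not a real book ID.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# One XHTML content file and one CSS file (both hypothetical).
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body><h1>Chapter 1</h1><p>Hello, reflowable world.</p></body>
</html>
"""

STYLE_CSS = "h1 { text-align: center; }\n"


def build_minimal_epub() -> bytes:
    """Return the bytes of a tiny EPub-style ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, text in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
            ("OEBPS/style.css", STYLE_CSS),
        ]:
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    # Read the container back as an ordinary ZIP and list what is inside.
    with zipfile.ZipFile(io.BytesIO(build_minimal_epub())) as zf:
        for info in zf.infolist():
            print(info.filename, info.file_size)
```

Writing the returned bytes to a file with an .epub extension gives you something any ZIP tool can open, which is the point: nothing in the container itself is exotic.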

My point here is that you don’t need a fixed-layout desktop publishing program like InDesign or Quark to create and maintain EPub ebooks. In a sense, EPub is a modern XML-based word processor file spec, and even a middling WYSIWYG word processor could be twisted a little bit to read, render, edit, and write EPub files that could be loaded right into iBooks without further processing.

Sigil comes close. I’m using it and I’m reasonably impressed, considering that the team is basically writing a brand-new word processor from scratch. What boggles me is that it’s the only WYSIWYG EPub editor in the universe. And as a word processor, well, it’s pretty spare.

There’s no reason for this. Existing word processing apps like OpenOffice Writer and AbiWord could easily be extended to import and export EPub files, or forked to create a ramcharged ebook development system using EPub as its primary file format. Fork or not, I’m convinced of this: All word processors will eventually become ebook editors. The ebook market is closing in on reality. We now have the file format we need. The software will follow.

But sheesh guys, how about picking up the pace a little!

Odd Lots

  • I’m not very good at one-liners. So, in my contrarian fashion, I will present an Odd Lots composed entirely of…two-liners.
  • Technical material (textbooks, manuals, computer books) rendered on an ebook reader? Now you’re talking.
  • As someone fond of both astronomy (especially telescopes) and Star Wars, I consider this a wonderful building hack.
  • Harrison Bergeron was evidently a Canadian kid soccer player. (Thanks to Bob Trembley for the link.)
  • What’s your favorite app for extracting text from PDFs? Any experience with ABBYY’s PDF Transformer?
  • And if you’re going the other way, slow but sure pays off: PDFCreator has finally reached version 1.0, after only seven years.
  • Sigil is the only WYSIWYG editor for EPUB-format ebooks. Why? When will we start editing ebooks and stop coding them?
  • One of my cousins once had a sandbox in an enormous worn-out tractor tire. Now somebody’s recycled such a tire into a bike.

CBZ Files as Image Archives

Last fall, I gathered a stack of Alma-Tadema’s paintings from my pre-1923 images folder, wrapped them up into a ZIP file, and sent them to a friend who was looking for a copyright-free color cover for a novel. Some weeks ago, I learned that the CBZ (Comic Book Zip) file format is nothing more than a ZIP file with a different extension. I downloaded and installed a free CBZ reader called Comical. After changing the extension on the Alma-Tadema archive to .cbz, I double-clicked on it, and boom! There it was, beautifully presented and trivially easy to click through. And if you change the extension back to .zip, you can de-archive the images in the usual fashion using any ZIP-capable archiver. It’s all in the extension; no changes to the binary archive need to be made.
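If you would rather script the packing than rename by hand, something like the following Python sketch should do it. It relies only on the standard zipfile and pathlib modules; the function name pack_cbz, the list of image extensions, and the folder and file names in the usage comment are assumptions made for the example, not part of any CBZ specification.

```python
import zipfile
from pathlib import Path

# Extensions treated as page images; an assumption for this sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def pack_cbz(image_dir: str, cbz_path: str) -> list[str]:
    """Zip the image files found in image_dir into cbz_path.

    Returns the file names stored in the archive, in sorted order.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # ZIP_STORED: JPGs are already compressed, so skip the deflate step.
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for p in images:
            zf.write(p, arcname=p.name)
    return [p.name for p in images]


# Hypothetical usage:
#   pack_cbz("alma-tadema", "alma-tadema.cbz")
#   zipfile.is_zipfile("alma-tadema.cbz")   # still an ordinary ZIP archive
```

The result is a plain ZIP archive that happens to be named .cbz, so any ZIP-capable tool can still unpack it.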

Not being a comics guy, I’d never heard of the CBZ format, though it’s been around since 2004. It’s less a file format than an ebook reader protocol (the file is, after all, simply an ordinary ZIP archive): the reader opens the .zip file and displays the files in alpha order by filename. If the files are displayable as images, the reader displays them. If they are not, a well-behaved reader will ignore them. (Comical, one of the simplest free readers, sometimes crashes when it encounters a non-image binary.) If you need an indicia page, some readers will display text if it’s in an .nfo file. The .nfo will appear in a separate text window on opening the file, rather than in the page display area.
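The reader behavior just described is simple enough to sketch in Python. This is a guess at what a well-behaved reader does, not the code of Comical or any other actual reader: the function names cbz_pages and cbz_notes, the set of image extensions, and the choice of case-insensitive sorting are all assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions a reader might treat as displayable pages; an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def _file_names(cbz_file) -> list[str]:
    """Names of all non-directory entries in the archive."""
    with zipfile.ZipFile(cbz_file) as zf:
        return [n for n in zf.namelist() if not n.endswith("/")]


def cbz_pages(cbz_file) -> list[str]:
    """Image entries in alphabetical order; anything else is ignored."""
    pages = [
        n for n in _file_names(cbz_file)
        if PurePosixPath(n).suffix.lower() in IMAGE_EXTENSIONS
    ]
    return sorted(pages, key=str.lower)


def cbz_notes(cbz_file) -> list[str]:
    """Entries ending in .nfo, which some readers show in a text window."""
    return [
        n for n in _file_names(cbz_file)
        if PurePosixPath(n).suffix.lower() == ".nfo"
    ]


if __name__ == "__main__":
    # Build a small throwaway archive in memory to try the functions on.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("page02.jpg", b"fake image data")
        zf.writestr("page01.jpg", b"fake image data")
        zf.writestr("album.nfo", "Indicia text goes here.")
        zf.writestr("Thumbs.db", b"non-image binary")
    print(cbz_pages(io.BytesIO(buf.getvalue())))
    print(cbz_notes(io.BytesIO(buf.getvalue())))
```

Because page order comes purely from file names, zero-padded names (page01, page02, and so on) are the safest way to make sure page 10 doesn’t land ahead of page 2.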

I’ve tested four free CBZ readers: ComicRack and Comical under Windows, and QComicBook and Comix under Linux. All but ComicRack are open-source. ComicRack is overkill in a lot of ways, though it works very well. (It requires the .NET framework, if that’s significant to you.) Comical is much simpler, and my only gripes are that it doesn’t display .nfo files, and it crashes when it finds certain kinds of non-displayable files in a .cbz archive. QComicBook is a Qt4/KDE app, and the one I find myself using under Linux. Comix (a Python app) works well but is not as capable as QComicBook. (Feature-wise, it’s on a par with Comical.) Others exist. Okular will open CBZ files without complaint, but it simply scrolls vertically through the images without attempting to show one per click.

Most of the comic book readers also read CBR and CBT files, which are RAR and TAR archives, respectively, and work almost exactly the same way. (I haven’t tested those formats.)

The CBZ system works best when all the images in the archive are the same dimensions and aspect ratios. I’m putting together some photo albums for showing the folks back home that are collections of digital photographs in one (big) .cbz file. The bigness is mostly unavoidable, since JPG files don’t compress very well. Still, it makes file management simpler.
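To see why JPG files barely shrink inside a ZIP, here is a rough Python experiment. It deflates some highly repetitive text and an equal amount of random bytes. The random bytes are a stand-in for JPG data, which is already compressed and so looks nearly random to the deflate algorithm that ZIP uses; that stand-in is an assumption, and real photos will vary somewhat. The function name deflate_ratio is made up for the example.

```python
import os
import zlib


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text compresses very well.
text_like = b"All work and no play makes Jack a dull boy.\n" * 2000

# Random bytes stand in for already-compressed JPG data (an assumption).
jpeg_like = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"repetitive text: {deflate_ratio(text_like):.3f}")
    print(f"JPG-like bytes:  {deflate_ratio(jpeg_like):.3f}")
```

A ratio near 1.0 means deflate bought you essentially nothing, which is why storing the images uncompressed in the archive costs very little extra space.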

Here are some sample CBZ archives that I put together for testing: Alma-Tadema (14 MB). Hi-Flier Kite Catalog 1977 (6 MB). The “Elf” Space-Charge Receiver (1.7 MB).

Odd Lots

  • Here’s a great article from NASA on the unexpected success it’s had with the WISE (Wide-field Infrared Survey Explorer) spacecraft in spotting previously unknown asteroids in the infrared spectrum. WISE is detecting hundreds of new asteroids every day, which is unnerving, since a rock no bigger than a Motel 6 could cause regional devastation greater than any nuclear weapon yet produced.
  • From Larry Nelson comes a pointer to the AirStash, an interesting $100 USB Wi-Fi gadget that can accept up to a 32 GB SD card and act as a content server over Wireless b/g. Although nominally a thumb drive, the USB plug also charges the internal battery, and (though it’s not screamed from the rooftops) the thingie works all by itself, no computer connection required. This suggests “wearable file sharing”: Drop one in your pocket and nearby people can download files from the device without having any idea where it actually is. Little by little, the jiminy (an AI wearable computer I thought up in 1983, and figured would be mature by 2027) creeps toward realization. The AI is actually the tough part; everything else already exists, if not in as small a package as I imagined 25 years ago.
  • And if you ever wanted to run Linux on one of your fillings (ok, one of your elephant’s fillings) this would be the solution. (Thanks to Bill Cherepy for the link.)
  • Here’s a gadget that builds you an external USB storage device by dropping in (literally) a naked SATA hard drive. I may not need it, but I admire the elegance of the concept.
  • I’ve been arguing in favor of dual-screen reader devices for years, and this one is a good start. Sounds like the user interface software needs work…but when has that not been an issue for a first-gen device? We’re closing in on it, though.
  • Nice status update on some of the current non-Tokamak fusion research approaches, link thanks to Frank Glover.
  • Also from Frank comes a reasonable article on how people would die in a vacuum and how they wouldn’t. I had heard of lung shredding; heart failure was new to me. But take, um, heart: Your blood wouldn’t boil.
  • If you ever wondered why you cry when you slice onions, well, it’s a volatile sulfur compound released when cells in the onion are cut open. Supposedly living things evolved this mechanism (or at least key parts of it) half a billion years ago. Onions evolved their chemical weapons to avoid being laid on hamburgers in slices–but we evolved Vidalias to prove that we were smarter than onions, and that fast food will prevail against all threats.
  • Interestingly, the Canon G11 camera reduces the resolution of the image sensor to 10 megapixels, down from the 14.7 on the G10. The new sensor gives you fewer pixels but better ones, and faster, which is all for the best.
  • Burger King is testing a new retailing feature in Brazil. When you order a burger, they take your picture and print your face on the burger wrapper.

Odd Lots

  • Here’s the best discussion I’ve yet seen on why Flash may never work well–or perhaps at all–on touchscreen devices like the iPad.
  • Most recent laser printers have Ethernet ports, and some older printers (like my Laserjet 2100TN) can accept a JetDirect network adapter. Installing a printer on a network port means you don’t have to worry about whether the machine it’s attached to is turned on. If you’d like to do this but you’re not a network geek, here’s the best XP-based step-by-step on the topic I’ve ever run across. Same tutorial for Windows 2000.
  • Bruce Baker passed me a link to a nice item on the issue of broadening publisher book production to allow all formats to be generated from a single master file. Follow and read the link to The New Sleekness as well. Pablo should take it down a notch; XML is not a markup language; it’s a general mechanism for creating markup languages, and what may happen eventually (perhaps in ten years or so) is a standard book-production markup language derived from XML and built into a new generation of word processors. Still, what nobody in either article mentions is the problem of pages versus reflowable, which is the 9 trillion pound gorilla in the business. If you don’t solve that problem, absolutely nothing else matters. (And it is not as easy to solve as some may claim–I’ve been thinking about it for several years now and see no solution whatsoever on the horizon.)
  • Kompozer 0.8b2 has been released. I just got it installed in a VM and will be poking at it in coming days. According to Kaz, most of the changes are code cleanups, but any progress on the editor is a fine, fine thing.
  • I’ve done model rocketry here and there over the (many) years, and I’ve seen some very odd things lofted on D engines. Back in high school, my friend George built a Harecules Guided Muscle (which was from the Beany & Cecil cartoon show) in the form of a big whittled balsa wood fist on a short, thick body. I’m amazed it flew as well as it did. Well, here’s a fire-’em-together pack of 8 rockets shaped and colored like Crayola crayons. The guy took his time (six years) but he did a great job–and created a spectacular Web page documenting the project.
  • We rarely go to WalMart, but last time we did, I picked up a bottle of Diet Mountain Lightning. It has nothing on Kroger’s Diet Citrus Drop, easily the best of all the Diet Mountain Dew clones I’ve ever had the opportunity to try.

Odd Lots

  • Wow! The Authors’ Guild finally had a good idea a couple of weeks ago: Who Moved My Buy Button, a Web site that tracks Amazon’s “Buy” button for any given title. If the Buy button goes away (for example, if the book goes out of stock or if the publisher places it out of print–or if Amazon gets in another cage fight with a major publisher) you get an email to that effect. Don’t miss their “Buttonology” page, which explains how to interpret Buy button disruption by inspection. (Thanks to Bruce Baker for the link.)
  • So what exactly is this, anyway? It looks like what used to happen to me when I tried to develop my own film (briefly) in 1966, and found these odd (and similar) little anomalies on my negatives. Dirt, or perhaps the edge of the film contacting the center. Nothing says he wasn’t using a film camera, but film is pretty uncommon these days. If I had to guess (and assuming it isn’t some flaw in the camera optics) I like the idea of a meteor passing through the ionized region of the atmosphere where the aurora display was happening. (Thanks to Frank Glover for the link.)
  • While we’re talking high-energy physics, I’m finding it remarkable how rapidly an apparently dead Sun came back to life, on or about January 1. We now have three significant sunspots on the visible face, including a genuine monster. (Here’s an animated GIF of spot 1045 growing.) This gives us a sunspot number of 71, the likes of which I haven’t seen in three or four years. I’ve been spinning the dials downstairs, and have heard openings on 18 MHz and even 21 MHz. Gonna get those wires shielded before the next solar minimum, fersure.
  • Integrated reader/bookstore systems have made me a little bit nervous ever since the Kindle Orwell debacle last year, and the iPad, if anything, will be even more vulnerable to that sort of remote meddling. It’s not so much malfeasance by the system operators as their vulnerability to government corruption and coercion. Here’s a perspective from a French chap.
  • Still wedged on VMWare Workstation, but Bp. Sam’l Bassett pointed me to a site providing lots of free VirtualBox VMs. The question of how trustworthy such downloadable images are is a good one, but they’re certainly one way to mess with a new OS without having to fuss with hard disk partitioning and installation.
  • I know it’s really her name, and no disrespect is intended, but when I read a headline like: “Costa Rica Elects Chinchilla First Woman President” I don’t see what I’m supposed to see. Journalists used to be taught to avoid gaffes like this, and many other news organizations did. Including her first name would have helped.
  • I kid you not: Pepsico is wrapping up a limited-edition, 8-week-only campaign for Mountain Dew Throwback, which contains Real Sugar instead of high-fructose corn syrup. I’m a diet soda guy and won’t partake, but that’s a quarter step in the right direction. (My guess: The ridiculous ethanol-as-fuel scam is making corn expensive enough so that HFCS is not the big win that it used to be.)
  • Once again, XKCD scores big–and loud. (SNSFW.) (Thanks to Baron Waste for the link.)

An (Ebook) Fate Worse Than Piracy

I’ve had it in mind for some months now to conduct and publish an interview like this one with a backchannel correspondent of mine who calls himself The Jolly Pirate. That’s unlikely; Jolly didn’t like the idea much, and more to the point, he doesn’t pay much attention to pirated ebooks. He is not an ubercracker from the Scene and doesn’t want to be. He knows where the stuff is and he downloads it. He doesn’t upload at all, except for the uploading inherent in torrent downloading.

His motivation and modus operandi are interesting and I will describe them at greater length someday; from a height I’d describe him as a hoarder who downloads all sorts of things under the assumption that they may eventually be harder to come by. He’s read a few computer books downloaded from Usenet, all of them .chm files, and treats them like a sort of third-party help system for the technologies he’s interested in. The thing that makes me grin a little is this: He says he has over 50,000 ebook files on his hard drive, but he doesn’t own an ebook reader. He doesn’t read for fun and I get the impression that he doesn’t read much at all unless he has to. I asked him why he downloaded all those books, and his answer was simple and obvious: “Because it was easy.” Most of you have seen my entry for December 29, 2009. Jolly downloaded 10,000 ebooks in a couple of hours. That scares some authors and publishers a lot, and I’m still trying to get my head around the question. Tim O’Reilly said somewhere that piracy is like a progressive tax on success, and that’s a useful metaphor. I rarely see my own material in the pirate channels. That is not true of Stephen King or Ms. Rowling.

And in truth, something else makes me worry more than piracy. This isn’t an original insight, though I don’t recall where I first read it (anybody?) but a major threat to success in writing today is the competition from books that have already been published. There are only so many hours in a life, and with most any popular print book available used but in good shape on ABEBooks for $5 or less, a given consumer never has to buy a new book at all, especially fiction. It’s less true in nonfiction covering emerging issues and technologies, but for last year’s news and mature technologies it’s operative: All the Windows XP books that the world needs have already been published, and you can get most of them from the penny sellers for the (slightly padded) cost of shipping.

My point: Existing books compete for reader chair-time with new books. An enormous number of books have been published in the past twenty years or so, and that’s not old enough for them to crumble into shreds. (Alas, my ’60s MM paperbacks are doing exactly that, reminding me constantly what “pulp” means.) They’re all still kicking around the used and unused remainder market, and will be for decades to come. All the arguing about ebook pricing that I’ve seen so far seems to ignore the fact that new books of either type compete with used print books, and ubiquitous Web access makes finding precisely what you want almost effortless.

Paying $15 for an ebook is a sort of impatience tax. Wait a few months, and used copies of the hardcover will be on ABEBooks for $5 or (probably) less, including shipping. Good books, too. If Big Media ever truly embraces ebooks, it will be as a means of defeating the Doctrine of First Sale and eliminating the used book market. (The legal issues there are still very much in play. Expect much agitation in coming years for new laws forbidding the resale of “used” electronic files.)

This shines some different light on the difficulties Google has had getting authors to sign on to the Google Books settlement. I’m not sure that all authors and (especially) publishers even want the orphan copyright issue to be settled. If it is, suddenly the Google scanning machine will drop what may eventually be hundreds of thousands of additional ebooks into the marketplace, all of them competing for quality chair time with whatever current authors are writing. That may explain why I’ve had so much trouble getting SF publishers to talk to me. People may not be reading less these days, but they’re certainly reading and re-reading things that already exist. The value of what I write now is correspondingly less.

When a pulp becomes an ebook, it becomes eternal. Don’t tell me about death due to storage or container format obsolescence. I still have SF copy I wrote using CP/M WordStar in 1979 and stored to 8″ floppies, now safely on a USB thumb drive in .rtf format. If USB ever becomes obsolete, all my files will follow me to whatever comes next–and will probably take five seconds or less to transfer.

There will always be a reliable if modest supply of book crazies and loyal fans who will pay top dollar for The Latest. Beyond that, market cruelties come into play that will make it a lot harder to break into the writing business for the foreseeable future.

Piracy? What’s that again?

Daywander

Not having much luck making Workstation 6 function, and two conversations and numerous emails with VMWare’s tech support people haven’t helped. I install the product, I enter the serial number as requested, and get this error message. Has anybody else ever seen this? Or can anybody even explain it? I emailed the screenshot to VMWare, and that’s about the time they clammed up.

I hate to abandon Workstation entirely. VMWare’s snapshot system is far superior to that of VirtualBox, and I use it a lot. I’ll miss it. Boy.

And while I’m asking peculiar things, let me ask the multitudes here how you pronounce “iodine.” I have always said eye-oh-dyne, but Bob Thompson, who knows more than a little about chemistry (and certainly more than I) pronounces it eye-oh-deen. This lines up with the rest of the halogens; we don’t, after all, say “broh-myne.” So? Which is it?

I edited another half a chapter of FreePascal From Square One yesterday morning, and in laying out the edited material got up to page 136. The book I’m adapting it from is 800 pages long, but don’t look for anything that size. To be workable on Lulu, the book is going to have to stop at or before page 400. A lot of the material in Borland Pascal 7 From Square One just doesn’t apply anymore…who’s called the Borland Graphics Interface lately, or done text output by poking word values into the video display buffer? The BGI chapter was 100 pages all by itself, and when I slice out that and other things like overlays and DOS/BIOS calls, I’m really pulling 400 pages out of no more than 600 pages of useful material, maybe less. Should be done by June. I hope.

The issue of whether Amazon imposes DRM on Kindle publishers is complicated, and I’ll back away some from my statement to that effect on Monday, and will hold off until I try to get one of my own titles into the system. This article suggests that recent policy changes have made DRM optional. Having to face the DRM issue square-on has kept me putting off publishing on the Kindle for some time. As a very small publisher I’ve made this promise to my readers: No DRM of any kind, on anything, ever. I’m willing to forgo Kindle sales if the DRM decision is not my own, but from what I’m reading now, I think that won’t be the case.

As for Amazon caving, well, that’s more complex too. I see that Nancy Kress’s new book Steal Across the Sky is listed on the Amazon Web store, and her publisher, Tor, is one of Macmillan’s imprints. However, you can’t order it from Amazon at this time. (Third-party affiliates are offering it, but Amazon itself is not. Note the double dashes under “Amazon Price.”) Ditto Nancy’s Beggars and Choosers, another Tor book. Yesterday morning’s Wall Street Journal had a story explicitly stating that Amazon had conceded the price issue to Macmillan. But Amazon isn’t selling the books yet, so clearly the struggle goes on.

Off to church, to install an SX270 in place of a doddering old E-Machines box that is four times the size and probably a third the capability.