Jeff Duntemann's Contrapositive Diary

September 17th, 2011:

Annotating Reality

We’ve had evening clouds here for well over a week. Maybe ten days. I’ve lost count, but I may well have to kiss off seeing that supernova in M101. That’s a shame, because I’ve downloaded the Google Sky Map app to my new Android phone, and I want to try it out under the stars.

The app knows what time it is and where you are, and if you hold the phone up against the sky, it will show you what stars/planets/constellations lie in that part of the sky. Move the phone, and the star map moves to reflect the phone’s new position. How the phone knows which way it’s pointed is an interesting technical question that I still need to research, but let it pass: The phone basically annotates your view of the sky, and that’s not only useful, it suggests boggling possibilities. I’m guessing there are now apps that will identify a business if you point your phone at it, and possibly display a menu (the food kind) or a list of daily sales and special deals. With a rich enough database, a phone could display short history writeups of historical buildings, identify landforms for hikers, and things like that.
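As to how the phone knows which way it's pointed: the usual answer is that the accelerometer tells it which way is down (gravity) and the magnetometer tells it which way is (magnetic) north, and two cross products turn those into a full orientation. This is a minimal sketch of that scheme, in the spirit of Android's SensorManager.getRotationMatrix; the function name and the sample sensor values are mine, purely for illustration:

```python
import numpy as np

def orientation_from_sensors(accel, mag):
    """Estimate phone orientation from raw accelerometer and magnetometer
    vectors (both in device coordinates). Gravity fixes 'down', the
    magnetic field fixes 'north', and cross products build the rest.
    Returns a rotation matrix mapping device coordinates to world
    (east, north, up) coordinates."""
    g = np.asarray(accel, dtype=float)
    m = np.asarray(mag, dtype=float)
    east = np.cross(m, g)              # east = magnetic field x gravity
    east /= np.linalg.norm(east)
    up = g / np.linalg.norm(g)
    north = np.cross(up, east)         # complete the right-handed frame
    # Rows are the world axes expressed in device coordinates, so
    # R @ v_device gives v in world (east, north, up) coordinates.
    return np.vstack([east, north, up])

# Phone lying flat, top edge pointing magnetic north (made-up readings):
# accelerometer reads +9.8 on z, the field points north and dips down.
R = orientation_from_sensors([0.0, 0.0, 9.8], [0.0, 30.0, -45.0])
```

With that rotation matrix (plus GPS and the clock), the app can convert each star's catalog position into screen coordinates, which is all the "annotation" really is.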

That mechanism is not an original insight with me; Vernor Vinge described almost exactly that (and much more) in his Hugo-winning 2006 novel Rainbows End. Most of my current boggle stems from not expecting so much of it to happen this soon. When I read the book back in 2006 I was thinking 2060. We are well on our way, and may be there by 2040. (Vinge himself said 2025, but me, well, I’m a severe pessimist on such questions. How long have we been thirty years away from commercial fusion power?)

In general terms, I call this idea “annotating reality.” In its fully realized form, it would be an app that will tell me in very specific terms (and in as much detail as I request) what I’m looking at. I do a certain amount of this now, but with the limitation that I have to know how to name what I’m looking at, and that’s hit-or-miss. I have an excellent visual vocabulary in certain areas (tools, electronic components, wheeled vehicles, aircraft) and almost none in others (clothes, shoes, sports paraphernalia, exotic animals). I was 25 before I’d ever heard the term “lamé” (metallic-looking cloth) and had no idea what it was when I saw it mentioned in one novel or another. I had indeed seen lamé cloth and lamé women’s shoes, but I didn’t know the word. It’s more than the simple ignorance of youth. As much as Carol and I are involved in the dog show scene, I still see dog breeds here and there that I don’t recognize. (Is that a Bergamasco or a Swedish Vallhund?) Even my core competence has limits: I received a Snap-On A173 radiator hose tool in Uncle Louie’s estate, and if it hadn’t had Snap-On’s part number on it, I doubt I’d know what it was even today, because I don’t work on cars.

I want something that lives in my shirt pocket and works like Google Images in reverse: Show it the image and it gives you the text description, with links to longer descriptions, reviews, and shopping. This is a nasty computational challenge; much worse, I’m guessing, than query-by-humming. (I’ve been experimenting with Android’s SoundHound app recently. Nice work!) Dual-core smartphones won’t hack it, and we’ll need lots more bandwidth than even our best 4G networks can offer.
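One cheap building block for "Google Images in reverse" that already exists is perceptual hashing: reduce an image to a short bit string such that similar images get similar bit strings, then look the hash up in a database. It won't name a radiator hose tool, but it finds near-duplicates fast. A minimal sketch of the "difference hash" technique (my own toy implementation; real systems use far more sophisticated features):

```python
import numpy as np

def dhash(gray, size=8):
    """Perceptual 'difference hash' of a grayscale image (2-D array).
    Shrink to size x (size+1) by nearest-neighbor sampling, then record
    one bit per pixel: is it brighter than its right-hand neighbor?
    Visually similar images yield hashes a small Hamming distance apart."""
    gray = np.asarray(gray, dtype=float)
    rows = np.linspace(0, gray.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size + 1).astype(int)
    small = gray[np.ix_(rows, cols)]                 # tiny thumbnail
    bits = (small[:, 1:] > small[:, :-1]).flatten()  # size*size booleans
    return sum(1 << i for i, b in enumerate(bits) if b)

def hamming(a, b):
    """Number of differing bits between two hashes (0 = near-identical)."""
    return bin(a ^ b).count("1")

# A left-to-right gradient and its mirror image hash very differently:
img = np.tile(np.arange(100.0), (100, 1))
print(hamming(dhash(img), dhash(img[:, ::-1])))
```

The hard part, as with everything else here, isn't the hash; it's the database of billions of labeled images sitting behind it.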

But we’re working on it. Facial recognition may be worst-case, so I have hopes that the same algorithms that can discriminate between almost-identical faces can easily tell a tubax from a soprillo. I can’t imagine that identifying the Insane Clown Posse band logo is all that hard–unless, of course, you don’t follow rap. (I don’t.) Bp. Sam’l Bassett did some clever googling and identified Li’l Orby for me, but as with the Insane Clowns logo, the problem isn’t so much drawing distinctions as building the database. Pace Sagan, there are billions and billions of things right down here in the workaday world. Giving them all names may be the ultimate exercise in crowdsourcing. But hey, if we can do Wikipedia in forward, we can do it in reverse. C’mon, let’s get started–it’s gotta be easier than fusion power!

UPDATE: Well, if I read Bruce Sterling more I’m sure I’d have known this, but Google’s already started, with Google Goggles. I downloaded the app to the Droid X2, and sure as hell, it knew I was drinking a Coke Zero. The app said clearly that it doesn’t work on animals, but when I snapped QBit it returned photos of other white animals as “similar,” including a poodle, a kitten, and two bunnies. Close enough to warrant a cigar, at least in 2011. More as I play with it. (And thanks to the six or seven people who wrote to tell me!)