Jeff Duntemann's Contrapositive Diary

AI Image Generators, Mon Dieu

I finished a 10,700-word novelette the other day, the first short fiction I’ve finished since 2008, when I wrote “Sympathy on the Loss of One of Your Legs,” now available in my collection, Souls in Silicon. I’ve mostly written novels and short novels since then. (I’ll have more to say about “Volare” in a future entry here.)

To be published, it needs a cover. I have no objection to paying artists for covers, which apart from an experiment or two (see “Whale Meat”) I’ve always done in the past. Given all the yabbjabber about AI content creation recently, I thought, “Hey, here’s a chance to see if it’s all BS.”

The spoiler: It’s not all BS, but parts of it are BS-ier than others.

Ok. I’ve tested two AI image generators: OpenAI’s DALL-E 2, and Microsoft’s Bing Image Generator. I found them through a solid article on ZDNet by Sabrina Ortiz. As it happens, Bing Image Generator outsources the process to DALL-E. I wanted to try Midjourney, and may eventually, but you have to have a paid subscription (about $8/month) to use it.

I’m not going to summarize the story here. One image I wanted to try as a cover would be the female lead sitting with her behind in a wicker basket, floating through the air at dawn a thousand feet or so over Baltimore. In both generators (which are basically the same generator) you feed the AI a detailed text description and turn it loose. I started simple: “A woman flying through the air in a wicker basket.” Edy Gagliano does precisely that in the story. What DALL-E gave me was this:

DALL·E 2023-04-23 14.46.55 - a woman flying through the air in a wicker basket - 500 Wide

Well, the woman is flying through the air, but we have a preposition problem here. She is over, not in, the basket. Good first shot, though. I tried various extensions of that basic description, to the tune of 48 images on DALL-E. I won’t post them all here for space reasons, but they ran the gamut: A woman flying through the air holding a basket, a woman flying through the air in a basket the size and shape of a bathtub, and on and on.
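For anyone who wants to do this kind of prompt tinkering a little more systematically, here is a minimal Python sketch. It only assembles text descriptions from interchangeable pieces; it does not call DALL-E, Bing, or any other image generator. The function name `prompt_variants` and the two lists are hypothetical, chosen for illustration, and the wording of the pieces is taken from the descriptions quoted in this entry.

```python
from itertools import product


def prompt_variants(subjects, settings):
    """Return every subject/setting combination as one prompt string.

    Hypothetical helper for illustration only: it builds text and
    never contacts an image generator.
    """
    return [
        f"{subject} {setting}".strip()
        for subject, setting in product(subjects, settings)
    ]


# Pieces taken from the descriptions quoted in this entry.
subjects = [
    "a woman flying through the air in a wicker basket",
    "a barefoot woman sitting down inside a magical wicker basket "
    "that flies through the air",
]
settings = ["", "at dawn over Baltimore"]

for prompt in prompt_variants(subjects, settings):
    print(prompt)
```

Each printed line can then be pasted into the generator by hand, which makes it easier to keep track of which wording produced which image.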

The next one here is perhaps the best I’ve gotten from DALL-E. It’s a woman in a basket over Baltimore, I guess. Here’s the description: “a barefoot woman sitting down inside a magical wicker basket that flies through the air at dawn over Baltimore.” In one sense, it’s not a bad picture:

DALL·E 2023-04-23 10.05.40 - a barefoot woman sitting down inside a magical wicker basket that flies through the air at dawn over Baltimore 500 wide

That said, it looks out of focus. The basket is not wicker and it’s yuge. And in the story, Edy just puts her butt in the basket and lets her legs hang over the side.

Now let us move over to Bing Image Generator. In a way, it came closer than nearly all of the DALL-E images. But now we confront a well-known weakness of AI image generators: They can’t draw realistic hands or feet or faces. Here’s my first take on the image from Bing:

_77229ce5-3d7c-4c09-964f-b2b784ba3580 - 500 Wide

Look closely. Her hands and feet appear to be drawn by something that doesn’t know what a human hand or foot looks like. The face, furthermore, looks like it has one eye missing. (That’s easier to see in the full-sized image.)

I’ll give Bing credit: The images are less fuzzy and smeary. Because Bing uses DALL-E, I suspect there are DALL-E settings I don’t know about yet. I tried a few more times and got some reasonable images, all of them including some weirdness or another. The one below is a better rendering of a woman who is actually sitting in the basket with her legs hanging over the basket’s edge. But did I order a helicopter? Her face is a little lopsided, and her hands and feet, while not grotesque, aren’t quite right.

_090cd681-df9a-4736-8fcd-cdaafe028ae1 - 500 wide

Bing gave me about 24 images while I messed with it, and some of the images, while not capturing what I intended, were well-rendered and not full of weirdness. The one below is probably closest to Edy as I imagine her, and we get a SpaceX booster burning up in the atmosphere to boot. Is she over Baltimore? I don’t know Baltimore well enough to be sure, but that, at least, doesn’t matter. Stock photos of anonymous cities are everywhere.

_794c2ce1-7cd6-492d-9712-7e75ab646a3c - 500 wide

None of the others are notable enough to show here.

So where does this leave us? AIs can draw pictures. That’s real, and I’m guessing that if you tell one to draw something a little less loopy than a woman with her butt in a flying basket, it might do a better job. I remain puzzled why hands and feet and faces are so hard to do. Don’t AIs need training? And aren’t there plenty of photos of hands and feet and faces, enough specific examples for them to generalize from?

I have no idea how these things are supposed to work, and if there were a good overview book on AI image generator internals, I’d buy it like a shot. In the meantime, I may practice some more and look at specific settings. If nothing else, I can produce some concept images to show to a cover artist. And maybe I’ll luck into something usable as-is.

Whatever I discover, you can count on seeing it here.

8 Comments

  1. This was a fascinating report. Thank you for doing the research. You wished for a book on image generators. Alas, printed manuals for current software and operating systems are hard to find. I have a ton of $40 obsolete paperback manuals on programs. Version changes and updates quickly leave the printed manuals in the dust. We now rely on web pages on the applications. This is great except that outdated online advice is abundant and often wrong and/or misleading. The trick is to know what to believe.

    I think the solution is for each publisher of software to support a frequently updated online manual as well as a Q & A page.

  2. Vince says:

    Re why hands and feet are hard to generate. Yeah, it’s puzzling. I would have thought it already has a database of humans with all the normal components and it just has to position the model in the appropriate posture. Is it assembling a human from components? If so, it at least seems to get the proportions and skin colours and perspectives right.

  3. greatUnknown says:

    The methane study will be quickly memory-holed, because it would remove a major argument against eating meat. And eating meat is one of the major taboos of the gaiean religion.

  4. Lee Hart says:

    I think it’s too much to expect an idiot-genius AI program to produce good results all by itself. Your best bet is probably to use it to give you a basic image to start with, and then edit it yourself to get the desired results.

  5. Orvan Taurus says:

    Things I’ve seen (furry) artists complain about being really hard to draw:

    Hands
    Feet
    Horses

    You’d expect, yes, a LOT of photos of such. Perhaps the AI is trained on *artwork* that is a bit dubious, if not outright shy, about showing hands and feet. No idea on horses. I’ve heard MidJ tends to add antlers… I sure can’t explain that one.

    1. Horses? Wow. Will have to check that out. Harder than lions or buffalo? Or just plain vanilla cows?

      I posted an entry today with my theory as to why hands are hard. See:

      http://www.contrapositivediary.com/?p=4957

      This is becoming fun, now that I’ve gotten to the point that I don’t expect reasonable hands or feet in an AI image.

  6. […] you haven’t read my entry for April 23 yet, please do so—this entry is a follow-on, now that I’ve had a chance to do a little more […]

  7. […] registers. Numbers aren’t their forte, even down at level of counting on their fingers, since AI image generators don’t have any clear idea how many fingers a hand is supposed to have. (More on that topic […]
