Like many L&D folks, I’ve been experimenting with ChatGPT image generation. Sometimes, ChatGPT does great! But other times, I find it really frustrating. I’ve been working on a scenario for a client project recently, and I need a series of images of two characters talking together. Midjourney is great for generating consistent characters, but only one character at a time. So, I thought I’d try ChatGPT for this project instead. What I discovered is that ChatGPT’s character consistency degrades over multiple generations. I also struggled with character positioning; ChatGPT kept placing the characters so they weren’t looking at each other.
As I’ve noted elsewhere, I’m not an expert in AI. My perspective is as a practitioner experimenting with AI to figure out how to improve my own work. Image generation is one of the areas where I’ve spent a lot of time, partly because I often can get better results with AI-generated images for scenarios than with stock libraries. But part of “showing my work” with AI means that I show you when it’s not working too.
I’ll show you my results from experimenting with ChatGPT image generation, including what didn’t work and how I finally got better results.
Generating the characters and setting the scene
For this scenario, I have two characters: a graduate student and a professor. They’re meeting in a conference room or classroom with a whiteboard in the background.
I started by generating a single character, the grad student. So far, so good. This is a usable image.
Generating an image of this same character talking worked OK too. The character and scene consistency is pretty good.

Adding a second character
Adding a second character, the professor, is where things got trickier. I prompted ChatGPT to change the angle to a side view so we can see the other person in the conversation.
I prompted for the other character to be sitting on the “opposite side of the table,” but it set them up around a corner. He should be looking at her while listening, but the body positions are wrong. He’s sort of looking past her, not at her.

I tried to adjust the characters’ positions with this prompt: “Change where they’re sitting so they are on opposite sides of the table, directly facing each other, with more space between them. They should not be at the corner of the table like this. Make the image aspect ratio 16×9.”
As you can see below, it’s still not quite right. They are a little farther apart, but still not directly opposite. They are turned toward each other a little better though. Plus, the character consistency is a little lower. The student’s facial features have changed slightly, and the professor’s hair is darker and less gray. Still, this might be good enough for a scenario.

Changing which character is talking
Then, I tried switching speakers. ChatGPT flipped the position of the speakers rather than keeping them in the same place but changing their poses. Obviously, that won’t work.

I got both characters sitting on the correct sides of the table, but they still aren’t looking at each other. Both characters also lose some consistency with each subsequent generation.

When I asked for the characters to turn their bodies so they’re looking directly at each other, both the professor’s hair and the student’s skin became darker. Plus, he still looks like he’s talking to a third person in the room rather than to the graduate student, who is now sitting so close that their arms almost touch.

I tried one more time in this set before giving up. “Turn the professor’s head more so he’s looking at the woman’s face.” As you can see, it didn’t turn his head.

Character inconsistency

Here’s a side-by-side comparison of the graduate student character so the inconsistency is more obvious. I always expect some minor inconsistency in AI image generation; that’s part of the reality of working with these tools. But to have a character shift this much after only a handful of iterations was really disappointing. I think ChatGPT does better with character consistency when you’re only working with a single character. In a scene with even two characters, I don’t think it’s usable if the characters change so much.
I think ChatGPT did somewhat better with consistency in the white male character than it did with the Black woman. The consistency still wasn’t great, but I wonder if part of the issue is that I’m deliberately showing diversity in my characters. We know that AI image generators show bias and tend to exaggerate stereotypes. I think there are general issues with getting character consistency with multiple characters, but I suspect that the underlying bias in the training data may be part of the problem too.
Trying a different approach
Obviously, this approach to image generation wasn’t working. Prompting for images in ChatGPT is different from prompting in other tools; you can be more conversational and less precise. That conversational style lends itself well to iterating and refining images over multiple attempts (but only if the characters stay consistent).
I decided to try something different. I generated the initial characters in Midjourney instead of ChatGPT so I was starting with more detailed images. I think Midjourney creates more interesting, less generic characters than ChatGPT does. I prefer this version of the graduate student character to the one I generated in ChatGPT. The professor character still looks a little obviously AI-generated, so I’ll probably go back and regenerate a base image for him. It was good enough for this experiment though.


I uploaded both of these reference images to ChatGPT and asked it to put them together in a scene. This is one area where ChatGPT can work well; it can combine and remix existing images into new scenes. There’s some loss of detail in the characters, but I can live with that. However, it still set up the two characters at the corner of a table.
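If you’d rather script this remix step than work in the chat window, the same idea can be sketched with the OpenAI Images API. To be clear, this is only a rough sketch and not what I actually did (I used the ChatGPT interface); it assumes the gpt-image-1 model, an API key in your environment, and two local reference files with placeholder names like grad_student.png and professor.png.

```python
# Rough sketch only: combine two character reference images into one scene
# with the OpenAI Images API (edits endpoint). Assumes the openai package,
# an OPENAI_API_KEY environment variable, and the gpt-image-1 model.
import base64

from openai import OpenAI

client = OpenAI()

with open("grad_student.png", "rb") as student, open("professor.png", "rb") as professor:
    result = client.images.edit(
        model="gpt-image-1",
        image=[student, professor],  # both character references
        prompt=(
            "Put these two characters in the same scene: a graduate student and "
            "a professor sitting on opposite sides of a conference room table, "
            "directly facing each other, with a whiteboard in the background."
        ),
        size="1536x1024",  # landscape, roughly 16:9
    )

# gpt-image-1 returns base64-encoded image data.
with open("scene.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

In the chat interface, the equivalent is simply uploading both reference images and describing the scene, which is what I did here.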

Sketch for prompt
After my previous frustrating experience, I decided to do something different to prompt for the layout I wanted in my image. I did a very quick sketch of the scene and how I wanted the characters positioned.
For my prompt, I used, “That’s not quite right. I want the characters to be viewed from a profile view instead of a 3/4 view. The table should be between them, and they are sitting across from each other on opposite sides of the table. See this attached image for the layout. Use this sketch as a guide, and keep the realistic characters but rotate them to a profile view.”

This approach finally got me the results I wanted. The characters are actually looking at each other!

This also worked for changing which character is speaking. There’s some more inconsistency in the characters here, especially the graduate student (note her hair and the black ID strap around her neck). But this approach is definitely on the right track.

Continuing to experiment
I’ll keep working and experimenting with ChatGPT image generation. I’m still not sure ChatGPT can keep the characters consistent across as many iterations as I need for this scenario. However, since I started with characters generated in Midjourney, I can also create some images of each character separately in Midjourney.
If Midjourney had an easy way to maintain character consistency with multiple characters, I’d probably just stick with that tool.
Sometimes, ChatGPT is clearly the better image generation tool. Tim Slade showed how he used ChatGPT to generate cutout characters to put on slides. Midjourney can’t do transparent background PNGs yet. I’ve had success with a white background for character images in Midjourney, and that’s fairly easy to edit to remove the background. It does require an extra step though. ChatGPT can do it on its own without the extra editing.
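For anyone curious what that extra step looks like, here’s a rough sketch in Python using Pillow that turns near-white pixels transparent. The filenames and threshold are placeholders, and a dedicated background-removal tool will handle soft edges and shadows better than this quick approach.

```python
# Rough sketch: make a near-white background transparent with Pillow.
# Assumes a Midjourney character image saved as character_white_bg.png.
from PIL import Image

THRESHOLD = 245  # pixels with R, G, and B all above this are treated as background

img = Image.open("character_white_bg.png").convert("RGBA")

cleaned = [
    (r, g, b, 0) if r > THRESHOLD and g > THRESHOLD and b > THRESHOLD else (r, g, b, a)
    for r, g, b, a in img.getdata()
]

img.putdata(cleaned)
img.save("character_transparent.png")  # PNG keeps the alpha channel
```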
It may be that I need a different workflow for building these scenes with multiple characters. There are some other tools out there, like Runway, that appear to be able to remix character images into new scenes. Tons of tools are working on character consistency right now, since it’s important for storytelling across many fields (marketing, entertainment, etc.). Even if I can’t get it to work in ChatGPT with the current technology, I think it should be possible in some other tool.
If you have had success in generating scenes with multiple characters, let me know what tools you’re using. I’ll add them to my list of tools to test.