Like many L&D folks, I’ve been experimenting with ChatGPT image generation. Sometimes, ChatGPT does great! But other times, I find it really frustrating. I’ve been working on a scenario for a client project recently, and I need a series of images of two characters talking together. Midjourney is great for generating consistent characters, but only one character at a time. So, I thought I’d try ChatGPT for this project instead. What I discovered is that ChatGPT’s character consistency degrades over multiple generations. I also struggled with character positioning; ChatGPT kept placing the characters so they weren’t looking at each other.
As I’ve noted elsewhere, I’m not an expert in AI. My perspective is as a practitioner experimenting with AI to figure out how to improve my own work. Image generation is one of the areas where I’ve spent a lot of time, partly because I often can get better results with AI-generated images for scenarios than with stock libraries. But part of “showing my work” with AI means that I show you when it’s not working too.
I’ll show you my results from experimenting with ChatGPT image generation, including what didn’t work and how I finally got better results.
Generating the characters and setting the scene
For this scenario, I have two characters: a graduate student and a professor. They’re meeting in a conference room or classroom with a whiteboard in the background.
I started by generating a single character, the grad student. So far, so good. This is a usable image.
Generating an image of this same character talking worked OK too. The character and scene consistency is pretty good.

Adding a second character
Adding a second character, the professor, is where things got trickier. I prompted ChatGPT to change the angle to a side view so we can see the other person in the conversation.
I prompted for the other character to be sitting on the “opposite side of the table,” but it set them up around a corner. He should be looking at her while listening, but the body positions are wrong. He’s sort of looking past her, not at her.

I tried to adjust the characters’ positions with this prompt: “Change where they’re sitting so they are on opposite sides of the table, directly facing each other, with more space between them. They should not be at the corner of the table like this. Make the image aspect ratio 16×9.”
As you can see below, it’s still not quite right. They are a little farther apart, but still not directly opposite. They are turned toward each other a little better though. Plus, the character consistency is a little lower. The student’s facial features have changed slightly, and the professor’s hair is darker and less gray. Still, this might be good enough for a scenario.

Changing which character is talking
Then, I tried switching speakers. ChatGPT flipped the position of the speakers rather than keeping them in the same place but changing their poses. Obviously, that won’t work.

I got both characters sitting on the correct sides of the table, but they still aren’t looking at each other. Both characters also lose some consistency with each subsequent generation.

When I asked for the characters to turn their bodies so they’re looking directly at each other, both the professor’s hair and the student’s skin became darker. Plus, he still looks like he’s talking to a third person in the room rather than to the graduate student, who is now sitting so close that their arms almost touch.

I tried one more time in this set before giving up. “Turn the professor’s head more so he’s looking at the woman’s face.” As you can see, it didn’t turn his head.

Character inconsistency

Here’s a side-by-side comparison of the graduate student character so the inconsistency is more obvious. I always expect some minor inconsistency in AI image generation; that’s part of the reality of working with these tools. But to have a character shift this much after only a handful of iterations was really disappointing. I think ChatGPT does better with character consistency when you’re only working with a single character. In a scene with even two characters, I don’t think it’s usable if the characters change so much.
I think ChatGPT did somewhat better with consistency in the white male character than it did with the Black woman. The consistency still wasn’t great, but I wonder if part of the issue is that I’m deliberately showing diversity in my characters. We know that AI image generators show bias and tend to exaggerate stereotypes. I think there are general issues with getting character consistency with multiple characters, but I suspect that the underlying bias in the training data may be part of the problem too.
Trying a different approach
Obviously, this approach to image generation wasn’t working. Prompting for images in ChatGPT is different from prompting in other tools; you can be more conversational and less precise. That conversational style lends itself well to iterating and refining images over multiple attempts (but only if the characters stay consistent).
I decided to try something different. I generated the initial characters in Midjourney instead of ChatGPT so I was starting with more detailed images. I think Midjourney creates more interesting, less generic characters than ChatGPT does. I prefer this version of the graduate student character to the one I generated in ChatGPT. The professor character still looks a little obviously AI-generated, so I’ll probably go back and regenerate a base image for him. It was good enough for this experiment though.


I uploaded both of these reference images to ChatGPT and asked it to put them together in a scene. This is one area where ChatGPT can work well; it can combine and remix existing images into new scenes. There’s some loss of detail in the characters, but I can live with that. However, it still set up the two characters at the corner of a table.
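If you’d rather script this remix step than work in the chat window, the same idea can be sketched with the OpenAI Images API. To be clear, this is only a rough sketch and not what I actually did (I used the ChatGPT interface); it assumes the gpt-image-1 model, an API key in your environment, and two local reference files with placeholder names like grad_student.png and professor.png.

```python
# Rough sketch only: combine two character reference images into one scene
# with the OpenAI Images API (edits endpoint). Assumes the openai package,
# an OPENAI_API_KEY environment variable, and the gpt-image-1 model.
import base64

from openai import OpenAI

client = OpenAI()

with open("grad_student.png", "rb") as student, open("professor.png", "rb") as professor:
    result = client.images.edit(
        model="gpt-image-1",
        image=[student, professor],  # both character references
        prompt=(
            "Put these two characters in the same scene: a graduate student and "
            "a professor sitting on opposite sides of a conference room table, "
            "directly facing each other, with a whiteboard in the background."
        ),
        size="1536x1024",  # landscape, roughly 16:9
    )

# gpt-image-1 returns base64-encoded image data.
with open("scene.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

In the chat interface, the equivalent is simply uploading both reference images and describing the scene, which is what I did here.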

Sketch for prompt
After my previous frustrating experience, I decided to do something different to prompt for the layout I wanted in my image. I did a very quick sketch of the scene and how I wanted the characters positioned.
For my prompt, I used, “That’s not quite right. I want the characters to be viewed from a profile view instead of a 3/4 view. The table should be between them, and they are sitting across from each other on opposite sides of the table. See this attached image for the layout. Use this sketch as a guide, and keep the realistic characters but rotate them to a profile view.”

This approach finally got me the results I wanted. The characters are actually looking at each other!

This also worked for changing which character is speaking. There’s some more inconsistency in the characters here, especially the graduate student (note her hair and the black ID strap around her neck). But this approach is definitely on the right track.

Continuing to experiment
I’ll keep working and experimenting with ChatGPT image generation. I’m still not sure ChatGPT can keep the characters consistent across as many iterations as I need for this scenario. However, since I started with characters generated in Midjourney, I can also create some images of each character separately in Midjourney.
If Midjourney had an easy way to maintain character consistency with multiple characters, I’d probably just stick with that tool.
Sometimes, ChatGPT is clearly the better image generation tool. Tim Slade showed how he used ChatGPT to generate cutout characters to put on slides. Midjourney can’t do transparent background PNGs yet. I’ve had success with a white background for character images in Midjourney, and that’s fairly easy to edit to remove the background. It does require an extra step though. ChatGPT can do it on its own without the extra editing.
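For anyone curious what that extra step looks like, here’s a rough sketch in Python using Pillow that turns near-white pixels transparent. The filenames and threshold are placeholders, and a dedicated background-removal tool will handle soft edges and shadows better than this quick approach.

```python
# Rough sketch: make a near-white background transparent with Pillow.
# Assumes a Midjourney character image saved as character_white_bg.png.
from PIL import Image

THRESHOLD = 245  # pixels with R, G, and B all above this are treated as background

img = Image.open("character_white_bg.png").convert("RGBA")

cleaned = [
    (r, g, b, 0) if r > THRESHOLD and g > THRESHOLD and b > THRESHOLD else (r, g, b, a)
    for r, g, b, a in img.getdata()
]

img.putdata(cleaned)
img.save("character_transparent.png")  # PNG keeps the alpha channel
```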
It may be that I need a different workflow for building these scenes with multiple characters. There are some other tools out there, like Runway, that appear to be able to remix character images into new scenes. Tons of tools are working on character consistency right now, since it’s important for storytelling across many fields (marketing, entertainment, etc.). Even if I can’t get it to work in ChatGPT with the current technology, I think it should be possible in some other tool.
If you have had success in generating scenes with multiple characters, let me know what tools you’re using. I’ll add them to my list of tools to test.