DALL·E 2 🎨
OpenAI’s latest image-generating AI system
Year 2049 is the weekly newsletter that discusses the impactful innovations, discoveries, and research shaping our future.
If this was forwarded to you, subscribe for free to get a new story in your inbox every Friday.
Hello friends 👋
The AI research lab OpenAI just introduced the latest version of DALL·E, its image-generating AI system. What they’ve created is impressive, but its potential risks will make you pause and wonder whether it should ever become publicly available.
Read along and enjoy!
Cartoon: Salvador DALL·E shows Vic his latest painting
Story: OpenAI’s DALL·E 2
What’s DALL·E 2?
The types of images DALL·E 2 can generate
How OpenAI is minimizing misuse
Risks and limitations
Comic: Salvador DALL·E
Something I learned: The real Salvador Dalí used to draw on the back of his cheques whenever he paid for meals, knowing that restaurants would never cash a cheque with his original artwork on it. THE AUDACITY 🤣
Story: DALL·E 2
OpenAI’s new work of art
What do you get if you mix the creativity of Salvador Dalí with the intelligence of WALL-E? OpenAI’s new brainchild: DALL·E 2.
The AI research lab just introduced the latest version of its image-generating AI system to the world. The first version, DALL·E, was introduced back in 2021. DALL·E 2 is a significant improvement compared to its predecessor: it can better understand words and create more photorealistic and high-resolution images.
When asked to generate an image of “bears shopping for groceries in Ancient Egypt”, DALL·E 2 generated the following image:
You can even specify which art style you would like.
“An astronaut playing basketball with cats in space as a children’s book illustration” returns this image:
DALL·E 2 is still a research project and is not available to the public yet. OpenAI hasn’t outlined any specific or intended applications for it:
“Our hope is that DALL·E 2 will empower people to express themselves creatively. DALL·E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity.”
Drawing and stealing like an artist
DALL·E 2 can perform 3 types of tasks:
Create brand new images
Edit existing images
Create variations of existing images
#1: Creating brand new images
DALL·E 2 can create brand new images from a text description, as long as it understands the words you enter. It doesn’t just mash different concepts together in one image; it understands the relationships between items and can represent actions visually.
In the “koala dunking a basketball” example, DALL·E 2 needs to understand and put together three concepts: koalas, basketball, and the act of dunking. DALL·E correctly generates an image of an airborne koala dunking like it’s at the NBA All-Star Weekend.
#2: Editing existing images
When you don’t need DALL·E 2 to channel its inner artist, it can make realistic edits to existing images while maintaining consistent textures, shadows, and reflections.
The researchers at OpenAI used DALL·E 2 to give the Mona Lisa a mohawk. If you look closely at the image, you can see how well the hair colour and lighting were preserved: the light comes from the left, making the front of the mohawk lighter than the side. The top seems a bit blurry, but it’s still impressive.
At least it doesn’t edit paintings like Mr. Bean.
#3: Creating variations of existing images
Finally, DALL·E can copy something and change it up a bit. The AI system can take an existing image and create new variations of it. An example:
OpenAI wants to minimize potential misuse
Like any other technology, AI can be used for harmful purposes.
OpenAI says it took several measures to minimize potential misuse:
Removing harmful concepts from the training data: Data containing violence, hate, or adult images was removed from the training data so that DALL·E 2 would have little exposure to these concepts and would be less likely to learn them.
OpenAI also says they used “advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures”. I couldn’t find more information on how they did this exactly.
Detecting and preventing inappropriate generations: DALL·E 2 doesn’t generate images when it’s given a text description containing violent, adult, or political content. You can read OpenAI’s full content policy here.
Phased deployment: OpenAI decided to roll out DALL·E 2 in phases, starting with a select group of experts who will help it understand the system’s capabilities and limitations in more depth. I signed up for the waitlist, so maybe I’ll get access soon and be able to experiment with it.
The risks of DALL·E 2
Despite these measures, OpenAI still found multiple risks and limitations with DALL·E 2 when testing the system:
Explicit content
Bias and representation
Harassment, bullying, and exploitation
Dis- and misinformation
Economic impact
Copyright and trademarks
I’m summarizing the main risks below but I’ve included a link to the detailed analysis provided by OpenAI in the Deep Dive section.
#1: Explicit content
Although DALL·E 2 won’t generate an image when given a text prompt that includes violence or nudity, it can still create images that suggest these topics when visual synonyms are used.
A man with blood all over his shirt → No image generated ❌
A man with ketchup all over his shirt → Image generated ✅
Ketchup is harmless, but DALL·E 2 would still generate an image containing what most of us would assume to be blood in that context.
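To make the loophole concrete, here is a minimal sketch of a keyword-based prompt filter. This is an assumption made purely for illustration: OpenAI hasn’t said how its filter works, and the BLOCKED_TERMS list and the is_prompt_allowed function are hypothetical names I made up. It only shows why a filter that looks at words, not at what the resulting image would look like, can be sidestepped by a visual synonym.

```python
# Hypothetical blocklist for illustration only.
# OpenAI's actual filter is not described in this story.
BLOCKED_TERMS = {"blood", "gore", "nude"}


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if any blocked term appears as a word in the prompt."""
    # Lowercase each word and strip basic punctuation before comparing.
    words = {word.strip(".,!?").lower() for word in prompt.split()}
    return words.isdisjoint(BLOCKED_TERMS)


prompts = [
    "A man with blood all over his shirt",
    "A man with ketchup all over his shirt",
]

for prompt in prompts:
    verdict = "allowed" if is_prompt_allowed(prompt) else "blocked"
    print(f"{verdict}: {prompt}")
```

The first prompt is blocked because it contains a listed word. The second one passes because “ketchup” isn’t on the list, even though the image it describes could look almost identical.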
#2: Bias and representation
DALL·E 2 may reinforce existing gender, racial, or cultural stereotypes due to bias in the model’s training data. Testing of the model uncovered different types of biases:
Racial bias: It overrepresented people who are white.
Gender bias: It overrepresented certain genders based on professions. Images of nurses mostly showed women, while images of CEOs mostly showed men.
Cultural bias: It defaults to Western culture, customs, and traditions when generating images of things like weddings, restaurants, and homes.
#3: Harassment, bullying, and exploitation
Since DALL·E tries to maintain consistent textures, reflections, and shadows when editing images, it can become hard to distinguish them from reality.
Although images can be edited and altered with many other tools, DALL·E makes the process much easier and faster compared to something like Photoshop which needs more time and effort to learn. It might even give you a more realistic image compared to the one you tried editing in Photoshop.
#4: Dis- and misinformation
This is somewhat related to the previous point but has wider and more serious implications.
Editing or creating photorealistic images to deceive or mislead people can be extremely manipulative. We’re already facing widespread misinformation with something as rudimentary as fake articles, and more recently with other AI applications like deepfakes.
#5: Economic impact
DALL·E’s super-charged creation and editing skills could replace some of the work done by designers, photographers, models, and artists.
I can envision applications to generate custom art and logos for individuals at a fraction of the price of hiring a designer. It would be harder to replace an entire creative team for a bigger project (for now) since DALL·E 2 gives you little control over the art direction.
Ownership is another problem. Who owns the art generated by DALL·E 2? OpenAI says that commercial use of these generated images is not allowed but that would be difficult, if not impossible, to track and enforce. This reminds me of the previous dilemma we discussed in the Artificial Inventor episode.
#6: Copyright and trademarks
Finally, OpenAI says that the model can generate images with trademarked logos or copyrighted characters. The model was trained on large and public datasets that may contain references to IP-protected elements or concepts which are hard to filter out.
This is one of those innovations that get you excited until you start learning about its equally harmful applications.
Koalas dunking basketballs and Mona Lisa with a mohawk are fun and creative visualizations that make me eager to try the system out. But altering images to harm and deceive people makes me worried about it being released to the public.
I think there’s a middle ground, however. Almost all of DALL·E 2’s risks come from generating photorealistic images of real people, because these images can be hard to separate from reality. That could completely ruin our trust in the content we consume online.
Many of these risks could be eliminated if DALL·E 2 were trained only to generate images in artistic styles like line drawings, cartoons, and watercolour. These would enable fun and creative experiments that aren’t competing with reality. And I believe this would better preserve OpenAI’s goal of empowering people to express themselves creatively.
I would love to hear your thoughts about this in the comments 👇
If you enjoyed today’s story, I’ve compiled some additional links to satisfy your curiosity:
Risks and limitations of DALL·E 2 (Github) – highly recommended
Previous episodes you might enjoy
You can also check out all previous Year 2049 editions in chronological order to learn about other impactful innovations shaping our future across all aspects of life.