DALL·E 2 🎨

OpenAI’s latest image-generating AI system

Apr 22, 2022

Year 2049 is the weekly newsletter that discusses the impactful innovations, discoveries, and research shaping our future.
If this was forwarded to you, subscribe for free to get a new story in your inbox every Friday.

Hello friends 👋

The AI research lab OpenAI just introduced the latest version of DALL·E, its image-generating AI system. What they’ve created is impressive, but its potential risks may make you pause and wonder whether it should ever become publicly available.

Read along and enjoy!

– Fawzi

Today’s Edition

Cartoon: Salvador DALL·E shows Vic his latest painting
Story: OpenAI’s DALL·E 2
- What’s DALL·E 2?
- The types of images DALL·E 2 can generate
- How OpenAI is minimizing misuse
- Risks and limitations

Comic: Salvador DALL·E

Something I learned: The real Salvador Dalí used to draw on the back of his cheques when paying for meals, knowing that restaurants would never cash a cheque with his original artwork on it. THE AUDACITY 🤣

Story: DALL·E 2

OpenAI’s new work of art

What do you get if you mix the creativity of Salvador Dalí with the intelligence of WALL-E? OpenAI’s new brainchild: DALL·E 2.

The AI research lab just introduced the latest version of its image-generating AI system to the world. The first version, DALL·E, was introduced back in 2021. DALL·E 2 is a significant improvement over its predecessor: it better understands text descriptions and creates more photorealistic, higher-resolution images.

When asked to generate an image of “bears shopping for groceries in Ancient Egypt”, DALL·E 2 generated the following image:

You can even specify which art style you would like.

“An astronaut playing basketball with cats in space as a children’s book illustration” returns this image:

DALL·E 2 is still a research project and is not available to the public yet. OpenAI hasn’t outlined any specific or intended applications for it:

Our hope is that DALL·E 2 will empower people to express themselves creatively. DALL·E 2 also helps us understand how advanced AI systems see and understand our world, which is critical to our mission of creating AI that benefits humanity.
– OpenAI

Drawing and stealing like an artist

DALL·E 2 can perform three types of tasks:

Create brand new images
Edit existing images
Create variations of existing images

#1: Creating brand new images

DALL·E 2 can create brand-new images from a text description, as long as it understands the words you enter. It doesn’t just mash different concepts together in one image; it understands the relationships between objects and can represent actions visually.

In the “koala dunking a basketball” example, DALL·E 2 needs to understand and put together three concepts: koalas, basketball, and the act of dunking. DALL·E correctly generates an image of an airborne koala dunking like it’s at the NBA All-Star Weekend.

#2: Editing existing images

When you don’t need DALL·E 2 to channel its inner artist, it can make realistic edits to existing images while maintaining consistent textures, shadows, and reflections.

The researchers at OpenAI used DALL·E 2 to give the Mona Lisa a mohawk. If you look closely at the image, you can see how well the lighting was preserved: the light comes from the left, making the front of the mohawk lighter than the side. The top seems a bit blurry, but it’s still impressive.

At least it doesn’t edit paintings like Mr. Bean.

(Credit: Bean)

#3: Creating variations of existing images

Finally, DALL·E can copy something and change it up a bit. The AI system can take an existing image and create new variations of it. An example:

OpenAI wants to minimize potential misuse

Like any other technology, AI can be used for harmful purposes.

According to OpenAI, the research lab took several measures to minimize potential misuse:

Preventing harmful generations: Violent, hateful, and adult images were removed from the training data so DALL·E 2 wouldn’t be exposed to these concepts and learn to generate them.
- OpenAI also says they used “advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures”. I couldn’t find details on exactly how they did this.
Preventing misuse: DALL·E 2 doesn’t generate images when it’s given a text description containing violent, adult, or political content. You can read OpenAI’s full content policy here.
Phased deployment: OpenAI decided to roll out DALL·E 2 in phases while it works with a select group of experts to understand its capabilities and limitations in more depth. I’ve signed up for the waitlist, so hopefully I’ll get access soon and can experiment with it.

The risks of DALL·E 2

Despite these measures, OpenAI still found multiple risks and limitations with DALL·E 2 when testing the system:

Explicit content
Bias and representation
Harassment, bullying, and exploitation
Dis- and misinformation
Economic
Copyright and trademarks

I’m summarizing the main risks below, but I’ve included a link to OpenAI’s detailed analysis in the Deep Dive section.

#1: Explicit content

Although DALL·E 2 won’t generate an image when given a text prompt that includes violence or nudity, it can still create images that suggest these topics when visual synonyms are used.

For example:

A man with blood all over his shirt → No image generated ❌
A man with ketchup all over his shirt → Image generated ✅

Ketchup itself is harmless, but the resulting image still shows what most of us would assume to be blood in that context.

#2: Bias and representation

Prompting DALL·E to generate “lawyer” returns mostly male results (Source: OpenAI)

DALL·E 2 may reinforce existing gender, racial, or cultural stereotypes due to bias in the model’s training data. Testing of the model uncovered different types of biases:

Racial bias: It overrepresented white people.
Gender bias: It overrepresented certain genders depending on the profession. Images of nurses showed mostly women, while images of CEOs showed mostly men.
Cultural bias: It defaulted to Western culture, customs, and traditions when generating images of things like weddings, restaurants, and homes.

#3: Harassment, bullying, and exploitation

Since DALL·E 2 maintains consistent textures, reflections, and shadows when editing images, its edits can be hard to distinguish from real photos.

Although images can be edited and altered with many other tools, DALL·E 2 makes the process much easier and faster than something like Photoshop, which takes more time and effort to learn. It might even produce a more realistic result than your own Photoshop attempt.

#4: Dis- and misinformation

This is related to the previous point, but it has wider and more serious implications.

Editing or creating photorealistic images to deceive or mislead people can be extremely manipulative. We’re already facing widespread misinformation from something as rudimentary as fake articles and, more recently, from other AI applications like deepfakes.

#5: Economic

DALL·E 2’s supercharged creation and editing skills could replace some of the work done by designers, photographers, models, and artists.

I can envision applications to generate custom art and logos for individuals at a fraction of the price of hiring a designer. It would be harder to replace an entire creative team for a bigger project since DALL·E 2 gives you little control over the art direction.

Ownership is another problem. Who owns the art generated by DALL·E 2? OpenAI says that commercial use of the generated images is not allowed, but that would be difficult, if not impossible, to enforce. This reminds me of a similar dilemma I discussed in the Artificial Inventor episode.


#6: Copyright and trademarks

Finally, OpenAI says that the model can generate images with trademarked logos or copyrighted characters. The model was trained on large public datasets that may contain IP-protected elements or concepts, which are hard to filter out.

Final thoughts

This is one of those innovations that make you go “this is cool!” until you start learning about its potentially harmful applications.

That was my reaction as I discovered and learned more about DALL·E 2. Koalas dunking basketballs and Mona Lisa with a mohawk are fun, creative visualizations that get me excited about trying the system out. But the prospect of altering images to harm and deceive people makes me hope that it’s never released to the public.

I think there’s a middle ground, however. Almost all of DALL·E 2’s risks come from generating photorealistic images of real people, because they can be hard to separate from reality. That could completely undermine our trust in the content we consume online.

Many of these risks could be eliminated if DALL·E 2 were only trained to generate images in artistic styles like line drawings, cartoons, and watercolour. These would enable fun and creative experiments that don’t compete with reality. And I believe this would better serve OpenAI’s goal of empowering people to express themselves creatively.

I would love to hear your thoughts about this in the comments 👇

Deep dive

If you enjoyed today’s story, I’ve compiled some additional links to satisfy your curiosity:

Previous episodes you might enjoy

⚛️ Fusion Power: the decades-long dream of unlimited energy

🌞 Solar Geoengineering: is the cure worse than the disease?

🦴 Ossiform’s 3D-printed bone implants

You can also check out all previous Year 2049 editions in chronological order to learn about other impactful innovations shaping our future across all aspects of life.


Email me at fawzi@year2049.com with any questions or other feedback.

Mark Starlin

May 8, 2022

AI-generated art is a novelty. But I prefer human creativity. I imagine it could be used for good creative things. Unfortunately, humans always find a way to use technology for evil and/or selfish gain.

William Collen

Champing at the bit over here, waiting to get access to DALL-E!

Regarding the unintentional bias you mentioned: I remember the same thing happening with WOMBO back in November of '21; I saw a Twitter thread where someone entered "terrorist" as the text prompt and Wombo spat out images with a definite "middle eastern" vibe. It's a good reminder that whatever these AI art generators produce is just a reflection of what's in our own minds. The koala doing a dunk only looks realistic to us because that's what we expect a koala doing a dunk to look like. But what if they gave DALL-E something impossible to visualize, something like "men's fashion 500 years from now?" What would happen then?

Some of the great surrealist artists, people like Max Ernst, Joan Miro, and even our friend Salvador Dali, were able to create images utterly unlike anything ever seen before, yet which still had emotional weight and substance. I haven't yet seen an AI art generator which is able to convey emotions through images, like the surrealist or symbolist painters. It's a fascinating subject, one I'm trying to keep close tabs on. Thanks for writing!
