I created my talking AI twin. Here's how to make your own.

A step-by-step guide to creating animated characters using AI tools

Apr 02, 2023

Hey friends 👋

Welcome back to Year 2049! Today, I’m sharing an experiment I recently did, which blew my mind.

I hope you enjoy it and try it out yourself. If you do, send me the results!

Can I replicate myself using AI?

It feels like we’re in the wild west of AI. Every day, my inbox and social media feeds are flooded with the latest AI tools that can seemingly do amazing things.

I wanted to put them to the ultimate test: can they help me create an “AI twin” that looks and talks like me?

It ended up being much easier than expected. I’ll show you the result and then walk you through what I did.

Pretty impressive. It took less than 2 hours and cost me $40 ($10 for Midjourney and $30 for Descript), although you can recreate a decent version of this for free.

Process Overview

To create my AI twin, I had to do a few things:

1. Create an AI avatar that looks like me.
2. Create a synthetic version of my voice that reads any text I give it.
3. Combine the AI avatar and audio to create my talking AI twin.

Step 1: Create your AI avatar

I first created an AI avatar that looks like me using Midjourney. To get the best results, I used the latest version of Midjourney (v5) on their $10/month plan. If you don’t want to pay, you can still use the free credits they give you to make a good avatar.

Note: If you’ve never used Midjourney, here’s a quick start guide on how to make an account and use the tool.

I uploaded an image of myself in one of the #newbies channels so that Midjourney could use it as a reference. Let’s call this the reference image for the rest of this tutorial.

I recommend uploading a clear image of yourself looking into the camera.

Next, click on the image to open it, right-click it, then choose “Copy Link”.

Type “/imagine” into the chat box and hit enter to start writing your prompt:

The first thing you need to do is paste your reference image URL to tell Midjourney to use it as a guide.

Now, you must write a full prompt describing the image you want Midjourney to generate. This step requires a lot of trial and error, so don’t expect to get a perfect result on your first try. It took me 6 or 7 generations to get a look I was satisfied with.

The full prompt I ended up using:

(reference image URL) sitting at a desk, wearing a plain black t-shirt, white wall and bookshelf in the background, looking into the camera, facing the camera, holding a microphone, cinematic lighting, unreal engine, photorealistic
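Since this step takes several rounds of trial and error, it can help to keep the descriptive phrases in a list and rebuild the prompt text each time you change one. Here is a minimal Python sketch of that idea. The function name `build_imagine_prompt` and the example URL are placeholders I made up for illustration, and the script only produces the text you would paste after “/imagine”; it doesn’t talk to Midjourney or Discord.

```python
def build_imagine_prompt(reference_url: str, descriptors: list[str]) -> str:
    """Join a reference image URL and descriptive phrases into one prompt string."""
    # Drop empty entries and stray whitespace so the commas stay clean.
    cleaned = [d.strip() for d in descriptors if d.strip()]
    # The reference URL goes first, then a space, then the comma-separated description.
    return f"{reference_url.strip()} " + ", ".join(cleaned)


descriptors = [
    "sitting at a desk",
    "wearing a plain black t-shirt",
    "white wall and bookshelf in the background",
    "looking into the camera",
    "facing the camera",
    "holding a microphone",
    "cinematic lighting",
    "unreal engine",
    "photorealistic",
]

# Placeholder URL: replace it with the link you copied from Discord.
prompt = build_imagine_prompt("https://example.com/reference.png", descriptors)
print(prompt)
```

To try a variation, swap or remove one phrase in the list, run it again, and paste the new text into the chat box.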

Of the four images I got, I found the first to look the most like me (and a hint of Messi? 🧐).

There’s a lot of room to be creative here! Here’s what Midjourney gave me when I asked it to turn me into a Pixar character.

Once you have an AI avatar you’re satisfied with, move to Step 2.

Alternative methods:

If you don’t want to use Midjourney, you can use other tools like DALL·E 2 or Lensa.
If you don’t want to upload images of your face for this experiment, you can look up existing AI avatars/characters on Lexica. Make sure the character you get is facing the camera and not at an angle. This will be important for the last step.

Step 2: Synthesize your voice

What’s an avatar without a voice?

For this step, I wanted to create a synthetic version of my voice that could read any text I give it while still sounding like me. I discovered Descript’s Overdub feature, which allowed me to do that.

The tool allows you to create a “voice model” of yourself based on the audio recordings you provide it.

Note: This step requires you to make a Descript account and download their desktop app.

In the Descript desktop app, click on “Create new voice” under the “Voices” menu:

Next, you'll need to provide at least 10 minutes of audio recordings that Descript can use to create your voice model. You have two options here:

Drag any existing audio recordings of yourself into the page.
Record yourself reading Descript’s training script.

I used the second option and only recorded myself reading the script for 10 minutes. Descript recommends 30 minutes of audio to train the voice model, but mine still sounded pretty accurate.
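If you go with the first option and have a pile of existing recordings, you may want to check how much audio you actually have before uploading. The sketch below adds up the length of WAV files using Python’s built-in `wave` module and compares the total to the 10-minute minimum and 30-minute recommendation mentioned above. It assumes your recordings are uncompressed WAV files in a folder called `recordings`, which is a hypothetical name; other formats like MP3 would need a different library.

```python
import wave
from pathlib import Path

MIN_MINUTES = 10          # minimum mentioned in this tutorial
RECOMMENDED_MINUTES = 30  # Descript's recommendation, per this tutorial


def wav_duration_seconds(path) -> float:
    """Return the length of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_minutes(paths) -> float:
    """Sum the durations of several WAV files, in minutes."""
    return sum(wav_duration_seconds(p) for p in paths) / 60


def training_status(minutes: float) -> str:
    """Describe how the total compares to the two thresholds."""
    if minutes < MIN_MINUTES:
        return "too short"
    if minutes < RECOMMENDED_MINUTES:
        return "enough to submit"
    return "recommended length"


if __name__ == "__main__":
    # Hypothetical folder name: point this at wherever your recordings live.
    recordings = sorted(Path("recordings").glob("*.wav"))
    minutes = total_minutes(recordings)
    print(f"{minutes:.1f} minutes of audio: {training_status(minutes)}")
```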

Once you upload your audio recordings to Descript, click “Submit training data” in the top right.

After submitting, Descript may take up to 24 hours to generate your voice model. Mine took about 12 hours.

Once it’s ready, go to the “Recent projects” menu and create a new project:

This will create a new document where you can type any text and use your voice model to read it. For this experiment, I wanted my AI twin to introduce himself as if I were meeting him in person.

I used ChatGPT to get a script for a short introduction:

I pasted that introduction into Descript:

Then, I set the speaker to Fawzi:

You’ll need to give Descript a minute or two to generate the audio based on your voice model. Expect to be creeped out like I was.

Note: Descript’s free plan limits the words you can use in your script to this list of 1,000 words and will replace any other words with “jibber jabber”. Because I wanted to get the best results, I temporarily upgraded to their $30/month plan.
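If you’d rather stay on the free plan, you could check your script against the allowed word list before pasting it in. Here’s a rough Python sketch of that check. The `allowed` set below is a tiny stand-in I made up for illustration, not Descript’s actual list, and I’m assuming a simple case-insensitive word match; how Descript really matches words may differ.

```python
import re


def words_outside_list(script: str, allowed_words: set[str]) -> list[str]:
    """Return the distinct words in `script` that are not in `allowed_words`."""
    allowed = {w.lower() for w in allowed_words}
    # Treat runs of letters (with optional straight apostrophes) as words; ignore case.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", script.lower())
    flagged = []
    for word in words:
        if word not in allowed and word not in flagged:
            flagged.append(word)
    return flagged


# Tiny stand-in list for illustration only; use the real list from Descript.
allowed = {"hi", "my", "name", "is", "i", "am", "an", "of"}
flagged = words_outside_list("Hi, my name is Fawzi. I am an AI twin of Fawzi.", allowed)
print(flagged)  # ['fawzi', 'ai', 'twin']
```

Any word it prints is one you’d need to swap out, or it would likely get replaced in the generated audio.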

When the audio is ready, you can export it by clicking on “Publish” then “Export”:

Once you have the audio file downloaded, go to Step 3.

Alternative methods:

Someone on IG recommended ElevenLabs as an alternative. I’ve never used the tool and can’t speak to its quality, but I’m including it in case you need an alternative.
If you don’t want to synthesize your voice, you can use a regular recording of yourself talking for the last step.

Step 3: Create your talking AI twin

The last step was surprisingly the easiest.

Now that you have your AI avatar and synthetic audio recording of yourself, you’re ready to bring your AI twin to life!

For this step, I used D-ID on their free plan.

After you make an account, go to “Create Video” and upload your AI avatar under “Choose a presenter”:

Then, on the right-hand menu, switch from “Script” to “Audio” and upload the audio recording you exported from Descript:

Click on “Generate Video” and… you’re done! The video will show up in your video library after a few minutes.

Congrats, you’ve officially created your AI twin who looks and sounds like you.

Here’s my AI avatar introducing himself to you.

A post shared by Fawzi Ammache (@futurewithfawzi)

Final thoughts

I was surprised at how straightforward this was to make. I find it somewhat creepy but also amazing.

If there’s one lesson to take away from this, it’s that you should take audio/video content online with a grain of salt, especially content involving public and influential figures. Technologies like these can be fun for creative experiments like this one, but they can also be used maliciously.

So if you come across a video or audio recording of a celebrity or politician saying something crazy or inflammatory, remember that these can be easily faked now.

I got these results using one reference image and 10 minutes of audio, so imagine how much more believable it would be with hundreds of reference images and hours of audio.

Show me what you made!

I would love to see what you came up with, so DM me on Instagram or leave a comment to tell me about it.

Sharing is caring

The future is too exciting to keep to yourself.

Share this post in your group chats with friends, family, and coworkers.
