Creating Talking Avatars: Step-by-Step Guide

Dorlita Blakely
15 Dec 2023 10:58

TLDR: In this video, Grace shares a step-by-step guide to creating and animating a talking avatar. She generates the avatar image in ChatGPT with DALL-E 3, then refines it in Canva. She asks ChatGPT for a script for the avatar, then generates the voice in ElevenLabs (written "11 Labs" in the transcript), cloning her own voice for a personalized touch. Finally, she uploads the avatar and audio to D-ID to create the video, which she then customizes and shares. The video concludes with a serene nighttime affirmation emphasizing peace and gratitude.


  • 🎨 Start with a prompt in ChatGPT to create the initial avatar image.
  • 🖌️ Choose and download the desired avatar image for further editing.
  • ✂️ Use Canva to refine the avatar's appearance and add details.
  • 📐 Create a 16:9 video canvas in Canva for the final presentation.
  • 🔍 Use the 'Magic Expand' tool in Canva to enlarge and center the avatar.
  • 🖥️ Select and download the preferred avatar option as a PNG file.
  • 📹 Upload the refined avatar image to an avatar video platform like D-ID.
  • 🗣️ Generate a script for the avatar's voiceover using ChatGPT.
  • 🔊 Use a voice generation tool like ElevenLabs to create a custom voice for the avatar.
  • 🎧 Download the generated voice audio to be used in the video.
  • 📁 Import the audio into Canva and synchronize it with the avatar.
  • 🌟 Add a personal touch by overlaying a logo to cover any watermarks.
  • 🌌 Customize the video to fit different social media formats like reels or squares.
  • 📚 The script includes a nighttime affirmation to promote peace and gratitude.

Q & A

  • What is the first step in creating a talking avatar according to the guide?

    -The first step is to open ChatGPT with DALL-E 3 and paste in a prompt to create the avatar.

  • Which image does Grace choose from the generated options?

    -Grace chooses the second image from the generated options.

  • What tool is used to expand the avatar image in the video?

    -Canva's 'Magic Expand' tool is used to enlarge the image and center the avatar in the video frame.

  • What is the recommended file format for downloading the image?

    -The recommended file format for downloading the image is PNG.

  • How does Grace create the script for the avatar's voice?

    -Grace asks ChatGPT for a script of what she wants her avatar to say, which in this case is nighttime affirmations.

  • What platform does Grace use to generate the voice for the avatar?

    -Grace uses ElevenLabs to generate the voice for the avatar.

  • How does Grace train the AI to speak in her voice?

    -Grace trains the AI by uploading podcast audio of her own voice to the Voice Lab in ElevenLabs.

  • How many credits does it cost to create the video with the scripted audio?

    -It takes eight credits from Grace's account to create the video due to the length of the audio.

  • How does Grace cover the D-ID watermark on the final video?

    -Grace covers the D-ID watermark by placing her logo over it in the video.

  • What affirmations does the avatar speak in the video?

    -The avatar speaks nighttime affirmations that focus on peace, gratitude, and a connection to the universe.

  • How does Grace plan to use the final video with the talking avatar?

    -Grace plans to download the video and upload it to various platforms, potentially resizing it for different formats like a reel or a square video.

  • What is the purpose of the avatar speaking the nighttime affirmations?

    -The purpose is to provide a soothing and positive message that can help with relaxation and reflection before sleep.



😀 Avatar Creation Process

Grace, the host, introduces the video's purpose: guiding viewers through creating a talking avatar with ChatGPT and DALL-E 3. She outlines the steps: generating the image from a prompt, choosing an avatar, and refining it in Canva. The process includes expanding the image to fit a video canvas, adding text, and assembling the final video. She also covers uploading the avatar to a D-ID account and using ElevenLabs to generate the voice for the avatar's script.


🎙️ Voice Cloning and Video Finalization

This section covers how Grace uses ElevenLabs to clone her voice for the avatar. She uploads audio from a podcast to train the AI and mentions alternative voice options. After generating the voice, she downloads the audio and adds it to the video through Canva. She also discusses the credit cost of creating the video and covers the D-ID watermark with her logo. The section concludes with finalizing the video and its potential uses, such as resizing it for different platforms.


🌙 Nighttime Affirmations Script

This section presents the script of nighttime affirmations that the avatar speaks. It is a reflective, calming piece that emphasizes gratitude, peace, and a connection to the universe. The affirmations aim to instill tranquility, promote healing, and encourage a positive outlook toward sleep and the new day. The script is framed as a journey of rest and renewal, filled with gratitude and love.



💡Talking Avatars

Talking avatars are digital representations of a person or character that can simulate conversation through pre-recorded audio or AI-generated speech. In the video, Grace guides viewers through the process of creating a talking avatar using various software tools, which is central to the video's theme of bringing digital characters to life for content creation.

💡ChatGPT

ChatGPT is OpenAI's conversational AI assistant, built on large language models that generate human-like text from a prompt. In the video, Grace uses ChatGPT twice: to generate the avatar image through DALL-E 3, and to write the script that the avatar speaks.


💡Canva

Canva is a graphic design platform used for creating visual content such as social media graphics, presentations, and videos. Grace uses Canva to edit and enhance the image of her avatar, making it larger and better centered for the video, and to assemble the final video.

💡Magic Expand

Magic Expand is a Canva feature that uses AI to extend an image beyond its original borders, generating new background so the picture fits a different size or aspect ratio. In the video, Grace uses it to fit her avatar image to the 16:9 video canvas, making the avatar larger and more central in the composition.
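As a rough illustration of the geometry involved, the sketch below computes how much new canvas an expand step would have to fill to turn an image into a 16:9 frame with the original centered. It is not Canva's implementation. The names `ExpandPlan` and `plan_expand` are hypothetical, and the 1024x1024 starting size is an assumption (it is one of the sizes DALL-E 3 can output; the video does not state the size Grace used).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpandPlan:
    canvas_width: int   # width of the expanded canvas, in pixels
    canvas_height: int  # height of the expanded canvas, in pixels
    pad_left: int       # new pixels to the left of the original image
    pad_top: int        # new pixels above the original image


def plan_expand(width: int, height: int,
                aspect_w: int = 16, aspect_h: int = 9) -> ExpandPlan:
    """Smallest canvas of the target aspect ratio that contains the image."""
    # Compare width/height with aspect_w/aspect_h by cross-multiplying,
    # which avoids floating-point comparisons.
    if width * aspect_h < height * aspect_w:
        # Image is too narrow for the target: keep the height, add width.
        canvas_h = height
        canvas_w = height * aspect_w // aspect_h
    else:
        # Image is wide enough: keep the width, add height if needed.
        canvas_w = width
        canvas_h = width * aspect_h // aspect_w
    return ExpandPlan(canvas_w, canvas_h,
                      (canvas_w - width) // 2,
                      (canvas_h - height) // 2)


# Assumed example: a square 1024x1024 avatar image expanded to 16:9.
print(plan_expand(1024, 1024))
```

For the assumed square image, the plan keeps the full height and adds 398 pixels of generated background on each side, which is the area an AI expand tool has to invent.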

💡DALL-E 3

DALL-E 3 is OpenAI's text-to-image model, available inside ChatGPT (the transcript renders the name as "Doll E3"). Grace pastes a prompt into ChatGPT, and DALL-E 3 generates the candidate avatar images from which she picks one.

💡ElevenLabs

ElevenLabs (written "11 Labs" in the transcript) is the AI voice platform Grace uses to generate the avatar's voiceover. She clones her own voice there by uploading a sample of her speech, then has the cloned voice read the script, which personalizes the finished video.
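Grace works in the web interface, but the same voice generation step can also be scripted. The sketch below only builds the HTTP request and does not send it, so it runs without an account. The endpoint path, the `xi-api-key` header, and the `model_id` value reflect the public ElevenLabs text-to-speech API as best understood here and should be checked against the current documentation; `build_tts_request`, the placeholder key, and the voice ID are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL; verify in the docs


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2"
                      ) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one voice."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute a real key and the ID of the cloned voice.
request = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID",
                            "Tonight, I let go of the day with gratitude.")
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch needs no key or network access:
# with urllib.request.urlopen(request) as response:
#     with open("affirmations.mp3", "wb") as audio_file:
#         audio_file.write(response.read())
```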

💡Nighttime Affirmations

Nighttime affirmations are positive statements recited before sleep to encourage a peaceful and restful night. In the video, the avatar speaks these affirmations, which ties into the video's aim of creating a serene and positive experience for the viewer.

💡PNG Format

PNG is a lossless image file format that also supports transparent backgrounds. Grace downloads her avatar image as a PNG, which preserves image quality and keeps the file flexible when it is layered or reused in other contexts.
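Before uploading, it can be useful to confirm that a download really is a PNG and has the expected dimensions. The minimal sketch below relies only on the fixed layout at the start of every PNG file: an 8-byte signature followed by the IHDR chunk, which stores width and height as big-endian 32-bit integers. The function name `png_dimensions` and the file name in the usage comment are hypothetical.

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def png_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) if data starts like a PNG file, else None."""
    # Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then width (4) and height (4).
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Usage with a downloaded file (hypothetical file name):
# with open("avatar.png", "rb") as image_file:
#     print(png_dimensions(image_file.read(24)))
```

Only the first 24 bytes are needed, so the check stays fast even for large images.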

💡D-ID Watermark

A watermark is a semi-transparent logo or mark placed on a video or image to indicate its origin, ownership, or branding. Grace points out the D-ID watermark on the generated video (transcribed as "D Watermark") and covers it with her own logo, so the final video carries her branding.


💡Serenity

Serenity is a state of peace and calm. The video's script includes a nighttime affirmation about embracing tranquility, a key theme of the video, which aims to give viewers a soothing and peaceful message.


💡Reels

Reels are short vertical videos used on social media platforms such as Instagram. Grace mentions resizing her video for a reel, which shows how the same content can be adapted to different social media formats.
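To make the resizing concrete, the sketch below computes the largest centered crop of a frame for a target aspect ratio, such as 9:16 for a reel or 1:1 for a square post. It assumes the finished video is 1920x1080, a common 16:9 size that the video itself does not state, and it only illustrates the arithmetic. Canva's resize tools may instead rearrange the layout rather than crop it. The function name `center_crop_box` is hypothetical.

```python
from typing import Tuple


def center_crop_box(width: int, height: int,
                    aspect_w: int, aspect_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered crop
    with aspect ratio aspect_w:aspect_h inside a width x height frame."""
    if width * aspect_h > height * aspect_w:
        # Frame is wider than the target: keep full height, trim the sides.
        crop_w = height * aspect_w // aspect_h
        crop_h = height
    else:
        # Frame is taller than (or equal to) the target: keep full width.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Assumed 1920x1080 source frame.
print("reel 9:16 :", center_crop_box(1920, 1080, 9, 16))
print("square 1:1:", center_crop_box(1920, 1080, 1, 1))
```

A 9:16 crop keeps only about a third of a 16:9 frame's width, which is one reason to keep the avatar centered when the video is first laid out.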


Grace guides viewers through the process of creating and animating talking avatars.

The avatar creation starts in ChatGPT using DALL-E 3.

A prompt is used to generate the initial avatar image.

The chosen avatar image is downloaded and imported into Canva for further editing.

Canva is used to adjust the avatar's size and center it within the video canvas.

The text for the avatar is created and integrated into the video.

Option three is selected for the avatar's final look.

The image is downloaded in PNG format for use in the D-ID account.

A new video is created and the avatar image is added.

ElevenLabs is used to generate the voice for the avatar.

A script for nighttime affirmations is requested from ChatGPT.

The user's voice is cloned in ElevenLabs to be used for the avatar.

The generated voice is downloaded and prepared for video integration.

Canva is used to connect to the D-ID account and upload the audio.

The video is generated with the avatar speaking the affirmations.

The D-ID watermark is covered with the user's logo in the final video.

The final video can be resized and uploaded to various platforms.

The avatar speaks a script of nighttime affirmations to promote peace and gratitude.