Dall-E 3, Sora, & ChatGPT Plus: Stable Audio vs Suno v3 & New Video Generator!

Theoretically Media
4 Apr 2024 · 11:16

TLDR: In this week's AI news, OpenAI introduces in-painting for DALL-E 3, though later than expected. Stability AI releases Stable Audio 2.0, which offers free music generation but lags behind Suno in quality. ChatGPT (GPT-3.5) becomes usable without logging in, and the first Sora music video, 'Worldweight', is released. Additionally, AniPortrait emerges as a promising portrait animator, and Higgsfield AI, a new video generator, is on the horizon.

Takeaways

  • 🚀 OpenAI has introduced an in-painting feature for DALL-E 3, albeit later than expected.
  • 🎨 Users can now edit images generated by DALL-E 3 directly, but the process is less intuitive than one might hope.
  • 🍞 In one example from the video, the speaker asks DALL-E 3 to add a little butter to a generated slice of toast; the result is smothered in butter.
  • 💬 Stability AI released Stable Audio 2.0, which can create full musical tracks up to 3 minutes long from a single prompt and offers 20 free credits per month.
  • 🎵 Suno, a competing AI music generator, is considered superior in audio quality and instrumentation and can also generate vocals, while Stable Audio 2.0 stands out for accepting your own audio as a reference.
  • 🆓 OpenAI now lets anyone use ChatGPT (GPT-3.5) without logging in, giving the public free access to a still-capable model.
  • 🎶 As a demo of its creative abilities, the speaker had ChatGPT turn a news article into an 8-line poem.
  • 📹 The first music video made with Sora, 'Worldweight' by August Kamp, has been released; it pairs a consistent visual aesthetic with an ambient electronic track.
  • 🌟 The speaker compares Sora's output to Haiper, a free tool that can produce similar results when combined with extras such as overlays and textures.
  • 🎭 AniPortrait is a new tool inspired by EMO (Emote Portrait Alive) that uses a reference photo and a driving video to generate high-quality character animations.
  • 🔜 Higgsfield AI, a new video generator led by Alex Mashrabov, is in beta, with a focus on better video editing and character modification.

Q & A

  • What long-overdue feature has been added to DALL-E 3?

    -DALL-E 3 now supports in-painting, which lets users edit an image by adding or changing elements directly within it.

  • What was the speaker's initial impression of Dolly 3's output aesthetic?

    -The speaker was not a huge fan of DALL-E 3's output from an aesthetic standpoint; it did not resonate with them as much as they had expected.

  • How does the integration of DALL-E 3 with ChatGPT affect the user experience?

    -Because DALL-E 3 is integrated with ChatGPT, users can chat with their image generator, which makes creating and editing images more interactive.

  • What limitation appears when trying to add a small amount of butter to the toast in DALL-E 3?

    -When asked to add a small amount of butter, DALL-E 3 returns toast covered in an excessive amount of butter, showing that the edits are not as intuitive or controllable as one might expect.

  • What is the unique feature of Stable Audio 2.0 compared to other AI-generated music platforms?

    -Stable Audio 2.0's unique feature is that it allows users to add their own audio as a reference for the AI to create music, offering a more personalized and creative output.

  • How does Sora's output compare with other AI video generators like Haiper?

    -The first Sora music video shows a distinctive aesthetic, including long tracking shots and a vintage film look. However, the speaker notes that similar results can be achieved with Haiper when it is combined with additional elements such as overlays and textures.

  • What is the significance of EMO (Emote Portrait Alive) and the new AniPortrait technology?

    -EMO and AniPortrait improve on the traditional "bobblehead" lip-sync look of talking avatars; AniPortrait uses reference videos to create more natural, emotive character animations.

  • What does Higgsfield AI, the new video generator mentioned in the video, plan to offer?

    -Higgsfield AI plans to offer an improved video editor that lets users modify characters and objects in videos, and to train a more powerful video generation model, giving users better quality and more control over AI-generated video.

  • What is the speaker's upcoming event related to AI filmmaking?

    -The speaker will be attending the Curious Refuge AI filmmaking Mega party on April 15th, where they will be judging the world's first AI Esports tournament alongside other notable figures.

  • What does the speaker suggest as an alternative to Sora for creating similar videos?

    -The speaker suggests Haiper, a free platform whose upgraded model can generate up to 4 seconds of video, as an alternative to Sora for creating similar outputs.

  • How does the script highlight the rapid advancements in AI technology?

    -The video highlights the rapid pace of AI development by covering new features and updates across several platforms, such as DALL-E 3's in-painting, Stable Audio 2.0's music generation, and the upcoming Higgsfield AI video generator.

Outlines

00:00

🚀 OpenAI Updates and New Tools

This paragraph covers recent AI updates, focusing on OpenAI's new features and tools. It highlights the long-awaited in-painting feature in DALL-E 3, which lets users edit images directly. The speaker shares mixed feelings about DALL-E 3's aesthetics and functionality, especially its interaction with ChatGPT. The paragraph also covers Stability AI's new audio model, which generates full musical tracks from a single prompt, and compares it with Suno, another AI music generation platform. The speaker plays a demo of the generated music and discusses the advantages of Stable Audio's audio reference feature.

05:01

🎶 AI Music Generation and Sora News

The second paragraph delves into AI-generated music, comparing Stability AI's Stable Audio 2.0 with Suno's music generation capabilities, including Suno's ability to add singing to its output. The speaker then shifts focus to Sora and the first music video created with it, 'Worldweight' by August Kamp, sharing thoughts on its visual aesthetics and the tool's creative potential. The paragraph ends with a brief mention of the speaker's upcoming events and a comparison between Sora and Haiper, a free AI video tool.

10:02

🎨 New Portrait Animators and Upcoming Video Model

This paragraph introduces AniPortrait, a new tool inspired by EMO (Emote Portrait Alive), which combines a reference photo with a driving video to create realistic animations. The speaker describes a use case from Visible Maker, in which a character created in Midjourney was upscaled and given a voice generated with ElevenLabs. The paragraph concludes with news about an upcoming video generation platform, Higgsfield AI, led by Alex Mashrabov, former head of AI at Snap. Higgsfield aims to improve video editing by allowing modifications to characters and objects, and to train a more powerful video generation model. The speaker is enthusiastic about its lean approach and explains how to sign up for the beta.


Keywords

💡AI news

AI news refers to the latest developments and updates in the field of Artificial Intelligence. In the context of the video, it is the central theme that ties together various topics such as software updates, new AI tools, and industry trends. The script mentions a slow week in AI news, indicating that even during quieter periods, significant advancements are being made.

💡DALL-E 3

DALL-E 3 is OpenAI's image generator, integrated into ChatGPT. It creates images, including photorealistic ones, from user prompts. The video highlights the introduction of in-painting, a feature that lets users edit and refine images generated by DALL-E 3, making the tool more useful and interactive.

💡Stable Audio 2.0

Stable Audio 2.0 is an AI-driven music generation tool that creates full musical tracks based on a single user prompt. It is noted for its ability to produce up to 3-minute long tracks and for providing 20 free credits per month, making it accessible for users to experiment with AI-generated music.

💡Suno

Suno is another AI music generation platform mentioned in the video, recognized for its high-quality output and advanced features. Compared with Stable Audio 2.0, Suno is presented as offering better audio fidelity and more accurate genre representation in its generated music.

💡ChatGPT 3.5

ChatGPT running the GPT-3.5 model can now be used for free without logging in, as mentioned in the video. This removes a barrier to entry, letting anyone interact with an AI chatbot and experience its capabilities.

💡Sora

Sora is OpenAI's text-to-video model, and the video highlights the first music video made with it. It shows how AI can be applied in creative fields such as music video production and points to its potential for generating new forms of media.

💡Haiper

Haiper is a free AI tool that generates short video clips, discussed in comparison with Sora. The speaker uses it to show that, combined with the right overlays and textures, freely available tools can produce distinctive content similar to Sora's.

💡AniPortrait

AniPortrait is an AI tool for creating animated portraits, inspired by EMO (Emote Portrait Alive) but taking a different approach. It combines a reference photo with a driving video to generate realistic, expressive character animations.

💡Higgsfield AI

Higgsfield AI is an upcoming AI video generator focused on improving video editing. It is led by Alex Mashrabov, former head of AI at Snap, and operates with a lean team and limited hardware resources. Higgsfield aims to make characters and objects in videos more customizable and to develop a more powerful video generation model.

💡AI filmmaking

AI filmmaking refers to the use of Artificial Intelligence in the creation and production of films. This includes the generation of visual content, soundtracks, and even the scripting of stories. The video touches on this concept by discussing various AI tools and platforms that are contributing to the evolution of filmmaking.

💡AI-generated music

AI-generated music is the use of Artificial Intelligence to create and produce musical compositions. Such models learn from existing music to generate new tracks that match specific genres or styles. The video explores advances in this area by comparing Stable Audio 2.0 and Suno.

Highlights

OpenAI introduces a long-awaited in-painting feature for DALL-E 3.

DALL-E 3's in-painting is less intuitive than one might expect, requiring manual selection and editing.

Despite the speaker's personal aesthetic reservations, in-painting is a step forward for DALL-E 3.

Stability AI released Stable Audio 2.0, which generates full musical tracks up to 3 minutes long from a single prompt.

Stable Audio 2.0 has a free tier that gives users 20 credits per month to create music.

Suno, another AI music generator, is highlighted as the current leader in AI-generated music, surpassing Stable Audio in quality and features.

Stable Audio's distinctive ability to use your own audio as a reference for music generation is noted.

ChatGPT (GPT-3.5) can now be used for free without logging in, and the speaker demonstrates what it can do.

The first music video created with Sora, 'Worldweight' by August Kamp, is released.

The visual aesthetics of the 'Worldweight' music video are praised, and the speaker compares them with what Haiper can produce.

A new video model from Higgsfield AI is on the horizon, led by Alex Mashrabov, the former head of AI at Snap.

Higgsfield AI aims to build an improved video editor and a more powerful video generation model.

The AniPortrait animator is introduced, offering a new approach to character animation.

Visible Maker demonstrates a creative workflow that combines several AI tools, including Midjourney and ElevenLabs, to create a character and its voice.

AI technology continues to advance, and a quiet week may signal a coming flood of new developments.

The presenter, Tim, will be attending the NAB Show and judging the world's first AI Esports tournament.

AniPortrait, a new emotive avatar tool, is available and takes a different approach from EMO.

Higgsfield AI is in beta, with a focus on video editing and character modification.