NEW GPT-4o: My Mind is Blown.

Joshua Chang
13 May 2024 · 06:28

TLDR: OpenAI has unveiled GPT-4o, a significant upgrade from GPT-4 that is roughly twice as fast and now available at no cost. The new model includes features like Vision for image analysis, real-time data browsing, personalized memory, and complex data analysis. The most notable enhancements are the quick response times, averaging 320 milliseconds, and the expressive, emotive voice feature that can read, sing, and even tell dramatic bedtime stories. Additionally, a new desktop app allows for text and speech input, image uploads, and screen sharing for interactive assistance, promising to boost productivity and research.

Takeaways

  • 🆕 OpenAI has announced a new model called GPT-4o, which is faster and more capable than GPT-4.
  • 🆓 GPT-4o is now free to use, whereas GPT-4 previously required a $20 monthly subscription.
  • 🔍 GPT-4o includes features like Vision for image analysis, Browse for real-time internet data, and Memory for personalized responses.
  • 💬 The voice feature in GPT-4o is significantly improved, with quicker response times and more expressive speech.
  • 📈 GPT-4o's response times are as fast as 232 milliseconds, close to the average human response rate.
  • 🎭 The new model can change its tone and expressiveness, including dramatic and robotic voices, and even sing.
  • 👀 An extension of Vision lets the model analyze objects in real time through the camera.
  • 🖥️ OpenAI introduced a desktop app for GPT-4o, enabling text, speech, and image inputs, as well as screen sharing for analysis.
  • 📊 The desktop app can assist with tasks like graph analysis, making it a potential productivity tool for research and computer-based work.
  • 🔠 The 'O' in GPT-4o stands for 'omni' and signifies the integration of multimodal inputs (text, speech, vision) into a single neural network, enhancing the model's understanding and responses.
  • 🔍 The update aims to capture more emotional and tonal information compared to previous models that transcribed voice to text, losing some nuances.

Q & A

  • What is the new model announced by OpenAI called, and what are its main improvements?

    -The new model announced by OpenAI is called GPT-4o. Its main improvements include being two times faster and more capable than GPT-4, with features like Vision for image analysis, Browse for real-time internet data, Memory for remembering facts about the user, and the ability to analyze complex data such as Excel spreadsheets.

  • Is GPT-4o available for free use?

    -Yes, GPT-4o is now available for free, whereas GPT-4 previously required a $20-per-month subscription.

  • What new features were demonstrated in the presentation of GPT-4o?

    -The presentation demonstrated new features such as quick response times, the ability to interrupt the conversation by speaking, expressive and energetic voice responses, and the capability to change its tone and even sing.

  • How fast are the response times for GPT-4o?

    -GPT-4o responds in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to the average human response time in a conversation.

  • What is the significance of the 'O' in GPT-4o?

    -The 'O' in GPT-4o stands for 'omni' and signifies that it takes multimodal inputs (text, speech, and vision) into the same neural network, processing them together rather than separately as in previous models.

  • What is the new desktop app feature announced for GPT-4o?

    -The new desktop app for GPT-4o lets users input text and speech, upload images, and even share their screen to ask questions about what they are looking at, enhancing productivity and research.

  • How does the new model handle voice inputs differently from the previous models?

    -Unlike previous models, which transcribed voice inputs into text, GPT-4o processes voice inputs directly, capturing emotion and tone that would be lost in transcription.

  • What can GPT-4o's Vision feature do in real time?

    -GPT-4o's Vision feature allows users to point their camera at something and ask questions about it in real time, effectively giving the AI 'eyes' to see and analyze its environment.

  • Can GPT-4o read bedtime stories with different tones and expressiveness?

    -Yes, GPT-4o can read bedtime stories with varying tones and levels of expressiveness, as demonstrated in the presentation when it read a story dramatically and then in a robotic voice.

  • What is the potential impact of GPT-4o's new features on productivity and research?

    -GPT-4o's new features, especially the desktop app with screen sharing, could significantly enhance productivity and research by providing immediate analysis of and insight into a wide range of tasks and data.

  • How does the expressiveness of GPT-4o's voice compare to previous models?

    -GPT-4o's voice is more expressive and energetic than previous models', which can feel like talking to an overly caffeinated friend. This could be seen as a step toward more human-like interaction, but it also raises questions about the appropriate tone for an AI assistant.

Outlines

00:00

🚀 Launch of OpenAI's GPT-4o with Enhanced Features

Josh introduces OpenAI's new GPT-4o, emphasizing its improved speed and capabilities over its predecessor, GPT-4. The model is now free to use, a change from the previous $20 monthly subscription. GPT-4o retains features like Vision for image analysis, Browse for real-time internet data, Memory for personalized responses, and complex data analysis. The update will roll out over the coming weeks. The presentation's highlight was a demo showcasing the model's quick response times, averaging 320 milliseconds, and its ability to handle interruptions naturally. The voice feature has been significantly updated to be more expressive and emotive, with the option to adjust its tone and even sing.

05:00

🔍 Multimodal Integration and New Desktop App in GPT-4o

The second segment explains the 'O' in GPT-4o, which stands for 'omni' and signifies the integration of multimodal inputs (text, speech, and vision) into a single neural network, as opposed to previous models that processed voice inputs by transcribing them to text, losing emotional and tonal information. This update allows for a more natural and richer interaction. Additionally, OpenAI announced a new desktop app that supports text and speech input, image uploads, and screen sharing, which can be used to boost productivity and research. The app's potential for assisting with tasks on the computer is highlighted, along with the model's ability to analyze and respond to on-screen content in real time.
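
To make the multimodal point concrete, here is a minimal sketch of sending text and an image to GPT-4o in a single request using the OpenAI Python SDK. This is illustrative only and was not shown in the video; the prompt and image URL are placeholders.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# One request carries both a text prompt and an image; the single "omni"
# model reasons over both modalities together rather than through separate
# transcription or captioning pipelines.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the mood of this scene in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```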

Keywords

💡GPT-4o

GPT-4o is the new AI model announced in the video, claimed to be twice as fast and more capable than its predecessor, GPT-4. It is central to the video's theme, representing a significant leap in AI capabilities, including faster response times and enhanced features such as voice interaction. The script highlights GPT-4o's ability to process multimodal inputs, indicating a more integrated approach to understanding and responding to user queries.

💡Multimodal inputs

Multimodal inputs are the various forms of input an AI can process, such as text, speech, and images. In the context of the video, the new GPT-4o model is highlighted for its ability to handle these different types of input within the same neural network, a significant advancement over previous models that processed them separately. This integration allows for a more nuanced understanding of user intent and context.

💡Voice feature

The voice feature refers to the AI's capability to interact with users through spoken language. GPT-4o is said to have improved this feature, with faster response times and more expressive, emotive speech. The script gives examples of the AI answering questions and telling a story in dramatic and robotic voices, showcasing the versatility of its voice interactions.

💡Response time

Response time describes the speed at which GPT-4o can process and reply to user inputs. The script emphasizes GPT-4o's impressively quick response times, comparing them to the average human conversational response rate, a key aspect of the model's performance and user experience.

💡Bedtime story

A bedtime story is used in the script as an example of GPT-4o's creative and interactive capabilities. The AI is asked to tell a story about robots and love, demonstrating its ability to generate narrative content in a conversational and engaging manner. This use of the AI shows its potential for entertainment and educational purposes.

💡Emotion

Emotion in the context of the video refers to the expressiveness and emotional tone the AI can convey through its voice feature. GPT-4o is noted for having an 'overly energized' and emotive voice, which is a new aspect of its interaction capabilities. The script discusses the potential for customizing the AI's emotional tone, indicating a more personalized user experience.

💡Vision

The Vision feature is the AI's ability to process and understand images. GPT-4o is said to have advanced this capability, allowing users to upload images or use their camera to ask questions about what they see in real time. This feature expands the AI's functionality beyond text and voice, making it a more comprehensive assistant.

💡Desktop app

The desktop app discussed in the video is a new application for using GPT-4o. It allows for text and speech input, image uploads, and screen sharing, enabling the AI to analyze and respond to content on the user's screen. This feature is highlighted as a significant productivity tool, with potential applications in research and other computer-based tasks.
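
As a rough illustration of the screen-sharing workflow described above (this is a hypothetical sketch against the public API, not OpenAI's desktop app), one could capture the screen locally, encode it as an image, and ask GPT-4o about it:

```python
# pip install openai pillow
import base64
import io

from openai import OpenAI
from PIL import ImageGrab

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Capture the current screen and encode it as a base64 JPEG data URL.
screenshot = ImageGrab.grab()
buffer = io.BytesIO()
screenshot.convert("RGB").save(buffer, format="JPEG")
data_url = "data:image/jpeg;base64," + base64.b64encode(buffer.getvalue()).decode()

# Ask GPT-4o a question about whatever is on screen (e.g., a chart).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does the chart on my screen show?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```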

💡Productivity

Productivity refers to the increase in efficiency and effectiveness that GPT-4o and its features, such as the desktop app and screen sharing, can bring to users. The script suggests these features can help users work more effectively, particularly those who spend much of their day at a computer.

💡Omni model

The term 'Omni model' refers to GPT-4o's approach of processing inputs from various modalities simultaneously within the same neural network. This contrasts with previous models that handled text, speech, and images separately. The Omni model is designed to provide a more integrated and comprehensive understanding of user inputs, enhancing the AI's overall performance.

Highlights

OpenAI has announced GPT-4o, a new flagship model that is two times faster and more capable than GPT-4.

GPT-4o will be free to use, a change from GPT-4's previous $20 monthly subscription.

GPT-4o retains features like Vision, Browse, Memory, and complex data analysis from GPT-4.

New GPT-4o features will roll out over the coming weeks.

GPT-4o demonstrated impressive capabilities in the presentation, including answering questions and reading stories.

Response times for GPT-4o are as quick as 232 milliseconds, averaging 320 milliseconds, comparable to human conversational response rates.

Users can interrupt GPT-4o mid-conversation simply by speaking, an intuitive feature.

The expressiveness and energy of GPT-4o's voice have been significantly enhanced.

GPT-4o can change its tone, including dramatic and robotic voices, and can even sing.

A new feature lets GPT-4o use a camera to answer real-time questions about objects in view.

A new desktop app for GPT-4o includes text and speech input, image upload, and screen-sharing capabilities.

The desktop app can analyze and interact with content on the user's screen, enhancing productivity and research.

The 'O' in GPT-4o stands for 'omni' and signifies the integration of multimodal inputs into a single neural network, improving response quality.

Previously, voice inputs were transcribed to text, losing emotional and tonal information.

The Omni model design of GPT-4o considers all input modalities together for a more comprehensive response.

OpenAI's GPT-4o update is seen as a significant advancement in AI capabilities.

The video creator expresses curiosity about Google's upcoming AI announcements.