Deploy Hugging Face models on Google Cloud: directly from Vertex AI
TLDRIn this video, Julian from Hugging Face demonstrates a third method for deploying Hugging Face models on Google Cloud using Vertex AI. Starting directly from the Google Cloud console's Model Garden, he guides viewers through the process of deploying a 'tiny llama' model from the Hugging Face Hub, setting up an endpoint, and testing the model with a sample prompt. The video also shows how to undeploy and delete the model and endpoint, offering a complete tutorial on managing Hugging Face models on Google Cloud.
Takeaways
- 😀 Julian from Hugging Face demonstrates deploying models on Google Cloud using Vertex AI.
- 🔗 The video offers a third method for deployment, distinct from using inference endpoints or the Hugging Face Hub.
- 💻 The process begins directly from the Google Cloud console, specifically the Vertex AI's Model Garden page.
- 📍 The 'Deploy from Hugging Face Hub' option is highlighted for ease of use.
- 🦙 The example uses the 'tiny llama' model from the Hugging Face Hub for demonstration.
- 🌐 The model deployment does not require a token and can be done within the same region as the user's project.
- ⏱️ Deployment takes a few minutes, during which the presenter pauses the video.
- 📝 The video shows how to test the deployed model using a prompt and a stop token.
- 🗑️ It also covers how to undeploy the model and delete the endpoint if desired.
- 🔄 The model remains in the model registry even after undeployment, requiring re-import if deleted.
Q & A
What is the main topic of the video?
-The main topic of the video is deploying Hugging Face models on Google Cloud using Vertex AI, specifically starting directly from the Vertex AI Model Garden.
Who is the presenter of the video?
-The presenter of the video is Julian from Hugging Face.
What are the three ways to deploy Hugging Face models on Google Cloud mentioned in the video?
-The three ways are: 1) Using inference endpoints from the model page on the Hugging Face Hub, 2) Using Vertex AI and managing infrastructure, and 3) Directly from Vertex AI referencing models on the Hub.
What is the 'Model Garden' mentioned in the video?
-The 'Model Garden' is a page on Google Cloud's Vertex AI where you can deploy models, including those from the Hugging Face Hub.
How does the presenter suggest testing the deployed model?
-The presenter suggests testing the deployed model by providing a prompt in the format required for the specific model, such as a question for a chat model, and using a stop token.
What is the significance of the 'stop token' in the context of the video?
-The 'stop token' is used to indicate the end of the input when testing the deployed model, ensuring that the model knows when to stop generating a response.
How does the presenter demonstrate the deployment process?
-The presenter demonstrates the deployment process by selecting a model from the Hugging Face Hub, filling in the necessary details such as region, model name, and endpoint name, and then clicking on 'deploy'.
What is the purpose of the 'deploy from Hugging Face Hub' option in the Model Garden?
-The 'deploy from Hugging Face Hub' option allows users to deploy models directly from the Hugging Face Hub to Google Cloud's Vertex AI without needing to visit the Hub separately.
Why might the presenter suggest deleting the deployed model and endpoint?
-The presenter suggests deleting the deployed model and endpoint to avoid unnecessary costs and to demonstrate how to clean up resources after testing or when they are no longer needed.
What is the difference between undeploying a model and deleting an endpoint according to the video?
-Undeploying a model removes the model from the endpoint but does not delete the endpoint itself. Deleting an endpoint is a separate operation that completely removes the endpoint from Google Cloud.
How does the presenter ensure that viewers can follow along?
-The presenter ensures viewers can follow along by providing clear instructions, pausing the video for operations that take time, and showing each step of the process.
Outlines
😀 Deploying Hugging Face Models on Google Cloud
Julian from HuggingFace demonstrates a third method of deploying Hugging Face models to Google Cloud, starting directly from Vertex AI and referring to models on the Hugging Face Hub. He guides viewers through the process of deploying a model from the Google Cloud console, specifically from the Model Garden page, by selecting a model and filling in deployment details. Julian also shows how to test the deployed model and emphasizes the importance of subscribing and enabling notifications for future content. The video concludes with a brief mention of deleting the deployed model and endpoint, ensuring viewers know how to clean up resources after deployment.
👍 Wrapping Up the Deployment Tutorial
In the concluding paragraph, Julian expresses his hope that the tutorial was found useful and educational, focusing on deploying models on Google Cloud. He encourages viewers to show their appreciation by giving the video a thumbs up and subscribing to the channel for more content. Julian's closing remarks are a call to action for viewers to continue learning and engaging with the content, promising more informative videos in the future.
Mindmap
Keywords
💡Hugging Face
💡Google Cloud
💡Vertex AI
💡Model Garden
💡Deploy
💡Inference Endpoints
💡Autocomplete
💡Instance
💡Token
💡Undeploy
💡Model Registry
Highlights
Julian from Hugging Face demonstrates deploying models on Google Cloud using Vertex AI.
This is the third method for deploying Hugging Face models on Google Cloud.
Deployment starts directly from the Google Cloud console, specifically the Vertex AI Model Garden page.
The 'Deploy from Hugging Face Hub' option is a direct way to deploy models.
Auto-complete feature helps in selecting the model from the Hugging Face Hub.
The 'tiny llama' model is selected for demonstration.
No need for a token when deploying from Vertex AI.
Details such as model name and endpoint name need to be filled in for deployment.
Deployment takes a few minutes, as shown in the paused video.
The model Garden view shows a curated list of models available for deployment.
After deployment, the endpoint is tested with a sample prompt.
The 'llama' model is used for a chatbot example, generating a response to a question.
A demonstration of how to undeploy a model from the endpoint is provided.
Undeploying removes the model from the endpoint but does not delete the endpoint itself.
Endpoints can be deleted separately from the model registry.
Deleting a model from the registry requires re-importing it for future deployments.
Three different methods for deploying Hugging Face models on Google Cloud are summarized.
Julian encourages viewers to give the video a thumbs up and subscribe for more content.