Deploy Hugging Face models on Google Cloud: directly from Vertex AI

Julien Simon
9 Apr 202405:20

TLDRIn this video, Julian from Hugging Face demonstrates a third method for deploying Hugging Face models on Google Cloud using Vertex AI. Starting directly from the Google Cloud console's Model Garden, he guides viewers through the process of deploying a 'tiny llama' model from the Hugging Face Hub, setting up an endpoint, and testing the model with a sample prompt. The video also shows how to undeploy and delete the model and endpoint, offering a complete tutorial on managing Hugging Face models on Google Cloud.

Takeaways

  • 😀 Julian from Hugging Face demonstrates deploying models on Google Cloud using Vertex AI.
  • 🔗 The video offers a third method for deployment, distinct from using inference endpoints or the Hugging Face Hub.
  • 💻 The process begins directly from the Google Cloud console, specifically the Vertex AI's Model Garden page.
  • 📍 The 'Deploy from Hugging Face Hub' option is highlighted for ease of use.
  • 🦙 The example uses the 'tiny llama' model from the Hugging Face Hub for demonstration.
  • 🌐 The model deployment does not require a token and can be done within the same region as the user's project.
  • ⏱️ Deployment takes a few minutes, during which the presenter pauses the video.
  • 📝 The video shows how to test the deployed model using a prompt and a stop token.
  • 🗑️ It also covers how to undeploy the model and delete the endpoint if desired.
  • 🔄 The model remains in the model registry even after undeployment, requiring re-import if deleted.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is deploying Hugging Face models on Google Cloud using Vertex AI, specifically starting directly from the Vertex AI Model Garden.

  • Who is the presenter of the video?

    -The presenter of the video is Julian from Hugging Face.

  • What are the three ways to deploy Hugging Face models on Google Cloud mentioned in the video?

    -The three ways are: 1) Using inference endpoints from the model page on the Hugging Face Hub, 2) Using Vertex AI and managing infrastructure, and 3) Directly from Vertex AI referencing models on the Hub.

  • What is the 'Model Garden' mentioned in the video?

    -The 'Model Garden' is a page on Google Cloud's Vertex AI where you can deploy models, including those from the Hugging Face Hub.

  • How does the presenter suggest testing the deployed model?

    -The presenter suggests testing the deployed model by providing a prompt in the format required for the specific model, such as a question for a chat model, and using a stop token.

  • What is the significance of the 'stop token' in the context of the video?

    -The 'stop token' is used to indicate the end of the input when testing the deployed model, ensuring that the model knows when to stop generating a response.

  • How does the presenter demonstrate the deployment process?

    -The presenter demonstrates the deployment process by selecting a model from the Hugging Face Hub, filling in the necessary details such as region, model name, and endpoint name, and then clicking on 'deploy'.

  • What is the purpose of the 'deploy from Hugging Face Hub' option in the Model Garden?

    -The 'deploy from Hugging Face Hub' option allows users to deploy models directly from the Hugging Face Hub to Google Cloud's Vertex AI without needing to visit the Hub separately.

  • Why might the presenter suggest deleting the deployed model and endpoint?

    -The presenter suggests deleting the deployed model and endpoint to avoid unnecessary costs and to demonstrate how to clean up resources after testing or when they are no longer needed.

  • What is the difference between undeploying a model and deleting an endpoint according to the video?

    -Undeploying a model removes the model from the endpoint but does not delete the endpoint itself. Deleting an endpoint is a separate operation that completely removes the endpoint from Google Cloud.

  • How does the presenter ensure that viewers can follow along?

    -The presenter ensures viewers can follow along by providing clear instructions, pausing the video for operations that take time, and showing each step of the process.

Outlines

00:00

😀 Deploying Hugging Face Models on Google Cloud

Julian from HuggingFace demonstrates a third method of deploying Hugging Face models to Google Cloud, starting directly from Vertex AI and referring to models on the Hugging Face Hub. He guides viewers through the process of deploying a model from the Google Cloud console, specifically from the Model Garden page, by selecting a model and filling in deployment details. Julian also shows how to test the deployed model and emphasizes the importance of subscribing and enabling notifications for future content. The video concludes with a brief mention of deleting the deployed model and endpoint, ensuring viewers know how to clean up resources after deployment.

05:01

👍 Wrapping Up the Deployment Tutorial

In the concluding paragraph, Julian expresses his hope that the tutorial was found useful and educational, focusing on deploying models on Google Cloud. He encourages viewers to show their appreciation by giving the video a thumbs up and subscribing to the channel for more content. Julian's closing remarks are a call to action for viewers to continue learning and engaging with the content, promising more informative videos in the future.

Mindmap

Keywords

💡Hugging Face

Hugging Face is a company that specializes in natural language processing (NLP) and provides a platform for developers to build, train, and deploy NLP models. In the context of the video, Hugging Face is the source of the AI models that are being deployed on Google Cloud.

💡Google Cloud

Google Cloud is a suite of cloud computing services that run on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, file storage, and Google Drive. The video discusses deploying Hugging Face models on Google Cloud, specifically using Vertex AI.

💡Vertex AI

Vertex AI is a managed service on Google Cloud that enables developers to build, deploy, and manage machine learning models. It simplifies the process of AI model deployment and is the main platform used in the video to deploy models from Hugging Face.

💡Model Garden

The Model Garden is a feature within Google Cloud's Vertex AI that allows users to deploy models directly from the platform. In the video, the presenter uses the Model Garden to deploy a Hugging Face model without having to visit the Hugging Face Hub.

💡Deploy

To deploy a model in the context of the video means to make it accessible and executable on a cloud platform, such as Google Cloud. The process involves setting up the necessary infrastructure and configurations to enable the model to process data and provide predictions.

💡Inference Endpoints

Inference endpoints are access points on a cloud platform that allow users to send data to a deployed model and receive predictions. The video discusses deploying Hugging Face models to Google Cloud using inference endpoints as one of the methods.

💡Autocomplete

Autocomplete is a feature that suggests possible completions of a word or phrase as the user types. In the video, the presenter uses the autocomplete feature to quickly select a model from the Hugging Face Hub when deploying it on Google Cloud.

💡Instance

In cloud computing, an instance refers to a virtual machine or a container that runs an application or service. The video mentions selecting an instance type when deploying a model, which affects the resources allocated to the model.

💡Token

A token in this context refers to an access token or API key that is used to authenticate requests to a service. The video mentions that a token is not needed when deploying models directly from Vertex AI, simplifying the deployment process.

💡Undeploy

Undeploying a model means to remove it from the cloud platform, making it inaccessible for inference. The video shows how to undeploy a model from Google Cloud, which is a necessary step before deleting the model from the registry if desired.

💡Model Registry

A model registry is a repository where machine learning models are stored, versioned, and managed. In the video, the presenter mentions the model registry in the context of managing and deleting models after they have been undeployed.

Highlights

Julian from Hugging Face demonstrates deploying models on Google Cloud using Vertex AI.

This is the third method for deploying Hugging Face models on Google Cloud.

Deployment starts directly from the Google Cloud console, specifically the Vertex AI Model Garden page.

The 'Deploy from Hugging Face Hub' option is a direct way to deploy models.

Auto-complete feature helps in selecting the model from the Hugging Face Hub.

The 'tiny llama' model is selected for demonstration.

No need for a token when deploying from Vertex AI.

Details such as model name and endpoint name need to be filled in for deployment.

Deployment takes a few minutes, as shown in the paused video.

The model Garden view shows a curated list of models available for deployment.

After deployment, the endpoint is tested with a sample prompt.

The 'llama' model is used for a chatbot example, generating a response to a question.

A demonstration of how to undeploy a model from the endpoint is provided.

Undeploying removes the model from the endpoint but does not delete the endpoint itself.

Endpoints can be deleted separately from the model registry.

Deleting a model from the registry requires re-importing it for future deployments.

Three different methods for deploying Hugging Face models on Google Cloud are summarized.

Julian encourages viewers to give the video a thumbs up and subscribe for more content.