Generate LLM Embeddings On Your Local Machine

13 Jan 2024 · 13:53

TL;DR: In this video, viewers learn how to use large language models (LLMs) on their local machines to generate embeddings for various data types, such as news article titles or product descriptions. The process involves installing 'ollama' to run an LLM such as Llama 2 locally, sending API requests from Python, and using a vector store like 'faiss' to store the embeddings and perform similarity searches on them. The tutorial demonstrates embedding a 'Hello World' prompt and then comparing a collection of news titles to a new prompt to find the most similar items, a valuable technique for building recommender systems. The video concludes with practical advice on filling the database with diverse data to enhance the recommendation engine's accuracy.


  • 😀 Large language models (LLMs) can be used locally to create embeddings for data representation in vector space.
  • 🔍 Embeddings are vector representations of data, such as news article titles, which can be stored in a vector store for similarity searches.
  • 🛠️ To use LLMs locally, one needs to have 'ollama' installed, which facilitates running large language models on a local machine.
  • 💻 The installation of 'ollama' is straightforward, with specific commands for Windows, Linux, and Mac systems.
  • 📚 After installation, 'ollama' can be used to run models like Llama 2 or Mistral through the command line.
  • 🔌 To generate embeddings, one can send POST requests to the local embeddings API endpoint from Python using packages like 'requests', 'numpy', and 'faiss'.
  • 📈 The embeddings API requires specifying the model and the prompt (the data to be embedded) in a JSON object.
  • 📊 The resulting embeddings are vectors of a fixed size, which can be used for similarity searches in a vector database.
  • 🔎 For recommender systems, embeddings can help find the most similar items or articles based on vector similarity, not just text matches.
  • 📝 The script demonstrates embedding a 'Hello World' prompt and a collection of news article titles, then finding the most similar titles to a new prompt.
  • 🚀 The intelligence of the embeddings comes from the LLMs, which understand context and meaning to provide relevant similarity searches.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is how to use large language models locally to create embeddings and store them in a vector store for performing similarity searches, which can be useful for recommender systems.

  • What are embeddings in the context of this video?

    -Embeddings, in this context, are vector space representations of given data, such as news articles or titles, which can be used to perform similarity searches and find the most similar items.

  • Why are embeddings useful for recommender systems?

    -Embeddings are useful for recommender systems because they allow for the comparison of new items with existing ones in vector space, enabling the system to recommend items that are similar based on their embeddings.

  • What is the role of a large language model in creating embeddings?

    -The role of a large language model in creating embeddings is to convert text data into a vector format that can be understood and compared by a computer, capturing the semantic meaning of the text.

  • What is the first step mentioned in the video to use large language models locally?

    -The first step mentioned in the video is to have 'ollama' installed on your system, which is a convenient way to run large language models locally.

  • How can one install 'ollama' on different operating systems?

    -On Windows, you use the Windows Subsystem for Linux; on Linux, there is a single install command to run; on Mac, you download the application. In each case the installation is simple, involving either running a command or downloading the software.
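For reference, a minimal sketch of the installation and first run, assuming the commands from the ollama project page at the time of the video (verify against the current instructions before running):

```shell
# Linux: one-line install script
curl -fsSL https://ollama.com/install.sh | sh

# Mac: download the application from the project page instead.
# Windows: run the Linux command inside the Windows Subsystem for Linux (WSL).

# Afterwards, pull and run a model from the command line:
ollama pull llama2
ollama run llama2
```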

  • What are the Python packages needed to send a request to the API from Python?

    -The Python packages needed are 'requests', 'numpy', and 'faiss', which is the vector store used in the video.

  • Can other vector stores be used instead of 'faiss'?

    -Yes, other vector stores can be used instead of 'faiss' if the user is familiar with different ones. The choice of vector store depends on the user's preference and requirements.

  • What is the process of sending a request to the embeddings API?

    -The process involves sending a POST request to a local endpoint on your system, specifically targeting the embeddings API by specifying the model and the prompt in a JSON object.
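A minimal sketch of such a request, assuming the default local ollama endpoint `http://localhost:11434/api/embeddings` and the model name `llama2`:

```python
import requests  # pip install requests

# Default local ollama endpoint (an assumption; adjust if your setup differs)
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_payload(model, prompt):
    """Build the JSON body the embeddings API expects: model plus prompt."""
    return {"model": model, "prompt": prompt}

def get_embedding(prompt, model="llama2"):
    """Request an embedding vector for `prompt` from the local server."""
    response = requests.post(OLLAMA_URL, json=build_payload(model, prompt))
    response.raise_for_status()
    return response.json()["embedding"]  # a plain list of floats
```

The returned list can be converted to a numpy array before being stored in the vector database.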

  • How can one generate a collection of titles for embedding?

    -One can either come up with one's own titles or use a service like ChatGPT to generate a list of titles. The video suggests creating a diverse set of titles to ensure meaningful embeddings.

  • What is the final step in using embeddings for a recommender system?

    -The final step is to fill up the database with a large number of embeddings related to what you are trying to recommend, such as items from a store or news articles, and then use these embeddings to find and recommend similar items.
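The recommendation step itself is simple once everything is embedded: find the stored vectors closest to the query vector. A toy sketch with numpy, using random vectors as stand-ins for real LLM embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for embeddings of 100 items already in the database
# (16-dimensional here; real llama 2 embeddings are much larger)
database = rng.normal(size=(100, 16)).astype(np.float32)

def most_similar(query, vectors, k=3):
    """Return indices of the k stored vectors with smallest L2 distance to query."""
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:k]

# A query vector very close to item 7 should rank item 7 first
query = database[7] + 0.01
top = most_similar(query, database)
```

A vector store like faiss does the same kind of nearest-neighbor lookup, but with data structures that scale far beyond a brute-force scan.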



🤖 Local Use of Large Language Models for Embeddings

This paragraph introduces the video's main topic, which is utilizing large language models (LLMs) locally to create embeddings. Embeddings are representations of data in vector space, allowing for similarity searches, which are particularly useful for recommender systems. The video will demonstrate how to embed various types of data, such as news article titles, into vectors that can be stored and searched within a vector store. The process involves using a local installation of 'ollama' to run LLMs and Python packages like 'requests', 'numpy', and 'faiss' for handling API requests and vector operations. The goal is to generate meaningful embeddings that can effectively retrieve similar items when performing searches.


📚 Embedding News Article Titles for Similarity Search

The second paragraph delves into the process of embedding news article titles into vector space. It explains how to use the local embeddings API to convert text data into vector form, which can then be stored in a vector database. The paragraph provides a step-by-step guide on setting up the environment, sending a POST request to the embeddings API, and handling the JSON response to extract the embedding vector. It also discusses creating a vector database index and populating it with a collection of titles. The ultimate objective is to perform a similarity search to find the most relevant articles based on a new title, showcasing the practical application of embeddings in recommender systems.


πŸ” Demonstrating Similarity Search with Embedded Titles

In this paragraph, the video script describes an experiment to demonstrate the similarity search functionality using embedded news article titles. It details the process of creating an index, embedding a set of predefined titles, and then using a new title to find the most similar articles in the database. The script includes code snippets to illustrate how to perform these actions programmatically. The experiment shows that the system can identify related content based on the vector embeddings, even when the exact words are not identical. The paragraph concludes with suggestions on how to expand the database with more data to improve the recommendation system's accuracy and effectiveness.



💡Large Language Models (LLM)

Large Language Models (LLM) are advanced artificial intelligence systems designed to understand and generate human-like text based on the input they receive. In the context of the video, LLMs are utilized to create 'embeddings', which are vector representations of text data. The video discusses using LLMs locally to generate these embeddings for tasks such as similarity search in recommender systems, which can enhance the way content is suggested to users based on their preferences.


💡Embeddings

Embeddings in the video refer to the process of converting text data into numerical vectors in a high-dimensional space. These vectors represent the semantic meaning of the text and can be used for various machine learning tasks, such as similarity searches. The script mentions embedding news article titles or other text forms into vector space, which allows for the comparison and identification of similar content.

💡Vector Store

A vector store, as mentioned in the video, is a type of database designed to store and manage vector data, which in this case are the embeddings of text. The video explains that once text data is converted into embeddings, they can be stored in a vector store to facilitate efficient similarity searches, which is crucial for applications like recommender systems.

💡Similarity Search

Similarity search is a technique used to find items that are similar to a given item based on certain criteria. In the video, it is demonstrated how embeddings can be used to perform similarity searches to find the most similar articles or items. This is particularly useful in recommender systems where the system can suggest articles or products that are similar to what a user has shown interest in.
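Similarity between embedding vectors is often scored with cosine similarity (the faiss index in the video uses L2 distance, which serves the same purpose); a small illustration with hand-picked vectors rather than real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = unrelated."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])     # ~1.0
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # ~0.0
```

Because LLM embeddings place semantically related texts near each other, a high score means related content even when the texts share no words.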

💡Recommender Systems

Recommender systems are algorithms that suggest content or products to users based on their preferences or past interactions. The video script explains how embeddings can be used in recommender systems to find and suggest the most similar items to a user. For example, if a user likes a particular news article, the system can recommend other articles with similar embeddings.


💡Localhost

Localhost refers to a network name that means the local server on a computer. In the context of the video, it is mentioned that the embeddings API runs locally on the user's system, accessible via 'localhost'. This allows for the local execution of large language models without the need for external server calls, which can be more efficient and secure.


💡API

API stands for Application Programming Interface, which is a set of rules and protocols for building and interacting with software applications. The video script describes sending a request to the embeddings API, which is a service running on the local machine, to generate embeddings for given text data.


💡Python

Python is a widely used high-level programming language known for its readability and versatility. The video script mentions using Python to send requests to the embeddings API. This involves using Python packages like 'requests', 'numpy', and 'faiss' (a library for efficient similarity search and clustering of dense vectors) to interact with the LLM and handle the embeddings.


💡Faiss

Faiss is a library developed by Facebook AI Research for efficient similarity search and clustering of dense vectors. In the video, 'faiss' is chosen as the vector store to manage the embeddings. It is used to create an index that stores the embeddings and performs similarity searches to find the nearest neighbors in the vector space.


💡Dimension

In the context of embeddings, dimension refers to the size of the vector representing the text data. The video mentions that the embeddings generated by 'llama 2' have a dimension of 4,096. This dimension is crucial as it determines the complexity and expressiveness of the embeddings, affecting the accuracy of similarity searches.


Learn how to use large language models locally to create embeddings.

Embeddings are representations in vector space of given data.

Store embeddings in a vector store for similarity search.

Use embeddings for recommender systems.

Embeddings can be created from text data or any representable text form.

Intelligence in finding good embeddings is contained in large language models.

Need ollama on your system for running large language models locally.

Installation of ollama is easy and can be done from the GitHub page.

Use ollama to run large language models on localhost.

Send requests to the API from Python for embeddings.

Need packages like requests, numpy, and faiss for the process.

Target the embeddings API by sending a POST request to a local endpoint.

Embeddings are generated by sending a JSON object containing the model and prompt.

Example of embedding the 'Hello World' text into vector space.

Dimension of the vectors from llama 2 is 4,096.

Create a collection of embedded titles for similarity search.

Use faiss as a vector store to store the embeddings.

Create an index for the vector database with specified dimensions.

Embed a collection of news article titles and store them in the index.

Compare new titles to existing embeddings to find the most similar articles.

Example of finding the most similar article to 'recent progress in AI'.

Demonstration of the intelligence in matching similar content, not just words.

Fill the database with lots of data for effective recommendations.

Use the same structure for different items to get similar item recommendations.