Run AI Models Locally: Ollama Tutorial (Step-by-Step Guide + WebUI)

Leon van Zyl
8 Jul 2024 · 14:52

TLDR: This tutorial introduces Ollama, a platform for running AI models locally without the need for costly cloud services. It guides viewers through installing Ollama, downloading and running open-source models, and using advanced features like the Open Web UI for a more interactive experience. The video also covers creating custom models and interacting with them via a web interface, providing a comprehensive guide for anyone interested in experimenting with AI while keeping data private and secure.


  • πŸš€ Ollama is a tool that allows you to run AI models locally without the need for expensive cloud services.
  • πŸ’‘ Running AI models locally ensures privacy and security as no data is sent to cloud services.
  • πŸ–₯️ To get started with Ollama, you need to download and install it from the official website and then run it via a desktop app or command line.
  • πŸ” Ollama's basic commands include listing available models, downloading models, and running them.
  • πŸ“š You can find and download various AI models from the Ollama website, with different sizes catering to different hardware capabilities.
  • πŸ”„ The 'ollama pull' command allows you to download models without immediately running them, giving you more control over the process.
  • πŸ“ The 'ollama show' command displays detailed information about a specific model, including its base model, parameters, and context size.
  • πŸ—‘οΈ The 'ollama rm' command lets you remove a model from your local setup if it's no longer needed.
  • πŸ”§ Special commands within the chat window, like '/set', allow you to adjust session attributes such as 'temperature' for creativity and 'system' for the model's personality or instructions.
  • πŸ”– You can save changes to a model's parameters and system message as a new model, creating a customized experience.
  • 🌐 Open Web UI is an optional, attractive user interface for interacting with your models, which can be installed using Docker.
  • πŸ”— Ollama also provides API endpoints for developers to integrate AI model functionalities into their applications, demonstrated through a Postman example.
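Taken together, the basic workflow described above can be sketched as a short terminal session (the model name is an example from the Ollama library; commands assume Ollama is installed and running):

```shell
ollama list          # show models currently installed on this machine
ollama pull gemma2   # download a model without running it
ollama run gemma2    # run a model (downloads it first if needed)
ollama show gemma2   # display base model, parameters, and context size
ollama rm gemma2     # remove the model to free up disk space
```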

Q & A

  • What is the main purpose of Ollama?

    -Ollama allows users to download and run free, open-source, and uncensored AI models on their local machine without the need for cloud services, ensuring privacy and security.

  • How can I install Ollama on my computer?

    -To install Ollama, visit the Ollama website, click on Download, select your operating system, and run the downloaded file. Follow the installation prompts to complete the process.

  • What are the two ways to start Ollama after installation?

    -You can start Ollama by running the Ollama desktop app, which will display the Ollama icon in your system tray, or by opening your command prompt or terminal and running the 'ollama serve' command.

  • How can I verify that Ollama is running correctly?

    -You can verify that Ollama is running by entering 'ollama' in the terminal. If it is working correctly, you should see a list of all available commands.

  • What does the 'ollama list' command do?

    -The 'ollama list' command displays all the models currently installed on your machine.

  • How can I download a model using Ollama?

    -To download a model, visit the Ollama website, select the model you are interested in, and follow the instructions on the right-hand side of the page. You can copy the provided command and paste it into your command prompt to download the model.

  • What is the difference between the 9 billion parameter model and the 27 billion parameter model?

    -The main difference between the 9 billion parameter model and the 27 billion parameter model is their size and the hardware requirements needed to run them. The larger model is typically meant for enterprise-grade hardware, while the smaller models are suitable for most users.

  • How can I interact with the models using the Open Web UI?

    -After installing the Open Web UI, you can interact with the models through a user-friendly interface. You can select models from a dropdown menu and chat with them or upload documents to ask questions based on the content.

  • What is the significance of the 'temperature' parameter in Ollama?

    -The 'temperature' parameter in Ollama controls the creativity of the model's responses. It is a value between 0 and 1, where 1 makes the model highly creative and 0 keeps it factual and close to the system prompt.

  • How can I create a custom model with a specific personality or role?

    -To create a custom model, use the '/set system' command in the chat window to define a personality or role, and adjust other parameters as needed. Then save these changes as a new model with the '/save' command followed by the desired model name.
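As a sketch, the customization steps above look like this inside an interactive `ollama run` session (the model name and personality are example values; interactive commands are entered at the `>>>` chat prompt):

```shell
ollama run llama3    # start an interactive chat session

# Inside the chat window:
# >>> /set parameter temperature 1
# >>> /set system "You are Mario from Super Mario Bros. Answer as Mario."
# >>> /save mario
# >>> /bye

ollama run mario     # the saved custom model is now available by name
```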

  • What is the process for installing the Open Web UI for Ollama?

    -To install the Open Web UI for Ollama, you need Docker installed on your machine. After installing Docker, run the command found on the Open Web UI page to start the UI, which will then be accessible through a web browser.



πŸš€ Introduction to Ollama: Free AI Models for Personal Use

This paragraph introduces Ollama, a platform that allows users to download and run open-source AI models without incurring high costs. It emphasizes the privacy and security benefits of running models locally, without the need for an internet connection. The video promises to guide viewers through setting up Ollama, downloading models, and using advanced features. It also mentions the installation of Open Web UI, a user-friendly interface for interacting with AI models, including the ability to chat with documents using RAG (Retrieval-Augmented Generation).


πŸ“ Olama Setup and Basic Commands

The viewer is guided through the process of downloading and installing Olama, starting the application, and familiarizing themselves with basic commands. The paragraph explains how to check if Olama is running, how to list available models, and how to download and run models. It also covers how to view model details and remove models, providing a foundation for users to start experimenting with AI models on their own machines.


🎭 Customizing AI Models and Conversation History

This section delves into customizing AI models by adjusting parameters like temperature, which affects the model's creativity, and setting system messages to define the model's personality or role. The paragraph demonstrates how to save these customizations as new models and how to exit the chat interface. It also highlights the model's ability to recall information from the conversation history, showcasing the interactive and dynamic nature of AI models.

πŸ›  Advanced Model Creation and API Usage

The script explains how to create custom AI models by setting parameters and system messages in a text file, then using the 'ollama create' command to generate a new model. It also touches on the technical side of Ollama, introducing API endpoints for developers to interact with models programmatically. The paragraph provides an example of using Postman to call an API endpoint for model completion, illustrating the flexibility and power of Ollama for advanced users.
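As a sketch, such a configuration file (a 'Modelfile' in Ollama's terms) and the create command might look like this (the base model, temperature, and persona are example values):

```shell
# Write a Modelfile: FROM names the base model, PARAMETER sets
# model options, SYSTEM defines the model's role or personality.
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.9
SYSTEM You are Mario from Super Mario Bros. Answer as Mario.
EOF

ollama create mario -f Modelfile   # build the custom model
ollama run mario                   # chat with it
```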

🌐 Installing Open Web UI for Enhanced User Experience

The final paragraph focuses on enhancing the user experience with Ollama by installing Open Web UI, which provides a more visually appealing interface than the terminal. It outlines the prerequisites, specifically the need for Docker, and provides a step-by-step guide to installing and running Open Web UI. The viewer is shown how to sign up, select models, and interact with them through the web interface, including the ability to chat with documents using RAG.
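For reference, the Docker command shown on the Open Web UI page is along these lines (check the project's current documentation, as the port mapping and volume name may differ):

```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```

Once the container is running, Open Web UI is typically available at http://localhost:3000 in your browser.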



πŸ’‘AI Models

AI Models, or Artificial Intelligence Models, refer to the algorithms and computational frameworks that enable machines to perform tasks that typically require human intelligence, such as understanding natural language or recognizing images. In the context of the video, AI models are used to process and generate human-like text based on user input, and they can be run locally without reliance on cloud services.


πŸ’‘Ollama

Ollama is the name of the software discussed in the video, which allows users to download, install, and run AI models on their local machines. It is an open-source and uncensored alternative to cloud-based AI services, emphasizing privacy and cost-effectiveness. The script describes the process of installing Ollama, interacting with it through the command line, and using it to run various AI models.

πŸ’‘Open Source

Open Source refers to a type of software whose source code is available to the public for viewing, modifying, and enhancing. In the video, the AI models available through Ollama are open source, meaning users can freely use, study, and contribute to their development. This is contrasted with proprietary software, which is typically restricted to the company that owns it.


πŸ’‘Uncensored

Uncensored implies that the content or functionality is not subject to review or restriction by a third party. In the context of the video, uncensored models on Ollama mean that the AI operates without external control or filtering, allowing for more freedom in the types of responses and interactions it can generate.

πŸ’‘Local Machine

A local machine refers to a user's personal computer or device. The video emphasizes running AI models on a local machine as opposed to using cloud-based services, which can have cost implications and raise privacy concerns. By running models locally, users can maintain control over their data and the AI's operation.


πŸ’‘WebUI

WebUI stands for Web User Interface, which is a graphical interface accessible through a web browser. In the video, Open Web UI is introduced as a user-friendly alternative to the command-line interface for interacting with Ollama and its AI models. It allows for a more visual and intuitive experience when working with AI models.


πŸ’‘RAG

RAG stands for Retrieval-Augmented Generation, a machine learning technique that combines retrieval of relevant information with generative models to produce more accurate and context-aware responses. The video mentions that the Open Web UI includes RAG capabilities, allowing users to upload documents and interact with them through the AI model.


πŸ’‘Parameters

In the context of AI models, parameters are variables within the model that can be adjusted to change its behavior. The video discusses setting parameters such as 'temperature,' which controls the creativity versus factuality of the model's responses, and the 'system message,' which can define the model's persona or role.

πŸ’‘API Endpoints

API stands for Application Programming Interface, and endpoints are specific locations in the API that perform certain tasks or functions. The video mentions that Ollama provides API endpoints for creating messages, managing models, and other operations, allowing developers to integrate Ollama's AI capabilities into their own applications.
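As an illustration, the Postman request from the video can also be made from Python using only the standard library. Ollama's local server listens on port 11434 and exposes a `/api/generate` completion endpoint; the model name below is an example and must already be pulled locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint


def build_payload(prompt, model="llama3", stream=False):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}


def generate(prompt, model="llama3"):
    """Send a completion request to a locally running Ollama server
    and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` with the server running returns the model's answer as a string; setting `stream` to True instead yields newline-delimited JSON chunks.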


πŸ’‘Docker

Docker is a platform that allows developers to develop, ship, and run applications in containers. Containers are lightweight, portable, and self-sufficient, making it easier to manage applications. In the video, Docker is required to install and run the Open Web UI for Ollama, simplifying the setup process.


Ollama allows you to run free, open source, and uncensored AI models locally without the need for expensive cloud services.

Models run locally, ensuring privacy and security of your data without the need for an internet connection.

Ollama installation is straightforward, requiring only a few steps on the Ollama website.

Ollama can be started via a desktop app or through a command prompt with the 'ollama serve' command.

Basic Ollama commands include listing available models and checking if Ollama is running correctly.

Models can be downloaded and installed by selecting from featured, popular, or new models on the Ollama website.

Different model sizes have varying hardware requirements, with smaller models suitable for most users.

Downloading a model can be done without running it, using the 'ollama pull' command followed by the model name.

Multiple models can be downloaded and managed using Ollama's command-line interface.

Model details can be viewed with the 'ollama show' command, providing insights into the model's parameters and context size.

Models can be removed using the 'ollama rm' command, freeing up space on your machine.

Running a model is as simple as typing 'ollama run' followed by the model name.

Ollama models can generate creative content, such as song lyrics, based on user prompts.

Conversation history is stored for the session, allowing models to recall information from previous interactions.

The 'set' command in Ollama allows customization of session parameters like temperature for creativity level.

The system message can be set to give the model a specific personality or role, influencing its responses.

Custom models can be saved with unique settings and system messages for future use.

Creating a custom model involves writing a configuration file and using the 'ollama create' command.

Ollama provides API endpoints for developers to integrate model capabilities into their applications.

The Open Web UI for Ollama offers a more attractive and user-friendly interface for interacting with models.

Docker is required to install the Open Web UI, providing an easy setup process with a single command.

The web UI allows users to chat with models and upload documents for context-aware responses.