
Run LLM locally with Ollama and Open WebUI

How to run an LLM inside of a Docker container

To share your NVIDIA GPU with a Docker container, you need to set up GPU passthrough.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
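If the last command prints your GPU details, passthrough works. As an optional sanity check that the NVIDIA runtime (used by the Compose file below) is registered with Docker, you can also inspect the daemon info:

# The runtime list reported by Docker should include "nvidia"
docker info | grep -i runtime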

Create a new Docker network

docker network create -d bridge lama
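Before wiring containers to it, you can optionally verify that the network was created:

# Show the bridge network the two containers will share
docker network inspect lama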

We need to create a docker-compose.yml file to run all required services.

version: '3'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    volumes:
      - ./backend/data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "8080:8080"
    networks:
      - lama
  
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ./ollama:/root/.ollama
    ports:
      - "11434:11434"
    runtime: nvidia
    networks:
      - lama

networks:
  lama:
    external: true

Run our new docker-compose.yml

docker-compose up -d
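Once both containers are up, you can confirm that Ollama answers on its API port. These are optional checks, assuming the default port mapping from the Compose file above:

# Both services should be listed as "Up"
docker-compose ps
# Ollama returns the list of locally available models (empty at first)
curl http://localhost:11434/api/tags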

Add a model to Ollama

docker exec ollama ollama pull llama2
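After the pull finishes, you can talk to the model directly over Ollama's HTTP API without going through the web UI; the prompt below is just an example:

# Send a single prompt to llama2 and stream the generated response
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'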

Open localhost:8080 in your browser to reach the newly started Open WebUI container. Once there, you can create the first user.

Afterward, confirm that you have a valid connection to Ollama: go to Settings, select Connections, make sure the URL is correct, and run the connection test. You can also add your Stable Diffusion server URL in the Images section to enable image generation.

How to install an LLM on bare metal

Install Ollama

Ollama is a tool for setting up and running large language models such as Llama 2 locally. It suits a broad range of users, from seasoned AI professionals to enthusiasts who want to explore natural language processing without relying on cloud-based services.

https://ollama.com/download

curl -fsSL https://ollama.com/install.sh | sh
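On most Linux distributions the install script registers Ollama as a systemd service. As an optional check before pulling a model:

# The ollama service should be active after the installer finishes
systemctl status ollama
# The CLI should report its version
ollama --version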

Add a model to Ollama

ollama pull llama2
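A quick way to check that the model loads and answers is to run it once from the terminal; the prompt is just an example:

# Load llama2, answer a single prompt, then return to the shell
ollama run llama2 "Hello, can you hear me?"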

You can open a new terminal window to monitor GPU usage and confirm that the model is actually running on the GPU.

watch -n 0.5 nvidia-smi

Run the Open WebUI Docker container

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
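Because the container uses host networking, the UI becomes available on port 8080 of the host. Optional checks:

# Follow the Open WebUI startup logs
docker logs -f open-webui
# The web server should answer on port 8080
curl -I http://localhost:8080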

Install Stable Diffusion

sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev git

curl https://pyenv.run | bash
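# Note: the installer only downloads pyenv; it must be added to your shell
# before the pyenv commands below will resolve. The installer prints the exact
# lines for your shell; for bash they look roughly like this:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"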

pyenv install 3.10

pyenv global 3.10

wget -q https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh
chmod +x webui.sh
./webui.sh --listen --api
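The --listen flag makes the web UI reachable from other machines, and --api exposes the AUTOMATIC1111 REST API that the Images section of Open WebUI can connect to. Assuming the default port 7860, a quick check that the API answers:

# List the Stable Diffusion checkpoints available through the API
curl http://127.0.0.1:7860/sdapi/v1/sd-models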