Reviving an old GPU: Setting up Ollama and Llama 3.1 in a homelab
I’ve been wanting to put to use an old GPU that has been sitting around for a while, and with the recent release of Llama 3.1… I think I have found a reason to blow off the dust.
Hardware
The hardware in this server is not ideal, but this list should help you gauge what may be needed.
CPU: Intel i7-3770 @ 3.4GHz
RAM: 32GB (4x8GB) DDR3 1333MHz
GPU: GTX 1080 with Driver 535.183.01
As you can see, it isn’t much, but it also used to be a pretty good gaming PC!
Setup
To get started, I am using Portainer to help orchestrate my docker-compose.yml. This allows me to easily manage multiple containers across a fleet of virtual machines. This particular server is only running Plex and, now, Ollama.
Update packages
sudo apt update
Install NVIDIA drivers
sudo apt install nvidia-driver-535
Upgrade system and reboot
sudo apt upgrade
sudo reboot
Install NVIDIA toolkit
sudo apt-get install -y nvidia-container-toolkit
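Depending on your version of the toolkit, you may also need to register the NVIDIA runtime with Docker before restarting it; this is the step NVIDIA’s install docs recommend:
sudo nvidia-ctk runtime configure --runtime=docker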
Restart docker
sudo systemctl restart docker
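At this point it is worth a quick sanity check that both the driver and the container runtime can see the card. nvidia-smi should list the GTX 1080 on the host, and running it inside a throwaway CUDA container confirms Docker can reach the GPU (the image tag below is just one I know exists; any CUDA base image should work):
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi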
Your mileage may vary from system to system: you may need to install nvidia-docker2, or use nvidia-container-toolkit-base instead of what I have above. If you get the error below, you may need to reinstall the NVIDIA drivers.
Failed to initialize NVML: Driver/library version mismatch
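In my experience this mismatch usually appears after a driver upgrade while the old kernel module is still loaded; a reboot, or a clean reinstall of the driver package (the version below matches the one installed earlier), typically clears it:
sudo apt install --reinstall nvidia-driver-535
sudo reboot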
Docker Compose
I mentioned above how I orchestrate my compose files, and I managed to get everything working in a single compose file. I also have other things configured, such as a reverse proxy that allows me to add SSL certificates on top of these containers. The file below is mostly pulled from Open WebUI’s GitHub here.
---
version: "3.8"
services:
  ollama:
    volumes:
      - /opt/ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
  open-webui:
    build:
      context: .
      args:
        OLLAMA_BASE_URL: '/ollama'
      dockerfile: Dockerfile
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - /opt/ollama/webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - 8080:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
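With the stack deployed (Portainer handles this for me, but docker compose up -d does the same thing from the command line), the last step is pulling a model into the ollama container. The tag below grabs the default 8B build of Llama 3.1:
docker compose up -d
docker exec -it ollama ollama pull llama3.1
docker exec -it ollama ollama run llama3.1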
Conclusion
Thank you for surviving this long. With the above, I was able to get Llama 3.1 running on my GTX 1080, and it is actually quite fast. I’ve been a big user of OpenAI’s ChatGPT 4o, and speed-wise, this is a bit faster in its responses.
I did notice some differences. I had to explicitly prompt Llama 3.1 to give me code as output; otherwise it leaned towards plain-text answers. I will post more in this space as I test it further.
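If you want to experiment with prompts outside of Open WebUI, Ollama also exposes an HTTP API on port 11434. A minimal example (the prompt here is just an illustration):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a Python function that reverses a string. Respond with code only.",
  "stream": false
}'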
Thanks all!
Cory