How to Use GPU

Ollama supports GPU acceleration for model inference. Here's how to configure it on Windows.

NVIDIA

Supported GPUs

  • NVIDIA GeForce RTX 20/30/40/50 series and newer
  • NVIDIA GeForce GTX 16 series and above
  • NVIDIA Tesla series
  • 6GB+ VRAM recommended
  • CUDA Compute Capability 7.0 or higher
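
If you are not sure of your card's compute capability, recent NVIDIA drivers can report it directly. A minimal check (the compute_cap query field requires a reasonably new driver; older drivers will reject it):

  nvidia-smi --query-gpu=name,compute_cap --format=csv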

Install CUDA

  1. Visit the NVIDIA website to download the CUDA Toolkit (https://developer.nvidia.com/cuda-downloads)
  2. Select Windows and your Windows version
  3. Download and install the CUDA Toolkit (v11.7 or later recommended)
  4. Verify the installation by running:
     nvidia-smi
  5. Restart Ollama to enable GPU acceleration
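
After restarting, you can verify that inference actually runs on the GPU. A minimal check, assuming you have already pulled a model (llama3.2 here is only an example name):

  ollama run llama3.2 "hello"
  ollama ps
  nvidia-smi

ollama ps reports a PROCESSOR column, which should read 100% GPU when the model is fully offloaded, and nvidia-smi should show the ollama process holding VRAM.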

AMD

Supported GPUs

Officially supported:

  • AMD Radeon RX 9000 series
  • AMD Radeon RX 7000 series
  • AMD Radeon RX 6000 series
  • AMD Instinct series
  • 6GB+ VRAM recommended

Install HIP

  1. Download and install the latest AMD drivers
  2. Install HIP SDK (https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)
  3. Restart Ollama to enable GPU acceleration
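
To confirm the GPU was detected, run a model and check ollama ps as in the NVIDIA section, or tail Ollama's server log, which on Windows is written under %LOCALAPPDATA%\Ollama, and look for the detected AMD GPU. A minimal PowerShell sketch, assuming a default install:

  Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50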

Unsupported AMD GPUs

Some AMD GPUs (the RX 500 series, the RDNA 1 RX 5000 series, the Radeon 680M iGPU, etc.) lack official ROCm support. Use the following workaround:

Ollama-for-AMD

  1. Visit https://github.com/likelovewant/ollama-for-amd
  2. Download pre-compiled binaries or build from source
  3. Download pre-compiled rocblas and library files
  4. Replace Ollama's rocblas.dll and rocblas library folder with the downloaded files (see the example paths after this list)
  5. Restart Ollama
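
The exact locations vary with the Ollama version, but on a default Windows install the files to replace typically sit under the install directory. The layout below is illustrative only, so check it against the ollama-for-amd README:

  C:\Users\<you>\AppData\Local\Programs\Ollama\lib\ollama\rocblas.dll
  C:\Users\<you>\AppData\Local\Programs\Ollama\lib\ollama\rocblas\library\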

Easier Method

  1. Download and run Ollama-For-AMD-Installer
  2. Select your GPU model and click "Check latest version"
  3. The tool completes the remaining configuration automatically

Important Notes

  1. If the GPU still isn't used (common on dual-GPU laptops), set an environment variable such as CUDA_VISIBLE_DEVICES (NVIDIA) or HIP_VISIBLE_DEVICES (AMD) to force Ollama onto a specific GPU; see the sketch after this list
  2. Set your power plan to "High Performance" mode
  3. Keep your GPU drivers up to date
  4. Monitor VRAM usage to avoid overflow
  5. Close other GPU-intensive applications when using large models
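
For note 1, a minimal PowerShell sketch that pins Ollama to the first GPU via a user-level environment variable (device indices start at 0; use HIP_VISIBLE_DEVICES instead for AMD):

  [Environment]::SetEnvironmentVariable("CUDA_VISIBLE_DEVICES", "0", "User")

Restart Ollama afterwards so the new variable is picked up. For note 4, nvidia-smi -l 1 refreshes the VRAM readout every second on NVIDIA cards.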