
NixOS NVIDIA CUDA Ollama Support

Navigating NixOS with CUDA for AI has been a wild ride, but I've found a stable configuration for running Ollama with accelerated GPU performance. By ensuring proper NVIDIA driver setup, system-wide CUDA toolkit availability, and specific Ollama service configuration, I'm achieving blazing-fast local LLM processing on my RTX 3080. This guide shares my working `configuration.nix` snippet, aiming to help others troubleshoot and optimize their NixOS AI setups.

NixOS and CUDA: The Perfect AI Platform

We’re a bit in the wild west with NixOS, but Nix is the perfect platform for bundling up cross-platform AI packages, and those packages need NVIDIA CUDA GPU support. CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) created by NVIDIA that allows developers to use NVIDIA GPUs for general-purpose computing, significantly speeding up computationally intensive tasks.

Running Ollama with CUDA Acceleration

CUDA support speeds up Ollama. Ollama is a free, open-source tool that runs large language models (LLMs) on your own computer, giving you ChatGPT-like abilities at home, privately, for only the price of the electricity (no API keys, no token cash register). For those really interested, llama.cpp (the "cpp" is for C++) is the actual enabling component here; Ollama provides the server wrapper and API interface.
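That server wrapper is just HTTP underneath. Here's a minimal sketch of exercising it directly, assuming the Ollama service is running on its default port (11434) and using a small Gemma model as an example; any model tag you've pulled works the same way:

```shell
# Pull a small model first (model tag is an example; substitute your own).
ollama pull gemma:2b

# Ask it a question through Ollama's REST API; "stream": false returns
# one JSON object instead of a stream of partial responses.
curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The same endpoint is what tools built on top of Ollama talk to, which is why getting the service (and its GPU acceleration) right in `configuration.nix` pays off everywhere at once.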

Performance Benefits of Local GPU Acceleration

When you’re running ChatGPT-like things at home on your own hardware and you have an NVIDIA card, CUDA support makes everything work many times faster. With one of the smaller models, like Google’s Gemma, you can hardly tell you’re not on a cloud model.
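A quick way to put a number on that is Ollama's built-in timing output: running with `--verbose` prints stats, including an eval rate in tokens per second, after each response. A sketch, again assuming a Gemma model tag as an example:

```shell
# Run a prompt with timing stats. Compare the reported "eval rate"
# (tokens/s) with and without GPU acceleration; with CUDA working it
# should be many times higher than a CPU-only run.
ollama run gemma:2b --verbose "Summarize CUDA in one sentence."
```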

Maintaining CUDA Support Across Platforms

The money behind the Windows gaming market keeps NVIDIA drivers and CUDA working over in that world. In the Linux world, all the AI tooling keeps it working. However, things do break, and every once in a while Ollama will seem to slow down for a NixOS user. Sometimes a full system update:

sudo nixos-rebuild switch --upgrade

…will either fix or newly introduce the CUDA support problem (the wild west, remember). I had the problem on one of the early versions prior to the recent 0.5.6, which cleared it up for me. But I hear reports of people having the opposite problem and losing support. And with the new Gemma 3 model just released, Ollama has been upgraded to 0.6, which hasn’t hit the NixOS repo yet. When it does, I’m sure people will be searching for the fix.
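When a rebuild does knock out acceleration, the slowdown can be subtle, so a couple of checks make it obvious whether Ollama is actually on the GPU (this assumes Ollama is running as a systemd service, as the NixOS module sets it up):

```shell
# Is the loaded model on the GPU? "ollama ps" shows a PROCESSOR column
# for each running model, e.g. "100% GPU" vs "100% CPU".
ollama ps

# Did the service detect CUDA at startup? Scan the service log.
journalctl -u ollama --no-pager | grep -iE 'cuda|gpu' | tail

# Is the card visible to the driver at all, and is an ollama process
# holding VRAM while a model is loaded?
nvidia-smi
```

If `ollama ps` says CPU after an upgrade, that's the signature of this problem, and the configuration below is what fixed it for me.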

Working NVIDIA Configuration Example

Anyway, here’s an extract of the important bits from my `configuration.nix` file that seems to have stabilized my GPU support. Small models run blazingly fast even on my NVIDIA RTX 3080. Hopefully this will be useful to some of you out there.

Here’s the working configuration snippet for CUDA support in Ollama on NixOS with an NVIDIA GPU:

hardware = {
  graphics.enable = true;
  nvidia = {
    modesetting.enable = true;
    package = config.boot.kernelPackages.nvidiaPackages.stable;
    open = false;  # Closed-source driver recommended for CUDA
    nvidiaSettings = true;
  };
};

services.xserver = {
  enable = true;
  videoDrivers = ["nvidia"];
};

environment.sessionVariables = {
  CUDA_HOME = "${pkgs.cudaPackages.cudatoolkit}";
  LD_LIBRARY_PATH = lib.makeLibraryPath [
    "${pkgs.cudaPackages.cudatoolkit}"
    "${pkgs.cudaPackages.cudatoolkit}/lib64"
    pkgs.cudaPackages.cudnn
    pkgs.cudaPackages.cuda_cudart
    pkgs.stdenv.cc.cc.lib
  ];
  CUDA_MODULE_LOADING = "LAZY";
};

nixpkgs.config = {
  allowUnfree = true;
  cudaSupport = true;
};

environment.systemPackages = with pkgs; [
  cudaPackages.cudatoolkit
  cudaPackages.cudnn
  cudaPackages.cuda_cudart
];

services.ollama = {
  enable = true;
  acceleration = "cuda";
  environmentVariables = {
    CUDA_VISIBLE_DEVICES = "0";
    NVIDIA_VISIBLE_DEVICES = "all";
  };
};

Brief Explanation

CUDA most likely works for Ollama in this setup because of:

  1. Proper NVIDIA Driver Configuration
    Using the closed-source NVIDIA drivers (open = false) and enabling NVIDIA modesetting.

  2. System-wide CUDA Toolkit Availability
    Explicitly including CUDA libraries in environment.systemPackages and setting LD_LIBRARY_PATH.

  3. Ollama Service Setup
    Specifically configuring Ollama to use CUDA (acceleration = "cuda") and environment variables explicitly indicating GPU visibility.

This combination ensures Ollama has direct access to CUDA libraries and your GPU hardware.
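After applying the snippet (a `nixos-rebuild switch` plus a re-login so the `sessionVariables` take effect), the toolkit side can be sanity-checked directly from a shell:

```shell
# CUDA_HOME should point at the cudatoolkit derivation in the Nix store,
# per the environment.sessionVariables block above.
echo "$CUDA_HOME"

# nvcc is provided by cudaPackages.cudatoolkit in environment.systemPackages;
# it should print the toolkit's release version.
nvcc --version

# Driver-side check: the card and driver version should both be listed.
nvidia-smi
```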


Gemini’s Take

Title and Headline Ideas:

  • Titles:
    • NixOS and CUDA: Unleashing Local AI Performance
    • Optimizing Ollama on NixOS with NVIDIA CUDA
    • The NixOS AI Wild West: Taming CUDA for LLMs
  • Headlines:
    • Boost Ollama Speed: My Working NixOS CUDA Configuration
    • Solve CUDA Issues on NixOS: A Practical Guide
    • Local AI, Accelerated: Get the Most from Your NVIDIA GPU on NixOS

AI Opinion:

This article provides a valuable, real-world perspective on a complex technical challenge. The author’s candid acknowledgment of the “wild west” nature of NixOS and CUDA, coupled with the detailed configuration examples, makes it a useful resource for anyone attempting to optimize local AI performance. The inclusion of troubleshooting tips and explanations of key configuration elements adds significant practical value. The article is very helpful for a very specific type of user.