Difficulty: Intermediate
Read Time: 5 min

Calling CUDA from Go without cgo

By Codcompass Team · 5 min read

Go is great at infrastructure.

It gives us fast builds, simple deployment, lightweight concurrency, and the ability to ship a single binary.

But Go has always been awkward around one increasingly important area: GPUs.

A lot of modern AI, analytics, vector processing, and high-throughput data work now runs on NVIDIA GPUs through CUDA. The problem is that most application-level access to CUDA still goes through Python.

This post is about why calling CUDA from Go matters, why cgo is often painful, and what a pure-Go runtime-loaded CUDA Driver API approach can look like.

The problem

Many production backend systems are written in Go.

But most GPU tooling is centered around Python libraries like:

  • PyTorch
  • TensorFlow
  • JAX
  • CuPy

That often creates an architecture like this:

Go service
  -> HTTP/gRPC
  -> Python GPU worker
  -> CUDA
  -> Python GPU worker
  -> Go service


This works, but it adds:

  • another service
  • another runtime
  • serialization overhead
  • extra deployment complexity
  • extra observability/debugging surface

Sometimes the Python service exists only because the Go service cannot easily talk to CUDA directly.

Why not just use cgo?

The usual way to call native libraries from Go is cgo.

For CUDA, that might look like this:

// #cgo LDFLAGS: -lcuda
// #include <cuda.h>
import "C"


That works, but it changes the Go developer experience.

Now you need:

  • a C compiler
  • CUDA headers
  • CUDA libraries available at build time
  • more fragile CI builds
  • harder cross-compilation
  • platform-specific linking behavior

One of Go’s best properties is this:

CGO_ENABLED=0 go build ./...


A clean binary.

No C toolchain.

No build-time CUDA dependency.

So the interesting question is:

Can Go talk to CUDA without cgo?

Yes, by loading the CUDA Driver API dynamically at runtime.

Runtime-loading CUDA

CUDA exposes a Driver API through the NVIDIA driver library.

On Linux:

libcuda.so.1


On Windows:

nvcuda.dll

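In Go, picking between these two names can be a small switch on the target OS. This is a minimal sketch that covers only the two platforms named above; the function name `driverLibraryName` is an assumption for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// driverLibraryName returns the CUDA driver library file name for a GOOS
// value. Only the two platforms named in the text are covered.
func driverLibraryName(goos string) (string, error) {
	switch goos {
	case "linux":
		return "libcuda.so.1", nil
	case "windows":
		return "nvcuda.dll", nil
	default:
		return "", fmt.Errorf("no known CUDA driver library for GOOS=%q", goos)
	}
}

func main() {
	name, err := driverLibraryName(runtime.GOOS)
	if err != nil {
		fmt.Println("CUDA unavailable:", err)
		return
	}
	fmt.Println("would load:", name)
}
```

Taking the OS as a parameter, rather than reading `runtime.GOOS` inside the function, keeps the mapping easy to test on any machine.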

Instead of linking to CUDA at build time, the program opens this library at runtime and looks up the Driver API functions it needs.
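The shape of that runtime step can be sketched as "open the library, then look up each entry point by name". Go's standard library does not offer a cgo-free dlopen on Linux, so the sketch below hides the loader behind a hypothetical `loader` interface; a real program would plug in a cgo-free loader package (github.com/ebitengine/purego is one candidate, named here as an assumption and not as the method this post uses). `fakeLoader` is a test double with made-up addresses. `cuInit` and `cuDeviceGetCount` are real Driver API function names.

```go
package main

import (
	"errors"
	"fmt"
)

// loader is a hypothetical abstraction over a cgo-free dynamic loader
// (dlopen/dlsym on Linux, LoadLibrary/GetProcAddress on Windows).
type loader interface {
	Open(name string) (uintptr, error)
	Lookup(lib uintptr, symbol string) (uintptr, error)
}

// resolve opens the driver library and looks up every requested symbol.
// Any failure surfaces as an ordinary runtime error, not a build error.
func resolve(l loader, libName string, symbols []string) (map[string]uintptr, error) {
	lib, err := l.Open(libName)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", libName, err)
	}
	addrs := make(map[string]uintptr, len(symbols))
	for _, sym := range symbols {
		addr, err := l.Lookup(lib, sym)
		if err != nil {
			return nil, fmt.Errorf("lookup %s in %s: %w", sym, libName, err)
		}
		addrs[sym] = addr
	}
	return addrs, nil
}

// fakeLoader simulates a machine with a driver installed. It is a test
// double, not a real loader, and its addresses are invented.
type fakeLoader struct {
	libs map[string]map[string]uintptr
	open []string // opened library names, indexed by handle-1
}

func (f *fakeLoader) Open(name string) (uintptr, error) {
	if _, ok := f.libs[name]; !ok {
		return 0, errors.New("library not found")
	}
	f.open = append(f.open, name)
	return uintptr(len(f.open)), nil
}

func (f *fakeLoader) Lookup(lib uintptr, symbol string) (uintptr, error) {
	if lib == 0 || int(lib) > len(f.open) {
		return 0, errors.New("invalid library handle")
	}
	addr, ok := f.libs[f.open[lib-1]][symbol]
	if !ok {
		return 0, errors.New("symbol not found")
	}
	return addr, nil
}

func main() {
	l := &fakeLoader{libs: map[string]map[string]uintptr{
		"libcuda.so.1": {"cuInit": 0x1000, "cuDeviceGetCount": 0x1010},
	}}
	addrs, err := resolve(l, "libcuda.so.1", []string{"cuInit", "cuDeviceGetCount"})
	if err != nil {
		fmt.Println("CUDA unavailable:", err)
		return
	}
	fmt.Println("resolved", len(addrs), "driver entry points")
}
```

Because a missing driver shows up as a normal `error`, the same binary can start on a machine without a GPU and decide for itself what to do next.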
