llama.cpp

LLM inference in C/C++

★ 70,000 GitHub MIT LLM & Chat

Overview

High-performance LLaMA inference on CPU and GPU, with a built-in server and Python bindings.

Requirements

Min vCPU

Min RAM

4096 MB

Min Disk

20 GB

Rec vCPU

Rec RAM

8192 MB

Rec Disk

40 GB

Recommended VPS

Contabo · VPS S

4 vCPU · 8192 MB · 100 GB

$4.50

View plan

Contabo · VPS S

4 vCPU · 8192 MB · 100 GB

$4.50

View plan

Contabo · VPS S

4 vCPU · 8192 MB · 100 GB

$4.50

View plan

Affiliate disclosure

Docker Compose

# Generated by Run This Ai — docker-compose.yml
services:
  llama-cpp:
    image: ghcr.io/ggerganov/llama.cpp:server
    restart: unless-stopped
    ports:
      - 8080:8080
    volumes:
      - ./data/llama-cpp:/data

Best VPS for llama.cpp →

Related tools

Ollama

Run large language models locally

Open WebUI

User-friendly WebUI for LLMs (Ollama, OpenAI API)

Guides & articles

How to Install llama.cpp on Ubuntu 24.04

A step-by-step installation guide.

Jun 27, 2026