Llamafile
Distribute and run LLMs as single-file executables — no installation needed
Overview
Llamafile is a groundbreaking Mozilla project that collapses all the complexity of LLMs into a single-file executable. Built on llama.cpp and Cosmopolitan Libc, it lets you run powerful language models on virtually any operating system (macOS, Linux, Windows, FreeBSD) and CPU architecture without any installation or dependencies. Just download, make executable, and run. With 25k+ GitHub stars, it also includes whisperfile for single-file speech-to-text. Llamafile supports a wide range of open models including Llama, Mistral, Qwen, and more, making local LLM inference truly accessible to everyone.
Requirements
Min vCPU
1
Min RAM
1024 MB
Min Disk
10 GB
Rec vCPU
4
Rec RAM
4096 MB
Rec Disk
20 GB
Recommended VPS
Contabo · VPS S
4 vCPU · 8192 MB · 100 GB
Contabo · VPS S
4 vCPU · 8192 MB · 100 GB
Contabo · VPS S
4 vCPU · 8192 MB · 100 GB
Affiliate disclosure
Docker Compose
# Generated by Run This Ai — docker-compose.yml
services:
llamafile:
image: ghcr.io/mozilla-ai/llamafile:latest
restart: unless-stopped
ports:
- 8080:8080
volumes:
- ./data/llamafile:/data
Related tools
Ollama
Run large language models locally
Open WebUI
User-friendly WebUI for LLMs (Ollama, OpenAI API)
llama.cpp
LLM inference in C/C++
SGLang
High-performance serving framework for LLMs with RadixAttention
Jan
Open-source ChatGPT replacement — run LLMs locally with full control and privacy
LocalAI
Open-source AI engine — run any model (LLM, vision, voice, image, video) on any hardware, no GPU required