SGLang
High-performance serving framework for LLMs with RadixAttention
Überblick
High-performance serving framework for LLMs with RadixAttention
Anforderungen
Min vCPU
1
Min RAM
1024 MB
Min Disk
10 GB
Rec vCPU
2
Rec RAM
2048 MB
Rec Disk
20 GB
Empfohlener VPS
Hetzner · CX22
2 vCPU · 4096 MB · 40 GB
Hetzner · CX22
2 vCPU · 4096 MB · 40 GB
Hetzner · CX22
2 vCPU · 4096 MB · 40 GB
Affiliate-Hinweis
Docker Compose
# Generated by Run This Ai — docker-compose.yml
services:
sglang:
image: lmsysorg/sglang:latest
restart: unless-stopped
ports:
- 8080:8080
volumes:
- ./data/sglang:/data
SGLang: install
SGLang
High-performance serving framework for LLMs with RadixAttention
Prerequisites
- Docker installed (version 24.0+)
- Docker Compose (version 2.20+)
- At least 1GB RAM (2GB recommended)
Quick start with Docker
# Pull the image
docker pull lmsysorg/sglang:latest
# Run the container
docker run -d --name sglang -p 8080:8080 lmsysorg/sglang:latest
Key features
- Self-hosted and open source
- Docker-based deployment
- License: Apache-2.0
- Repository: https://github.com/sgl-project/sglang
- Docker image:
lmsysorg/sglang:latest
SGLang: overview
SGLang
High-performance serving framework for LLMs with RadixAttention
Key features
- Self-hosted and open source
- Docker-based deployment
- License: Apache-2.0
- Repository: https://github.com/sgl-project/sglang
- Docker image:
lmsysorg/sglang:latest
What it's good for
SGLang runs entirely on your own infrastructure — your data never leaves your server.
Verwandte Tools
Ollama
Run large language models locally
Open WebUI
User-friendly WebUI for LLMs (Ollama, OpenAI API)
llama.cpp
LLM inference in C/C++
Jan
Open-source ChatGPT replacement — run LLMs locally with full control and privacy
Llamafile
Distribute and run LLMs as single-file executables — no installation needed
LocalAI
Open-source AI engine — run any model (LLM, vision, voice, image, video) on any hardware, no GPU required