Installation¶
This guide covers installing cyllama on different platforms.
Requirements¶
- Python 3.10 or later
- C++ compiler (clang or gcc)
- CMake 3.21+
- Git
Platform-Specific Requirements¶
macOS:
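A typical setup (a sketch; assumes Homebrew is used for CMake and git):

```bash
# Compiler toolchain (clang)
xcode-select --install
# CMake 3.21+ and git
brew install cmake git
```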
Ubuntu/Debian:
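Typical packages (on older releases the distro CMake may predate 3.21; install a newer one from Kitware's APT repository or via pip if so):

```bash
sudo apt update
sudo apt install -y build-essential cmake git
```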
Fedora/RHEL:
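Typical packages:

```bash
sudo dnf install -y gcc-c++ cmake git
```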
Install from PyPI¶
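The base (CPU/default) package installs with pip:

```bash
pip install cyllama
```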
GPU-Accelerated Variants¶
GPU variants are available on PyPI as separate packages (dynamically linked, Linux x86_64 only):
```bash
pip install cyllama-cuda12   # NVIDIA GPU (CUDA 12.4)
pip install cyllama-rocm     # AMD GPU (ROCm 6.3, requires glibc >= 2.35)
pip install cyllama-sycl     # Intel GPU (oneAPI SYCL 2025.3)
pip install cyllama-vulkan   # Cross-platform GPU (Vulkan)
```
All GPU variants install the same cyllama Python package -- only the compiled backend differs. Install one at a time (they replace each other). GPU variants require the corresponding driver/runtime installed on your system.
You can verify which backend is active after installation:
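For example (a sketch, not the definitive API: `ggml_backend_load_all` appears later in this guide, while the device-enumeration names below are assumed to mirror llama.cpp's C API and may be spelled differently in cyllama's bindings):

```python
from cyllama.llama.llama_cpp import ggml_backend_load_all

# Register every backend compiled into the installed wheel
ggml_backend_load_all()

# NOTE: these helpers mirror llama.cpp's C API
# (ggml_backend_dev_count / _get / _name); check cyllama's
# bindings for the exact names before relying on them.
from cyllama.llama.llama_cpp import (
    ggml_backend_dev_count,
    ggml_backend_dev_get,
    ggml_backend_dev_name,
)

for i in range(ggml_backend_dev_count()):
    # Prints one line per device, e.g. a CPU entry plus a GPU entry
    print(ggml_backend_dev_name(ggml_backend_dev_get(i)))
```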
Build from Source¶
```bash
# Clone repository
git clone https://github.com/shakfu/cyllama.git
cd cyllama

# Build everything (downloads llama.cpp, whisper.cpp, builds cyllama)
make

# Download a test model
make download

# Verify installation
python -c "from cyllama import complete; print('OK')"
```
Build Options¶
Default Build¶
The default build picks a sensible backend for your platform:
- macOS: Metal (Apple GPU)
- Linux: CPU-only (GPU backends are optional)
GPU Backends¶
Build with specific GPU support:
```bash
# NVIDIA CUDA
make build-cuda

# Vulkan (cross-platform)
make build-vulkan

# CPU only (no GPU)
make build-cpu

# Multiple backends
GGML_CUDA=1 GGML_VULKAN=1 make build
```
See Building with Different Backends for detailed GPU setup instructions.
Optional Components¶
Stable Diffusion support:
```bash
WITH_STABLEDIFFUSION=1 make build

# Use stable-diffusion.cpp's own vendored ggml (instead of llama.cpp's)
SD_USE_VENDORED_GGML=1 make build
```
Whisper support (included by default):
Build System¶
Cyllama uses scikit-build-core with CMake for building the Cython extensions. The build process:
- Dependencies: `make` downloads and builds llama.cpp and whisper.cpp (and optionally stable-diffusion.cpp)
- Cython compilation: CMake compiles `.pyx` files to C++ using Cython
- Extension linking: the C++ extensions are linked against the static libraries
- Installation: the extensions are installed in editable mode
Build Commands¶
| Command | Description |
|---|---|
| `make` | Full build (dependencies + editable install) |
| `make wheel` | Build wheel for distribution |
| `make clean` | Remove build artifacts |
| `make reset` | Full reset including thirdparty |
| `make remake` | Clean rebuild with tests |
| `make leaks` | RSS-growth memory leak detection |
Wheel Distribution¶
To build a distributable wheel:
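For example (the wheel filename in `dist/` varies by platform and Python version):

```bash
make wheel
pip install dist/cyllama-*.whl
```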
The wheel includes all compiled extensions and can be installed on systems with matching platform/Python version.
Installing Models¶
LLM Models (GGUF format)¶
Download the default test model:
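As in the build steps above:

```bash
make download
```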
Or download manually from Hugging Face:
```bash
# Example: Download a model
curl -L -o models/llama.gguf \
  "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q8_0.gguf"
```
Whisper Models¶
Download from ggerganov/whisper.cpp:
```bash
curl -L -o models/ggml-base.en.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
```
Stable Diffusion Models¶
Download SDXL Turbo or other SD models in GGUF or safetensors format.
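For example, SDXL Turbo in safetensors format (verify the exact filename on the Hugging Face model page before downloading):

```bash
curl -L -o models/sd_xl_turbo_1.0_fp16.safetensors \
  "https://huggingface.co/stabilityai/sdxl-turbo/resolve/main/sd_xl_turbo_1.0_fp16.safetensors"
```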
Verification¶
Test Installation¶
```bash
# Run test suite
make test

# Quick smoke test
python -c "
from cyllama import complete
print(complete('Hello', model_path='models/Llama-3.2-1B-Instruct-Q8_0.gguf', max_tokens=10))
"
```
Check GPU Support¶
```python
from cyllama.llama.llama_cpp import ggml_backend_load_all

# Load all available backends
ggml_backend_load_all()

# Check what's available
from cyllama.llama.llama_cpp import LlamaModel, LlamaModelParams

params = LlamaModelParams()
params.n_gpu_layers = 99  # Request GPU offload
# If GPU is available, layers will be offloaded
```
Troubleshooting¶
"No module named 'cyllama'"¶
Make sure you're in the project directory or have installed cyllama:
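For example, either of:

```bash
# From a clone of the repository
cd cyllama && make

# Or install the published package
pip install cyllama
```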
Build Errors¶
Clean and rebuild:
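Using the Make targets listed above:

```bash
make clean
make
```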
Metal Not Working (macOS)¶
Ensure Xcode Command Line Tools are installed:
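```bash
xcode-select --install
```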
CUDA Not Found (Linux)¶
Add CUDA to your PATH:
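A typical setup (adjust `/usr/local/cuda` to your actual CUDA install prefix):

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```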
Development Install¶
For development with editable install:
```bash
git clone https://github.com/shakfu/cyllama.git
cd cyllama
make  # Builds dependencies and installs in editable mode
```
For manual editable install (after dependencies are built):
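A sketch of the usual pip editable flow (cyllama uses scikit-build-core, so run this from the repository root):

```bash
pip install -e .
```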
Next Steps¶
- User Guide - Learn the API
- Cookbook - Common patterns and recipes
- Building with Different Backends - GPU setup details