cyllama

This is the official documentation for cyllama, a high-performance Python library for local AI inference.

About

cyllama provides high-performance Cython bindings to three C++ inference engines -- all from Python with zero runtime dependencies:

This documentation is intended for:

Python developers who want to run LLMs locally without cloud dependencies
ML engineers looking for a lightweight alternative to PyTorch-based inference
Application developers building AI-powered features with predictable latency
Researchers who need direct access to model internals and sampling parameters

No machine learning expertise is required for basic usage.

Code examples use Python 3.12+ syntax:

from cyllama import complete

response = complete("Hello!", model_path="models/llama.gguf")
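
A slightly more defensive variant of the call above can be sketched as follows. It assumes only the `complete(prompt, model_path=...)` call shape shown in the example. The `complete_checked` wrapper and its `complete_fn` parameter are hypothetical names introduced for illustration and are not part of cyllama. The sketch checks that the model file exists before handing the path to the completion function, so a wrong path fails early with a clear error.

```python
from pathlib import Path
from typing import Callable


def complete_checked(
    prompt: str,
    model_path: str,
    complete_fn: Callable[..., object],
) -> str:
    """Validate the model path, then delegate to a completion function.

    Hypothetical helper, not part of cyllama. `complete_fn` is assumed to
    accept (prompt, model_path=...) as in the example above.
    """
    path = Path(model_path)
    if not path.is_file():
        # Fail before any model loading is attempted.
        raise FileNotFoundError(f"Model file not found: {path}")
    # The return type of the completion function is not specified in this
    # section, so the result is coerced to str here (an assumption).
    return str(complete_fn(prompt, model_path=str(path)))


# Usage (requires cyllama to be installed and a model file on disk):
#   from cyllama import complete
#   print(complete_checked("Hello!", "models/llama.gguf", complete))
```

Passing the completion function in as an argument keeps the helper testable without a model on disk; in real use, `cyllama.complete` would be supplied as shown in the trailing comment.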

Shell commands are shown with bash syntax:

make build
make test

cyllama is open source and available at:

Issues, contributions, and feedback are welcome.