Stable Diffusion Integration
Cyllama wraps stable-diffusion.cpp to provide image and video generation capabilities in Python.
Note: Build with WITH_STABLEDIFFUSION=1 to enable this module. By default, stable-diffusion.cpp links against llama.cpp's ggml. To use stable-diffusion.cpp's own vendored ggml instead, set SD_USE_VENDORED_GGML=1.
Overview
The stable diffusion module provides Python bindings to stable-diffusion.cpp, enabling:
- Text-to-image generation
- Image-to-image transformation
- Inpainting with masks
- ControlNet guided generation
- Video generation (with compatible models like Wan, CogVideoX)
- ESRGAN image upscaling
- Model format conversion
Quick Start
Text-to-Image
```python
from cyllama.sd import text_to_image

images = text_to_image(
    model_path="models/sd_xl_turbo_1.0.q8_0.gguf",
    prompt="a photo of a cute cat",
    width=512,
    height=512,
    sample_steps=4,
    cfg_scale=1.0,
)
images[0].save("output.png")
```
With Model Reuse
When generating multiple images, create one SDContext and reuse it so the model is loaded only once:
```python
from cyllama.sd import SDContext, SDContextParams

params = SDContextParams()
params.model_path = "models/sd_xl_turbo_1.0.q8_0.gguf"

with SDContext(params) as ctx:
    for prompt in ["a cat", "a dog", "a bird"]:
        images = ctx.generate(
            prompt=prompt,
            sample_steps=4,
            cfg_scale=1.0,
        )
        images[0].save(f"{prompt.replace(' ', '_')}.png")
```
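The loop above builds file names with prompt.replace(' ', '_'). That produces invalid names for prompts containing characters such as / or :, and it produces colliding names when the same prompt is used twice. Below is a minimal sketch of a safer naming helper. prompt_to_filename is hypothetical and is not part of cyllama:

```python
import re


def prompt_to_filename(prompt: str, index: int, ext: str = "png", max_len: int = 40) -> str:
    """Hypothetical helper: turn a prompt into a safe, unique output file name."""
    # Replace each run of characters other than letters, digits, '-' or '_' with '_'.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    # Truncate long prompts; fall back to a fixed name if nothing usable is left.
    slug = slug[:max_len] or "image"
    # Prefix a zero-padded index so repeated prompts do not overwrite each other.
    return f"{index:03d}_{slug}.{ext}"
```

To use it, iterate with enumerate over the prompts and call images[0].save(prompt_to_filename(prompt, i)).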
API Reference
Convenience Functions
text_to_image()
Generate images from a text prompt.
```python
def text_to_image(
    model_path: str,
    prompt: str,
    negative_prompt: str = "",
    width: int = 512,
    height: int = 512,
    seed: int = -1,
    batch_count: int = 1,
    sample_steps: int = 20,
    cfg_scale: float = 7.0,
    sample_method: SampleMethod = None,
    scheduler: Scheduler = None,
    n_threads: int = -1,
    vae_path: str = None,
    taesd_path: str = None,
    clip_l_path: str = None,
    clip_g_path: str = None,
    t5xxl_path: str = None,
    control_net_path: str = None,
    lora_model_dir: str = None,
    clip_skip: int = -1,
    eta: float = 0.0,
    slg_scale: float = 0.0,
    vae_tiling: bool = False,
    offload_to_cpu: bool = False,
    keep_clip_on_cpu: bool = False,
    keep_vae_on_cpu: bool = False,
    diffusion_flash_attn: bool = False
) -> List[SDImage]
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | required | Path to model file |
| prompt | str | required | Text prompt |
| negative_prompt | str | "" | What to avoid |
| width | int | 512 | Output width |
| height | int | 512 | Output height |
| seed | int | -1 | Random seed (-1 = random) |
| batch_count | int | 1 | Number of images |
| sample_steps | int | 20 | Sampling steps |
| cfg_scale | float | 7.0 | CFG guidance scale |
| sample_method | SampleMethod | None | Sampling method |
| scheduler | Scheduler | None | Noise scheduler |
| clip_skip | int | -1 | CLIP layers to skip |
| n_threads | int | -1 | Thread count (-1 = auto) |
| eta | float | 0.0 | Eta for DDIM/TCD samplers |
| slg_scale | float | 0.0 | Skip layer guidance scale |
| vae_tiling | bool | False | Enable VAE tiling for large images |
| offload_to_cpu | bool | False | Offload weights to CPU (low VRAM) |
| diffusion_flash_attn | bool | False | Use flash attention |
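cfg_scale sets the strength of classifier-free guidance. In the standard formulation, the model makes two predictions per step: one with the prompt and one with the negative (or empty) prompt. The guided prediction then extrapolates from the second toward the first. At cfg_scale=1.0 the formula reduces to the conditional prediction alone. This fits the advice to run distilled turbo models at cfg_scale=1.0.

Below is a minimal sketch on plain lists. apply_cfg is illustrative only; stable-diffusion.cpp applies guidance to latent tensors internally.

```python
from typing import List


def apply_cfg(uncond: List[float], cond: List[float], cfg_scale: float) -> List[float]:
    """Classifier-free guidance, element-wise: uncond + cfg_scale * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]
```

With cfg_scale=7.0, a component where the two predictions differ by 1.0 is pushed 7.0 away from the unconditional value. Components where the predictions agree are left unchanged.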
image_to_image()
Transform an existing image with text guidance.
```python
def image_to_image(
    model_path: str,
    init_image: SDImage,
    prompt: str,
    negative_prompt: str = "",
    strength: float = 0.75,
    seed: int = -1,
    sample_steps: int = 20,
    cfg_scale: float = 7.0,
    ...
) -> List[SDImage]
```
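The strength parameter (0.0-1.0) controls how much the input image is changed. Values near 0.0 keep the output close to the input, while values near 1.0 let the prompt largely override it.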
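In most img2img implementations, strength also determines how much of the noise schedule is actually run. The input is noised to an intermediate point and only the remaining steps are denoised. The sketch below shows that bookkeeping under an assumed rule of steps_run = int(sample_steps * strength). The exact rounding in stable-diffusion.cpp may differ, and effective_img2img_steps is a hypothetical name:

```python
def effective_img2img_steps(sample_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img run performs (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    # The input is noised roughly `strength` of the way into the schedule,
    # so only that fraction of the sampling steps is denoised.
    return int(sample_steps * strength)
```

Under this assumption, the defaults (sample_steps=20, strength=0.75) run 15 denoising steps. Lower strength values therefore also make img2img faster.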
SDContext
Main context class for model loading and generation.
```python
from cyllama.sd import SDContext, SDContextParams

params = SDContextParams()
params.model_path = "models/sd-v1-5.gguf"
params.n_threads = 4

ctx = SDContext(params)

# Check if loaded successfully
if ctx.is_valid:
    images = ctx.generate(
        prompt="a beautiful landscape",
        negative_prompt="blurry, ugly",
        width=512,
        height=512,
        sample_steps=20,
        cfg_scale=7.0,
        flow_shift=0.0,
    )
```
Methods and properties:
| Member | Description |
|---|---|
| generate(...) | Generate images from a text prompt |
| generate_video(...) | Generate video frames (requires a video model) |
| get_default_sample_method() | Get the model's default sampler |
| get_default_scheduler() | Get the model's default scheduler |
| is_valid | Property: True if the context loaded successfully |
SDContextParams
Configuration for model loading.
```python
from cyllama.sd import SDContextParams, SDType, RngType, Prediction, LoraApplyMode

params = SDContextParams()

# Model paths
params.model_path = "model.gguf"  # Main model
params.diffusion_model_path = "unet.gguf"  # Diffusion model (for split models)
params.vae_path = "vae.safetensors"  # VAE model
params.clip_l_path = "clip_l.safetensors"  # CLIP-L (SDXL/SD3)
params.clip_g_path = "clip_g.safetensors"  # CLIP-G (SDXL/SD3)
params.clip_vision_path = "clip_vision.safetensors"  # CLIP vision
params.t5xxl_path = "t5xxl.safetensors"  # T5-XXL (SD3/FLUX)
params.llm_path = "qwen.gguf"  # LLM encoder (FLUX2)
params.llm_vision_path = "qwen_vision.gguf"  # LLM vision encoder
params.taesd_path = "taesd.safetensors"  # TAESD for fast preview
params.control_net_path = "controlnet.gguf"  # ControlNet model
params.photo_maker_path = "photomaker.bin"  # PhotoMaker model
params.high_noise_diffusion_model_path = "..."  # High-noise model (Wan2.2 MoE)
params.lora_model_dir = "loras/"  # LoRA directory
params.embedding_dir = "embeddings/"  # Embeddings directory
params.tensor_type_rules = "^vae\\.=f16"  # Mixed precision rules

# Numeric/enum parameters
params.n_threads = 4  # Thread count
params.wtype = SDType.F16  # Weight type
params.rng_type = RngType.CUDA  # RNG type
params.sampler_rng_type = RngType.CPU  # Sampler RNG type
params.prediction = Prediction.DEFAULT  # Prediction type
params.lora_apply_mode = LoraApplyMode.AUTO  # LoRA application mode
params.chroma_t5_mask_pad = 0  # Chroma T5 mask pad

# Boolean flags
params.vae_decode_only = True  # VAE decode only (faster)
params.enable_mmap = True  # Enable memory-mapped loading
params.offload_params_to_cpu = False  # Offload to CPU (low VRAM)
params.keep_clip_on_cpu = False  # Keep CLIP on CPU
params.keep_vae_on_cpu = False  # Keep VAE on CPU
params.keep_control_net_on_cpu = False  # Keep ControlNet on CPU
params.diffusion_flash_attn = False  # Flash attention
params.diffusion_conv_direct = False  # Direct convolution
params.vae_conv_direct = False  # VAE direct convolution
params.tae_preview_only = False  # TAESD for preview only
params.circular_x = False  # Circular padding X (tileable)
params.circular_y = False  # Circular padding Y (tileable)
params.qwen_image_zero_cond_t = False  # Zero conditioning for Qwen
params.chroma_use_dit_mask = True  # DiT mask for Chroma
params.chroma_use_t5_mask = False  # T5 mask for Chroma
```
SDImage
Image wrapper with numpy and PIL integration, plus file I/O.
```python
from cyllama.sd import SDImage

# Load from file (PNG, JPEG, BMP, TGA, GIF, PSD, HDR, PIC supported)
img = SDImage.load("input.png")
img = SDImage.load("input.jpg", channels=3)  # Force RGB

# Properties
print(img.width, img.height, img.channels)
print(img.shape)  # (H, W, C)
print(img.is_valid)

# Save to file (PNG, JPEG, BMP supported)
img.save("output.png")
img.save("output.jpg", quality=90)
img.save("output.bmp")

# Convert to numpy (requires numpy)
arr = img.to_numpy()  # Returns (H, W, C) uint8 array

# Create from numpy
img = SDImage.from_numpy(arr)

# Convert to PIL (requires Pillow)
pil_img = img.to_pil()
```
SDImageGenParams
Detailed generation parameters for advanced control.
```python
from cyllama.sd import SDImageGenParams, SDImage, SampleMethod, Scheduler

params = SDImageGenParams()
params.prompt = "a cute cat"
params.negative_prompt = "ugly, blurry"
params.width = 512
params.height = 512
params.seed = 42
params.batch_count = 1
params.strength = 0.75  # For img2img
params.clip_skip = -1
params.control_strength = 0.9  # ControlNet strength

# VAE tiling for large images
params.vae_tiling_enabled = True
params.vae_tile_size = (512, 512)
params.vae_tile_overlap = 0.5

# EasyCache acceleration
params.easycache_enabled = True
params.easycache_threshold = 0.1
params.easycache_range = (0.0, 1.0)

# Set init image for img2img
init_img = SDImage.load("input.png")
params.set_init_image(init_img)

# Set mask for inpainting
mask_img = SDImage.load("mask.png")
params.set_mask_image(mask_img)

# Set control image for ControlNet
control_img = SDImage.load("edges.png")
params.set_control_image(control_img, strength=0.8)

# Access sample parameters
sample = params.sample_params
sample.sample_steps = 20
sample.cfg_scale = 7.0
sample.sample_method = SampleMethod.EULER
sample.scheduler = Scheduler.KARRAS
sample.eta = 0.0
sample.slg_scale = 2.5  # Skip layer guidance
sample.slg_layer_start = 0.01
sample.slg_layer_end = 0.2
sample.img_cfg_scale = 1.5  # Image CFG (inpaint)
sample.distilled_guidance = 3.5  # For FLUX
```
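VAE tiling processes a large image as overlapping tiles, so peak memory depends on the tile size rather than the full image. The sketch below only illustrates the idea along one axis. It assumes vae_tile_overlap is a fraction of the tile size, so 0.5 means consecutive tiles share half their extent. It is not stable-diffusion.cpp's tiling code, which works in latent space, and tile_starts is a hypothetical name:

```python
from typing import List


def tile_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if tile <= 0 or not 0.0 <= overlap < 1.0:
        raise ValueError("tile must be positive and overlap in [0, 1)")
    if length <= tile:
        return [0]  # a single tile covers the whole axis
    stride = max(1, int(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # align the last tile with the far edge
    return starts
```

With a 512-pixel tile and 0.5 overlap, a 1024-pixel axis is covered by tiles starting at 0, 256 and 512. A smaller overlap means fewer tiles but harder seams between them.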
SDSampleParams
Sampling configuration.
```python
from cyllama.sd import SDSampleParams, SampleMethod, Scheduler

params = SDSampleParams()
params.sample_method = SampleMethod.EULER_A
params.scheduler = Scheduler.KARRAS
params.sample_steps = 20
params.cfg_scale = 7.0
params.eta = 0.0  # Noise multiplier
params.shifted_timestep = 0  # NitroFusion models
params.flow_shift = 0.0  # Flow shift (SD3.x/Wan)
params.img_cfg_scale = 1.5  # Image guidance
params.distilled_guidance = 3.5  # FLUX guidance
params.slg_scale = 0.0  # Skip layer guidance
params.slg_layer_start = 0.01
params.slg_layer_end = 0.2
```
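slg_layer_start and slg_layer_end appear to be fractions of the sampling run, bounding the window in which skip layer guidance is applied. Their defaults of 0.01 and 0.2 suggest this. The sketch below computes which steps fall inside that window under stated assumptions: steps are numbered from 1, both fractions are multiplied by sample_steps and truncated, and the bounds are exclusive. stable-diffusion.cpp's exact boundary handling may differ, and slg_active_steps is a hypothetical name:

```python
from typing import List


def slg_active_steps(sample_steps: int, slg_scale: float,
                     layer_start: float, layer_end: float) -> List[int]:
    """1-based step numbers at which skip layer guidance would apply (assumed rule)."""
    if slg_scale == 0.0:
        return []  # slg_scale = 0.0 (the default above) means no skip layer guidance
    lo = int(layer_start * sample_steps)
    hi = int(layer_end * sample_steps)
    return [step for step in range(1, sample_steps + 1) if lo < step < hi]
```

Under these assumptions, a 20-step run with slg_scale=2.5 and the default window applies skip layer guidance only on the first few steps.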
Upscaler
ESRGAN-based image upscaling.
```python
from cyllama.sd import Upscaler, SDImage

# Load upscaler model
upscaler = Upscaler(
    "models/esrgan-x4.bin",
    n_threads=4,
    offload_to_cpu=False,
    direct=False,
)

# Check upscale factor
print(f"Factor: {upscaler.upscale_factor}x")

# Upscale an image
img = SDImage.load("input.png")
upscaled = upscaler.upscale(img)
upscaled.save("upscaled.png")

# Multiple upscale passes
for _ in range(2):
    img = upscaler.upscale(img)  # two passes of a 4x model: 16x total
```
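Each pass multiplies both dimensions by upscale_factor, so repeated passes increase the pixel count very quickly. The small helper below lets you check the final size before starting a multi-pass run. upscaled_size is hypothetical and is not part of cyllama:

```python
from typing import Tuple


def upscaled_size(width: int, height: int, factor: int, passes: int = 1) -> Tuple[int, int]:
    """Output dimensions after `passes` runs of an upscaler with integer `factor`."""
    scale = factor ** passes  # each pass multiplies both sides by `factor`
    return width * scale, height * scale
```

For example, two passes of a 4x model turn a 512x512 image into 8192x8192, which is 256 times the original pixel count.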
Enums
SampleMethod
Sampling methods for diffusion:
| Value | Description |
|---|---|
| EULER | Euler method |
| EULER_A | Euler ancestral |
| HEUN | Heun's method |
| DPM2 | DPM-2 |
| DPMPP2S_A | DPM++ 2S ancestral |
| DPMPP2M | DPM++ 2M |
| DPMPP2Mv2 | DPM++ 2M v2 |
| IPNDM | IPNDM |
| IPNDM_V | IPNDM-V |
| LCM | Latent Consistency Model |
| DDIM_TRAILING | DDIM trailing |
| TCD | TCD |
Scheduler
Noise schedulers:
| Value | Description |
|---|---|
| DISCRETE | Discrete scheduler |
| KARRAS | Karras scheduler |
| EXPONENTIAL | Exponential scheduler |
| AYS | AYS scheduler |
| GITS | GITS scheduler |
| SGM_UNIFORM | SGM uniform |
| SIMPLE | Simple scheduler |
| SMOOTHSTEP | Smoothstep scheduler |
| LCM | LCM scheduler |
Prediction
Prediction types:
| Value | Description |
|---|---|
| DEFAULT | Auto-detect from model |
| EPS | Epsilon prediction |
| V | V-prediction |
| EDM_V | EDM V-prediction |
| SD3_FLOW | SD3 flow matching |
| FLUX_FLOW | FLUX flow matching |
| FLUX2_FLOW | FLUX2 flow matching |
SDType
Data types for quantization:
- Float: F32, F16, BF16
- 4-bit: Q4_0, Q4_1, Q4_K
- 5-bit: Q5_0, Q5_1, Q5_K
- 8-bit: Q8_0, Q8_1, Q8_K
- K-quants: Q2_K, Q3_K, Q6_K
LoraApplyMode
LoRA application modes:
| Value | Description |
|---|---|
| AUTO | Auto-detect best mode |
| IMMEDIATELY | Apply at load time |
| AT_RUNTIME | Apply during generation |
PreviewMode
Preview modes during generation:
| Value | Description |
|---|---|
| NONE | No preview |
| PROJ | Projection preview |
| TAE | TAESD preview |
| VAE | Full VAE preview |
Callbacks
Set callbacks for logging, progress, and preview during generation.
```python
from cyllama.sd import (
    set_log_callback,
    set_progress_callback,
    set_preview_callback,
    LogLevel,
    PreviewMode,
)

# Log callback
def log_cb(level: LogLevel, text: str):
    level_names = {0: 'DEBUG', 1: 'INFO', 2: 'WARN', 3: 'ERROR'}
    print(f'[{level_names.get(level, level)}] {text}', end='')

set_log_callback(log_cb)

# Progress callback (time_s: time taken by the last step, in seconds)
def progress_cb(step: int, steps: int, time_s: float):
    pct = (step / steps) * 100 if steps > 0 else 0
    print(f'Step {step}/{steps} ({pct:.1f}%) - {time_s:.2f}s')

set_progress_callback(progress_cb)

# Preview callback
def preview_cb(step: int, frames: list, is_noisy: bool):
    if frames:
        frames[0].save(f"preview_{step}.png")

set_preview_callback(
    preview_cb,
    mode=PreviewMode.TAE,
    interval=5,
    denoised=True,
    noisy=False,
)

# Clear callbacks
set_log_callback(None)
set_progress_callback(None)
set_preview_callback(None)
```
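A progress callback can also estimate the remaining time. The sketch below assumes the callback's third argument is the time taken per step, in seconds. stable-diffusion.cpp reports per-step timing, but check the value cyllama actually passes before relying on the unit. format_progress is a hypothetical helper:

```python
def format_progress(step: int, steps: int, sec_per_step: float) -> str:
    """Progress line with a naive ETA: remaining steps multiplied by time per step."""
    pct = (step / steps) * 100 if steps > 0 else 0.0
    remaining = max(steps - step, 0) * sec_per_step
    return f"Step {step}/{steps} ({pct:.1f}%) ETA {remaining:.1f}s"
```

progress_cb can then print the returned string. The estimate is naive because per-step time varies, for example when a preview is decoded on some steps.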
Model Conversion
Convert models between formats with optional quantization.
```python
from cyllama.sd import convert_model, SDType

convert_model(
    input_path="sd-v1-5.safetensors",
    output_path="sd-v1-5-q4_0.gguf",
    output_type=SDType.Q4_0,
    vae_path="vae-ft-mse.safetensors",  # Optional
    tensor_type_rules="^vae\\.=f16",  # Optional mixed precision
)
```
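tensor_type_rules overrides the output type for tensors whose names match a regular expression. The rule ^vae\.=f16 is meant to keep tensors whose names start with vae. in F16 while the rest of the model is quantized. The sketch below shows one way such a rule string could be interpreted. It assumes comma-separated pattern=type pairs where the first matching pattern wins; that format and precedence are assumptions, not a description of stable-diffusion.cpp's parser. parse_rules and resolve_tensor_type are hypothetical:

```python
import re
from typing import List, Tuple


def parse_rules(rules: str) -> List[Tuple[str, str]]:
    """Split 'pattern=type,pattern=type' into (pattern, type) pairs (assumed format)."""
    pairs = []
    for item in filter(None, (part.strip() for part in rules.split(","))):
        pattern, sep, type_name = item.rpartition("=")
        if not sep or not pattern:
            raise ValueError(f"invalid rule: {item!r}")
        pairs.append((pattern, type_name.strip().lower()))
    return pairs


def resolve_tensor_type(tensor_name: str, rules: str, default: str) -> str:
    """Return the type of the first rule whose regex matches, else the default type."""
    for pattern, type_name in parse_rules(rules):
        if re.search(pattern, tensor_name):
            return type_name
    return default
```

Splitting on commas means this sketch cannot handle patterns that themselves contain commas, such as {1,2}.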
ControlNet Preprocessing
Apply Canny edge detection for ControlNet conditioning.
```python
from cyllama.sd import SDImage, canny_preprocess

img = SDImage.load("photo.png")

# Apply Canny preprocessing (modifies the image in place and reports success)
success = canny_preprocess(
    img,
    high_threshold=0.8,
    low_threshold=0.1,
    weak=0.5,
    strong=1.0,
    inverse=False,
)
if not success:
    raise RuntimeError("Canny preprocessing failed")
img.save("edges.png")
```
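high_threshold, low_threshold, weak and strong correspond to Canny's double-threshold step:

- Gradient magnitudes at or above the high threshold become strong edges.
- Magnitudes between the two thresholds become weak edges.
- Everything else is suppressed.

The minimal sketch below shows only that classification, applied to magnitudes assumed to be normalized to 0..1. The full Canny pipeline also includes smoothing, gradient computation, non-maximum suppression and edge tracking, which are omitted here. double_threshold is hypothetical:

```python
from typing import List


def double_threshold(magnitudes: List[float], high: float, low: float,
                     weak: float = 0.5, strong: float = 1.0) -> List[float]:
    """Map normalized gradient magnitudes to strong, weak, or 0.0 edge values."""
    out = []
    for m in magnitudes:
        if m >= high:
            out.append(strong)  # confident edge
        elif m >= low:
            out.append(weak)  # candidate edge, kept later only if linked to a strong one
        else:
            out.append(0.0)  # not an edge
    return out
```

Lowering low_threshold keeps fainter edges as weak candidates. Raising high_threshold makes fewer pixels count as confident edges.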
CLI Tool
Command-line interface with subcommands for all operations.
txt2img - Text to Image
```bash
python -m cyllama.sd txt2img \
  --model models/sd_xl_turbo_1.0.q8_0.gguf \
  --prompt "a beautiful sunset" \
  --output sunset.png \
  --steps 4 --cfg-scale 1.0

# Using diffusion model directly (FLUX, etc.)
python -m cyllama.sd txt2img \
  --diffusion-model models/flux-dev.gguf \
  --vae models/ae.safetensors \
  --clip-l models/clip_l.safetensors \
  --t5xxl models/t5xxl.gguf \
  --prompt "a photo of a cat" \
  -W 1024 -H 1024

# With memory optimization
python -m cyllama.sd txt2img \
  --diffusion-model models/flux.gguf \
  --vae models/ae.safetensors \
  --llm models/qwen.gguf \
  --offload-to-cpu \
  --diffusion-fa \
  --prompt "a lovely cat" \
  -W 512 -H 1024
```
img2img - Image to Image
```bash
python -m cyllama.sd img2img \
  --model models/sd-v1-5.gguf \
  --init-img input.png \
  --prompt "oil painting style" \
  --strength 0.7 \
  --output styled.png
```
inpaint - Inpainting
```bash
python -m cyllama.sd inpaint \
  --model models/sd-inpaint.gguf \
  --init-img photo.png \
  --mask mask.png \
  --prompt "a red hat" \
  --output inpainted.png
```
controlnet - ControlNet Guided Generation
```bash
python -m cyllama.sd controlnet \
  --model models/sd-v1-5.gguf \
  --control-net models/control_canny.gguf \
  --control-image edges.png \
  --prompt "a beautiful landscape" \
  --control-strength 0.9

# With automatic Canny preprocessing
python -m cyllama.sd controlnet \
  --model models/sd-v1-5.gguf \
  --control-net models/control_canny.gguf \
  --control-image photo.png \
  --canny \
  --prompt "anime style"
```
video - Video Generation
```bash
# Text to video
python -m cyllama.sd video \
  --model models/wan2.1.gguf \
  --prompt "a cat walking" \
  --video-frames 16 \
  --fps 24

# Image to video
python -m cyllama.sd video \
  --model models/wan2.1.gguf \
  --init-img first_frame.png \
  --prompt "camera slowly zooms in" \
  --video-frames 24

# Frame interpolation
python -m cyllama.sd video \
  --model models/wan2.1.gguf \
  --init-img start.png \
  --end-img end.png \
  --video-frames 16
```
upscale - Image Upscaling
```bash
python -m cyllama.sd upscale \
  --model models/esrgan-x4.bin \
  --input image.png \
  --output image_4x.png

# Multiple passes
python -m cyllama.sd upscale \
  --model models/esrgan-x4.bin \
  --input image.png \
  --output image_16x.png \
  --repeats 2
```
convert - Model Conversion
```bash
python -m cyllama.sd convert \
  --input sd-v1-5.safetensors \
  --output sd-v1-5-q4_0.gguf \
  --type q4_0

# With VAE baking
python -m cyllama.sd convert \
  --input sdxl-base.safetensors \
  --output sdxl-q8_0.gguf \
  --type q8_0 \
  --vae sdxl-vae.safetensors
```
info - System Information
Run python -m cyllama.sd info to print system information.
CLI Options Reference
Model Options (most subcommands):
| Option | Description |
|---|---|
| --model, -m | Main model file |
| --diffusion-model | Diffusion model (for split architectures) |
| --vae | VAE model |
| --taesd | TAESD model (fast preview) |
| --clip-l | CLIP-L model (SDXL/SD3) |
| --clip-g | CLIP-G model (SDXL/SD3) |
| --clip-vision | CLIP vision model |
| --t5xxl | T5-XXL model (SD3/FLUX) |
| --llm | LLM text encoder (FLUX2) |
| --llm-vision | LLM vision encoder |
| --control-net | ControlNet model |
| --lora-dir | LoRA models directory |
| --embd-dir | Embeddings directory |
Generation Options:
| Option | Description |
|---|---|
| --prompt, -p | Text prompt |
| --negative, -n | Negative prompt |
| --output, -o | Output file path |
| --width, -W | Image width |
| --height, -H | Image height |
| --steps | Sampling steps |
| --cfg-scale | CFG guidance scale |
| --seed, -s | Random seed (-1 = random) |
| --batch, -b | Batch count |
| --clip-skip | CLIP layers to skip |
Sampler Options:
| Option | Description |
|---|---|
| --sampler | Sampling method |
| --scheduler | Noise scheduler |
| --eta | Eta for DDIM/TCD |
| --rng | RNG type (std_default, cuda, cpu) |
| --prediction | Prediction type override |
Guidance Options:
| Option | Description |
|---|---|
| --slg-scale | Skip layer guidance scale |
| --skip-layer-start | SLG start point |
| --skip-layer-end | SLG end point |
| --guidance | Distilled guidance (FLUX) |
| --img-cfg-scale | Image CFG scale |
Memory Options:
| Option | Description |
|---|---|
| --threads, -t | Thread count |
| --offload-to-cpu | Offload weights to CPU |
| --clip-on-cpu | Keep CLIP on CPU |
| --vae-on-cpu | Keep VAE on CPU |
| --control-net-cpu | Keep ControlNet on CPU |
| --diffusion-fa | Flash attention |
| --vae-tiling | Enable VAE tiling |
Other Options:
| Option | Description |
|---|---|
| --verbose, -v | Verbose output |
| --progress | Show progress |
| --preview | Preview mode (none, proj, tae, vae) |
Supported Models
| Model Family | Examples | Notes |
|---|---|---|
| SD 1.x/2.x | sd-v1-5, sd-v2-1 | Standard models |
| SDXL | sdxl-base, sdxl-turbo | Use cfg_scale=1.0, steps=1-4 for turbo |
| SD3/SD3.5 | sd3-medium, sd3.5-large | May need T5-XXL encoder |
| FLUX | flux.1-dev, flux.1-schnell | Needs clip_l + t5xxl or llm |
| FLUX2 | flux2-* | Uses LLM encoder (Qwen) |
| Wan/CogVideoX | wan-2.1, cogvideox | Video generation |
| LoRA | *.safetensors | Place in lora_model_dir |
| ControlNet | control_* | Use with control images |
| ESRGAN | esrgan-x4 | Upscaling only |
Utility Functions
```python
from cyllama.sd import (
    get_num_cores,
    get_system_info,
    type_name,
    sample_method_name,
    scheduler_name,
    SDType,
    SampleMethod,
    Scheduler,
)

print(f"CPU cores: {get_num_cores()}")
print(get_system_info())
print(type_name(SDType.Q4_0))  # "q4_0"
print(sample_method_name(SampleMethod.EULER))  # "euler"
print(scheduler_name(Scheduler.KARRAS))  # "karras"
```
Performance Tips
- Use turbo models for fast generation (1-4 steps, cfg_scale=1.0)
- Quantize models to Q4_0 or Q8_0 for memory efficiency
- Reuse SDContext when generating multiple images
- Set n_threads to match the number of physical CPU cores
- Use --offload-to-cpu for low-VRAM GPUs
- Enable --diffusion-fa (flash attention) for faster inference
- Use --vae-tiling when generating large images
- Use a progress callback to track long generations
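Troubleshooting
Model Loading Errors
Check that each model path exists before loading. If loading still fails, check that ctx.is_valid is True after creating the SDContext.
```python
import os

model_path = "models/sd-v1-5.gguf"  # the path assigned to SDContextParams.model_path
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Model not found: {model_path}")
```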
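Split-model setups such as FLUX, SD3 or Wan need several files, and a single missing encoder is easy to overlook. The minimal sketch below checks a whole set of file paths at once. missing_model_files is hypothetical. Its dictionary keys are free-form labels (here mirroring SDContextParams field names), and None or empty entries are skipped as unused:

```python
import os
from typing import Dict, List, Optional


def missing_model_files(paths: Dict[str, Optional[str]]) -> List[str]:
    """Return 'label: path' entries for configured file paths that do not exist."""
    return [
        f"{label}: {path}"
        for label, path in paths.items()
        if path and not os.path.isfile(path)  # skip unset entries; flag missing files
    ]
```

For example, build the dictionary from the values you assign to params.model_path, params.vae_path and params.t5xxl_path. Raise FileNotFoundError if the returned list is not empty.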
Out of Memory
- Use a smaller model (SD 1.5 vs SDXL)
- Use a quantized model (Q4_0 vs F16)
- Reduce image dimensions
- Reduce batch_count
- Enable --offload-to-cpu
- Enable --vae-tiling for large images
Slow Generation
- Use turbo/LCM models with fewer steps
- Enable flash attention (--diffusion-fa)
- Increase n_threads
- Use direct convolution (--diffusion-conv-direct)
FLUX/SD3 Models Not Working
- Ensure you have the required encoders (clip_l, t5xxl)
- For FLUX2, use --llm instead of --t5xxl
- Check that the prediction type matches the model
See Also
- stable-diffusion.cpp repository
- SDXL Turbo - Fast generation model
- API Reference - Detailed API documentation