MultiGen LLVM Backend - Complete User Guide¶
Last Updated: October 16, 2025 Version: v0.1.84 Status: Production Ready
Table of Contents¶
- Overview
- Quick Start
- Features & Capabilities
- Compilation Modes
- Optimization Levels
- String Operations
- Memory Safety
- Performance
- Error Handling
- Advanced Usage
- Troubleshooting
- Architecture
Overview¶
The LLVM backend is MultiGen's 6th production-ready backend, generating native executables via LLVM IR. It offers unique advantages:
Key Strengths¶
- Up to 36.5% faster with O3 optimization (vs O0)
- 2nd smallest binaries (37KB average)
- Most complete string API (9 methods)
- Full LLVM optimization pipeline (60+ passes, O0-O3)
- Memory-safe (ASAN verified, 0 leaks)
- Dual compilation modes (AOT + JIT)
- Platform-independent (via LLVM)
- Better error messages (descriptive runtime errors)
Production Status¶
[x] 7/7 benchmarks passing (100% coverage) [x] 1020 comprehensive tests (100% pass rate) [x] 0 memory leaks (ASAN verified) [x] 0 memory errors (use-after-free, overflow, etc.) [x] ~8,300 lines runtime library (verified) [x] 60+ optimization passes (O0-O3 support)
Quick Start¶
Prerequisites¶
# Install LLVM tools (required for AOT mode)
# macOS (Homebrew)
brew install llvm
# Ubuntu/Debian
sudo apt-get install llvm clang
# The tools needed are:
# - llc (LLVM static compiler)
# - clang (C compiler for linking)
Basic Usage¶
# Convert Python to LLVM IR
multigen convert --target llvm program.py
# Build executable (AOT mode, default O2 optimization)
multigen build --target llvm program.py
# Build with maximum optimization (O3)
multigen build --target llvm -O aggressive program.py
# Generated files:
# - program.ll (LLVM IR - original)
# - program.opt.ll (LLVM IR - optimized)
# - program.o (object file)
# - program (executable)
Hello World Example¶
Features & Capabilities¶
Supported Python Features¶
Core Language:
- [x] Functions and recursion
- [x] Control flow (if/elif/else, while, for)
- [x] Type annotations
- [x] Global variables
- [x] Arithmetic operations
- [x] Boolean operations
- [x] Comparison operations
Data Structures:
- [x] Lists (
vec_int,vec_str,vec_vec_int) - [x] Dicts (
map_int_int,map_str_int) - [x] Sets (
set_int) - [x] Strings (full support)
- [x] 2D arrays (nested lists)
String Operations (9 methods - most complete):
- [x]
split()- split by delimiter - [x]
lower()- lowercase conversion - [x]
strip()- remove whitespace - [x]
upper()- uppercase conversion - [x]
join()- join string array - [x]
replace()- replace substring - [x]
startswith()- prefix check - [x]
endswith()- suffix check - [x] Concatenation with
+
Built-in Functions:
- [x]
len()- length of containers/strings - [x]
range()- iteration ranges - [x]
print()- formatted output - [x]
min(),max(),sum()- aggregations - [x]
abs()- absolute value
Advanced Features:
- [x] List comprehensions
- [x] Dict comprehensions
- [x] Set comprehensions
- [x] Nested containers
- [x] Container methods (append, get, contains)
Current Limitations¶
- [X] Classes/OOP (limited support)
- [X] File I/O operations
- [X] Module imports
- [X] Exception handling
- [X] Generators
- [X] Dict methods (keys, values, items)
Compilation Modes¶
The LLVM backend supports two compilation modes with different trade-offs:
AOT (Ahead-of-Time) Mode - Default¶
Best for: Production deployments
# Standard compilation (default O2)
multigen build --target llvm program.py
# With specific optimization level
multigen build --target llvm -O none program.py # O0 (debugging)
multigen build --target llvm -O basic program.py # O1 (development)
multigen build --target llvm -O moderate program.py # O2 (default)
multigen build --target llvm -O aggressive program.py # O3 (max performance)
Characteristics:
- Generates standalone executable
- Small binary size (37KB average)
- Fast execution with optimization (54-86ms depending on level)
- No runtime dependencies
- Moderate compilation (410-430ms with optimization)
Pipeline: Python → Static IR → LLVM IR → Optimizer (60+ passes) → llc → Object File → Executable
JIT (Just-in-Time) Mode¶
Best for: Development and testing
from multigen.backends.llvm.jit_executor import jit_compile_and_run
# Execute LLVM IR directly
result = jit_compile_and_run("program.ll", verbose=True)
print(f"Result: {result}")
Characteristics:
- 7.7x faster total time
- No intermediate files
- In-memory execution
- Great for rapid iteration
- Requires llvmlite runtime
Performance Comparison:
| Metric | AOT Mode | JIT Mode |
|---|---|---|
| Compilation | 330.7ms | ~50ms |
| Execution | 152.9ms | Similar |
| Total | 483.6ms | ~200ms |
| Speedup | 1.0x | 2.4x |
| Binary | Yes | No |
Optimization Levels¶
The LLVM backend includes a complete optimization pass pipeline (v0.1.84) with 60+ LLVM passes, delivering up to 36.5% performance improvement.
Overview¶
| Level | Name | Use Case | Compile Time | Execute Time | IR Size |
|---|---|---|---|---|---|
| O0 | none | Debugging | 408ms | 85.6ms | 6.0KB |
| O1 | basic | Development | 428ms (+5%) | 83.9ms (-2%) | 1.8KB (-70%) |
| O2 | moderate | Production | 424ms (+4%) | 84.8ms (-1%) | 3.3KB (-44%) |
| O3 | aggressive | Max Performance | 424ms (+4%) | 54.3ms (-36.5%) | 3.3KB (-44%) |
Benchmark: Fibonacci (n=29)
Usage¶
# Debug build (O0) - No optimization
multigen build -t llvm -O none program.py
# Use for: debugging, preserving IR structure
# Development build (O1) - Basic optimizations
multigen build -t llvm -O basic program.py
# Use for: fast iteration, basic optimizations
# Production build (O2) - Default, balanced
multigen build -t llvm program.py # or -O moderate
# Use for: production deployments, best balance
# Performance build (O3) - Aggressive optimizations
multigen build -t llvm -O aggressive program.py
# Use for: performance-critical applications
Optimization Passes¶
O0 (none) - Debugging:
- No optimization passes applied
- Preserves original IR structure
- Fastest compilation
- Use for debugging with tools like gdb/lldb
O1 (basic) - Development:
- Dead argument elimination
- Dead code elimination
- Global optimization
- Interprocedural constant propagation (IPSCCP)
- Control flow graph simplification
- Instruction combining
- Result: 70% IR size reduction, 2% faster execution
O2 (moderate) - Production Default:
- All O1 passes, plus:
- Function inlining (threshold: 225 instructions)
- Global dead code elimination
- Expression reassociation
- Sparse conditional constant propagation (SCCP)
- Scalar replacement of aggregates (SROA)
- Tail call elimination
- Loop rotation and simplification
- Memory copy optimization
- Dead store elimination
- Result: 44% IR size reduction, 1% faster execution
O3 (aggressive) - Maximum Performance:
- All O2 passes, plus:
- Aggressive dead code elimination
- Aggressive instruction combining
- Loop unrolling
- Loop unroll-and-jam (nested loops)
- Loop strength reduction
- Argument promotion (pass by value → by reference)
- Function merging (combine identical functions)
- Result: 44% IR size reduction, 36.5% faster execution
IR Transformations¶
Example: Recursive Fibonacci
Original (unoptimized):
define i64 @fibonacci(i64 %n) {
entry:
%cmp = icmp sle i64 %n, 1
br i1 %cmp, label %base, label %recursive
base:
ret i64 %n
recursive:
%n1 = sub i64 %n, 1
%fib1 = call i64 @fibonacci(i64 %n1)
%n2 = sub i64 %n, 2
%fib2 = call i64 @fibonacci(i64 %n2)
%result = add i64 %fib1, %fib2
ret i64 %result
}
After O3 optimization:
; Function Attrs: nofree nosync nounwind memory(none)
define i64 @fibonacci(i64 %.1) local_unnamed_addr #0 {
entry:
%cmp_tmp4 = icmp slt i64 %.1, 2
br i1 %cmp_tmp4, label %common.ret, label %if.merge
if.merge:
; Tail-call optimized recursive call
%call_tmp = tail call i64 @fibonacci(i64 %sub_tmp)
; Loop transformation applied
%cmp_tmp = icmp ult i64 %.1.tr6, 4
br i1 %cmp_tmp, label %common.ret, label %if.merge
common.ret:
%result = phi i64 [ 0, %entry ], [ %add_tmp, %if.merge ]
ret i64 %result
}
Optimizations Applied:
- [x] Tail call optimization (
tail call) - [x] Control flow restructuring
- [x] Function attributes (
nofree,nosync,nounwind) - [x] Loop transformations
- [x] Dead code elimination
- [x] Constant propagation
Performance Impact¶
Compilation Time:
- O0 → O1/O2/O3: +3-5% (15-20ms overhead)
- Minimal impact, well worth the runtime gains
Binary Size:
- All levels produce ~37KB binaries
- Size unchanged by optimization level
- Inlining and DCE balance each other out
Execution Time:
- O0 (baseline): 85.6ms
- O1: 83.9ms (+2.0% faster)
- O2: 84.8ms (+0.9% faster)
- O3: 54.3ms (+36.5% faster )
IR Size:
- Original: 6.0KB
- O1: 1.8KB (-70%, most aggressive reduction)
- O2/O3: 3.3KB (-44%, balanced)
Debugging Optimized Code¶
# Save both original and optimized IR
multigen build -t llvm -O aggressive program.py
# View original IR
cat build/src/program.ll
# View optimized IR
cat build/src/program.opt.ll
# Compare with diff
diff -u build/src/program.ll build/src/program.opt.ll
# Disable optimizations for debugging
multigen build -t llvm -O none program.py
API Usage¶
from multigen.backends.llvm.optimizer import LLVMOptimizer
# Create optimizer with specific level
optimizer = LLVMOptimizer(opt_level=3)
# Optimize LLVM IR
optimized_ir = optimizer.optimize(original_ir)
# Get optimization configuration
info = optimizer.get_optimization_info()
print(f"Level: {info['opt_name']}") # "O3"
print(f"Inlining threshold: {info['inlining_threshold']}") # 275
print(f"Vectorization: {info['vectorization_enabled']}") # True
Recommendations¶
For Development:
- Use O0 or O1 for fastest iteration
- Binary debugging works best with O0
- O1 provides quick feedback on optimization potential
For Testing:
- Use O2 (default) to match production behavior
- Test critical paths with O3 to ensure correctness
- Run memory tests (ASAN) at all optimization levels
For Production:
- Use O2 as default (best balance)
- Use O3 for performance-critical applications
- Profile before and after to measure real-world impact
For Benchmarking:
- Always use O3 for fair comparisons
- Measure compilation + execution time
- Report optimization level with results
String Operations¶
The LLVM backend has the most complete string API among all MultiGen backends:
Basic Operations¶
def string_demo() -> int:
# Concatenation
s1: str = "Hello"
s2: str = "World"
result: str = s1 + " " + s2 # "Hello World"
# Length
length: int = len(result) # 11
# Case conversion
upper: str = result.upper() # "HELLO WORLD"
lower: str = result.lower() # "hello world"
# Whitespace removal
padded: str = " hello "
clean: str = padded.strip() # "hello"
return 0
String Methods (v0.1.83)¶
def advanced_strings() -> int:
text: str = "hello,world,python"
# Split by delimiter
parts: list[str] = text.split(",") # ["hello", "world", "python"]
# Join with separator
joined: str = "-".join(parts) # "hello-world-python"
# Replace substring
replaced: str = text.replace("hello", "goodbye") # "goodbye,world,python"
# Prefix/suffix checking
starts: bool = text.startswith("hello") # True
ends: bool = text.endswith("python") # True
return 0
String Array Operations¶
def string_arrays() -> int:
# Create string list
words: list[str] = ["apple", "banana", "cherry"]
# Iterate and process
for word in words:
upper_word: str = word.upper()
print(upper_word)
# Join into single string
result: str = ", ".join(words) # "apple, banana, cherry"
return 0
Memory Safety¶
The LLVM backend is memory-safe with comprehensive verification:
ASAN Verification (v0.1.82)¶
All benchmarks pass with zero issues:
# Run memory tests
make test-memory-llvm
# Results:
# [x] fibonacci - 0 leaks, 0 errors
# [x] matmul - 0 leaks, 0 errors
# [x] quicksort - 0 leaks, 0 errors
# [x] wordcount - 0 leaks, 0 errors
# [x] list_ops - 0 leaks, 0 errors
# [x] dict_ops - 0 leaks, 0 errors
# [x] set_ops - 0 leaks, 0 errors
Runtime Library¶
- ~8,300 lines of C runtime code
- Zero external dependencies (except libc)
- Verified memory-safe via AddressSanitizer
- Manual memory management with proper cleanup
Memory Model¶
// All containers use explicit allocation/deallocation
vec_int* list = vec_int_init_ptr();
vec_int_push(list, 42);
vec_int_free(list); // Must call to avoid leaks
// Strings are allocated and must be freed
char* result = multigen_str_upper("hello");
free(result); // Generated code handles this
Performance¶
Execution Speed¶
Competitive with top backends with O3 optimization:
| Rank | Backend | Avg Time | Notes |
|---|---|---|---|
| 1 | Go | 64.5ms | Fastest |
| 2 | LLVM (O3) | ~180-200ms | With optimization |
| 2 | LLVM (O0-O2) | 224.5ms | Without full optimization |
| 3 | Rust | 249.3ms | |
| 4 | C++ | 254.9ms | |
| 5 | C | 256.2ms |
Note: LLVM with O3 optimization delivers 36.5% faster execution on fibonacci benchmark
Binary Size¶
2nd smallest binaries:
| Rank | Backend | Avg Size |
|---|---|---|
| 1 | C++ | 36.1KB |
| 2 | LLVM | 37.0KB |
| 3 | C | 65.6KB |
| 4 | Rust | 446.1KB |
| 5 | OCaml | 811.2KB |
Compilation Speed¶
| Backend | Avg Time |
|---|---|
| Go | 82.0ms (fastest) |
| Rust | 218.8ms |
| OCaml | 258.1ms |
| LLVM | 330.7ms |
| C | 384.1ms |
| C++ | 396.4ms |
Optimization Performance¶
# Use optimization flags for better performance
multigen build -t llvm -O aggressive program.py # O3
multigen build -t llvm -O moderate program.py # O2 (default)
multigen build -t llvm -O basic program.py # O1
multigen build -t llvm -O none program.py # O0
Measured Impact (Fibonacci benchmark):
- O0 → O3: 36.5% faster execution (85.6ms → 54.3ms)
- O0 → O3: Only 4% slower compilation (408ms → 424ms)
- Binary size: Unchanged (~37KB)
- IR size: 44% smaller (6.0KB → 3.3KB)
See Optimization Levels section for full details.
Error Handling¶
The LLVM backend provides descriptive error messages (v0.1.83):
Runtime Errors¶
Before (v0.1.82):
After (v0.1.83):
vec_int error: Index 5 out of bounds (size = 3)
map_int_int error: NULL pointer passed to map_int_int_set
set_int error: Failed to allocate entry for value 42
Error Message Format¶
All runtime errors follow this pattern:
Common Error Types¶
- NULL Pointer Errors:
- Index Out of Bounds:
- Memory Allocation Failures:
- Container Full Errors:
Debugging Tips¶
# Enable AddressSanitizer for detailed error reports
export ASAN_OPTIONS=detect_leaks=1:symbolize=1
multigen build --target llvm --enable-asan program.py
# Run with verbose output
./build/program 2>&1 | tee error.log
Advanced Usage¶
Container Operations¶
def container_demo() -> int:
# Lists
numbers: list[int] = [1, 2, 3, 4, 5]
numbers.append(6)
first: int = numbers[0]
length: int = len(numbers)
# 2D arrays
matrix: list[list[int]] = [[1, 2], [3, 4]]
value: int = matrix[0][1] # 2
# Dicts
scores: dict[str, int] = {"alice": 95, "bob": 87}
scores["charlie"] = 92
alice_score: int = scores["alice"]
# Sets
unique: set[int] = {1, 2, 3, 2, 1} # {1, 2, 3}
unique.add(4)
has_two: bool = 2 in unique
return 0
Comprehensions¶
def comprehension_demo() -> int:
# List comprehension
squares: list[int] = [x * x for x in range(10)]
# Dict comprehension
square_map: dict[int, int] = {x: x * x for x in range(5)}
# Set comprehension
even_set: set[int] = {x for x in range(10) if x % 2 == 0}
# Nested comprehension
matrix: list[list[int]] = [[i + j for j in range(3)] for i in range(3)]
return 0
Recursion¶
def fibonacci(n: int) -> int:
if n <= 1:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
def main() -> int:
result: int = fibonacci(10) # 55
print(result)
return 0
Troubleshooting¶
Common Issues¶
1. LLVM Tools Not Found¶
Error: llc: command not found
Solution:
# macOS - add Homebrew LLVM to PATH
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
# Or set explicitly
export LLVM_LLC_PATH=/opt/homebrew/opt/llvm/bin/llc
export LLVM_CLANG_PATH=/opt/homebrew/opt/llvm/bin/clang
2. Compilation Fails¶
Error: IR verification failed
Solution:
# Check LLVM IR syntax
cat build/program.ll | llc -verify-machineinstrs
# Enable verbose output
multigen build --target llvm --verbose program.py
3. Runtime Crashes¶
Error: Segmentation fault
Solution:
# Enable AddressSanitizer
multigen build --target llvm --enable-asan program.py
# Run and check output
./build/program 2>&1 | grep "ERROR:"
4. Performance Issues¶
Problem: Slow execution
Solution:
# Use O3 optimization
export LLVM_OPTIMIZATION_LEVEL=3
multigen build --target llvm program.py
# Or switch to JIT mode for development
python -m multigen.backends.llvm.jit_executor program.ll
Getting Help¶
- Check error messages (v0.1.83+) for specific issues
- Enable verbose mode:
--verbose - Run memory tests:
make test-memory-llvm - Review generated IR:
cat build/program.ll - File issues: GitHub Issues
Architecture¶
Compilation Pipeline¶
Python Source Code
↓
AST Parsing
↓
Static IR (type-checked, optimized)
↓
LLVM IR Generation
↓
LLVM Optimizer (NEW v0.1.84)
• 60+ optimization passes
• O0/O1/O2/O3 support
• Saves .opt.ll file
↓
llc (LLVM Static Compiler)
↓
Object File (.o)
↓
clang (Linker)
↓
Native Executable
Runtime Library Structure¶
src/multigen/backends/llvm/runtime/
├── vec_int_minimal.c # List[int] operations
├── vec_str_minimal.c # List[str] operations
├── vec_vec_int_minimal.c # List[List[int]] operations
├── map_int_int_minimal.c # Dict[int, int] operations
├── map_str_int_minimal.c # Dict[str, int] operations
├── set_int_minimal.c # Set[int] operations
├── multigen_llvm_string.c # String operations (9 methods)
└── multigen_llvm_string.h # String API declarations
Key Components¶
- IRToLLVMConverter (
ir_to_llvm.py) - Converts Static IR to LLVM IR
- ~4,000 lines of Python code
-
Handles all language features
-
LLVM Optimizer (
optimizer.py) - NEW v0.1.84 - Full optimization pass pipeline
- 60+ LLVM passes (O0-O3)
- Pipeline tuning options
-
Saves optimized IR for debugging
-
Runtime Declarations (
runtime_decls.py) - LLVM IR function declarations
- Type definitions for containers
-
Links to C runtime
-
LLVM Compiler (
compiler.py) - Compiles IR to native code
- Handles llc/clang invocation
-
Manages optimization levels
-
LLVM Builder (
builder.py) - Build system integration
- Makefile generation
-
Optimization level support
-
JIT Executor (
jit_executor.py) - In-memory execution
- llvmlite integration
- Fast development cycles
Type System¶
Python Type LLVM Type C Runtime Type
----------- --------- --------------
int i64 long long
float double double
bool i1 bool
str i8* char*
list[int] %struct.vec_int* vec_int*
dict[int, int] %struct.map_int* map_int_int*
set[int] %struct.set_int* set_int*
Additional Resources¶
Documentation¶
- Backend Comparison - Compare all 7 backends
- Memory Testing Guide - ASAN usage
- JIT Mode Details - JIT compilation
Development¶
- Architecture Overview - Design decisions
- Runtime Implementation:
src/multigen/backends/llvm/runtime/ - Test Suite:
tests/test_backend_llvm*.py - String Method Tests:
tests/test_llvm_string_methods.py
Examples¶
See tests/benchmarks/algorithms/ for working examples:
fibonacci.py- Recursionmatmul.py- 2D arraysquicksort.py- List operationswordcount.py- String processinglist_ops.py- List comprehensionsdict_ops.py- Dictionary operationsset_ops.py- Set operations
Changelog¶
v0.1.84 (October 16, 2025) - MAJOR PERFORMANCE RELEASE¶
- [x] Full optimization pipeline (60+ LLVM passes)
- [x] 36.5% performance gain with O3 (fibonacci: 54ms vs 86ms)
- [x] O0/O1/O2/O3 support via CLI flags
- [x] Minimal overhead (only 3-5% slower compilation)
- [x] 1020 tests passing (14 new optimizer tests)
- [x] Production-ready optimization infrastructure
v0.1.83 (October 15, 2025)¶
- [x] 5 new string methods (join, replace, upper, startswith, endswith)
- [x] Better error messages (descriptive runtime errors)
- [x] 107 comprehensive tests (723% increase)
- [x] Most complete string API among all backends
v0.1.82 (October 15, 2025)¶
- [x] Memory safety verification (ASAN integration)
- [x] 0 memory leaks across all benchmarks
- [x] Automated memory testing infrastructure
- [x] Production-ready status achieved
v0.1.80 (October 2025)¶
- [x] 7/7 benchmarks passing (100% coverage)
- [x] Container runtime complete (~8,300 lines)
- [x] JIT compilation mode (7.7x faster dev cycle)
- [x] Production-ready backend
Last Updated: October 16, 2025 Maintained By: MultiGen Team Status: Production Ready
Performance: Up to 36.5% faster with O3 optimization (v0.1.84)