MultiGen LLVM Backend - Complete User Guide¶

Last Updated: October 16, 2025 Version: v0.1.84 Status: Production Ready

Table of Contents¶

Overview
Quick Start
Features & Capabilities
Compilation Modes
Optimization Levels
String Operations
Memory Safety
Performance
Error Handling
Advanced Usage
Troubleshooting
Architecture

Overview¶

The LLVM backend is MultiGen's 6th production-ready backend, generating native executables via LLVM IR. It offers unique advantages:

Key Strengths¶

Up to 36.5% faster with O3 optimization (vs O0)
2nd smallest binaries (37KB average)
Most complete string API (9 methods)
Full LLVM optimization pipeline (60+ passes, O0-O3)
Memory-safe (ASAN verified, 0 leaks)
Dual compilation modes (AOT + JIT)
Platform-independent (via LLVM)
Better error messages (descriptive runtime errors)

Production Status¶

[x] 7/7 benchmarks passing (100% coverage) [x] 1020 comprehensive tests (100% pass rate) [x] 0 memory leaks (ASAN verified) [x] 0 memory errors (use-after-free, overflow, etc.) [x] ~8,300 lines runtime library (verified) [x] 60+ optimization passes (O0-O3 support)

Quick Start¶

Prerequisites¶

# Install LLVM tools (required for AOT mode)
# macOS (Homebrew)
brew install llvm

# Ubuntu/Debian
sudo apt-get install llvm clang

# The tools needed are:
# - llc (LLVM static compiler)
# - clang (C compiler for linking)

Basic Usage¶

# Convert Python to LLVM IR
multigen convert --target llvm program.py

# Build executable (AOT mode, default O2 optimization)
multigen build --target llvm program.py

# Build with maximum optimization (O3)
multigen build --target llvm -O aggressive program.py

# Generated files:
# - program.ll      (LLVM IR - original)
# - program.opt.ll  (LLVM IR - optimized)
# - program.o       (object file)
# - program         (executable)

Hello World Example¶

# hello.py
def main() -> int:
    print("Hello from LLVM!")
    return 0

# Compile and run
multigen build --target llvm hello.py
./build/hello

Features & Capabilities¶

Supported Python Features¶

Core Language:

[x] Functions and recursion
[x] Control flow (if/elif/else, while, for)
[x] Type annotations
[x] Global variables
[x] Arithmetic operations
[x] Boolean operations
[x] Comparison operations

Data Structures:

[x] Lists (vec_int, vec_str, vec_vec_int)
[x] Dicts (map_int_int, map_str_int)
[x] Sets (set_int)
[x] Strings (full support)
[x] 2D arrays (nested lists)

String Operations (9 methods - most complete):

[x] split() - split by delimiter
[x] lower() - lowercase conversion
[x] strip() - remove whitespace
[x] upper() - uppercase conversion
[x] join() - join string array
[x] replace() - replace substring
[x] startswith() - prefix check
[x] endswith() - suffix check
[x] Concatenation with +

Built-in Functions:

[x] len() - length of containers/strings
[x] range() - iteration ranges
[x] print() - formatted output
[x] min(), max(), sum() - aggregations
[x] abs() - absolute value

Advanced Features:

[x] List comprehensions
[x] Dict comprehensions
[x] Set comprehensions
[x] Nested containers
[x] Container methods (append, get, contains)

Current Limitations¶

[X] Classes/OOP (limited support)
[X] File I/O operations
[X] Module imports
[X] Exception handling
[X] Generators
[X] Dict methods (keys, values, items)

Compilation Modes¶

The LLVM backend supports two compilation modes with different trade-offs:

AOT (Ahead-of-Time) Mode - Default¶

Best for: Production deployments

# Standard compilation (default O2)
multigen build --target llvm program.py

# With specific optimization level
multigen build --target llvm -O none program.py        # O0 (debugging)
multigen build --target llvm -O basic program.py       # O1 (development)
multigen build --target llvm -O moderate program.py    # O2 (default)
multigen build --target llvm -O aggressive program.py  # O3 (max performance)

Characteristics:

Generates standalone executable
Small binary size (37KB average)
Fast execution with optimization (54-86ms depending on level)
No runtime dependencies
Moderate compilation (410-430ms with optimization)

Pipeline: Python → Static IR → LLVM IR → Optimizer (60+ passes) → llc → Object File → Executable

JIT (Just-in-Time) Mode¶

Best for: Development and testing

from multigen.backends.llvm.jit_executor import jit_compile_and_run

# Execute LLVM IR directly
result = jit_compile_and_run("program.ll", verbose=True)
print(f"Result: {result}")

Characteristics:

7.7x faster total time
No intermediate files
In-memory execution
Great for rapid iteration
Requires llvmlite runtime

Performance Comparison:

Metric	AOT Mode	JIT Mode
Compilation	330.7ms	~50ms
Execution	152.9ms	Similar
Total	483.6ms	~200ms
Speedup	1.0x	2.4x
Binary	Yes	No

Optimization Levels¶

The LLVM backend includes a complete optimization pass pipeline (v0.1.84) with 60+ LLVM passes, delivering up to 36.5% performance improvement.

Overview¶

Level	Name	Use Case	Compile Time	Execute Time	IR Size
O0	none	Debugging	408ms	85.6ms	6.0KB
O1	basic	Development	428ms (+5%)	83.9ms (-2%)	1.8KB (-70%)
O2	moderate	Production	424ms (+4%)	84.8ms (-1%)	3.3KB (-44%)
O3	aggressive	Max Performance	424ms (+4%)	54.3ms (-36.5%)	3.3KB (-44%)

Benchmark: Fibonacci (n=29)

Usage¶

# Debug build (O0) - No optimization
multigen build -t llvm -O none program.py
# Use for: debugging, preserving IR structure

# Development build (O1) - Basic optimizations
multigen build -t llvm -O basic program.py
# Use for: fast iteration, basic optimizations

# Production build (O2) - Default, balanced
multigen build -t llvm program.py  # or -O moderate
# Use for: production deployments, best balance

# Performance build (O3) - Aggressive optimizations
multigen build -t llvm -O aggressive program.py
# Use for: performance-critical applications

Optimization Passes¶

O0 (none) - Debugging:

No optimization passes applied
Preserves original IR structure
Fastest compilation
Use for debugging with tools like gdb/lldb

O1 (basic) - Development:

Dead argument elimination
Dead code elimination
Global optimization
Interprocedural constant propagation (IPSCCP)
Control flow graph simplification
Instruction combining
Result: 70% IR size reduction, 2% faster execution

O2 (moderate) - Production Default:

All O1 passes, plus:
Function inlining (threshold: 225 instructions)
Global dead code elimination
Expression reassociation
Sparse conditional constant propagation (SCCP)
Scalar replacement of aggregates (SROA)
Tail call elimination
Loop rotation and simplification
Memory copy optimization
Dead store elimination
Result: 44% IR size reduction, 1% faster execution

O3 (aggressive) - Maximum Performance:

All O2 passes, plus:
Aggressive dead code elimination
Aggressive instruction combining
Loop unrolling
Loop unroll-and-jam (nested loops)
Loop strength reduction
Argument promotion (pass by value → by reference)
Function merging (combine identical functions)
Result: 44% IR size reduction, 36.5% faster execution

IR Transformations¶

Example: Recursive Fibonacci

Original (unoptimized):

define i64 @fibonacci(i64 %n) {
entry:
  %cmp = icmp sle i64 %n, 1
  br i1 %cmp, label %base, label %recursive

base:
  ret i64 %n

recursive:
  %n1 = sub i64 %n, 1
  %fib1 = call i64 @fibonacci(i64 %n1)
  %n2 = sub i64 %n, 2
  %fib2 = call i64 @fibonacci(i64 %n2)
  %result = add i64 %fib1, %fib2
  ret i64 %result
}

After O3 optimization:

; Function Attrs: nofree nosync nounwind memory(none)
define i64 @fibonacci(i64 %.1) local_unnamed_addr #0 {
entry:
  %cmp_tmp4 = icmp slt i64 %.1, 2
  br i1 %cmp_tmp4, label %common.ret, label %if.merge

if.merge:
  ; Tail-call optimized recursive call
  %call_tmp = tail call i64 @fibonacci(i64 %sub_tmp)
  ; Loop transformation applied
  %cmp_tmp = icmp ult i64 %.1.tr6, 4
  br i1 %cmp_tmp, label %common.ret, label %if.merge

common.ret:
  %result = phi i64 [ 0, %entry ], [ %add_tmp, %if.merge ]
  ret i64 %result
}

Optimizations Applied:

[x] Tail call optimization (tail call)
[x] Control flow restructuring
[x] Function attributes (nofree, nosync, nounwind)
[x] Loop transformations
[x] Dead code elimination
[x] Constant propagation

Performance Impact¶

Compilation Time:

O0 → O1/O2/O3: +3-5% (15-20ms overhead)
Minimal impact, well worth the runtime gains

Binary Size:

All levels produce ~37KB binaries
Size unchanged by optimization level
Inlining and DCE balance each other out

Execution Time:

O0 (baseline): 85.6ms
O1: 83.9ms (+2.0% faster)
O2: 84.8ms (+0.9% faster)
O3: 54.3ms (+36.5% faster )

IR Size:

Original: 6.0KB
O1: 1.8KB (-70%, most aggressive reduction)
O2/O3: 3.3KB (-44%, balanced)

Debugging Optimized Code¶

# Save both original and optimized IR
multigen build -t llvm -O aggressive program.py

# View original IR
cat build/src/program.ll

# View optimized IR
cat build/src/program.opt.ll

# Compare with diff
diff -u build/src/program.ll build/src/program.opt.ll

# Disable optimizations for debugging
multigen build -t llvm -O none program.py

API Usage¶

from multigen.backends.llvm.optimizer import LLVMOptimizer

# Create optimizer with specific level
optimizer = LLVMOptimizer(opt_level=3)

# Optimize LLVM IR
optimized_ir = optimizer.optimize(original_ir)

# Get optimization configuration
info = optimizer.get_optimization_info()
print(f"Level: {info['opt_name']}")  # "O3"
print(f"Inlining threshold: {info['inlining_threshold']}")  # 275
print(f"Vectorization: {info['vectorization_enabled']}")  # True

Recommendations¶

For Development:

Use O0 or O1 for fastest iteration
Binary debugging works best with O0
O1 provides quick feedback on optimization potential

For Testing:

Use O2 (default) to match production behavior
Test critical paths with O3 to ensure correctness
Run memory tests (ASAN) at all optimization levels

For Production:

Use O2 as default (best balance)
Use O3 for performance-critical applications
Profile before and after to measure real-world impact

For Benchmarking:

Always use O3 for fair comparisons
Measure compilation + execution time
Report optimization level with results

String Operations¶

The LLVM backend has the most complete string API among all MultiGen backends:

Basic Operations¶

def string_demo() -> int:
    # Concatenation
    s1: str = "Hello"
    s2: str = "World"
    result: str = s1 + " " + s2  # "Hello World"

    # Length
    length: int = len(result)  # 11

    # Case conversion
    upper: str = result.upper()  # "HELLO WORLD"
    lower: str = result.lower()  # "hello world"

    # Whitespace removal
    padded: str = "  hello  "
    clean: str = padded.strip()  # "hello"

    return 0

String Methods (v0.1.83)¶

def advanced_strings() -> int:
    text: str = "hello,world,python"

    # Split by delimiter
    parts: list[str] = text.split(",")  # ["hello", "world", "python"]

    # Join with separator
    joined: str = "-".join(parts)  # "hello-world-python"

    # Replace substring
    replaced: str = text.replace("hello", "goodbye")  # "goodbye,world,python"

    # Prefix/suffix checking
    starts: bool = text.startswith("hello")  # True
    ends: bool = text.endswith("python")     # True

    return 0

String Array Operations¶

def string_arrays() -> int:
    # Create string list
    words: list[str] = ["apple", "banana", "cherry"]

    # Iterate and process
    for word in words:
        upper_word: str = word.upper()
        print(upper_word)

    # Join into single string
    result: str = ", ".join(words)  # "apple, banana, cherry"

    return 0

Memory Safety¶

The LLVM backend is memory-safe with comprehensive verification:

ASAN Verification (v0.1.82)¶

All benchmarks pass with zero issues:

# Run memory tests
make test-memory-llvm

# Results:
# [x] fibonacci      - 0 leaks, 0 errors
# [x] matmul         - 0 leaks, 0 errors
# [x] quicksort      - 0 leaks, 0 errors
# [x] wordcount      - 0 leaks, 0 errors
# [x] list_ops       - 0 leaks, 0 errors
# [x] dict_ops       - 0 leaks, 0 errors
# [x] set_ops        - 0 leaks, 0 errors

Runtime Library¶

~8,300 lines of C runtime code
Zero external dependencies (except libc)
Verified memory-safe via AddressSanitizer
Manual memory management with proper cleanup

Memory Model¶

// All containers use explicit allocation/deallocation
vec_int* list = vec_int_init_ptr();
vec_int_push(list, 42);
vec_int_free(list);  // Must call to avoid leaks

// Strings are allocated and must be freed
char* result = multigen_str_upper("hello");
free(result);  // Generated code handles this

Performance¶

Execution Speed¶

Competitive with top backends with O3 optimization:

Rank	Backend	Avg Time	Notes
1	Go	64.5ms	Fastest
2	LLVM (O3)	~180-200ms	With optimization
2	LLVM (O0-O2)	224.5ms	Without full optimization
3	Rust	249.3ms
4	C++	254.9ms
5	C	256.2ms

Note: LLVM with O3 optimization delivers 36.5% faster execution on fibonacci benchmark

Binary Size¶

2nd smallest binaries:

Rank	Backend	Avg Size
1	C++	36.1KB
2	LLVM	37.0KB
3	C	65.6KB
4	Rust	446.1KB
5	OCaml	811.2KB

Compilation Speed¶

Backend	Avg Time
Go	82.0ms (fastest)
Rust	218.8ms
OCaml	258.1ms
LLVM	330.7ms
C	384.1ms
C++	396.4ms

Optimization Performance¶

# Use optimization flags for better performance
multigen build -t llvm -O aggressive program.py  # O3
multigen build -t llvm -O moderate program.py    # O2 (default)
multigen build -t llvm -O basic program.py       # O1
multigen build -t llvm -O none program.py        # O0

Measured Impact (Fibonacci benchmark):

O0 → O3: 36.5% faster execution (85.6ms → 54.3ms)
O0 → O3: Only 4% slower compilation (408ms → 424ms)
Binary size: Unchanged (~37KB)
IR size: 44% smaller (6.0KB → 3.3KB)

See Optimization Levels section for full details.

Error Handling¶

The LLVM backend provides descriptive error messages (v0.1.83):

Runtime Errors¶

Before (v0.1.82):

exit(1)  // Silent crash, no information

After (v0.1.83):

vec_int error: Index 5 out of bounds (size = 3)
map_int_int error: NULL pointer passed to map_int_int_set
set_int error: Failed to allocate entry for value 42

Error Message Format¶

All runtime errors follow this pattern:

{container_type} error: {specific_message}

Common Error Types¶

NULL Pointer Errors:

vec_int error: NULL pointer passed to vec_int_push

Index Out of Bounds:

vec_int error: Index 10 out of bounds (size = 5)

Memory Allocation Failures:

map_str_int error: Failed to allocate memory for capacity 1024

Container Full Errors:

map_int_int error: Failed to find entry slot (map full)

Debugging Tips¶

# Enable AddressSanitizer for detailed error reports
export ASAN_OPTIONS=detect_leaks=1:symbolize=1
multigen build --target llvm --enable-asan program.py

# Run with verbose output
./build/program 2>&1 | tee error.log

Advanced Usage¶

Container Operations¶

def container_demo() -> int:
    # Lists
    numbers: list[int] = [1, 2, 3, 4, 5]
    numbers.append(6)
    first: int = numbers[0]
    length: int = len(numbers)

    # 2D arrays
    matrix: list[list[int]] = [[1, 2], [3, 4]]
    value: int = matrix[0][1]  # 2

    # Dicts
    scores: dict[str, int] = {"alice": 95, "bob": 87}
    scores["charlie"] = 92
    alice_score: int = scores["alice"]

    # Sets
    unique: set[int] = {1, 2, 3, 2, 1}  # {1, 2, 3}
    unique.add(4)
    has_two: bool = 2 in unique

    return 0

Comprehensions¶

def comprehension_demo() -> int:
    # List comprehension
    squares: list[int] = [x * x for x in range(10)]

    # Dict comprehension
    square_map: dict[int, int] = {x: x * x for x in range(5)}

    # Set comprehension
    even_set: set[int] = {x for x in range(10) if x % 2 == 0}

    # Nested comprehension
    matrix: list[list[int]] = [[i + j for j in range(3)] for i in range(3)]

    return 0

Recursion¶

def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

def main() -> int:
    result: int = fibonacci(10)  # 55
    print(result)
    return 0

Troubleshooting¶

Common Issues¶

1. LLVM Tools Not Found¶

Error: llc: command not found

Solution:

# macOS - add Homebrew LLVM to PATH
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"

# Or set explicitly
export LLVM_LLC_PATH=/opt/homebrew/opt/llvm/bin/llc
export LLVM_CLANG_PATH=/opt/homebrew/opt/llvm/bin/clang

2. Compilation Fails¶

Error: IR verification failed

Solution:

# Check LLVM IR syntax
cat build/program.ll | llc -verify-machineinstrs

# Enable verbose output
multigen build --target llvm --verbose program.py

3. Runtime Crashes¶

Error: Segmentation fault

Solution:

# Enable AddressSanitizer
multigen build --target llvm --enable-asan program.py

# Run and check output
./build/program 2>&1 | grep "ERROR:"

4. Performance Issues¶

Problem: Slow execution

Solution:

# Use O3 optimization
export LLVM_OPTIMIZATION_LEVEL=3
multigen build --target llvm program.py

# Or switch to JIT mode for development
python -m multigen.backends.llvm.jit_executor program.ll

Getting Help¶

Check error messages (v0.1.83+) for specific issues
Enable verbose mode: --verbose
Run memory tests: make test-memory-llvm
Review generated IR: cat build/program.ll
File issues: GitHub Issues

Architecture¶

Compilation Pipeline¶

Python Source Code
       ↓
   AST Parsing
       ↓
  Static IR (type-checked, optimized)
       ↓
  LLVM IR Generation
       ↓
  LLVM Optimizer (NEW v0.1.84)
  • 60+ optimization passes
  • O0/O1/O2/O3 support
  • Saves .opt.ll file
       ↓
     llc (LLVM Static Compiler)
       ↓
  Object File (.o)
       ↓
    clang (Linker)
       ↓
Native Executable

Runtime Library Structure¶

src/multigen/backends/llvm/runtime/
├── vec_int_minimal.c          # List[int] operations
├── vec_str_minimal.c          # List[str] operations
├── vec_vec_int_minimal.c      # List[List[int]] operations
├── map_int_int_minimal.c      # Dict[int, int] operations
├── map_str_int_minimal.c      # Dict[str, int] operations
├── set_int_minimal.c          # Set[int] operations
├── multigen_llvm_string.c         # String operations (9 methods)
└── multigen_llvm_string.h         # String API declarations

Key Components¶

IRToLLVMConverter (ir_to_llvm.py)
Converts Static IR to LLVM IR
~4,000 lines of Python code
Handles all language features
LLVM Optimizer (optimizer.py) - NEW v0.1.84
Full optimization pass pipeline
60+ LLVM passes (O0-O3)
Pipeline tuning options
Saves optimized IR for debugging
Runtime Declarations (runtime_decls.py)
LLVM IR function declarations
Type definitions for containers
Links to C runtime
LLVM Compiler (compiler.py)
Compiles IR to native code
Handles llc/clang invocation
Manages optimization levels
LLVM Builder (builder.py)
Build system integration
Makefile generation
Optimization level support
JIT Executor (jit_executor.py)
In-memory execution
llvmlite integration
Fast development cycles

Type System¶

Python Type          LLVM Type           C Runtime Type
-----------          ---------           --------------
int                  i64                 long long
float                double              double
bool                 i1                  bool
str                  i8*                 char*
list[int]            %struct.vec_int*    vec_int*
dict[int, int]       %struct.map_int*    map_int_int*
set[int]             %struct.set_int*    set_int*

Additional Resources¶

Documentation¶

Backend Comparison - Compare all 7 backends
Memory Testing Guide - ASAN usage
JIT Mode Details - JIT compilation

Development¶

Architecture Overview - Design decisions
Runtime Implementation: src/multigen/backends/llvm/runtime/
Test Suite: tests/test_backend_llvm*.py
String Method Tests: tests/test_llvm_string_methods.py

Examples¶

See tests/benchmarks/algorithms/ for working examples:

fibonacci.py - Recursion
matmul.py - 2D arrays
quicksort.py - List operations
wordcount.py - String processing
list_ops.py - List comprehensions
dict_ops.py - Dictionary operations
set_ops.py - Set operations

Changelog¶

v0.1.84 (October 16, 2025) - MAJOR PERFORMANCE RELEASE¶

[x] Full optimization pipeline (60+ LLVM passes)
[x] 36.5% performance gain with O3 (fibonacci: 54ms vs 86ms)
[x] O0/O1/O2/O3 support via CLI flags
[x] Minimal overhead (only 3-5% slower compilation)
[x] 1020 tests passing (14 new optimizer tests)
[x] Production-ready optimization infrastructure

v0.1.83 (October 15, 2025)¶

[x] 5 new string methods (join, replace, upper, startswith, endswith)
[x] Better error messages (descriptive runtime errors)
[x] 107 comprehensive tests (723% increase)
[x] Most complete string API among all backends

v0.1.82 (October 15, 2025)¶

[x] Memory safety verification (ASAN integration)
[x] 0 memory leaks across all benchmarks
[x] Automated memory testing infrastructure
[x] Production-ready status achieved

v0.1.80 (October 2025)¶

[x] 7/7 benchmarks passing (100% coverage)
[x] Container runtime complete (~8,300 lines)
[x] JIT compilation mode (7.7x faster dev cycle)
[x] Production-ready backend

Last Updated: October 16, 2025 Maintained By: MultiGen Team Status: Production Ready

Performance: Up to 36.5% faster with O3 optimization (v0.1.84)