
Cyllama Agent Framework Overview

Cyllama includes a zero-dependency agent framework for building tool-using LLM agents. The framework provides three agent architectures, each designed for different reliability and control requirements.

Table of Contents

  1. Quick Start
  2. Architecture
  3. Tools
  4. Agents
  5. ReActAgent
  6. ConstrainedAgent
  7. ContractAgent
  8. Events and Results
  9. Configuration
  10. Best Practices
  11. Async Agents

Quick Start

from cyllama import LLM
from cyllama.agents import ReActAgent, tool

# Define a tool
@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    return str(eval(expression))

# Create agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[calculate])

# Run task
result = agent.run("What is 25 * 4?")
print(result.answer)  # e.g. "100" (exact wording depends on the model)

Architecture

                        ┌──────────────────────────────────┐
                        │           User Task              │
                        └──────────────┬───────────────────┘
                        ┌──────────────▼───────────────────┐
                        │         Agent Layer              │
                        │  ┌─────────────────────────────┐ │
                        │  │ ReActAgent | ContractAgent  │ │
                        │  │      ConstrainedAgent       │ │
                        │  └─────────────────────────────┘ │
                        └──────────────┬───────────────────┘
              ┌────────────────────────┼──────────────────────┐
              │                        │                      │
    ┌─────────▼─────────┐   ┌──────────▼────────┐   ┌─────────▼─────────┐
    │   Tool Registry   │   │       LLM         │   │  Event Stream     │
    │  - Tool lookup    │   │  - Generation     │   │  - THOUGHT        │
    │  - Schema gen     │   │  - Streaming      │   │  - ACTION         │
    │  - Execution      │   │  - Grammar (opt)  │   │  - OBSERVATION    │
    └───────────────────┘   └───────────────────┘   │  - ANSWER         │
                                                    │  - ERROR          │
                                                    └───────────────────┘

Tools

Tools are Python functions that agents can invoke. The @tool decorator automatically extracts type information and generates JSON schemas.

Defining Tools

from cyllama.agents import tool

@tool
def search_web(query: str, max_results: int = 5) -> str:
    """
    Search the web for information.

    Args:
        query: Search query string
        max_results: Maximum number of results to return
    """
    # Implementation
    return f"Results for: {query}"

@tool(name="calc", description="Evaluate math expressions")
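def calculate(expression: str) -> float:
    """Evaluate a math expression."""
    # eval() executes arbitrary code; builtins are removed here, but validate input in real use
    return float(eval(expression, {"__builtins__": {}}, {}))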

Tool Parameters

Type hints are automatically converted to JSON schema types:

Python Type    JSON Schema Type
str            string
int            integer
float          number
bool           boolean
list           array
dict           object

Parameters without default values are marked as required.
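
The mapping above can be illustrated with a small standalone sketch. It is not the @tool implementation; build_schema and PY_TO_JSON are hypothetical names, and falling back to "string" for unknown annotations is an assumption made here for brevity.

```python
import inspect
from typing import get_type_hints

# Mapping taken from the table above.
PY_TO_JSON = {str: "string", int: "integer", float: "number",
              bool: "boolean", list: "array", dict: "object"}

def build_schema(func):
    """Derive a JSON-schema-style parameter description from a function signature."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" (an assumption).
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the parameter is required
    return {"type": "object", "properties": properties, "required": required}

def search_web(query: str, max_results: int = 5) -> str:
    return f"Results for: {query}"

schema = build_schema(search_web)
print(schema["required"])  # ['query']
```

Here max_results has a default and is therefore left out of the required list, while query is included.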

Tool Registry

Tools can be managed via the ToolRegistry class:

from cyllama.agents import Tool, ToolRegistry

registry = ToolRegistry()
registry.register(search_web)
registry.register(calculate)

# Get tool by name
tool = registry.get("search_web")

# Generate prompt descriptions
prompt = registry.to_prompt_string()

# Generate JSON schemas (OpenAI format)
schemas = registry.to_json_schema()

Agents

ReActAgent

Implements the ReAct (Reasoning + Acting) pattern where the agent alternates between thinking and acting.

Pattern:

Thought: [reasoning about what to do]
Action: tool_name({"arg": "value"})
Observation: [result from tool]
... (repeat)
Thought: I now know the answer
Answer: [final answer]

Reference: ReAct: Synergizing Reasoning and Acting in Language Models
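
To make the parsing step concrete, the following is a minimal sketch of extracting an Action line in the format shown above. The regular expression and the parse_action helper are illustrative assumptions, not the parser ReActAgent actually uses.

```python
import json
import re

# Matches lines such as: Action: tool_name({"arg": "value"})
ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)

def parse_action(text):
    """Return (tool_name, args) for the first Action line, or None if absent or malformed."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    try:
        args = json.loads(raw_args) if raw_args.strip() else {}
    except json.JSONDecodeError:
        return None  # malformed arguments: the caller can retry or report an error
    if not isinstance(args, dict):
        return None
    return name, args

output = 'Thought: I should search.\nAction: search({"query": "Python tutorials"})'
print(parse_action(output))  # ('search', {'query': 'Python tutorials'})
```

Returning None on malformed output reflects the main weakness of free-form parsing: the model may emit text that does not match the expected pattern.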

Usage:

from cyllama import LLM
from cyllama.agents import ReActAgent, tool

@tool
def search(query: str) -> str:
    return f"Results for: {query}"

agent = ReActAgent(
    llm=LLM("model.gguf"),
    tools=[search],
    max_iterations=10,
    verbose=True,
)

result = agent.run("Search for Python tutorials")

Parameters:

Parameter                    Type              Default   Description
llm                          LLM               required  LLM instance for generation
tools                        List[Tool]        None      Available tools
system_prompt                str               default   Custom system prompt
max_iterations               int               10        Maximum thought/action cycles
verbose                      bool              False     Print reasoning to stdout
generation_config            GenerationConfig  default   LLM generation settings
detect_loops                 bool              True      Enable loop detection
max_consecutive_same_action  int               2         Same action repeat limit
max_consecutive_same_tool    int               4         Same tool repeat limit
max_context_chars            int               6000      Context truncation limit

Strengths:

  • Natural reasoning trace for debugging
  • Works well with most instruction-tuned models
  • Flexible action format

Weaknesses:

  • Parsing can fail on malformed output
  • Requires larger models for reliable tool calling

ConstrainedAgent

Uses GBNF grammar constraints to guarantee valid JSON tool calls. Eliminates parsing failures by constraining the LLM's output space.

Usage:

from cyllama import LLM
from cyllama.agents import ConstrainedAgent, tool

@tool
def calculate(expression: str) -> str:
    return str(eval(expression))

agent = ConstrainedAgent(
    llm=LLM("model.gguf"),
    tools=[calculate],
    format="json",
    allow_reasoning=True,
)

result = agent.run("What is 100 / 4?")

Parameters:

Parameter          Type                         Default   Description
llm                LLM                          required  LLM instance for generation
tools              List[Tool]                   None      Available tools
system_prompt      str                          default   Custom system prompt
max_iterations     int                          10        Maximum tool call cycles
verbose            bool                         False     Print actions to stdout
generation_config  ConstrainedGenerationConfig  default   Generation settings
format             str                          "json"    Output format (json, json_array, function_call)
allow_reasoning    bool                         False     Include reasoning field
use_cache          bool                         True      Cache compiled grammars
detect_loops       bool                         True      Enable loop detection

Output Format:

{"type": "tool_call", "tool_name": "calculate", "tool_args": {"expression": "100/4"}}

or

{"type": "answer", "content": "The result is 25"}
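
Because both message shapes are plain JSON, handling them reduces to a dispatch on the "type" field. The sketch below illustrates that idea under stated assumptions: TOOLS and handle_output are hypothetical names, and the real agent loop may be organized differently.

```python
import json

def calculate(expression: str) -> str:
    # Stand-in tool for illustration; builtins are removed, but real tools should validate input.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculate": calculate}  # hypothetical name -> callable lookup

def handle_output(raw):
    """Return ("observation", result) for a tool call or ("answer", text) for a final answer."""
    message = json.loads(raw)  # grammar-constrained output is assumed to be valid JSON
    if message["type"] == "tool_call":
        func = TOOLS[message["tool_name"]]
        return "observation", func(**message["tool_args"])
    if message["type"] == "answer":
        return "answer", message["content"]
    raise ValueError(f"unexpected message type: {message['type']!r}")

call = '{"type": "tool_call", "tool_name": "calculate", "tool_args": {"expression": "100/4"}}'
print(handle_output(call))  # ('observation', '25.0')
```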

Strengths:

  • 100% valid JSON output (grammar-enforced)
  • Works with smaller models
  • Eliminates parsing failures

Weaknesses:

  • Less natural output format
  • Grammar compilation overhead (mitigated by caching)

ContractAgent

Contract-based agent inspired by C++26 contracts (P2900). Adds preconditions, postconditions, and runtime assertions to tool calls.

Usage:

from cyllama import LLM
from cyllama.agents import ContractAgent, tool, pre, post, ContractPolicy

@tool
@pre(lambda args: args['x'] != 0, "cannot divide by zero")
@post(lambda r: r is not None, "result must not be None")
def divide(a: float, x: float) -> float:
    """Divide a by x."""
    return a / x

agent = ContractAgent(
    llm=LLM("model.gguf"),
    tools=[divide],
    policy=ContractPolicy.ENFORCE,
    task_precondition=lambda task: len(task) > 10,
    answer_postcondition=lambda ans: len(ans) > 0,
)

result = agent.run("What is 100 divided by 4?")

Contract Decorators:

# Precondition - checked before tool execution
@pre(lambda args: args['count'] > 0, "count must be positive")

# Postcondition - checked after tool execution
@post(lambda result: len(result) > 0, "must return non-empty result")

# Postcondition with access to original arguments
@post(lambda r, args: len(r) <= args['max_len'], "result too long")
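
For readers curious how such decorators can work, here is a standalone sketch of precondition and postcondition wrappers. It is an illustration only: ContractError is a hypothetical exception, and the real pre and post in cyllama.agents integrate with the agent and its policies rather than raising directly.

```python
import functools
import inspect

class ContractError(Exception):
    """Hypothetical exception raised when a contract predicate fails."""

def pre(predicate, message):
    """Check predicate(args_dict) before calling the function."""
    def decorate(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if not predicate(dict(bound.arguments)):
                raise ContractError(f"precondition failed: {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorate

def post(predicate, message):
    """Check predicate(result) or predicate(result, args_dict) after the call."""
    wants_args = len(inspect.signature(predicate).parameters) == 2
    def decorate(func):
        sig = inspect.signature(func)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            ok = predicate(result, dict(bound.arguments)) if wants_args else predicate(result)
            if not ok:
                raise ContractError(f"postcondition failed: {message}")
            return result
        return wrapper
    return decorate

@pre(lambda args: args["x"] != 0, "cannot divide by zero")
@post(lambda r: r is not None, "result must not be None")
def divide(a: float, x: float) -> float:
    return a / x

print(divide(100, 4))  # 25.0
```

The precondition runs before the function body, so divide(1, 0) fails the contract instead of raising ZeroDivisionError.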

Contract Policies:

Policy         Checks  Handler Called  Continues  Terminates
IGNORE         No      No              Yes        No
OBSERVE        Yes     Yes (on fail)   Yes        No
ENFORCE        Yes     Yes (on fail)   No         Yes
QUICK_ENFORCE  Yes     No              No         Yes
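
The four rows of the policy table can be read as a small decision procedure. The sketch below encodes that reading in plain Python; Policy, Terminated, and check are hypothetical stand-ins and not the ContractPolicy implementation.

```python
from enum import Enum

class Policy(Enum):  # hypothetical stand-in for ContractPolicy
    IGNORE = "ignore"
    OBSERVE = "observe"
    ENFORCE = "enforce"
    QUICK_ENFORCE = "quick_enforce"

class Terminated(Exception):
    """Raised when a policy stops execution after a failed check."""

def check(policy, predicate, value, message, handler):
    """Apply one contract check under the given policy; return True if execution continues."""
    if policy is Policy.IGNORE:
        return True                      # the predicate is never evaluated
    if predicate(value):
        return True                      # check passed
    if policy is not Policy.QUICK_ENFORCE:
        handler(message)                 # OBSERVE and ENFORCE report the failure
    if policy is Policy.OBSERVE:
        return True                      # report, then carry on
    raise Terminated(message)            # ENFORCE and QUICK_ENFORCE stop here

seen = []
check(Policy.OBSERVE, lambda x: x != 0, 0, "cannot divide by zero", seen.append)
print(seen)  # ['cannot divide by zero']
```

In this reading, QUICK_ENFORCE differs from ENFORCE only in skipping the handler before terminating.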

Runtime Assertions:

import json

from cyllama.agents import contract_assert, tool

@tool
def process_data(data: str) -> str:
    parsed = json.loads(data)
    contract_assert(isinstance(parsed, dict), "data must be JSON object")
    return str(parsed)

Agent-Level Contracts:

agent = ContractAgent(
    llm=llm,
    tools=[...],
    task_precondition=lambda task: len(task) >= 10,
    answer_postcondition=lambda ans: "error" not in ans.lower(),
    iteration_invariant=lambda state: state.iterations < 20,
)

Violation Handler:

def my_handler(violation: ContractViolation) -> None:
    print(f"VIOLATION: {violation.kind} at {violation.location}")
    print(f"  Message: {violation.message}")
    # Log, alert, etc.

agent = ContractAgent(
    llm=llm,
    tools=[...],
    violation_handler=my_handler,
)

Strengths:

  • Runtime verification of tool behavior
  • Configurable violation handling
  • Agent-level invariants

Weaknesses:

  • Additional overhead for contract checking
  • Requires explicit contract definitions

Events and Results

Event Types

Agents emit events during execution:

from enum import Enum

# Shown for reference; in your own code use: from cyllama.agents import EventType
class EventType(Enum):
    THOUGHT = "thought"           # Agent reasoning
    ACTION = "action"             # Tool invocation
    OBSERVATION = "observation"   # Tool result
    ANSWER = "answer"             # Final answer
    ERROR = "error"               # Error occurred
    CONTRACT_CHECK = "contract_check"         # Contract being evaluated
    CONTRACT_VIOLATION = "contract_violation" # Violation detected

Streaming Events

for event in agent.stream("What is 2+2?"):
    if event.type == EventType.THOUGHT:
        print(f"Thinking: {event.content}")
    elif event.type == EventType.ACTION:
        print(f"Calling: {event.content}")
    elif event.type == EventType.OBSERVATION:
        print(f"Result: {event.content}")
    elif event.type == EventType.ANSWER:
        print(f"Answer: {event.content}")

AgentResult

result = agent.run("What is 2+2?")

print(result.answer)      # Final answer string
print(result.success)     # True if completed without error
print(result.error)       # Error message if failed
print(result.iterations)  # Number of iterations
print(result.steps)       # List of AgentEvent
print(result.metrics)     # AgentMetrics (timing, counts)

AgentMetrics

metrics = result.metrics
print(f"Total time: {metrics.total_time_ms}ms")
print(f"Iterations: {metrics.iterations}")
print(f"Tool calls: {metrics.tool_calls}")
print(f"Generation time: {metrics.generation_time_ms}ms")
print(f"Tool time: {metrics.tool_time_ms}ms")
print(f"Loop detected: {metrics.loop_detected}")
print(f"Errors: {metrics.error_count}")

Configuration

GenerationConfig (ReActAgent)

from cyllama import GenerationConfig

config = GenerationConfig(
    temperature=0.7,
    max_tokens=512,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    stop_sequences=["Observation:"],
)

agent = ReActAgent(llm=llm, tools=tools, generation_config=config)

ConstrainedGenerationConfig

from cyllama.agents import ConstrainedGenerationConfig

config = ConstrainedGenerationConfig(
    temperature=0.7,
    max_tokens=512,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
)

agent = ConstrainedAgent(llm=llm, tools=tools, generation_config=config)

Best Practices

1. Choose the Right Agent

Use Case                   Recommended Agent
General-purpose tasks      ReActAgent
Smaller models             ConstrainedAgent
Critical applications      ContractAgent
Debugging/explainability   ReActAgent (verbose)
High reliability required  ConstrainedAgent + ContractAgent

2. Tool Design

# Good: Clear description, typed parameters, docstring
@tool
def search_database(query: str, limit: int = 10) -> str:
    """
    Search the database for records matching query.

    Args:
        query: Search term
        limit: Maximum results to return
    """
    return db.search(query, limit)

# Bad: No docstring, no type hints, cryptic parameter name
@tool
def search(q):
    return db.search(q)

3. Error Handling

@tool
def risky_operation(data: str) -> str:
    """Perform operation that might fail."""
    try:
        result = process(data)
        return f"Success: {result}"
    except ValueError as e:
        return f"Error: Invalid data - {e}"
    except Exception as e:
        return f"Error: {e}"

4. Loop Prevention

Configure loop detection to prevent infinite loops:

agent = ReActAgent(
    llm=llm,
    tools=tools,
    detect_loops=True,
    max_consecutive_same_action=2,  # Same exact action
    max_consecutive_same_tool=4,    # Same tool with any args
    max_iterations=10,              # Hard limit
)
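
The two repeat limits can be understood as counters over the tail of the action history. The following sketch shows one plausible way to compute them. detect_loop is a hypothetical helper, and treating a loop as "count exceeds the limit" is an assumption; the library may trigger at a different threshold.

```python
import json

def detect_loop(actions, max_same_action=2, max_same_tool=4):
    """Return True if the tail of `actions` repeats beyond the given limits.

    `actions` is a list of (tool_name, args_dict) tuples, oldest first.
    """
    if not actions:
        return False
    last_tool, last_args = actions[-1]
    last_key = json.dumps(last_args, sort_keys=True)
    same_action = same_tool = 0
    counting_action = True
    for tool_name, args in reversed(actions):
        if tool_name != last_tool:
            break                         # a different tool ends both streaks
        same_tool += 1
        if counting_action and json.dumps(args, sort_keys=True) == last_key:
            same_action += 1
        else:
            counting_action = False       # different args end the exact-action streak
    return same_action > max_same_action or same_tool > max_same_tool

history = [("search", {"query": "python"})] * 3
print(detect_loop(history))  # True
```

Three identical calls exceed the same-action limit of 2, while calls to the same tool with varying arguments are tolerated up to the looser same-tool limit.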

5. Context Management

Prevent context overflow with truncation:

agent = ReActAgent(
    llm=llm,
    tools=tools,
    max_context_chars=6000,  # Truncate older history
)
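
As an illustration of character-based truncation, the sketch below keeps the newest history entries that fit within a budget. truncate_history is a hypothetical helper; how ReActAgent actually trims its context (for example, whether it preserves the system prompt or the original task) is not specified here.

```python
def truncate_history(entries, max_chars=6000):
    """Keep the most recent entries whose combined length fits within max_chars."""
    kept, total = [], 0
    for entry in reversed(entries):          # walk from newest to oldest
        cost = len(entry) + 1                # +1 for the newline that joins entries
        if kept and total + cost > max_chars:
            break                            # older entries no longer fit
        kept.append(entry)                   # the newest entry is always kept
        total += cost
    return "\n".join(reversed(kept))

history = ["Thought: a" * 10, "Observation: " + "x" * 50, "Thought: done"]
print(truncate_history(history, max_chars=80))
```

With an 80-character budget, the oldest entry is dropped and the two most recent entries are returned in their original order.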

6. Contracts for Safety

Use contracts for safety-critical tools:

@tool
@pre(lambda args: 0 <= args['amount'] <= 1000, "amount must be 0-1000")
@pre(lambda args: args['account_id'].isalnum(), "invalid account ID")
@post(lambda r: r.startswith("TX"), "must return transaction ID")
def transfer_funds(account_id: str, amount: float) -> str:
    """Transfer funds to account."""
    return banking_api.transfer(account_id, amount)

Async Agents

For non-blocking agent execution in async applications, use the async agent wrappers.

AsyncReActAgent

Async wrapper for ReActAgent:

import asyncio
from cyllama.agents import AsyncReActAgent, tool

@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

async def main():
    async with AsyncReActAgent(
        "models/Llama-3.2-1B-Instruct-Q8_0.gguf",
        tools=[search],
        max_iterations=5
    ) as agent:
        # Async run
        result = await agent.run("Search for Python tutorials")
        print(result.answer)

        # Async streaming
        async for event in agent.stream("Find information about AI"):
            print(f"{event.type.value}: {event.content}")

asyncio.run(main())

AsyncConstrainedAgent

Async wrapper for ConstrainedAgent:

import asyncio
from cyllama.agents import AsyncConstrainedAgent, tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

async def main():
    async with AsyncConstrainedAgent(
        "models/Llama-3.2-1B-Instruct-Q8_0.gguf",
        tools=[calculate]
    ) as agent:
        result = await agent.run("What is 100 / 4?")
        print(result.answer)

asyncio.run(main())

run_agent_async()

Helper function to run any synchronous agent asynchronously:

import asyncio
from cyllama import LLM
from cyllama.agents import ReActAgent, run_agent_async, tool

@tool
def greet(name: str) -> str:
    return f"Hello, {name}!"

# Create sync agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[greet])

# Run asynchronously
async def main():
    result = await run_agent_async(agent, "Greet Alice")
    print(result.answer)

asyncio.run(main())
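
A common way to run a blocking call without stalling the event loop is to hand it to a worker thread. The sketch below demonstrates that general technique with asyncio.to_thread. SlowAgent and run_in_thread are hypothetical, and this is not necessarily how run_agent_async is implemented.

```python
import asyncio
import time

class SlowAgent:
    """Hypothetical stand-in for a synchronous agent with a blocking run() method."""
    def run(self, task: str) -> str:
        time.sleep(0.05)  # simulates blocking generation
        return f"done: {task}"

async def run_in_thread(agent, task):
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # so the event loop stays free to schedule other coroutines meanwhile.
    return await asyncio.to_thread(agent.run, task)

async def demo():
    agent = SlowAgent()
    result, _ = await asyncio.gather(
        run_in_thread(agent, "Greet Alice"),
        asyncio.sleep(0.01),  # other work that proceeds while the agent runs
    )
    return result

print(asyncio.run(demo()))  # done: Greet Alice
```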

Async Agent Parameters

Both AsyncReActAgent and AsyncConstrainedAgent take a model path in place of an LLM instance. Their other parameters largely follow the synchronous counterparts:

Parameter       Type              Default   Description
model_path      str               required  Path to GGUF model file
tools           List[Tool]        None      Available tools
config          GenerationConfig  None      LLM configuration
system_prompt   str               default   Custom system prompt
max_iterations  int               10        Maximum iterations
verbose         bool              False     Print output

Thread Safety

Async agents use an internal lock to serialize access, so concurrent calls on a single instance run one at a time and remain thread-safe. For true parallel execution, create multiple agent instances:

async def parallel_agents():
    async with AsyncReActAgent("model.gguf", tools=tools) as agent1, \
               AsyncReActAgent("model.gguf", tools=tools) as agent2:

        task1 = agent1.run("Task 1")
        task2 = agent2.run("Task 2")

        results = await asyncio.gather(task1, task2)
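
The effect of a per-instance lock can be seen in a small simulation. LockedAgent below is a hypothetical class that uses an asyncio.Lock, which is an assumption; the actual lock type inside the async agents may differ.

```python
import asyncio

class LockedAgent:
    """Hypothetical async agent that serializes run() calls with an internal lock."""
    def __init__(self, name, log):
        self.name, self.log = name, log
        self._lock = asyncio.Lock()

    async def run(self, task):
        async with self._lock:               # only one run() at a time per instance
            self.log.append(f"{self.name} start {task}")
            await asyncio.sleep(0.01)        # stands in for generation work
            self.log.append(f"{self.name} end {task}")

async def demo():
    shared_log, split_log = [], []
    one = LockedAgent("a", shared_log)
    await asyncio.gather(one.run("1"), one.run("2"))     # same instance: serialized
    a, b = LockedAgent("a", split_log), LockedAgent("b", split_log)
    await asyncio.gather(a.run("1"), b.run("2"))         # two instances: overlapping
    return shared_log, split_log

shared_log, split_log = asyncio.run(demo())
print(shared_log)     # ['a start 1', 'a end 1', 'a start 2', 'a end 2']
print(split_log[:2])  # ['a start 1', 'b start 2']
```

On a single instance the second task starts only after the first ends, whereas two instances both start before either finishes.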

References