Cyllama Agent Framework Overview¶

Cyllama includes a zero-dependency agent framework for building tool-using LLM agents. The framework provides three agent architectures, each designed for different reliability and control requirements.

Table of Contents¶

Quick Start
Architecture
Tools
Agents
    ReActAgent
    ConstrainedAgent
    ContractAgent
Events and Results
Configuration
Best Practices
Async Agents

Quick Start¶

from cyllama import LLM
from cyllama.agents import ReActAgent, tool

# Define a tool
@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # eval() executes arbitrary Python; only use it with trusted input.
    return str(eval(expression))

# Create agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[calculate])

# Run task
result = agent.run("What is 25 * 4?")
print(result.answer)  # e.g. "100"

Architecture¶

                        ┌──────────────────────────────────┐
                        │           User Task              │
                        └──────────────┬───────────────────┘
                                       │
                        ┌──────────────▼───────────────────┐
                        │         Agent Layer              │
                        │  ┌─────────────────────────────┐ │
                        │  │ ReActAgent | ContractAgent  │ │
                        │  │      ConstrainedAgent       │ │
                        │  └─────────────────────────────┘ │
                        └──────────────┬───────────────────┘
                                       │
              ┌────────────────────────┼──────────────────────┐
              │                        │                      │
    ┌─────────▼─────────┐   ┌──────────▼────────┐   ┌─────────▼─────────┐
    │   Tool Registry   │   │       LLM         │   │  Event Stream     │
    │  - Tool lookup    │   │  - Generation     │   │  - THOUGHT        │
    │  - Schema gen     │   │  - Streaming      │   │  - ACTION         │
    │  - Execution      │   │  - Grammar (opt)  │   │  - OBSERVATION    │
    └───────────────────┘   └───────────────────┘   │  - ANSWER         │
                                                    │  - ERROR          │
                                                    └───────────────────┘

Tools¶

Tools are Python functions that agents can invoke. The @tool decorator automatically extracts type information and generates JSON schemas.

Defining Tools¶

from cyllama.agents import tool

@tool
def search_web(query: str, max_results: int = 5) -> str:
    """
    Search the web for information.

    Args:
        query: Search query string
        max_results: Maximum number of results to return
    """
    # Implementation
    return f"Results for: {query}"

@tool(name="calc", description="Evaluate math expressions")
def calculate(expression: str) -> float:
    """Evaluate a math expression."""
    # eval() is NOT safe on untrusted input; restrict or replace it in production.
    return float(eval(expression))
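
The calculator tools on this page call eval(), which runs arbitrary Python. As a minimal sketch of a safer alternative, the function below walks the expression's syntax tree and accepts only numeric literals and basic arithmetic operators. The name `safe_eval` is hypothetical and not part of cyllama.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, /, %, ** over int and float literals only."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

A tool body could then return `str(safe_eval(expression))` instead of calling eval(). Very large exponents can still be slow, so a production version may also want to bound operand sizes.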

Tool Parameters¶

Type hints are automatically converted to JSON schema types:

Python Type	JSON Schema Type
`str`	`string`
`int`	`integer`
`float`	`number`
`bool`	`boolean`
`list`	`array`
`dict`	`object`

Parameters without default values are marked as required.
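
To illustrate the mapping, here is a rough sketch of how a parameter schema could be derived from a function signature with the standard library. This is not cyllama's implementation: `parameters_schema` is a hypothetical helper, and the fallback to "string" for unannotated parameters is an assumption.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Mapping taken from the table above.
_JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def parameters_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-schema-style description of func's parameters."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" (assumption).
        json_type = _JSON_TYPES.get(hints.get(name), "string")
        properties[name] = {"type": json_type}
        # A parameter without a default value is required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def search_web(query: str, max_results: int = 5) -> str:
    return f"Results for: {query}"


schema = parameters_schema(search_web)
```

Here `query` is listed as required because it has no default, while `max_results` is optional.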

Tool Registry¶

Tools can be managed via the ToolRegistry class:

from cyllama.agents import Tool, ToolRegistry

registry = ToolRegistry()
registry.register(search_web)
registry.register(calculate)

# Get tool by name
tool = registry.get("search_web")

# Generate prompt descriptions
prompt = registry.to_prompt_string()

# Generate JSON schemas (OpenAI format)
schemas = registry.to_json_schema()

Agents¶

ReActAgent¶

Implements the ReAct (Reasoning + Acting) pattern where the agent alternates between thinking and acting.

Pattern:

Thought: [reasoning about what to do]
Action: tool_name({"arg": "value"})
Observation: [result from tool]
... (repeat)
Thought: I now know the answer
Answer: [final answer]

Reference: ReAct: Synergizing Reasoning and Acting in Language Models
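
As a sketch of what handling the Action line in the pattern above might involve, the helper below extracts the tool name and JSON arguments from model output. The regular expression and the `parse_action` name are illustrative assumptions, not the framework's actual parser.

```python
import json
import re
from typing import Any, Dict, Optional, Tuple

# Matches a single line such as:  Action: search({"query": "python"})
_ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)\s*$")


def parse_action(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, args) for the first Action line, or None."""
    for line in text.splitlines():
        match = _ACTION_RE.match(line.strip())
        if match is None:
            continue
        try:
            # An empty argument list is treated as no arguments.
            args = json.loads(match.group(2) or "{}")
        except json.JSONDecodeError:
            return None  # malformed arguments; the caller can report a parse error
        return (match.group(1), args) if isinstance(args, dict) else None
    return None
```

Returning None for malformed output is the kind of parsing failure listed under Weaknesses below; an agent loop would typically feed that back to the model as an error observation.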

Usage:

from cyllama import LLM
from cyllama.agents import ReActAgent, tool

@tool
def search(query: str) -> str:
    return f"Results for: {query}"

agent = ReActAgent(
    llm=LLM("model.gguf"),
    tools=[search],
    max_iterations=10,
    verbose=True,
)

result = agent.run("Search for Python tutorials")

Parameters:

Parameter	Type	Default	Description
`llm`	`LLM`	required	LLM instance for generation
`tools`	`List[Tool]`	`None`	Available tools
`system_prompt`	`str`	default	Custom system prompt
`max_iterations`	`int`	`10`	Maximum thought/action cycles
`verbose`	`bool`	`False`	Print reasoning to stdout
`generation_config`	`GenerationConfig`	default	LLM generation settings
`detect_loops`	`bool`	`True`	Enable loop detection
`max_consecutive_same_action`	`int`	`2`	Same action repeat limit
`max_consecutive_same_tool`	`int`	`4`	Same tool repeat limit
`max_context_chars`	`int`	`6000`	Context truncation limit

Strengths:

Natural reasoning trace for debugging
Works well with most instruction-tuned models
Flexible action format

Weaknesses:

Parsing can fail on malformed output
Requires larger models for reliable tool calling

ConstrainedAgent¶

Uses GBNF grammar constraints to guarantee valid JSON tool calls. Eliminates parsing failures by constraining the LLM's output space.

Usage:

from cyllama import LLM
from cyllama.agents import ConstrainedAgent, tool

@tool
def calculate(expression: str) -> str:
    return str(eval(expression))

agent = ConstrainedAgent(
    llm=LLM("model.gguf"),
    tools=[calculate],
    format="json",
    allow_reasoning=True,
)

result = agent.run("What is 100 / 4?")

Parameters:

Parameter	Type	Default	Description
`llm`	`LLM`	required	LLM instance for generation
`tools`	`List[Tool]`	`None`	Available tools
`system_prompt`	`str`	default	Custom system prompt
`max_iterations`	`int`	`10`	Maximum tool call cycles
`verbose`	`bool`	`False`	Print actions to stdout
`generation_config`	`ConstrainedGenerationConfig`	default	Generation settings
`format`	`str`	`"json"`	Output format (`json`, `json_array`, `function_call`)
`allow_reasoning`	`bool`	`False`	Include reasoning field
`use_cache`	`bool`	`True`	Cache compiled grammars
`detect_loops`	`bool`	`True`	Enable loop detection

Output Format:

{"type": "tool_call", "tool_name": "calculate", "tool_args": {"expression": "100/4"}}

or

{"type": "answer", "content": "The result is 25"}

Strengths:

100% valid JSON output (grammar-enforced)
Works with smaller models
Eliminates parsing failures

Weaknesses:

Less natural output format
Grammar compilation overhead (mitigated by caching)
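
Because the output is always one of the two JSON shapes shown under Output Format, handling it reduces to a dispatch on the "type" field. The following is a minimal sketch under that assumption; `handle_output` and the stand-in `calculate` are hypothetical and do not reflect the agent's internals.

```python
import json
from typing import Any, Callable, Dict, Tuple


def handle_output(raw: str, tools: Dict[str, Callable[..., Any]]) -> Tuple[str, str]:
    """Return ("observation", text) after a tool call, or ("answer", text)."""
    message = json.loads(raw)
    if message["type"] == "tool_call":
        func = tools.get(message["tool_name"])
        if func is None:
            return "observation", f"Error: unknown tool {message['tool_name']!r}"
        return "observation", str(func(**message["tool_args"]))
    if message["type"] == "answer":
        return "answer", message["content"]
    raise ValueError(f"unexpected message type: {message['type']!r}")


def calculate(expression: str) -> str:
    # Stand-in tool that only handles "a/b", to keep the sketch free of eval().
    left, right = expression.split("/")
    return str(float(left) / float(right))


step = handle_output(
    '{"type": "tool_call", "tool_name": "calculate", "tool_args": {"expression": "100/4"}}',
    {"calculate": calculate},
)
final = handle_output('{"type": "answer", "content": "The result is 25"}', {})
```

Note that the grammar constrains syntax only; an unknown tool name or wrong argument values still need a runtime check like the one above.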

ContractAgent¶

Contract-based agent inspired by C++26 contracts (P2900). Adds preconditions, postconditions, and runtime assertions to tool calls.

Usage:

from cyllama import LLM
from cyllama.agents import ContractAgent, tool, pre, post, ContractPolicy

@tool
@pre(lambda args: args['x'] != 0, "cannot divide by zero")
@post(lambda r: r is not None, "result must not be None")
def divide(a: float, x: float) -> float:
    """Divide a by x."""
    return a / x

agent = ContractAgent(
    llm=LLM("model.gguf"),
    tools=[divide],
    policy=ContractPolicy.ENFORCE,
    task_precondition=lambda task: len(task) > 10,
    answer_postcondition=lambda ans: len(ans) > 0,
)

result = agent.run("What is 100 divided by 4?")

Contract Decorators:

# Precondition - checked before tool execution
@pre(lambda args: args['count'] > 0, "count must be positive")

# Postcondition - checked after tool execution
@post(lambda result: len(result) > 0, "must return non-empty result")

# Postcondition with access to original arguments
@post(lambda r, args: len(r) <= args['max_len'], "result too long")
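
To make the decorator semantics concrete, here is a simplified, self-contained sketch of how such decorators could work. It is not cyllama's implementation: `ContractError` is a hypothetical exception, the sketch assumes tools are called with keyword arguments only, and it always raises on failure rather than consulting a policy.

```python
import functools
import inspect
from typing import Callable


class ContractError(Exception):
    """Hypothetical exception type used only in this sketch."""


def pre(predicate: Callable[[dict], bool], message: str):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            # Checked before the tool body runs.
            if not predicate(kwargs):
                raise ContractError(f"precondition failed: {message}")
            return func(**kwargs)
        return wrapper
    return decorator


def post(predicate: Callable[..., bool], message: str):
    # A two-parameter predicate also receives the original arguments.
    wants_args = len(inspect.signature(predicate).parameters) == 2

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            result = func(**kwargs)
            ok = predicate(result, kwargs) if wants_args else predicate(result)
            if not ok:
                raise ContractError(f"postcondition failed: {message}")
            return result
        return wrapper
    return decorator


@pre(lambda args: args["max_len"] > 0, "max_len must be positive")
@post(lambda r, args: len(r) <= args["max_len"], "result too long")
def truncate(text: str, max_len: int) -> str:
    return text[:max_len]
```

Calling `truncate(text="hello world", max_len=5)` passes both checks, while `max_len=0` fails the precondition before the function body runs.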

Contract Policies:

Policy	Checks	Handler Called	Continues	Terminates
`IGNORE`	No	No	Yes	No
`OBSERVE`	Yes	Yes (on fail)	Yes	No
`ENFORCE`	Yes	Yes (on fail)	No	Yes
`QUICK_ENFORCE`	Yes	No	No	Yes
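
The table can be read as a small decision procedure. The sketch below encodes it with a stand-in enum; `Policy` and `check` are hypothetical names for illustration, not the framework's API.

```python
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):  # hypothetical stand-in for ContractPolicy
    IGNORE = "ignore"
    OBSERVE = "observe"
    ENFORCE = "enforce"
    QUICK_ENFORCE = "quick_enforce"


def check(
    policy: Policy,
    predicate_ok: Callable[[], bool],
    message: str,
    handler: Optional[Callable[[str], None]] = None,
) -> bool:
    """Return True if execution should continue, False if it should terminate."""
    if policy is Policy.IGNORE:
        return True  # the predicate is never evaluated
    if predicate_ok():
        return True
    # OBSERVE and ENFORCE notify the handler; QUICK_ENFORCE skips it.
    if policy is not Policy.QUICK_ENFORCE and handler is not None:
        handler(message)
    # Only OBSERVE continues after a failed check.
    return policy is Policy.OBSERVE
```

For example, with a failing predicate, OBSERVE calls the handler and returns True, while QUICK_ENFORCE returns False without calling it.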

Runtime Assertions:

import json

from cyllama.agents import contract_assert, tool

@tool
def process_data(data: str) -> str:
    parsed = json.loads(data)
    contract_assert(isinstance(parsed, dict), "data must be JSON object")
    return str(parsed)

Agent-Level Contracts:

agent = ContractAgent(
    llm=llm,
    tools=[...],
    task_precondition=lambda task: len(task) >= 10,
    answer_postcondition=lambda ans: "error" not in ans.lower(),
    iteration_invariant=lambda state: state.iterations < 20,
)

Violation Handler:

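# Assumes ContractViolation is exported from cyllama.agents
from cyllama.agents import ContractAgent, ContractViolation

def my_handler(violation: ContractViolation) -> None: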
    print(f"VIOLATION: {violation.kind} at {violation.location}")
    print(f"  Message: {violation.message}")
    # Log, alert, etc.

agent = ContractAgent(
    llm=llm,
    tools=[...],
    violation_handler=my_handler,
)

Strengths:

Runtime verification of tool behavior
Configurable violation handling
Agent-level invariants

Weaknesses:

Additional overhead for contract checking
Requires explicit contract definitions

Events and Results¶

Event Types¶

Agents emit events during execution:

from enum import Enum

from cyllama.agents import EventType

# For reference, the enum is defined along these lines:
class EventType(Enum):
    THOUGHT = "thought"           # Agent reasoning
    ACTION = "action"             # Tool invocation
    OBSERVATION = "observation"   # Tool result
    ANSWER = "answer"             # Final answer
    ERROR = "error"               # Error occurred
    CONTRACT_CHECK = "contract_check"         # Contract being evaluated
    CONTRACT_VIOLATION = "contract_violation" # Violation detected

Streaming Events¶

for event in agent.stream("What is 2+2?"):
    if event.type == EventType.THOUGHT:
        print(f"Thinking: {event.content}")
    elif event.type == EventType.ACTION:
        print(f"Calling: {event.content}")
    elif event.type == EventType.OBSERVATION:
        print(f"Result: {event.content}")
    elif event.type == EventType.ANSWER:
        print(f"Answer: {event.content}")
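
Beyond printing, the same loop can accumulate a summary of a run. The sketch below uses stand-in classes so that it runs without a model: `Event`, `fake_stream`, and `summarize` are hypothetical, and only the `.type` and `.content` attributes mirror the events described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Iterator


class EventType(Enum):  # trimmed stand-in mirroring the values listed above
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"
    ANSWER = "answer"


@dataclass
class Event:  # hypothetical stand-in exposing .type and .content
    type: EventType
    content: str


def fake_stream() -> Iterator[Event]:
    # Plays the role of agent.stream(...) in this sketch.
    yield Event(EventType.THOUGHT, "I should add the numbers")
    yield Event(EventType.ACTION, 'calculate({"expression": "2+2"})')
    yield Event(EventType.OBSERVATION, "4")
    yield Event(EventType.ANSWER, "4")


def summarize(events: Iterable[Event]) -> Dict[str, object]:
    """Count tool calls and capture the final answer from an event stream."""
    tool_calls = 0
    answer = None
    for event in events:
        if event.type is EventType.ACTION:
            tool_calls += 1
        elif event.type is EventType.ANSWER:
            answer = event.content
    return {"tool_calls": tool_calls, "answer": answer}


summary = summarize(fake_stream())
```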

AgentResult¶

result = agent.run("What is 2+2?")

print(result.answer)      # Final answer string
print(result.success)     # True if completed without error
print(result.error)       # Error message if failed
print(result.iterations)  # Number of iterations
print(result.steps)       # List of AgentEvent
print(result.metrics)     # AgentMetrics (timing, counts)

AgentMetrics¶

metrics = result.metrics
print(f"Total time: {metrics.total_time_ms}ms")
print(f"Iterations: {metrics.iterations}")
print(f"Tool calls: {metrics.tool_calls}")
print(f"Generation time: {metrics.generation_time_ms}ms")
print(f"Tool time: {metrics.tool_time_ms}ms")
print(f"Loop detected: {metrics.loop_detected}")
print(f"Errors: {metrics.error_count}")

Configuration¶

GenerationConfig (ReActAgent)¶

from cyllama import GenerationConfig

config = GenerationConfig(
    temperature=0.7,
    max_tokens=512,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
    stop_sequences=["Observation:"],
)

agent = ReActAgent(llm=llm, tools=tools, generation_config=config)

ConstrainedGenerationConfig¶

from cyllama.agents import ConstrainedGenerationConfig

config = ConstrainedGenerationConfig(
    temperature=0.7,
    max_tokens=512,
    top_k=40,
    top_p=0.95,
    min_p=0.05,
)

agent = ConstrainedAgent(llm=llm, tools=tools, generation_config=config)

Best Practices¶

1. Choose the Right Agent¶

Use Case	Recommended Agent
General-purpose tasks	ReActAgent
Smaller models	ConstrainedAgent
Critical applications	ContractAgent
Debugging/explainability	ReActAgent (verbose)
High reliability required	ConstrainedAgent + ContractAgent

2. Tool Design¶

# Good: Clear description, typed parameters, docstring
@tool
def search_database(query: str, limit: int = 10) -> str:
    """
    Search the database for records matching query.

    Args:
        query: Search term
        limit: Maximum results to return
    """
    return db.search(query, limit)

# Bad: Vague description, no types
@tool
def search(q):
    return db.search(q)

3. Error Handling¶

@tool
def risky_operation(data: str) -> str:
    """Perform operation that might fail."""
    try:
        result = process(data)
        return f"Success: {result}"
    except ValueError as e:
        return f"Error: Invalid data - {e}"
    except Exception as e:
        return f"Error: {e}"

4. Loop Prevention¶

Configure loop detection to prevent infinite loops:

agent = ReActAgent(
    llm=llm,
    tools=tools,
    detect_loops=True,
    max_consecutive_same_action=2,  # Same exact action
    max_consecutive_same_tool=4,    # Same tool with any args
    max_iterations=10,              # Hard limit
)
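
The two repeat limits can be pictured as counting the run of identical entries at the end of the action history. The sketch below is one plausible reading, not the framework's detector: `detect_loop` is a hypothetical helper, and treating a loop as a run that exceeds the limit (rather than reaches it) is an assumption.

```python
from typing import List, Sequence, Tuple


def _trailing_run(values: Sequence[object]) -> int:
    """Length of the run of equal items at the end of values."""
    run = 0
    for value in reversed(values):
        if value != values[-1]:
            break
        run += 1
    return run


def detect_loop(
    history: List[Tuple[str, str]],
    max_same_action: int = 2,
    max_same_tool: int = 4,
) -> bool:
    """history holds (tool_name, serialized_args) pairs, oldest first."""
    if not history:
        return False
    same_action = _trailing_run(history)                        # same tool and args
    same_tool = _trailing_run([name for name, _ in history])    # same tool, any args
    return same_action > max_same_action or same_tool > max_same_tool
```

Under this reading, three identical calls in a row trip the first limit, and five consecutive calls to one tool with varying arguments trip the second.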

5. Context Management¶

Prevent context overflow with truncation:

agent = ReActAgent(
    llm=llm,
    tools=tools,
    max_context_chars=6000,  # Truncate older history
)
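
One simple way to truncate older history is to keep the most recent entries that fit within the character budget. The helper below sketches that idea; `truncate_history` is hypothetical, and the framework's actual truncation strategy may differ (for example, it may preserve the task description).

```python
from typing import List


def truncate_history(entries: List[str], max_chars: int) -> List[str]:
    """Keep the newest entries whose combined length fits in max_chars."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first entry that does not fit.
    for entry in reversed(entries):
        if used + len(entry) > max_chars:
            break
        kept.append(entry)
        used += len(entry)
    kept.reverse()  # restore chronological order
    return kept
```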

6. Contracts for Safety¶

Use contracts for safety-critical tools:

@tool
@pre(lambda args: 0 <= args['amount'] <= 1000, "amount must be 0-1000")
@pre(lambda args: args['account_id'].isalnum(), "invalid account ID")
@post(lambda r: r.startswith("TX"), "must return transaction ID")
def transfer_funds(account_id: str, amount: float) -> str:
    """Transfer funds to account."""
    return banking_api.transfer(account_id, amount)

Async Agents¶

For non-blocking agent execution in async applications, use the async agent wrappers.

AsyncReActAgent¶

Async wrapper for ReActAgent:

import asyncio
from cyllama.agents import AsyncReActAgent, tool

@tool
def search(query: str) -> str:
    """Search for information."""
    return f"Results for: {query}"

async def main():
    async with AsyncReActAgent(
        "models/Llama-3.2-1B-Instruct-Q8_0.gguf",
        tools=[search],
        max_iterations=5
    ) as agent:
        # Async run
        result = await agent.run("Search for Python tutorials")
        print(result.answer)

        # Async streaming
        async for event in agent.stream("Find information about AI"):
            print(f"{event.type.value}: {event.content}")

asyncio.run(main())

AsyncConstrainedAgent¶

Async wrapper for ConstrainedAgent:

import asyncio
from cyllama.agents import AsyncConstrainedAgent, tool

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

async def main():
    async with AsyncConstrainedAgent(
        "models/Llama-3.2-1B-Instruct-Q8_0.gguf",
        tools=[calculate]
    ) as agent:
        result = await agent.run("What is 100 / 4?")
        print(result.answer)

asyncio.run(main())

run_agent_async()¶

Helper function to run any synchronous agent asynchronously:

import asyncio

from cyllama import LLM
from cyllama.agents import ReActAgent, run_agent_async, tool

@tool
def greet(name: str) -> str:
    return f"Hello, {name}!"

# Create sync agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[greet])

# Run asynchronously
async def main():
    result = await run_agent_async(agent, "Greet Alice")
    print(result.answer)

asyncio.run(main())
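
A common way to run blocking code without stalling the event loop is to hand it to a worker thread. The sketch below shows that general technique with `asyncio.to_thread` (Python 3.9+); whether run_agent_async() works this way internally is an assumption, and `blocking_run` is a stand-in for a synchronous `agent.run()`.

```python
import asyncio
import time


def blocking_run(task: str) -> str:
    # Stand-in for a synchronous agent.run(); sleeps to simulate generation.
    time.sleep(0.05)
    return f"done: {task}"


async def run_in_thread(task: str) -> str:
    # The blocking call runs in a worker thread, so the event loop
    # remains free to schedule other coroutines in the meantime.
    return await asyncio.to_thread(blocking_run, task)


async def main() -> list:
    ticks = []

    async def ticker() -> None:
        # Another coroutine that shares the event loop with the agent call.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(run_in_thread("Greet Alice"), ticker())
    return [result, ticks]


outcome = asyncio.run(main())
```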

Async Agent Parameters¶

Both AsyncReActAgent and AsyncConstrainedAgent take a model path in place of an `llm` instance. They accept the following parameters:

Parameter	Type	Default	Description
`model_path`	`str`	required	Path to GGUF model file
`tools`	`List[Tool]`	`None`	Available tools
`config`	`GenerationConfig`	`None`	LLM configuration
`system_prompt`	`str`	default	Custom system prompt
`max_iterations`	`int`	`10`	Maximum iterations
`verbose`	`bool`	`False`	Print output

Thread Safety¶

Each async agent uses an internal lock to serialize access, so concurrent calls on a single instance run one at a time. To run tasks in parallel, create multiple agent instances:

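import asyncio
from cyllama.agents import AsyncReActAgent

async def parallel_agents():
    # `tools` is a list of @tool functions defined elsewhere
    async with AsyncReActAgent("model.gguf", tools=tools) as agent1, \
               AsyncReActAgent("model.gguf", tools=tools) as agent2:

        task1 = agent1.run("Task 1")
        task2 = agent2.run("Task 2")

        results = await asyncio.gather(task1, task2)
    return results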
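
The effect of a per-instance lock can be demonstrated with a stand-in class. In the sketch below, `FakeAsyncAgent` is hypothetical and simulates generation with a short sleep; it only illustrates that two concurrent calls on one instance never overlap.

```python
import asyncio


class FakeAsyncAgent:
    """Hypothetical stand-in: one lock per instance, as described above."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active = 0      # calls currently inside run()
        self.max_active = 0  # highest overlap observed

    async def run(self, task: str) -> str:
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            await asyncio.sleep(0.01)  # simulated generation
            self.active -= 1
            return f"done: {task}"


async def main():
    shared = FakeAsyncAgent()
    # Both calls are scheduled together, but the lock runs them one at a time.
    results = await asyncio.gather(shared.run("Task 1"), shared.run("Task 2"))
    return results, shared.max_active


results, max_active = asyncio.run(main())
```

With two separate instances each call would hold its own lock, which is why the text recommends one agent per parallel task.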
