Inferna Agent Framework Overview¶
Inferna includes a zero-dependency agent framework for building tool-using LLM agents. The framework provides three agent architectures, each designed for different reliability and control requirements.
Table of Contents¶
Quick Start¶
from inferna import LLM
from inferna.agents import ReActAgent, tool
# Define a tool
@tool
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
return str(eval(expression))
# Create agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[calculate])
# Run task
result = agent.run("What is 25 * 4?")
print(result.answer) # "100"
Architecture¶
┌──────────────────────────────────┐
│ User Task │
└──────────────┬───────────────────┘
│
┌──────────────▼───────────────────┐
│ Agent Layer │
│ ┌─────────────────────────────┐ │
│ │ ReActAgent | ContractAgent │ │
│ │ ConstrainedAgent │ │
│ └─────────────────────────────┘ │
└──────────────┬───────────────────┘
│
┌────────────────────────┼──────────────────────┐
│ │ │
┌─────────▼─────────┐ ┌──────────▼────────┐ ┌─────────▼─────────┐
│ Tool Registry │ │ LLM │ │ Event Stream │
│ - Tool lookup │ │ - Generation │ │ - THOUGHT │
│ - Schema gen │ │ - Streaming │ │ - ACTION │
│ - Execution │ │ - Grammar (opt) │ │ - OBSERVATION │
└───────────────────┘ └───────────────────┘ │ - ANSWER │
│ - ERROR │
└───────────────────┘
Tools¶
Tools are Python functions that agents can invoke. The @tool decorator automatically extracts type information and generates JSON schemas.
Defining Tools¶
from inferna.agents import tool
@tool
def search_web(query: str, max_results: int = 5) -> str:
"""
Search the web for information.
Args:
query: Search query string
max_results: Maximum number of results to return
"""
# Implementation
return f"Results for: {query}"
@tool(name="calc", description="Evaluate math expressions")
def calculate(expression: str) -> float:
"""Safe math evaluation."""
return eval(expression)
Tool Parameters¶
Type hints are automatically converted to JSON schema types:
| Python Type | JSON Schema Type |
|---|---|
str |
string |
int |
integer |
float |
number |
bool |
boolean |
list |
array |
dict |
object |
Parameters without default values are marked as required.
Tool Registry¶
Tools can be managed via the ToolRegistry class:
from inferna.agents import Tool, ToolRegistry
registry = ToolRegistry()
registry.register(search_web)
registry.register(calculate)
# Get tool by name
tool = registry.get("search_web")
# Generate prompt descriptions
prompt = registry.to_prompt_string()
# Generate JSON schemas (OpenAI format)
schemas = registry.to_json_schema()
Agents¶
ReActAgent¶
Implements the ReAct (Reasoning + Acting) pattern where the agent alternates between thinking and acting.
Pattern:
Thought: [reasoning about what to do]
Action: tool_name({"arg": "value"})
Observation: [result from tool]
... (repeat)
Thought: I now know the answer
Answer: [final answer]
Reference: ReAct: Synergizing Reasoning and Acting in Language Models
Usage:
from inferna import LLM
from inferna.agents import ReActAgent, tool
@tool
def search(query: str) -> str:
return f"Results for: {query}"
agent = ReActAgent(
llm=LLM("model.gguf"),
tools=[search],
max_iterations=10,
verbose=True,
)
result = agent.run("Search for Python tutorials")
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
llm |
LLM |
required | LLM instance for generation |
tools |
List[Tool] |
None |
Available tools |
system_prompt |
str |
default | Custom system prompt |
max_iterations |
int |
10 |
Maximum thought/action cycles |
verbose |
bool |
False |
Print reasoning to stdout |
generation_config |
GenerationConfig |
default | LLM generation settings |
detect_loops |
bool |
True |
Enable loop detection |
max_consecutive_same_action |
int |
2 |
Same action repeat limit |
max_consecutive_same_tool |
int |
4 |
Same tool repeat limit |
max_context_chars |
int |
6000 |
Context truncation limit |
Strengths:
-
Natural reasoning trace for debugging
-
Works well with most instruction-tuned models
-
Flexible action format
Weaknesses:
-
Parsing can fail on malformed output
-
Requires larger models for reliable tool calling
ConstrainedAgent¶
Uses GBNF grammar constraints to guarantee valid JSON tool calls. Eliminates parsing failures by constraining the LLM's output space.
Usage:
from inferna import LLM
from inferna.agents import ConstrainedAgent, tool
@tool
def calculate(expression: str) -> str:
return str(eval(expression))
agent = ConstrainedAgent(
llm=LLM("model.gguf"),
tools=[calculate],
format="json",
allow_reasoning=True,
)
result = agent.run("What is 100 / 4?")
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
llm |
LLM |
required | LLM instance for generation |
tools |
List[Tool] |
None |
Available tools |
system_prompt |
str |
default | Custom system prompt |
max_iterations |
int |
10 |
Maximum tool call cycles |
verbose |
bool |
False |
Print actions to stdout |
generation_config |
ConstrainedGenerationConfig |
default | Generation settings |
format |
str |
"json" |
Output format (json, json_array, function_call) |
allow_reasoning |
bool |
False |
Include reasoning field |
use_cache |
bool |
True |
Cache compiled grammars |
detect_loops |
bool |
True |
Enable loop detection |
Output Format:
or
Strengths:
-
100% valid JSON output (grammar-enforced)
-
Works with smaller models
-
Eliminates parsing failures
Weaknesses:
-
Less natural output format
-
Grammar compilation overhead (mitigated by caching)
ContractAgent¶
Contract-based agent inspired by C++26 contracts (P2900). Adds preconditions, postconditions, and runtime assertions to tool calls.
Usage:
from inferna import LLM
from inferna.agents import ContractAgent, tool, pre, post, ContractPolicy
@tool
@pre(lambda args: args['x'] != 0, "cannot divide by zero")
@post(lambda r: r is not None, "result must not be None")
def divide(a: float, x: float) -> float:
"""Divide a by x."""
return a / x
agent = ContractAgent(
llm=LLM("model.gguf"),
tools=[divide],
policy=ContractPolicy.ENFORCE,
task_precondition=lambda task: len(task) > 10,
answer_postcondition=lambda ans: len(ans) > 0,
)
result = agent.run("What is 100 divided by 4?")
Contract Decorators:
# Precondition - checked before tool execution
@pre(lambda args: args['count'] > 0, "count must be positive")
# Postcondition - checked after tool execution
@post(lambda result: len(result) > 0, "must return non-empty result")
# Postcondition with access to original arguments
@post(lambda r, args: len(r) <= args['max_len'], "result too long")
Contract Policies:
| Policy | Checks | Handler Called | Continues | Terminates |
|---|---|---|---|---|
IGNORE |
No | No | Yes | No |
OBSERVE |
Yes | Yes (on fail) | Yes | No |
ENFORCE |
Yes | Yes (on fail) | No | Yes |
QUICK_ENFORCE |
Yes | No | No | Yes |
Runtime Assertions:
from inferna.agents import contract_assert
@tool
def process_data(data: str) -> str:
parsed = json.loads(data)
contract_assert(isinstance(parsed, dict), "data must be JSON object")
return str(parsed)
Agent-Level Contracts:
agent = ContractAgent(
llm=llm,
tools=[...],
task_precondition=lambda task: len(task) >= 10,
answer_postcondition=lambda ans: "error" not in ans.lower(),
iteration_invariant=lambda state: state.iterations < 20,
)
Violation Handler:
def my_handler(violation: ContractViolation) -> None:
print(f"VIOLATION: {violation.kind} at {violation.location}")
print(f" Message: {violation.message}")
# Log, alert, etc.
agent = ContractAgent(
llm=llm,
tools=[...],
violation_handler=my_handler,
)
Strengths:
-
Runtime verification of tool behavior
-
Configurable violation handling
-
Agent-level invariants
Weaknesses:
-
Additional overhead for contract checking
-
Requires explicit contract definitions
Events and Results¶
Event Types¶
Agents emit events during execution:
from inferna.agents import EventType
class EventType(Enum):
THOUGHT = "thought" # Agent reasoning
ACTION = "action" # Tool invocation
OBSERVATION = "observation" # Tool result
ANSWER = "answer" # Final answer
ERROR = "error" # Error occurred
CONTRACT_CHECK = "contract_check" # Contract being evaluated
CONTRACT_VIOLATION = "contract_violation" # Violation detected
Streaming Events¶
for event in agent.stream("What is 2+2?"):
if event.type == EventType.THOUGHT:
print(f"Thinking: {event.content}")
elif event.type == EventType.ACTION:
print(f"Calling: {event.content}")
elif event.type == EventType.OBSERVATION:
print(f"Result: {event.content}")
elif event.type == EventType.ANSWER:
print(f"Answer: {event.content}")
AgentResult¶
result = agent.run("What is 2+2?")
print(result.answer) # Final answer string
print(result.success) # True if completed without error
print(result.error) # Error message if failed
print(result.iterations) # Number of iterations
print(result.steps) # List of AgentEvent
print(result.metrics) # AgentMetrics (timing, counts)
AgentMetrics¶
metrics = result.metrics
print(f"Total time: {metrics.total_time_ms}ms")
print(f"Iterations: {metrics.iterations}")
print(f"Tool calls: {metrics.tool_calls}")
print(f"Generation time: {metrics.generation_time_ms}ms")
print(f"Tool time: {metrics.tool_time_ms}ms")
print(f"Loop detected: {metrics.loop_detected}")
print(f"Errors: {metrics.error_count}")
Configuration¶
GenerationConfig (ReActAgent)¶
from inferna import GenerationConfig
config = GenerationConfig(
temperature=0.7,
max_tokens=512,
top_k=40,
top_p=0.95,
min_p=0.05,
stop_sequences=["Observation:"],
)
agent = ReActAgent(llm=llm, tools=tools, generation_config=config)
ConstrainedGenerationConfig¶
from inferna.agents import ConstrainedGenerationConfig
config = ConstrainedGenerationConfig(
temperature=0.7,
max_tokens=512,
top_k=40,
top_p=0.95,
min_p=0.05,
)
agent = ConstrainedAgent(llm=llm, tools=tools, generation_config=config)
Best Practices¶
1. Choose the Right Agent¶
| Use Case | Recommended Agent |
|---|---|
| General-purpose tasks | ReActAgent |
| Smaller models | ConstrainedAgent |
| Critical applications | ContractAgent |
| Debugging/explainability | ReActAgent (verbose) |
| High reliability required | ConstrainedAgent + ContractAgent |
2. Tool Design¶
# Good: Clear description, typed parameters, docstring
@tool
def search_database(query: str, limit: int = 10) -> str:
"""
Search the database for records matching query.
Args:
query: Search term
limit: Maximum results to return
"""
return db.search(query, limit)
# Bad: Vague description, no types
@tool
def search(q):
return db.search(q)
3. Error Handling¶
@tool
def risky_operation(data: str) -> str:
"""Perform operation that might fail."""
try:
result = process(data)
return f"Success: {result}"
except ValueError as e:
return f"Error: Invalid data - {e}"
except Exception as e:
return f"Error: {e}"
4. Loop Prevention¶
Configure loop detection to prevent infinite loops:
agent = ReActAgent(
llm=llm,
tools=tools,
detect_loops=True,
max_consecutive_same_action=2, # Same exact action
max_consecutive_same_tool=4, # Same tool with any args
max_iterations=10, # Hard limit
)
5. Context Management¶
Prevent context overflow with truncation:
6. Contracts for Safety¶
Use contracts for safety-critical tools:
@tool
@pre(lambda args: 0 <= args['amount'] <= 1000, "amount must be 0-1000")
@pre(lambda args: args['account_id'].isalnum(), "invalid account ID")
@post(lambda r: r.startswith("TX"), "must return transaction ID")
def transfer_funds(account_id: str, amount: float) -> str:
"""Transfer funds to account."""
return banking_api.transfer(account_id, amount)
Async Agents¶
For non-blocking agent execution in async applications, use the async agent wrappers.
AsyncReActAgent¶
Async wrapper for ReActAgent:
import asyncio
from inferna.agents import AsyncReActAgent, tool
@tool
def search(query: str) -> str:
"""Search for information."""
return f"Results for: {query}"
async def main():
async with AsyncReActAgent(
"models/Llama-3.2-1B-Instruct-Q8_0.gguf",
tools=[search],
max_iterations=5
) as agent:
# Async run
result = await agent.run("Search for Python tutorials")
print(result.answer)
# Async streaming
async for event in agent.stream("Find information about AI"):
print(f"{event.type.value}: {event.content}")
asyncio.run(main())
AsyncConstrainedAgent¶
Async wrapper for ConstrainedAgent:
from inferna.agents import AsyncConstrainedAgent, tool
@tool
def calculate(expression: str) -> str:
"""Evaluate a math expression."""
return str(eval(expression))
async def main():
async with AsyncConstrainedAgent(
"models/Llama-3.2-1B-Instruct-Q8_0.gguf",
tools=[calculate]
) as agent:
result = await agent.run("What is 100 / 4?")
print(result.answer)
asyncio.run(main())
run_agent_async()¶
Helper function to run any synchronous agent asynchronously:
from inferna import LLM
from inferna.agents import ReActAgent, run_agent_async, tool
@tool
def greet(name: str) -> str:
return f"Hello, {name}!"
# Create sync agent
llm = LLM("models/Llama-3.2-1B-Instruct-Q8_0.gguf")
agent = ReActAgent(llm=llm, tools=[greet])
# Run asynchronously
async def main():
result = await run_agent_async(agent, "Greet Alice")
print(result.answer)
asyncio.run(main())
Async Agent Parameters¶
Both AsyncReActAgent and AsyncConstrainedAgent accept the same parameters as their synchronous counterparts:
| Parameter | Type | Default | Description |
|---|---|---|---|
model_path |
str |
required | Path to GGUF model file |
tools |
List[Tool] |
None |
Available tools |
config |
GenerationConfig |
None |
LLM configuration |
system_prompt |
str |
default | Custom system prompt |
max_iterations |
int |
10 |
Maximum iterations |
verbose |
bool |
False |
Print output |
Thread Safety¶
Async agents use an internal lock to serialize access, ensuring thread-safe operation. For true parallel execution, create multiple agent instances:
async def parallel_agents():
async with AsyncReActAgent("model.gguf", tools=tools) as agent1, \
AsyncReActAgent("model.gguf", tools=tools) as agent2:
task1 = agent1.run("Task 1")
task2 = agent2.run("Task 2")
results = await asyncio.gather(task1, task2)