AgenticAssure
The pytest for AI agents. Test your LLM-powered agents with YAML scenarios, structured scoring, and CLI-first reporting. Open source. Framework-agnostic. Ship with confidence.

Features

Scorers: passfail, exact, regex, similarity
Reports: CLI (Rich), HTML, JSON
Adapters: plug in any agent (OpenAI, LangChain, custom)
Config: retries, timeouts, fail_fast, tag filtering

Getting Started
pip install agenticassure

Initialize a new project and run your first suite:

agenticassure init my-tests
cd my-tests
agenticassure run scenarios/ --adapter my_agent.Agent

How It Works
1. Define Scenarios
Write test cases in simple YAML files. Each scenario specifies an input prompt, the expected output or tool calls, and which scorers to use. No code required for the test definitions themselves.
2. Write an Adapter
Create a small wrapper class with a single run() method that connects AgenticAssure to your agent. Works with any framework — OpenAI, LangChain, or your own custom setup.
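An adapter might look like the following minimal sketch. The class name, the `run()` signature, and the dict it returns are assumptions for illustration, not the library's actual protocol:

```python
class EchoAgentAdapter:
    """Wraps a trivial 'agent' so a test runner can drive it.

    A real adapter would call your LLM or agent framework here
    (OpenAI, LangChain, a custom stack) and translate its response.
    The returned dict shape is a hypothetical example.
    """

    def run(self, prompt: str) -> dict:
        response = f"You said: {prompt}"
        return {
            "output": response,   # text for the scorers to inspect
            "tool_calls": [],     # tools the agent invoked, if any
        }
```

Because the wrapper is this small, swapping frameworks means changing only the body of `run()`, never the test suite itself.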
3. Run and Report
Execute your scenarios from the CLI or Python. AgenticAssure runs each test, scores the results, and generates structured reports in your terminal, as HTML, or as JSON.
Example Scenario
A single YAML file defines your entire test suite. Each scenario is self-contained — input, expectations, and scoring rules all in one place.
suite:
  name: customer-support-agent
  config:
    default_timeout: 30
    retries: 1
    default_scorers: ["passfail"]
  scenarios:
    - name: Basic greeting
      input: "Hello, who are you?"
      expected_output: "hello"
      tags: [basic]
    - name: Order lookup
      input: "Look up order ORD-001"
      expected_tools: [get_order]
      expected_tool_args:
        get_order:
          order_id: "ORD-001"
    - name: Return policy (semantic match)
      input: "What is your return policy?"
      expected_output: "We offer a 30-day return policy."
      scorers: [similarity]
      metadata:
        similarity_threshold: 0.8

Scorers
Every scenario is evaluated by one or more scorers. A test passes only when all its scorers pass. Mix and match them per scenario.
PassFail
The default scorer. Checks that output exists, expected tools were called with the right arguments, and the expected output appears in the response.
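The checks above can be sketched as a single function. This is an illustrative re-implementation, not the library's code; the field names (`output`, `tool_calls`, `expected_tools`, and so on) are assumptions modeled on the scenario YAML:

```python
def passfail(result: dict, expected: dict) -> bool:
    """Pass when output exists, expected text appears in it, and
    every expected tool was called with the right arguments."""
    output = result.get("output")
    if not output:
        return False
    # Expected substring must appear (case-insensitive), if given.
    want = expected.get("expected_output")
    if want and want.lower() not in output.lower():
        return False
    # Every expected tool must have been called with matching args.
    calls = {c["name"]: c.get("args", {}) for c in result.get("tool_calls", [])}
    for tool in expected.get("expected_tools", []):
        if tool not in calls:
            return False
        for key, value in expected.get("expected_tool_args", {}).get(tool, {}).items():
            if calls[tool].get(key) != value:
                return False
    return True
```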
Exact Match
Strict string comparison. The agent's output must match the expected output exactly — useful for deterministic responses.
Regex
Pattern matching against the agent's output. Define a regex in the scenario metadata to validate structure, formats, or specific content patterns.
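The core of such a check is a one-liner. In this sketch the metadata key `regex` is an assumption, not necessarily the key the library reads:

```python
import re

def regex_scorer(output: str, metadata: dict) -> bool:
    """Pass when the pattern from scenario metadata matches the output."""
    pattern = metadata["regex"]  # hypothetical metadata key
    return re.search(pattern, output) is not None
```

For example, a pattern like `ORD-\d{3}` could verify that the agent's reply actually contains a well-formed order ID.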
Similarity
Semantic comparison using cosine similarity via sentence-transformers. Set a threshold to control how close the meaning needs to be. Great for natural language responses.
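The thresholding logic behind this scorer can be sketched as follows. In the real scorer the vectors would be sentence-transformers embeddings of the agent output and the expected output; here plain lists of floats stand in for them:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_scorer(vec_output, vec_expected, threshold: float = 0.8) -> bool:
    # Pass when the embeddings are close enough in meaning,
    # as controlled by the scenario's similarity_threshold.
    return cosine_similarity(vec_output, vec_expected) >= threshold
```

Raising the threshold demands a closer paraphrase; lowering it tolerates looser rewordings, which is usually what you want for free-form natural language answers.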