Turn any LLM into its
best version.
One function call.
build_agent() returns a production-ready agent powered by the Reasoning Engine — a custom execution runtime with composable reasoning patterns, parallel tool execution, and 0.02ms overhead. Add tool comprehension, persistent memory, security guardrails, semantic caching, human approval, streaming, model failover, and sandboxed execution. Each capability is one parameter.
from promptise import build_agent
agent = await build_agent(
    model="openai:gpt-5-mini",
    servers={"crm": HTTPServerSpec(url="...")},
    memory=ChromaProvider(),
    conversation_store=PostgresStore(),
    observe=True,
    guardrails=True,
    sandbox=SandboxConfig(enabled=True),
    optimize_tools="semantic",
)

What is the Promptise Agent?
One function call.
Every production capability.
Most agent frameworks give you an LLM wrapper with a generic tool loop and leave the rest to you. Promptise gives you a custom execution runtime — the Reasoning Engine — that powers every agent with composable reasoning patterns, parallel tool execution, auto schema injection, and 0.02ms overhead. Add memory, security, caching, and observability with one parameter each.
Without Promptise
Wire up tool schemas manually. Build your own memory layer. Implement guardrails from scratch. Glue together caching, streaming, and observability. Ship in 3 months.
With build_agent()
Point at MCP servers — tools appear. Add memory=ChromaProvider() — context is injected every turn. Add guardrails=True — injection attacks are blocked. Ship today.
What you get
MCP tool discovery, 3 memory providers, semantic cache, 6-head guardrails, human approval, streaming with tool visibility, model fallback, adaptive learning, Docker sandbox, and 8-destination observability.
PromptiseAgent = LLM + tools + memory + guardrails + cache + streaming + observability
The agent request lifecycle
Every request flows through four phases. Each capability activates only when configured. Disabled features have zero overhead.
build_agent()
All 13 capabilities. One function call. Each phase activates only what you configure — zero overhead for disabled features.
Your agent understands every tool you give it.
Point your agent at your tool servers. It connects, reads every tool's schema — what it does, what parameters it needs, what it returns — and starts using them correctly. No manual definitions. No adapter code. No JSON schemas to maintain.
servers={
    "crm": HTTPServerSpec(
        url="https://crm.internal/mcp",
        auth=JWTAuth(secret="...")
    ),
    "analytics": HTTPServerSpec(
        url="https://analytics.internal/mcp"
    ),
    "email": HTTPServerSpec(
        url="https://email.internal/mcp",
        transport="sse"
    ),
}
# Automatic discovery — no manual tool definitions
# Add/remove servers → tool set updates automatically
# Schema changes picked up on next connection

Cut your token costs by up to 70%
Two layers of intelligent optimization work together: semantic tool selection reduces what gets sent to the LLM, and semantic caching skips the LLM entirely for repeated queries.
Semantic Tool Selection
Send only relevant tools
An agent with 40 tools sends all 40 descriptions on every message — thousands of wasted tokens. Semantic selection matches the user's intent against tool capabilities using local embeddings and sends only the 5-8 relevant tools.
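The ranking step can be sketched in a few lines. This is not Promptise's implementation — it uses a toy bag-of-words "embedding" in place of a real local embedding model — but it shows the mechanic: score every tool description against the user's intent and forward only the top matches.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # local sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tools: dict[str, str], top_k: int = 2) -> list[str]:
    # Rank tools by similarity between the query and each description,
    # then send only the top-k schemas to the LLM.
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:top_k]

tools = {
    "crm_lookup": "look up customer records in the crm",
    "send_email": "send an email message to a recipient",
    "run_report": "generate an analytics report",
}
print(select_tools("find the customer record for Alice", tools))
```

With 40 tools and `top_k=8`, the other 32 schemas never hit the context window — that is where the token savings come from.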
optimize_tools="semantic"

Semantic Caching
Skip the LLM entirely
"What's my balance?" and "Show my balance" mean the same thing. Semantic caching matches incoming queries against cached responses using the same local embedding model. Above threshold? Return cached response instantly.
cache=SemanticCache(threshold=0.92)

Combined effect
Semantic tool selection reduces tokens per request. Semantic caching eliminates requests entirely. Together they deliver 40-70% fewer tokens with tool selection and 30-50% cost savings with caching. Both use the same local embedding model — no external API calls.
Your agent remembers everything.
Without memory, every conversation starts from zero. The agent rediscovers the same information, asks the same questions, makes the same mistakes. Promptise memory automatically searches for relevant context and injects it before every invocation — no explicit retrieval calls, no extra code.
All behind the same interface. Switch providers by changing one parameter.
ChromaProvider
Local vector search with persistent storage. Semantic recall across sessions.
Mem0Provider
Enterprise-grade graph + vector with entity extraction and relationship tracking.
InMemoryProvider
Fast in-memory storage for testing and development workflows.
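The "same interface" claim implies all three providers share a common shape. The `Protocol` below is a guess at what that contract might look like (method names are assumptions, not Promptise's actual API), with a naive in-memory implementation to show how store and recall fit together.

```python
import asyncio
from typing import Protocol

class MemoryProvider(Protocol):
    # Hypothetical shared interface; names are illustrative,
    # not Promptise's actual API.
    async def store(self, user_id: str, text: str) -> None: ...
    async def search(self, user_id: str, query: str, k: int = 5) -> list[str]: ...

class InMemorySketch:
    def __init__(self):
        self._items: dict[str, list[str]] = {}

    async def store(self, user_id: str, text: str) -> None:
        self._items.setdefault(user_id, []).append(text)

    async def search(self, user_id: str, query: str, k: int = 5) -> list[str]:
        # Naive keyword recall; a real provider does vector search.
        words = query.lower().split()
        hits = [t for t in self._items.get(user_id, [])
                if any(w in t.lower() for w in words)]
        return hits[:k]

async def main():
    mem = InMemorySketch()
    await mem.store("user-42", "Prefers email over phone")
    hits = await mem.search("user-42", "email preference")
    print(hits)
    return hits

hits = asyncio.run(main())
```

Because every provider satisfies the same contract, swapping `InMemorySketch` for a Chroma- or Mem0-backed provider changes nothing in the calling code.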
# Four database backends:
# PostgreSQL — distributed production
# SQLite — single-node deployment
# Redis — ephemeral caching
# In-memory — testing
response = await agent.chat(
    "What's my order status?",
    session_id="sess_a1b2c3",
    user_id="user-42",
    conversation_store=PostgresConversationStore(
        dsn="postgresql://..."
    )
)
# Multi-user session isolation
# Ownership enforcement built-in
# History loads automatically

Know exactly what your agent did.
LLMs are black boxes. You send a message, get a response, and have no idea what happened in between. Promptise turns the black box into a white box. Every LLM turn, every tool call, every token count, every retry, every error — captured automatically.
Open the interactive HTML report and see the exact sequence of decisions. When something goes wrong at 3am, you find the exact failing decision in minutes, not hours.
Security that protects your users and your reputation.
Six detection heads protect every message — automatically, locally, with zero external API calls. DeBERTa ML model for injection, 69 PII patterns, 96 credential patterns, GLiNER NER for names and addresses, Llama Guard for content safety, and custom rules for your domain.
Prompt Injection
Manipulates agents into ignoring instructions
DeBERTa ML model scores injection attempts before they reach the LLM
PII Leakage
Exposes customer data in responses
69 regex patterns: credit cards, SSNs, government IDs across 22+ countries, emails, phones, medical records
Credential Leakage
Reveals API keys through tool outputs
96 patterns across 60+ services: AWS, OpenAI, GitHub, Stripe, database connection strings, private keys
No data is sent to external services for scanning. Named entity recognition finds person names and physical addresses that pattern matching cannot. Content safety classification covers 13 harm categories via Llama Guard (local) or Azure AI Content Safety (cloud).
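The pattern-matching heads boil down to a regex sweep over every message and tool output. The two patterns below stand in for the 69 PII and 96 credential patterns described above — real coverage is far broader — but the block/redact decision works the same way.

```python
import re

# Two illustrative patterns standing in for the full pattern sets;
# real coverage spans 22+ countries and 60+ services.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text: str) -> list[str]:
    # Return the name of every pattern that matched, so the caller
    # can block or redact before the text reaches the LLM or the user.
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

print(scan("my key is AKIAABCDEFGHIJKLMNOP"))
print(scan("pay with 4111 1111 1111 1111 please"))
```

Everything runs locally — a regex sweep like this never ships data to an external scanning service.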
Give your agent its own computer.
When your agent writes code — data analysis scripts, automation tasks, debugging routines — that code needs to run somewhere safe. Not in your application process. Not with access to your filesystem or credentials.
# The agent gets five tools automatically:
execute_code()       # Run scripts
read_file()          # Read files
write_file()         # Write files
list_files()         # List files
install_package()    # Install packages

# It writes a script, runs it, reads the output,
# iterates — all within the sandbox.
# Path traversal prevention ensures escape is impossible.
# Shell injection prevention keeps arguments safe.
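Path traversal prevention usually means resolving every requested path and rejecting anything that lands outside the sandbox root. The check below is a minimal sketch of that idea (the `/sandbox` mount point is hypothetical), not Promptise's actual sandbox code.

```python
from pathlib import Path

SANDBOX_ROOT = Path("/sandbox")  # hypothetical mount point inside the container

def resolve_safe(user_path: str) -> Path:
    # Resolve the path, then reject anything outside the sandbox root —
    # this catches ../ traversal and absolute-path escapes alike.
    candidate = (SANDBOX_ROOT / user_path).resolve()
    if not candidate.is_relative_to(SANDBOX_ROOT):
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return candidate

print(resolve_safe("data/output.csv"))
try:
    resolve_safe("../etc/passwd")
except PermissionError as e:
    print("blocked:", e)
```

Every file tool (`read_file`, `write_file`, `list_files`) would route through a check like this before touching the filesystem.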
none | restricted | full

Agents that ask before they act.
Autonomous agents will inevitably need to send emails, process payments, delete records, or deploy code. Without guardrails, your options are: let the agent do it unsupervised, or do not give it those tools at all.
Human-in-the-loop approval is the middle ground. Mark sensitive tools with glob patterns — send_*, delete_*, payment_*. When the agent tries to call one, execution pauses.
Fires to your webhook — Slack, PagerDuty, a custom dashboard, or an in-process queue. The agent waits for a human decision.
Tool executes
Agent adapts
Configurable
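Matching tool names against glob patterns is standard-library territory. A minimal sketch of the pause decision, using the patterns from the text:

```python
from fnmatch import fnmatch

# Glob patterns marking sensitive tools, as described above.
SENSITIVE = ["send_*", "delete_*", "payment_*"]

def needs_approval(tool_name: str) -> bool:
    # If any pattern matches, execution pauses and the approval
    # webhook fires; otherwise the tool runs immediately.
    return any(fnmatch(tool_name, pattern) for pattern in SENSITIVE)

print(needs_approval("send_email"))   # pauses for approval
print(needs_approval("crm_lookup"))   # runs immediately
```

Glob patterns keep the policy short: one `payment_*` entry covers every payment tool a server exposes, now and later.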
tool_start — with human-readable display name
tool_end — with result summary and duration
tokens — with cumulative text for reconnection
completion — with full response and tool call summary
error — with error details and context

Responses that appear in real time.
When an agent takes 10 seconds to respond, users think the app is broken. With streaming and tool visibility, they see exactly what is happening — “Searching customer database...”, “Found 3 results”, then the answer appearing token by token.
Argument redaction via guardrails ensures sensitive data never appears in the stream.
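Consuming such a stream is a single `async for` loop. The event stream below is faked and its payload fields are assumptions; only the event names mirror the list above.

```python
import asyncio
from typing import AsyncIterator

async def fake_stream() -> AsyncIterator[dict]:
    # Stand-in for the agent's event stream; event names mirror the
    # list above, payload shapes are illustrative.
    for event in (
        {"type": "tool_start", "display": "Searching customer database..."},
        {"type": "tool_end", "summary": "Found 3 results", "ms": 120},
        {"type": "tokens", "text": "Your order"},
        {"type": "tokens", "text": "Your order shipped today."},
        {"type": "completion", "text": "Your order shipped today."},
    ):
        yield event

async def render() -> str:
    final = ""
    async for ev in fake_stream():
        if ev["type"] == "tool_start":
            print(ev["display"])        # show tool activity as it happens
        elif ev["type"] == "tokens":
            final = ev["text"]          # cumulative text allows reconnection
        elif ev["type"] == "completion":
            final = ev["text"]
    return final

print(asyncio.run(render()))
```

Because `tokens` events carry cumulative text, a client that drops and reconnects mid-response can resume from the latest event instead of replaying the stream.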
Models that never go down.
Your primary LLM provider will have outages. With a fallback chain, the agent automatically tries the next model when the primary fails.
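The fallback loop itself is simple; a sketch under the assumption that a provider failure surfaces as an exception (the `call_model` stub and `ProviderDown` error are illustrative, not Promptise's API):

```python
class ProviderDown(Exception):
    pass

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; here the first model is "down".
    if model == "openai:gpt-5-mini":
        raise ProviderDown(model)
    return f"{model}: ok"

def chat_with_fallback(models: list[str], prompt: str) -> str:
    # Try each model in order. A real implementation would also wrap
    # this in a circuit breaker that skips recently failed providers.
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ProviderDown as e:
            last_error = e
    raise RuntimeError("all providers failed") from last_error

print(chat_with_fallback(["openai:gpt-5-mini", "anthropic:claude-sonnet-4"], "hi"))
```

The chain only raises when every provider in the list has failed, so a single outage never reaches the user.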
Agents that learn from their mistakes.
When a tool call fails, most agents retry with the same approach. Promptise agents classify the failure first.
Transient failures: logged, not learned. Systematic failures: trigger learning.
Synthesized advice is stored in memory, injected before future invocations, and decays over time if not reinforced.
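The classify-first step might look like the toy classifier below. The transient/systematic split and the exception mapping are assumptions for illustration, not Promptise's actual taxonomy.

```python
from enum import Enum, auto

class FailureKind(Enum):
    TRANSIENT = auto()   # logged, not learned — a retry may succeed
    SYSTEMATIC = auto()  # triggers learning — the same call will fail again

def classify(error: Exception) -> FailureKind:
    # Toy classifier: timeouts and connection drops are transient;
    # everything else (bad arguments, missing permissions) is worth
    # learning from so the agent changes its approach next time.
    if isinstance(error, (TimeoutError, ConnectionError)):
        return FailureKind.TRANSIENT
    return FailureKind.SYSTEMATIC

print(classify(TimeoutError()))
print(classify(ValueError("bad argument")))
```

Only systematic failures feed the learning loop — retrying a timeout teaches the agent nothing, but a rejected argument does.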
Change the model.
Keep everything else.
openai:gpt-5-mini
anthropic:claude-sonnet-4
google:gemini-2.0-flash
ollama:llama3

Your instructions, your tools, your memory, your guardrails, your entire configuration stays identical. Models change. Prices shift. New capabilities emerge. Your agent logic never needs a rewrite.
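Keeping the model as a single `provider:model` string is what makes the swap a one-line change. The parsing sketch below is an assumption inferred from the examples above, not Promptise's actual code.

```python
def parse_model(spec: str) -> tuple[str, str]:
    # Split "provider:model" so the rest of the configuration
    # never has to know which provider is in use.
    provider, _, model = spec.partition(":")
    if not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model

print(parse_model("openai:gpt-5-mini"))
print(parse_model("ollama:llama3"))
```

Swapping providers is then just passing a different string — no adapter code, no config migration.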
Production readiness
Ship on day one.
Not month three.
Most agent frameworks get you to a demo in a day and production in three months. Promptise gets you to production in a day because the hard parts are already built.
Tool discovery automated
Point at MCP server URLs. Tools appear. No manual wiring.
Security hardened
6-head guardrail: injection, PII, credentials, NER, toxicity, custom.
Costs bounded
Semantic optimization cuts tokens 40-70%. Budget hooks enforce limits.
Crashes recoverable
Model fallback chain. Circuit breakers. Graceful degradation.
Fully observable
8 export destinations. Per-turn traces. Token counting. Latency tracking.
Multi-user safe
CallerContext per request. Session ownership. Per-user cache isolation.
Human-in-the-loop ready
Approval policies pause on sensitive tools. Webhook + callback handlers.
Compliance auditable
HMAC audit logs. Conversation persistence. GDPR purge_user().
Failure handling
What happens when
things go wrong?
Production agents fail. The question is whether you built for it.
LLM provider goes down?
FallbackChain switches to the next provider. Circuit breaker prevents retry storms.
Token budget exceeded?
BudgetHook stops the graph. ExecutionReport records what was accomplished.
Agent stuck in a loop?
CycleDetectionHook detects repeating patterns and forces graph end.
Tool crashes mid-execution?
RETRYABLE flag retries with exponential backoff. CRITICAL flag aborts if essential.
User sends prompt injection?
DeBERTa ML model blocks it in real time. 69 PII patterns caught. Logged for audit.
Agent produces PII in output?
Output guardrails scan every response. PII redacted before it reaches the user.
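The RETRYABLE path above is classic exponential backoff. A minimal sketch (the flag names come from the text; the helper itself is illustrative, with tiny delays so it runs fast):

```python
import time

def retry_with_backoff(fn, attempts: int = 4, base: float = 0.01):
    # Retry a RETRYABLE tool call with doubling delays: 0.01s, 0.02s, 0.04s...
    # A CRITICAL tool would abort on first failure instead of retrying.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts — surface the failure
            time.sleep(base * (2 ** attempt))

calls = {"n": 0}
def flaky():
    # Simulated tool that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky))
```

The doubling delay gives a struggling downstream service room to recover instead of hammering it with immediate retries.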
Comparison
Built in.
Not bolted on.
| Capability | Promptise | LangChain | AutoGen |
|---|---|---|---|
| MCP auto-discovery | ✓ | — | — |
| Custom reasoning patterns | ✓ | — | — |
| ML-based prompt injection detection | ✓ | — | — |
| Semantic tool optimization | ✓ | — | — |
| Persistent vector memory | ✓ | ✓ | — |
| Human-in-the-loop approval | ✓ | ✓ | — |
| Model fallback chain | ✓ | — | — |
| Semantic response cache | ✓ | — | — |
| Per-user session isolation | ✓ | — | — |
| Docker sandbox execution | ✓ | — | — |
| GDPR purge_user() | ✓ | — | — |
| One function call setup | ✓ | — | — |
Build production agents.
Not prototypes.
Open source. Apache 2.0. Install it, build something, ship it.
$ pip install promptise