Turn any LLM into its
best version.
One function call.
build_agent() returns a production-ready agent powered by the Reasoning Engine — a custom execution runtime with composable reasoning patterns, parallel tool execution, and 0.02ms overhead. Add tool comprehension, persistent memory, security guardrails, semantic caching, human approval, streaming, model failover, and sandboxed execution. Each capability is one parameter.
from promptise import build_agent
agent = await build_agent(
    model="openai:gpt-5-mini",
    servers={"crm": HTTPServerSpec(url="...")},
    memory=ChromaProvider(),
    conversation_store=PostgresStore(),
    observe=True,
    guardrails=True,
    sandbox=SandboxConfig(enabled=True),
    optimize_tools="semantic",
)

What is the Promptise Agent?
One function call.
Every production capability.
Most agent frameworks give you an LLM wrapper with a generic tool loop and leave the rest to you. Promptise gives you a custom execution runtime — the Reasoning Engine — that powers every agent with composable reasoning patterns, parallel tool execution, auto schema injection, and 0.02ms overhead. Add memory, security, caching, and observability with one parameter each.
Without Promptise
Wire up tool schemas manually. Build your own memory layer. Implement guardrails from scratch. Glue together caching, streaming, and observability. Ship in 3 months.
With build_agent()
Point at MCP servers — tools appear. Add memory=ChromaProvider() — context is injected every turn. Add guardrails=True — injection attacks are blocked. Ship today.
What you get
MCP tool discovery, 3 memory providers, semantic cache, 6-head guardrails, human approval, streaming with tool visibility, model fallback, adaptive learning, Docker sandbox, and 8-destination observability.
PromptiseAgent = LLM + tools + memory + guardrails + cache + streaming + observability
The agent request lifecycle
Every request flows through four phases. Each capability activates only when configured. Disabled features have zero overhead.
build_agent()
All 13 capabilities. One function call. Each phase activates only what you configure — zero overhead for disabled features.
Your agent understands every tool you give it.
Point your agent at your tool servers. It connects, reads every tool's schema — what it does, what parameters it needs, what it returns — and starts using them correctly. No manual definitions. No adapter code. No JSON schemas to maintain.
servers={
    "crm": HTTPServerSpec(
        url="https://crm.internal/mcp",
        auth=JWTAuth(secret="...")
    ),
    "analytics": HTTPServerSpec(
        url="https://analytics.internal/mcp"
    ),
    "email": HTTPServerSpec(
        url="https://email.internal/mcp",
        transport="sse"
    ),
}
# Automatic discovery — no manual tool definitions
# Add/remove servers → tool set updates automatically
# Schema changes picked up on next connection

Cut your token costs by up to 70%
Two layers of intelligent optimization work together: semantic tool selection reduces what gets sent to the LLM, and semantic caching skips the LLM entirely for repeated queries.
Semantic Tool Selection
Send only relevant tools
An agent with 40 tools sends all 40 descriptions on every message — thousands of wasted tokens. Semantic selection matches the user's intent against tool capabilities using local embeddings and sends only the 5-8 relevant tools.
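The ranking step can be sketched in a few lines. This is not Promptise's implementation — it uses a toy bag-of-words "embedding" in place of a real local embedding model — but it shows the mechanic: score every tool description against the user's intent and forward only the top matches.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # local sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query: str, tools: dict[str, str], top_k: int = 2) -> list[str]:
    # Rank tools by similarity between the query and each description,
    # then send only the top-k schemas to the LLM.
    q = embed(query)
    ranked = sorted(tools, key=lambda name: cosine(q, embed(tools[name])), reverse=True)
    return ranked[:top_k]

tools = {
    "crm_lookup": "look up customer records in the crm",
    "send_email": "send an email message to a recipient",
    "run_report": "generate an analytics report",
}
print(select_tools("find the customer record for Alice", tools))
```

With 40 tools and `top_k=8`, the other 32 schemas never hit the context window — that is where the token savings come from.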
optimize_tools="semantic"

Semantic Caching
Skip the LLM entirely
"What's my balance?" and "Show my balance" mean the same thing. Semantic caching matches incoming queries against cached responses using the same local embedding model. Above threshold? Return cached response instantly.
cache=SemanticCache(threshold=0.92)

Combined effect
Semantic tool selection reduces tokens per request. Semantic caching eliminates requests entirely. Together they deliver 40-70% fewer tokens with tool selection and 30-50% cost savings with caching. Both use the same local embedding model — no external API calls.
Your agent remembers everything.
Without memory, every conversation starts from zero. The agent rediscovers the same information, asks the same questions, makes the same mistakes. Promptise memory automatically searches for relevant context and injects it before every invocation — no explicit retrieval calls, no extra code.
All behind the same interface. Switch providers by changing one parameter.
ChromaProvider
Local vector search with persistent storage. Semantic recall across sessions.
Mem0Provider
Enterprise-grade graph + vector with entity extraction and relationship tracking.
InMemoryProvider
Fast in-memory storage for testing and development workflows.
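The "same interface" claim implies all three providers share a common shape. The `Protocol` below is a guess at what that contract might look like (method names are assumptions, not Promptise's actual API), with a naive in-memory implementation to show how store and recall fit together.

```python
import asyncio
from typing import Protocol

class MemoryProvider(Protocol):
    # Hypothetical shared interface; names are illustrative,
    # not Promptise's actual API.
    async def store(self, user_id: str, text: str) -> None: ...
    async def search(self, user_id: str, query: str, k: int = 5) -> list[str]: ...

class InMemorySketch:
    def __init__(self):
        self._items: dict[str, list[str]] = {}

    async def store(self, user_id: str, text: str) -> None:
        self._items.setdefault(user_id, []).append(text)

    async def search(self, user_id: str, query: str, k: int = 5) -> list[str]:
        # Naive keyword recall; a real provider does vector search.
        words = query.lower().split()
        hits = [t for t in self._items.get(user_id, [])
                if any(w in t.lower() for w in words)]
        return hits[:k]

async def main():
    mem = InMemorySketch()
    await mem.store("user-42", "Prefers email over phone")
    hits = await mem.search("user-42", "email preference")
    print(hits)
    return hits

hits = asyncio.run(main())
```

Because every provider satisfies the same contract, swapping `InMemorySketch` for a Chroma- or Mem0-backed provider changes nothing in the calling code.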
# Four database backends:
# PostgreSQL — distributed production
# SQLite — single-node deployment
# Redis — ephemeral caching
# In-memory — testing
response = await agent.chat(
    "What's my order status?",
    session_id="sess_a1b2c3",
    user_id="user-42",
    conversation_store=PostgresConversationStore(
        dsn="postgresql://..."
    )
)
# Multi-user session isolation
# Ownership enforcement built-in
# History loads automatically

Know exactly what your agent did.
LLMs are black boxes. You send a message, get a response, and have no idea what happened in between. Promptise turns the black box into a white box. Every LLM turn, every tool call, every token count, every retry, every error — captured automatically.
Open the interactive HTML report and see the exact sequence of decisions. When something goes wrong at 3am, you find the exact failing decision in minutes, not hours.
Security that protects your users and your reputation.
Six detection heads protect every message — automatically, locally, with zero external API calls. DeBERTa ML model for injection, 69 PII patterns, 96 credential patterns, GLiNER NER for names and addresses, Llama Guard for content safety, and custom rules for your domain.
Prompt Injection
Manipulates agents into ignoring instructions
DeBERTa ML model scores injection attempts before they reach the LLM
PII Leakage
Exposes customer data in responses
69 regex patterns: credit cards, SSNs, government IDs across 22+ countries, emails, phones, medical records
Credential Leakage
Reveals API keys through tool outputs
96 patterns across 60+ services: AWS, OpenAI, GitHub, Stripe, database connection strings, private keys
No data is sent to external services for scanning. Named entity recognition finds person names and physical addresses that pattern matching cannot. Content safety classification covers 13 harm categories via Llama Guard (local) or Azure AI Content Safety (cloud).
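The pattern-matching heads boil down to a regex sweep over every message and tool output. The two patterns below stand in for the 69 PII and 96 credential patterns described above — real coverage is far broader — but the block/redact decision works the same way.

```python
import re

# Two illustrative patterns standing in for the full pattern sets;
# real coverage spans 22+ countries and 60+ services.
PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(text: str) -> list[str]:
    # Return the name of every pattern that matched, so the caller
    # can block or redact before the text reaches the LLM or the user.
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

print(scan("my key is AKIAABCDEFGHIJKLMNOP"))
print(scan("pay with 4111 1111 1111 1111 please"))
```

Everything runs locally — a regex sweep like this never ships data to an external scanning service.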
Give your agent its own computer.
When your agent writes code — data analysis scripts, automation tasks, debugging routines — that code needs to run somewhere safe. Not in your application process. Not with access to your filesystem or credentials.
# The agent gets five tools automatically:
execute_code()       # Run scripts
read_file()          # Read files
write_file()         # Write files
list_files()         # List files
install_package()    # Install packages

# It writes a script, runs it, reads the output,
# iterates — all within the sandbox.
# Path traversal prevention ensures escape is impossible.
# Shell injection prevention keeps arguments safe.
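Path traversal prevention usually means resolving every requested path and rejecting anything that lands outside the sandbox root. The check below is a minimal sketch of that idea (the `/sandbox` mount point is hypothetical), not Promptise's actual sandbox code.

```python
from pathlib import Path

SANDBOX_ROOT = Path("/sandbox")  # hypothetical mount point inside the container

def resolve_safe(user_path: str) -> Path:
    # Resolve the path, then reject anything outside the sandbox root —
    # this catches ../ traversal and absolute-path escapes alike.
    candidate = (SANDBOX_ROOT / user_path).resolve()
    if not candidate.is_relative_to(SANDBOX_ROOT):
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return candidate

print(resolve_safe("data/output.csv"))
try:
    resolve_safe("../etc/passwd")
except PermissionError as e:
    print("blocked:", e)
```

Every file tool (`read_file`, `write_file`, `list_files`) would route through a check like this before touching the filesystem.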
none | restricted | full

Agents that ask before they act.
Autonomous agents will inevitably need to send emails, process payments, delete records, or deploy code. Without guardrails, your options are: let the agent do it unsupervised, or do not give it those tools at all.
Human-in-the-loop approval is the middle ground. Mark sensitive tools with glob patterns — send_*, delete_*, payment_*. When the agent tries to call one, execution pauses.
Fires to your webhook — Slack, PagerDuty, a custom dashboard, or an in-process queue. The agent waits for a human decision.
Tool executes
Agent adapts
Configurable
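Matching tool names against glob patterns is standard-library territory. A minimal sketch of the pause decision, using the patterns from the text:

```python
from fnmatch import fnmatch

# Glob patterns marking sensitive tools, as described above.
SENSITIVE = ["send_*", "delete_*", "payment_*"]

def needs_approval(tool_name: str) -> bool:
    # If any pattern matches, execution pauses and the approval
    # webhook fires; otherwise the tool runs immediately.
    return any(fnmatch(tool_name, pattern) for pattern in SENSITIVE)

print(needs_approval("send_email"))   # pauses for approval
print(needs_approval("crm_lookup"))   # runs immediately
```

Glob patterns keep the policy short: one `payment_*` entry covers every payment tool a server exposes, now and later.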
tool_start — with human-readable display name
tool_end — with result summary and duration
tokens — with cumulative text for reconnection
completion — with full response and tool call summary
error — with error details and context

Responses that appear in real time.
When an agent takes 10 seconds to respond, users think the app is broken. With streaming and tool visibility, they see exactly what is happening — “Searching customer database...”, “Found 3 results”, then the answer appearing token by token.
Argument redaction via guardrails ensures sensitive data never appears in the stream.
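Consuming such a stream is a single `async for` loop. The event stream below is faked and its payload fields are assumptions; only the event names mirror the list above.

```python
import asyncio
from typing import AsyncIterator

async def fake_stream() -> AsyncIterator[dict]:
    # Stand-in for the agent's event stream; event names mirror the
    # list above, payload shapes are illustrative.
    for event in (
        {"type": "tool_start", "display": "Searching customer database..."},
        {"type": "tool_end", "summary": "Found 3 results", "ms": 120},
        {"type": "tokens", "text": "Your order"},
        {"type": "tokens", "text": "Your order shipped today."},
        {"type": "completion", "text": "Your order shipped today."},
    ):
        yield event

async def render() -> str:
    final = ""
    async for ev in fake_stream():
        if ev["type"] == "tool_start":
            print(ev["display"])        # show tool activity as it happens
        elif ev["type"] == "tokens":
            final = ev["text"]          # cumulative text allows reconnection
        elif ev["type"] == "completion":
            final = ev["text"]
    return final

print(asyncio.run(render()))
```

Because `tokens` events carry cumulative text, a client that drops and reconnects mid-response can resume from the latest event instead of replaying the stream.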
Models that never go down.
Your primary LLM provider will have outages. With a fallback chain, the agent automatically tries the next model when the primary fails.
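The fallback loop itself is simple; a sketch under the assumption that a provider failure surfaces as an exception (the `call_model` stub and `ProviderDown` error are illustrative, not Promptise's API):

```python
class ProviderDown(Exception):
    pass

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real provider call; here the first model is "down".
    if model == "openai:gpt-5-mini":
        raise ProviderDown(model)
    return f"{model}: ok"

def chat_with_fallback(models: list[str], prompt: str) -> str:
    # Try each model in order. A real implementation would also wrap
    # this in a circuit breaker that skips recently failed providers.
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ProviderDown as e:
            last_error = e
    raise RuntimeError("all providers failed") from last_error

print(chat_with_fallback(["openai:gpt-5-mini", "anthropic:claude-sonnet-4"], "hi"))
```

The chain only raises when every provider in the list has failed, so a single outage never reaches the user.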
Agents that learn from their mistakes.
When a tool call fails, most agents retry with the same approach. Promptise agents classify the failure first.
Transient failures: logged, not learned. Systematic failures: trigger learning.
Synthesized advice is stored in memory, injected before future invocations, and decays over time if not reinforced.
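The classify-first step might look like the toy classifier below. The transient/systematic split and the exception mapping are assumptions for illustration, not Promptise's actual taxonomy.

```python
from enum import Enum, auto

class FailureKind(Enum):
    TRANSIENT = auto()   # logged, not learned — a retry may succeed
    SYSTEMATIC = auto()  # triggers learning — the same call will fail again

def classify(error: Exception) -> FailureKind:
    # Toy classifier: timeouts and connection drops are transient;
    # everything else (bad arguments, missing permissions) is worth
    # learning from so the agent changes its approach next time.
    if isinstance(error, (TimeoutError, ConnectionError)):
        return FailureKind.TRANSIENT
    return FailureKind.SYSTEMATIC

print(classify(TimeoutError()))
print(classify(ValueError("bad argument")))
```

Only systematic failures feed the learning loop — retrying a timeout teaches the agent nothing, but a rejected argument does.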
Change the model.
Keep everything else.
openai:gpt-5-mini
anthropic:claude-sonnet-4
google:gemini-2.0-flash
ollama:llama3

Your instructions, your tools, your memory, your guardrails, your entire configuration stays identical. Models change. Prices shift. New capabilities emerge. Your agent logic never needs a rewrite.
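Keeping the model as a single `provider:model` string is what makes the swap a one-line change. The parsing sketch below is an assumption inferred from the examples above, not Promptise's actual code.

```python
def parse_model(spec: str) -> tuple[str, str]:
    # Split "provider:model" so the rest of the configuration
    # never has to know which provider is in use.
    provider, _, model = spec.partition(":")
    if not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model

print(parse_model("openai:gpt-5-mini"))
print(parse_model("ollama:llama3"))
```

Swapping providers is then just passing a different string — no adapter code, no config migration.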
Production readiness
Ship on day one.
Not month three.
Most agent frameworks get you to a demo in a day and production in three months. Promptise gets you to production in a day because the hard parts are already built.
Tool discovery automated
Point at MCP server URLs. Tools appear. No manual wiring.
Security hardened
6-head guardrail: injection, PII, credentials, NER, toxicity, custom.
Costs bounded
Semantic optimization cuts tokens 40-70%. Budget hooks enforce limits.
Crashes recoverable
Model fallback chain. Circuit breakers. Graceful degradation.
Fully observable
8 export destinations. Per-turn traces. Token counting. Latency tracking.
Multi-user safe
CallerContext per request. Session ownership. Per-user cache isolation.
Human-in-the-loop ready
Approval policies pause on sensitive tools. Webhook + callback handlers.
Compliance auditable
HMAC audit logs. Conversation persistence. GDPR purge_user().
Failure handling
What happens when
things go wrong?
Production agents fail. The question is whether you built for it.
LLM provider goes down?
FallbackChain switches to the next provider. Circuit breaker prevents retry storms.
Token budget exceeded?
BudgetHook stops the graph. ExecutionReport records what was accomplished.
Agent stuck in a loop?
CycleDetectionHook detects repeating patterns and forces graph end.
Tool crashes mid-execution?
RETRYABLE flag retries with exponential backoff. CRITICAL flag aborts if essential.
User sends prompt injection?
DeBERTa ML model blocks it in real time. 69 PII patterns caught. Logged for audit.
Agent produces PII in output?
Output guardrails scan every response. PII redacted before it reaches the user.
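The RETRYABLE path above is classic exponential backoff. A minimal sketch (the flag names come from the text; the helper itself is illustrative, with tiny delays so it runs fast):

```python
import time

def retry_with_backoff(fn, attempts: int = 4, base: float = 0.01):
    # Retry a RETRYABLE tool call with doubling delays: 0.01s, 0.02s, 0.04s...
    # A CRITICAL tool would abort on first failure instead of retrying.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts — surface the failure
            time.sleep(base * (2 ** attempt))

calls = {"n": 0}
def flaky():
    # Simulated tool that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(retry_with_backoff(flaky))
```

The doubling delay gives a struggling downstream service room to recover instead of hammering it with immediate retries.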
Comparison
Built in.
Not bolted on.
| Capability | Promptise | LangChain | AutoGen |
|---|---|---|---|
| MCP auto-discovery | ✓ | — | — |
| Custom reasoning patterns | ✓ | — | — |
| ML-based prompt injection detection | ✓ | — | — |
| Semantic tool optimization | ✓ | — | — |
| Persistent vector memory | ✓ | ✓ | — |
| Human-in-the-loop approval | ✓ | ✓ | — |
| Model fallback chain | ✓ | — | — |
| Semantic response cache | ✓ | — | — |
| Per-user session isolation | ✓ | — | — |
| Docker sandbox execution | ✓ | — | — |
| GDPR purge_user() | ✓ | — | — |
| One function call setup | ✓ | — | — |
Build production agents.
Not prototypes.
Open source. Apache 2.0. Install it, build something, ship it.
$ pip install promptise