


Planning API Calls in Prompts

Learn how to prompt models to plan, sequence, and safely execute API calls. Master the Plan → Validate → Execute → Verify loop with tool selection, schema checks, retries, safety gates, and post-execution verification for reliable workflows.

September 6, 2025
75 min read
Promptise Team
Intermediate
Prompt Engineering, Tool Use, Agents, Production Practices, Context Engineering, Observability, Safety, Idempotency, Error Handling, Orchestration, Reliability

Promise: After this guide you’ll be able to prompt a model to plan, sequence, and safely execute API calls—from a single function call to a resilient, multi-step workflow with error handling, budgets, and confirmation gates. You’ll learn the mental model, practical templates, and the small set of guardrails that make tool-using LLMs reliable in production.


Why this matters (and what we mean by “planning”)

Most tool-use prompts jump straight from “what the user asked” to “call the tool.” That works for one-off lookups, then crumbles when you need argument validation, ordering across multiple APIs, retries, rate limits, and user confirmation. “Planning API calls” means getting the model to first produce an explicit plan artifact—a compact, machine-checkable description of which calls to make, in what order, with what preconditions and exit tests—before anything is executed. You then run that plan with a thin, deterministic executor.

A few terms in plain language:

  • Tool / Function / API: Anything you expose to the model to call; usually defined by a name and JSON schema for arguments.

  • Planner: A prompt that asks the model to produce a structured plan (not to execute yet).

  • Executor: Your deterministic code that reads the plan and actually calls the APIs, handling retries, timeouts, and logs.

  • Call contracts: The rules around an API call (required fields, idempotency keys, acceptable ranges, retryable errors).

  • Guards: Pre-commit checks (validate arguments), post-commit checks (validate results), and user confirmation for sensitive ops.

💡 Insight: Separate thinking (LLM) from doing (your executor). When the model has to juggle both at once, reliability drops.


The core move: Plan → Validate → Execute → Verify

Here’s the mental model you’ll reuse:

  1. Plan: Ask the model for a structured plan of calls with dependencies, constraints, and stop conditions.

  2. Validate: Check the plan against schemas, budgets, and safety rules; auto-rewrite if it fails.

  3. Execute: Convert steps into real API calls. Apply idempotency keys, retries, timeouts, and rate limits.

  4. Verify: Prompt the model to verify the results against the user goal; if gaps remain, iterate with a new (smaller) plan.

This keeps the LLM’s creativity where it helps (planning) and your code’s determinism where it matters (execution).
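The loop can be sketched as a thin Python driver. Here, `call_planner`, `validate`, `execute`, and `call_verifier` are hypothetical stand-ins for your own LLM calls and deterministic executor code:

```python
def run_goal(goal, call_planner, validate, execute, call_verifier, max_rounds=3):
    """Drive the Plan -> Validate -> Execute -> Verify loop, re-planning on gaps."""
    for _ in range(max_rounds):
        plan = call_planner(goal)             # 1. Plan (LLM produces a structured plan)
        ok, errors = validate(plan)           # 2. Validate (deterministic checks)
        if not ok:
            goal = {"goal": goal, "fix": errors}  # feed errors back for a repaired plan
            continue
        trace = execute(plan)                 # 3. Execute (deterministic executor)
        verdict = call_verifier(goal, trace)  # 4. Verify (LLM judges against the goal)
        if verdict.get("met"):
            return trace
        goal = verdict.get("follow_up_goal", goal)
    raise RuntimeError("goal not met within budget")
```

The point of the shape is testability: each of the four callables can be mocked independently, so you can regression-test the loop without touching a model or a real API.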


A compact demonstration (single tool + one dependency)

Goal: “Add a calendar event for my outdoor run at 7am tomorrow, but only if the weather is clear.”

Tools available:

  • get_weather({ city, date }) -> { condition: "clear|rain|snow|cloudy", high_c, low_c }

  • create_event({ title, start_iso, end_iso, location }) -> { event_id }

Planner prompt (one-shot): Use this when you want the model to plan the right calls in the right order.

What it does: Requests a structured plan (no execution) with preconditions and stop criteria.

You are a planner. Given the user goal and available tools, produce a JSON plan to achieve the goal.

USER_GOAL:
"Add a calendar event for my outdoor run at 7am tomorrow in Zurich if weather is clear."

TOOLS (names & arg schemas):
1) get_weather { city: string, date: ISO8601 date }
2) create_event { title: string, start_iso: ISO8601 datetime, end_iso: ISO8601 datetime, location: string }

CONSTRAINTS:
- Do not execute, only plan.
- Include `preconditions`, `steps`, and a `stop_when` clause.
- Steps must set precise arguments, and reference outputs from prior steps via $.stepN.field.
- If weather is not "clear", stop with reason "Weather not suitable".
- City defaults to "Zurich" if not provided; timezone Europe/Zurich.
- Event duration is 1 hour.

OUTPUT FORMAT (JSON only):
{
  "preconditions": [...],
  "steps": [
    {"id": "s1", "call": "get_weather", "args": {...}},
    {"id": "s2", "if": "$.s1.condition == 'clear'", "call": "create_event", "args": {...}}
  ],
  "stop_when": "($.s1.condition != 'clear')"
}

Expected plan (abridged):

{
  "preconditions": ["timezone=Europe/Zurich"],
  "steps": [
    {"id": "s1", "call": "get_weather", "args": {"city": "Zurich", "date": "2025-09-08"}},
    {"id": "s2", "if": "$.s1.condition == 'clear'",
     "call": "create_event",
     "args": {
       "title": "Outdoor run",
       "start_iso": "2025-09-08T07:00:00+02:00",
       "end_iso": "2025-09-08T08:00:00+02:00",
       "location": "Zurich"
     }
    }
  ],
  "stop_when": "$.s1.condition != 'clear'"
}

Your executor applies schemas, runs s1, inspects its output, and either runs s2 or stops with a clear reason. You can now add retries, budgets, and logging without changing the planner prompt.
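A toy executor for this kind of plan might look like the sketch below. It supports only simple `$.sX.field == 'value'` conditions; a real executor would use a proper, sandboxed expression evaluator:

```python
import re

def resolve(ref, outputs):
    """Resolve a $.stepN.field reference against prior step outputs."""
    step, field = re.fullmatch(r"\$\.(\w+)\.(\w+)", ref).groups()
    return outputs[step][field]

def run_plan(plan, tools):
    """Deterministically execute a plan like the weather example above."""
    outputs = {}
    for step in plan["steps"]:
        cond = step.get("if")
        if cond:
            # Only handles `$.sX.field == 'value'` equality checks (a sketch).
            ref, expected = re.fullmatch(r"(\S+) == '([^']*)'", cond).groups()
            if resolve(ref, outputs) != expected:
                return {"status": "stopped",
                        "reason": f"condition failed: {cond}",
                        "outputs": outputs}
        outputs[step["id"]] = tools[step["call"]](**step["args"])
    return {"status": "done", "outputs": outputs}
```

`tools` is just a dict mapping tool names to callables, which makes swapping in mocks for offline evaluation trivial.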


Deepening the technique

1) Tool selection policies (when to call at all)

Models often over-call. Teach restraint with opt-in policies:

  • Confidence gating: Ask the model to include a why_call justification per step; reject steps with vague or circular reasoning.

  • Budgeting: Provide a tokens/$ budget and cost per call; the planner must include estimated_cost and why_worth_it.

  • Freshness rules: “Only call if data must be <24h old. Otherwise use prior cached value cache_key.”

Prompt insert (one line): “Include for each step: why_call, estimated_latency_s, cache_key (or null). If cached, skip.”

⚠️ Pitfall: “Check the weather” every time. Fix: Give a cache key formula (city|date) and TTL, then require the planner to test cache before calling.
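The cache-key-plus-TTL fix can also be enforced executor-side as a read-through cache. In this sketch, `key_fn` is whatever formula you gave the planner (e.g. `city|date`):

```python
import time

def cached_call(tool, args, cache, ttl_s, key_fn):
    """Read-through cache: test the cache key before making a real call."""
    key = key_fn(args)                      # e.g. f"{city}|{date}"
    hit = cache.get(key)
    if hit and time.time() - hit["at"] < ttl_s:
        return hit["value"]                 # still fresh: skip the call entirely
    value = tool(**args)
    cache[key] = {"value": value, "at": time.time()}
    return value
```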


2) Argument planning and validation

Most production issues are boring: missing fields, wrong formats, timezone drift.

  • Schema-first: Paste concise JSON Schemas into the system prompt. The model should conform to enums and formats.

  • Normalization: Add rules: “Normalize phone numbers to E.164,” “All timestamps ISO8601 with timezone Europe/Zurich.”

  • Defaults and fallbacks: Provide deterministic defaults and explicitly state when to ask the user to clarify.

Prompt insert: “For each arg, show source: user, default, or derived_from ($.stepX.field). If user is ambiguous, add a needs_clarification note instead of guessing.”

💡 Insight: Asking for clarification in the plan (not mid-execution) reduces half-baked calls.
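A deterministic argument check catches these boring failures before any call goes out. The following is a tiny hand-rolled validator for illustration only; in practice you would likely use a full JSON Schema library such as `jsonschema`:

```python
def validate_args(args, schema):
    """Check required keys, types, and enums against a minimal schema dict."""
    errors = []
    for name, spec in schema.items():
        if name not in args:
            errors.append(f"missing arg: {name}")
            continue
        value = args[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"{name}: expected {spec['type'].__name__}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: not in {spec['enum']}")
    return errors
```

Run this over every step's args before execution; a non-empty error list routes the plan back to the auto-fixer instead of hitting the network.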


3) Sequencing and dependencies

Express dependencies explicitly so your executor can short-circuit.

  • Partial order: if expressions referencing prior step outputs ($.stepN.field).

  • Join patterns: Allow parallel steps when independent, then a join step that waits for both.

  • Stop conditions: stop_when plus per-step abort_if (e.g., http 4xx).

Prompt insert: “Where steps are independent, mark parallel_group: "A". The executor will run members of group A concurrently.”
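An executor can honor `parallel_group` with a thread pool. This sketch assumes members of a group appear consecutively in the plan and have no dependencies on each other:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

def run_groups(steps, tools):
    """Run steps in plan order; same-group members execute concurrently."""
    outputs = {}
    for group, members in groupby(steps, key=lambda s: s.get("parallel_group")):
        members = list(members)
        if group is None:                      # ungrouped steps stay sequential
            for s in members:
                outputs[s["id"]] = tools[s["call"]](**s["args"])
        else:                                  # concurrent group
            with ThreadPoolExecutor() as pool:
                futures = {s["id"]: pool.submit(tools[s["call"]], **s["args"])
                           for s in members}
                outputs.update({sid: f.result() for sid, f in futures.items()})
    return outputs
```

`f.result()` re-raises any exception from a worker, so a failing member surfaces in the executor rather than being silently dropped.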


4) Error handling, retries, and idempotency

Don’t make the LLM reason about transient vs. permanent errors at execution time—give it a retry policy table it must reference in the plan.

  • Retryable vs. fatal: Map HTTP codes or error classes to behaviors (retry_exponential, ask_user, abort).

  • Idempotency keys: Require a request_id that’s stable per intent (hash(title+start_iso+location)).

  • Backoff & jitter: The plan declares a policy; the executor implements it deterministically.

  • Post-commit checks: E.g., after creating an order, fetch by id and verify fields match.

Prompt insert: “Include a retry_policy object with on_errors, max_attempts, and backoff for each mutating call.”
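On the executor side, the declared policy might be enforced like this sketch. `Transient` is a stand-in for whatever error class your HTTP client raises on 429/5xx responses:

```python
import hashlib
import random
import time

def idempotency_key(*parts):
    """Stable key per intent, e.g. hash(title + start_iso + location)."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]

class Transient(Exception):
    """Stand-in for a retryable failure (e.g. HTTP 429 or 5xx)."""

def call_with_retry(fn, args, policy, sleep=time.sleep):
    """Exponential backoff with jitter; the plan declares, this code enforces."""
    for attempt in range(policy["max_attempts"]):
        try:
            return fn(**args)
        except Transient:
            if attempt == policy["max_attempts"] - 1:
                raise                               # exhausted: surface the error
            sleep((2 ** attempt) + random.random())  # backoff + jitter
```

Because the key is a pure function of the intent, a retried mutation carries the same `idempotency_key` and the server can deduplicate it.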


5) Safety and secrets

Never let the LLM emit secrets; keep tokens server-side.

  • Redaction: Instruct: “Never include API keys or Authorization headers—executor handles them.”

  • Allowlist: Provide an explicit list of callable tools; refuse anything outside.

  • Sensitive ops: Require a confirmation_text and a requires_user_confirm: true flag for destructive steps.

⚠️ Pitfall: The model invents an endpoint or action (“delete all”). Fix: Add “If a required tool is missing, produce a tool_gap section with the exact capability needed; do not attempt with a surrogate.”
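An allowlist gate is a few lines of deterministic code run before execution. This sketch assumes the plan carries per-step `is_mutating` and `requires_user_confirm` flags, matching the fields used in this guide:

```python
def check_allowlist(plan, allowed_tools):
    """Refuse steps outside the allowlist; report a tool_gap, never a surrogate."""
    gaps = [s["call"] for s in plan["steps"] if s["call"] not in allowed_tools]
    if gaps:
        return {"ok": False, "tool_gap": {"needed_capability": gaps}}
    risky = [s for s in plan["steps"]
             if s.get("is_mutating") and not s.get("requires_user_confirm")]
    if risky:
        return {"ok": False,
                "errors": [f"unconfirmed mutation: {s['id']}" for s in risky]}
    return {"ok": True}
```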


6) Observability and evaluation

Treat plans and executions as first-class artifacts.

  • Trace IDs: Include a trace_id in the plan; propagate through executor logs.

  • Success criteria: Ask the model for a success_checklist aligned to the user goal; use it in post-verification.

  • Offline eval: Re-run captured user requests against a mock executor and compare plans over time.

Prompt insert: “Add success_checklist: 2–5 crisp checks that must be true to consider the goal met.”


7) Comparing strategies: ReAct vs. Plan/Execute

  • ReAct interleaves thoughts and actions (“think, call, think, call”). It’s flexible but mixes reasoning and execution, making consistency and safety trickier.

  • Plan/Execute separates phases. It’s usually more reliable, easier to test, and friendlier to budgets and approvals. Use it by default for multi-step or sensitive workflows; switch to ReAct for exploratory tasks where you can sandbox effects.


In practice: copy-ready templates

Use these as building blocks; swap in your own tools and policies.

A) Minimal planner (single or multi-step)

What it does: Produces a validated, executable plan JSON for a user goal and a fixed toolset.

SYSTEM: You are an API planner. Output JSON only; no prose.

TOOLS:
{{APISPEC_JSON}} // list of {name, args_schema, returns, is_mutating}

POLICIES:
- Do not execute tools.
- Only use listed tools; if missing, return "tool_gap".
- All times ISO8601 with timezone {{TIMEZONE}}.
- Respect rate limits: {{RATE_LIMITS}}
- For mutating calls, include idempotency_key (stable hash of intent).
- Use caches when available; assume read-through cache by key and TTL.

OUTPUT SHAPE:
{
  "trace_id": "{{TRACE_ID}}",
  "preconditions": [...],
  "steps": [
    {
      "id": "s1",
      "call": "tool_name",
      "args": {...},
      "why_call": "...",
      "estimated_latency_s": 1.2,
      "cache_key": "..." | null,
      "parallel_group": "A" | null,
      "abort_if": "expression over prior outputs" | null,
      "retry_policy": {"on_errors": ["429","5xx"], "max_attempts": 3, "backoff": "exponential"}
    }
  ],
  "stop_when": "expression",
  "success_checklist": ["...", "..."],
  "tool_gap": null | {"needed_capability": "...", "suggested_tool_name": "..."}
}

USER_GOAL: {{GOAL_TEXT}}

B) Plan validator / auto-fixer

What it does: Asks the model to self-critique a plan and emit a fixed version that matches schemas and policies.

SYSTEM: Validate and, if possible, minimally fix the following plan.
Output JSON only. If the plan cannot be fixed, include "errors": [...]

INPUT_PLAN: {{PLAN_JSON}}

VALIDATION RULES:
- Tools must exist in {{APISPEC_JSON}}.
- Args must satisfy the tool's JSON schema.
- Mutating calls require idempotency_key and confirmation_text if risky.
- Retry policy present for mutating calls.
- No missing refs (e.g., $.sX fields must exist).

OUTPUT:
{"fixed_plan": {...}, "errors": []}
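The “no missing refs” rule is cheap to check deterministically before handing the plan to the LLM fixer. A sketch:

```python
import re

def check_refs(plan):
    """Every $.sX reference in args or `if` must point at an earlier step."""
    seen, errors = set(), []
    for step in plan["steps"]:
        # Stringify args and condition, then scan for $.stepId references.
        blob = str(step.get("args", {})) + " " + str(step.get("if", ""))
        for ref in re.findall(r"\$\.(\w+)", blob):
            if ref not in seen:
                errors.append(f"{step['id']}: unknown ref $.{ref}")
        seen.add(step["id"])
    return errors
```

Running cheap structural checks like this first keeps the LLM fixer for the errors that actually need judgment.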

C) Post-execution verifier

What it does: After execution, ask the model to judge whether the goal was met and propose a minimal follow-up plan if not.

SYSTEM: Given the user goal, the executed steps with results, and the success_checklist, decide if the goal was met. If not, output a minimal follow-up plan (same schema).

USER_GOAL: {{GOAL_TEXT}}
EXECUTION_TRACE: {{STEPS_AND_RESULTS_JSON}}
SUCCESS_CHECKLIST: {{CHECKS_JSON}}

OUTPUT:
{"met": true|false, "reason": "...", "follow_up_plan": null | {...}}

D) Confirmation gate (for sensitive actions)

What it does: Generates a human-readable confirmation string for the final approval UI.

SYSTEM: Generate a short confirmation sentence for the user that describes the exact effect of this plan.
PLAN: {{PLAN_JSON}}
OUTPUT: {"confirmation_text": "..." }


Troubleshooting: what goes wrong and what to try

  • Hallucinated endpoints or fields. The planner invented a tool or arg. Fix: Paste the tool catalog and say “refuse anything not on this list; use tool_gap instead.”

  • Malformed JSON or wrong types. Fix: Add a validator stage and an auto-fixer prompt. For stubborn cases, enforce json-only output and use tiny examples in the schema.

  • Over-calling tools (looping). Fix: Add max_steps, per-step why_call, and a total budget; require “what changes after this step?” in the plan justification.

  • Timezone and date drift. Fix: Pin a canonical timezone (e.g., Europe/Zurich) and require explicit conversions.

  • Retries on fatal errors (e.g., 400). Fix: Provide a retry table mapping codes/classes to behaviors and require a retry_policy per step.

  • Unsafe mutations. Fix: Require requires_user_confirm: true and a confirmation_text for destructive actions; executor must enforce it.

  • Pagination and partial results. Fix: Teach a loop pattern: page → collect → stop on next_token=null. Cap pages and expose page_limit in the plan.
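The pagination loop can live in the executor as a small helper. This sketch assumes the convention that `next_token` is passed back as an extra argument and that pages use the `itineraries` key from the Mini Lab tools below:

```python
def collect_pages(search, args, page_limit):
    """page -> collect -> stop on next_token == None, capped at page_limit."""
    items, token = [], None
    for _ in range(page_limit):
        result = search(**args, next_token=token)  # token convention is assumed
        items.extend(result["itineraries"])
        token = result.get("next_token")
        if token is None:
            break
    return items
```

The hard cap means a misbehaving API (or a looping plan) can never page forever; the plan only has to expose `page_limit`.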


Mini Lab (5–7 minutes)

Scenario: You have two tools.

  • search_flights({ from, to, date }) -> { itineraries: [{ depart_iso, arrive_iso, price_eur, airline, id }], next_token }

  • hold_itinerary({ id, email, hold_minutes, idempotency_key }) -> { hold_id, expires_iso }

Task: “Find me a morning flight from Zurich to Berlin next Friday under €150 and hold the best option for 15 minutes. Email is alex@example.com.”

Your steps:

  1. Use the Minimal planner template. Set timezone to Europe/Zurich. Add a cost/latency budget and a page limit of 2.

  2. Ensure the plan filters itineraries by depart_iso 06:00–12:00 and price_eur <= 150.

  3. Require an idempotency key for the hold (hash(id+email+date)).

  4. After execution, run the Post-execution verifier to confirm the hold and expiry window.

Expected (abridged) plan shape:

{
  "preconditions": ["timezone=Europe/Zurich"],
  "steps": [
    {
      "id": "s1",
      "call": "search_flights",
      "args": {"from": "ZRH", "to": "BER", "date": "2025-09-12"},
      "why_call": "Need options filtered to morning and price <= 150",
      "abort_if": null,
      "retry_policy": {"on_errors": ["429","5xx"], "max_attempts": 3, "backoff": "exponential"}
    },
    {
      "id": "s2",
      "call": "hold_itinerary",
      "if": "min_price_morning($.s1.itineraries).price_eur <= 150",
      "args": {
        "id": "min_price_morning($.s1.itineraries).id",
        "email": "alex@example.com",
        "hold_minutes": 15,
        "idempotency_key": "hash(min_price_morning($.s1.itineraries).id+'alex@example.com'+'2025-09-12')"
      },
      "retry_policy": {"on_errors": ["409","429","5xx"], "max_attempts": 3, "backoff": "exponential"}
    }
  ],
  "stop_when": "no morning itineraries or all > 150",
  "success_checklist": [
    "One itinerary held",
    "Price <= 150",
    "Depart between 06:00 and 12:00",
    "Expiry >= now + 10m"
  ]
}

If your model emits a different shape, run it through the Plan validator and iterate.


Variations and boundaries

  • When not to plan: For a single, idempotent read (e.g., “what’s the price of X?”), a direct function call is fine; planning overhead isn’t worth it.

  • Speculative plans: For high-latency stacks, you can pre-plan alternatives (“Plan A/B”) and execute only the first feasible one to reduce end-to-end time.

  • Router + planner: Precede planning with a small classifier (“No tool”, “Read only”, “Mutating”) to pick a lighter template for simple cases.

  • Mocks and sandboxes: Always keep a mock executor. It lets you regression-test plan quality without hitting real APIs (and avoids surprise charges).


Production notes that pay off

  • Short API cards beat long docs. Give the model 4–8 lines per tool: name, purpose, key args, return shape, and 2–3 constraints. It plans better with concise specs.

  • Budget tokens the same way you budget money. Ask the planner to record estimated_cost_tokens and set a ceiling.

  • Normalize outputs right away. Teach the planner to create “projection” steps (e.g., extract the 3 fields you need) before downstream calls.

  • Log everything. Store {trace_id, plan, execution_log, final_verdict}. You’ll thank yourself when debugging.
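A minimal persistence helper for those artifacts might look like this sketch; the record mirrors the `{trace_id, plan, execution_log, final_verdict}` tuple above, and `sink` is any write function (e.g. `logfile.write`):

```python
import json
import uuid

def persist_artifacts(plan, execution_log, final_verdict, sink):
    """Store one JSON record per run: trace id, plan, log, and verdict."""
    record = {
        "trace_id": plan.get("trace_id") or str(uuid.uuid4()),  # mint one if absent
        "plan": plan,
        "execution_log": execution_log,
        "final_verdict": final_verdict,
    }
    sink(json.dumps(record))
    return record
```

One JSON line per run is enough to diff plans over time and replay requests against a mock executor.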


Summary & Conclusion

Planning API calls in prompts isn’t about fancy syntax—it’s about discipline. Make the model produce a simple, explicit plan; validate it; execute deterministically; and verify the outcome against the goal. That separation turns flaky tool use into something you can test, budget, and trust.

Start with the Plan → Validate → Execute → Verify loop. Layer on tool selection policies to avoid waste, argument planning to reduce trivial failures, retry/idempotency to tame the network, and confirmation gates to keep users safe. Use compact API cards, keep timezones and schemas explicit, and always keep a mock path for evaluation.

With these pieces in place, your LLM stops guessing and starts operating—one small, well-structured plan at a time.


Next steps

  • Instrument your stack: Add trace IDs and structured logs for plans and executions; build a tiny viewer that diff-plots plan changes over time.

  • Harden your templates: Wrap the Minimal Planner and Plan Validator into reusable helpers; add your organization’s specific safety and budgeting rules.

  • Grow your tool catalog: Curate concise API cards with schemas and examples. As coverage grows, your planner gets smarter—without getting riskier.
