Learn to make LLMs return schema-matching JSON. Write a minimal JSON Schema, constrain outputs with a system prompt, and auto-validate every response with a repair loop. Includes a hands-on lab to build, test, and confirm valid outputs.
If you’ve ever asked a model for JSON and got back a half-valid blob wrapped in prose, this guide is for you. We’ll make the model produce JSON that actually matches a schema and show how to validate the response automatically. You’ll leave with a reusable system prompt, a tiny schema, and a short validation script you can run.
Structured output means the model returns data in a predictable shape—fields, types, and constraints—not free text. A schema is a compact contract that defines that shape. A validator is a small program that checks whether the JSON follows the contract and tells you exactly what’s wrong if it doesn’t.
Why this matters now: as soon as you rely on LLM output in a workflow—rendering a UI, writing to a database, or triggering another tool—guesswork becomes risk. Schema-checked JSON turns “pretty good text” into “safe, machine-ready data.”
Think of three actors working together:
Contract (Schema). You define the exact fields and types you want. Strings vs integers. Allowed values. What’s required, what’s optional.
Constrain (Prompt). You tell the model to output only JSON and to follow the schema. You forbid extra prose and explain how to handle uncertainty (e.g., use null).
Check (Validation). You parse the model’s JSON and run a validator. If it fails, you either ask the model to fix it or you reject the result safely.
Example goal: Extract movie metadata.
Contract: title (string), year (integer 1878–2030), genres (array of enum), family_friendly (boolean), content_warnings (array of strings), confidence (number 0–1).
Constrain: “Respond with JSON only, no markdown, no comments. If unknown, use null. Keys in this order.”
Check: Run a validator; if it fails, send the validator’s error back to the model: “Fix to match schema.”
💡 Insight: Models happily comply when you specify how to handle unknowns. “Use null when unsure” prevents hallucinated values and keeps validation green.
Start with the problem statement: “Read a short blurb and return consistent movie metadata.” Without structure, you might try: “Summarize this movie.” You’ll get prose and inconsistent fields.
Now tighten it:
Write the schema. Keep it tiny. Fewer fields mean fewer errors.
Write the system prompt. This is the “policy”: style, format, and strict output rules.
Write the user prompt. This is the “task”: the specific input to fill the schema.
Validate the result. If invalid, show the error and ask for a corrected JSON.
⚠️ Pitfall: Mixing policy and task in one big prompt often leads to drift. Keep the “only JSON” rules in the system prompt and the specific request in the user message.
Below is a starter system prompt you can reuse for any structured-output task.
Starter system prompt (policy)
You are a careful data formatter. Always return ONLY raw JSON that matches the provided JSON Schema exactly. Rules: - Do not include markdown fences, explanations, comments, or trailing text. - Use null for unknown/unsure fields. - Keep keys in the order they appear in the schema. - Do not invent fields or values outside the schema. - Arrays must be JSON arrays, not comma-separated strings. If the instruction conflicts with the schema, the schema wins.
JSON Schema (contract) — movie metadata
{ "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "MovieMetadata", "type": "object", "additionalProperties": false, "required": ["title", "year", "genres", "family_friendly", "content_warnings", "confidence"], "properties": { "title": { "type": "string", "minLength": 1 }, "year": { "type": "integer", "minimum": 1878, "maximum": 2030 }, "genres": { "type": "array", "items": { "enum": ["action", "comedy", "drama", "thriller", "scifi", "romance", "family", "animation"] }, "minItems": 1 }, "family_friendly": { "type": "boolean" }, "content_warnings": { "type": "array", "items": { "type": "string" } }, "confidence": { "type": "number", "minimum": 0, "maximum": 1 } } }
User prompt (task) — vague vs precise
Vague:
Read the paragraph and give me the movie info as JSON.
Precise:
Fill the "MovieMetadata" schema for the text below. Remember: JSON only, no markdown or comments. Use null if unsure.
TEXT:
A 1995 family adventure about a boy and his dog crossing the Rockies. Heartwarming, rated PG, with a brief peril scene.
Expected good JSON (shape, not truth)
{
"title": null,
"year": 1995,
"genres": ["family", "drama"],
"family_friendly": true,
"content_warnings": ["peril"],
"confidence": 0.7
}
💡 Insight: When you want the model to permit null, put it in the rules and the schema (no required? you still want the key present with null), or keep it required and instruct the model to set null when unknown.
You can validate in any language. Here are two tiny options.
Python (with jsonschema)
# pip install jsonschema import json, sys from jsonschema import validate, Draft202012Validator schema = json.loads(open("schema.json").read()) candidate = json.loads(open("model_output.json").read()) v = Draft202012Validator(schema) errors = sorted(v.iter_errors(candidate), key=lambda e: e.path) if errors: print("INVALID") for e in errors: path = ".".join([str(p) for p in e.path]) or "(root)" print(f"- {path}: {e.message}") sys.exit(1) print("VALID")
Node.js (with ajv)
// npm i ajv const fs = require("fs"); const Ajv = require("ajv"); const ajv = new Ajv({ allErrors: true, strict: true }); const schema = JSON.parse(fs.readFileSync("schema.json", "utf8")); const data = JSON.parse(fs.readFileSync("model_output.json", "utf8")); const validate = ajv.compile(schema); const valid = validate(data); if (!valid) { console.log("INVALID"); for (const err of validate.errors) console.log(`- ${err.instancePath || "(root)"} ${err.message}`); process.exit(1); } console.log("VALID");
Repair loop (prompt to fix invalid JSON)
Your previous JSON did not validate. Here are the errors: {{VALIDATOR_ERRORS}} Return a corrected JSON that satisfies the original schema and rules. JSON only.
The model returns prose around JSON. This is almost always a prompt issue. Strengthen the system prompt: “ONLY raw JSON, no markdown fences, no explanations.” If you must allow fences for chat UI readability, strip them before validation.
Types drift (e.g., "year": "1995"). Remind the model that numbers must be numbers and include one tiny exemplar. If drift persists, feed validator errors back verbatim and ask for a corrected JSON.
Enums and booleans are flaky under ambiguity. Prefer small, explicit enums and include the “use null when unsure” rule. Over time, monitor which fields fail and adjust the schema or upstream instructions.
Large outputs hit token limits. In beginners’ setups, start with small schemas and page results (ask for arrays in chunks). For production, consider a structured-output API feature if your provider offers it.
Your task: extract support ticket triage data from a short message using schema+validation.
Schema
{ "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "TicketTriage", "type": "object", "additionalProperties": false, "required": ["summary", "priority", "category", "needs_handoff", "confidence"], "properties": { "summary": { "type": "string", "minLength": 5 }, "priority": { "enum": ["low", "medium", "high", "urgent"] }, "category": { "enum": ["billing", "access", "bug", "question"] }, "needs_handoff": { "type": "boolean" }, "confidence": { "type": "number", "minimum": 0, "maximum": 1 } } }
System prompt
You are a careful data formatter. Return ONLY raw JSON that matches the TicketTriage schema. No prose, no markdown. Use null if unsure.
User prompt
Fill the TicketTriage schema for this message:
“Hi, I can’t log into my account since yesterday. The password reset link says it expired. Please help quickly!”
Expected output (one possible valid answer)
{
"summary": "User cannot log in; password reset link expired.",
"priority": "high",
"category": "access",
"needs_handoff": true,
"confidence": 0.8
}
Now run your validator. If it’s invalid, feed the errors back to the model and request a corrected JSON. Confirm you reach “VALID.”
You learned the simple flow for reliable structured outputs: write a minimal schema, constrain the model with a strict system prompt, and validate every response. This turns creative language output into dependable data you can ship and automate around.
Common pitfalls—like extra prose, type drift, or enum confusion—usually vanish when you separate policy (system) from task (user) and tell the model exactly how to handle unknowns. When things still go wrong, the validator’s errors are your best debugging tool—show them to the model and ask for a fix.
Structured outputs are not about fancy tricks; they’re about clarity and contracts. Keep schemas small, prompts explicit, and validation non-negotiable. That’s how you get JSON without tears.
Next steps
Swap in your own schema (e.g., product specs or blog metadata) and run the same loop.
Add a repair loop that automatically retries once with validator errors before failing.
Track validation failures over a week and refine your schema or prompts based on patterns.
Follow guided learning paths from beginner to advanced. Master prompt engineering step by step.
Explore PathsReady to Master More? Explore our comprehensive guides and take your prompt engineering skills to the next level.