Beginner’s guide to few-shot prompting. Learn zero-shot vs. few-shot, how to choose crisp examples, and test both on a labeling task. Includes a reusable template for reliable labels and JSON outputs.
Few-shot prompting is a simple way to cut mistakes by showing the model two or three mini examples before asking it to work. It’s like giving a quick demo before handing someone the task. In this guide, you’ll learn what “zero-shot” and “few-shot” mean, why a handful of examples help, and how to format them so the model copies the pattern reliably.
Zero-shot means you ask the model to do a task with instructions only and no examples. Few-shot means you add a few short input→output examples (called exemplars) above your real inputs. A label task is any job where each input maps to one choice from a fixed set of labels (e.g., Bug, Billing, Feature).
Our promise: you’ll add 2–3 examples to a tiny labeler and see fewer errors, especially on edge cases.
Models learn your pattern from the prompt you provide. Instructions tell the rules. Examples demonstrate the rules with concrete cases. Together, they anchor the model’s output format and reduce ambiguity.
Compact example
Task: classify short support messages as one of {Bug, Billing, Feature, Other}.
Zero-shot ask (instructions only): “Classify each message into one of: Bug, Billing, Feature, Other. Return JSON lines with fields text and label.”
Few-shot ask (instructions + 3 examples): Same instructions, but include three clearly formatted examples first. The model now imitates the format, the label set, and the level of brevity.
💡 Insight: The model copies the shape and vocabulary it sees. If your examples are clean and consistent, your outputs will be too.
Here’s a simple flow you can reuse for almost any label task.
Define the contract. Name the labels and the output schema (keys, casing, one object per line).
Pick 2–3 exemplars. Choose diverse, unambiguous cases that cover common paths and one mild edge case.
Format consistently. Keep the same keys, label spelling, and punctuation across all examples.
Place examples before the real inputs. Then say “Now classify:” and list the new items.
Keep both instructions and examples short: crisp exemplars reduce token bloat and format drift. Always use the exact label strings you want back.
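The four-step flow above can be sketched as a small helper. This is a minimal illustration, not any library's API; `build_prompt` and its parameter names are made up for this guide.

```python
import json

def build_prompt(instructions, exemplars, inputs):
    """Assemble a few-shot prompt: instructions first, then examples, then real inputs."""
    lines = [instructions, "", "# Examples"]
    for ex in exemplars:
        # Exemplars use the exact keys and label strings we want back.
        lines.append(json.dumps({"text": ex["text"], "label": ex["label"]}))
    lines += ["", "# Now classify"]
    for item in inputs:
        lines.append(json.dumps({"text": item}))
    return "\n".join(lines)

instructions = (
    'Classify each message into exactly one of: Bug, Billing, Feature, Other. '
    'Output JSON Lines with keys "text" and "label".'
)
exemplars = [
    {"text": "The app freezes on login with a 502", "label": "Bug"},
    {"text": "Can I switch from monthly to yearly billing?", "label": "Billing"},
]
prompt = build_prompt(instructions, exemplars, ["My card was charged twice"])
print(prompt)
```

Because the exemplars are serialized with `json.dumps`, the keys, casing, and punctuation are identical in every example, which is exactly the consistency step 3 asks for.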
Use this once and keep it stable across runs.
SYSTEM PROMPT — “Promptise Assistant (Beginner Labeler)”
You are a careful labeler. Follow the label set exactly. Output JSON Lines (one object per line).
Never invent new labels. Never add commentary. If uncertain, choose the closest label.
Vague user prompt
Can you sort these messages?
Precise user prompt
Classify each message into exactly one of these labels:
- Bug = product is broken or error message
- Billing = payments, invoices, refunds, plan price
- Feature = request for new capability
- Other = anything else

Return JSON Lines with keys: "text" and "label" (string). Use labels exactly: Bug, Billing, Feature, Other.

Now classify:
1) "App crashes when I click Export"
2) "How do I get a refund for last month?"
3) "Please add SSO for contractors"
Zero-shot prompt (instructions only)

You are a careful labeler. Output JSON Lines with "text" and "label".
Label set: Bug, Billing, Feature, Other.

Now classify:
1) "Exports CSV but numbers are garbled"
2) "Upgrade me to Pro annual, please"
3) "My card was charged twice"
Few-shot prompt (instructions + 3 exemplars)

You are a careful labeler. Output JSON Lines with "text" and "label".
Label set: Bug, Billing, Feature, Other.

# Examples
{"text": "The app freezes on login with a 502", "label": "Bug"}
{"text": "Can I switch from monthly to yearly billing?", "label": "Billing"}
{"text": "Could you add dark mode to the dashboard?", "label": "Feature"}

# Now classify
{"text": "Exports CSV but numbers are garbled"}
{"text": "Upgrade me to Pro annual, please"}
{"text": "My card was charged twice"}
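You can enforce the contract on the model's side of the exchange, too. Here is a minimal sketch of a validator that parses JSON Lines output and rejects anything outside the label set; the function name is illustrative, not a standard API.

```python
import json

ALLOWED = {"Bug", "Billing", "Feature", "Other"}

def parse_labels(raw):
    """Parse JSON Lines output, rejecting malformed rows and unknown labels."""
    rows, rejected = [], []
    for line in raw.strip().splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)  # commentary or broken JSON
            continue
        if obj.get("label") not in ALLOWED:
            rejected.append(line)  # invented or misspelled label
            continue
        rows.append(obj)
    return rows, rejected

raw = "\n".join([
    '{"text": "My card was charged twice", "label": "Billing"}',
    '{"text": "Exports CSV but numbers are garbled", "label": "bug"}',  # wrong casing
])
rows, rejected = parse_labels(raw)
print(len(rows), len(rejected))  # 1 valid row, 1 rejected
```

Note that the lowercase `"bug"` row is rejected: the validator catches exactly the casing mismatch the pitfall above warns about.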
⚠️ Pitfall: Don’t put labels in lowercase in examples and Title Case in instructions. The model will mix them. Match spelling everywhere.
Too many examples? More isn’t always better. Past 3–5, gains taper and tokens increase. Start with 2–3.
Ambiguous labels? Your definitions may be fuzzy. Add one-line label descriptions and one clarifying exemplar.
Format drift? If outputs start to include explanations, tighten the system prompt: “Never add commentary.”
Edge cases mis-labeled? Add one targeted exemplar showing the correct handling of that edge.
Order effects? Occasionally, swapping the order of examples changes outcomes. Place the most representative example first.
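If you suspect order effects, you can generate every ordering of your exemplars and A/B them against the same inputs. A quick sketch (the `example_block` helper is invented for this guide):

```python
import json
from itertools import permutations

exemplars = [
    {"text": "The app freezes on login with a 502", "label": "Bug"},
    {"text": "Can I switch from monthly to yearly billing?", "label": "Billing"},
    {"text": "Could you add dark mode to the dashboard?", "label": "Feature"},
]

def example_block(order):
    """Render one ordering of the exemplars as a JSON Lines block."""
    return "\n".join(json.dumps(ex) for ex in order)

# Three exemplars give 3! = 6 orderings; paste each block into the same
# prompt and compare the labels you get back.
variants = [example_block(p) for p in permutations(exemplars)]
print(len(variants))
```

If all six variants produce the same labels, order effects aren't your problem; if they diverge, lead with the most representative exemplar and retest.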
💡 Insight: Each exemplar teaches multiple things at once: label meanings, tone, and output shape. Keep them minimal but informative.
You’ll compare accuracy on sentiment labels {Positive, Negative, Neutral} for eight short reviews.
Dataset (8 items):
“Loved the speed—finished my work in minutes.”
“Support ignored me for a week.”
“It’s fine. Does what it says.”
“The new update is fantastic!”
“Crashes every other day.”
“Price is okay for what you get.”
“Phenomenal team. They fixed my issue fast.”
“Not terrible, but I’m disappointed.”
Step A — Zero-shot run (baseline)

Use this as your user message (with the system prompt above):
Classify each review as exactly one of: Positive, Negative, Neutral.
Output JSON Lines with keys "text" and "label". No commentary.

{"text": "Loved the speed—finished my work in minutes."}
{"text": "Support ignored me for a week."}
{"text": "It’s fine. Does what it says."}
{"text": "The new update is fantastic!"}
{"text": "Crashes every other day."}
{"text": "Price is okay for what you get."}
{"text": "Phenomenal team. They fixed my issue fast."}
{"text": "Not terrible, but I’m disappointed."}
Step B — Few-shot run (add 3 exemplars)
Classify each review as exactly one of: Positive, Negative, Neutral.
Output JSON Lines with keys "text" and "label". No commentary.

# Examples
{"text": "Great value and super helpful staff.", "label": "Positive"}
{"text": "Terrible experience—bugs everywhere.", "label": "Negative"}
{"text": "Works as expected. Nothing special.", "label": "Neutral"}

# Now classify
{"text": "Loved the speed—finished my work in minutes."}
{"text": "Support ignored me for a week."}
{"text": "It’s fine. Does what it says."}
{"text": "The new update is fantastic!"}
{"text": "Crashes every other day."}
{"text": "Price is okay for what you get."}
{"text": "Phenomenal team. They fixed my issue fast."}
{"text": "Not terrible, but I’m disappointed."}
Expected output shape (snippet, not the only correct labels):
{"text":"Loved the speed—finished my work in minutes.","label":"Positive"}
{"text":"Support ignored me for a week.","label":"Negative"}
{"text":"It’s fine. Does what it says.","label":"Neutral"}
...
Now tally how many feel right to you. Few-shot should reduce boundary mistakes (e.g., “Not terrible, but I’m disappointed.” skewing Negative vs Neutral) and formatting errors.
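Rather than eyeballing the tally, you can score each run against your own hand-assigned labels. A minimal sketch, assuming you paste the model's JSON Lines output into a string (the gold labels below are just an illustration, not the only defensible ones):

```python
import json

# Your own judgments for a few reviews from the lab dataset.
gold = {
    "Loved the speed—finished my work in minutes.": "Positive",
    "Support ignored me for a week.": "Negative",
    "It’s fine. Does what it says.": "Neutral",
}

def accuracy(model_output, gold):
    """Score JSON Lines output against hand-assigned gold labels."""
    hits = total = 0
    for line in model_output.strip().splitlines():
        obj = json.loads(line)
        if obj["text"] in gold:
            total += 1
            hits += obj["label"] == gold[obj["text"]]
    return hits / total if total else 0.0

# Hypothetical zero-shot output with one boundary mistake.
zero_shot_out = "\n".join([
    '{"text": "Loved the speed—finished my work in minutes.", "label": "Positive"}',
    '{"text": "Support ignored me for a week.", "label": "Neutral"}',
    '{"text": "It’s fine. Does what it says.", "label": "Neutral"}',
])
score = accuracy(zero_shot_out, gold)
print(round(score, 2))  # 2 of 3 correct → 0.67
```

Run the same scorer on the few-shot output and compare the two numbers: the delta is your measured benefit from the exemplars.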
Few-shot prompting improves reliability by showing the model your intended pattern before it acts. With two or three clean examples and a stable system prompt, you anchor labels, tone, and output format. The result is fewer ambiguous choices and fewer formatting surprises.
The trade-off is tokens and curation time. Too many examples add cost without much benefit, and sloppy exemplars teach the wrong habits. Keep the contract clear, pick diverse samples, and enforce exact label strings to avoid drift.
As you practice, notice how a single targeted exemplar can fix a recurring error. That’s the fastest way to “teach” the model without changing anything else.
Next steps
Swap in your own domain labels (e.g., P1/P2/P3, Spam/NotSpam) and repeat the lab.
Add one tricky edge exemplar and observe whether that specific error disappears.
Wrap this into a template you can reuse across tasks, keeping the system prompt stable and only changing examples.