Foundations · Day 6
Structured Outputs: Teaching AI to Answer in the Shape You Need
A thoughtful answer is wonderful. An answer shaped exactly the way your application needs is something else entirely.
April 30, 2026 · 10 min read
✦ The Form That Held Everything
Imagine you are applying for a job online.
You have a rich story. Experience, projects, achievements, a career that curves in interesting ways.
Now the form asks for specific fields. Full name. Email. Years of experience. Current role. Skills.
You cannot pour your whole story into the email field. The form is not evaluating whether your story is meaningful. It is asking for meaning in a shape the system can hold.
That is exactly what happens in AI applications.
The model may produce a thoughtful response. But your application needs the response to arrive as fields it can recognize. A score field. A summary field. An array of strengths. A nextAction chosen from a fixed list.
The website form cannot guess which sentence contains your email.
Your backend should not have to guess which sentence contains the score.
When software has to guess, reliability begins to weaken.
✦ When the Same Answer Keeps Changing
The problem with plain text is not that it is wrong. The problem is that it is not stable.
Say you are building a resume reviewer. You run the same prompt three times.
Run one returns a structured breakdown: score out of ten, strengths as bullets, weaknesses, next steps.
Run two returns a flowing paragraph: roughly the same assessment, different arrangement, different words for the same ideas.
Run three returns JSON, but with rating instead of score, good_points instead of strengths, recommendation instead of next_steps.
The meaning across all three is similar.
Your code does not work with similar. It works with exact.
The model may understand meaning. Your application needs contracts.
This is the core tension structured outputs are designed to resolve.
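To make the drift concrete, here is a minimal sketch. The three values are hand-written stand-ins for the three runs above, not real model output, and run one is imagined as already split into fields. `readScore` is a hypothetical helper that knows only one field name.

```javascript
// Hand-written stand-ins for the three runs described above (not real model output).
const runOne = { score: 8, strengths: ["React"], next_steps: ["Add metrics"] };
const runTwo = "Overall a strong candidate, roughly an 8 out of 10, with solid React work.";
const runThree = { rating: 8, good_points: ["React"], recommendation: "Add metrics" };

// Hypothetical helper: the application only knows one field name, "score".
function readScore(output) {
  if (typeof output !== "object" || output === null) {
    return undefined; // a paragraph has no fields to read
  }
  return output.score; // undefined when the model chose "rating" instead
}

const scores = [runOne, runTwo, runThree].map(readScore);
console.log(scores); // [ 8, undefined, undefined ]
```

All three runs carry the same judgment. Only one of them survives contact with the code.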
✦ What It Means to Have a Shape
Take the resume reviewer from above. The candidate is a frontend developer with solid React experience but no measurable outcomes in their work history.
A free-form response to that prompt might arrive like this:
This candidate looks strong overall. I'd say around 8 out of 10. Their React work is solid and they clearly owned their projects, but there's no mention of measurable outcomes and system design depth is hard to see. They should rewrite the experience section with specific numbers.
Useful to read. But your UI cannot display "around 8" as a score. Your backend cannot store "they clearly owned their projects" as a list. Your dashboard cannot render improvements as cards from a single paragraph. Your next function cannot act on any of this without parsing English first.
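Here is roughly what "parsing English first" tends to look like. This is a sketch of the fragile approach, not a recommendation. `guessScore` and its pattern are hypothetical, and the second input is an invented rephrasing of the same assessment.

```javascript
// The free-form answer from above, shortened, as a plain string.
const answer =
  "This candidate looks strong overall. I'd say around 8 out of 10. " +
  "Their React work is solid and they clearly owned their projects.";

// Hypothetical helper: guess the score by pattern-matching English.
function guessScore(text) {
  const match = text.match(/(\d+(?:\.\d+)?)\s*(?:out of|\/)\s*10/i);
  return match ? Number(match[1]) : null;
}

console.log(guessScore(answer)); // 8

// The same assessment, phrased differently, slips past the pattern.
console.log(guessScore("A solid eight, maybe a little higher.")); // null
```

The pattern works until the wording changes, and with plain text the wording is free to change on every run.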
A structured output delivers the same assessment as an object:
{
  "score": 8,
  "summary": "Strong frontend profile with room to show measurable impact.",
  "strengths": [
    "Clear React expertise",
    "Good ownership of past projects"
  ],
  "improvements": [
    {
      "area": "Impact",
      "suggestion": "Add measurable outcomes to each project."
    },
    {
      "area": "System design",
      "suggestion": "Mention architecture decisions and tradeoffs you owned."
    }
  ],
  "nextStep": "Rewrite the work experience section with specific numbers and outcomes."
}
Same assessment. Different shape. And that difference is the whole thing.
Now the output is not just something to read. It is something to use.
Your frontend can display the score. Your database can store the summary. Your dashboard can render each improvement as a card. Your next function can decide what to do with the result.
This is where AI starts to feel less like a chatbot and more like a backend capability. The model still provides the intelligence. The structure makes that intelligence usable.
✦ The Prompt Is the Instruction. The Schema Is the Agreement.
Prompting is still important. But structured outputs introduce a second layer.
The prompt says what you want. The schema says what shape the answer must arrive in.
Think of the prompt as instruction. Think of the schema as agreement.
The prompt may say:
Review this resume for a frontend developer role. Focus on clarity, impact, and technical depth.
The schema says: whatever you conclude, return it in this exact shape, with these exact fields, holding these exact types.
You are no longer simply hoping for a useful answer. You are designing the boundary between intelligence and software.
Many bugs in AI features do not happen because the model cannot write. They happen because the output is hard to trust, hard to parse, or hard to connect to the rest of the system. A clear schema makes that boundary visible and workable.
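For readers who have not seen such an agreement written out, here is roughly what it could look like as a plain JSON Schema object for the resume reviewer. This is a hand-written sketch, not the exact schema any SDK generates. The name `resumeFeedbackSchema` is hypothetical, while `required` and `additionalProperties` are standard JSON Schema keywords.

```javascript
// Hand-written JSON Schema sketch for the resume reviewer's response.
const resumeFeedbackSchema = {
  type: "object",
  properties: {
    score: { type: "number" },
    summary: { type: "string" },
    strengths: { type: "array", items: { type: "string" } },
    improvements: {
      type: "array",
      items: {
        type: "object",
        properties: {
          area: { type: "string" },
          suggestion: { type: "string" },
        },
        required: ["area", "suggestion"],
        additionalProperties: false,
      },
    },
    nextStep: { type: "string" },
  },
  // Every field must be present, and no extra fields are allowed.
  required: ["score", "summary", "strengths", "improvements", "nextStep"],
  additionalProperties: false,
};

console.log(resumeFeedbackSchema.required.length); // 5
```

The prompt never appears in this object. The schema says nothing about what to conclude, only about the shape the conclusion must take.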
✦ What This Looks Like in Code
Here is a Node.js example using Zod for schema validation. Read it slowly. Even if every line is not yet familiar, read it for the shape.
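import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";
import { z } from "zod";

// By default the client reads the API key from the OPENAI_API_KEY environment variable.
const openai = new OpenAI();

// Placeholder input for illustration; in a real app this comes from the user.
const resumeText = "Frontend developer. Built and maintained React applications.";

// The contract: every response must match this shape.
const ResumeFeedback = z.object({
  score: z.number(),
  summary: z.string(),
  strengths: z.array(z.string()),
  improvements: z.array(
    z.object({
      area: z.string(),
      suggestion: z.string(),
    })
  ),
  nextStep: z.string(),
});

// Send the prompt together with the schema the answer must follow.
const response = await openai.responses.parse({
  model: "gpt-4o-mini",
  input: [
    {
      role: "system",
      content:
        "You are a careful resume reviewer. Return practical feedback for a frontend developer role.",
    },
    {
      role: "user",
      content: resumeText,
    },
  ],
  text: {
    format: zodTextFormat(ResumeFeedback, "resume_feedback"),
  },
});

// The parsed object, already in the shape of ResumeFeedback.
const feedback = response.output_parsed;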
The schema definition is where the contract lives:
const ResumeFeedback = z.object({
  score: z.number(),
  summary: z.string(),
  strengths: z.array(z.string()),
  improvements: z.array(
    z.object({
      area: z.string(),
      suggestion: z.string(),
    })
  ),
  nextStep: z.string(),
});
You are making a promise to the system: the score will be a number, the summary will be text, the strengths will be a list of strings, each improvement will carry an area and a suggestion.
This is not formatting. It is a contract in code.
Your UI can now trust that feedback.strengths is an array. You can map over it without guessing:
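// Each strength becomes one list item.
// In a real page, escape the text before inserting it into HTML.
const items = feedback.strengths.map((strength) => {
  return `<li>${strength}</li>`;
});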
Your backend can store the fields separately. Your analytics can compare scores across users. Your workflow can branch based on the result. The model provides the intelligence. The schema makes that intelligence something the rest of your application can hold.
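One caveat is worth seeing in code. The schema above guarantees types, not business rules: `z.number()` accepts 8, but it also accepts 80 or -3. Here is a minimal sketch of an extra check at the boundary, assuming scores are meant to be out of ten as in the earlier example. `checkFeedback` is a hypothetical helper, not part of Zod or the OpenAI SDK, and the rule that strengths must not be empty is an assumption made for illustration.

```javascript
// Hypothetical helper: application-level rules that a type check alone does not cover.
function checkFeedback(feedback) {
  const problems = [];
  if (!Number.isFinite(feedback.score) || feedback.score < 0 || feedback.score > 10) {
    problems.push("score must be between 0 and 10");
  }
  // Assumed rule for this sketch: a review should name at least one strength.
  if (feedback.strengths.length === 0) {
    problems.push("strengths must not be empty");
  }
  return problems;
}

// Hand-written sample that has the right types but the wrong values.
const sample = {
  score: 80, // a valid number, but not a valid score out of ten
  summary: "Strong frontend profile.",
  strengths: [],
  improvements: [],
  nextStep: "Add numbers.",
};

console.log(checkFeedback(sample).length); // 2
```

The schema answers "is this the right shape?" A check like this answers "does this value make sense for my product?"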
Free-form answers are easy to read. Structured answers are easy to build with.
✦ JSON Is Not the Goal
It is easy to misread this lesson.
Many developers hear "structured outputs" and think: make the model return JSON. That is the surface of the idea, not the idea itself.
The deeper goal is reliability.
You want the model's output to be predictable enough to parse, clear enough to validate, stable enough to store, specific enough to render, and safe enough to pass into the next step of your system.
JSON is one common shape for achieving that reliability. But the shape is not the point. The reliability is.
This matters most when AI has to feed another part of your application. A resume reviewer that returns a score and a next action. An interview coach that returns correctness scores and follow-up questions. A document Q&A system that returns an answer alongside source references. A moderation tool that returns category, severity, and recommended response.
In all of these, a paragraph is not enough. The answer must become data.
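One way to make "safe enough to pass into the next step" concrete is a field chosen from a fixed list. The sketch below uses the moderation example. The severity values and action names are assumptions invented for illustration, and `routeModeration` is a hypothetical helper.

```javascript
// Assumed fixed list of severities; a real product would define its own.
const SEVERITIES = ["low", "medium", "high"];

// Hypothetical router: decides the next step from structured fields only.
function routeModeration(result) {
  if (!SEVERITIES.includes(result.severity)) {
    // Anything outside the fixed list is rejected instead of guessed at.
    throw new Error(`Unknown severity: ${result.severity}`);
  }
  switch (result.severity) {
    case "high":
      return "escalate_to_human";
    case "medium":
      return "hide_pending_review";
    default:
      return "allow";
  }
}

// Hand-written sample result (not real model output).
const moderationResult = {
  category: "spam",
  severity: "medium",
  recommendedResponse: "Hide and review.",
};

console.log(routeModeration(moderationResult)); // hide_pending_review
```

Because the value comes from a closed list, the code that follows can branch on it without reading a sentence.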
✦ When Every Response Can Also Become Data
There is a quieter benefit that most people miss initially.
Imagine your AI interview coach is evaluating answers across a hundred learners. Each response arrives as a paragraph. Comparison is nearly impossible. You can read them. You cannot measure them.
But if each response returns structured fields:
{
  "correctness": 7,
  "clarity": 6,
  "missingConcepts": ["edge cases", "time complexity"],
  "recommendedPractice": "Solve two similar problems with explicit constraints."
}
Now you can ask real questions. Which concepts are most commonly missing? Which learners are improving week over week? Which practice questions produce the lowest scores? Where does the curriculum need reinforcement?
Without structure, every response is a conversation. With structure, every response can also become data.
That distinction matters enormously for anyone building a learning system, a coaching product, or anything that needs to improve over time based on what it observes.
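Two of those questions can be sketched in a few lines once the fields exist. The evaluations below are hand-written samples in the shape shown earlier, not real learner data, and both helper names are hypothetical.

```javascript
// Hand-written sample evaluations (not real learner data).
const evaluations = [
  { correctness: 7, clarity: 6, missingConcepts: ["edge cases", "time complexity"] },
  { correctness: 5, clarity: 8, missingConcepts: ["edge cases"] },
  { correctness: 9, clarity: 7, missingConcepts: [] },
];

// Count how often each concept is reported missing.
function countMissingConcepts(items) {
  const counts = new Map();
  for (const item of items) {
    for (const concept of item.missingConcepts) {
      counts.set(concept, (counts.get(concept) ?? 0) + 1);
    }
  }
  // Most frequently missing concept first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Average of one numeric field across all evaluations.
function average(items, field) {
  return items.reduce((sum, item) => sum + item[field], 0) / items.length;
}

console.log(countMissingConcepts(evaluations)[0]); // [ 'edge cases', 2 ]
console.log(average(evaluations, "correctness")); // 7
```

Neither function knows anything about language. They only work because every response arrived with the same field names and types.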
✦ The Trap of Designing Too Much Too Soon
Structure is powerful. Premature structure is a cage.
When you are still discovering what the right product experience is, free-form responses can teach you a great deal. You may not yet know which fields actually matter. You ask for confidence, but later realize riskLevel is more useful. You ask for one nextStep, but users need three options: a quick fix, deeper practice, and a longer-term goal.
The schema you design on day one is rarely the schema you want on day thirty.
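When a schema does change, responses already stored in the old shape do not disappear. Here is a sketch of one way to bridge them, assuming the change described above from a single nextStep to several options. The `nextSteps` field name and the `upgradeFeedback` helper are hypothetical.

```javascript
// Hypothetical upgrade: old records have one nextStep, newer records have a nextSteps list.
function upgradeFeedback(record) {
  if (Array.isArray(record.nextSteps)) {
    return record; // already in the newer shape
  }
  const { nextStep, ...rest } = record;
  return { ...rest, nextSteps: nextStep ? [nextStep] : [] };
}

// Hand-written record in the day-one shape.
const dayOne = { score: 8, nextStep: "Rewrite the work experience section." };
const upgraded = upgradeFeedback(dayOne);

console.log(upgraded.nextSteps.length); // 1
```

The fewer fields the first schema has, the less there is to carry forward like this.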
Start with the minimum your application actually needs. Ask: what will my UI render? What will my backend store? What will my next function need? What will I need to measure?
Design only those fields. A good schema is not the most complete schema. It is the one that matches the actual job your application needs to do right now.
Think of one AI feature you want to build. You have decided it needs to return structured data. Before writing any schema, ask yourself: if you could only keep three fields in the response object, which three would they be? Your answer to that question is not a technical decision. It is a product decision. It tells you what your application actually needs, as opposed to what would be nice to have.
✦ When the Model Supplies the Props
For developers who already think in React, something clicks at this point.
A component expects props. Specific props. Named props. Typed props. If FeedbackCard expects a score and receives a paragraph instead, it does not gracefully degrade. It breaks.
<FeedbackCard
  score={feedback.score}
  summary={feedback.summary}
  strengths={feedback.strengths}
  improvements={feedback.improvements}
  nextStep={feedback.nextStep}
/>
The moment AI output becomes structured, it flows naturally into your component tree. The model is no longer writing to the user. It is supplying props to your interface.
There is also a pattern worth knowing for cases where the response itself should feel free. A journaling assistant. A writing coach. A reflective chatbot. These experiences need natural language to reach the user. But your system may still need structured information quietly, in the background.
The model can return both at once:
{
  "visibleResponse": "A warm paragraph shown directly to the user.",
  "detectedMood": "discouraged",
  "suggestedFollowUp": "Offer encouragement and one small, concrete next step."
}
The user sees only the warm paragraph. Your system uses the other fields to decide what comes next. Human outside. Structure inside.
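A minimal sketch of that split, assuming the three field names from the example above. The reply is hand-written rather than real model output, and `splitReply` is a hypothetical helper.

```javascript
// A hand-written reply in the shape shown above (not real model output).
const reply = {
  visibleResponse: "That sounds like a hard week. You still showed up, and that counts.",
  detectedMood: "discouraged",
  suggestedFollowUp: "Offer encouragement and one small, concrete next step.",
};

// Hypothetical helper: separate what the user reads from what the system uses.
function splitReply({ visibleResponse, ...internal }) {
  return { forUser: visibleResponse, forSystem: internal };
}

const { forUser, forSystem } = splitReply(reply);

console.log(forUser); // only the paragraph reaches the screen
console.log(Object.keys(forSystem)); // [ 'detectedMood', 'suggestedFollowUp' ]
```

Keeping the split in one place makes it harder for an internal field to leak into the interface by accident.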
An AI response is also an API response. It deserves the same care.
✦ Try It Yourself
Pick one of these features. Do not write code yet. Write the shape.
First, write the natural prompt you would send. Then write the fields your application needs. Then write a sample JSON output by hand, with real values, as if the model had already responded.
Resume Reviewer: score, summary, strengths, improvements, nextStep
Interview Evaluator: correctness, clarity, missingConcepts, followUpQuestion, practiceSuggestion
Study Planner: topics, dailyPlan, estimatedTime, risks, encouragement
Bug Report Summarizer: issueSummary, severity, expectedBehavior, actualBehavior, ownerSuggestion
Writing the sample output by hand, before generating it, is not a small thing. It forces you to see what the structure actually needs to hold. Before AI can fill a shape, you must be able to see the shape yourself. The model fills it. You design it.
The Resume Reviewer from the list above is already built. If you want to see the full working version before writing your own, it is ready to clone.
Resume Reviewer
A working resume reviewer that takes plain text input and returns a structured assessment: score, summary, strengths, improvement areas, and a next step. Built with the OpenAI structured outputs API and Zod schema validation.
- Structured output enforced via zodResponseFormat: the schema is a contract, not a suggestion
- Response renders as a proper UI: score, bullets, improvement cards, next step
- Same Zod schema from the article, running in a real application
- API key kept server-side only
✦ What This Opens Up
You now have a new layer in your toolkit.
First, you learned that AI predicts tokens. Then that context limits what it can see. Then that prompting shapes how it responds. Then you made your first API call.
Now you know that the response itself can have architecture.
This prepares you for everything that comes next. Embeddings will turn meaning into searchable form. RAG will bring external knowledge into the conversation. Agents will use structured outputs to make decisions and take actions. Evaluation will measure whether the system is actually working, over time, across users.
All of these depend on one quiet skill: knowing how to turn intelligence into something software can hold.
Structured outputs are that holding shape.
✦ Learn More
- OpenAI Structured Outputs documentation: the official guide to enforcing output schemas with the API
- JSON Schema: Getting Started: a clear introduction to defining and validating data shapes
- Zod documentation: the TypeScript-first schema validation library used in the code example above
You are designing an AI feature that gives feedback on someone's writing. One approach: the response arrives as warm, free-flowing prose that the user reads and feels understood by. Another approach: it arrives as structured data your system uses to show a score, highlight specific strengths, and suggest a next practice exercise. Both are valid products. Both serve the user. But they require completely different approaches to building. Which would you build first, and what does that decision reveal about what you believe the product actually is?
A river is powerful because it flows. But a river that enters the fields through quiet channels gives life where it is needed.
Let AI flow for humans. Let structure guide it for systems.