Structured Outputs and JSON Mode: Getting Reliable Data from LLMs in Production
The Problem with Free-Form LLM Outputs
You’ve built an LLM pipeline. You ask the model to extract information and return it as JSON. It works 95% of the time. The other 5%? The model adds a preamble (“Here’s the JSON you requested:”), wraps the JSON in markdown fences, returns invalid JSON, or decides to be helpful by adding fields you didn’t ask for.
In production, 95% reliability is not reliability. It’s a source of incidents.
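To make those failure modes concrete, here is a small sketch of what naive parsing does with four made-up model replies. The reply strings and the `naive_parse` helper are illustrative assumptions, not output captured from a real model.

```python
import json

FENCE = "`" * 3  # built indirectly so the example can itself sit inside a fence

# Hypothetical replies, one per failure mode described above.
replies = {
    "clean": '{"name": "Stripe", "founded_year": 2010}',
    "preamble": 'Here is the JSON you requested:\n{"name": "Stripe", "founded_year": 2010}',
    "fenced": FENCE + 'json\n{"name": "Stripe", "founded_year": 2010}\n' + FENCE,
    "extra_field": '{"name": "Stripe", "founded_year": 2010, "mood": "upbeat"}',
}

EXPECTED_KEYS = {"name", "founded_year"}


def naive_parse(reply: str) -> str:
    """Classify a reply as 'ok', 'invalid_json', or 'unexpected_fields'."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return "invalid_json"
    if set(data) != EXPECTED_KEYS:
        return "unexpected_fields"
    return "ok"


outcomes = {label: naive_parse(reply) for label, reply in replies.items()}
```

Only the first reply survives; the other three each need their own workaround, which is the maintenance burden structured outputs are meant to remove.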
Structured outputs — the ability to constrain LLM responses to a specific schema — have become one of the most important reliability features in production LLM systems. In 2026, every major provider offers some form of this, and the tooling around it has matured significantly.
Provider Support: Where Things Stand in 2026
| Provider | Method | Schema Type | Reliability |
|---|---|---|---|
| OpenAI | response_format: json_schema | JSON Schema | ~100% (constrained decoding) |
| Anthropic | Tool use with forced tool_choice (native structured outputs in beta) | JSON Schema | ~99.9% |
| Google Gemini | response_mime_type, response_schema | OpenAPI subset | ~99.9% |
| AWS Bedrock | Provider-specific | Varies | Varies |
| Ollama (local) | format: json or schema | JSON Schema | Model-dependent |
The key distinction: constrained decoding (OpenAI’s approach) vs. instruction-based (earlier approaches). Constrained decoding modifies the token sampling process so that only tokens that keep the output valid under the schema can be sampled. Conformance is enforced mechanically rather than left to probability, provided the response completes: a refusal or a max-token cutoff can still leave you without a usable object.
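As a toy illustration of the idea (not any provider's actual implementation), the sketch below masks a made-up vocabulary so that each sampled token must keep the output a prefix of some allowed string. The vocabulary, the allowed outputs, and the uniform stand-in "model" are all assumptions for the example.

```python
import random

# Hypothetical tiny vocabulary and the only outputs our "schema" permits.
VOCAB = ['{"ok":', " true", " false", "}", "Sure!", " maybe"]
ALLOWED = ['{"ok": true}', '{"ok": false}']


def allowed_next(prefix: str) -> list[str]:
    """Tokens that keep prefix + token a prefix of some allowed output."""
    return [
        tok for tok in VOCAB
        if any(target.startswith(prefix + tok) for target in ALLOWED)
    ]


def constrained_sample(rng: random.Random) -> str:
    """Sample token by token, choosing only among permitted tokens."""
    out = ""
    while out not in ALLOWED:
        # A real decoder would renormalise model logits over this mask;
        # here the stand-in "model" picks uniformly among permitted tokens.
        out += rng.choice(allowed_next(out))
    return out


samples = [constrained_sample(random.Random(seed)) for seed in range(20)]
```

Tokens such as "Sure!" are never reachable, which is why a chatty preamble cannot appear under this kind of decoding.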
OpenAI Structured Outputs: The Reference Implementation
from openai import OpenAI
from pydantic import BaseModel
from typing import Optional

client = OpenAI()

class ExtractedCompany(BaseModel):
    name: str
    industry: str
    founded_year: Optional[int]  # nullable, but the key is still required
    headquarters: str
    employee_count_range: str  # e.g., "100-500", "1000+"
    is_public: bool
    notable_products: list[str]

def extract_company_info(text: str) -> ExtractedCompany:
    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-11-20",
        messages=[
            {
                "role": "system",
                "content": "Extract company information from the provided text. "
                           "Be precise; use null for fields where information is not present."
            },
            {
                "role": "user",
                "content": text
            }
        ],
        response_format=ExtractedCompany
    )
    message = response.choices[0].message
    # parsed is None when the model refuses instead of answering
    if message.parsed is None:
        raise ValueError(f"No structured output returned: {message.refusal}")
    # Otherwise the object is guaranteed to match the schema
    return message.parsed

# Usage
company = extract_company_info("""
Stripe was founded in 2010 by Patrick and John Collison.
The fintech company is headquartered in South San Francisco
and has over 7,000 employees. Products include Stripe Payments,
Stripe Connect, and Stripe Billing.
""")
print(company.name)  # "Stripe"
print(company.founded_year)  # 2010
print(company.is_public)  # False (the model's inference; the text does not say)
print(company.notable_products)  # ["Stripe Payments", "Stripe Connect", "Stripe Billing"]
The .parse() method on the beta client handles schema generation from the Pydantic model and deserializes the response automatically. There is no manual JSON parsing and no try/except around json.loads, though you still need to handle refusals and responses cut off by the token limit.
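For reference, the request payload behind a strict JSON Schema response format has roughly the following shape. This is a hand-written sketch for a trimmed two-field version of the schema; the schema the SDK actually generates from a Pydantic model may differ in detail (titles, `$defs`, ordering), and `is_strict_compatible` is a hypothetical helper.

```python
# Hand-written approximation of a strict json_schema response_format.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ExtractedCompany",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                # Nullable fields are expressed as a union with null
                "founded_year": {"type": ["integer", "null"]},
            },
            # Strict mode expects every property to be listed as required
            # and additional properties to be disallowed.
            "required": ["name", "founded_year"],
            "additionalProperties": False,
        },
    },
}


def is_strict_compatible(schema: dict) -> bool:
    """Check the two object-level rules noted above (top level only)."""
    props = set(schema.get("properties", {}))
    return (
        set(schema.get("required", [])) == props
        and schema.get("additionalProperties") is False
    )


ok = is_strict_compatible(response_format["json_schema"]["schema"])
```

This is also why "optional" information is modelled as a required key that may be null rather than as a key that may be absent.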
Anthropic: Tool Use Pattern
Anthropic doesn’t have native structured outputs in the same way, but tool use with a schema is functionally equivalent:
import anthropic
from pydantic import BaseModel

client = anthropic.Anthropic()

class SentimentAnalysis(BaseModel):
    sentiment: str  # positive, negative, neutral
    confidence: float  # 0.0 to 1.0
    key_phrases: list[str]
    summary: str

def analyze_sentiment(text: str) -> SentimentAnalysis:
    tool_schema = SentimentAnalysis.model_json_schema()
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=[{
            "name": "report_sentiment",
            "description": "Report the sentiment analysis results",
            "input_schema": tool_schema
        }],
        tool_choice={"type": "tool", "name": "report_sentiment"},
        messages=[{
            "role": "user",
            "content": f"Analyze the sentiment of this text:\n\n{text}"
        }]
    )
    # Extract tool use result
    tool_use = next(
        block for block in response.content
        if block.type == "tool_use"
    )
    # Validate the tool arguments against the Pydantic model
    return SentimentAnalysis.model_validate(tool_use.input)
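By using tool_choice={"type": "tool", "name": "..."}, we force the model to always call our tool, so the response always contains a tool_use block whose arguments are shaped by our schema. Building the Pydantic model from tool_use.input then validates those arguments instead of trusting them.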
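A slightly more defensive version of the extraction step is sketched below. It works on a plain dict shaped like the Messages API JSON response (a `content` list of typed blocks plus a `stop_reason`); the helper name and the sample payload are made up for illustration.

```python
def tool_input_from_response(response: dict, tool_name: str) -> dict:
    """Return the arguments of the forced tool call, or raise with a reason."""
    if response.get("stop_reason") == "max_tokens":
        # Arguments may be cut off mid-object when the token budget runs out
        raise ValueError("response truncated at max_tokens; arguments may be incomplete")
    for block in response.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block for {tool_name!r} in response")


# Illustrative payload, not a captured API response.
sample = {
    "stop_reason": "tool_use",
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_example",
            "name": "report_sentiment",
            "input": {
                "sentiment": "positive",
                "confidence": 0.9,
                "key_phrases": ["love it"],
                "summary": "Happy customer.",
            },
        },
    ],
}
args = tool_input_from_response(sample, "report_sentiment")
```

Checking `stop_reason` first gives a clearer error than a validation failure on a half-written object.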
Building a Provider-Agnostic Structured Output Layer
In production, you often want to swap models without rewriting business logic. Here’s a clean abstraction:
from abc import ABC, abstractmethod
from typing import TypeVar, Type
from pydantic import BaseModel
import anthropic
import openai

T = TypeVar('T', bound=BaseModel)

class StructuredLLM(ABC):
    @abstractmethod
    def extract(self, prompt: str, schema: Type[T], system: str = "") -> T:
        pass

class OpenAIStructured(StructuredLLM):
    def __init__(self, model: str = "gpt-4o-2024-11-20"):
        self.client = openai.OpenAI()
        self.model = model

    def extract(self, prompt: str, schema: Type[T], system: str = "") -> T:
        messages = []
        if system:
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": prompt})
        response = self.client.beta.chat.completions.parse(
            model=self.model,
            messages=messages,
            response_format=schema
        )
        return response.choices[0].message.parsed

class AnthropicStructured(StructuredLLM):
    def __init__(self, model: str = "claude-sonnet-4-5"):
        self.client = anthropic.Anthropic()
        self.model = model

    def extract(self, prompt: str, schema: Type[T], system: str = "") -> T:
        tool_name = f"extract_{schema.__name__.lower()}"
        kwargs = {
            "model": self.model,
            "max_tokens": 2048,
            "tools": [{
                "name": tool_name,
                "description": f"Extract and return {schema.__name__} data",
                "input_schema": schema.model_json_schema()
            }],
            "tool_choice": {"type": "tool", "name": tool_name},
            "messages": [{"role": "user", "content": prompt}]
        }
        if system:
            kwargs["system"] = system
        response = self.client.messages.create(**kwargs)
        tool_use = next(b for b in response.content if b.type == "tool_use")
        return schema.model_validate(tool_use.input)

# Illustrative schema; the fields are placeholders for whatever your summary needs
class DocumentSummary(BaseModel):
    title: str
    summary: str
    key_points: list[str]

# Usage: same code regardless of provider
def process_document(text: str, llm: StructuredLLM) -> DocumentSummary:
    return llm.extract(
        prompt=f"Summarize this document:\n\n{text}",
        schema=DocumentSummary,
        system="You are a precise document analyzer."
    )

# Swap providers trivially
document_text = "..."  # your document here
llm = OpenAIStructured()  # or AnthropicStructured()
summary = process_document(document_text, llm)
Schema Design Patterns
1. Use Enums for Constrained Values
from enum import Enum
from pydantic import BaseModel

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class IssueClassification(BaseModel):
    priority: Priority  # LLM can only return valid enum values
    category: str
    affected_users: int
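The same enum also protects your own code path: constructing it from a raw string rejects anything outside the allowed set. A minimal standard-library sketch, with a hypothetical `parse_priority` helper:

```python
from enum import Enum
from typing import Optional


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def parse_priority(raw: str) -> Optional[Priority]:
    """Return the matching Priority, or None for values outside the enum."""
    try:
        return Priority(raw)
    except ValueError:
        return None


known = parse_priority("high")
unknown = parse_priority("urgent")
```

Because `Priority` mixes in `str`, a parsed value still compares equal to its plain string, so downstream code that expects strings keeps working.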
2. Use Optional for Uncertain Fields
from typing import Optional

class PersonInfo(BaseModel):
    name: str  # Required: fail if not present
    email: Optional[str] = None  # Optional: null if not found
    age: Optional[int] = None  # Optional: null if not mentioned
    company: Optional[str] = None
3. Nested Schemas for Complex Data
class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None
    address: Optional[Address] = None  # Nested object

class BusinessCard(BaseModel):
    contacts: list[ContactInfo]  # Array of nested objects
    notes: str
4. Field Descriptions for Better Extraction
from pydantic import Field

class FinancialMetrics(BaseModel):
    revenue: float = Field(description="Annual revenue in USD millions")
    growth_rate: float = Field(description="Year-over-year growth rate as percentage, e.g., 23.5 for 23.5%")
    profit_margin: float = Field(description="Net profit margin as percentage")
    fiscal_year: int = Field(description="Fiscal year this data refers to, e.g., 2025")
Field descriptions are included in the schema sent to the model and typically improve extraction accuracy for ambiguous fields, such as units or percentage formats.
Error Handling and Retry Strategy
Even with structured outputs, production code needs defensive handling:
import time
from typing import Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

class StructuredExtractionError(Exception):
    pass

class BusinessRuleError(ValueError):
    """Raised when a schema-valid result violates a business rule."""

# StructuredLLM is the abstract base class defined earlier
def extract_with_retry(
    llm: StructuredLLM,
    prompt: str,
    schema: Type[T],
    max_attempts: int = 3,
    fallback_value: Optional[T] = None
) -> T:
    last_error = None
    for attempt in range(max_attempts):
        try:
            result = llm.extract(prompt, schema)
            # Additional validation beyond schema
            validate_business_rules(result)
            return result
        except (ValidationError, BusinessRuleError) as e:
            # Output parsed, but values failed Pydantic or business-rule checks
            last_error = e
            if attempt < max_attempts - 1:
                # Add error context to prompt and retry
                error_context = f"\n\nPrevious attempt failed validation: {str(e)}\nPlease ensure values meet the constraints."
                prompt = prompt + error_context
                time.sleep(0.5 * 2 ** attempt)  # exponential backoff: 0.5s, 1s, 2s, ...
        except Exception as e:
            # Transport or API errors: retry without changing the prompt
            last_error = e
            if attempt < max_attempts - 1:
                time.sleep(1.0 * 2 ** attempt)
    if fallback_value is not None:
        return fallback_value
    raise StructuredExtractionError(
        f"Failed after {max_attempts} attempts"
    ) from last_error

def validate_business_rules(result: BaseModel) -> None:
    """Additional validation beyond Pydantic schema."""
    # Pydantic's ValidationError cannot be constructed from a plain message,
    # so rule violations raise BusinessRuleError instead
    confidence = getattr(result, "confidence", None)
    if confidence is not None and not (0 <= confidence <= 1):
        raise BusinessRuleError("confidence must be between 0 and 1")
    email = getattr(result, "email", None)
    if email and "@" not in email:
        raise BusinessRuleError(f"Invalid email format: {email}")
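If many workers retry at once, fixed delays can line up into bursts of traffic. One common refinement is exponential backoff with jitter; the sketch below computes such a schedule. The base delay, the cap, and the `backoff_delays` name are assumptions for illustration, not part of the retry function above.

```python
import random
from typing import Optional


def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Return one randomized delay per attempt, bounded by a doubling ceiling."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, 8, 8, ...
        delays.append(rng.uniform(0, ceiling))   # "full jitter": anywhere up to the ceiling
    return delays


delays = backoff_delays(6, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the schedule reproducible in tests while production code can use the default.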
Performance Considerations
Structured outputs have overhead:
- Schema is included in the prompt (token cost)
- Constrained decoding can be slightly slower per token
- Tool use adds tool-definition tokens to each request, plus an extra round trip if you send a tool result back to the model
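To get a feel for the first point, you can estimate how much a schema adds to each request. The sketch below uses a rough four-characters-per-token rule of thumb, which is an assumption; a real tokenizer will give different numbers. Both example schemas are made up.

```python
import json


def estimate_schema_tokens(schema: dict, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from the compact JSON length of a schema."""
    compact = json.dumps(schema, separators=(",", ":"))
    return max(1, round(len(compact) / chars_per_token))


# A lean schema versus one padded with ten "just in case" fields.
lean = {"type": "object", "properties": {"key_fact": {"type": "string"}}}
padded = {
    "type": "object",
    "properties": {
        f"field_{i}": {
            "type": ["string", "null"],
            "description": "Optional detail, just in case",
        }
        for i in range(10)
    },
}

lean_cost = estimate_schema_tokens(lean)
padded_cost = estimate_schema_tokens(padded)
```

The absolute numbers matter less than the ratio: the cost is paid on every call, so trimming unused fields compounds across a pipeline.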
Optimization strategies:
import asyncio
from functools import lru_cache
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

# 1. Keep schemas lean: only extract what you need
class MinimalExtraction(BaseModel):
    key_fact: str  # Don't add 10 optional fields "just in case"
    confidence: float

# 2. Batch when possible
def batch_extract(async_llm, texts: list[str], schema: Type[T]) -> list[T]:
    # Some providers support batch APIs;
    # otherwise use asyncio for parallel requests.
    # async_llm is assumed to expose an async extract(prompt, schema) method.
    async def extract_all() -> list[T]:
        # gather must be awaited inside a coroutine, and asyncio.run needs a coroutine
        return list(await asyncio.gather(
            *(async_llm.extract(t, schema) for t in texts)
        ))
    return asyncio.run(extract_all())

# 3. Cache schema compilation
@lru_cache(maxsize=128)
def get_compiled_schema(model_class: Type[BaseModel]) -> dict:
    return model_class.model_json_schema()
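Firing every request at once can run into provider rate limits. A bounded-concurrency variant is sketched below with a stand-in async extractor; `fake_extract`, the limit of 3, and the function names are hypothetical, and a real client call would replace the sleep.

```python
import asyncio


async def fake_extract(text: str) -> dict:
    """Stand-in for an async LLM call; replace with a real client."""
    await asyncio.sleep(0.01)
    return {"key_fact": text.upper()}


async def extract_all(texts: list[str], limit: int = 3) -> list[dict]:
    semaphore = asyncio.Semaphore(limit)

    async def guarded(text: str) -> dict:
        async with semaphore:  # at most `limit` calls in flight at once
            return await fake_extract(text)

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(guarded(t) for t in texts))


results = asyncio.run(extract_all(["a", "b", "c", "d", "e"]))
```

The semaphore caps in-flight requests without changing the calling code, so the limit can be tuned per provider.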
Conclusion
Structured outputs have transformed LLM pipeline reliability in production. The combination of:
- Provider-level schema enforcement (OpenAI’s constrained decoding, Anthropic’s tool forcing)
- Pydantic models for Python-native type safety
- Field descriptions for extraction accuracy
- Retry strategies for the edge cases that still slip through
- Provider abstraction for flexibility
…gives you the reliability needed to ship LLM features to production with confidence.
By the figures above, the roughly 5% failure rate of prompt-engineering-only approaches can drop to about 0.1% or lower with properly implemented structured outputs. For pipelines processing thousands of documents per day, that difference is the gap between a reliable product and an ops nightmare.
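As a back-of-the-envelope check on that gap, assuming a hypothetical volume of 5,000 documents per day:

```python
def expected_failures(docs_per_day: int, failure_rate: float) -> float:
    """Expected number of failed extractions per day."""
    return docs_per_day * failure_rate


docs_per_day = 5_000  # assumed volume, for illustration only
before = expected_failures(docs_per_day, 0.05)   # roughly 250 failures a day
after = expected_failures(docs_per_day, 0.001)   # roughly 5 failures a day
```

A few failures a day can be handled by a retry queue or a manual review step; a few hundred cannot.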
Are you using structured outputs in your LLM pipelines? What’s been your experience with schema design for complex nested extractions? I’d love to hear what patterns have worked for your team.
