GraphQL Federation in 2026: Building a Production Supergraph at Scale
GraphQL Federation promised to solve the hardest problem in API design: how do you give clients a unified, coherent interface to a distributed system? By 2026, it has largely delivered on that promise — but the devil is in the implementation details that vendor documentation tends to gloss over.
This guide is about running Federation in production: schema composition, performance at scale, security boundaries, and the operational practices that keep your supergraph healthy as your organization grows.
What Federation Actually Solves (and Doesn’t)
Federation solves:
- Clients getting a single, type-safe API for the entire platform
- Backend teams owning their domain types independently
- Schema evolution without breaking consumers
- Avoiding “BFF proliferation” (ending up with, say, 20 separate backend-for-frontends)
Federation doesn’t solve:
- The need for good schema design (garbage in, garbage out)
- N+1 query problems within subgraphs
- Auth complexity (each subgraph still has to enforce its own authorization rules)
- Operational complexity (you now have more moving parts)
Go in with clear eyes.
Federation 2: The Architecture
┌──────────────────────────────────────────────────────────┐
│                         Clients                          │
│           (Web, Mobile, Third-Party, Internal)           │
└──────────────────────┬───────────────────────────────────┘
                       │
               GraphQL requests
                       │
┌──────────────────────▼───────────────────────────────────┐
│                     Router / Gateway                     │
│           (Apollo Router, Hive Gateway, etc.)            │
│                                                          │
│  • Query planning (split query across subgraphs)         │
│  • Schema composition validation                         │
│  • Auth token forwarding                                 │
│  • Response merging                                      │
│  • Rate limiting, caching, observability                 │
└────────┬───────────────┬───────────────┬─────────────────┘
         │               │               │
     HTTP/gRPC       HTTP/gRPC       HTTP/gRPC
         │               │               │
┌────────▼───────┐ ┌─────▼──────┐  ┌─────▼──────────┐
│ Users          │ │ Orders     │  │ Products       │
│ Subgraph       │ │ Subgraph   │  │ Subgraph       │
│                │ │            │  │                │
│ User type      │ │ Order      │  │ Product        │
│ Profile        │ │ OrderItem  │  │ Inventory      │
│ Auth           │ │ Payment    │  │ Pricing        │
└────────────────┘ └────────────┘  └────────────────┘
Defining Your Subgraph Schema
Users Subgraph
# users-subgraph/schema.graphql
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.6",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])
type Query {
  user(id: ID!): User
  me: User
  users(filter: UserFilter, pagination: Pagination): UserConnection!
}
type User @key(fields: "id") {
  id: ID!
  email: String!
  name: String!
  createdAt: DateTime!
  # Owned by users-subgraph. Other subgraphs can declare it @external and depend
  # on it via @requires; the router then fetches it from here before calling them.
  tier: UserTier!
}
enum UserTier {
  FREE
  PRO
  ENTERPRISE
}
Orders Subgraph — Extending the User Type
# orders-subgraph/schema.graphql
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.6",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])
type Query {
  order(id: ID!): Order
  orders(userId: ID!, pagination: Pagination): OrderConnection!
}
# Extend User with order-related fields, owned by the orders team.
# In Federation 2, @key fields do not need @external.
type User @key(fields: "id") {
  id: ID!
  orders(pagination: Pagination): OrderConnection!
  totalOrderValue: Money!
}
type Order @key(fields: "id") {
  id: ID!
  user: User!
  items: [OrderItem!]!
  status: OrderStatus!
  total: Money!
  createdAt: DateTime!
}
type OrderItem {
  product: Product!  # References Product type from products-subgraph
  quantity: Int!
  unitPrice: Money!
}
# Stub: products-subgraph owns this type
type Product @key(fields: "id", resolvable: false) {
  id: ID!
}
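The Product stub is marked resolvable: false, so orders-subgraph never resolves Product fields itself. It only returns a reference holding the @key fields, and the router fetches everything else from products-subgraph. The sketch below shows that resolver. It assumes a hypothetical OrderItemRow shape in which the orders service stores a productId column.

```typescript
// Hypothetical row shape for an order item as stored by the orders service
interface OrderItemRow {
  productId: string;
  quantity: number;
}

// All orders-subgraph knows about a Product is its key
interface ProductReference {
  __typename: "Product";
  id: string;
}

const orderItemResolvers = {
  OrderItem: {
    // Return only a reference (typename + @key fields); the router turns it
    // into an entity fetch against products-subgraph for name, price, etc.
    product: (item: OrderItemRow): ProductReference => ({
      __typename: "Product",
      id: item.productId,
    }),
  },
};
```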
Products Subgraph
# products-subgraph/schema.graphql
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.6",
        import: ["@key", "@shareable", "@external", "@requires", "@provides"])
type Query {
  product(id: ID!): Product
  products(filter: ProductFilter, pagination: Pagination): ProductConnection!
  searchProducts(query: String!, limit: Int = 20): [Product!]!
}
type Product @key(fields: "id") {
  id: ID!
  name: String!
  description: String
  price: Money!
  inventory: InventoryStatus!
  category: Category!
  images: [ProductImage!]!
}
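Take a query that selects OrderItem.product { name price }. The router gathers the references returned by orders-subgraph and sends them to products-subgraph in a single _entities(representations: ...) request. The subgraph library then calls the matching __resolveReference for each representation and returns the results in the same order. The sketch below simulates that dispatch in plain TypeScript to make the mechanism visible. The in-memory productTable and the priceCents field are hypothetical stand-ins, and this is not Apollo's actual implementation.

```typescript
interface Representation {
  __typename: string;
  id: string;
}

interface Product {
  id: string;
  name: string;
  priceCents: number; // hypothetical; the real schema uses Money
}

// Hypothetical in-memory stand-in for the products database
const productTable = new Map<string, Product>([
  ["p-1", { id: "p-1", name: "Keyboard", priceCents: 4999 }],
  ["p-2", { id: "p-2", name: "Mouse", priceCents: 1999 }],
]);

// One reference resolver per entity type this subgraph owns
const referenceResolvers: Record<string, (ref: Representation) => Product | null> = {
  Product: (ref) => productTable.get(ref.id) ?? null,
};

// Simulates answering _entities: one result per representation, in the same
// order, so the router can merge results back by position
function resolveEntities(representations: Representation[]): (Product | null)[] {
  return representations.map((rep) => {
    const resolve = referenceResolvers[rep.__typename];
    if (!resolve) throw new Error(`No reference resolver for ${rep.__typename}`);
    return resolve(rep);
  });
}
```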
The @requires Directive: Cross-Subgraph Field Dependencies
This is where federation gets powerful — and tricky:
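# shipping-subgraph/schema.graphql
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.6",
        import: ["@key", "@shareable", "@external", "@requires"])
# User has 'tier' from users-subgraph and 'totalOrderValue' from orders-subgraph.
# UserTier and Money must also be defined in this subgraph (omitted here).
type User @key(fields: "id") {
  id: ID!
  tier: UserTier! @external            # Comes from users-subgraph
  totalOrderValue: Money! @external    # Comes from orders-subgraph
  # @requires tells the router: "to resolve shippingDiscount, I need tier and totalOrderValue".
  # The router fetches those first, then passes them to this subgraph.
  # Money is an object type, so the field set must select its subfields.
  shippingDiscount: Money! @requires(fields: "tier totalOrderValue { amount currency }")
}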
The resolver in shipping-subgraph receives the required fields:
// shipping-subgraph/resolvers.ts
// Assumed shapes; in a real project these usually come from generated types
type UserTier = "FREE" | "PRO" | "ENTERPRISE";
interface Money { amount: number; currency: string }
const resolvers = {
  User: {
    shippingDiscount: (user: { id: string; tier: UserTier; totalOrderValue: Money }): Money => {
      // tier and totalOrderValue arrive in the entity representation; no additional calls needed
      if (user.tier === "ENTERPRISE") return { amount: 100, currency: "USD" };
      if (user.totalOrderValue.amount > 10000) return { amount: 50, currency: "USD" };
      return { amount: 0, currency: "USD" };
    }
  }
};
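Once the earlier fetches finish, the router merges the User key and the required fields into one entity representation and sends it to shipping-subgraph. The sketch below models that merge step for illustration only, since the router does this internally. The UserTier and Money shapes mirror the schema above, and buildShippingRepresentation is a hypothetical helper.

```typescript
type UserTier = "FREE" | "PRO" | "ENTERPRISE";
interface Money {
  amount: number;
  currency: string;
}

// What shipping-subgraph receives for one User: the key plus the @requires fields
interface ShippingUserRepresentation {
  __typename: "User";
  id: string;
  tier: UserTier;
  totalOrderValue: Money;
}

function buildShippingRepresentation(
  fromUsers: { id: string; tier: UserTier },                // fetched from users-subgraph
  fromOrders: { id: string; totalOrderValue: Money }        // fetched from orders-subgraph
): ShippingUserRepresentation {
  // Partial results are matched by entity key before they are combined
  if (fromUsers.id !== fromOrders.id) {
    throw new Error("Partial results belong to different User entities");
  }
  return {
    __typename: "User",
    id: fromUsers.id,
    tier: fromUsers.tier,
    totalOrderValue: fromOrders.totalOrderValue,
  };
}
```

Shipping-subgraph defines no __resolveReference for User. Apollo's subgraph library therefore hands this representation to shippingDiscount as its parent object, which is why the resolver can read tier and totalOrderValue directly.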
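⚠️ Warning: Each @requires adds a dependent fetch to the query plan, because the router must resolve the required fields in other subgraphs before it can call this one. Chained @requires fields add sequential round trips, so keep chains short.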
Query Planning and Performance
The router generates a query plan for every incoming request. Understanding query plans is essential for debugging performance.
Enabling Query Plan Inspection
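# router.yaml (development/staging: Sandbox requires introspection)
supergraph:
  listen: 0.0.0.0:4000
  introspection: true
sandbox:
  enabled: true    # Enables Apollo Sandbox UI
homepage:
  enabled: false   # Must be disabled when Sandbox is enabled
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
# Return the query plan in the response extensions when the client
# sends the header "Apollo-Expose-Query-Plan: true"
plugins:
  experimental.expose_query_plan: true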
# A query that will generate a complex plan:
query UserDashboard($userId: ID!) {
  user(id: $userId) {
    name                                 # → users-subgraph
    email                                # → users-subgraph
    tier                                 # → users-subgraph
    orders(pagination: { limit: 5 }) {   # → orders-subgraph
      nodes {
        id
        status
        items {
          quantity
          product {                      # → products-subgraph (entity lookup)
            name
            price { amount currency }
          }
        }
      }
    }
    shippingDiscount { amount currency } # → shipping-subgraph (requires tier + totalOrderValue)
  }
}
The router will:
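1. Fetch user.name, email, and tier (plus the id key) from users-subgraph.
2. After (1): use the User key to fetch user.orders from orders-subgraph, along with totalOrderValue, which the planner can request in the same fetch.
3. After (2): batch-fetch the Product entities from products-subgraph.
4. After (1) and (2): fetch shippingDiscount from shipping-subgraph, passing it tier and totalOrderValue. This can run in parallel with (3).
5. Merge all results.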
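Those fetches form a dependency graph. Orders-subgraph can only be queried once users-subgraph has returned the User key, and shipping-subgraph needs results from both. Latency is therefore bounded by the longest chain of dependent fetches, not by the number of subgraphs. The illustrative sketch below computes how many sequential rounds the UserDashboard plan needs. The helper planRounds is hypothetical, not a router API.

```typescript
// Each fetch lists the fetches whose results it needs first
type Plan = Record<string, string[]>;

// Returns the 1-based round in which each fetch can start;
// fetches in the same round can run in parallel
function planRounds(plan: Plan): Map<string, number> {
  const rounds = new Map<string, number>();
  const visiting = new Set<string>();
  const visit = (name: string): number => {
    const known = rounds.get(name);
    if (known !== undefined) return known;
    if (visiting.has(name)) throw new Error(`Cycle at ${name}`);
    visiting.add(name);
    const deps = plan[name] ?? [];
    // One round after the slowest dependency (round 1 if there are none)
    const round = 1 + Math.max(0, ...deps.map(visit));
    visiting.delete(name);
    rounds.set(name, round);
    return round;
  };
  Object.keys(plan).forEach(visit);
  return rounds;
}

const userDashboard: Plan = {
  users: [],
  orders: ["users"],             // needs the User key
  products: ["orders"],          // needs Product references from order items
  shipping: ["users", "orders"], // @requires tier + totalOrderValue
};
```

For this plan, users lands in round 1, orders in round 2, and products and shipping together in round 3. Another @requires field that depended on shippingDiscount would push the query to four sequential rounds.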
The N+1 Problem in Subgraphs
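When the router resolves a list of entities, it sends all of their representations to your subgraph in one _entities request. The subgraph library still calls __resolveReference once per representation, though. Without batching (for example with DataLoader), that means one database query per entity: the classic N+1 problem.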
// ❌ N+1: called once per product
const resolvers = {
  Product: {
    __resolveReference: async (reference: { id: string }) => {
      return await db.products.findById(reference.id);
    }
  }
};
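// ✅ DataLoader batching: all products in one call
import DataLoader from "dataloader";
// Create one loader per request (e.g. in your context function); a module-level
// loader would cache results across requests and across users
export function createProductLoader() {
  return new DataLoader(async (ids: readonly string[]) => {
    const products = await db.products.findByIds([...ids]);
    const byId = new Map(products.map(p => [p.id, p] as const));
    // DataLoader requires results in the same order as the input keys
    return ids.map(id => byId.get(id) ?? null);
  });
}
const resolvers = {
  Product: {
    __resolveReference: (
      reference: { id: string },
      context: { productLoader: ReturnType<typeof createProductLoader> }
    ) => context.productLoader.load(reference.id)
  }
};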
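DataLoader batches by deferring each load call until the current synchronous work has finished, then handing every collected key to the batch function at once. The sketch below strips that idea down, with no caching and no per-key errors, to show why N __resolveReference calls collapse into a single query. MiniLoader is a hypothetical name; in production you would use the real dataloader package.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<(V | null)[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V | null) => void;
  reject: (error: unknown) => void;
}

class MiniLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V | null> {
    return new Promise<V | null>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load() in a tick schedules one flush (as a microtask) for
      // every key queued before that microtask runs
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((p) => p.key));
      // Same contract as DataLoader: values[i] answers batch[i].key
      batch.forEach((p, i) => p.resolve(values[i] ?? null));
    } catch (error) {
      batch.forEach((p) => p.reject(error));
    }
  }
}
```

Three load calls made in the same tick reach batchFn as one call with all three keys. That single call is what turns N reference lookups into one findByIds query.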
Authentication and Authorization Patterns
Federation centralizes authentication (validating the token) at the router. Authorization, meaning what a given user may see or do, still lives in the subgraphs.
Router: JWT Validation and Header Forwarding
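# router.yaml
authentication:
  router:
    jwt:
      jwks:
        - url: https://your-auth-provider.com/.well-known/jwks.json
# Forward auth context to all subgraphs
headers:
  all:
    request:
      # Forward the already-validated token so subgraphs can re-verify it
      - propagate:
          named: authorization
# Do not propagate client-supplied identity headers such as x-user-id: the router
# forwards them verbatim, so any client could impersonate any user. Static header
# rules also cannot template JWT claims ("{context.jwt.claims.role}" would be sent
# literally), so set x-user-id and x-user-role from the verified claims in a
# Rhai script or coprocessor.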
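On the subgraph side, the forwarded headers become the resolver context. The sketch below assumes two things: the router delivers x-user-id and x-user-role headers derived from verified JWT claims, and subgraphs are reachable only through the router. buildAuthContext is a hypothetical helper.

```typescript
// The loose header type Node's IncomingHttpHeaders (and most frameworks) expose
type HeaderValue = string | string[] | undefined;

interface AuthContext {
  userId: string | null;
  userRole: string | null;
}

// Node lowercases incoming header names; normalize arrays and empty strings
function firstHeader(headers: Record<string, HeaderValue>, name: string): string | null {
  const value = headers[name.toLowerCase()];
  if (Array.isArray(value)) return value[0] ?? null;
  return value && value.length > 0 ? value : null;
}

// Missing headers mean an anonymous caller, never a default identity
function buildAuthContext(headers: Record<string, HeaderValue>): AuthContext {
  return {
    userId: firstHeader(headers, "x-user-id"),
    userRole: firstHeader(headers, "x-user-role"),
  };
}
```

With Apollo Server 4 you might call it from the context option, for example context: async ({ req }) => buildAuthContext(req.headers).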
Subgraph: Field-Level Authorization
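// In your subgraph resolvers
import { GraphQLError } from "graphql";
// Built from the x-user-id / x-user-role headers the router forwards
interface AuthContext {
  userId?: string;
  userRole?: string;
}
const resolvers = {
  Query: {
    adminReport: (_parent: unknown, _args: unknown, context: AuthContext) => {
      if (context.userRole !== "ADMIN") {
        throw new GraphQLError("Forbidden", {
          extensions: { code: "FORBIDDEN" }
        });
      }
      return generateReport();
    }
  },
  User: {
    email: (user: { id: string; email: string }, _args: unknown, context: AuthContext) => {
      // Users can see their own email; admins can see all
      if (context.userId !== user.id && context.userRole !== "ADMIN") {
        // Returning null requires a nullable `email: String` in the schema; with
        // `email: String!` as declared above, throw a FORBIDDEN error instead
        return null;
      }
      return user.email;
    }
  }
};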
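Inline checks like these get repetitive as the number of protected fields grows. One common refactoring is a small higher-order wrapper that runs the check before delegating to the real resolver. In the sketch below, requireRole and ForbiddenError are hypothetical names. ForbiddenError stands in for GraphQLError so the snippet has no dependencies.

```typescript
interface AuthContext {
  userId: string | null;
  userRole: string | null;
}

// Stand-in for GraphQLError carrying a FORBIDDEN extension code
class ForbiddenError extends Error {
  readonly extensions = { code: "FORBIDDEN" };
  constructor(message = "Forbidden") {
    super(message);
    this.name = "ForbiddenError";
  }
}

type Resolver<P, A, R> = (parent: P, args: A, context: AuthContext) => R;

// Wraps a resolver so it only runs when the caller has one of the allowed roles
function requireRole<P, A, R>(roles: readonly string[], resolve: Resolver<P, A, R>): Resolver<P, A, R> {
  return (parent, args, context) => {
    if (context.userRole === null || !roles.includes(context.userRole)) {
      throw new ForbiddenError();
    }
    return resolve(parent, args, context);
  };
}

const guardedResolvers = {
  Query: {
    // Hypothetical report payload, for illustration only
    adminReport: requireRole(["ADMIN"], () => ({ generatedAt: new Date().toISOString() })),
  },
};
```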
Schema Registry and CI/CD Integration
Rover CLI: Schema Checks in CI
# .github/workflows/schema-check.yml
name: GraphQL Schema Check
on:
  pull_request:
    paths:
      - "orders-subgraph/**"
jobs:
  schema-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Rover
        run: curl -sSL https://rover.apollo.dev/nix/latest | sh
      - name: Check Schema
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}
        run: |
          # Graph refs have the form <graph-id>@<variant>; "your-graph" is a placeholder
          ~/.rover/bin/rover subgraph check your-graph@production \
            --schema ./orders-subgraph/schema.graphql \
            --name orders
Rover will detect:
- Breaking changes to existing types/fields
- Composition errors (your schema can’t be merged with others)
- Whether a change would break operations that clients actually sent recently (operation checks against GraphOS usage data)
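At its core, the breaking-change part of a schema check compares the proposed schema with the published one, field by field. The toy sketch below does that comparison for output fields only, over a hypothetical flattened map from type to field to field type. Rover and GraphOS do far more, including argument and enum changes, composition, and checks against real client operations. Treat this as an illustration of the idea, not of how Rover works.

```typescript
// Hypothetical flattened view: type name -> field name -> field type string
type SchemaShape = Record<string, Record<string, string>>;

interface Change {
  kind: "FIELD_REMOVED" | "FIELD_TYPE_CHANGED";
  coordinate: string; // e.g. "Order.total"
  detail?: string;
}

function findBreakingOutputChanges(published: SchemaShape, proposed: SchemaShape): Change[] {
  const changes: Change[] = [];
  for (const [typeName, fields] of Object.entries(published)) {
    for (const [fieldName, fieldType] of Object.entries(fields)) {
      const next: string | undefined = proposed[typeName]?.[fieldName];
      const coordinate = `${typeName}.${fieldName}`;
      if (next === undefined) {
        changes.push({ kind: "FIELD_REMOVED", coordinate });
      } else if (next !== fieldType) {
        // Conservative: flags every type change, although tightening
        // nullability (String -> String!) is actually safe for clients
        changes.push({ kind: "FIELD_TYPE_CHANGED", coordinate, detail: `${fieldType} -> ${next}` });
      }
    }
  }
  return changes;
}
```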
Observability: Tracing Across Subgraphs
Every query span should show which subgraphs were called and how long each took:
# router.yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          graphql.operation.name:
            request_header: x-operation-name
      subgraph:
        attributes:
          subgraph.name: true
          subgraph.url: true
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://jaeger:4317
  apollo:
    # Send operation metrics to Apollo Studio (requires APOLLO_KEY and APOLLO_GRAPH_REF)
    client_name_header: apollographql-client-name
    client_version_header: apollographql-client-version
With this in place, your Jaeger/Grafana Tempo traces will show:
- Total query time
- Time in query planning
- Per-subgraph fetch times
- Entity batch sizes
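For Apollo Studio to break metrics down by client, each client has to identify itself with the apollographql-client-name and apollographql-client-version headers configured above. The sketch below is a minimal client-side request builder whose result can be passed to the standard fetch API. The buildRouterRequest helper and the sample client name and version are hypothetical. Sending operationName in the body also gives traces a stable name to group by.

```typescript
interface ClientInfo {
  name: string;    // e.g. "web-dashboard"
  version: string; // e.g. a release tag or git SHA
}

interface RouterRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildRouterRequest(
  query: string,
  operationName: string,
  variables: Record<string, unknown>,
  client: ClientInfo
): RouterRequest {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Headers the router reads to attribute usage to a client
      "apollographql-client-name": client.name,
      "apollographql-client-version": client.version,
    },
    body: JSON.stringify({ query, operationName, variables }),
  };
}
```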
Production Configuration: Apollo Router
# Full production router.yaml
supergraph:
  listen: 0.0.0.0:4000
  introspection: false   # Disable in production
limits:
  max_depth: 15
  max_aliases: 30
  max_root_fields: 20
traffic_shaping:
  router:
    timeout: 30s
  all:
    timeout: 10s
    experimental_retry:
      min_per_sec: 10
      ttl: 10s
      retry_mutations: false
cors:
  origins:
    - https://your-app.com
    - https://admin.your-app.com
response_cache:
  enabled: true
  subgraph:
    all:
      ttl: 60s
persisted_queries:
  enabled: true
  safelist:
    enabled: true      # Only allow pre-registered operations in production
    require_id: true   # Reject requests that send full query text
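With safelisting and require_id: true, clients stop sending query text and send only an operation ID. The sketch below shows one such request body. It assumes IDs are the SHA-256 hex digest of the exact operation string, which is what automatic persisted queries use and a common choice for persisted query manifests; check how your own manifest generates IDs. It uses Node's built-in crypto module, and persistedQueryBody is a hypothetical helper.

```typescript
import { createHash } from "node:crypto";

interface PersistedQueryBody {
  operationName: string;
  variables: Record<string, unknown>;
  extensions: { persistedQuery: { version: 1; sha256Hash: string } };
}

// Assumption: an operation's ID is the SHA-256 hex digest of its exact text
function operationId(operation: string): string {
  return createHash("sha256").update(operation, "utf8").digest("hex");
}

// The body carries only the ID; the router looks the operation up in its safelist
function persistedQueryBody(
  operation: string,
  operationName: string,
  variables: Record<string, unknown> = {}
): PersistedQueryBody {
  return {
    operationName,
    variables,
    extensions: { persistedQuery: { version: 1, sha256Hash: operationId(operation) } },
  };
}
```

The hash covers the exact text, so the client must send byte-for-byte the same operation that was registered. Generating the manifest from the client's own build output avoids that drift.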
Key Takeaways
- Federation 2 + Apollo Router is the mature production stack in 2026; self-hosted with Hive Gateway is a strong alternative
- @key and @requires are the core directives; understand their performance implications
- DataLoader is non-negotiable in entity resolvers; N+1 will destroy performance
- Auth belongs in two layers: validation at the router, authorization in subgraphs
- Rover CI integration catches breaking changes before they reach production
- Persisted queries provide both performance (smaller payloads) and security (only known operations allowed)
- Query plan inspection is your primary debugging tool for performance issues
The teams running Federation successfully in 2026 treat schema changes with the same rigor as database migrations — versioned, reviewed, and never broken without a migration path for consumers.
