
Scaling Socket.IO Past 100k Connections With the Redis Adapter

What it actually takes to get a Socket.IO cluster past 100,000 concurrent connections in 2026: sharded Redis adapter, namespace partitioning, and the bottlenecks nobody warns you about.

A single Node.js Socket.IO process tops out around 30k–40k connections. Everything past that is architecture, not configuration.

Why does Socket.IO need special help past 100k?

flowchart TD
  Client[Browser client] --> NLB[AWS NLB - no sticky sessions]
  NLB --> Calls[Socket.IO pods - /calls namespace]
  NLB --> Dash[Socket.IO pods - /dashboard namespace]
  Calls <-->|sharded pub/sub| Redis[(3-shard ElastiCache Redis)]
  Dash <-->|sharded pub/sub| Redis
CallSphere reference architecture

Because Socket.IO is not just a WebSocket library — it is an event protocol with rooms, namespaces, and reconnection semantics. The naive scaling story ("just add more pods") breaks because each pod only knows about the clients connected to itself. When agent-7 emits to a room that includes agent-12, but agent-12 lives on a different pod, the message is silently dropped unless your cluster has a shared message bus.

The bus is what the Redis adapter provides. With it, every io.to("call:abc").emit(...) becomes a Redis publish that every Socket.IO process subscribes to and locally fans out. Without it, your room broadcasts work in dev and silently fail in prod.
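
To make that concrete, here is a minimal sketch of the room pattern the bus has to carry; the subscribe event name and the pushTranscript helper are illustrative, and io is assumed to be a server already wired to the adapter (bootstrapped later in this article).

import { Server } from "socket.io";

// Assumed: a Server instance already using the Redis adapter.
declare const io: Server;

// Pod A: a client joins the room for the call it is watching.
io.of("/calls").on("connection", (socket) => {
  socket.on("subscribe", ({ callId }: { callId: string }) => {
    socket.join(`call:${callId}`);
  });
});

// Pod B: one emit becomes one Redis publish; every pod holding members
// of call:<id> fans it out to its local sockets.
function pushTranscript(callId: string, text: string) {
  io.of("/calls").to(`call:${callId}`).emit("transcript", { callId, text });
}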

How does the sharded Redis adapter work?

The 2026 best practice is the sharded Redis adapter, which uses Redis 7's sharded pub/sub feature. Standard Redis pub/sub is a single keyspace — every publish replicates to every subscriber on every shard, which makes the broker the bottleneck around 60k–80k connections per cluster.


Sharded pub/sub partitions channels by hash slot, so each Redis shard only carries traffic for the rooms it owns. We see linear scaling to 500k+ connections across an 8-shard Redis cluster because the broker no longer fans out globally — it fans out to the shard owners only.
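
You can see the primitive the sharded adapter builds on directly with node-redis (4.6 or newer). This is a minimal, illustrative sketch, not something you write yourself in a Socket.IO app: against a cluster, the channel's hash slot decides which shard carries the traffic, and the adapter issues these calls for you.

import { createClient } from "redis";

// SSUBSCRIBE puts a connection into subscriber mode, so publish on a
// duplicate. REDIS_URL is a placeholder.
const sub = createClient({ url: process.env.REDIS_URL });
const pub = sub.duplicate();
await Promise.all([sub.connect(), pub.connect()]);

// Sharded channels are scoped to the shard that owns the channel's hash
// slot; classic PUBLISH is replicated to every node in the cluster.
await sub.sSubscribe("call:abc", (message, channel) => {
  console.log(`${channel}: ${message}`);
});
await pub.sPublish("call:abc", "transcript chunk");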

Add namespace partitioning on top. A namespace in Socket.IO is an isolated event channel; you can put your dashboard namespace on one fleet of pods and your call-streaming namespace on another, with completely separate Redis clusters. This bounds the blast radius and lets you scale the hot path independently.
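
A minimal sketch of that split, assuming a hypothetical SOCKET_FLEET environment variable, per-fleet Redis URLs, and placeholder handlers; clients for each namespace are pointed at that fleet's hostname.

import { Server } from "socket.io";
import { createShardedAdapter } from "@socket.io/redis-adapter";
import { createCluster } from "redis";

// Hypothetical: each fleet runs the same image but owns one namespace
// and one Redis cluster.
const FLEET = process.env.SOCKET_FLEET ?? "dashboard"; // "calls" | "dashboard"
const redisUrl =
  (FLEET === "calls" ? process.env.CALLS_REDIS_URL : process.env.DASHBOARD_REDIS_URL) ??
  "redis://localhost:6379";

const pub = createCluster({ rootNodes: [{ url: redisUrl }] });
const sub = pub.duplicate();
await Promise.all([pub.connect(), sub.connect()]);

const io = new Server(3000, { transports: ["websocket"] }); // placeholder port
io.adapter(createShardedAdapter(pub, sub));

// Register only the namespace this fleet serves; the other namespace
// lives on the other fleet with its own Redis cluster.
if (FLEET === "calls") {
  io.of("/calls").on("connection", (socket) => {
    /* call-streaming handlers */
  });
} else {
  io.of("/dashboard").on("connection", (socket) => {
    /* dashboard handlers */
  });
}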

CallSphere's implementation

CallSphere runs Socket.IO across two surfaces: the Sales Calling dashboard (used by 37 agents and their managers) and the After-hours operator console. We use the sharded Redis adapter on a 3-shard ElastiCache cluster, two Socket.IO namespaces (/calls and /dashboard), and a NestJS gateway behind an AWS NLB with no sticky sessions.

Why no sticky sessions? Because the Redis adapter handles cross-pod broadcast, every connection can land on any pod, and the load balancer stays simple. We provisioned for 50k concurrent dashboard connections per region; the actual peak across six verticals is closer to 12k, which leaves plenty of headroom for the traffic we expect from the affiliate program announced last quarter.

Code: bootstrapping the sharded adapter

import { createServer } from "node:http";
import { Server } from "socket.io";
import { createShardedAdapter } from "@socket.io/redis-adapter";
import { createCluster } from "redis";

// Seed nodes for the Redis cluster, e.g. the ElastiCache configuration
// endpoint. REDIS_URL is a placeholder.
const REDIS_NODES = [{ url: process.env.REDIS_URL ?? "redis://localhost:6379" }];
const httpServer = createServer();

// Two cluster connections: one publishes, the duplicate stays in
// subscriber mode for the adapter.
const pub = createCluster({ rootNodes: REDIS_NODES });
const sub = pub.duplicate();
await Promise.all([pub.connect(), sub.connect()]);

// WebSocket-only transport; the sharded adapter routes room broadcasts
// through Redis 7 sharded pub/sub (SPUBLISH / SSUBSCRIBE).
const io = new Server(httpServer, { transports: ["websocket"] });
io.adapter(createShardedAdapter(pub, sub));

io.of("/calls").on("connection", (socket) => {
  socket.on("join", ({ callId }) => socket.join(`call:${callId}`));
});

httpServer.listen(3000);

Build steps

  1. Pin Socket.IO 4.x and @socket.io/redis-adapter 8.x or newer; older versions do not support sharded pub/sub.
  2. Provision Redis 7+ in cluster mode. ElastiCache, Memorystore, and Upstash all support it.
  3. Disable long-polling by setting transports: ["websocket"] on the server. Long-polling triples your CPU at scale.
  4. Drop sticky sessions. With the adapter you do not need them.
  5. Set per-pod connection limits (we use 25k) and let the orchestrator scale horizontally above that.
  6. Monitor a socket_io_connected Prometheus gauge and redis_pubsub_channels to catch broker pressure; a sketch of the gauge and the per-pod cap from step 5 follows this list.
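
A minimal sketch of steps 5 and 6, assuming prom-client, a hypothetical MAX_CONNECTIONS_PER_POD of 25,000, and the io server from the bootstrap example; the gauge name mirrors the metric mentioned above.

import { Server } from "socket.io";
import { Gauge, collectDefaultMetrics } from "prom-client";

declare const io: Server; // the server from the bootstrap example

collectDefaultMetrics();

// Step 6: expose the connection count (serve the prom-client registry
// from your /metrics endpoint).
const connected = new Gauge({
  name: "socket_io_connected",
  help: "Open Socket.IO connections on this pod",
});
setInterval(() => connected.set(io.engine.clientsCount), 5_000);

// Step 5: refuse new connections above the per-pod cap and let the
// orchestrator add pods instead.
const MAX_CONNECTIONS_PER_POD = 25_000;
io.of("/calls").use((socket, next) => {
  if (io.engine.clientsCount >= MAX_CONNECTIONS_PER_POD) {
    return next(new Error("pod at capacity"));
  }
  next();
});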

FAQ

Do I still need sticky sessions for upgrade? Only if you allow long-polling fallback. Pure WebSocket transport opens once and stays — sticky is unnecessary.
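
On the client side that means disabling the polling fallback too; a minimal sketch, with a placeholder URL.

import { io } from "socket.io-client";

// WebSocket from the first request: no HTTP long-polling handshake, so
// any pod behind the load balancer can take the connection.
const socket = io("https://ws.example.com/calls", {
  transports: ["websocket"],
});
socket.on("connect", () => console.log("connected", socket.id));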


What is the per-pod ceiling? With uWebSockets.js (uws) as the WebSocket engine, ~50k. With the default engine, ~30k. RAM, not CPU, is usually the bound; 8 KB per connection is realistic.
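
If you go the uWebSockets.js route, Socket.IO can bind to a uWS app instead of Node's http server; a minimal sketch, with the port and the ready event as placeholders.

import uWS from "uWebSockets.js";
import { Server } from "socket.io";

// Bind Socket.IO to a uWebSockets.js app instead of Node's http server.
const app = uWS.App();
const io = new Server({ transports: ["websocket"] });
io.attachApp(app);

io.of("/calls").on("connection", (socket) => {
  socket.emit("ready");
});

app.listen(3000, (token) => {
  if (!token) {
    console.error("failed to bind port 3000");
  }
});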

Can I use a single Redis instance? Up to about 60k connections, yes. Beyond that, switch to clustered/sharded pub/sub.
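
For that single-instance tier, the classic (non-sharded) adapter with a standalone client is enough; a minimal sketch, with REDIS_URL as a placeholder.

import { createServer } from "node:http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";

// One standalone Redis instance; swap to createCluster plus the sharded
// adapter once you approach the ~60k ceiling.
const pub = createClient({ url: process.env.REDIS_URL });
const sub = pub.duplicate();
await Promise.all([pub.connect(), sub.connect()]);

const io = new Server(createServer(), { transports: ["websocket"] });
io.adapter(createAdapter(pub, sub));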

How do I rate limit per user? Socket.IO exposes a connection middleware (io.use); record user_id → connection_count in Redis and reject above a threshold, as in the sketch below.
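
A minimal sketch, assuming a node-redis client named counters, a userId verified during the handshake, and a hypothetical cap of 5 connections per user.

import { Server } from "socket.io";
import { createClient } from "redis";

declare const io: Server; // the server from the bootstrap example

const counters = createClient({ url: process.env.REDIS_URL });
await counters.connect();

const MAX_CONNECTIONS_PER_USER = 5;

io.of("/dashboard").use(async (socket, next) => {
  const userId = socket.handshake.auth?.userId as string | undefined;
  if (!userId) return next(new Error("unauthenticated"));

  const key = `conn_count:${userId}`;
  const count = await counters.incr(key);
  if (count > MAX_CONNECTIONS_PER_USER) {
    await counters.decr(key);
    return next(new Error("connection limit reached"));
  }
  // Release the slot when this socket goes away.
  socket.on("disconnect", () => void counters.decr(key));
  next();
});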

What about cross-region? Use a separate Redis cluster per region with a federation layer (NATS or Kafka) bridging publishes you actually need cross-region. Do not stretch one Redis cluster across regions.
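
A sketch of that bridge, assuming NATS (nats.js v2), a hypothetical calls.* subject scheme, and call IDs without dots; each region's Socket.IO server only sees its own Redis cluster, and the bridge re-emits remote publishes into local rooms.

import { connect, StringCodec } from "nats";
import { Server } from "socket.io";

declare const io: Server; // this region's Socket.IO server

const sc = StringCodec();
const nc = await connect({ servers: process.env.NATS_URL });

// Outbound: forward only the events that must cross regions.
function bridgeCallEvent(callId: string, payload: unknown) {
  nc.publish(`calls.${callId}`, sc.encode(JSON.stringify(payload)));
}

// Inbound: replay remote publishes into this region's local rooms.
const sub = nc.subscribe("calls.>");
(async () => {
  for await (const msg of sub) {
    const callId = msg.subject.split(".")[1];
    const payload = JSON.parse(sc.decode(msg.data));
    io.of("/calls").to(`call:${callId}`).emit("remote-update", payload);
  }
})();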

CallSphere supports 115+ database tables and 90+ tools across six verticals — the dashboard fan-out is one piece of that. Start the 14-day trial at $149/$499/$1499.
