The Problem: Real-Time Vitals at Clinical Scale
When a healthcare client approached us with a deceptively simple request — "we need our clinical staff to see patient vitals update in real time" — the complexity revealed itself in the constraints. Their nursing stations monitored anywhere from 8 to 40 patients simultaneously. Vitals data (heart rate, SpO2, blood pressure, respiratory rate, temperature) streamed from bedside IoT devices every 1–3 seconds. The requirements were unambiguous:
- Sub-second latency from device reading to dashboard update for critical vitals
- HIPAA compliance across every layer of the data pipeline, including the WebSocket connections themselves
- Variable concurrent connections ranging from 50 during night shifts to 800+ during peak hours across multiple hospital units
- Zero tolerance for missed critical alerts — if a patient's heart rate spikes, every subscribed clinician must see it immediately
- Cost efficiency that scaled with actual usage, not provisioned capacity sitting idle at 3 AM
The client had initially prototyped with polling — a 5-second interval hitting a REST endpoint. At 40 patients per dashboard, that meant 480 HTTP requests per minute per clinician session. The approach was burning through API Gateway request quotas, hammering the database, and still delivering data that was 5 seconds stale. For a code blue scenario, 5 seconds is an eternity.
Architecture Overview
We landed on a fully serverless WebSocket architecture built on AWS. Here is how data flows from bedside device to clinician dashboard:
Ingestion Layer: Bedside IoT devices publish vitals readings via MQTT to AWS IoT Core. An IoT Rule routes messages into an Amazon Kinesis Data Stream with per-patient partition keys, guaranteeing ordered delivery within each patient's data stream.
Processing Layer: A Lambda function consumes from Kinesis with an event source mapping (batch size of 10, 1-second maximum batching window). This function performs threshold evaluation against per-patient alert configurations, transforms raw device payloads into the dashboard's vitals schema, and fans out to connected WebSocket clients.
Connection Layer: AWS API Gateway WebSocket API handles the persistent connections. Three Lambda route handlers manage the lifecycle: $connect for authentication and connection registration, $disconnect for cleanup, and sendMessage for client-to-server communication (subscription management, acknowledgments). DynamoDB stores connection state with a composite key design.
Delivery Layer: The Kinesis consumer Lambda queries DynamoDB for all connections subscribed to the relevant patient, then calls the API Gateway Management API's postToConnection to push data to each client.
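The delivery step above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the production handler: the GSI name (`patientId-index`) and the injected clients (`subs_table` as a DynamoDB Table resource, `apigw` as an ApiGatewayManagementApi client) are illustrative names, not the client's actual identifiers.

```python
import json

def fan_out(patient_id, vitals_payload, subs_table, apigw):
    """Push one patient's update to every subscribed WebSocket connection.

    Queries the subscription table's patient GSI, then posts the payload to
    each connection. A 410 (GoneException) means the client vanished without
    a clean $disconnect, so the stale subscription record is deleted.
    """
    resp = subs_table.query(
        IndexName="patientId-index",  # assumed GSI name
        KeyConditionExpression="patientId = :p",
        ExpressionAttributeValues={":p": patient_id},
    )
    data = json.dumps(vitals_payload).encode()
    delivered = 0
    for item in resp.get("Items", []):
        try:
            apigw.post_to_connection(ConnectionId=item["connectionId"], Data=data)
            delivered += 1
        except apigw.exceptions.GoneException:
            # Connection is gone; clean up so future fan-outs skip it
            subs_table.delete_item(Key={"connectionId": item["connectionId"],
                                        "patientId": patient_id})
    return delivered
```

Injecting the two clients keeps the sketch self-contained; in the real Lambda they would be module-level boto3 clients reused across warm invocations.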
This architecture eliminated every single server we would have needed to manage. No EC2 instances, no ECS tasks, no NAT gateways for a container cluster, no Auto Scaling Groups to tune.
Technical Challenges and Solutions
Connection State Management at Scale
The DynamoDB table design was critical. The connections table used connectionId as the partition key for direct lookups during disconnect cleanup, plus a GSI with patientId as the partition key and connectionId as the sort key for fan-out queries. A second GSI keyed on userId supported administrative features like "disconnect all sessions for this user."
Each connection record stored:
- connectionId (from API Gateway)
- userId (from the authenticated JWT)
- subscribedPatients (a string set of patient IDs this connection is monitoring)
- connectedAt (ISO timestamp)
- lastPingAt (for stale connection detection)
- unitId (hospital unit, used for bulk subscription patterns)
We initially made the mistake of storing subscriptions as a single record per connection with a set attribute for patient IDs. Fan-out queries required scanning every connection and filtering by patient. We restructured to a separate subscription table — one item per connection-patient pair — which turned fan-out into a simple query on the patient GSI. This dropped our P99 fan-out query time from 180ms to 8ms.
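The restructured design can be expressed as boto3-style create_table parameters. The table name and index name here are assumptions for illustration; the shape is the point: one item per connection-patient pair, a composite primary key for disconnect cleanup, and the patient GSI that makes fan-out a single query.

```python
# Illustrative schema for the per-pair subscription table. Direct lookups by
# connectionId serve disconnect cleanup; the GSI on patientId serves fan-out.
subscription_table = {
    "TableName": "vitals-subscriptions",  # assumed name
    "KeySchema": [
        {"AttributeName": "connectionId", "KeyType": "HASH"},
        {"AttributeName": "patientId", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "connectionId", "AttributeType": "S"},
        {"AttributeName": "patientId", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [{
        "IndexName": "patientId-index",  # assumed name
        "KeySchema": [
            {"AttributeName": "patientId", "KeyType": "HASH"},
            {"AttributeName": "connectionId", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand, per the cost discussion later
}
```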
HIPAA-Compliant WebSocket Authentication
Standard WebSocket connections do not support custom headers during the handshake in browser environments. You cannot send an Authorization header from the browser's WebSocket constructor. This is a well-known limitation that makes WebSocket auth genuinely tricky in healthcare contexts where you cannot cut corners.
Our solution used the $connect route's query string parameters. The client included a short-lived, single-use connection token as a query parameter: wss://ws.example.com?token=<jwt>. The $connect Lambda handler validated this token against Cognito, verified the user's role included patient-monitoring permissions, checked that the token had not been used before (a DynamoDB conditional put on a token-use table with a 5-minute TTL), and only then allowed the connection to establish. If validation failed, the Lambda returned a 401, and API Gateway rejected the upgrade.
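The single-use check is the interesting part: a conditional put fails atomically if the token was already recorded, which is what defeats replay. A minimal sketch, assuming a DynamoDB client and a hypothetical `ws-connect-tokens` table with TTL enabled on `expiresAt`:

```python
import time

def mark_token_used(ddb, token_id):
    """Record first use of a connect token; returns False on replay.

    The conditional put succeeds only if the token has never been seen.
    The expiresAt TTL attribute evicts the record after 5 minutes, matching
    the token's own lifetime. Table name is an assumption.
    """
    try:
        ddb.put_item(
            TableName="ws-connect-tokens",
            Item={"tokenId": {"S": token_id},
                  "expiresAt": {"N": str(int(time.time()) + 300)}},
            ConditionExpression="attribute_not_exists(tokenId)",
        )
        return True
    except ddb.exceptions.ConditionalCheckFailedException:
        return False
```

The $connect handler would call this after JWT signature and role validation, and reject the connection when it returns False.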
All WebSocket connections were encrypted via WSS (TLS), and we enabled API Gateway's access logging to CloudWatch with a custom log format that captured connection IDs, source IPs, and route keys — but explicitly excluded message payloads to prevent PHI from landing in logs. The DynamoDB tables holding connection state were encrypted at rest with customer-managed KMS keys, and the Kinesis stream used server-side encryption.
Graceful Reconnection Handling
Network interruptions are inevitable in hospital environments. Wi-Fi handoffs between access points, momentary network congestion, device sleep states — connections drop. Our reconnection strategy had to be invisible to clinical staff.
The client-side implementation used an exponential backoff with jitter, starting at 500ms and capping at 30 seconds. On reconnect, the client sent a resubscribe message containing its previous subscription set and the timestamp of the last received update. The server-side handler restored all subscriptions atomically and performed a "catch-up" query against a DynamoDB vitals-cache table (TTL of 5 minutes) to push any readings the client missed during the disconnection window.
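The delay schedule can be sketched in a few lines. The article's policy specifies backoff with jitter starting at 500ms and capping at 30 seconds; the "full jitter" variant shown here (uniform random over the whole window) is one common choice and an assumption on our part:

```python
import random

def reconnect_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: a random delay between 0 and
    min(cap, base * 2**attempt) seconds. attempt counts from 0."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Jitter matters here: after a Wi-Fi access point blip, dozens of dashboards reconnect at once, and uniform randomization spreads that thundering herd across the window.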
We also implemented a server-side heartbeat. A scheduled Lambda ran every 60 seconds, querying for connections where lastPingAt was older than 90 seconds. For each stale connection, it attempted a zero-byte postToConnection call. If API Gateway returned a 410 (Gone), we cleaned up the connection record. This prevented phantom connections from accumulating and wasting fan-out cycles.
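The sweep's selection step is simple enough to show directly. A minimal sketch, assuming connection records carry `lastPingAt` as epoch seconds (the production table stored timestamps; the exact format here is an assumption):

```python
def stale_connections(records, now, threshold=90):
    """Select connections whose last ping is older than the 90-second
    threshold. The scheduled sweeper then probes each with a
    postToConnection call and deletes the record on a 410 Gone."""
    return [r["connectionId"] for r in records
            if now - r["lastPingAt"] > threshold]
```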
Lambda Cold Starts on WebSocket Routes
Cold starts on the $connect route directly impacted the user experience — a clinician opening their dashboard would stare at a "Connecting..." indicator for 3–5 seconds during a cold start. Unacceptable.
We used provisioned concurrency on the $connect handler (set to 5, enough to absorb our peak rate of new connection attempts). The $disconnect and message handlers were less latency-sensitive, so we left them with on-demand scaling. For the Kinesis consumer Lambda — the most critical path — we allocated 512MB of memory (which proportionally increases CPU) and kept the initialization code minimal: DynamoDB client instantiation only, no database connection pools, no heavy SDK imports.
We also restructured the Kinesis consumer to lazy-initialize the API Gateway Management API client on first invocation rather than at module load. Combined with the Kinesis event source mapping's 1-second batching window, this meant the consumer Lambda stayed warm continuously during active monitoring periods.
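The lazy-initialization pattern itself is small. This sketch genericizes it with an injected factory so it stays self-contained; in the actual consumer the factory would be the `boto3.client("apigatewaymanagementapi", ...)` constructor:

```python
_client = None  # module scope: survives across warm Lambda invocations

def get_client(factory):
    """Create the expensive client on first use, then reuse the cached
    instance. Module-load time stays minimal because nothing heavy runs
    until the first invocation actually needs the client."""
    global _client
    if _client is None:
        _client = factory()
    return _client
```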
Cost Optimization for High-Frequency Data
Naively fanning out every single vitals reading to every connected client would have been expensive. At 800 connections, 200 patients, and one reading per second per patient, that is at least 200 postToConnection calls per second just for fan-out — roughly 518 million API Gateway management API calls per month.
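The arithmetic behind that estimate, assuming a 30-day month:

```python
# Naive fan-out volume: one management-API call per reading, at minimum
readings_per_second = 200               # 200 patients x 1 reading per second
seconds_per_month = 60 * 60 * 24 * 30   # 30-day month
monthly_calls = readings_per_second * seconds_per_month
# 518,400,000 -- roughly 518 million calls per month
```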
We implemented three optimizations:
Client-side throttling: The dashboard refreshed non-critical vitals (temperature, blood pressure) at most once per second. The server tagged each message with a priority level, and the client discarded low-priority updates that arrived within the rendering window.
Server-side batching: Instead of sending one message per vital per patient, the Kinesis consumer aggregated readings within its batch window and sent a single payload per patient containing all vitals from that window. This reduced fan-out calls by roughly 60%.
Differential updates: After the initial full-state push on subscription, subsequent messages contained only changed values. A heart rate holding steady at 72 BPM did not generate a message. This cut message volume by another 40% during stable patient states.
After these optimizations, our actual API Gateway management API usage landed at approximately 45 million calls per month during peak periods.
Performance Results
After load testing with simulated device data and concurrent WebSocket connections:
- Median end-to-end latency (device reading to dashboard render): 380ms
- P95 latency: 620ms
- P99 latency: 940ms (within the sub-second requirement)
- Maximum sustained concurrent connections: 1,200 (tested), with API Gateway supporting up to 500,000
- Reconnection recovery time: 1.2 seconds median (including catch-up data delivery)
- Monthly cost at 800 peak connections: approximately $285/month for the entire WebSocket infrastructure (API Gateway, Lambda, DynamoDB on-demand)
- Cost per connection-hour: roughly $0.012
For comparison, the equivalent EC2-based deployment (two c5.xlarge instances in an Auto Scaling Group behind a Network Load Balancer, running a Node.js Socket.io server) would have cost approximately $380/month — but that is a floor cost paid regardless of usage. At 3 AM with 50 connections, the serverless cost dropped to near zero. The EC2 cost stayed at $380.
Lessons Learned
What we would do differently: We should have implemented the per-patient subscription table design from day one instead of iterating into it. The initial "subscriptions as a set attribute" approach worked fine in development with 5 connections but fell apart immediately under load testing. Data modeling for access patterns first — a core DynamoDB principle — would have saved us two days of restructuring.
We also underestimated the importance of structured observability. Adding correlation IDs that traced a single vitals reading from IoT Core through Kinesis, through the consumer Lambda, through the fan-out calls, and into client-side receipt acknowledgments was retrofitted and painful. Bake this in from the start.
What worked surprisingly well: API Gateway WebSocket APIs were more robust than we expected. We had concerns about connection limits, message size constraints (128KB turned out to be plenty for vitals data), and idle connection timeouts (a periodic keep-alive ping kept sessions well inside the 10-minute idle limit, so long-running dashboards never dropped). The managed service handled TLS termination, connection tracking, and route dispatch without a single operational issue across three months of production use.
DynamoDB on-demand capacity mode was also a perfect fit. The access patterns were spiky and unpredictable — shift changes caused connection surges, quiet periods dropped to near-zero. We never had to think about capacity planning.
Why Serverless Was the Right Call
The temptation with WebSockets is to reach for what you know: an EC2 instance running Socket.io or a containerized service on ECS. We evaluated that path seriously. Here is why we chose against it:
Operational overhead: A stateful WebSocket server requires sticky sessions, health checks that understand connection draining, graceful shutdown handling during deployments, and careful memory management as connection counts grow. With API Gateway WebSocket APIs, AWS manages all of that.
Scaling granularity: EC2 Auto Scaling operates on instance granularity. You are either running a whole server or you are not. Serverless scales per-connection and per-message. For a workload that fluctuates 16x between night shift and peak hours, this granularity matters financially.
HIPAA compliance surface area: Every self-managed server is a server you need to patch, harden, audit, and include in your BAA scope. Replacing those servers with managed services (API Gateway, Lambda, DynamoDB — all HIPAA-eligible) dramatically reduced the compliance burden.
Deployment confidence: Deploying a new version of a stateful WebSocket server requires connection draining, rolling updates, and careful orchestration to avoid dropping active clinical sessions. Deploying a new Lambda function version is atomic and near-instant, and because the connections themselves live in API Gateway rather than in our code, a deploy never drops an active session.
The serverless approach was not without friction — the DynamoDB connection management layer is code we would not have written with Socket.io's built-in rooms, and the fan-out pattern required deliberate optimization. But the operational simplicity and cost profile made it the clear winner for a healthcare application where reliability is non-negotiable and usage patterns are inherently variable.