Scalable Chat

An open-source chat application that demonstrates the stateful routing pattern for building scalable real-time applications. Live demo

This project is meant to serve as a base or reference for anyone building a collaborative webapp, multiplayer game, stateful AI agent, or other session-based real-time application.

Features

Real-time chat rooms

Clients create new chat rooms or join existing ones by ID. Messages are exchanged in real time over WebSocket connections. Authentication is simple and username-based.

Horizontal scalability

Clients belonging to the same room always connect to the same WebSocket server instance. Additional WebSocket servers are created as necessary. No message broker or communication between WebSocket servers is required.

Room concurrency

Each WebSocket server can handle multiple room sessions concurrently. Default limits are set to 100 users per room and 10 rooms per server instance, with unlimited server instances.
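
As a rough illustration of how such limits could be enforced, the sketch below tracks rooms and users in memory and rejects joins beyond capacity. The two constants mirror the defaults stated above; the `RoomRegistry` class, its method names, and the `JoinResult` values are hypothetical and are not taken from the project's session server.

```typescript
// Hypothetical sketch: enforcing per-room and per-server limits in memory.
const MAX_USERS_PER_ROOM = 100; // default stated in the text above
const MAX_ROOMS_PER_SERVER = 10; // default stated in the text above

type JoinResult = "ok" | "room_full" | "server_full";

class RoomRegistry {
  // roomId -> set of connected user IDs
  private rooms = new Map<string, Set<string>>();

  join(roomId: string, userId: string): JoinResult {
    let users = this.rooms.get(roomId);
    if (users === undefined) {
      // A new room counts against the per-server room limit.
      if (this.rooms.size >= MAX_ROOMS_PER_SERVER) return "server_full";
      users = new Set<string>();
      this.rooms.set(roomId, users);
    }
    // Re-joining an existing member never counts as exceeding the limit.
    if (!users.has(userId) && users.size >= MAX_USERS_PER_ROOM) return "room_full";
    users.add(userId);
    return "ok";
  }

  leave(roomId: string, userId: string): void {
    const users = this.rooms.get(roomId);
    if (users === undefined) return;
    users.delete(userId);
    // Free the room slot once the last user disconnects.
    if (users.size === 0) this.rooms.delete(roomId);
  }
}
```

In this sketch a room slot is released as soon as its last user leaves, which is one possible policy among several.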

Persistence (not included)

For the purposes of this sample application, messages are not persisted beyond the lifetime of the WebSocket server. To support persistence, the session server would need to save room data to storage (e.g. S3 or Redis) and hydrate it from there.
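
One way the save and hydrate steps might look is sketched below. Everything here is an assumption for illustration: the `ChatMessage` shape, the `RoomStore` interface, and the in-memory stand-in are not part of this project, and a real implementation would replace the map with calls to an actual storage service.

```typescript
// Hypothetical sketch: a storage abstraction a session server could use.
interface ChatMessage {
  userId: string;
  text: string;
  sentAt: number; // Unix epoch milliseconds
}

interface RoomStore {
  save(roomId: string, messages: ChatMessage[]): Promise<void>;
  load(roomId: string): Promise<ChatMessage[]>;
}

// In-memory stand-in; a real implementation would talk to S3, Redis, or similar.
class InMemoryRoomStore implements RoomStore {
  private data = new Map<string, string>();

  async save(roomId: string, messages: ChatMessage[]): Promise<void> {
    // Serialize so the stored value is decoupled from live room state.
    this.data.set(roomId, JSON.stringify(messages));
  }

  async load(roomId: string): Promise<ChatMessage[]> {
    const raw = this.data.get(roomId);
    // An unknown room hydrates as an empty history.
    return raw === undefined ? [] : (JSON.parse(raw) as ChatMessage[]);
  }
}
```

A session server using this pattern would call `load` when a room is first assigned to it and `save` periodically or when the room closes.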

Architecture

Overview

This project consists of three main components:

- Client - React single-page application
- Backend Server - Express.js API server for authentication and room management
  - Scheduler - Module inside the Backend Server for interfacing with the Session Server. This project includes two Scheduler implementations:
    1. StaticScheduler for statically defined session server instances (e.g. for local development)
    2. HathoraScheduler for dynamically created session server instances running on Hathora Cloud
- Session Server - Node.js WebSocket server for real-time chat functionality

Deployment Topology

Client - Collection of static files deployed behind a CDN. This project deploys to AWS S3 + CloudFront
Backend Server - Stateless Docker container with multiple replicas deployed behind a load balancer. This project deploys to AWS ECS Fargate
Session Server - Stateful Docker container with instances spawned on-demand and direct container ingress. This project deploys to Hathora Cloud

Data Flow

Create New Room

The client asks the backend server for a new chat room session
The backend server authorizes the request and invokes the scheduler module, which allocates a room to a session server instance (spawning a new instance if necessary)
The backend server responds with the allocated roomId

The client then proceeds with the Join Existing Room flow using the obtained roomId.

Join Existing Room

The client asks the backend server for the session server instance host corresponding to a roomId
The backend server queries the scheduler module and responds with the host (or null if not found)
The client establishes a bi-directional connection with the session server instance
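
The two flows above can be sketched from the client's point of view as follows. This is a minimal sketch under stated assumptions: the backend calls are injected through a hypothetical `BackendApi` interface so that no endpoint paths are assumed, and the `wss://` scheme and `roomId` query parameter in the connection URL are illustrative guesses rather than the project's actual protocol.

```typescript
// Hypothetical sketch of the Create New Room and Join Existing Room flows.
interface BackendApi {
  createRoom(): Promise<string>; // resolves to the allocated roomId
  getRoomHost(roomId: string): Promise<string | null>; // null if the room is unknown
}

// Join Existing Room: look up the host, then build the connection URL.
async function resolveRoomUrl(api: BackendApi, roomId: string): Promise<string> {
  const host = await api.getRoomHost(roomId);
  if (host === null) {
    throw new Error(`Room ${roomId} not found`);
  }
  // Assumption: the session server accepts the roomId as a query parameter.
  return `wss://${host}/?roomId=${encodeURIComponent(roomId)}`;
}

// Create New Room: allocate a roomId, then continue with the join flow.
async function createAndResolve(
  api: BackendApi
): Promise<{ roomId: string; url: string }> {
  const roomId = await api.createRoom();
  const url = await resolveRoomUrl(api, roomId);
  return { roomId, url };
}
```

A browser client would then pass the resulting URL to `new WebSocket(url)` to open the bi-directional connection.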

Scheduler

The Scheduler module inside the backend server is the key component of this architecture: it allocates rooms to session server instances. It boils down to a simple interface:

interface Scheduler {
  // assigns a new room to a session server instance, and returns its roomId
  createRoom(): Promise<string>;
  // returns the session server host corresponding to a given roomId
  getRoomHost(roomId: string): Promise<string | null>;
}

This project comes with two implementations: StaticScheduler and HathoraScheduler.

StaticScheduler

This is the default scheduler when running the backend server. It takes a static list of session server hosts via the SESSION_SERVER_HOST env var (a comma-delimited list for multiple hosts). createRoom randomly assigns the roomId to one of the hosts and stores the mapping in an in-memory map, and getRoomHost simply does a map lookup.
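
The behavior described above could look roughly like the sketch below. It is not the project's actual StaticScheduler source: the class name `StaticSchedulerSketch`, the constructor argument, and the roomId format are assumptions made for illustration.

```typescript
interface Scheduler {
  createRoom(): Promise<string>;
  getRoomHost(roomId: string): Promise<string | null>;
}

// Hypothetical sketch; not the project's actual StaticScheduler source.
class StaticSchedulerSketch implements Scheduler {
  private hosts: string[];
  // roomId -> session server host
  private roomHosts = new Map<string, string>();

  // hostList is a comma-delimited string, e.g. "localhost:8000,localhost:8001"
  constructor(hostList: string) {
    this.hosts = hostList
      .split(",")
      .map((h) => h.trim())
      .filter((h) => h.length > 0);
    if (this.hosts.length === 0) {
      throw new Error("At least one session server host is required");
    }
  }

  async createRoom(): Promise<string> {
    // Assumption: any sufficiently unique ID format would do here.
    const roomId = Math.random().toString(36).slice(2, 10);
    // Pick one of the configured hosts uniformly at random.
    const host = this.hosts[Math.floor(Math.random() * this.hosts.length)];
    this.roomHosts.set(roomId, host);
    return roomId;
  }

  async getRoomHost(roomId: string): Promise<string | null> {
    return this.roomHosts.get(roomId) ?? null;
  }
}
```

In this sketch the backend server would construct the scheduler from the value of the SESSION_SERVER_HOST env var at startup.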

So while the StaticScheduler fully implements the Scheduler interface, it operates on a static list of session servers, which imposes some key limitations:

There's no way to add additional session server capacity on demand (no horizontal scaling)
Not tolerant to session server crashes (it will continue assigning rooms to crashed servers)
The room mapping is stored in memory, and thus can't be safely scaled with multiple backend server replicas

HathoraScheduler

Disclosure: I work on Hathora Cloud

This is the scalable, production-ready scheduler, which leverages the Hathora hosting platform. It's configured via the HATHORA_APP_ID and HATHORA_TOKEN env vars, and it interacts with the service using the Hathora TypeScript SDK.

These are the main features of Hathora which make it an ideal hosting platform for this use case:

Direct container ingress: each running container instance gets a unique host+port to connect to.
Fast on-demand container boots: single API call to boot a new container instance in under 5 seconds
Room concurrency: Hathora is "room aware" and assigns rooms to existing containers up to a configurable number of rooms per container

Alternative schedulers

While there are only two scheduler implementations included in this project, more implementations could be added as long as they conform to the simple Scheduler interface. For example, it would be relatively straightforward to add a KubernetesScheduler implementation which creates a new pod for every room.
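
To give a sense of what such an implementation involves, here is a minimal sketch of a scheduler that starts one session server instance per room. The `SpawnInstance` callback is a hypothetical stand-in for whatever actually starts an instance (for a KubernetesScheduler, the code that creates a pod and waits for its address); no real orchestration API is used or implied, and the class name and roomId format are likewise assumptions.

```typescript
interface Scheduler {
  createRoom(): Promise<string>;
  getRoomHost(roomId: string): Promise<string | null>;
}

// Hypothetical: starts one session server for the room and resolves to its host.
type SpawnInstance = (roomId: string) => Promise<string>;

class InstancePerRoomScheduler implements Scheduler {
  private spawnInstance: SpawnInstance;
  // roomId -> host of the instance spawned for that room
  private roomHosts = new Map<string, string>();
  private nextId = 0;

  constructor(spawnInstance: SpawnInstance) {
    this.spawnInstance = spawnInstance;
  }

  async createRoom(): Promise<string> {
    this.nextId += 1;
    const roomId = `room-${this.nextId}`;
    // Every room gets a dedicated instance, so there is no capacity bookkeeping.
    const host = await this.spawnInstance(roomId);
    this.roomHosts.set(roomId, host);
    return roomId;
  }

  async getRoomHost(roomId: string): Promise<string | null> {
    return this.roomHosts.get(roomId) ?? null;
  }
}
```

Because the mapping lives in memory, this sketch shares the StaticScheduler's limitation with multiple backend server replicas; a production version would need to keep the mapping in shared storage or query the orchestrator directly.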

Running locally

Clone the repository

git clone https://github.com/hpx7/scalable-chat
cd scalable-chat

Start services

Each service should run in a different terminal tab. See individual instructions for client, backend-server, and session-server.
