Scalability

An overview of scalability as a core non-functional requirement: what it is, what it isn’t, and the trade-offs behind common scaling techniques.


Context

Scalability is a system’s ability to handle increasing workload predictably and cost‑efficiently by adding resources and/or improving resource utilization.

“Workload” can mean different things depending on the system:

Scaling is the act of increasing capacity. The two classic approaches are vertical scaling (scale up) and horizontal scaling (scale out).

Important: Scalability is related to, but not the same as, performance (how quickly the system serves a given load) or elasticity (how quickly capacity adjusts to changing demand).


Vertical Scaling (Scale Up)

Vertical scaling means upgrading a single machine (more CPU, RAM, faster disk, better NIC).

It’s often the simplest first step, but it has notable drawbacks:

Vertical scaling is still valuable when:


Horizontal Scaling (Scale Out)

Horizontal scaling means adding more nodes and distributing the load.
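As a minimal sketch of "adding more nodes and distributing the load", the snippet below rotates requests across a pool of nodes in round-robin order and lets a node be added at runtime. The class name `RoundRobinPool`, its methods, and the node names are hypothetical and exist only for illustration. Real systems usually delegate this to a load balancer.

```python
class RoundRobinPool:
    """Hypothetical pool that hands out nodes in a rotating order."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self.nodes = list(nodes)
        self._next = 0  # index of the next request, counted from zero

    def add_node(self, node):
        # Scaling out: the new node starts receiving a share of the traffic.
        self.nodes.append(node)

    def route(self):
        # Pick nodes in rotation, wrapping around at the end of the list.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


pool = RoundRobinPool(["node-a", "node-b", "node-c"])
assignments = [pool.route() for _ in range(6)]
print(assignments)
# ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']

pool.add_node("node-d")
after_scale_out = [pool.route() for _ in range(4)]
```

Round-robin is only one possible distribution policy. It assumes requests cost roughly the same and that every node can serve any request.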

Key advantages:

However, horizontal scaling tends to move complexity into:


Stateless vs. Stateful Services

Stateless services

A service is stateless if any instance can handle any request because user/session state is not stored in the service’s memory.
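The following sketch illustrates that definition under simplifying assumptions: two instances of a hypothetical `CartService` share a `SessionStore`, which stands in for an external system such as a cache or database. Both class names and the in-memory dictionary are illustrative, not a specific product's API. Because neither instance keeps session data in its own memory, either one can serve the next request.

```python
class SessionStore:
    """Stand-in for an external shared store (assumed, for illustration)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class CartService:
    """One service instance; it holds no per-user state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_item(self, session_id, item):
        # Load the session from the shared store, update it, write it back.
        session = self.store.get(session_id)
        items = session.get("items", []) + [item]
        self.store.put(session_id, {"items": items})
        return items


store = SessionStore()
instance_a = CartService("a", store)
instance_b = CartService("b", store)

instance_a.add_item("session-1", "book")        # first request hits instance a
cart = instance_b.add_item("session-1", "pen")  # next request hits instance b
print(cart)  # ['book', 'pen']
```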

Stateless services are typically straightforward to scale horizontally:

Stateful services

Stateful services keep critical state locally (in-memory session, on-disk data, in-process cache with correctness requirements). These are harder to scale because state must be:

In practice, the most difficult component to scale is often the database because it couples:

Common database scaling strategies include:


Load Balancing

A load balancer distributes incoming traffic across multiple nodes to:

Common implementations:

Layer 4 vs. Layer 7

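Load balancers are commonly categorised by the OSI layer at which they make routing decisions. A Layer 4 (transport) load balancer routes on connection-level information such as IP addresses and TCP/UDP ports, without inspecting the payload. A Layer 7 (application) load balancer can inspect the request itself, for example the HTTP path, headers, or cookies, and route on that content.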
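As a rough illustration of the Layer 4 versus Layer 7 distinction, the sketch below contrasts the information each kind of balancer can base its decision on. The types `Connection` and `HttpRequest`, the functions `route_l4` and `route_l7`, and the pool names are all hypothetical. Real load balancers are configured rather than hand-coded like this.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    src_ip: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class HttpRequest:
    connection: Connection
    method: str
    path: str


def route_l4(conn, backends):
    # Layer 4: only transport-level fields are visible, so hash the tuple.
    key = f"{conn.src_ip}:{conn.src_port}:{conn.dst_port}".encode("utf-8")
    return backends[zlib.crc32(key) % len(backends)]


def route_l7(request, pools_by_prefix, default_pool):
    # Layer 7: the request content is visible, so route on the URL path.
    for prefix, pool in pools_by_prefix.items():
        if request.path.startswith(prefix):
            return pool
    return default_pool


conn = Connection(src_ip="203.0.113.7", src_port=50123, dst_port=443)
l4_choice = route_l4(conn, ["backend-1", "backend-2"])

pools = {"/api/": "api-pool", "/static/": "static-pool"}
api_choice = route_l7(HttpRequest(conn, "GET", "/api/users"), pools, "web-pool")
web_choice = route_l7(HttpRequest(conn, "GET", "/index.html"), pools, "web-pool")
print(api_choice, web_choice)  # api-pool web-pool
```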


Sticky Sessions (Session Affinity)

Sticky sessions mean the load balancer consistently routes a user’s requests to the same backend instance.
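One way to picture this routing rule is a deterministic mapping from a session identifier to a backend. The sketch below hashes the identifier and takes it modulo the number of backends. The function `pick_backend`, the session ID, and the backend names are hypothetical, and this is only one possible mechanism rather than how any particular load balancer implements affinity.

```python
import hashlib


def pick_backend(session_id, backends):
    # Hash the session ID so the same ID always maps to the same index.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["app-1", "app-2", "app-3"]

first = pick_backend("user-42", backends)
second = pick_backend("user-42", backends)
print(first == second)  # True: repeated requests reach the same instance
```

With a modulo-based mapping like this one, changing the number of backends can remap existing sessions to different instances, so any state held only in the old instance's memory is no longer reachable for those users.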

This can be useful when the application stores session state in-memory, but it has trade-offs:

Implementation approaches:

A more scalable alternative is to avoid stickiness by: