Availability

An overview of availability as a core non-functional requirement: what it is, what it isn’t, and the trade-offs behind common availability techniques.

Context

Availability is a system’s ability to provide uninterrupted service to its users. A highly available system keeps serving requests through failures and disruptions, so users experience less downtime.

Common Benchmarks

Availability can be measured with several metrics, such as uptime percentage, mean time between failures (MTBF), and mean time to recovery (MTTR). It is often expressed in "nines": for example, 99.9% availability ("three nines") allows 0.1% downtime, which is about 8.77 hours per year.

Availability %	Downtime Per Year	Downtime Per Month	Downtime Per Day
99.9	8.77 hours	43.8 minutes	1.44 minutes
99.99	52.6 minutes	4.38 minutes	8.64 seconds
99.999	5.26 minutes	26.3 seconds	864 milliseconds
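
The figures in the table follow from simple arithmetic, sketched below. The function names are hypothetical and chosen for illustration. The sketch assumes an average year of 365.25 days, which is what the table's yearly values correspond to. The MTBF/MTTR formula is the commonly used steady-state approximation, availability = MTBF / (MTBF + MTTR); it is added here as an assumption about how the two metrics relate and is not stated in the text above.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # average year (assumption)


def downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Downtime allowed within a period for a given availability percentage."""
    return (1 - availability_pct / 100) * period_seconds


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability in percent: MTBF / (MTBF + MTTR)."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        per_year_min = downtime_seconds(pct, SECONDS_PER_YEAR) / 60
        per_day_s = downtime_seconds(pct, SECONDS_PER_DAY)
        print(f"{pct}%: {per_year_min:.2f} min/year, {per_day_s:.3f} s/day")
```

Under this approximation, a service that fails on average once every 999 hours and takes 1 hour to recover sits at 99.9% availability, so shortening recovery time raises availability just as lengthening the time between failures does.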

Asynchronous Communication

Asynchronous communication is a pattern that can improve a system’s availability. It decouples components: the sender does not wait for the receiver, so a slow or failed receiver is less likely to take the sender down with it. Instead of relying on direct HTTP calls, services communicate through message queues, event buses, or similar mechanisms that hold messages until the receiver is ready to process them.
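
A minimal sketch of this decoupling is shown below. It uses Python’s in-process `queue.Queue` as a stand-in for a real message broker, and the names (`orders`, `consumer`, `processed`) are hypothetical. The producer publishes messages while the consumer is not yet running; the consumer later starts and drains the backlog, so no message is lost while it was unavailable.

```python
import queue
import threading


def consumer(q: queue.Queue, processed: list) -> None:
    """Drain messages from the queue until a None sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:  # sentinel: no more messages
            break
        processed.append(msg)


orders: queue.Queue = queue.Queue()
processed: list = []

# The producer keeps accepting work even though no consumer is running yet.
for i in range(3):
    orders.put(f"order-{i}")

# The consumer comes up later and works through the buffered messages.
worker = threading.Thread(target=consumer, args=(orders, processed))
worker.start()
orders.put(None)  # tell the consumer to stop once the backlog is drained
worker.join()

print(processed)
```

With a direct synchronous call, the three orders would have failed while the consumer was down. A production setup would typically replace the in-memory queue with a durable broker, so that buffered messages also survive a crash of the producer process.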

Synchronous communication should be used when the caller needs a direct response and when strong consistency and/or low latency are key non-functional requirements.