Availability

An overview of availability as a core non-functional requirement: what it is, what it isn’t, and the trade-offs behind common availability techniques.


Context

Availability is a system’s ability to provide uninterrupted service to users. A highly available system keeps serving requests through failures and disruptions, so users experience less downtime.


Common Benchmarks

Availability can be measured with several metrics, e.g. uptime percentage, mean time between failures (MTBF), and mean time to recovery (MTTR). It is often expressed in "nines": for example, 99.9% availability ("three nines") means the system may be unavailable 0.1% of the time, or about 8.77 hours per year.

| Availability % | Downtime Per Year | Downtime Per Month | Downtime Per Day |
| --- | --- | --- | --- |
| 99.9 | 8.77 hours | 43.8 minutes | 1.44 minutes |
| 99.99 | 52.6 minutes | 4.38 minutes | 8.64 seconds |
| 99.999 | 5.26 minutes | 26.3 seconds | 864 milliseconds |
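
The downtime figures above follow directly from the availability percentage. The sketch below shows the arithmetic, assuming an average year of 365.25 days and a month of one twelfth of a year (these period lengths are assumptions chosen to match the table, not something the text states). The helper names `downtime_seconds` and `availability_from_mtbf` are illustrative. The second helper uses the commonly cited steady-state relationship between MTBF and MTTR, which is added here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, matching the table


def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Allowed downtime, in seconds, over a period at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a percentage: MTBF / (MTBF + MTTR)."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        per_year = downtime_seconds(pct, DAYS_PER_YEAR)
        per_month = downtime_seconds(pct, DAYS_PER_YEAR / 12)
        per_day = downtime_seconds(pct, 1)
        print(
            f"{pct}%: {per_year / 3600:.2f} h/year, "
            f"{per_month / 60:.2f} min/month, {per_day:.3f} s/day"
        )
```

Run as a script, this should reproduce the table's figures up to rounding. Under the same relationship, a service that fails on average once every 999 hours and takes 1 hour to recover sits at roughly 99.9%.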

Asynchronous Communication

Asynchronous communication is a pattern that can improve a system’s availability. It decouples components, so a failure or slowdown in one service is less likely to cascade to its callers, which improves overall resilience. Instead of relying on direct HTTP calls, services communicate through message queues, event buses, or other asynchronous mechanisms.
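
As a rough illustration of this decoupling, the sketch below uses Python's standard `queue.Queue` as an in-process stand-in for a real message broker. The names `place_order`, `run_consumer`, and `demo` are hypothetical, and a production system would use a durable broker rather than an in-memory queue. The point is only that the producer can keep accepting work while the consumer is not running, and the consumer can drain the backlog once it comes back.

```python
import queue
import threading


def place_order(broker, order_id):
    """Producer: enqueue the work and return immediately, without waiting."""
    broker.put(order_id)
    return "accepted"


def run_consumer(broker, processed):
    """Consumer: handle messages until the None sentinel arrives."""
    while True:
        order_id = broker.get()
        if order_id is None:
            break
        processed.append(order_id)


def demo():
    broker = queue.Queue()  # stand-in for a message queue or event bus
    processed = []

    # The consumer is not running yet, but orders are still accepted.
    replies = [place_order(broker, f"order-{n}") for n in (1, 2, 3)]
    backlog = broker.qsize()

    # The consumer starts later and works through the buffered orders.
    worker = threading.Thread(target=run_consumer, args=(broker, processed))
    worker.start()
    broker.put(None)  # sentinel: tell the consumer to stop after the backlog
    worker.join()
    return replies, backlog, processed
```

In this sketch the producer never learns the outcome of the work, only that it was accepted. That is the trade-off with a direct call, where the caller waits for the result.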

Synchronous communication should be used when a direct response is necessary and strong consistency and/or low latency are key non-functional requirements.