Availability
An overview of availability as a core non-functional requirement: what it is, what it isn’t, and the trade-offs behind common availability techniques.
Context
Availability is a system’s ability to provide uninterrupted service to users. A highly available system keeps serving requests through failures and disruptions, so users experience less downtime.
Common Benchmarks
Availability can be measured with metrics such as uptime percentage, mean time between failures (MTBF), and mean time to recovery (MTTR). It is often expressed as a number of nines: for example, 99.9% availability (three nines) allows 0.1% downtime, which is about 8.77 hours per year.
| Availability % | Downtime Per Year | Downtime Per Month | Downtime Per Day |
|---|---|---|---|
| 99.9 | 8.77 hours | 43.8 minutes | 1.44 minutes |
| 99.99 | 52.6 minutes | 4.38 minutes | 8.64 seconds |
| 99.999 | 5.26 minutes | 26.3 seconds | 864 milliseconds |
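The figures in the table can be reproduced with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a 365.25-day year with a month taken as one twelfth of that, which matches the table's 8.77 hours. It also includes the commonly used steady-state relation between the two failure metrics, availability = MTBF / (MTBF + MTTR), as one way to connect them to an uptime percentage.

```python
from datetime import timedelta

# Assumption: an average year of 365.25 days, and a month of 1/12 year.
PERIODS = {
    "year": timedelta(days=365.25),
    "month": timedelta(days=365.25 / 12),
    "day": timedelta(days=1),
}


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime within `period` for a given availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability (%) computed as MTBF / (MTBF + MTTR)."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        per_period = {name: allowed_downtime(pct, p) for name, p in PERIODS.items()}
        print(pct, per_period)
```

Under these assumptions, a service that fails on average once every 999 hours and takes 1 hour to recover sits at three nines, since `availability_from_mtbf(999, 1)` evaluates to roughly 99.9.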
Asynchronous Communication
Asynchronous communication is a communication pattern that can improve a system’s availability. It decouples components, so a failure in one is less likely to cascade to the others, which improves overall resilience. Instead of relying on direct HTTP calls, services communicate through message queues, event buses, or other asynchronous mechanisms.
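The decoupling can be illustrated with a minimal in-process sketch. It assumes Python's standard `queue.Queue` as a stand-in for a real message queue; the `publish`, `consume`, and `run_demo` names and the `order_id` field are hypothetical. The producer keeps accepting work while the consumer is unavailable, and the backlog is processed once the consumer starts.

```python
import queue
import threading


def publish(broker: queue.Queue, event) -> None:
    """Producer side: enqueue the event and return without waiting for a consumer."""
    broker.put(event)


def consume(broker: queue.Queue, handled: list) -> None:
    """Consumer side: process events in order until a None sentinel arrives."""
    while True:
        event = broker.get()
        if event is None:
            break
        handled.append(event["order_id"])


def run_demo():
    broker = queue.Queue()  # stands in for a message queue (assumption)
    handled = []

    # The consumer is "down": publishing still succeeds and events accumulate.
    for order_id in (1, 2, 3):
        publish(broker, {"order_id": order_id})
    backlog_while_down = broker.qsize()

    # The consumer comes back and works through the backlog.
    worker = threading.Thread(target=consume, args=(broker, handled))
    worker.start()
    publish(broker, None)  # sentinel: tell the consumer to stop
    worker.join()
    return backlog_while_down, handled


if __name__ == "__main__":
    print(run_demo())
```

With a direct HTTP call, each of the three requests would have failed while the consumer was down. Here the cost is that the producer learns nothing about the outcome at publish time.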
Synchronous communication should be used when a direct response is necessary and strong consistency and/or low latency are key non-functional requirements.