Redundancy

In the context of distributed software, redundancy refers to the duplication of critical components or functions of the system. Adding redundancy reduces single points of failure, thus increasing reliability and availability. If one part of the system fails, another can take over its tasks, minimizing downtime and preventing data loss.
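
The effect on availability can be illustrated with a little arithmetic. The sketch below assumes that replicas fail independently of one another, which real deployments only approximate (shared power, networks, or software bugs can take out several replicas at once). The function name and parameters are illustrative, not taken from any library.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up.

    Assumes every copy has the same availability `single` and that
    failures are independent (a simplifying assumption).
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system is down only if every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, combined_availability(0.99, n))
```

Under these assumptions, a second replica of a component that is available 99% of the time raises the combined availability to roughly 99.99%, because both copies would have to be down simultaneously.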

Redundancy in distributed software can be applied to data, software, hardware, and network components:

Data redundancy: Storing copies of the same data on multiple nodes.
Software redundancy: Deploying multiple instances of the same application.
Hardware redundancy: Using additional hardware components, such as servers.
Network redundancy: Implementing multiple network paths and connections.
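
As a rough illustration of software redundancy, the sketch below tries several instances of the same service in order and returns the first successful response. It is a minimal example rather than a production failover mechanism: the names call_with_failover, AllReplicasFailed, primary, and backup are hypothetical, and each replica is modelled as a plain Python callable instead of a real network client.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every redundant instance has failed."""


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Call each replica in turn and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    # Only reached when no replica succeeded (or none were given).
    raise AllReplicasFailed(errors)


def primary() -> str:
    # Simulates a failed primary instance.
    raise ConnectionError("primary is down")


def backup() -> str:
    # Simulates a healthy redundant instance.
    return "response from backup"


if __name__ == "__main__":
    print(call_with_failover([primary, backup]))
```

Here the caller never sees the primary's failure, because the backup instance takes over the request. Real systems usually add health checks, timeouts, and load balancing on top of this basic idea.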

Redundancy is an important quality attribute in distributed software that requires high availability and resilience. Examples of systems that have high levels of redundancy include data centres and the cloud services that are served from them.

Isolation is a complementary design principle: it limits the impact of failures so that they do not spread to the entire system. For example, if the primary node for a service fails and its failover nodes fail too, the rest of the system should continue to operate, albeit with reduced functionality.
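
One way to picture isolation is a caller that contains the failure of a non-critical dependency instead of letting it propagate. The following sketch assumes a hypothetical page builder whose recommendations service may be completely unavailable; the names render_page, fetch_recommendations, and the dictionary layout are invented for illustration.

```python
from typing import Callable, Dict, List


def render_page(fetch_recommendations: Callable[[], List[str]]) -> Dict[str, object]:
    """Build a page even when the recommendations service is unavailable."""
    page: Dict[str, object] = {
        "content": "core product details",  # critical part, always present
        "recommendations": [],
        "degraded": False,
    }
    try:
        page["recommendations"] = fetch_recommendations()
    except Exception:
        # The failure is contained here: the page is still served,
        # only without the optional recommendations.
        page["degraded"] = True
    return page


def broken_service() -> List[str]:
    # Simulates a service whose primary and failover nodes are all down.
    raise TimeoutError("recommendations service unavailable")


if __name__ == "__main__":
    print(render_page(broken_service))
```

The core content is still returned when the dependency fails, which matches the idea of continuing to operate with reduced functionality. Techniques such as timeouts, circuit breakers, and bulkheads apply the same principle in more robust ways.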