Replication
Replication, also known as data redundancy, is the process of creating and maintaining identical copies of data across multiple servers or nodes.
Together with sharding, replication is a key technique for scaling databases. Sharding splits data across multiple servers, while replication duplicates data across multiple servers.
Data replication becomes necessary the more that data is distributed across multiple servers, for example in service-oriented or microservices architectures.
In a monolithic architecture, data is typically stored in a single database instance, and joins are typically used to query relational data across multiple tables.
But in a distributed software, those joins turn into network calls. This increases [latency], decreasing performance and having implications for the resilience of the system.
Data replication is one possible solution to these problems. Simply, data is replicated to wherever it is needed, so that discrete services are less reliant upon each other. Services therefore work faster by needing to use only local data.
Besides the performance benefits of reducing network calls, data replication also improves the fault tolerance and availability of distributed software, due to reduced dependencies between services.
Data replication is a useful strategy to support the [re-architecture] of system designs. For example, in the transition from monolithic services to more microservices, data replication can help to reduce the need for [scheduled downtime] while [data migrations] are run.
Trade-offs are made in terms of data consistency, and the increased system complexity required to manage the data replication and synchronization processes. Storage costs also increase.
Data synchronization can be achieved using either [batch processes] or [event streams]. Some database systems also provide built-in support for data replication across multiple nodes.
Data need not be replicated in the form of its [source of truth]. [Materialized views] can be used to store pre-processed data that’s optimized for specific use cases, such as reporting.