Horizontal scaling (aka. scaling out)
Horizontal scaling is a scalability technique that involves adding more servers to a system, rather than increasing the capacity of a single server (as in vertical scaling).
The aim is to increase [throughput] by distributing the load across multiple servers, rather than relying on a single server to handle all requests.
Horizontal scaling is often used in cloud computing environments, where it is easy to add or remove servers as needed, often dynamically ("auto-scaling").
To allow for horizontal scaling of applications themselves – so allowing traffic to be load balanced between multiple replicas of the essentially the same application – it is necessary to externalize as much state as possible. In practical terms, this means moving data from the local file system to external (shared) file systems, and the same for databases. Thus the compute resources are left stateless – pure compute, with no state management.