Scalability can be a confusing topic, because it is usually not defined in simple terms. If I were to characterize a scalable system:
- The system should be able to accommodate an increase in data
- The system should be able to accommodate an increase in usage
- As the load on the system increases, the system still remains relatively accessible and maintainable.
It is easy to confuse scalability with performance, but these are two separate characteristics. A high-performing system can quickly become non-performing if it cannot scale (the reverse is usually not true, though). As the load on a system increases, we still want it to keep responding with a good (low) response time. This usually means the hardware must use more resources to serve the requests. How that hardware is provisioned depends on the landscape architecture: we can choose to either scale the hardware vertically (scale up) or horizontally (scale out). These are very different approaches, but in a nutshell:
Vertical Scaling (scale up): Scaling is achieved by adding more hardware resources to an existing physical machine; examples would be allocating more memory, adding more hard disk, additional CPUs, etc. When a hardware resource starts to run at capacity, a bigger box is added to the mix. This hardware upgrade can continue until a limit is reached; there is therefore a physical limit to vertical scaling.
Horizontal Scaling (scale out): We also add hardware to scale horizontally, except that here scaling is achieved by adding machines in parallel to an existing machine. We can buy mid-range machines and keep adding a new one as each runs out of resources. But of course, scalability will rarely improve proportionally, and the TCO will also increase: each new machine requires networking supplies and setup, rack space, and so on. Moreover, some resources might be very underutilized in horizontal scaling. For example, in a typical web application the bottlenecks might be network I/O and memory; as we add more machines, we are also adding CPUs and hard disks, which are now underutilized.
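To make the "rarely improves proportionally" point concrete, here is a minimal sketch of a scale-out capacity estimate. The function name, the per-machine throughput numbers, and the diminishing-returns model (each additional machine contributes a bit less than the previous one) are all illustrative assumptions, not measurements:

```python
# Hypothetical capacity-planning sketch: how many mid-range machines are
# needed to serve a target load, when each added machine contributes
# slightly less effective capacity than the one before it (coordination
# overhead, shared bottlenecks, etc.). All numbers are made up.
def machines_needed(target_rps: float, per_machine_rps: float,
                    efficiency: float = 0.8) -> int:
    capacity = 0.0
    n = 0
    while capacity < target_rps:
        n += 1
        if n > 1000:
            raise ValueError("target not reachable with this efficiency model")
        # the n-th machine contributes per_machine_rps * efficiency^(n-1)
        capacity += per_machine_rps * (efficiency ** (n - 1))
    return n

# Ideally 1000 rps / 300 rps-per-machine would need 4 machines;
# with diminishing returns the estimate comes out higher.
print(machines_needed(1000, 300))
```

With perfectly linear scaling (efficiency = 1.0) this reduces to a simple ceiling division; the gap between the two answers is exactly the overhead that capacity planning has to budget for.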
As you can see, capacity planning for a server setup can become a thorny problem very quickly, and it requires a systematic approach to design and scale the landscape, depending on the application. Another way to understand the issue is to look at Amdahl's law. This model explains what performance improvement we can expect by adding resources in parallel:

Speedup = 1 / ((1 - P) + P/S)

where P is the proportion of the computation that is affected by the improvement, and S is the speedup of that portion. A simple example: if an improvement can speed up 30% of the computation, P will be 0.3; if the improvement makes the affected portion twice as fast, S will be 2. The overall speedup will be: 1/(0.7 + 0.3/2) = 1/0.85 ≈ 1.18.
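The calculation above is easy to sanity-check in code. A direct translation of Amdahl's law, using the worked example from the text:

```python
# Amdahl's law: overall speedup when a fraction P of the work is sped up
# by a factor S, while the remaining (1 - P) runs at the original speed.
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# The example from the text: 30% of the computation made twice as fast.
print(round(amdahl_speedup(0.3, 2), 2))  # 1/(0.7 + 0.15) = 1/0.85 ≈ 1.18
```

Note how quickly the serial fraction dominates: even if that 30% were made infinitely fast (S → ∞), the overall speedup could never exceed 1/0.7 ≈ 1.43, which is exactly why adding machines in parallel yields diminishing returns.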