Vertical or horizontal? That is the question. This post outlines the differences between these two scalability strategies and when to consider each in the architecture of your application.
These nuggets of information were extracted from Martin Kleppmann's book “Designing Data-Intensive Applications”. It is a highly recommended read for those who want to learn more about building enterprise systems that primarily handle significant quantities of data at scale.
Since reading it, I have distilled Martin Kleppmann’s thoughts and findings into notes that I have found useful when considering how to scale an application.
Scaling up (vertical scaling)
Adding resources to a single machine to scale…
- Can cost significantly more to scale
- Utilises a shared-memory architecture
- On high-end machines, disks and even CPUs can be replaced without shutting down the machine
- Tied to one geographic location
- A machine twice the size cannot necessarily handle twice the load
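To make the last point concrete, here is a minimal sketch using Amdahl's law. This is my own illustration rather than something taken from the book, and the assumption that 90% of the workload can run in parallel is made up purely for the example. It shows how little a machine gains from doubling its cores once part of the work is stuck running serially.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup over a single core, per Amdahl's law.

    parallel_fraction is the share of the work that can be spread
    across cores; the remainder has to run serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% parallelisable (an illustrative figure only).
    for cores in (8, 16, 32):
        print(f"{cores} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

Under that assumption, going from 8 to 16 cores improves the theoretical speedup by well under 2x, while the price of the larger machine typically grows faster than its size.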
Scaling out (horizontal scaling, shared-nothing architecture)
Adding more machines to scale…
- Each node uses its own CPUs, memory, and disks independently
- Coordination between nodes (machines or virtual machines) is done at the software level, over a conventional network
- Flexibility to choose whichever machines have the best price-to-performance ratio
- Ability to protect against losing an entire data centre
- Multi-region distributed architecture is achievable with this
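As a rough illustration of what "coordination at the software level" can mean, the sketch below routes each request key to one of several nodes by hashing the key. The node names and the `node_for_key` helper are hypothetical, and hash-modulo routing is just the simplest scheme I could show here, not the one the book recommends.

```python
import hashlib
from typing import Sequence

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes: Sequence[str]) -> str:
    """Pick the node responsible for a key by hashing the key.

    A stable hash (SHA-256) is used so that every process routing
    requests agrees on the result, unlike Python's built-in hash(),
    which is randomised per process for strings.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", node_for_key(key, NODES))
```

One design caveat: with plain modulo routing, changing the number of nodes moves most keys to a different node, so real systems tend to use schemes that rebalance more gently.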
Conclusion
So, from what has been highlighted, the more popular of these two scaling strategies today is horizontal scaling. Vertical scaling is sometimes seen as the older strategy, but it is still widely adopted because of its simplicity, and not every system needs to scale across multiple regions or to support ongoing growth. However, if the system is expected to grow, whether gradually or exponentially, you may find horizontal scaling to be the more durable architecture.
For more in-depth reading on this, I recommend these links:
📚 Further Reading & Related Topics
If you’re exploring horizontal vs. vertical scaling in distributed systems, these related articles will provide deeper insights:
• How Does Partitioning Work When Requests Are Being Routed? – Learn how partitioning strategies impact scalability and workload distribution in horizontally scaled architectures.
• Latency Optimization Techniques: Unlocking Performance with Lock-Free Programming, Memory Barriers, and Efficient Data Structures – Explore performance optimization techniques that can enhance both horizontal and vertical scaling approaches.








