Scaling web applications is not trivial.
But what are the reasons to scale? How can you scale?
This article will describe some reasons for scaling along with scaling strategies.
In this article I use the term web application in the widest sense - it might be an API, a website etc. Commonly, when we discuss scaling, these web applications are somewhat non-trivial.
And what does that even mean?
As a rule of thumb there must be some kind of server rendering - with web pages that will be dynamic content, loaded from a database for example - with APIs most of the response content will be dynamic.
Reasons to scale
The reason to scale is performance.
Generally speaking server rendering takes time - on the server - which will add to the time a request takes to complete. Check out the following image showing a single request to a website (screenshot from Chrome):
As you can see in this image, the two interesting parts are the green and blue bars. The blue bar is by far the largest - we can't do a whole lot about that at this point - the only thing that will change it is a bigger pipe; boosting the connection speed.
The green bar is the time the browser spent waiting for the server. 159.26 ms is not that long. But when many such requests are needed to generate a single page view, it adds up. At some point it gets unbearable.
Even more important: that is roughly 150 ms the server spent producing the content - and depending on the server, the load during that time was higher than idle. Let's just assume that the server had one CPU core at 100% during that time - on a simple quad-core server this gives a maximum of approximately 27 requests per second.
27 req/sec is a rough estimate, and the real number will probably be higher - no doubt about that - but don't be fooled! There is a maximum! At that point all requests start to suffer - all users will be affected! The limit might not be CPU-bound - the CPU is by far the fastest resource in a computer - memory and drive access are far slower and more scarce. Even network interfaces have limits on the number of connections!
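The numbers above can be checked with a quick back-of-envelope calculation - the 150 ms render time and the four cores are the assumptions from the example:

```python
# Back-of-envelope throughput estimate: if rendering a response keeps
# one CPU core busy for ~150 ms, a quad-core server tops out at roughly:
render_time_s = 0.150   # CPU time per request (from the screenshot above)
cores = 4               # a simple quad-core server, as assumed above

requests_per_core = 1 / render_time_s       # ~6.7 req/s per core
max_throughput = cores * requests_per_core  # ~26.7 req/s in total

print(f"{max_throughput:.1f} requests/second")  # → 26.7 requests/second
```

Real servers overlap I/O and rendering, so the true ceiling is usually higher - but the point stands: the ceiling exists.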
When such a limit is hit you have a reason to scale!
How to scale
When you have identified a problem that can be solved with scaling the question is how?!
Generally there are two types of scaling: vertical and horizontal!
Vertical scaling means adding more resources to the server. This can be adding more RAM, disk space/speed, processing power etc.
Usually applications scale well vertically - depending on the application. If the added resources are in an area where the application was exhausting the system, it will run smoother/faster after scaling vertically.
Scaling the server this way is mostly done without the application's knowledge - it just has access to more resources when it runs - and resources are commonly abstracted away by the operating system anyway.
Horizontal scaling is the practice of adding more servers to run the application. This in effect adds more resources, like vertical scaling does, but these resources can't be accessed uniformly by the application.
Normally the application needs to be somewhat aware that it is being scaled horizontally. If an application is scaled to multiple servers behind a load balancer there can't be any state shared between requests - or if there is, this state needs to be shared between the servers - which in turn might introduce the next bottleneck for the application!
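The shared-state problem can be sketched with a toy example - the Server class and the round-robin dispatch below are hypothetical stand-ins for real application instances behind a load balancer:

```python
# Sketch of the shared-state problem: each "server" keeps session state
# in its own process memory, so behind a round-robin load balancer the
# same client hits different, disagreeing copies of that state.

class Server:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # in-process state, invisible to other servers

    def handle(self, session_id):
        # Count the requests this particular server has seen for the session.
        count = self.sessions.get(session_id, 0) + 1
        self.sessions[session_id] = count
        return count

servers = [Server("a"), Server("b")]

# Round-robin: four requests from the same client alternate between servers.
counts = [servers[i % 2].handle("client-1") for i in range(4)]
print(counts)  # → [1, 1, 2, 2] - each server only saw half the requests
```

A store shared between the servers - a database, Redis etc. - would fix this, at the cost of itself becoming the next potential bottleneck.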
Scaling in live environments
I’m mainly focused around cloud and virtual servers when it comes to scaling.
Most cloud providers (like Amazon, Google, Microsoft, Digital Ocean, Rackspace etc.) offer vertical scaling for their virtual servers. In many instances it's simply a matter of dragging a slider to select the size/amount of resources dedicated to the instance. This is one of the reasons these setups are insanely popular - when building an application it is very easy (and cost effective) to scale it like this.
It is way more complicated to scale horizontally!
First of all the application in question must not keep any state between requests - and if it does, this state needs to be synchronized between instances/servers - which is almost certainly not a trivial task. Many load balancers will help by keeping clients bound to one instance - this binding is normally done using a cookie or the IP address of the client. This puts requirements on the client - it has to maintain the same IP address and/or support cookies - both of which can prove difficult. Locking clients to servers will also prevent full utilization of the horizontal scaling.
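Binding clients by IP address typically boils down to hashing the address - a minimal sketch, assuming three hypothetical backends named app-1 through app-3:

```python
# Sketch of IP-based sticky sessions (hypothetical load balancer logic):
# hashing the client IP always maps a given client to the same server,
# but a client that changes IP mid-session gets remapped - and a busy
# IP (e.g. a corporate NAT) can skew load onto a single server.
import hashlib

servers = ["app-1", "app-2", "app-3"]  # assumed backend names

def pick_server(client_ip):
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

print(pick_server("203.0.113.7"))  # same IP → same server, every time
print(pick_server("203.0.113.7"))
```

This is why IP binding puts requirements on the client: as soon as the IP changes, the session lands on a different server.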
Secondly, spinning up servers, adding them to the load balancer, starting the application etc. might not be a trivial task - I'm mainly thinking of deployments that require credentials, shared resources and so on. Fortunately many of the cloud providers mentioned before have a system for auto-scaling (hopefully both up and down) servers behind a load balancer - and if your application fits into the requirements of these services you are blessed and have a much clearer path ahead with horizontal scaling. Be aware that it is very easy to use horizontal scaling to move a performance problem from the application to, say, the database - which leaves you with the problem of having to scale the database as well.
Scaling with containers
Services like Google App Engine and Heroku, and tools like Kubernetes and Docker, make it somewhat easy to either package an application inside a container and have it deployed in a scalable manner - or to simply deploy the code of the application and have it run in a scalable environment managed entirely by someone else.
I have deployed and scaled applications - both vertically and horizontally - using the mentioned providers and tools, and I must say that using containers or even fully managed environments like Google App Engine is by far the easiest route.