Stage 1: The Database Choice!
To start off, I needed to think about where we would store the raw data. The data we were supposed to fetch came from third-party events, and it was very crooked. Crooked in the sense that the same event, triggered twice, didn’t have the same structure! So, a very clear choice in this case was a NoSQL database, and I chose Elasticsearch for it.
[Diagram created with Mermaid]
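To make the idea concrete, here is a minimal sketch of how a raw event could be dumped into Elasticsearch as-is. It assumes the official elasticsearch Python client (8.x) and a hypothetical index called raw-events; none of these names come from the actual system.

```python
# A minimal sketch, not the article's actual code: store an incoming
# third-party event verbatim, with no fixed schema imposed on the payload.
from datetime import datetime, timezone

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def store_raw_event(event_name: str, payload: dict) -> None:
    doc = {
        "event_name": event_name,
        "received_at": datetime.now(timezone.utc).isoformat(),
        # The payload's shape may differ even between two triggers of the
        # same event, which is exactly why a schemaless store fits here.
        "payload": payload,
    }
    es.index(index="raw-events", document=doc)
```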
Now, all of the raw data will be kept in Elasticsearch, and then it needs to be processed (formatted, indexed) as well, right?
But then how do you do that for an event-based system?
Let’s assume there is an event ‘A’ that is triggered and gives us certain data, and that same event ‘A’ occurs again while we are still processing the data from its previous instance. So, think about it: in an asynchronous system, which request will be fulfilled first? No idea, right?! And we can’t make the system synchronous, because then requests would time out.
So, what do we do? What do we do when the supply of something is too fast, the processing is too slow, and we can’t keep everything waiting for it to finish either?
Stage 2: The Queuing System: Amazon SQS!
So, we naturally arrived at a queuing solution: AWS SQS. The idea was that with each event trigger, the raw data would be passed through queues and processed in the background, so the request itself wouldn’t have to wait for processing to complete. The queue also maintains the order of updates coming from the same event ‘A’, because an SQS FIFO (First In, First Out) queue guarantees delivery order within a message group.
[Diagram created with Mermaid]
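Here is a minimal sketch of the producer side, assuming boto3 and a hypothetical FIFO queue URL. The key detail is MessageGroupId: SQS FIFO queues guarantee ordering per message group, so grouping by event name keeps all updates from event ‘A’ in arrival order.

```python
# A minimal sketch, assuming boto3 and a made-up FIFO queue URL.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/raw-events.fifo"

def enqueue_event(event_name: str, payload: dict, dedup_id: str) -> None:
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"event": event_name, "payload": payload}),
        # All messages with the same group id are delivered in FIFO order,
        # so two triggers of event 'A' are processed in the order they arrived.
        MessageGroupId=event_name,
        MessageDeduplicationId=dedup_id,
    )
```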
For messages that fail to be processed correctly, we can maintain another queue: an AWS SQS DLQ (dead-letter queue). Failed messages are retried automatically a few times, after which manual intervention is needed. The redrive policy’s maxReceiveCount setting defines how many times processing is retried automatically before the message is moved to the DLQ for manual handling. All of the data that failed even after the DLQ retries was then stored in Amazon DynamoDB (the key-value NoSQL DB). I’m not covering this thoroughly in this article, but I thought I’d mention it here.
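For illustration, attaching a DLQ to the main queue could look roughly like this with boto3; the queue URL and ARN are made-up placeholders.

```python
# A minimal sketch with boto3; the URL and ARN are hypothetical.
import json

import boto3

sqs = boto3.client("sqs")

MAIN_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/raw-events.fifo"
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:raw-events-dlq.fifo"

sqs.set_queue_attributes(
    QueueUrl=MAIN_QUEUE_URL,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": DLQ_ARN,
            # After 5 failed receives the message moves to the DLQ,
            # where it waits for manual intervention.
            "maxReceiveCount": "5",
        })
    },
)
```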
Stage 3: Micro-Services!
Also, there isn’t just one event from one service. There are multiple services, each with multiple events. So, for each of those services we introduce a corresponding service on our end as well. Hence, a micro-services architecture.
[Diagram created with Mermaid]
So far, we have a DB to store our raw data (coming from events, after some initial processing of course) and a queuing system to enable asynchronicity, all while maintaining the order of the updates.
Now the next big step: we need to calculate some metrics and send them to the client when requested through the API. Easy enough, right? Just calculate those metrics, save them in the NoSQL DB (or don’t, if you are calculating them on the fly every time the client requests them) and send them over to the client side when requested.
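As a sketch of the “calculate on the go” option: a single Elasticsearch aggregation can compute a simple metric, such as how many times each event has fired. The index and field names are hypothetical and match the earlier indexing sketch, assuming a keyword sub-field from dynamic mapping.

```python
# A minimal sketch of computing a metric on the fly with an aggregation.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def events_per_name() -> dict:
    """Count how many times each event type has fired."""
    resp = es.search(
        index="raw-events",
        size=0,  # we only want the aggregation, not the raw documents
        aggs={"by_event": {"terms": {"field": "event_name.keyword"}}},
    )
    return {
        bucket["key"]: bucket["doc_count"]
        for bucket in resp["aggregations"]["by_event"]["buckets"]
    }
```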
But wait a second! There are a few other questions or concerns or points to ponder.
- We can’t just save them in the same NoSQL DB (Elasticsearch in our case). Or maybe we can, but why would we want to keep different types of data in the same place?
- A bigger question: who is requesting this data from the client side? Users, right?! And for what purpose? There must be something else, some other purpose behind users requesting that data. So there are definitely two more entities that we need to have in our DB.
Do we really want to keep all of that in the same NoSQL DB where we are storing our raw data?
I mean, we can, but that would just crowd it unnecessarily. Plus, these two new entities are well-defined data, since we control exactly what information we want to keep about them. So why keep them in NoSQL rather than in a clean, clear tabular format, i.e. a relational SQL DB like PostgreSQL?
This will solve the two problems mentioned above.
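To illustrate, a hypothetical PostgreSQL schema for the users table plus a table for the cached metrics might look like this (using psycopg2). All table and column names are made up, and the “other entity” stays unmodeled on purpose, since the article keeps it abstract.

```python
# A hypothetical schema sketch using psycopg2; names are illustrative only.
import psycopg2

conn = psycopg2.connect("dbname=app user=app password=secret host=localhost")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id         SERIAL PRIMARY KEY,
            email      TEXT UNIQUE NOT NULL,
            created_at TIMESTAMPTZ DEFAULT now()
        );
        -- Metrics computed by the main backend, cached here on request.
        CREATE TABLE IF NOT EXISTS metrics (
            id          SERIAL PRIMARY KEY,
            user_id     INTEGER REFERENCES users(id),
            name        TEXT NOT NULL,
            value       DOUBLE PRECISION NOT NULL,
            computed_at TIMESTAMPTZ DEFAULT now()
        );
    """)
conn.close()
```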
But we have one more problem: do we really want our client to interact with the same backend/server where we are handling the events?
Now pay attention, because this is a game changer. We are going to have two backends! Yes, you heard me right!
Stage 4: And then there were two backends!
We will keep one backend where we handle the events and their raw data.
And then there will be another backend, which will act as an intermediary layer between the client and our main backend. Here we will keep a PostgreSQL DB for the users’ data, as well as the other entity we talked about previously.
So now the flow of the application is this:
- The client requests some data from the intermediary backend, related to users and the other entity (unknown, but we don’t need to know what it might be).
- The intermediary backend takes this request and sends it over to the main backend, where we keep the processed data coming from various events.
- The main backend calculates the metrics on the go and sends them to the intermediary backend.
- The intermediary backend stores those metrics in its DB, the PostgreSQL one, and returns the response to the user.
[Diagram created with Mermaid]
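A minimal sketch of the intermediary backend’s side of this flow, assuming Flask and requests; the main backend’s internal URL, the endpoint paths, and the save_metrics helper are all hypothetical.

```python
# A minimal sketch of the intermediary backend, assuming Flask + requests.
import requests
from flask import Flask, jsonify

app = Flask(__name__)
MAIN_BACKEND = "http://main-backend.internal"  # assumed internal address

def save_metrics(user_id: int, metrics: dict) -> None:
    """Stub: write the metrics into the PostgreSQL table sketched earlier."""

@app.route("/users/<int:user_id>/metrics")
def get_metrics(user_id: int):
    # 1. Forward the client's request to the main backend.
    resp = requests.get(f"{MAIN_BACKEND}/metrics", params={"user_id": user_id})
    metrics = resp.json()
    # 2. Persist the freshly computed metrics in PostgreSQL.
    save_metrics(user_id, metrics)
    # 3. Return the response to the client.
    return jsonify(metrics)
```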
Now you must still be wondering what the use of having two backends is, since we could easily have just added PostgreSQL to the main backend and been done with it. Well, there are two reasons for having the second backend:
- To keep the API requests and the client-facing data separate from the events.
- And this is a big one: we don’t want our main backend service to run continuously all the time!
And now we introduce the serverless system (using AWS Lambda) into the picture!
Stage 5: The Serverless Architecture: AWS Lambdas
So, the data from events comes in only when the events are triggered, right? So why keep our service running the whole time and drive up costs? AWS Lambda functions run only when an event triggers them; between events nothing is running, so compute costs drop dramatically when there is no event traffic.
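A minimal sketch of what the event-handling side could look like as a Lambda handler wired to the SQS queue; the message body shape follows the hypothetical enqueue_event() from earlier, and process_raw_event is a stub for the formatting/indexing step.

```python
# A minimal sketch of a Lambda handler consuming from the SQS queue.
import json

def process_raw_event(event_name: str, payload: dict) -> None:
    """Stub: format the raw payload and index it into Elasticsearch."""

def handler(event, context):
    # Lambda only runs when SQS delivers messages; between events,
    # nothing is running and nothing is billed.
    for record in event["Records"]:
        body = json.loads(record["body"])
        process_raw_event(body["event"], body["payload"])
```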
And on the other hand, our intermediary backend will be constantly up and running on an auto-scaling cloud service like AWS ECS.
Phew! We finally have the design of our whole system ready!
[Diagram created with Mermaid]