Don’t stress and set up your hybrid failover

Supporting and maintaining old and inefficient websites brings a lot of problems and stress, especially when a website was started a long time ago on an already outdated technology and architecture, but somehow succeeded and now can’t handle its own popularity. What a coincidence, I have a website exactly like that! And recently I’ve managed to get rid of one stress point, at least for now. All thanks to the benefits of one particular public cloud service.

Get a life

If you struggle with constantly failing services, have a lot on your plate and still have to revive your services a few times a week – this post is for you.

By “Get a life” I mean: take it off your head. Don’t spend another hour in the middle of the night or on vacation with your family reviving the database. Don’t lose your hair because of the stress it causes. Be free and get a life.

Too popular to grow

I already mentioned that I own a website that fits the description above exactly. I’ve also mentioned it once or twice in my earlier posts about React and TypeScript.

Too popular to grow, what is he talking about? – you might ask. Why can’t a popular service grow, if it’s popular and probably making money?

Meet Bob

Don’t judge, hear me out. Imagine a guy – let’s call him Bob.

A few years back Bob created a simple website where he wanted to put some posts and news about a topic he’s interested in. Over the years the website became one of the most popular websites in its area. Nobody expected it, but who would complain?

Now the website has a few people writing articles, a few partnerships, advertisements etc. It’s ready to be monetized more seriously. The problem is that the current traffic is a bit overwhelming for this little, old website. And it’s still growing.

Bob adds a new article that goes viral. The website is under siege. It starts to slow down. The real bottleneck is one small database, which starts to eat up all the CPU time.

Suddenly, the website goes down.

And that would be it: the viral post is wasted.

Bob tries to save it as quickly as possible: he does some magic with the database, restarts a few processes and kills some others. The website is saved again, back to work.

The next day Bob goes on a trip to the countryside with his family, and suddenly he gets a phone call.

“Bob, we have another viral post and the website just went down.”

And Bob sits in the middle of a field with his smartphone, crying and trying to connect to the server and fix it.

Imagine how Bob feels about expanding his website to pursue new opportunities. Well, he doesn’t feel safe with his website going down on a regular basis.

What did Bob decide to do first? He will rewrite the whole website from scratch as a scalable cloud application.

Great, but that will take him months or even longer – he has other responsibilities too.

Could Bob have a little peace of mind during that process? How can he be sure that he will not have to drop everything and run to save the website again?

How to get some peace of mind?

There are many possible solutions that can help keep the website more or less alive until the rewritten website emerges.

If the database is the bottleneck and the main reason for the website crashes, then apart from caching, indexing and installing Elasticsearch (I had already tried all of that), one very obvious thing left to do would be:

  • creating a DB cluster on the server with good load balancing – a pretty solid solution, but for a guy like me, who is more dev than ops, it would take plenty of time and energy. And I am not even sure I would do it right. It is a good solution, but not if you want to spend your time rebuilding the website instead of playing around with solutions you’ve never set up on your own.

The easier solutions involve using a public cloud. As a programmer, I like to limit the time I spend on configuring and setting up infrastructure. I love solutions that abstract that layer away from me. One of them is a public cloud – Microsoft Azure. What exactly? Let’s see.

  • Moving the whole website with the underlying architecture to Azure – very nice, very nice – but the migration would take a lot of time, require a lot of testing and increase costs. The website may be popular, but it’s still before the real monetization, so this would require reworking the business model right away. Again, the solution is good and worth considering. But I chose something else.
  • Hybrid solution – clone the database, keep the master on the on-premises server and the slave on Azure, then configure replication and load balancing between them. It seems like a pretty temporary solution, but it has some serious (in my opinion) advantages. It’s cheap, it’s quick and it works. I get some peace of mind and some time to develop the real solution. Pretty sweet!

Hybrid DB Load Balancing – wut?

I chose the quickest and easiest solution I found. Why? I was desperate to get it off my head and create a fail-safe for the time when I’m doing other stuff, or when I’m not even around for a couple of days to save the website.

The first thing you would want to do is create a new database, export the schema and data from your existing DB and import them into the new one.

After you’re done with that, set up replication between those two DB servers.

This article isn’t about setting up transactional replication, and I don’t even know which database you use, so let me direct you to some decent tutorials:

Alright! So now that we have a working replication, we can get to something more interesting.

There are a couple of services in Azure that can act as load balancers/traffic handlers:

  • Load Balancer
  • Application Gateway
  • Traffic Manager

Load Balancer

Load Balancer is a very good PaaS. I’m using it with plenty of my Azure resources and I like it very much. I haven’t had many problems with it. It works at layer 4 (the transport layer) of the OSI model.

Except it’s going to do the magic only for resources present in Azure, under one of your subscriptions. It just won’t work for hybrid solutions.

That’s a shame.

Application Gateway

Also a good solution, but not a general-purpose load balancer. It’s more like a reverse proxy. It will also work only for the HTTP and HTTPS protocols, so there’s no chance you can proxy any SQL port. It works at the application layer (layer 7) of the OSI model, so it is a bit too high-level for my problem.

Well, at this point I was pretty disappointed. I thought I would be able to just click my way out of the problem in my magic Azure.

But I found one more possibility! And it can save the day.

Traffic Manager

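Not as powerful as Load Balancer, but not as limited as Application Gateway. It’s a pretty simple and basic failover/load balancing solution. Unlike Load Balancer, it doesn’t sit in the traffic path: it works at the DNS level, answering each lookup with the address of a healthy endpoint, and the client then connects to that endpoint directly. That means it isn’t tied to HTTP, and it can monitor endpoints on any TCP port!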

This looks like a solution, right?

Let’s manage some traffic

The first things you see in the Create Traffic Manager profile window are the name and the routing method. Now – the routing method is pretty important. It’s how Traffic Manager will choose which endpoint to use.

Performance – if you want the endpoint to be chosen based on latency and responsiveness, go with this one.

Priority – if you have one main endpoint that you want to use, and its clones that you want to use as a failover, that’s your solution.

Geographic – pretty self-explanatory. If your users should be served by a specific endpoint depending on the region they connect from, take this one.

Weighted – that’s an interesting one. If you want to distribute your traffic proportionally to some predefined weights (e.g. 60% to server1 and 40% to server2), this would be a good solution.

I went with Priority, as it solves my problem best. So I basically chose failover over load balancing for now: when my master DB fails, Traffic Manager will switch to the slave server and I will have all the time in the world to fix the master DB.
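To make the difference between Priority and Weighted routing more concrete, here is a minimal Python sketch of the selection logic only. It is not how Traffic Manager is implemented; the `Endpoint` class, the function names and the endpoint names are all hypothetical and exist just for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str           # hypothetical label, not a real hostname
    priority: int       # lower number = preferred (1 is tried first)
    weight: int         # only used by weighted routing
    healthy: bool = True


def pick_by_priority(endpoints):
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority) if healthy else None


def pick_by_weight(endpoints, rng=random):
    """Pick a healthy endpoint at random, proportionally to its weight."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[e.weight for e in healthy], k=1)[0]


if __name__ == "__main__":
    master = Endpoint("on-prem-master", priority=1, weight=60)
    slave = Endpoint("azure-slave", priority=2, weight=40)

    # While the master is healthy, priority routing always returns it.
    print(pick_by_priority([master, slave]).name)

    # Once the master is marked unhealthy, the slave takes over.
    master.healthy = False
    print(pick_by_priority([master, slave]).name)
```

With Priority, the slave gets traffic only when the master is unhealthy, which is exactly the failover behaviour I was after. With Weighted, both servers would share the traffic all the time.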

For Traffic Manager to monitor the health of the endpoints, we need to configure its monitoring settings to probe the proper port.

One last thing is to add the actual endpoints – in my case the master and slave database servers.

Go to the Endpoints section.

And add your endpoints. You will see that you can use either an endpoint of one of the Azure PaaS services or just an external endpoint, which in our hybrid case is just perfect.

Choose priority 1 for the master server and 2 for the slave server. Save.

Now you can just take the DNS name of the Traffic Manager profile and use it as the DB host in your application. Traffic Manager should work now!
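If you want to check from your own machine that a port is reachable before pointing the health check at it, a plain TCP connection attempt is enough. The sketch below only illustrates the idea of a TCP probe; it is not Traffic Manager’s actual probing code, and the host name and port in the usage example are made-up placeholders.

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical host and port: replace with your own DB server and SQL port.
    print(tcp_probe("db-master.example.com", 3306))
```

If this returns False from outside your network, the monitoring will most likely see the endpoint as unhealthy too, so check the firewall rules for the DB port first.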

Does that work?

If you want to test if that works as expected, you can do what I did. Just turn off the master server and see if the connection still works.
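Because Traffic Manager answers DNS lookups, one way to watch the switch happen is to poll what the profile’s name resolves to while you turn the master off. Below is a small sketch of that, using only the Python standard library. The host name is a hypothetical placeholder, and `watch_resolution` is just a helper invented for this example.

```python
import socket
import time


def resolve_ips(hostname: str, port: int) -> set:
    """Resolve hostname to the set of IP addresses a client would connect to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def watch_resolution(hostname, port, rounds, delay_s=5.0,
                     resolve=resolve_ips, sleep=time.sleep):
    """Poll DNS `rounds` times and return (round, ips) for every change seen."""
    changes = []
    last = None
    for i in range(rounds):
        try:
            ips = resolve(hostname, port)
        except OSError:
            ips = set()  # resolution failed for this round
        if ips != last:
            changes.append((i, ips))
            last = ips
        if i < rounds - 1:
            sleep(delay_s)
    return changes


if __name__ == "__main__":
    # Hypothetical profile name and port: use your own values here.
    for round_no, ips in watch_resolution("myprofile.trafficmanager.net", 3306, rounds=60):
        print(round_no, sorted(ips))
```

Keep in mind that your operating system or local resolver may cache DNS answers, so the change can show up here a bit later than it actually happened.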

NOTE: It will switch to the other endpoint only after a certain amount of time, which depends on the probing settings and the DNS TTL you set in the Traffic Manager configuration.
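As a back-of-the-envelope check, you can estimate how long the switch may take from those settings. The sketch below assumes a simplified model: clients may keep the old DNS answer for one TTL, and the endpoint is marked as failed after a given number of failed probes. It ignores probe timeouts and resolver quirks, and the numbers in the example are illustrative values, not recommendations.

```python
def estimate_failover_seconds(dns_ttl_s: int, probe_interval_s: int,
                              failed_probes_needed: int) -> int:
    """Rough failover estimate: DNS TTL plus the time spent on failed probes."""
    return dns_ttl_s + failed_probes_needed * probe_interval_s


if __name__ == "__main__":
    # Example values only: 60 s TTL, a probe every 30 s, 4 failed probes needed.
    print(estimate_failover_seconds(60, 30, 4))  # prints 180
```

Lowering the TTL and the probing interval makes the failover quicker, at the cost of more DNS queries and more probe traffic hitting your servers.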

What did I gain?

Remember I told you it’s all about saving time and nerves? It really is. I’ve used this solution for a few weeks now and I feel a lot better.

I don’t get late-night texts about server failures, and I have more time to focus on creating a new website and a new scalable infrastructure.

If you have that problem, don’t bother with the stressful server night watch: get your own automated guard. Let it work for you. Until you come up with something better.
