Road to GCP Part 1: Short History of Infra

This post was originally published on our internal blog in February 2021, but since we think it could be useful to others, we decided to make it public.

Back in 2010, while we were working on the first release of Top Eleven (TE), we knew it would be a backend-intensive game because of its high computation and database requirements. This meant we needed a lot of fast CPUs and fast solid-state drives (SSDs). At the time, Amazon AWS was the only public cloud provider with enough features and functionality to be considered a viable option. The problem was that renting virtual servers on AWS cost far more than renting bare metal dedicated servers; prices were through the roof. Even more importantly, AWS didn't offer SSDs with its servers, so we had no choice but to go with bare metal.

Bear in mind that this was 2010. SSDs were still new and very expensive, and hardly any dedicated server providers offered them, so it made sense that cloud providers didn’t have them.

In Top Eleven's early days, the game's backend was hosted on multiple big servers called Gameworlds (GW). We chose ServerLoft as our main dedicated server provider to host the GWs because they offered big, juicy servers with SSDs, and they were cheap. In the beginning, players had to select their GW, which is why GWs had names back then, like Belgrade, Paris, London, etc. Later, we decided to make that whole process transparent for users, so we created a new login service that we named Overlord. Overlord automatically decides which GW you play on. This matters because, for some strange reason, we decided to host our Overlord server with a different dedicated server provider called Redstation.

In the second half of 2011, we had a major failure in the ServerLoft data center that powered off all of our GWs, and it took us the whole night and half of the next day to get things back online. The cause was a power failure in the ventilation system. Even before this incident, we had constant problems with ServerLoft's professionalism, or lack thereof, so this was the last straw. We migrated everything to Redstation, which had given us zero problems the whole time. We rented 40+ servers in a single day, prepared everything, migrated all our GWs within a week, canceled all our ServerLoft servers in a single day, and never looked back.

We became Redstation's biggest customer, and they were ready to do whatever it took to keep us there and make us happy. We had a special setup with our own custom network infrastructure and racks, and we could order anything we wanted for our servers. They were always very professional, and uptime was great.

We ran this dedicated server setup for many years, and during this time, we made many improvements in our infrastructure that empowered our developers to own their games fully. Here are just a few highlights of these improvements:

Automated everything using Ansible
Implemented infrastructure as code
Introduced continuous integration and deployment
Deployed centralized log collection, monitoring, and tracing systems, and much more

On top of that, for many years we managed our own Hadoop cluster for our analytics platform. This Hadoop cluster and the large Top Eleven player base let us work with big data back when "Big Data" was still just a buzzword used by big corporations.

But by far the most significant change came in 2017, when we deployed our private cloud solution based on OpenNebula, an open-source private cloud platform. That's when we started using virtual machines (VMs) and building services to be cloud-ready. OpenNebula opened up many possibilities for our developers that had been impossible before. For example, devs could:

Spin up a new, clean VM in seconds for development or testing instead of reusing a shared dedicated server that had who knows what on it.
Launch a new production service without waiting for the Infra team to order a dedicated server, which could take weeks.
Create many different testing environments within minutes, which had been almost impossible before.
Learn how to build cloud-ready, robust, and fault-tolerant services, and much more.

You might ask: "Why did we decide to use OpenNebula instead of going to a public cloud provider like AWS, Azure, or GCP?" The answer is complex, but in a nutshell:

It was a smaller, incremental change because, apart from introducing VMs into our existing infrastructure, nothing changed from the devs' perspective.
It allowed us to evaluate whether VMs would improve our devs' lives as much as we thought they would.
At that time, we still stubbornly believed that we needed to do everything ourselves, and we didn't want to be locked into any commercial platform.

I'm not proud of the last point, but it was there, and we can't ignore it :)

Anyway, OpenNebula was a huge success and a much-needed refresh. The developers loved it from the start, and it exceeded our expectations: everything we thought it could solve, it did. But the biggest takeaway was that we all saw how much faster and more flexible we could be. OpenNebula also allowed us to introduce the service-oriented architecture that we have been using on Top Eleven for many years now. I will also never forget this comment that one of our devs left on the blog post five months after we introduced OpenNebula:

In the second part of this series, I cover how we decided to migrate to GCP and the reasoning behind it.

Subscribe to Nordeus Engineering