Reducing Our Regression Testing by 80%: Screenshot Tests for Top Eleven

Top Eleven is a very complex game, with a large number of different screens and GUI elements. Going through all of these screens manually for regression testing takes a long time and is unreliable. That's why we decided to automate regression testing as much as possible, to increase both speed and quality.

These screens are only a fraction of all screens in Top Eleven

Our solution: automated Screenshot Tests.

Before we implemented Screenshot Tests, we were spending ~25 hours on manual regression testing before each release. Now, with Screenshot Tests, we have reduced it by over 80%, to around four hours.

How Do Screenshot Tests Help Us?

Our initial intention was for Screenshot Tests to help us find GUI bugs, related to both rendering and interactions between GUI elements. However, as soon as we started using them, we discovered that they had additional benefits, for example:

They are a quick and easy way (especially for a UI/UX designer) to review all graphical changes on the client after new functionality is implemented, and to check whether the client looks better than before.
They are an easy way for an Art Director or UI/UX designer to track the evolution of certain screens in the game.
If a developer needs to check functionality that is hard to reach or mock up, they can do so simply by running the appropriate test, without writing throwaway code just to get there.

How Does It Work?

Tests are executed on real devices. Every few hours a build is created from the master branch and deployed to devices.

Screenshot Tests in action, on real devices

Tests are executed entirely on the client rather than against live servers. We use a Unity Editor tool to record server responses, so when a test runs, the client doesn't connect to a real server but uses the recorded (mocked) data instead.
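
The record-and-replay idea can be sketched in a few lines. This is only an illustration, not our Unity tooling: the class name, the endpoint, the parameters and the sample response are all made up for the example.

```python
import json


class RecordedResponses:
    """Replays previously recorded responses instead of calling a real server.

    Hypothetical helper for illustration; not the actual Unity Editor tool.
    """

    def __init__(self):
        self._recordings = {}

    @staticmethod
    def _key(endpoint, params):
        # Sort parameters so the same request always maps to the same key.
        return endpoint + "?" + json.dumps(params, sort_keys=True)

    def record(self, endpoint, params, response):
        """Store a response captured while talking to a real server."""
        self._recordings[self._key(endpoint, params)] = response

    def replay(self, endpoint, params):
        """Return the stored response; fail loudly if nothing was recorded."""
        key = self._key(endpoint, params)
        if key not in self._recordings:
            raise KeyError(f"no recorded response for {key}")
        return self._recordings[key]


# Made-up endpoint and payload, purely as sample data.
mock = RecordedResponses()
mock.record("/squad", {"club_id": 7}, {"players": ["A. Keeper", "B. Striker"]})
squad = mock.replay("/squad", {"club_id": 7})
```

Failing loudly on an unrecorded request is a deliberate choice in this sketch: a test that silently falls back to a live server would no longer produce repeatable screenshots.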

The concept is simple. A predefined sequence of clicking and dragging is automatically simulated. At certain points a screenshot is taken and compared with a reference image. If any difference is spotted, it's reported to us. If it was intentional, then we set a new image as the reference image. If it was unintentional, then we fix the bug. Simple.
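
As a rough sketch of that triage loop (function names are hypothetical, and short strings stand in for real image data):

```python
def find_changed_checkpoints(screenshots, references):
    """Return ids of checkpoints whose screenshot differs from its reference."""
    return [cid for cid, shot in screenshots.items()
            if references.get(cid) != shot]


def accept_as_reference(references, screenshots, checkpoint_id):
    """Intentional change: promote the new screenshot to reference image."""
    references[checkpoint_id] = screenshots[checkpoint_id]


# Image contents are stood in for by short strings to keep the sketch small.
references = {"squad": "img-a", "transfers": "img-b"}
screenshots = {"squad": "img-a", "transfers": "img-b2"}

reported = find_changed_checkpoints(screenshots, references)   # ["transfers"]
accept_as_reference(references, screenshots, "transfers")      # change was intended
remaining = find_changed_checkpoints(screenshots, references)  # []
```

If the change had been unintentional, the reference would stay untouched and the checkpoint would keep being reported until the bug is fixed.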

For comparing screenshots, we are using a small, custom-made PHP server. It does the comparison itself, and is also used for displaying and managing screenshots.
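
The server itself is written in PHP and its exact algorithm is not covered here. Purely as an illustration of the idea, a per-pixel comparison could look like the Python sketch below. It assumes images are plain rows of (r, g, b) tuples, uses a hypothetical `tolerance` parameter, and marks changed pixels in a single colour (red) rather than reproducing our actual diff rendering.

```python
RED = (255, 0, 0)


def diff_images(reference, actual, tolerance=0):
    """Compare two same-sized images given as rows of (r, g, b) tuples.

    Returns (changed_pixel_count, diff_image). In diff_image, changed pixels
    are painted red and unchanged pixels keep the reference colour.
    """
    same_shape = len(reference) == len(actual) and all(
        len(ref_row) == len(act_row) for ref_row, act_row in zip(reference, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    changed = 0
    diff = []
    for ref_row, act_row in zip(reference, actual):
        out_row = []
        for ref_px, act_px in zip(ref_row, act_row):
            # A pixel counts as changed if any channel differs by more
            # than the allowed tolerance.
            if max(abs(r - a) for r, a in zip(ref_px, act_px)) > tolerance:
                changed += 1
                out_row.append(RED)
            else:
                out_row.append(ref_px)
        diff.append(out_row)
    return changed, diff


white, black = (255, 255, 255), (0, 0, 0)
reference = [[white, white], [black, black]]
actual = [[white, black], [black, black]]
changed, diff = diff_images(reference, actual)  # one pixel differs
```

A small non-zero tolerance is one possible way to ignore tiny rendering differences between devices; whether that is appropriate depends on how strict the comparison needs to be.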

How We're Using It on Top Eleven

Screenshot Tests on Top Eleven cover every part of the client, performing basic actions.
Tests run on four mobile devices. We plan to expand and cover more resolutions, operating systems, GPUs, CPUs, etc., but even with only four devices we detect the vast majority of issues.
In each run ~550 screenshots are taken on each device, totalling more than 2200 screenshots per run! :)

Some Bugs Found on Top Eleven by Screenshot Tests So Far

Breaking bug in Tutorial - the user is unable to drag the player because the Mourinho hint pop-up is hiding it:

The hint suggests that the user should move the player, but the hint itself is hiding the player.

All players displayed using semi-transparent renderers in Squad have nine stars, which is wrong. The first image is the reference image, showing how the Squad should look. The second image shows how it actually looked due to a bug. Differences are marked in red and green and are easy to spot in the third image.

How the Squad actually looked due to a bug

Differences are marked with red and green colour

Screenshot Tests are not only useful for finding technical bugs like these. They are also a very useful tool for checking what changed on our client after something was implemented. Let’s take a look at how one UI design issue was found using Screenshot Tests. In the first image we can see the Player pop-up before the new Formations Improvements. In the second image we can see how it looked afterwards. The third image once again shows the differences marked in red:

Looking at these images we can see, for example, that instead of using the label “Foot: Right”, there is now a visual representation of the right foot coloured green. However, we also noticed that the green arrow icon has changed, and now looks slightly distorted when seen on mobile devices:

If you take a closer look, you will notice that this arrow has a white outline. The UI designer who created it assumed that it would be used on a very dark background, in which case it would be clearly recognizable as an arrow.

It turned out that this arrow was only one of a whole batch of new icons, most of them used in the wrong places by mistake. Our UI designer was super happy to see this flagged by Screenshot Tests and then fixed. Win! :)

What's The Future?

Our Screenshot Tests system is still a work in progress. First we need to make sure all tests are completely stable - executing and passing our own QA.

After that we will implement functionality so that, when a test fails, a notification is sent to all developers who have worked on the code since the last test run. This way we will further shift the mindset of developers toward fully maintaining the automation themselves.
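
This is not built yet, so the following is only a sketch of one way to pick the recipients. It assumes "worked on it" means "committed after the previous test run"; the function name, author names and timestamps are made-up sample data.

```python
from datetime import datetime


def developers_to_notify(commits, last_run):
    """Return the sorted, de-duplicated authors of commits made after last_run.

    commits: iterable of (author, committed_at) pairs.
    """
    return sorted({author for author, committed_at in commits
                   if committed_at > last_run})


# Sample data for illustration only.
last_run = datetime(2018, 5, 1, 12, 0)
commits = [
    ("ivan", datetime(2018, 5, 1, 9, 30)),     # before the last run: not notified
    ("marko", datetime(2018, 5, 1, 13, 15)),
    ("ana", datetime(2018, 5, 1, 14, 0)),
    ("marko", datetime(2018, 5, 1, 15, 45)),   # second commit, still one notification
]
recipients = developers_to_notify(commits, last_run)
```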

Despite some obvious advantages, working with real devices also has its downsides. Low batteries, system pop-ups and disconnected cables are just a few of the problems that can interrupt our tests. To address this, we're currently exploring solutions such as AWS Device Farm.

We are planning to improve Screenshot Tests even further by using Optical Character Recognition (OCR). One test case would be to look at a player on the transfer market and read his name by taking a screenshot and parsing it into a string using OCR. The test would then buy that player, find him in the Squad, and make sure his name is the same before and after the purchase. This way we are moving even further toward automated tests that really represent the way our users see and use our game.
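
The comparison step of such a test might look like the sketch below. The OCR engine itself is out of scope here: the two input strings simply stand in for whatever text an OCR library would return, and the function names and the player name are hypothetical.

```python
def normalize_name(text):
    """Collapse whitespace and ignore case so OCR spacing noise is tolerated."""
    return " ".join(text.split()).casefold()


def names_match(before_purchase, after_purchase):
    """Compare the name read on the transfer market with the one read in Squad."""
    return normalize_name(before_purchase) == normalize_name(after_purchase)


# Stand-ins for OCR output; a real test would get these from screenshots.
market_text = "  J.  Petrovic\n"
squad_text = "J. Petrovic"
same_player = names_match(market_text, squad_text)
```

Normalizing only whitespace and case is a cautious choice for this sketch: more aggressive fuzzy matching would tolerate more OCR noise, but could also hide a genuinely wrong name.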

With our Screenshot Tests, the ultimate aim is for us to check how the game is truly represented to the user. We are not checking some in-code details that may or may not impact the user - we are testing how the game actually looks to them. And that, at the end of the day, is the key.
