I’m going to touch on a few things we are doing around the UI testing of our websites: tracking features, perceptual difference testing, continuous delivery… brace yourselves.

Traditional UI Testing


We wanted to get away from the traditional method of executing our automated UI tests against a complete application stack, avoiding some of the well-known complexities this approach introduces:

Managing test data

Getting tests to set up data sets as they execute, or seeding persistence stores with test data and re-baselining.

Managing test environments

The complete application stack needs to be available at the correct versions.

Test instability and execution speed

Interactions with other applications over protocols like HTTP introduce more potential points of failure, as well as adding latency to test execution.

What did we want from UI Testing


While discussing how to avoid the above complexities, a few other critical requirements were also identified:

  • Tests had to map to feature requirements
  • Tests had to execute in parallel
  • Tests had to be deterministic and consistent
  • All tests (hundreds, potentially thousands) had to execute as part of a continuous delivery pipeline
  • Any changes in the presentation of the GUI had to be captured

So, here is where we currently are in our attempts to achieve this:

Stubbing the GUI’s backend

We wanted to treat the UI just like any other service and use stubs to exercise its functionality.

Our webapps are written using Angular, and we found a good stubbing tool called Angular Multimocks, which allows us to stub the responses from the backend.

This means the application behaves in a consistent manner and responds as quickly as possible.
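To give a flavour of what a stubbed backend looks like, here is an illustrative sketch using AngularJS’s ngMockE2E module with made-up endpoints and payloads; it is not the Angular Multimocks API itself, just the underlying idea of returning canned responses instead of calling real services:

```javascript
// stub-backend.js - illustrative only; our real stubs are managed with Angular Multimocks.
// ngMockE2E's $httpBackend intercepts $http calls and returns canned responses.
angular.module('appStubbed', ['app', 'ngMockE2E'])
  .run(function ($httpBackend) {
    // Canned "happy path" response for a hypothetical account endpoint.
    $httpBackend.whenGET('/api/account').respond(200, {
      id: '123',
      name: 'Test User',
      balance: 42.5
    });

    // Simulate a failure scenario for error-handling tests.
    $httpBackend.whenGET('/api/transactions').respond(500, { error: 'boom' });

    // Let template/asset requests through untouched.
    $httpBackend.whenGET(/\.html$/).passThrough();
  });
```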

Interacting with the GUI

Like a lot of other people we use Selenium and we host a grid in AWS with various nodes for the core OS and browser combinations we need to support.

We also have a grid hosted on-site with various mobile and iOS devices attached, for testing the UI on physical devices that we cannot host in AWS.
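Pointing tests at the grid is just a case of building a remote driver session against the hub; a minimal sketch using the JavaScript selenium-webdriver bindings, with placeholder hub and application URLs:

```javascript
// grid-session.js - minimal sketch; the hub URL, browser and app URL are placeholders.
const { Builder } = require('selenium-webdriver');

async function openHomePage() {
  // Build a session against the remote Selenium grid hub rather than a local browser.
  const driver = await new Builder()
    .usingServer('http://selenium-hub.example.internal:4444/wd/hub')
    .forBrowser('chrome')
    .build();

  try {
    await driver.get('https://our-stubbed-app.s3-website.example.com/');
    // ... interact with the page, take screenshots, etc.
  } finally {
    await driver.quit();
  }
}

openHomePage();
```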

Mapping tests to features

Again, we went with a popular tool: Cucumber.

Here we map the scenarios identified in the feature files to the scenarios in Angular Multimocks, meaning each test scenario maps to a set of predetermined backend responses.
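To give an idea of the wiring (the query parameter, naming convention and driver property below are assumptions for illustration, not the actual Multimocks integration), a Cucumber hook can load the stubbed app with the stub scenario that matches the Gherkin scenario being run:

```javascript
// hooks.js - sketch only; the 'scenario' query parameter and naming convention are assumptions.
const { Before } = require('@cucumber/cucumber');

const APP_URL = process.env.APP_URL; // stubbed app hosted in S3

Before(async function (testCase) {
  // Derive a stub scenario name from the Gherkin scenario title,
  // e.g. "Customer has no transactions" -> "customer-has-no-transactions".
  const stubScenario = testCase.pickle.name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-');

  // Load the stubbed app with the matching set of canned backend responses.
  // `this.driver` is assumed to be a WebDriver instance created in the World.
  await this.driver.get(`${APP_URL}/?scenario=${stubScenario}`);
});
```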

This approach also brought us closer to our Product Owner, ensuring both parties are involved in creating scenarios and making them easier to automate.

The features are stored alongside the application code in source control.

We had to do some work to get individual scenarios executing in parallel, as the existing plugins only support parallelism across features.
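As a rough sketch of the general approach (not our actual runner), you can enumerate scenarios as feature:line targets and farm them out to a pool of Cucumber processes:

```javascript
// parallel-scenarios.js - rough sketch of scenario-level parallelism, not our actual runner.
const { execFile } = require('child_process');
const fs = require('fs');
const path = require('path');

// Collect every scenario as "file.feature:line" so each can be run individually.
function listScenarios(featureDir) {
  const targets = [];
  for (const file of fs.readdirSync(featureDir).filter(f => f.endsWith('.feature'))) {
    const lines = fs.readFileSync(path.join(featureDir, file), 'utf8').split('\n');
    lines.forEach((line, i) => {
      if (/^\s*Scenario( Outline)?:/.test(line)) {
        targets.push(`${path.join(featureDir, file)}:${i + 1}`);
      }
    });
  }
  return targets;
}

// Run each scenario via the Cucumber CLI, at most `limit` at a time.
async function run(featureDir, limit = 4) {
  const queue = listScenarios(featureDir);
  const workers = Array.from({ length: limit }, async () => {
    while (queue.length) {
      const target = queue.shift();
      await new Promise(resolve =>
        execFile('npx', ['cucumber-js', target], err => {
          if (err) process.exitCode = 1; // record the failure but keep draining the queue
          resolve();
        })
      );
    }
  });
  await Promise.all(workers);
}

run('features');
```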

Capturing presentation changes

Using an API we knocked up to write and retrieve images to and from S3, we upload the screenshots captured during UI test execution and compare them against a set of ‘master’ screenshots.

The actual comparisons are performed using an adapted version of a perceptual diff tool.
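As a stand-in for our adapted tool, the sketch below fetches the master image from S3 and diffs it against a freshly captured screenshot using the pixelmatch library; the bucket names, key layout and threshold are made up, and the images are assumed to have matching dimensions:

```javascript
// compare-screenshot.js - stand-in for our adapted perceptual diff tool; bucket/keys are placeholders.
const AWS = require('aws-sdk');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');

const s3 = new AWS.S3();

async function compare(screenName, candidateBuffer) {
  // Pull the current "master" screenshot for this screen from S3.
  const master = await s3.getObject({
    Bucket: 'ui-test-screenshots',
    Key: `master/${screenName}.png`
  }).promise();

  const expected = PNG.sync.read(master.Body);
  const actual = PNG.sync.read(candidateBuffer);
  const diff = new PNG({ width: expected.width, height: expected.height });

  // Count pixels that differ beyond a small perceptual threshold.
  const changedPixels = pixelmatch(
    expected.data, actual.data, diff.data,
    expected.width, expected.height,
    { threshold: 0.1 }
  );

  if (changedPixels > 0) {
    // Store the highlighted diff image so it can be attached to the PR.
    await s3.putObject({
      Bucket: 'ui-test-screenshots',
      Key: `diffs/${screenName}.png`,
      Body: PNG.sync.write(diff),
      ContentType: 'image/png'
    }).promise();
  }
  return changedPixels;
}

module.exports = compare;
```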

As a result, we can automatically compare screenshots taken during the UI testing of a PR (we use GitHub) against the master versions and write any differences into the PR for review, which looks something like this:

Screenshot difference in a Pull Request

The screenshot difference above was an unexpected CSS change in a shared library that made some text bold. Although probably not that important in itself, it shows how good this approach is at catching things a human would most likely miss.
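Writing the differences back to the PR is just a comment via the GitHub API; a sketch assuming the @octokit/rest client, with placeholder owner/repo values:

```javascript
// comment-diff.js - sketch of writing a diff back to the PR, assuming the @octokit/rest client.
const { Octokit } = require('@octokit/rest');

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

async function reportDiff(prNumber, screenName, diffImageUrl) {
  // PR comments use the issues API; owner/repo are placeholders.
  await octokit.rest.issues.createComment({
    owner: 'our-org',
    repo: 'our-webapp',
    issue_number: prNumber,
    body: `Screenshot difference detected for **${screenName}**:\n\n![diff](${diffImageUrl})`
  });
}

module.exports = reportDiff;
```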

And the result


Using the above, we execute the UI tests in a couple of pipelines:

Pull Request Pipeline

For each PR created in GitHub, the UI test execution fits into the pipeline as follows:

  1. Application build, unit test, integration test, coverage, etc.
  2. PR updated with result
  3. Stubbed version of the application deployed to S3
  4. UI tests executed in parallel on the Selenium grid against the application hosted in S3
  5. PR updated with result
  6. Screenshots generated during the UI test execution compared against the master screenshots
  7. PR updated with any screenshot differences
  8. Team informed of new PR to review

Master Commit Pipeline

For each commit to master in GitHub, the UI test execution fits into the pipeline as follows:

  1. Application build, unit test, integration test, coverage, etc.
  2. Stubbed version of the application deployed to S3
  3. UI tests executed in parallel on the Selenium grid against the application hosted in S3
  4. Dashboard updated with result
  5. Master screenshots updated
  6. Pre-production End 2 End Testing
  7. Production

Conclusions


Lessons

A few things that have tripped us up along the way:

  • Animations make this sort of testing unreliable, so the service UI tests are executed with animations within the app disabled (see the sketch after this list).
  • For the perceptual difference testing, the images need to be captured by the same OS and browser versions, otherwise slight rendering differences produce spurious failures.
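For reference, one common way to kill animations at the browser level (not necessarily how we toggle them inside the app) is to inject a stylesheet from the test before taking screenshots:

```javascript
// disable-animations.js - one common approach; our apps actually disable animations internally.
async function disableAnimations(driver) {
  // Inject a stylesheet that turns off CSS transitions and animations for every element,
  // so screenshots are never taken mid-animation.
  await driver.executeScript(`
    const style = document.createElement('style');
    style.innerHTML = '*, *::before, *::after {' +
      ' transition: none !important;' +
      ' animation: none !important; }';
    document.head.appendChild(style);
  `);
}

module.exports = disableAnimations;
```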

The good

We have created a platform for executing UI tests in a fast, repeatable manner within a continuous delivery pipeline.

The feedback loop for UI failures, functional or presentational, is reduced. The results are readily available in PRs and on our dashboards.

Test features are determined with the business and the tests are understandable by anybody.

The not so sure

Obviously, many features require a combination of services, meaning a feature is split across components; tracking this is difficult.

Where does traditional end 2 end testing sit? Is there any need for it, or just a smoke test? We are still working this out. At the moment we are still executing a small number of tests, exercising some critical customer journeys.

Stuff still to do

Master screenshots should not really be updated automatically by the commit pipeline; we should have some kind of promotion process.

Follow up


Some of the implementations above may be worth posts in their own right; let us know if there is any interest and we can do some follow-up posts.