
Friend, can we agree that tests are a good idea? I won't scorn you for sometimes omitting them - time and budget constraints are what they are, and even the best-intentioned of us sometimes have to just give our projects a lick and a promise. "Proper test coverage soon", you sweetly croon as you rock the project to sleep, the knowledge that you're telling a dark, terrible lie twisting you up inside. Maybe you could just scrape enough budget together for some simple unit tests? Then, at least, you'd have "tests", right?
Well, yes, but also no. Here at A+L, we often receive broken birds from prospective clients - projects from other developers that may never have worked properly, and certainly don't do so now. We offer an in-depth Code Analysis service, where we go through your project file-by-file, line-by-line, work out why your codebase is so slow, buggy and broken, and offer recommendations for how to remedy these issues. One of the things we often see in the worst of these projects is no testing at all; the second most common theme is nominal testing - the tests exist, they can be run, but they don't actually test anything meaningful.
There are many flavours of testing. Unit and functional tests have their roles to play, and a comprehensive test suite will include them; but if you're looking for the most bang for your buck, particularly on a web-based project, then you should be considering end-to-end testing.
I'm Lost: What is E2E Testing?
At its core, end-to-end testing involves automating interaction with an application in a manner identical to that of its intended end user, in such a way that every element of the application involved in normal usage can be exercised and verified by the testing.
For example, e2e testing for a web application would involve loading the site in a web browser (ideally the same browser and version(s) expected to be used by intended end users), logging in using the same interface an end user would, and exercising the various areas and functionality in the web browser by clicking, scrolling, tapping, double clicking, etc.
This is accomplished by controlling (aka driving) a browser through an interface that allows programmatic control of its operations, and offers the ability to draw structured data back out of the browser to confirm its state. This allows a test suite to direct a browser to connect to a certain URL, wait for the browser to fire the appropriate events indicating that it has successfully loaded the page to the point where certain DOM elements are in place and interactable, trigger events and inputs against these elements, and receive the results of these interactions from the browser. Anything a human user could do with the browser can thus be automated and tested.
Because the tests exercise the same functionality in the same way that an end user would do so, within the same environment and through the same interface, this approach to testing is more holistic than other approaches such as unit or functional testing (although it can be beneficially mixed with such tests in some cases). The application is at once tested from the client UI down to the server, database and related services. This is what lends end-to-end testing its name and value.
This paradigm appears most often in web development, but can be applied to other arenas of software development where the appropriate tooling exists.
Sounds Easy
Have I convinced you this might be useful? Ready to start throwing some tests together? You've heard there are some automated tools to track clicks and navigation and such - you can get some juniors on this and get it done, right?
Let's back up a moment. E2E testing is software development. Not just in that it tests software, but in that the tests themselves are software, and require the same level of attention to detail, careful planning and thoroughness in implementation as the software they are testing.
Well-crafted tests return their time investment manifold in the improved quality of the associated software; poor tests become a drag on project velocity without producing measurable benefits. It is very easy to create hideous spaghetti tests that are slow, brittle and difficult to maintain or expand upon; it requires skill and practice to create tests that are maintainable, expandable, fast and robust.
Yes, there are automated tools that can help you throw something together quickly; no, you shouldn't rely on them, at least not exclusively - using them as a starting point may help expedite things, so long as you're prepared to refactor the often repetitive and loosely structured code generated by such tools into something robust and maintainable for your project.
Not to dissuade you from embarking on your e2e testing adventures, but it shouldn't be an afterthought! You'll need to budget for it, plan for it, and structure it, the same way as any other feature of your codebase. Do otherwise, and you'll likely regret trying to integrate it at all. This brings us to a third often-seen issue with broken-bird codebases brought to us from other developers - the tests started off strong, but quickly spiraled into messy spaghetti code that became brittle, difficult to maintain, and a drag on the project, until the tests were simply abandoned and left to rot.
The Ends Justify the Tests
Alright, so now you've soberly considered both the need for tests, and the structure and budget that tests will require, and you have your ducks in a row. Now you just need the tools, and to figure out how to use them.
I recommend Playwright as the current best in class for browser automation and e2e testing. It offers all the functionality that was once the exclusive property of a tight binding to the DevTools protocol, as found in Puppeteer, without requiring Node.js; instead, there are language bindings for many of the languages commonly used in web development - JS, .NET, Java, and Python (both sync and async, although I would recommend use of the async API exclusively - more on that in a moment).
Automated testing works best in an environment where tests will be run independently of developers, and report on the results of the test runs in a way easily visible to the team and project manager. Continuous Integration (CI) is one way of achieving these goals. CI is pretty easy to achieve these days, whether you're using GitHub Actions, GitLab CI/CD, AWS CodePipeline, etc. The exact setup of any of these is outside of the scope of this article, but may be a future topic - stay tuned for that.
In the meantime, whether you have the means and motivation for CI or not, you can still take advantage of e2e testing. Let's assume you'll take my recommendation on Playwright, and discuss some SHALL and SHALL NOTs.
Playwright logically organizes tests on a per-file basis, and aims to run tests in parallel. This leads to various considerations:
- Tests MUST NOT be interdependent between test files, or else parallelism is impossible.
- Tests MUST NOT alter state in such a way as to prevent other tests from executing within a clean context. Tests MUST isolate the impact of alterations to application state to those areas which are the concerns of the relevant test suite. This may mean mocking out certain aspects, such as the database or some other service, in some cases. Tests SHOULD NOT mock out elements except as needed, and then MUST document what is mocked and why, with reasoning as to the impact on the tests included in the documentation.
- Tests within a file MAY be interdependent (serially reliant), but SHOULD NOT be - a test SHOULD NOT require that another test within the same file be run before it to set state. Instead, a test SHOULD rely on appropriate hooks to produce the needed state for the test. If for some reason making tests serial within a certain file is particularly desirable, this deviation MUST be documented and re-evaluated regularly.
- Tests SHOULD focus on testing just one thing. Related functionality SHOULD be grouped together in the same file, but NOT in the same test.
E2E tests should be organized into their own directory, rather than living alongside source files - there is no reason to expect 1-to-1 mapping of test naming or areas of responsibility with those of application source files.
Besides global and test-specific fixtures or setup/teardown functions, functionality ancillary to specific tests should live in the same file as the tests which use that utility. Utilities that are useful globally should not be imported from other test files, but organized into a separate file for import. This can include things like mocks, user profiles, authentication and authorization helpers, etc.
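One way such a layout might look (shown pytest-style; file names purely illustrative):

```
e2e/
├── conftest.py        # global fixtures: browser/context setup, base URL
├── helpers.py         # shared utilities: auth helpers, mocks, user profiles
├── test_login.py      # login tests, plus helpers used only by these tests
├── test_profile.py
└── test_checkout.py
```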
Some areas of the application may not be suitable for e2e testing (for example, an area that depends on a third-party service which, once mocked out, leaves too little real behaviour to be worth testing). If this is the case, central documentation that is easily accessible and kept up-to-date should list the excluded areas with detailed rationale, and these exclusions should be re-evaluated regularly.
On Sync vs Async APIs
As a tool for developers working in environments where async Python may not be available, or for drop-in use in such projects, the Python binding for Playwright offers Sync and Async flavours. In truth, the Sync flavour is really “faking” a sync API via the use of greenlets, and this has an impact on certain elements and expectations of the application it’s running within. There are some circumstances where the sync API may be the right choice, but they are generally a “you will know it when you see it” scenario. If you don’t know whether you should use sync or async, use async. All new tests should be written using the async API. Playwright and the browser environment are both truly async in behaviour and expectations under the hood, so employing the async paradigm is the correct and recommended approach.
Just Tell Me What to Write, Darn It!
Okay, we know what tool we're using, and how to use it. What kinds of tests are we actually writing? Our tests will fall into three broad categories:
- Functional Tests
- Adversarial Tests
- Regression Integrations
Let's look at each of these in turn, and discuss what considerations go into crafting them.
Functional Tests
Functional tests should be crafted from the same requirements that lead to development tasks. Ideally, the tests should be created by someone other than the developer who wrote the code to be tested. This validates the clarity of the requirements, and helps reveal the misunderstandings, assumptions and blind spots of the developer. When a developer writes their own tests, they may make the same mistakes twice. That said, tests written by the same developer are better than no tests at all.
Testers do not need to be consulted during the requirements gathering/generation stage, any more than developers do in general; anything that can be made to be done in the browser can be tested via e2e testing, without exception. Some things are more difficult than others, and in the same way that requirements gathering might involve feasibility checks on development tasks, it may be worth consulting with developers experienced with e2e testing on whether a given feature or task, as written, would be more or less difficult to test than something equivalent but formulated differently.
Tests should be written just behind the code they are testing; ideally, an MR for a feature cannot be considered ready for merge without accompanying, adequate and passing tests. In cases where the developer and test developer are different individuals, the test and feature code may be written in parallel, but this raises significant difficulties for the test developer, such as identifying what the locators will be for the elements involved in the test; as a result, I'd suggest that all tests be written against code that is considered ready for inclusion in the codebase in general.
Adversarial Tests
Besides testing the application “as intended” based on requirements, tests should also make efforts to exercise the forbidden or unexpected: attempting to access areas of the application that the tested user shouldn't be able to interact with, making repeated requests in short order, sending unexpected data to the API, rapidly switching between areas of the application, and so on. The tests then confirm that attempts to access unauthorized areas or API endpoints do not succeed, that rapid navigation doesn't break the expected layout, and so forth.
This is a potentially expensive step in terms of time and implementation cost, as the ways in which your application should not function are less bounded than the ways in which it should. As a result, some smaller or simpler projects can and should skip or limit the scope of this element of e2e testing. On larger or more complex projects, however, it forms a useful part of the overall approach to catching unexpected behaviour and confirming authentication and authorization is working as expected, and is particularly important for applications where different user roles possess different permissions and/or access.
Regression Integrations
Manual testing remains an important part of the overall testing plan and process, and an experienced QA will sometimes uncover bugs that were not discovered with the existing automated tests. When the bug is confirmed and its operation understood, this new knowledge MUST be encoded into the automated testing at that point. New tests that confirm the existence and triggering of the bug behaviour should be created as part of the process of fixing the bug, in the same manner as for the functional testing. These tests then become a way to catch regressions, and offer developers and stakeholders confidence that future changes are unlikely to cause the return of previously resolved bugs.
Care must be taken at this point not to allow the tests to degrade in quality and focus. It will be tempting to throw tests for bug behaviour into existing functional tests, or bundle multiple bugs together. Instead, each bug should be broken out into its own test. These tests can live in the same test file as related functional tests if appropriate, but the same approach to test isolation must be observed as for functional tests. Each test should aim to do one specific and easily identified thing, and never more than that.
We Done Here?
Yeah, pretty much. Automated testing is such a critical part of software development, and it so often gets left for last, subsisting on the crumbs of budget left over after all the neat and shiny features have been completed. Don't let your project become one of the broken birds that clients bring to us to mend - put your e2e testing in the development schedule from the start, and give your project some healthy wings.