In contrast to most other test methodologies, "Test Like You Fly" (TLYF for short) emphasizes testing to find fundamental flaws in a system that would prevent it from performing its mission. Most testing methodologies strive to confirm that requirements - the inputs to our designs - are met by the system as written.
TLYF is all about confirming that the system - as a whole - will operate in the environment it is designed for, the environment we typically refer to as "live" or "production". It is vitally important to find what does not perform as expected and to understand the reasons for that anomalous behavior, especially where such defects can degrade, cripple, or abruptly end a mission.
In this seven-part series we will cover, in detail, how to accomplish TLYF in your own applications.
In our testing strategies we frequently attempt to isolate the component under test - large or small - from the rest of the system. We know what we want to test and, whether fine-grained or coarse, we test segments of the system. The larger the component, or the broader the test, the more we tend to try to prove it WILL work - work "as designed".
TLYF comes at the system from the other direction. Testing Like You Fly is designed to drive the system, as completely as possible, as it will exist in production. It demonstrates that the mission of the system or application can succeed, not merely that requirements are met. All components are wired together and available in as real-world an environment as possible.
Testing Like You Fly originated in the aerospace industry, with the intention of mitigating risk by pre-playing all expected in-flight scenarios in advance of actual flight. Given that most flights offer one and only one chance at success, it is crucial to execute all happy paths and possible failure modes while still on the ground. Here we use the term "mission" to represent the activities in the live environment that an application is designed for.
This testing approach requires all hands on deck. Everyone - designers, developers, support personnel, operations staff, and more - should be engaged in the process as "LYF Architects". LYF Architects spend time identifying how the test environment - prior to liftoff - can replicate the expectations of the live environment, and identifying gaps in testing, to ensure that success on the ground means success in the air.
The purpose of testing is NOT to prove that no flaws exist.
There are two types of testing we encounter: white box and black box.
White Box testing is detailed testing based on one premise: "I know how it works".
The inputs for white box testing are edge cases and extremes - the testing targets specific behaviors. Any code change demands a re-test. Testing is performed in small increments. Detailed behavior is affirmed.
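As a minimal sketch of this idea, consider a white box test of a small function. Because we know the implementation, we deliberately probe its boundaries and extremes; the `clamp` function here is a hypothetical implementation under test, not from the original article.

```python
def clamp(value, low, high):
    """Hypothetical implementation under test: bound value to [low, high]."""
    return max(low, min(value, high))

# Edge cases and extremes, chosen from knowledge of how clamp() works:
assert clamp(5, 0, 10) == 5               # nominal input
assert clamp(-1, 0, 10) == 0              # just below the lower boundary
assert clamp(10, 0, 10) == 10             # exactly at the upper boundary
assert clamp(float("inf"), 0, 10) == 10   # extreme input
print("white-box edge cases pass")
```

If the body of `clamp` changes, every one of these assertions must be re-run, reflecting the "any code change demands a re-test" rule above.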
Black Box testing leaves the system opaque to the tester. Inputs are typical inputs, and the outcomes measure the results. Any requirements change demands a re-test. Testing is performed as end-to-end "days/weeks/years in the life" (DITL) runs - a Total Operations Chain Test of all First Time Activities. Testing needs to simulate longer-time-frame activities through acceleration of time - certainly monthly or once-a-year events still need to be tested.
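One common way to accelerate time for a DITL run is to give components an injectable clock rather than letting them read the wall clock. The sketch below is illustrative only - `AcceleratedClock` and `MonthlyReportJob` are hypothetical names - but it shows how a once-a-month event can be exercised many times in minutes.

```python
import datetime

class AcceleratedClock:
    """Hypothetical test clock: each real second advances simulated time by `factor` seconds."""
    def __init__(self, start, factor):
        self.now = start
        self.factor = factor

    def tick(self, real_seconds):
        self.now += datetime.timedelta(seconds=real_seconds * self.factor)
        return self.now

class MonthlyReportJob:
    """Toy component with a monthly event we want to exercise in a DITL run."""
    def __init__(self):
        self.runs = 0
        self.last_month = None

    def poll(self, now):
        # Fire once whenever the (year, month) changes - a monthly event.
        if self.last_month != (now.year, now.month):
            self.last_month = (now.year, now.month)
            self.runs += 1

# 1 real second = 1 simulated day, so a simulated year passes in ~6 minutes of polling.
clock = AcceleratedClock(datetime.datetime(2024, 1, 1), factor=86400)
job = MonthlyReportJob()
for _ in range(365):
    job.poll(clock.tick(1))
print(job.runs)  # the monthly event fired 12 times across the simulated year
```

The key design choice is that the component never calls `datetime.datetime.now()` itself; time is passed in, so the same code runs unmodified at normal speed in production.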
Pilots cannot test aircraft by starting from 30,000 feet. They have to taxi to the runway, take off, fly to a desired altitude, and then, and only then, can they experience the plane in its natural habitat. Likewise, our applications do not magically start processing requests immediately. Their environment is provisioned, they are "booted", and they establish connections to their surroundings.
Only then are they prepared to "operate". During that operational period they are expected to handle occasional peak demand and to degrade gracefully. Should anything unanticipated occur, they should log pertinent details for diagnosis, announce their "distress", and neither collapse nor fail abruptly.
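That boot-then-operate-then-degrade lifecycle can be sketched in a few lines. Everything here is a toy - `ToyService`, its `boot` and `handle` methods, and the fake connections are assumptions for illustration, not any real framework's API - but it shows the shape TLYF expects: requests are refused before boot completes, and an unanticipated input is logged and answered with a degraded response instead of crashing the service.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("toy-service")

class ToyService:
    """Illustrative lifecycle: provision -> boot -> connect -> operate."""
    def __init__(self):
        self.ready = False

    def boot(self):
        # In a TLYF run these steps execute exactly as they would in production.
        self.connections = {"db": "connected", "cache": "connected"}
        self.ready = True

    def handle(self, request):
        if not self.ready:
            raise RuntimeError("request arrived before boot completed")
        try:
            return 100 / request["load"]   # the "happy path" work
        except Exception:
            # Unanticipated input: log pertinent details, announce distress,
            # and degrade gracefully instead of collapsing.
            log.exception("degraded: could not serve request %r", request)
            return None                    # degraded-but-alive response

svc = ToyService()
svc.boot()
print(svc.handle({"load": 4}))  # 25.0
print(svc.handle({"load": 0}))  # logs the division error, returns None
```

A TLYF run would exercise the whole chain - provisioning, boot, connection, peak load, and the distress path - not just the happy-path `handle` call.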
The purpose of TLYF is to experience all of the above before launch, with the comforting knowledge that whatever could happen has been tested, experienced, diagnosed, AND mitigated. Operate the entire system end-to-end through its full business lifecycle before it becomes critical to the business.
Since the fundamental principle of TLYF is to perform activities for the first time pre-launch rather than during the actual mission, it follows that we must be cognizant of what those "first time" activities are. A "first-time" activity is not only the literal first time a discrete activity is performed, but also the first time a repetitive set of activities (e.g., "nominal ops") is performed. Both versions of a first-time activity are needed as the basis for the "days-in-the-life" (DITL) tests; the second version is necessary to flush out accumulation and asynchronous timing errors, which need more than a single occurrence or cycle to manifest.
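A tiny example of why the repeated version matters: the toy `BatchProcessor` below has a deliberate accumulation flaw (its buffer is never cleared between cycles). A single-cycle test - the literal first-time activity - passes; only a DITL-style repeated run exposes the defect. The class and its bug are hypothetical, invented for illustration.

```python
class BatchProcessor:
    """Toy component with a deliberate accumulation flaw: the work buffer
    is never cleared between cycles, so state leaks from run to run."""
    def __init__(self):
        self.buffer = []

    def run_cycle(self, items):
        self.buffer.extend(items)
        return len(self.buffer)   # should equal len(items) every cycle

p = BatchProcessor()
first = p.run_cycle([1, 2, 3])    # literal first-time activity: looks correct (3)
# Repeated "nominal ops" - the second kind of first-time activity:
results = [p.run_cycle([1, 2, 3]) for _ in range(9)]
print(first, results[-1])         # 3 30 - the flaw only manifests over many cycles
```

A timing analogue works the same way: a race or drift error that never appears in one cycle can surface reliably after hundreds of accelerated cycles.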
OK. You have done your best to TLYF. Your application is humming along nicely, doing what it does best - serving your customers' needs, fulfilling a mission. Then, over the horizon, you see a scenario brewing that you did not anticipate - a new set of circumstances, a change in the business needs, data outside the expected value ranges. What do you do?
What you don't do is test in production. Of course not. You would not do that. Standing back, you realize that this new scenario was not considered in your TLYF test environment. It should be obvious that you need to introduce this scenario, in all its nuance, into a full TLYF lifecycle test to a) discover how the mission will transpire, and b) assess what adjustments need to be made to support it. The critical step in Fly Like You Test is appreciating that you should never "fly" any scenario in the production environment without first testing it.
How do I TLYF? Next up in this series: The Basics of Testing Like You Fly.