Simplified Vigilance Metrics

Russian scientists propose an alternative method to predict individual pilots’ fatigue on long-haul airline flights.

by Nadezhda V. Yakimovich, Eleanora I. Surina, Igor G. Gorodetsky and Vladimir V. Chironov | October 19, 2017

Airline pilots’ degree of fatigue during long-haul flights depends on several factors, including the quality of their sleep before awakening, the duration of the preceding period of wakefulness and the duration and intensity of their workload. Less well known is how best to incorporate knowledge about the individual pilot’s resistance to fatigue effects.

International experts have recognized the latter factor — and debated its significance — for at least six years (ASW, 6/11, p. 33). This article describes ongoing work on our theoretical model called the Adaptive Model of Operator Activity (AMOA) and a derivative practical application for airlines. The application is a computer-based, pilot fatigue assessment we call the Control of the Level of Efficiency Test (CLE Test). We reported on our preliminary evaluation of the CLE Test in an academic paper available in Russian and English. (See “Related Publications” for more information.)

This article focuses on the underlying theory and the line operations validation behind this tool, and how it was designed to overcome differences in airline pilots’ resistance to fatigue effects. We also note the tool’s relevance to the fatigue mitigation framework and guidance published by the International Civil Aviation Organization (ICAO).

Volga-Dnepr Airlines, part of the Volga-Dnepr Group of airlines headquartered in the Russian Federation, has been our research partner. The airline’s flight crews, risk managers and fatigue specialists assisted in CLE Test development and conducted the beta testing.

Evaluation occurred during their domestic and international short-haul flights and long-haul flights (including global charter flights operated with Antonov An-124-100 freighters by the group’s Ruslan division). They applied data from the CLE Test to their plans for reducing fatigue risks related to crew-pairing choices and for increasing pilots’ self-awareness of their susceptibility to fatigue-induced performance impairment.

AeroSafety World editors assisted in editing this article, which covers the highlights of our paper. We wrote the paper and article in our roles as scientists and professors within the Ergonomics Department of Moscow State Aviation Technological University (MATI), Moscow, Russian Federation.

Project Background

We consider fatigue a major threat among the human factors in aviation safety. Fatigue affects most aspects of flight crewmembers’ ability to safely do their job. We especially are interested in possibilities for further risk mitigation.

A key goal was to help airline safety management systems to account for the variations among fatigued pilots — on a scale from nominal performance to high-risk impairment — on tasks such as those involved in approach and landing. These variations have been evident in prior fatigue research, and they become more apparent as airlines collect, analyze and share safety data generated by fatigue risk management programs and flight data monitoring programs.

In a July 2011 joint publication — titled Fatigue Risk Management System (FRMS) Implementation Guide for Operators — ICAO, the International Federation of Air Line Pilots’ Associations and the International Air Transport Association defined crewmember fatigue as “a physiological state of reduced mental or physical performance capability resulting from sleep loss or extended wakefulness, circadian phase, or workload (mental and/or physical activity) that can impair a crewmember’s alertness and ability to safely operate an aircraft or perform safety related duties.”

Preflight-Test Concept

Essentially, our CLE Test simulates the fatigue-development process, and its resultant fatigue attributes, for pilots performing complex (in other words, combined) aircraft piloting tasks. The AMOA enabled us to build a “forecast of fatigue” into this test. The test then allowed us to predict the approximate time during an actual long-haul flight when so-called pronounced fatigue (operationally significant fatigue, detailed later in this article) would begin for that individual pilot. This enables us to share information about the procedure for diagnosis of a pilot’s resistance to fatigue on long-haul flights and our experimental verification (validation) of the procedure’s predictive capability.

We checked the test’s objectivity and validity by correlating the degree of forecast fatigue of pilots with estimates of which exceedances (significant deviations) of flight parameters were due to pronounced fatigue, as judged from flight data analysis.

Correlation of forecasts with objective indicators of pilot performance yielded a high statistical confidence level. Technically expressed, the Pearson correlation coefficient was r = –0.805 with sig = 0.002 (p <0.01). We therefore expect to recommend our technique, when certified, for practical application by airlines.
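As a rough cross-check of how a coefficient like this relates to its significance level, the following Python sketch computes a Pearson coefficient and the matching two-sided p-value from Student's t distribution. It is an illustration, not the analysis code used in the study. The sample size n = 12 is an assumption taken from the 12 participating pilots, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the Student's t density with Simpson's rule."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return 2 * (0.5 - area)

# Reported coefficient, with n = 12 assumed (not stated for the correlation itself).
t = t_statistic(-0.805, 12)   # about -4.29
p = two_sided_p(t, df=10)     # falls between 0.001 and 0.002
```

Under that assumption, the p-value lands between 0.001 and 0.002, which is in line with the reported sig = 0.002.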

We first considered questions surrounding the concept of a pilot’s personal resistance to the development of fatigue; in other words, we studied the problem of “endurance of the pilot.” Some people are less resilient than others under their workload (that is, they tire earlier than others under the same workload).

Psycho-physiologists have empirically demonstrated that endurance depends on the fundamental properties of an individual’s central nervous system and, above all, on the “strength” of the nervous system. On this basis, pilots with a relatively strong nervous system are more resilient than others and therefore are more resistant to fatigue. And pilots with a relatively weaker nervous system, on the contrary, are less resistant to fatigue — that is, they get tired faster under their workload.

Moreover, the time at which a pilot’s fatigue begins during a specific flight depends not only on current (circumstantial) factors — such as the fatigue-inducing characteristics of the flight and the condition of the pilot — but also on what we call the innate factor of endurance. Endurance is a human attribute at the genetic/hereditary level, manifested by strength or weakness of the nervous system. We therefore should consider the development of fatigue while a flight progresses as the result of all the above factors.

If pilots have different degrees of resistance to the development of fatigue, their different endurance implies that in flight, all other things being equal, one pilot may feel the first signs of workload-induced fatigue within the first three to four hours of the flight duty period. For a different pilot, this may occur only after seven to eight hours.

As we mentioned, knowing the characteristics of each pilot could be of great importance for crew selection and assignment. For any given crew, our usual advice is to include pilots with both high and low resistance to fatigue, and to plan their in-flight rest at different times so they can substitute for each other in flight.

All of the above factors, except personal resistance to development of fatigue, can be quantified from safety data collected on the flight duty and the rest habits of the pilots, actual conditions of their work activity during a specific flight and their amount and quality of sleep before operating the aircraft. We found that individual resistance to fatigue can be measured by the CLE Test method, and we are not aware of any equivalent technique. The CLE Test is unlike the smartphone-based, real-time psychomotor vigilance tests that are described in the scientific literature on pilot fatigue and applied by some airlines.

Professionally Important Characteristic

Resistance to the fatigue-inducing factors experienced by flight crews on commercial long-haul flights is what we would call a professionally important characteristic (PIC) for civilian airline pilots. However, our literature review found that in the world’s existing air transport system, there are no physical fitness tests designed to measure this PIC.

Thus, when airline psychologists advise airlines on pilot selection at the conclusion of their mental function examinations, one important question about the candidate pilot remains completely unanswered: How well will this candidate endure fatigue and safely perform flight deck duties after eight or 10 hours of flight? (We also could argue that the mental function of any pilot after eight hours of work on board the aircraft will be lower than during a simulator session within the airline’s selection process, when candidates tend to demonstrate their abilities in an optimal state of alertness.)

Unfortunately, airlines cannot simply apply a single “reduction coefficient” to estimate how much a given airline pilot’s mental and physical performance will decline under the usual fatigue-inducing factors. This is not possible because susceptibility to the development of fatigue is a purely individual feature that depends on many mental and physiological factors.

The strong/weak nervous system and its degree of resistance to mental and physical exhaustion involve the pilot’s age, quality of life and occupational stress — not the more common characteristics of this profession. For this reason, an assessment of resistance to mental fatigue must be carried out only at the individual level.

Among several concepts considered, we settled on the AMOA-derived CLE Test as a relatively objective technique for studying individual resistance to the development of pronounced fatigue.

We began this work by assuming that the greatest threat to safe performance of an airline pilot’s duties arises from high degrees of fatigue. We called these subtypes of pronounced fatigue expressed fatigue and severe fatigue (exhaustion). The term expressed fatigue stands in sharp contrast to milder, easy-to-manage degrees of fatigue, which cause no serious problems in performing flight duties and no uncorrected errors, omissions or delays.

Extreme degrees of fatigue are manifested in definable data and measurable phenomena, so we have the opportunity to recognize their presence as a pilot performs his or her duties. To do this, flight data monitoring can be effective. For example, in civil aviation, we can use the onboard flight data recorder, which captures flight parameters that, through routine analysis, reflect the quality of the flight crew’s piloting of the aircraft.

For example, the airline’s risk analysts can look at a specific pilot’s performance on staying within acceptable flight parameters during a manual landing in standard weather conditions while lightly fatigued (in other words, the variable of interest would be the landing phase after three to five hours of flight). They can compare this scenario with the quality of the pilot’s landing performance measured after a flight of eight to 10 hours. The differences can be judged as to the degree of fatigue and the process that ended in pronounced fatigue.

We consider this technique for studying an individual pilot’s resistance to expressed fatigue to be sufficiently valid and reliable for operational use because it incorporates objective performance indicators from real flights.

However, this flight-data technique also is very labor-intensive because it requires the collection and analysis of large amounts of pilot data and aircraft data. This creates a need for tools that allow a rapid preflight evaluation of the pilot’s fatigue resistance, the scope of his or her “mental reserves,” and their depletion (that is, the rates of development of expressed fatigue and severe fatigue).

To create a suitable test method, we turned to a type of computer simulation called virtual execution of operator activity. We implemented this as the CLE Test variant of AMOA software, written under the leadership of Igor Gorodetsky using MATI’s information technology expertise and resources.

Subsequently, a multi-function, computer-based test was programmed. This software assesses the individual pilot’s resistance to fatigue over 10 minutes while the pilot performs complex, fatigue-inducing proxy tasks that approximate tasks performed on an aircraft flight deck. (Psychomotor vigilance tests also measure fatigue-related mental performance, but without replicating piloting tasks.)

We must emphasize that our psycho-diagnostic simulation was difficult to implement because it requires the pilot to perform several tasks simultaneously on the computer display. The pilot has to control the movement of a fast-moving object on the screen, read the changing quantitative indications on the screen to make a decision based on mental calculations, and perform hand-eye motor-response reactions (inputs to a joystick control).

In other words, the CLE Test simulates the combined operations we defined earlier. Such simulations enable measurements that are sufficiently representative of the most difficult tasks conducted by an airline pilot on any flight deck.

In summary, the CLE Test makes it possible to simulate development of pronounced fatigue and to see which pilots are most resistant to adverse effects (resilient) and which are least resistant (weak).

Validation at Volga-Dnepr Airlines

To test the practical feasibility of using the CLE Test to predict individual pilots’ resistance to fatigue, we carried out an experimental study with Volga-Dnepr’s pilots who fly cargo on both long-haul transcontinental flights and short-haul flights. The study was conducted under the leadership of Eleanora I. Surina from July to December 2012.

The aim of this study was to use the CLE Test to predict each pilot’s degree of resistance to the development of fatigue in flight and to verify the automated estimates by comparing them with objectively defined manifestations of pronounced fatigue during real flights. The real flights generated flight-parameter data from flight data recorders.

For the 12 participating Ruslan pilots operating An-124-100 freighters, we collected data on three short-haul flights (each lasting three to five hours) intended to induce slight fatigue. The same pilots operated three long-haul flights (each lasting eight to 10 hours) intended to induce expressed fatigue.

Flight crews of Volga-Dnepr Airlines’ Ruslan charter division operate Antonov An-124-100 freighters between many global destinations.

Our analysis of data from the “expressed fatigue” long-haul scenarios showed flight crew deviations from the airline’s stabilized-approach and landing standards occurring during manual landings. The incidence of these deviations exceeded that for the same pilots on the short-haul flights.

The study analyzed a total of 72 flights (12 pilots, each with three short-haul and three long-haul flights) using flight-parameter data from the approach and landing phase of the long-haul scenarios and the short-haul scenarios. The scenarios involved manual piloting in approximately the same weather conditions, and with approximately equal approach complexity.

The following calculations of deviations were derived from approach-related parameters captured by the aircraft flight data recorders for Russian-standard instrument approach systems:

The deviation from published aircraft height above ground level while crossing the locator outer marker (a high-powered radio marker beacon);
The deviation from published aircraft height along the final approach course (from the outer nondirectional beacon [NDB] to the inner NDB); and,
The average deviation from the maximum bank angle prescribed for roll maneuvers beyond the point of aircraft alignment with the final approach course.

The calculations showed more deviations from approach-and-landing standards for pilots of the three long-haul flights, compared with the deviations for pilots of short-haul flights. We also determined the percentage differences between these scenarios, which we called the total percentage difference. We generalized the total percentage difference for these three flights, and this functioned as an objective metric of pilot fatigue during long-haul flights, enabling us to validate the predictions made by our CLE Test technique.
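The article does not give the formula behind the total percentage difference. As a minimal sketch under stated assumptions, the Python below treats it as the sum of the percentage increases in the three approach parameters listed above, comparing a pilot's average long-haul deviation with the same pilot's average short-haul deviation. The parameter keys, the summation rule and all numbers are hypothetical.

```python
def percent_difference(short_haul, long_haul):
    """Percentage increase of the long-haul deviation over the short-haul deviation."""
    return (long_haul - short_haul) / short_haul * 100.0

def total_percentage_difference(short_devs, long_devs):
    """Sum the per-parameter percentage differences (assumed aggregation rule)."""
    return sum(percent_difference(short_devs[name], long_devs[name])
               for name in short_devs)

# Made-up average deviations for one pilot, for illustration only.
short_devs = {"outer_marker_height_m": 10.0,
              "final_course_height_m": 8.0,
              "bank_angle_deg": 2.0}
long_devs = {"outer_marker_height_m": 12.0,
             "final_course_height_m": 10.0,
             "bank_angle_deg": 3.0}

total = total_percentage_difference(short_devs, long_devs)  # 20% + 25% + 50%
```

A larger total would indicate that the pilot's approach precision degraded more on long-haul flights, which is the kind of objective fatigue metric the predictions were validated against.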

The CLE Test, therefore, enabled an indirect assessment of the degree of endurance of each pilot’s central nervous system. This simulation comprises a combined activity that requires the pilot to perform two actions simultaneously.

The pilot must hold the computer mouse on an object that moves quickly across the screen, while mentally performing arithmetic calculations with numbers that are changing rapidly on the screen. After interpreting the answer for each computation, the pilot must choose whether to press either the left button or the right button on the mouse.

This task load is cognitively very complex and very intense, so after three minutes of test time, some pilots show signs of fatigue — that is, they become less able to control the moving object or, worse, less able to add the rapidly changing numbers. If the pilot begins to skip any steps required by the simulation (that is, if he doesn’t have the “strength” to do them all in sequence), pilot behaviors that we call compensatory pauses begin to emerge in the response pattern. We interpret compensatory pauses as an objective manifestation of the pilot’s need for rest from the task and the beginning of the development of pronounced fatigue.
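How the software recognizes a compensatory pause is not specified here. One plausible approach, sketched below in Python, is to flag any gap between consecutive responses that is much longer than the pilot's typical gap. The threshold factor of 3 and the function name are assumptions for illustration.

```python
from statistics import median

def find_compensatory_pauses(response_times_s, factor=3.0):
    """Return (start_time, gap) pairs where the gap between responses
    exceeds `factor` times the median gap (an assumed detection rule)."""
    gaps = [later - earlier
            for earlier, later in zip(response_times_s, response_times_s[1:])]
    typical_gap = median(gaps)
    return [(start, gap)
            for start, gap in zip(response_times_s, gaps)
            if gap > factor * typical_gap]

# Made-up response timestamps in seconds: steady responses, then one long pause.
times = [0, 2, 4, 6, 8, 20, 22, 24]
pauses = find_compensatory_pauses(times)  # one pause starting at t = 8 s
```

The time of the first flagged pause could then serve as the onset marker for pronounced fatigue in the test.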

The name CLE Test reflects these periods of pilots’ performance deterioration. The software graphically displays analytical results on a computer screen. We call this computer graphic the efficiency curve or the curve of fatigue development. The curve shows the times taken for steps in the combined tasks, then an increase of quality of activities in performing the combined tasks, followed by a decline in task-performance quality, in which the software “sees” the compensatory pauses as the pilot’s way to cope with the fatigue factors.

If the slope of the efficiency curve increases from the beginning to the end of the CLE Test (10 minutes) with small deviations, we interpret this as indicating the pilot’s fairly high resistance to fatigue.

If the efficiency curve — after a small increase in slope — continues to rise, but then moves to a permanent reversal, that is evidence of the low ability of the pilot to resist fatigue.
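The two interpretation rules above can be approximated in code. This Python sketch assumes one quality-of-work score per test minute and labels a curve as showing low resistance when every score after the peak stays clearly below the peak (a permanent reversal). The tolerance value and the labels are hypothetical choices, not part of the CLE Test.

```python
def classify_efficiency_curve(scores, tolerance=0.05):
    """Label a per-minute efficiency curve as 'high' or 'low' fatigue resistance.

    'low'  : the curve peaks early and never returns to within `tolerance`
             of that peak (permanent reversal).
    'high' : the curve keeps rising, or recovers to near its peak.
    """
    peak_index = max(range(len(scores)), key=scores.__getitem__)
    peak = scores[peak_index]
    after_peak = scores[peak_index + 1:]
    if after_peak and all(score < peak - tolerance for score in after_peak):
        return "low"
    return "high"

# Made-up curves in standard units, one value per minute of a 10-minute test.
steady = [0.30, 0.35, 0.40, 0.42, 0.41, 0.48, 0.52, 0.55, 0.60, 0.62]
reversal = [0.40, 0.50, 0.55, 0.30, 0.35, 0.28, 0.30, 0.27, 0.29, 0.28]
```

With these sample curves, `steady` is labeled "high" and `reversal` is labeled "low".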

After we ended our experimental work and CLE Test–validation work at Volga-Dnepr, the airline’s psychologists reported that their own fatigue forecasts, which applied our methodology to a wider range of pilots (including pilots who did not take part in the experiments and tests), generated the strongest correlations so far with flight data analyses.

Moreover, their psychologists found that the computer-predicted curves of fatigue development were very similar to the curves for real flights. We therefore concluded that a CLE Test prediction, when compared with a sample of actual flight-hours data for a specific pilot, will reveal pronounced fatigue for that pilot. A minute of our 10-minute preflight CLE Test simulation — as indicated by the curve — was approximately comparable to an hour of what the pilot experienced during a 10-hour flight.

According to these psychologists, if pronounced fatigue (either expressed fatigue or severe fatigue) appeared in the first half of the CLE Test, it also developed earlier in the aircraft flight. Similarly, if these degrees of fatigue appeared toward the end of the test, the pilot’s fatigue on the aircraft also appeared later (typically between the seven-hour and the eight-hour elapsed-time point in the flight).
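The approximate one-minute-to-one-hour correspondence can be written as a simple linear mapping. The Python sketch below is illustrative only. Scaling to flight lengths other than 10 hours is an assumption that the article does not validate, and the function name is hypothetical.

```python
def predicted_onset_hour(onset_minute, test_minutes=10.0, flight_hours=10.0):
    """Map the test minute at which pronounced fatigue first appears to an
    approximate elapsed flight hour, using a linear scale.

    Returns None when no pronounced fatigue appeared during the test.
    """
    if onset_minute is None:
        return None
    return onset_minute * flight_hours / test_minutes

early = predicted_onset_hour(3.0)   # fatigue at test minute 3, about flight hour 3
late = predicted_onset_hour(7.5)    # fatigue at test minute 7.5, about flight hour 7.5
```

A rostering tool could compare such estimates within a crew when planning in-flight rest at different times.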

Curves of Fatigue Development

To design the CLE Test software function that generates the efficiency curve/curve of fatigue development, we used the following metrics:

The X-axis represents the execution time scale for the test.
The Y-axis represents the pilot’s “quality of work” during execution of test activities.

To determine the quality of the test activities that synthesize the two kinds of activities (the tracking of an object and the accuracy of the mentally performed arithmetic), we calculated the pilot’s performance of each activity separately. Each activity can be scored only as successfully completed (for the object tracking) or correctly solved (for the arithmetic problems presented in sequence).

We limited the CLE Test’s collection of tracking-task data to tasks in which the pilot successfully solved the tracking problems (in other words, the mouse remained on the object). The difficulty of the task involves the amplitude of the object’s on-screen movement (and the pilot’s mouse movement) along the X axis and Y axis, and the speed of motion of the object. We created a value (score) we call quality of work in standard units (the Y-axis range from 0.01 to 0.99, seen in Figures 1 and 2) by writing a formula.

Similarly, we used a formula that reduces the pilot’s quality of work in solving the series of arithmetic problems to a single value (score) representing the average number of correctly solved arithmetic problems per unit of time (per minute).

Finally, we wrote a formula to calculate a single value representing the results of a pilot’s entire CLE Test. We call this value the integral performance indicator. It is our metric for quality of simultaneous performance of the combined activities (that is, completing two activities at the same time). It depends on the pilot’s quality of executing each activity. The formula is simply multiplying the score for the quality of tracking times the score for the pilot’s accuracy in adding and interpreting numbers.
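The multiplication described above can be sketched in a few lines of Python. The example assumes that both component scores are already expressed in the same standard units (roughly 0.01 to 0.99) for each minute of the test. The normalization step itself is not described in the article, so the input values here are made up.

```python
def integral_performance_indicator(tracking_quality, arithmetic_quality):
    """Combined-task quality: product of the two component scores."""
    return tracking_quality * arithmetic_quality

def efficiency_curve(tracking_by_minute, arithmetic_by_minute):
    """Per-minute integral indicator, giving the points of the efficiency curve."""
    return [integral_performance_indicator(t, a)
            for t, a in zip(tracking_by_minute, arithmetic_by_minute)]

# Made-up per-minute scores in standard units.
tracking = [0.90, 0.80, 0.60]
arithmetic = [0.50, 0.50, 0.40]
curve = efficiency_curve(tracking, arithmetic)  # about [0.45, 0.40, 0.24]
```

Because the scores are multiplied, a drop in either activity pulls the combined indicator down, so a pilot cannot hold a high score by sacrificing one task for the other.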

Reading the integral performance indicator, the higher the quality of pilot performance on combined activities, the higher will be the pilot’s fatigue-resistance (efficiency) level and the lower will be the pilot’s degree of fatigue. We graphically present this indicator in the following two figures. The efficiency curve/curve of fatigue development plots this indicator on the Y-axis against test execution time on the X-axis.

Fatigue curves in these figures, generated to show sample results of our CLE Test, also show the surprising speed of fatigue development (the appearance of compensatory pauses while performing tasks) and further fatigue development (the pilot’s inability to maintain the initial level of performance).

Figure 1 shows the curve for the pilot who was the most resistant to fatigue. Figure 2 shows the curve of the pilot who was the least resistant to fatigue.

Figure 1 — Curve of fatigue resistance capacity (pilot’s age 39)

Source: Nadezhda V. Yakimovich, Eleanora I. Surina, Igor G. Gorodetsky and Vladimir V. Chironov

Figure 1 also shows that the most fatigue-resistant pilot’s performance climbs steadily upward with few compensatory pauses — which appeared in the middle of the test (that is, after spending five minutes on the 10-minute test) — and that the fatigue indications were not severe.

In contrast, Figure 2 shows that the least fatigue-resistant pilot’s performance has a markedly different nature. Note that the pilot’s first compensatory pause occurs three minutes into the test, the performance reversal is very deep, and then the pilot unsuccessfully tries to return to the first performance level that was achieved.

In fact, during every subsequent minute, the pilot again falls into compensatory pauses and does not rise again above the initial level of performance. We especially emphasize that the progression along the efficiency curve for this pilot does not improve after the third minute of the test. The data stabilize at a low level until the end of the test.

Figure 2 — Curve of pilot fatigue development showing the greatest instability (pilot’s age 51)

Source: Nadezhda V. Yakimovich, Eleanora I. Surina, Igor G. Gorodetsky and Vladimir V. Chironov

In these figures, we mention the age of each pilot because of the overall patterns we have seen during this research project. The pilots who showed relatively weak resistance to fatigue were older than the pilots who showed relatively strong resistance to fatigue. This trend held across all experimental samples: Younger pilots constituted the group that was most resistant to pronounced fatigue effects.

Summary of Key Findings

Our research into airline pilots’ individual resistance to significant fatigue risks led us to these key findings:

The CLE Test’s integral performance indicator shows significant correlations with flight data analysis regarding approach and landing outcomes after long-haul commercial airline flights (confirming variations among individuals).
There can be negative correlations between the CLE Test’s integral performance indicator and analyses of corresponding flight data parameters. Correlations normally suggest, however, that when there is strong, sustained pilot resistance to fatigue effects, we should expect relatively few deviations from flight operations standards to be detected during the airline’s flight data monitoring analyses. Moreover, the degree of fatigue-resistance variability among pilots becomes more readily apparent on long-haul flights than on short-haul flights.
Statistically significant correlations like these should encourage the global aviation industry to pursue new opportunities to predict — on the basis of CLE Test results — that deviations from flight operations standards and best practices are likely to occur at the end of long-haul commercial airline trips.

This experience in Russia and worldwide may suggest that the airline industry should allow pilots to make better-informed recommendations about their desired rest time on board the airplane. For example, they routinely might specify their preference for when to take an optimally scheduled (technologically validated) break from flight duty to prevent development of a dangerous degree of fatigue.

As noted, a certified CLE Test method might allow airline managers to better assign/pair their pilots during flight crew rostering. They could identify the optimal times for each pilot’s rest breaks to help ensure that alert crewmembers are in control at all times.

Notes

Nadezhda V. Yakimovich, Ph.D., is director of the Scientific and Production Center, Automated Information Systems, at the Federal State Unitary Enterprise Scientific Research Unit of the Moscow Aviation Technological Institute.
Eleonora I. Surina is director of the Personnel Resource Department of Volga-Dnepr Airlines and is an aviation psychologist.
Igor G. Gorodetsky, Ph.D., is head of the Department of Ergonomics and Information Measuring Systems of the Moscow Aviation Technological Institute.
Vladimir V. Chironov is chief research officer of the Scientific and Production Center, Automated Information Systems, at the Federal State Unitary Enterprise Scientific Research Unit of the Moscow Aviation Technological Institute.
