Every day, pilots, flight attendants, air traffic control officers (ATCOs), mechanics, and other frontline aviation professionals put their skills and judgment to the test to solve problems that cross their paths.
Be it a failure not covered by any checklists, noncompliant passengers, a recurrent glitch in the flight planning software, or unclear instructions in a maintenance manual, “sharp-end” workers are always searching for ways to overcome challenges and accomplish their duties.
As desirable a trait as this can be on most occasions, this problem-solving attitude can sometimes be detrimental to safety, as it may cause risks to become normalized and hazards to go unreported. When employees deal with operational issues by “fixing and forgetting” (as described by Canadian researchers Tanya Anne Hewitt and Samia Chreim)1 rather than searching for root causes and reporting difficulties, organizations are deprived of important opportunities to learn from and mitigate risks.
In this two-part article, we will explore different problem-solving behaviors and how they impact safety. In Part One, we will delve into the cultural, organizational, and personal factors that contribute to issues being resolved without yielding lessons on how to prevent similar problems in the future. In Part Two, we will build on those insights and devise strategies to help foster a healthier reporting culture and motivate people to approach operational hurdles with a more inquisitive mindset.
Unfortunately, the consequences of inadequate problem-solving are often the subject of investigations of accidents or serious incidents. The effects of fixing and forgetting may seem positive at first, but immediate, local success in dealing with an issue can generate a false sense of control and conceal an imminent failure. This was the case with a large business aviation operator based in Paris, where such dynamics led to the midair encounter between one of its jets and a narrow-body airliner in January 2022.
Unresolved Issues
The Cessna Citation 525, based at Paris-Le Bourget Airport, had been scheduled to fly to Geneva with two pilots and two passengers on board. In its final report on the serious incident, the French Bureau d’Enquêtes et d’Analyses (BEA) said that, on the climb to Flight Level (FL) 270 (about 27,000 ft), the crew had engaged the autopilot (AP) in indicated airspeed (IAS) mode and maintained a speed of 200 kt. The climb had been uneventful until they crossed FL 185, when the pilots felt a sudden increase in the load factor. Scanning their instruments, they realized that the pitch attitude had increased to an abnormal value and that the airspeed indicators were showing different readings: 250 kt on the captain’s side and 150 kt on the first officer’s side.2
The crew disengaged the AP, re-established a normal pitch attitude and re-engaged the autopilot, now in vertical speed mode.
A few minutes later, the first officer, acting as pilot monitoring (PM), glanced at his altimeter and noticed that they had overshot FL 270, their cleared flight level. He alerted the captain, who was pilot flying (PF), to the level bust. The captain checked his own altimeter, which indicated an altitude below FL 270, making the criticality of the situation even more evident. Puzzled, the pilots leveled off and queried air traffic control (ATC) about their flight level. “FL 263,” the controller replied. This was consistent with the PF’s altimeter, yet the standby altimeter displayed a considerably different value, FL 280, which closely matched the indication on the PM’s side. “We have a small problem with our altimeters,” the captain told the ATCO. After the captain described the issue, the controller, with a sense of urgency, advised that they had traffic at 12 o’clock, 1,000 ft above them. Almost immediately, the Citation crew reported that an Embraer 170 had passed below them. The minimum separation between the jets was later calculated as 1.5 nm (2.8 km) horizontally and 665 ft vertically.
The subsequent BEA analysis concluded that an incorrectly installed hose in the Citation’s pitot-static system had likely caused a failure in the air data system that resulted in the unreliable indications received by the crew. Because the business jet’s transponder was receiving erroneous data from the faulty system, the decreasing proximity between the two aircraft did not cause the Embraer’s airborne collision avoidance system (ACAS) to generate advisory messages or the ATCO to receive a short-term conflict alert (the Citation was not equipped with an ACAS).
The investigation also revealed that the problems that the pilots had experienced were not new; they had presented themselves in 2017, 2019, and earlier in 2021, only a month before the incident. The BEA found that the business jet operator’s culture of placing operational readiness over safety resulted in the fault not being detected despite several opportunities.
How could a problem in such a critical system be left unresolved for so many years?
According to the BEA report, the first time it occurred, the crew made an entry in the technical logbook and filed a safety report. The aircraft was then inspected by maintenance personnel, who detected debris in the left airspeed indicator system.
Approximately two years later, the issue recurred. This time, however, a miscommunication between the pilots resulted in no safety report being filed, and an overly concise failure description in the logbook prevented the technicians from detecting the fault. Another two years passed, and the problem presented itself once again. The pilots — one of whom had been part of the crew that experienced the fault the previous time — informed maintenance but did not submit a safety report because a problem in the reporting software prevented the first officer from completing the submission process.
Although the BEA investigation pointed to several reasons why the pitot-static system issue was not diagnosed and resolved sooner, the bureau noted that the operator had a deficient reporting culture that affected both the information transmitted to its safety team and the information used for technical troubleshooting. Maintenance issues were often analyzed by the chief pilot himself, who decided whether and where to ground the airplanes. Sometimes, as was the case for three of the four pitot-static system–related events (including the incident flight), crews would consult the chief pilot or management for guidance and were instructed to delay reporting faults until the aircraft was back at Le Bourget. At times, pilots were even told not to report any issues in the logbook, causing some problems to be “magically” discovered during scheduled maintenance and others to be left unresolved. These unresolved problems gradually became part of the DNA of many of the operator’s aircraft: jets had their own “quirks” (in the form of inoperative equipment and malfunctioning systems), and crews learned how to deal with them to keep operations running and minimize risks as much as possible under those substandard conditions.
Adapting to, and Adopting, Lower Standards
Human beings have remarkable adaptive capacity, Ohio State University Professor David Woods explains. This allows them to respond to the most dynamic situations and solve problems successfully, especially when those problems fall outside what a system is designed to handle. However, adaptive capacity can be a double-edged sword. People do not solve problems in the same way, and, as the factors that led to the 2022 midair encounter demonstrate, their actions are not always conducive to safety. While adaptability is seen as a hallmark of resilient systems, the ways in which frontline staff deal with issues may sometimes be counterproductive to the management of risks within an organizational context — what Woods calls “maladaptation.”3
A 2001 Harvard University study by researchers Anita Tucker, Amy Edmondson, and Steven Spear4 showed that individuals may adopt two different types of responses when facing a problem: one that hinders organizational learning and one that is conducive to it.
The first type of response is first-order problem-solving, or fixing and forgetting — characterized by actions that resolve an issue and allow work to continue but do nothing to prevent the issue from recurring. This type of response is commonly witnessed in aviation. In aircraft maintenance, for example, the lack of the right tools for a certain task may cause technicians to resort to other, less adequate tools or to find alternative ways to accomplish the job. Although these “workarounds” may not necessarily be unsafe or represent a procedural violation, they become problematic when the risky situation that rendered them necessary in the first place is left unreported and, consequently, unaddressed. First-order problem-solving hampers the gathering of data that could justify the adoption of corrective actions by management. If issues are only fixed but never communicated, their underlying causes cannot be remedied.
The second type of response, called second-order problem-solving, or “fixing and reporting,” involves investigating the underlying causes of an issue with the goal of preventing its recurrence. This is the gist of safety management systems (SMS). Taking the time to carefully examine the problems that frontline staff experience — as opposed to simply fixing them on the spot and resuming work — yields an important opportunity for organizational learning, one of the pillars of a safety culture, as emphasized by James Reason.
The Harvard researchers observed 22 nurses in eight different hospitals for a total of 197 hours and concluded that first-order problem-solving prevailed over second-order problem-solving. The vast majority of responses allowed the nurses to continue caring for their patients but ignored any possibility of investigating or changing the causes of the problems. So, what factors contributed to the predominance of first-order problem-solving?
First, they identified insufficient time to “fix and report.” Overcoming a problem and then reporting it takes longer than only fixing it and moving on. Just as the nurses did not have enough time to engage in any activity other than caring for patients, frontline aviation professionals often lack the time to pause their work to write a report, as simple a task as this might appear.
As a second contributing factor, the researchers found poor teamwork and a low degree of psychological safety. They explained that, when nurses shared their concerns with doctors or management, their findings would be dismissed, or they would be asked to “prove” that they had indeed encountered a problem. Physicians and managers would also refrain from transmitting important information about patients’ treatment plans and other critical processes to nurses because of the nurses’ perceived hierarchical level. “[W]e found that doctors, at times, treated nurses as low-status workers,” the researchers noted. In aviation, despite the widely acknowledged benefits of crew resource management, deficient teamwork and low psychological safety persist, especially between highly experienced captains and young first officers, or between flight crew and cabin crew.
Next, the study found that the sense of fulfillment and pride that nurses derived from challenges they had successfully overcome in the past motivated them to favor first-order problem-solving behaviors when faced with new hurdles. For author Daniel Pink, intrinsic motivation has three components: purpose, mastery, and autonomy.5 Being repeatedly exposed to situations that required them to use their skills, wits, and adaptive capacity bolstered the nurses’ sense of mastery, further reinforcing the “benefits” of fixing and forgetting. Unfortunately, crews can also fall victim to the same phenomenon. The portrayal of pilots who “save the day” as the epitome of airmanship may cause some crewmembers to believe that the best pilots are those who successfully circumvent adversities, rather than those who adopt careful risk management practices and diligently follow standard operating procedures.
While having nurses’ decisions questioned by medical practitioners and managers inhibited second-order problem-solving, giving them autonomy without support was equally harmful. The authors observed that “empowering” nurses through the removal of direct managerial assistance, as seen in some hospitals, caused the deterioration of effective problem-solving when such initiatives did not include measures to bridge the gap between nurses and other members of the organization with whom they were required to interact. Additionally, they noted that any potential benefits in reducing managerial oversight were lost when professionals were already overburdened by their normal duties, leaving them no time to engage in the resolution of higher-level problems.
Lastly, the study revealed that resorting to local “quick fixes,” as opposed to more thoroughly planned and longer-lasting adjustments, made nurses feel temporarily satisfied, reflecting a neurobehavioral process called temporal discounting. Also commonly referred to as “discounting the future,” this phenomenon causes individuals to weigh immediate rewards more heavily than longer-term ones — a behavior that was also evident at the French business jet operator. The focus seemed to be on completing flights no matter what technical issues arose. Dealing with failures on the spot was more “rewarding” than adopting a more strategic approach and aiming for long-term fleet availability — which would have been the most financially sound decision.
The Consequences
When first-order problem-solving defines how an organization and its members deal with issues, increasingly lower operating standards become accepted — a practice widely known in safety science as normalization of deviance. Tucker and her team concluded in their study that the prevalence of fixing and forgetting resulted in organizational systems becoming worse or staying the same over time. Nurses would feel proud of themselves for overcoming adversities and resolving issues, but without reporting and deeper analyses, they would also find themselves having to fix the same problems repeatedly. A similar sentiment likely also existed among the crews of the business jet operator.
By working as what the late Dartmouth College Professor Donella Meadows would call a “destructive reinforcing feedback loop,”6 first-order problem-solving and the “quick fixes” that it brings about cause second-order problem-solving to become an even more distant reality among work groups and allow local successes at overcoming issues to turn into widespread organizational vulnerabilities.
Lucca Carrasco Filippo is a flight safety specialist and a former commercial helicopter pilot. He holds a degree in aeronautical sciences and multiple qualifications in safety management systems, human factors, and incident investigation.
Notes
- Hewitt, T.A.; Chreim, S. (2015). “Fix and Forget or Fix and Report: A Qualitative Study of Tensions at the Front Line of Incident Reporting.” BMJ Quality & Safety Volume 24 (Issue 5): 303–310.
- Bureau d’Enquêtes et d’Analyses (BEA). (2023). “Serious Incident Between the Cessna Citation 525 CJ Registered F-HGPG Operated by Valljet and the Embraer 170 Registered F-HBXG Operated by HOP! on 12 January 2022, En Route South of Auxerre (Yonne).”
- Woods, D. (2006). “Essential Characteristics of Resilience.” In Resilience Engineering (1st ed., pp. 21–33). Aldershot, Hampshire, U.K.: Ashgate.
- Tucker, A.L.; Edmondson, A.C.; Spear, S. (2002). “When Problem Solving Prevents Organizational Learning.” Journal of Organizational Change Management Volume 15 (Issue 2): 122–137.
- Pink, D.H. (2011). “Drive: The Surprising Truth About What Motivates Us” (1st ed.). Riverhead Books.
- Meadows, D. (2008). “Thinking in Systems.” White River Junction, Vermont, U.S.: Chelsea Green Publishing.