By Jeff Dagle, Pacific Northwest National Laboratory
Power grid dispatchers can now train like airline pilots, using simulators that present faulty readings designed to throw them off. The exercises teach dispatchers to compensate for instrument failure and keep operating the grid safely and reliably.
Researchers at the Department of Energy’s Pacific Northwest National Laboratory in Richland, Wash., have developed a hands-on training curriculum that uses a dispatcher-training simulator to induce loss of situational awareness. Situational awareness, originally an aviation term, was described by human factors expert Mica Endsley as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future.”
The objective was to learn whether increased awareness led to better recognition and troubleshooting of off-normal conditions. With the help of Areva T&D, a worldwide provider of transmission and distribution products including grid simulator software, PNNL created specific scenarios that feed trainees misleading, false data as part of the curriculum.
Researchers now can study how dispatchers deal with conflicting or bad data both before and after receiving the training. The ultimate goal is to avoid situations such as the one on Aug. 14, 2003, which resulted in the largest blackout in the history of the North American electric power grid. On that day, an alarm processor at a control center failed while the center was experiencing other problems. These simultaneous problems led to a sequence of events that ultimately affected an estimated 10 million people in Ontario, Canada, and 45 million people in eight U.S. states, and cost the United States between $4 billion and $10 billion.
[Figure: The alarm message log lists alarm-related events that have occurred during operation.]
The blackout investigation revealed deficiencies in dispatcher training, particularly in recognizing deteriorating conditions and then taking effective action. The final report cited loss of situational awareness by the power system dispatchers as one of the root causes of the blackout, and the investigation team recommended better training in identifying and resolving bad data.
This event gave the PNNL team powerful motivation to look for ways to train grid dispatchers to recognize bad information caused by instrument failure or malicious hacking. The researchers were surprised to learn from vendors that although the electric transmission industry already used hands-on training simulators, standard practice did not incorporate situational awareness training.
Teaching Dispatchers to Hit ‘Curve Balls’
Most vendors of large-scale energy management systems offer hands-on training simulators for electric power grid dispatchers, and many large control centers throughout the country use them to drill dispatchers on emergency restoration procedures. These simulators have proved successful in increasing dispatcher proficiency by providing a realistic training environment and supporting a more sophisticated training curriculum. However, the scenarios are run with all instrumentation and software in full working order.
Training scenarios also typically consist of drills on system operations and grid-specific procedures, such as “black start” restoration. Little to no simulator training exists to reinforce communications, coordination with other centers, workload management or situational awareness.
As the August 2003 blackout and the Three Mile Island nuclear power plant accident (see sidebar) clearly showed, dispatchers need to learn how to respond when curve balls come at them from instrument failure or from hackers. Training is essential to cultivate and reinforce operators’ ability to recognize and act on bad information, so they can prevent problems on the system or minimize their impact.
Taking Cues from Airline Pilot Training
The PNNL developers looked at airline pilot training methods because there are many parallels in simulator training between the power and airline industries. In both cases, the simulator’s primary focus is to drill emergency procedures. But there is one important difference: Pilots are trained to recognize failures in the instruments through “partial panel” flying. That is, pilots are trained to cross-check their instruments by comparing the pertinent information from each instrument against the others and against their mental model of the big picture. By doing so, the pilot retains situational awareness at all times and can recognize instrument failures and respond accordingly.
[Figure: The alarm summary shows current and unacknowledged alarms on the system, such as breaker trips, voltage limit violations, overloaded equipment, or bad telemetry.]
Airplane cockpits contain a variety of instruments that provide crucial information to the pilot. These are particularly important when external visual references are not available, such as when flying at night or in clouds or fog. Multiple sets of instruments provide redundant information on critical aircraft parameters necessary for maintaining control in these conditions. When possible, common modes of failure are anticipated and eliminated through redundancy. For example, a vacuum-driven gyroscopic instrument may be backed up with an electric-driven gyroscopic instrument that provides similar information.
However, if a pilot does not recognize the incipient failure of an instrument and uses its incorrect information as a reference for aircraft control, the aircraft could be placed in jeopardy. The value of partial-panel training is that it teaches pilots not to put unwavering faith in their instrument panel, but to stay on their toes by continually checking and cross-checking the data they receive and looking for potential discrepancies.
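The cross-checking habit described above can be expressed as a simple consensus test. The sketch below is illustrative only. It assumes three or more redundant readings of the same quantity, takes their median as the reference, and flags any instrument that falls outside a tolerance. The function name `cross_check`, the instrument names and the tolerance are hypothetical and are not drawn from any avionics or grid system.

```python
from statistics import median


def cross_check(readings: dict[str, float], tolerance: float) -> list[str]:
    """Return the names of instruments that disagree with the consensus.

    Hypothetical helper: the consensus is the median of all redundant
    readings, so a single failed instrument cannot pull the reference
    value toward its own bad reading.
    """
    if len(readings) < 3:
        # With only two sources, a disagreement cannot be attributed to
        # either one; majority-style voting needs at least three.
        raise ValueError("need at least three redundant readings")
    consensus = median(readings.values())
    return [name for name, value in readings.items()
            if abs(value - consensus) > tolerance]


if __name__ == "__main__":
    # Made-up bank-angle readings (degrees). The vacuum-driven indicator
    # has failed and still shows wings level while the aircraft is banked.
    bank_angle = {
        "vacuum_attitude_indicator": 0.0,
        "electric_attitude_indicator": 14.5,
        "standby_attitude_indicator": 15.2,
    }
    print(cross_check(bank_angle, tolerance=5.0))
```

Using the median rather than the mean keeps one wildly wrong instrument from dragging the reference value toward itself. That mirrors the pilot's instinct to trust the agreeing majority over the odd one out.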
Learning to Operate Safely Even with Bad Data
In the aftermath of the August 2003 blackout, PNNL conducted simulator training with skilled dispatchers. The aim was to determine how well they could recognize conditions on the grid when the alerting tools themselves were faulty, for example, after a failure of the alarm processor.
The first challenge was finding a power system training simulator that could accommodate this training scenario; no commercially available simulators were configured to operate in this mode. In the standard design, trainees see a transparent picture of whatever scenario the instructor (the person “running” the simulator) is simulating. In other words, there is no way for the instructor to simulate a line going down while “secretly” deactivating the alarms that would normally notify dispatchers of that event.
[Figure: The graphical user interface allows operators to zoom in on various one-line diagrams to concentrate on particular areas, or look broadly across the system.]
Working with Areva T&D and using its Dispatcher Training Simulator, PNNL researchers configured the simulator to solve the network equations separately from the system that presents information to the trainee. This made it possible to simulate a failure in the alarm processor or in other key subsystems, such as telemetry. Another training objective was to assess whether dispatchers could recognize that the integrity of their system had been breached in the midst of their usual daily activities.
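One way to picture this decoupling is as two layers. A "truth" layer holds the solved network state. A presentation layer derives what the trainee sees, and the instructor can tamper with it. The sketch below is a hypothetical illustration, not the Areva T&D simulator's design or API. The class names, line names and flow values are invented. The two fault switches (disabling alarms and freezing a telemetry point) are assumptions chosen to mirror the alarm-processor and telemetry failures described above.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedGrid:
    """Hypothetical 'truth' layer: the solved network state."""
    line_in_service: dict[str, bool]
    line_flow_mw: dict[str, float]


@dataclass
class OperatorDisplay:
    """Hypothetical presentation layer: what the trainee actually sees."""
    alarms_enabled: bool = True  # instructor can silently switch this off
    frozen_telemetry: dict[str, float] = field(default_factory=dict)
    _last_status: dict[str, bool] = field(default_factory=dict, init=False, repr=False)

    def render(self, grid: SimulatedGrid) -> tuple[dict[str, float], list[str]]:
        """Return (displayed flows in MW, new alarm messages) for one refresh."""
        # Telemetry: pass true flows through unless a point has been frozen.
        shown = {line: self.frozen_telemetry.get(line, mw)
                 for line, mw in grid.line_flow_mw.items()}
        # Alarms: report in-service to out-of-service transitions, but only
        # while the alarm processor is "working". A trip that happens while
        # alarms are disabled is never reported, even if alarms return later.
        alarms = []
        for line, in_service in grid.line_in_service.items():
            was_in_service = self._last_status.get(line, True)
            if was_in_service and not in_service and self.alarms_enabled:
                alarms.append(f"{line} TRIPPED")
        self._last_status = dict(grid.line_in_service)
        return shown, alarms


if __name__ == "__main__":
    grid = SimulatedGrid({"Line A": True, "Line B": True},
                         {"Line A": 400.0, "Line B": 350.0})
    # Instructor secretly disables alarms and freezes Line A's last reading.
    trainee_view = OperatorDisplay(alarms_enabled=False,
                                   frozen_telemetry={"Line A": 400.0})
    trainee_view.render(grid)
    # Line A trips in the network solution and its flow shifts to Line B.
    grid.line_in_service["Line A"] = False
    grid.line_flow_mw.update({"Line A": 0.0, "Line B": 750.0})
    print(trainee_view.render(grid))
```

In this made-up scenario, the trainee sees no alarm and a stale reading on Line A, but the jump in flow on Line B is still visible. Noticing that inconsistency is the kind of cross-check the training aims to build.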
To demonstrate the new curriculum, PNNL asked six seasoned dispatchers from the power industry to participate in a pilot training class. The class consisted of three 2-hour sessions, each designed to represent a typical shift (with periods of inactivity, interruptions and other activities), augmented by a familiarization session and classroom modules.
The participants used a standard dispatcher-training simulator modified to decouple the power flow solution from the graphical user interface. The first session familiarized the dispatchers with the system by running through a scenario with all equipment and instrumentation working properly. In the second session, the instructor introduced an electrical scenario without the alarm processor notifying the dispatchers.
The dispatchers then received training on cyber security and on the threats hackers can pose to the grid. Following this training, they sat down for another “shift” in the simulator that again included malfunctioning instruments or fake signals sent by hackers. The goal was to evaluate the dispatchers’ troubleshooting and whether they could put the clues together. Afterward, human factors experts conducted a detailed debriefing.
The researchers found that once dispatchers became aware of the very real potential for bad information, they were better able to devise new courses of action and to troubleshoot faster when confronted with a situation that could lead to widespread power outages. However, the dispatchers’ responses also varied significantly. This suggests that more rigorous training in this area could help standardize the approach to emergencies involving compromised data.
This kind of training for power grid dispatchers is also becoming more urgent from a regulatory standpoint. The Energy Policy Act of 2005, passed by the U.S. Congress, provides for new mandatory reliability standards for utility companies, to be established and enforced by an Electric Reliability Organization (ERO). The Federal Energy Regulatory Commission has approved the North American Electric Reliability Council as the ERO.
Center Available for Situational Awareness Training
The initial simulator training took place at the nearby National Utility Training and Education Center, adjacent to DOE’s Volpentest HAMMER Training and Education Center. In June 2006, PNNL opened the Electricity Infrastructure Operations Center (EIOC), a first-of-its-kind user facility and research platform for utilities and other industry partners. It provides a real-world setting for developing and testing new technologies, and for conducting analysis and training.
[Figure: The process manager allows the instructor to start and stop applications needed for running the training session.]
PNNL’s EIOC replicates an electrical power grid control room environment. It includes functional energy management system software donated by Areva T&D, live grid data from both the Western and Eastern interconnections, high-performance computing and networking, and wide area measurement system tools to facilitate dynamic grid analysis. Researchers are also integrating advanced visualization technologies that leverage PNNL’s work in national and homeland security with electric power grid operational systems, to explore ways of making grid and market operations more effective.
The EIOC’s multiple capabilities, including the simulator-training curriculum, are being developed to help utilities and other stakeholders in the electric power industry make control center operations more effective and the grid more reliable. The EIOC is part of a laboratory initiative that focuses not only on reliability, but also on the need to maximize the grid’s operational efficiency, from both a cost and an environmental perspective, while serving increased demand for affordable electric power from a growing economy and population.
Jeff Dagle manages Pacific Northwest National Laboratory’s support to DOE’s Office of Electricity Delivery and Energy Reliability. He led the data requests and management task for the Electricity Working Group team investigating the August 2003 blackout, and he serves on the North American Electric Reliability Council’s Critical Infrastructure Protection Committee and associated working groups. Dagle is a Senior Member of the IEEE and a licensed Professional Engineer in the state of Washington.
TMI: Another Case Study for Situational Awareness
The Three Mile Island nuclear power plant accident that occurred March 28, 1979, in Middletown, Pa., was caused by a combination of personnel error, design deficiencies, and component failures. This led to a partial meltdown of the reactor core, a partial evacuation of residents near the plant, and a nearly fatal blow to the U.S. nuclear power industry.
At several points in the events leading to the accident, the control room operators did not receive information that components and systems were malfunctioning, because instrument signals had failed. The operators had been trained to respond to symptoms and were unprepared to look for the root cause of the problems, so they were delayed in taking action to prevent and mitigate the event.
Source: Nuclear Regulatory Commission