Protective Relaying Reliability in the New Century: Condition-based Maintenance Management

By Eric A. Udren, KEMA Inc.

Protective relaying systems stand guard over the power grid, watching for faults and disturbances, and making selective decisions in milliseconds to operate circuit breakers to isolate the problem. Relays have been described as silent sentinels. They do not demonstrate their ability to perform until a fault or other power system problem requires that they operate. Lacking faults, these protection systems may not operate for extended periods. NERC requires utilities to have testing or maintenance programs to assure the performance and availability of important protection systems.


Protection systems are central to power system reliability–a realization that keeps relay engineers, maintenance personnel, and managers up at night. They constantly worry about the functional health of the relay fleet. A failure to trip can cause a loss of stability leading to a major blackout, destruction of equipment, or hazard to utility personnel or the public. A false trip during critical loading can also lead to a blackout or customer outages. The same utility personnel also are concerned with staying on top of regulatory rules and audits, service metrics and performance improvement, working within budget limits, and making the most effective use of knowledgeable human resources.

Less Worry, Better Sleep

The good news is that the latest generations of protective relaying systems have built-in tools to minimize worries. Focusing on the relay maintenance program opportunities these newer relays offer, we can plan ways for relay engineers and managers to worry a bit less and sleep a bit better. The key is creating a comprehensive technical approach that ensures an effective maintenance program, makes the job easier and less expensive, and benefits from advances in protection technology.

A new relay maintenance strategy–condition-based maintenance (CBM)–seeks to eliminate periodic testing and calibration by gathering and monitoring the information available from modern microprocessor-based relays and other intelligent electronic devices (IEDs) that monitor protection system elements. These relays and IEDs generate monitoring information during normal operation, and the information can be assessed at a convenient location remote from the substation.

Time-based Maintenance

The familiar process of verifying protection systems according to a time schedule is called “time-based maintenance” (TBM). TBM requires relay technicians to periodically travel to the physical site of the relay system installation and perform a functional test on protection system components. The pros in the field use computer-based and programmed test sets to check relay operation, calibration and settings. The protection system is removed from service, so redundant or remote backup relaying protects the power system as it continues in operation. System operators discourage testing during heavy loading times, and outages for testing require advance scheduling.

Some protection system components are tough to test. For example, a lockout switch connected to a bus relay trips all the circuit breakers on the bus at once, so checking every portion of the system without actually de-energizing the bus requires test isolating switches and a sequence of test steps. Finally, there is always a risk that maintenance work can leave relays in a disabled state. Maintenance crews must be careful to properly restore complex systems to service. Protection systems left with trip test switches open can’t protect the apparatus, which may result in disruptive remote backup tripping that fragments the grid.

Clearly, utilities need to strike the right balance: test often enough to find likely problems, but not much more often than necessary. The Regional Reliability Organizations (RROs), of which utilities are members, have published optional guidelines for testing actions and time intervals. Individual utilities may adjust these recommendations according to their own experience with various categories of protective relays in their fleets. At this time, NERC and FERC are considering the need for maximum allowable intervals for time-based tests on important relaying systems.

Modern Relays and Maintenance Strategy

The likelihood of failure and the ease of verifying the operational state both depend on the technological generation of the relays as well as on how long they have been in service. The standard protection systems of previous decades were built from electromechanical relays, many of which remain in service today. These relays offer robust construction and long service life, but sometimes have calibration adjustments that can drift. More significantly, they can fail with no evidence of trouble until the relay misoperates for a fault. There is also a significant population of protection systems built from analog solid state (non-microprocessor) electronics with similar maintenance implications.

Protection and control systems have seen dramatic technological advances during the past 20 years, primarily due to the introduction of microprocessor and data communications technology. Modern microprocessor-based relays have five important traits that affect a maintenance strategy:

  1. Self-monitoring capability: microprocessor-based relays can check their own operation, as well as substation circuits connected to the relay, such as the continuity of a circuit breaker trip coil and its switchyard wiring. Most relay users are aware of self-monitoring, but do not focus on exactly which parts are actually monitored. Every element critical to the protection system must be monitored or tested periodically.
  2. Ability to capture fault records showing how the protection system responded to a fault in its protection zone, or to a nearby fault for which this relay is required not to operate.
  3. Ability to meter currents and voltages, as well as circuit breaker status, continuously during normal operation. The relays also compute line megawatt and megavar flows that can be read on a front panel display, and the computations from the most recent relays are suitable for utility SCADA operations.
  4. Ability to communicate data, providing remote access to all results of protection system monitoring, recording, and measurement.
  5. Ability to trip or close circuit breakers and switches through the protection system outputs, via remote data communications or from relay panel buttons.

From TBM to CBM

Condition-based maintenance automatically monitors the performance of the protection system. Most utility engineers are already familiar with some forms of CBM. They have used monitoring systems for critical failure-prone elements, such as automatic checkback test systems for on-off power-line carrier sets in pilot transmission line relaying systems. It has recently become common practice to connect microprocessor relay failure alarms for remote annunciation.

If the protection system design includes more comprehensive condition monitoring, TBM requirements can be drastically reduced. The key is utilizing information that newer microprocessor relays can communicate. There are three major categories of information:

  1. Results from background self-monitoring, programmed in the microprocessors by the manufacturer, or by user relay logic settings. The results are presented by alarm contacts and by data communications messages.
  2. Metered values and input contact status, displayed on the front panel or via data communications.
  3. Event logs and oscillographic records captured during faults and disturbances. Large files of fault information can only be retrieved via data communications.

Categories (2) and (3) must be analyzed by the user for evidence of the health of the protection system. Using all three categories of information, protective relaying engineers and maintenance managers can conduct an effective maintenance program largely from a central location remote from the substation.

Gather and Use Relay System Data

Operational integration of relay communications brings major benefits, including condition-based monitoring for the maintenance program. Operational integration means that relay data communications ports are connected through a substation data concentrator or an RTU to the SCADA control center and to maintenance centers. There are several options for communications integration within the substation: for example, an RS-485 serial network carrying DNP3 protocol exchanges, or an Ethernet network with DNP3 packets, IEC 61850 services, or combinations of protocol packets. Measurements of voltages, currents, load flows, and status of equipment are passed through the data concentrator/RTU to SCADA every second or so. SCADA controls breakers by sending operating commands through the relays. Aside from the overall reduction in substation equipment, this operational integration allows the utility to be constantly aware of the operating state and measurement accuracy of the microprocessor relays, including their ability to trip breakers. Measurements from different relays are compared with one another or checked against state estimator values to flag problems with the relays themselves, panel wiring, or current and voltage transformers. Combined with monitoring alarms, these checks reveal most protection system performance problems without a substation visit.
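The cross-check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relay names, the readings, and the 3 percent tolerance are hypothetical, and the median of the group stands in for a state estimator value.

```python
from statistics import median


def flag_outliers(readings, tolerance=0.03):
    """Return the names of relays whose reading differs from the group
    median by more than `tolerance` (a fraction of the median).

    `readings` maps a relay name to its metered value for the same
    quantity, for example the voltage on a shared bus.
    """
    reference = median(readings.values())
    if reference == 0:
        # Avoid dividing by zero: any non-zero reading is suspect.
        return sorted(name for name, value in readings.items() if value != 0)
    return sorted(
        name
        for name, value in readings.items()
        if abs(value - reference) / abs(reference) > tolerance
    )


# Hypothetical bus voltage readings (kV) from three relays on one bus.
bus_voltage_kv = {"relay_A": 138.2, "relay_B": 137.9, "relay_C": 131.0}
print(flag_outliers(bus_voltage_kv))
```

A flagged relay does not identify the cause by itself. As the text notes, the disagreement may come from the relay, the panel wiring, or the instrument transformer.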

Even without such operational integration, it is still possible to determine the state of the relays by communicating via serial ports, perhaps using modems and data switches, whenever a check is called for.

Other messaging protocols may be needed for transferring the fault event logs and oscillographic records in the background to other individuals or locations within the utility that have the facilities and expertise to scan these records for evidence of maintenance issues needing attention.

Digital fault recorders (DFRs) have always captured important information on the operational state of protective relays. DFRs provide different coverage from relay data. DFR data analysis can demonstrate that certain parts of the protective system are functioning correctly, and can reset the TBM time clock for those specific parts only. In general, DFR data alone does not eliminate the need for manual testing at some point. DFR records are rarely able to demonstrate calibration, and they verify only what the record shows to have operated. However, DFR data is helpful and even enables limited CBM for electromechanical or analog solid state relays.

Closing the Gaps

To reach the ideal of relay maintenance via substation visits only for repairs and updates, a relay engineer needs to map out the protection system and become aware of how every single critical component and connection is being monitored or verified. The relays monitor most of their internal parts that are important for protection, and many of the external circuits. Operational integration, or communications-based checking, covers much of the rest. Pay special attention to relaying communications systems that are important to protection, such as pilot relaying channels for transmission line protection, or Ethernet networks carrying IEC 61850 high-speed peer-to-peer protection messages. These channels can self-monitor and raise alarms, or may be monitored by the relays.

There are specific points in the system that are not monitored, and must be tested. For example, the protection microprocessor in a relay energizes an electronic circuit that picks up a small “ice cube” relay and closes the actual breaker trip contacts. This trip output action can be verified only if the relay in question has been observed to have actually tripped its circuit breaker for a recent fault. Alternatively, SCADA can send a breaker trip command by communications that operates the same output circuit. Either of these eliminates the need for a substation visit to test the trip output.

The Industry Adapts

Relay users may not know if there are any unmonitored critical components within the relay. To support CBM, relay manufacturers need to publish precise maps of self-monitoring coverage. Using such maps, relay engineers can document a maintenance plan showing exactly how critical elements are known to be working via a combination of monitoring, remote verification and fault report analysis.
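As a rough sketch of how such a coverage map might be used, the Python below lists the components that have no verification method recorded. The component names and methods are hypothetical placeholders, not taken from any manufacturer's documentation.

```python
# Hypothetical coverage map: component -> how its health is verified.
# None means no monitoring or remote verification has been identified.
coverage = {
    "relay power supply": "self-monitoring alarm",
    "trip coil continuity": "self-monitoring alarm",
    "current and voltage inputs": "SCADA cross-check",
    "trip output contact": None,
    "lockout switch": None,
}


def unverified(coverage_map):
    """Return the components with no verification method recorded."""
    return sorted(
        component
        for component, method in coverage_map.items()
        if method is None
    )


print(unverified(coverage))
```

Anything this check returns is a gap that the maintenance plan must close, either by remote verification or by a scheduled test.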

Inevitably, there will be TBM components in CBM, such as remote breaker trip verification. Because of these gaps, the user needs to set up a database that tracks each protection system element that is not reporting failures through self-monitoring. Software vendors are developing and offering tools for tracking performance of protective relaying fleets. Other vendors are working on tools for convenient analysis of relay and DFR data to glean performance data as easily as possible from these records.
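A minimal sketch of one record in such a tracking database follows, written in Python. The class name, fields, element name, and 365-day interval are all assumptions for illustration and do not describe any vendor's tool. The idea is that evidence such as an observed fault trip or a SCADA trip command resets the verification clock for that element only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TrackedElement:
    """One unmonitored protection system element (hypothetical schema)."""
    name: str
    last_verified: date
    interval_days: int  # maximum allowed time between verifications

    def due_date(self):
        return self.last_verified + timedelta(days=self.interval_days)

    def is_overdue(self, today):
        return today > self.due_date()

    def record_verification(self, when):
        """Reset the clock when new evidence shows the element operated,
        for example a fault record or a SCADA trip command."""
        if when > self.last_verified:
            self.last_verified = when


# Hypothetical element and dates.
element = TrackedElement("breaker 52-1 trip output", date(2007, 1, 15), 365)
print(element.is_overdue(date(2008, 3, 1)))

# A fault record shows that the relay tripped the breaker on Feb. 10.
element.record_verification(date(2008, 2, 10))
print(element.is_overdue(date(2008, 3, 1)))
```

Only elements that remain overdue after all such evidence is recorded need a field test.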

CBM is much more than an easy alternative to TBM that reduces field visits. It offers the critical advantage of virtually continuous monitoring. CBM will report many hardware failure problems for repair within seconds or minutes of when they happen. This vastly reduces the percentage of problems that are discovered through incorrect relaying performance. By contrast, a hardware failure discovered by TBM may have been there for much of the time between tests, and there is a good chance that some relays will show health problems by incorrect relaying before being caught in the next test round. The frequent or continuous nature of CBM makes the effective verification interval far shorter than any TBM interval.
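The difference can be put in rough numbers. The Python sketch below assumes that a hardware failure is equally likely to occur at any moment and stays hidden until the next check, so the average hidden time is half the checking interval. The four-year test interval and one-minute alarm polling period are hypothetical values chosen only for illustration.

```python
def mean_undetected_days(check_interval_days):
    """Average time a failure stays hidden, assuming it is equally likely
    to occur at any moment and is found at the next check."""
    return check_interval_days / 2


# Hypothetical intervals: a four-year TBM cycle and one-minute CBM polling.
tbm_days = mean_undetected_days(4 * 365)
cbm_days = mean_undetected_days(1 / (24 * 60))

print(f"TBM: about {tbm_days:.0f} days hidden on average")
print(f"CBM: about {cbm_days * 24 * 3600:.0f} seconds hidden on average")
```

This simple model covers only the failures that monitoring can actually detect. Unmonitored parts keep the exposure of whatever test interval applies to them.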

Evolution of a Robust CBM Strategy

CBM is an industry trend that suppliers and users will absorb and adapt to over time. TBM and CBM programs are both acceptable if technically complete. Practical programs employ a combination.

As we move ahead with relay and communications technologies, a technically robust CBM strategy will ensure power system reliability. The new relay maintenance strategy should include the following:

  1. Map out the completeness of monitoring and other verification using relay vendor data.
  2. Use metered values in SCADA/EMS or gather values to cross-check with other sources.
  3. Use relays for SCADA control of breakers and switches.
  4. Analyze relay and DFR data for responses or trips for faults, with a precise eye on what is clearly demonstrated in the record.
  5. Gather alarms for self-monitoring of relays and communications channels. Ensure that the alarm gathering scheme is itself monitored for failures.
  6. Keep records and databases that show how these actions verify the operational state and calibration of protection systems, for regulatory audits of the maintenance program.
  7. Perform TBM for unmonitored parts until product and system features adapt or older devices are replaced.
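Item 5 calls for monitoring the alarm gathering scheme itself. One common way to do this is a heartbeat check: each alarm source reports in periodically, and prolonged silence is treated as a failure of the alarm path. The Python sketch below assumes hypothetical source names, timestamps in seconds, and a two-minute silence limit.

```python
def stale_sources(last_heartbeat, now, max_silence_s=120.0):
    """Return the alarm sources that have been silent for too long.

    `last_heartbeat` maps a source name to the time (in seconds) of its
    most recent message. Silence beyond `max_silence_s` suggests that the
    alarm path itself may have failed.
    """
    return sorted(
        source
        for source, seen_at in last_heartbeat.items()
        if now - seen_at > max_silence_s
    )


# Hypothetical last-heard times for two substation data concentrators.
heartbeats = {"sub_A_concentrator": 1000.0, "sub_B_concentrator": 700.0}
print(stale_sources(heartbeats, now=1050.0))
```

Without a check of this kind, a silent alarm channel is indistinguishable from a healthy relay fleet.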

With this program, failures are reported at once for quick repair, while relay technicians focus on solving problems and upgrading systems. As the supply of relay experts shrinks, there is huge value in reducing their periodic testing workload.

Eric A. Udren is a senior principal consultant with KEMA Inc. He has developed the technical strategy for some of the most progressive utility LAN-based substation protection and control development programs using IEC 61850 and other data communications. He serves on the NERC System Protection and Control Task Force (SPCTF), now investigating new protection system standards including maintenance standards. Eric can be reached at

DistribuTECH Sneak Peek

The author, Eric Udren, will present a full-day course on “Substation Protection, Control & Communications in the New Century” during Utility University at DistribuTECH 2008 in Tampa, Fla. The DistribuTECH Conference and Exhibition takes place Jan. 22-24 and pre-conference Utility University courses, including Udren’s, take place on Jan. 21. For more information, visit and click on “Utility University Course Details” in the left-hand navigation bar.
