The Importance of Sophisticated Data Analytics in Reliability Improvement

By Scott Sidney, PA Consulting

Regulators and utility customers continue to demand better reliability, and the demand is not unique to any specific customer group. Businesses such as banks, and even home businesses that rely heavily on electronics, require dependable service. Even momentary interruptions can reset electronic equipment, desktop computers and CATV boxes being among the most annoying examples.

Historically, reliability has been measured in terms of:

- SAIFI (System Average Interruption Frequency Index), the average number of outages each utility customer experiences in a year,

- SAIDI (System Average Interruption Duration Index), the average number of hours of interruption each utility customer experiences in a year,

- CAIDI (Customer Average Interruption Duration Index), the average length of an interruption for those customers who had one,

- MAIFI (Momentary Average Interruption Frequency Index), the number of momentary outages on the electric system, generally any outage lasting five minutes or less.

These metrics, defined by the Institute of Electrical and Electronics Engineers (IEEE), have been the cornerstone of reliability measurement for years. While they are adequate for expressing overall reliability performance, they fall short when building reliability improvement programs.

The challenge is deciding which metrics make the most sense in designing a reliability improvement plan. These metrics range from focused data analytics to more sophisticated modeling techniques. The key is to be able to take data and convert it into action (Figure 1).



The first step can be as simple as understanding outage profiles. That means carving up outage management system (OMS) data into discrete components (Figure 2). Analyzing outages by cause, hour, day, month, location, duration, weather, crew shift, sustained/momentary, major event, etc., can point to specific issues where focused attention and corrective action can be applied. A multidimensional analysis across multiple data sources, such as OMS, asset management and work management, is needed to fully characterize outages and understand what happened and why (weather, failure, error, load, etc.). Then, with a richer picture of what’s happening on the distribution system, a set of actionable programs can be developed. This type of analysis requires the use of big data techniques.
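As a sketch of this first step, OMS records can be aggregated into simple outage profiles. The record fields and values below are hypothetical; a real extract would carry many more dimensions (weather, crew shift, major event flags, etc.).

```python
from collections import Counter

# Hypothetical OMS extract: (cause, hour_of_day, customers_interrupted, minutes_out)
outages = [
    ("tree", 14, 120, 90),
    ("animal", 6, 40, 35),
    ("tree", 15, 200, 180),
    ("vehicle", 2, 75, 240),
    ("animal", 7, 30, 25),
]

# Profile 1: outage count by cause
by_cause = Counter(cause for cause, _, _, _ in outages)

# Profile 2: customer minutes of interruption (CMI) by hour of day
cmi_by_hour = Counter()
for _, hour, customers, minutes in outages:
    cmi_by_hour[hour] += customers * minutes

print(by_cause.most_common())                 # causes ranked by frequency
print(max(cmi_by_hour, key=cmi_by_hour.get))  # hour contributing the most CMI
```

The same pattern extends to any of the dimensions above; each additional grouping key produces another slice of the outage profile.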


The value of big data analysis is the ability to focus on outage prevention opportunities. It’s no surprise that car-pole hits close to bars after 2:00 a.m. on Sunday mornings cause outages. The challenge is understanding the full picture of what’s causing the outages, deriving meaningful action from these insights and deciding how to prevent them in the future (Figure 3).



Most utilities have a diagnostic process in place that uses outages by cause as a starting point. Attacking outage causes without a multi-focused approach, however, can result in ineffective remediation programs. Combining outage cause with worst performing feeder (WPF) analysis is one way to focus reliability improvement programs. Focusing on WPF alone, however, can also lead to ineffective remediation programs unless unpreventable outages are removed from the analysis. In addition, depending on whether the goal is to reduce frequency (SAIFI) or duration (SAIDI), the analysis should focus on the appropriate balance between customer interruptions (CI) and customer minutes of interruption (CMI) rather than on SAIFI and SAIDI themselves. Combined with an understanding of preventable outages, this can provide more insight into which programs will have the maximum benefit. Preventable outages generally include vegetation, equipment failure, lightning, underground cable and connection failure, overhead conductor and connection failure, and animal contacts. Car-pole hits, third-party contacts and cable dig-ins are generally not preventable by conventional maintenance or asset replacement programs.
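Separating preventable from non-preventable causes can be sketched as a simple lookup, using the cause categories listed above (the exact cause strings an OMS uses will vary; these labels are illustrative):

```python
# Cause categories from the article: conventional maintenance and asset
# replacement programs can prevent some outage causes but not others.
PREVENTABLE = {
    "vegetation", "equipment failure", "lightning",
    "underground cable failure", "connection failure",
    "overhead conductor failure", "animal contact",
}
NOT_PREVENTABLE = {"car-pole hit", "third party contact", "cable dig-in"}

def is_preventable(cause):
    """True if conventional programs could have prevented this outage cause."""
    return cause.lower() in PREVENTABLE

outage_causes = ["vegetation", "car-pole hit", "animal contact"]
preventable = [c for c in outage_causes if is_preventable(c)]
print(preventable)  # ['vegetation', 'animal contact']
```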

All the data needed to create effective outage mitigation programs comes from OMS. The main objective is to separate preventable outages from the rest and focus on the improvement goal (frequency or duration reduction). One method of WPF analysis is to weight CI and CMI based on the desired outcome and create a feeder score that can be used to rank feeder performance.

((feeder CI/system CI) * X) + ((feeder CMI/system CMI) * Y) = feeder score

The above calculation can be used to rank each feeder in terms of its contribution to system CI and CMI (where X and Y are the respective percentage weights for CI and CMI). Individual feeders can then be ranked in order of priority for reliability improvement programs depending on the overall reliability improvement goal.
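A short sketch of this scoring and ranking. The feeder IDs and CI/CMI values are made up; X and Y are the weights from the formula, chosen here to favor frequency (SAIFI) improvement:

```python
def feeder_score(feeder_ci, feeder_cmi, system_ci, system_cmi, x=0.5, y=0.5):
    """Weighted contribution of one feeder to system CI and CMI.
    x and y are the CI/CMI percentage weights (x + y should equal 1)."""
    return (feeder_ci / system_ci) * x + (feeder_cmi / system_cmi) * y

# Hypothetical feeder data after removing non-preventable outages: (CI, CMI)
feeders = {
    "FDR-101": (1200, 90_000),
    "FDR-102": (400, 150_000),
    "FDR-103": (2500, 60_000),
}
system_ci = sum(ci for ci, _ in feeders.values())
system_cmi = sum(cmi for _, cmi in feeders.values())

# Weight toward frequency reduction: X = 0.7, Y = 0.3
ranked = sorted(
    feeders,
    key=lambda f: feeder_score(*feeders[f], system_ci, system_cmi, x=0.7, y=0.3),
    reverse=True,
)
print(ranked)  # worst-performing feeders first
```

Shifting the weights toward Y would reorder the list in favor of feeders contributing the most CMI, matching a duration-reduction goal.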


The impact of crew response is another metric that can be used to understand outage response and its effect on duration. Once again, segmenting outage data from OMS and combining it with staffing shift profiles can help utilities understand how staffing levels can be adjusted to more closely match daily outage profiles. Figure 4 shows the average number of outages per hour by day and month (green line) compared to the number of trouble response personnel required to adequately respond to those outages (blue) and the current level of staffing (red). It is easy to see that there is more than adequate coverage during the day shift, inadequate coverage for the evening shift, and no coverage for the night shift. This type of analysis can help realign crew availability to match outage response requirements.
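The staffing-gap comparison behind a chart like Figure 4 can be sketched as below. The hourly outage averages, shift coverage and outages-per-crew workload figure are all hypothetical assumptions:

```python
# Hypothetical hourly profiles: day shift 08-16, evening 16-24, night 00-08
avg_outages_per_hour = {h: 4 if 8 <= h < 16 else 6 if 16 <= h < 24 else 2
                        for h in range(24)}
crews_on_shift = {h: 8 if 8 <= h < 16 else 3 if 16 <= h < 24 else 0
                  for h in range(24)}

OUTAGES_PER_CREW = 1.5  # assumed outages one trouble responder can clear per hour

# Hours where required responders exceed staffed crews, and by how much
shortfalls = {
    h: round(avg_outages_per_hour[h] / OUTAGES_PER_CREW - crews_on_shift[h], 1)
    for h in range(24)
    if avg_outages_per_hour[h] / OUTAGES_PER_CREW > crews_on_shift[h]
}
print(shortfalls)  # evening and night hours show up; day shift does not
```

In this toy profile the day shift is overstaffed while the evening and night shifts fall short, mirroring the pattern described for Figure 4.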



The previously mentioned metrics are primarily focused on analytics, but process also plays an important role in reliability improvement. Understanding the dynamics of outage restoration can help reduce CI and CMI. An outage profile, based solely on OMS data extracts, can show the correlation between CI and CMI over the course of restoration.

Figure 5 shows the outage profile sequence in terms of restoration events that impact reliability metrics. This outage signature shows where additional customers were dropped three times for repair work; 46 customers were dropped at 402 minutes into the outage for 50 minutes, another 46 customers were dropped 661 minutes into the outage for 571 minutes, and another 75 customers were dropped 1,214 minutes into the outage for nine minutes. This accounted for almost one third of the total outage CMI. Dissecting individual outages in this manner can point to potential improvements in the decisions that surround outage restoration, including a better focus on circuit reconfiguration options.
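Using the three repair-work drops from the Figure 5 signature, the extra CMI they contribute is a straightforward sum of customers times minutes out:

```python
# Restoration events from the Figure 5 outage signature:
# (customers dropped, minutes into the outage when dropped, minutes out)
repair_drops = [
    (46, 402, 50),
    (46, 661, 571),
    (75, 1214, 9),
]

# CMI added by dropping additional customers during repair work
extra_cmi = sum(customers * minutes_out
                for customers, _, minutes_out in repair_drops)
print(extra_cmi)  # 29241 customer-minutes
```

Totaling these event-level contributions against the outage's overall CMI is what shows the repair-work drops accounting for almost one third of the total.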



One frequently overlooked metric is TMED, the IEEE 1366 threshold used to identify major event days for exclusion from daily outage statistics. Most utilities tend to treat TMED as an outcome rather than a manageable variable in outage statistics calculations. Since TMED is a function of outage duration and the number of customers interrupted per outage, minimizing either can reduce both CI and CMI. One straightforward way to reduce CMI is to reduce the number of customers impacted per outage. This can be done, for example, through the addition of main line (or large tap) sectionalizing devices such as reclosers, smart switches or more complex fault location, isolation and service restoration (FLISR) schemes. Figure 6 illustrates an example of a five-year outage profile where the maximum number of customers interrupted has been limited to between 200 and 4,000. The graph shows that the maximum number of customers impacted per outage device should be around 500. It is important to note that at some point, adding more sectionalizing devices reduces TMED, which increases SAIDI and makes the rate of change in SAIDI twice that of TMED.
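The TMED threshold itself comes from the IEEE 1366 "2.5 beta" method: take the natural log of each non-zero daily SAIDI value, and the threshold is exp(alpha + 2.5 beta), where alpha and beta are the mean and standard deviation of those logs. A minimal sketch, with a hypothetical daily SAIDI series:

```python
import math

def tmed(daily_saidi, k=2.5):
    """IEEE 1366 '2.5 beta' major event day threshold:
    T_MED = exp(alpha + k*beta), where alpha and beta are the mean and
    sample standard deviation of ln(daily SAIDI), non-zero days only."""
    logs = [math.log(s) for s in daily_saidi if s > 0]
    alpha = sum(logs) / len(logs)
    beta = math.sqrt(sum((x - alpha) ** 2 for x in logs) / (len(logs) - 1))
    return math.exp(alpha + k * beta)

# Hypothetical daily SAIDI series (minutes/customer/day):
# 29 ordinary days plus one storm day
daily = [1.0] * 29 + [50.0]
threshold = tmed(daily)
major_event_days = [s for s in daily if s > threshold]
print(threshold, major_event_days)
```

In practice IEEE 1366 calls for five years of daily values; the 30-day series here is only to keep the sketch readable.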



Armed with the right data analytics tools, utilities can use information to create optimal reliability performance improvement roadmaps that focus on specific activities that will reduce outage frequency and duration. The key is to dissect and manage outage data at the micro level to understand the difference between preventable and nonpreventable outages and then reassemble the data to provide actionable knowledge instead of just information.

Scott Sidney is a managing consultant with PA Consulting Group specializing in asset management risk and reliability assessment. He is a member of DistribuTECH’s conference advisory committee.
