CL&P discovered the critical elements of effective outage response.
by Keith Michaelson, Rod Kalbfleisch and Bill Burley
Connecticut Light and Power (CL&P) achieved a significant improvement in outage response, as measured by CAIDI (Customer Average Interruption Duration Index), by shifting its focus from solving technical problems to coordinating its response to outage events more effectively. The insight that improvement had to be “event-driven” rather than “cause-driven” set the utility on a path to discover the critical elements of effective outage response.
The CAIDI improvement challenge
SAIDI (System Average Interruption Duration Index), which measures the number of minutes per year that the average customer is without power, is used by electric utilities to measure system reliability. SAIDI is the product of a measure of outage frequency, SAIFI (System Average Interruption Frequency Index), and a measure of outage duration, CAIDI: SAIDI = SAIFI × CAIDI. Overall, SAIDI is affected by equipment, environmental conditions and outage recovery performance.
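The relationship among the three indices can be made concrete with a small calculation. This is a minimal sketch, assuming a hypothetical `Interruption` record that holds the number of customers affected and the minutes they were out, and using the standard definitions: SAIFI is customers interrupted divided by customers served, CAIDI is customer minutes interrupted divided by customers interrupted, and SAIDI is customer minutes interrupted divided by customers served. Real reporting also separates out momentary interruptions and, usually, major event days; that filtering is not shown.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    """One sustained outage (hypothetical record layout)."""
    customers: int   # customers interrupted by this outage
    minutes: float   # minutes until those customers were restored


def reliability_indices(events, customers_served):
    """Return SAIFI, CAIDI and SAIDI for one reporting period."""
    customers_interrupted = sum(e.customers for e in events)
    customer_minutes = sum(e.customers * e.minutes for e in events)
    # Interruptions per customer served.
    saifi = customers_interrupted / customers_served
    # Average minutes out per customer actually interrupted.
    caidi = customer_minutes / customers_interrupted if customers_interrupted else 0.0
    # Average minutes out per customer served.
    saidi = customer_minutes / customers_served
    return {"SAIFI": saifi, "CAIDI": caidi, "SAIDI": saidi}


# Made-up period: two outages on a 10,000-customer system.
idx = reliability_indices(
    [Interruption(customers=1_200, minutes=45), Interruption(customers=300, minutes=150)],
    customers_served=10_000,
)
print(idx)
# SAIDI equals SAIFI x CAIDI, up to floating-point rounding.
print(abs(idx["SAIDI"] - idx["SAIFI"] * idx["CAIDI"]) < 1e-9)
```

Here 1,500 customers were interrupted for a total of 99,000 customer minutes, giving SAIFI = 0.15, CAIDI = 66 minutes and SAIDI = 9.9 minutes.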
Like other utilities, CL&P has improved reliability with sectionalized circuits and automatic switching. Instead of a feeder breaker outage leaving 3,000 to 4,000 people without power, automatic back-feeding immediately takes place and a much smaller number of customers lose power. While this improved SAIDI, an unexpected outcome was that CAIDI actually increased. Because CAIDI is the average outage duration per customer interrupted, it is pulled down when many customers go off line for a short period of time. The bigger events had damped the effect of smaller, longer outages. Reducing the impact of mass outages, a very good thing, meant that CAIDI performance became more sensitive to how well the company responded to the remaining outages, highlighting an important customer service issue.
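The effect can be checked with a few made-up numbers. This is a minimal sketch, assuming one hypothetical feeder outage that, before automation, drops 3,500 customers for 20 minutes and, after automation, leaves only 400 of them out for the same 20 minutes, plus one unchanged small outage of 50 customers lasting 180 minutes. The 10,000-customer system size is also an assumption.

```python
def caidi(events):
    """Customer minutes interrupted divided by customers interrupted."""
    return sum(c * m for c, m in events) / sum(c for c, _ in events)


def saidi(events, customers_served):
    """Customer minutes interrupted divided by customers served."""
    return sum(c * m for c, m in events) / customers_served


SERVED = 10_000  # hypothetical system size

# (customers interrupted, minutes out); all numbers are illustrative only.
before = [(3_500, 20), (50, 180)]  # breaker trip drops the whole feeder
after = [(400, 20), (50, 180)]     # back-feeding keeps most of the feeder in service

for label, events in (("before", before), ("after", after)):
    print(f"{label}: SAIDI={saidi(events, SERVED):.2f} min, CAIDI={caidi(events):.2f} min")
```

With these numbers SAIDI falls from 7.9 to 1.7 minutes while CAIDI rises from about 22.3 to about 37.8 minutes: the system is more reliable overall, but the average interrupted customer now waits longer, because the large, short event no longer dilutes the long, small one.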
Finding a focus for improvement
In June 2007, a cross-functional team of engineers, field supervisors and outage coordinators took a new look at the CAIDI challenge. To add a sense of urgency, the team was asked to deliver a measurable improvement within 100 days. The team’s make-up was ideal because each member brought a different perspective on the outage experience and a different view of how to make progress. The one thing they had in common, however, was the assumption that there were specific, identifiable causes for the high CAIDI numbers, and each person passionately believed that solving those causes would bring the number down.
The team began by listing possible causes and discussing which ones to work on first. The list was long and comprehensive: switching isn’t being done quickly enough; specific equipment failures are causing longer restoration times; tree crews aren’t used effectively; cable and other critical materials aren’t available when needed; there aren’t enough supervisors to work with the crews; crews aren’t dispatched efficiently enough; better coordination is needed with the phone company on repairs to utility poles; incorrect choices are made about whether to repair equipment first or to restore service first and then repair. The team couldn’t tackle all of these issues at once, and it couldn’t reach consensus on which were most important. Team members had to move beyond gut feelings and see whether data analysis could identify the greatest contributors to CAIDI.
Fortunately, good data was available. One by one, the team removed from the calculation all outages related to a specific cause and recalculated CAIDI. In each case, total CAIDI changed only slightly. Next, the team checked whether CAIDI performance varied by region, looking at each of the four operational divisions in isolation. Once again, the CAIDI number was essentially the same. What was going on? What did it mean that the data wasn’t providing any guidance?
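The cause-by-cause exclusion test, and the per-division check that followed, can be sketched as below. This is a minimal sketch, assuming hypothetical outage records with `cause`, `division`, `customers` and `minutes` fields; the function names are illustrative and the data is invented purely to show the mechanics.

```python
from collections import namedtuple

# Hypothetical record layout for one outage exported from the outage management system.
Outage = namedtuple("Outage", ["cause", "division", "customers", "minutes"])


def caidi(outages):
    """Customer minutes interrupted divided by customers interrupted."""
    interrupted = sum(o.customers for o in outages)
    if interrupted == 0:
        return 0.0
    return sum(o.customers * o.minutes for o in outages) / interrupted


def exclusion_deltas(outages, field):
    """Change in CAIDI when all outages sharing each value of `field` are removed."""
    base = caidi(outages)
    values = sorted({getattr(o, field) for o in outages})
    return {v: caidi([o for o in outages if getattr(o, field) != v]) - base for v in values}


def caidi_by(outages, field):
    """CAIDI computed separately for each value of `field`, e.g. each division."""
    values = sorted({getattr(o, field) for o in outages})
    return {v: caidi([o for o in outages if getattr(o, field) == v]) for v in values}


# Illustrative records only; these are not CL&P figures.
outages = [
    Outage("equipment", "North", 120, 95),
    Outage("tree", "North", 80, 110),
    Outage("equipment", "South", 150, 100),
    Outage("animal", "South", 40, 90),
    Outage("tree", "East", 60, 105),
    Outage("animal", "West", 70, 100),
]

print(exclusion_deltas(outages, "cause"))  # in this sample, each delta is only a few minutes
print(caidi_by(outages, "division"))       # in this sample, divisions land close together
```

When every delta is small and every division looks alike, the data is saying that no single cause or region explains the number, which is exactly the dead end the team hit.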
Then came the brainstorm. Rather than look at presumed causes, the team used days as the unit of analysis. A review of each day’s contribution to overall CAIDI over the previous three years led to an extraordinary discovery: on 345 days each year, the daily CAIDI number was excellent. But each year there were about 20 days that, taken together, drove CAIDI up by 20 percent. This was the clue the team was looking for.
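Switching the unit of analysis to days can be sketched as follows. This is a minimal sketch, assuming outage records of the form (day, customers interrupted, minutes out); `daily_totals`, `day_contributions` and `worst_days_uplift` are hypothetical names, and the sample data is made up. A day’s contribution is taken here as the period CAIDI minus the CAIDI recomputed without that day, which is one reasonable definition and not necessarily the one the team used.

```python
from collections import defaultdict


def caidi(customers, customer_minutes):
    """CAIDI from aggregate totals; 0.0 when nobody was interrupted."""
    return customer_minutes / customers if customers else 0.0


def daily_totals(outages):
    """Aggregate (day, customers, minutes) records into {day: [customers, customer_minutes]}."""
    totals = defaultdict(lambda: [0, 0.0])
    for day, customers, minutes in outages:
        totals[day][0] += customers
        totals[day][1] += customers * minutes
    return dict(totals)


def day_contributions(totals):
    """Minutes each day adds to period CAIDI: CAIDI(all days) - CAIDI(all days but this one)."""
    ci = sum(c for c, _ in totals.values())
    cmi = sum(m for _, m in totals.values())
    overall = caidi(ci, cmi)
    return {day: overall - caidi(ci - c, cmi - m) for day, (c, m) in totals.items()}


def worst_days_uplift(totals, n):
    """Percent by which the n highest-contribution days, taken together, raise period CAIDI."""
    contrib = day_contributions(totals)
    worst = sorted(contrib, key=contrib.get, reverse=True)[:n]
    rest = [(c, m) for day, (c, m) in totals.items() if day not in worst]
    if not rest:
        raise ValueError("n must leave at least one day outside the worst set")
    all_caidi = caidi(sum(c for c, _ in totals.values()), sum(m for _, m in totals.values()))
    rest_caidi = caidi(sum(c for c, _ in rest), sum(m for _, m in rest))
    return 100.0 * (all_caidi / rest_caidi - 1.0), sorted(worst)


# Illustrative: eight quiet days and two mid-sized storm days (days 4 and 8).
records = [(d, 100, 60) for d in range(1, 11) if d not in (4, 8)]
records += [(4, 200, 180), (4, 200, 180), (8, 200, 180), (8, 200, 180)]

totals = daily_totals(records)
uplift, worst = worst_days_uplift(totals, n=2)
print(worst, round(uplift, 1))  # prints [4, 8] 100.0
```

In this toy sample, two storm days double the period CAIDI; ranking days this way is what surfaces a “Top 20” list from years of records.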
Setting the goal
Once the team understood that its systems and processes worked 95 percent of the time, the challenge changed from how to fix specific problems to “how to change our systems and processes so that we can always be in control.” Looking more closely at the CAIDI Top 20, the team found that on each of these days the state had experienced a mid-sized storm, with outages at 60 to 75 locations. From their own experience, team members knew what those days were like at the Systems Operation Center (SOC): the radios get noisy, alarms go off, decision-making becomes more difficult, and dispatchers may send crews to the nearest outage location rather than the one most important for CAIDI performance. Meanwhile, in the midst of the action, some crews are “standing by,” waiting to be dispatched.
Based on this analysis, the team decided to minimize the impact of any one day on the overall CAIDI number. Because CAIDI is a cumulative measure, it can fluctuate widely early in the year, when the year-to-date totals are small, but by mid-year any single day has minimal impact on the total. To evaluate each day’s performance on an equal basis, the team developed a baseline database of CAIDI performance and set a goal: During the month of September 2007, no single day will contribute more than one additional minute to the total CAIDI number.
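One way to score each day against the one-minute goal is to layer that day’s outages onto a fixed baseline of customers and customer minutes, so a September day is judged the same way as a January day. This is a minimal sketch under that assumption; the baseline figures, the dates and the function names are hypothetical, and the article does not describe how the team’s baseline database computed a day’s contribution.

```python
# Hypothetical sketch: a day's contribution is measured against a fixed baseline,
# not against year-to-date totals.
GOAL_MINUTES = 1.0  # September 2007 goal: no day adds more than one minute to CAIDI


def caidi(customers, customer_minutes):
    return customer_minutes / customers


def day_contribution(base_ci, base_cmi, day_ci, day_cmi):
    """Minutes a day's outages add to CAIDI when layered onto the baseline totals."""
    return caidi(base_ci + day_ci, base_cmi + day_cmi) - caidi(base_ci, base_cmi)


def days_over_goal(base_ci, base_cmi, days, goal=GOAL_MINUTES):
    """Return {day: contribution} for every day whose contribution exceeds the goal."""
    over = {}
    for day, (day_ci, day_cmi) in days.items():
        delta = day_contribution(base_ci, base_cmi, day_ci, day_cmi)
        if delta > goal:
            over[day] = delta
    return over


# Illustrative baseline (a CAIDI of 100 minutes) and three made-up days.
BASE_CI, BASE_CMI = 100_000, 10_000_000
september = {
    "2007-09-05": (500, 60_000),     # quiet day, 120-minute average outage
    "2007-09-12": (3_000, 900_000),  # mid-sized storm, 300-minute average outage
    "2007-09-20": (2_000, 300_000),  # busy day, 150-minute average outage
}

for day, delta in days_over_goal(BASE_CI, BASE_CMI, september).items():
    print(f"{day}: +{delta:.2f} min")
```

Only the storm day breaks the goal here. Measured against year-to-date totals instead, the same storm would move the number far more in early January, when the accumulated totals are small, than later in the year; the fixed baseline removes that distortion.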
Managing the whole picture
With the goal set, the team was able to rethink where improvement was needed and get to work on the key aspects of outage coordination.
Early recognition of escalating events. The team developed a web page, fed by data already available in the outage management system, to provide a global view and real-time tracking of the current day’s outage events. Initially used to track progress against the team’s goal, the web page soon became a powerful tool for monitoring outage response. Updated every 15 minutes, the page displays details of each event: how many customers are affected, event duration, whether a crew has been dispatched, and the projected restoration time. At the top of the page, a running total of the day’s contribution to CAIDI is displayed. With this snapshot available, SOC coordinators had a clear view of the emerging pattern of outage activity on which to base their response.
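The page’s content can be approximated as a function that turns the day’s outage events into table rows plus a running figure for the day’s CAIDI contribution. This is a minimal sketch, assuming a hypothetical `OutageEvent` record (the outage management system’s real fields are not described here), counting open outages with the minutes accrued so far, and measuring the contribution against a fixed baseline of customers and customer minutes, which is also an assumption. The 15-minute refresh and the web rendering are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class OutageEvent:
    """Hypothetical event fields read from the outage management system."""
    event_id: str
    customers: int
    start: datetime
    restored: Optional[datetime]               # None while the outage is still open
    crew_dispatched: bool
    projected_restoration: Optional[datetime]


def minutes_out(event, now):
    """Minutes out so far for open events, or total minutes for restored ones."""
    end = event.restored if event.restored is not None else now
    return (end - event.start).total_seconds() / 60.0


def snapshot(events, now, baseline_ci, baseline_cmi):
    """Rows for the status page (largest events first) and the day's CAIDI contribution."""
    rows = []
    day_ci = day_cmi = 0.0
    for e in sorted(events, key=lambda e: e.customers, reverse=True):
        mins = minutes_out(e, now)
        day_ci += e.customers
        day_cmi += e.customers * mins
        rows.append({
            "event": e.event_id,
            "customers": e.customers,
            "minutes": round(mins),
            "crew": "dispatched" if e.crew_dispatched else "not dispatched",
            "ETR": e.projected_restoration.strftime("%H:%M") if e.projected_restoration else "-",
        })
    contribution = (baseline_cmi + day_cmi) / (baseline_ci + day_ci) - baseline_cmi / baseline_ci
    return contribution, rows


now = datetime(2007, 9, 12, 14, 0)
events = [
    OutageEvent("E1", 250, now - timedelta(minutes=90), None, True, now + timedelta(minutes=30)),
    OutageEvent("E2", 40, now - timedelta(minutes=30), now - timedelta(minutes=5), True, None),
]
contribution, rows = snapshot(events, now, baseline_ci=100_000, baseline_cmi=10_000_000)
print(f"Day's contribution to CAIDI: {contribution:+.2f} min")
for row in rows:
    print(row)
```

Because open outages keep accruing minutes, rerunning the snapshot on each refresh makes the headline figure climb while crews are still on the way, which is the early warning the coordinators needed.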
Effective communications. The outage management process already included protocols for communicating up the chain of command as conditions escalated and resource decisions needed to be made, but often the calls were not made. “We’ve got it,” supervisors would explain, driven by a sense of pride and personal responsibility, as the customer minutes mounted up and irretrievable damage was done to the CAIDI number. The team decided that it would get consistent communications only by building the call for help into the process. The team instituted a set of automated alerts through the paging system that triggered conference calls with the appropriate level of management as the size of outages increased and events became more widespread.
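Building the call for help into the process amounts to an escalation ladder that fires automatically. This is a minimal sketch, assuming a hypothetical ladder in which each management level is reached when either the number of open events or the number of customers out crosses a threshold; the level names and thresholds are invented for illustration, and the paging-system integration that would actually start the conference call is not shown.

```python
# Hypothetical escalation ladder: (level, min open events, min customers out).
# A level is reached when EITHER threshold is met. All numbers are illustrative.
ESCALATION_LADDER = [
    ("area supervisor", 10, 1_000),
    ("operations manager", 25, 5_000),
    ("director", 50, 15_000),
]


class Escalator:
    """Decides which conference calls to trigger; each level is paged once until reset."""

    def __init__(self, ladder=ESCALATION_LADDER):
        self.ladder = ladder
        self.paged = set()

    def update(self, open_events, customers_out):
        """Return the levels newly reached by current conditions, lowest first."""
        new_levels = []
        for level, min_events, min_customers in self.ladder:
            reached = open_events >= min_events or customers_out >= min_customers
            if reached and level not in self.paged:
                self.paged.add(level)
                new_levels.append(level)  # a real system would send the page here
        return new_levels

    def reset(self):
        """Clear paging state, e.g. at the start of each operating day."""
        self.paged.clear()


esc = Escalator()
for events_open, customers_out in [(6, 800), (12, 2_000), (30, 4_000), (31, 4_500)]:
    for level in esc.update(events_open, customers_out):
        print(f"page {level}: {events_open} open events, {customers_out} customers out")
```

The point of the design is that no one has to decide to ask for help: once conditions cross a threshold, the call happens, and each level is paged only once so alerts do not become noise.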
Better up-front decision-making. Early decisions about whether to mobilize resources are critical to quick resolution of outages. With the data in front of them and the communications process in place, outage coordinators were in a better position to make the tough decisions: Should we bring in additional crews and supervisors? Should we have tree crews on stand-by? Should we decentralize dispatching to the local area work centers? Each of these choices has resource and financial implications. Once a crew works an extended workday, for example, it isn’t available for work the following day, and schedules are disrupted. When making decisions in isolation, supervisors tended to hesitate, not wanting to do the wrong thing and hoping the situation would stay under control. With the right people on an early conference call, better decisions could be made.
Better coordination of field work. Outage minutes grow when a crew is dispatched to a site only to find, on arrival, that it can’t do the work. To improve reconnaissance and problem diagnosis before a crew arrives, field supervisors were given wireless cards for their computers. This allows them to see what the dispatcher is seeing and get ahead of events. With work underway at one site, a supervisor can move on to the next site and do an early assessment. Is there a minor issue that can be quickly eliminated? Do we need a pole, a transformer and a three-man crew? Do we need to send a tree crew first? In a mid-sized storm, it’s essential to work events in parallel and get the right materials and people on site so the crews can be fully productive.
Using CAIDI as a leading indicator
The CAIDI measure had been a lagging indicator, used to discuss monthly performance, disconnected from immediate outage experience. The work of the CAIDI team has turned the measure into a leading indicator that can be reviewed during a daily conference call. This has accelerated learning about how outages are being managed. The result? The cumulative 2007 CAIDI number was driven down in the last quarter of the year, an achievement that previously had seemed impossible.
Keith Michaelson is a partner with the consulting firm Robert H. Schaffer & Associates in Stamford, Conn. Rod Kalbfleisch is the director of operation support at Connecticut Light & Power in Berlin, Conn. Bill Burley is a principal engineer with Northeast Utilities Service Company.