Optimizing Reliability in Utility Networks

By Don Mak and Mark Welch, IBM Business Consulting Services

Reliability is one of the most watched performance indicators in the utilities industry, yet improvement efforts are often based on best practices or trial and error. Meanwhile, mountains of data are available to monitor utility network reliability. Analytical tools are needed to refine this data into information that can be used to optimize restoration efforts and simulate asset management strategies designed to improve service reliability. These new tools will be used to assess trade-offs between cost and reliability, giving utility network operators and owners the means to simulate restoration and asset management strategies and to continuously optimize tactics.

Over time, “best practices” in the industry have evolved, taking advantage of new technologies and knowledge to deliver targeted service reliability. This evolution will continue, and progressive utilities will lead the industry in pioneering new tools and techniques to continually enhance reliability.

The Past

In the past, service reliability was supported by silo applications, such as work management and asset management systems. These systems optimized tasks and activities based on the data within their own systems, with manual interfaces to coordinate activities outside of the specific area. For example, work management systems would produce work schedules for each field service crew with some automated interface from the customer information system. However, these schedules would be manually faxed or otherwise distributed to the crews. The results of the work would be recorded on multipart work orders by the crews and manually entered back into the work management system. This disjointed, manual process made it impossible to “see across” all activities and understand how work could be optimized on a given day across the entire field service organization.


Recently, tools to support network reliability focused on the mobile workforce. Tools were designed to improve the effectiveness of field work by eliminating certain manual activities. Risks were assessed primarily based on known asset attributes (e.g., age and cost). Sophistication grew in regard to maintenance planning, establishing reliability and condition-based maintenance and replacement. However, the basis for decision making related to sustaining or improving reliability was still intuitive and based on limited scenarios. The relation between action and result was determined through subjective processes.


The industry is now leveraging more knowledge and sophisticated technologies to bring about more seamless, integrated decisions and actions to support reliability. These integrated systems can improve work delivery, but, interestingly, some of these systems are disabled during unplanned outages in favor of “storm teams.” The reason: The systems are not responsive enough, so utilities resort to manual decision making to speed service restoration.

In addition to the integrated work management processes, asset management processes have also evolved to optimize return on assets. These asset management processes and tools are driven by increasingly sophisticated risk- and condition-based scenario analysis. A key tenet of these analyses is that optimized return on assets will also yield acceptable overall network reliability performance. Thanks to faster processing, these analyses can be conducted more frequently than in the past, and they produce large volumes of new data.

The Future

As the mountains of data grow, so do the possibilities. Complex analytical capabilities are needed to refine the available data into information that can be used to optimize restoration efforts continuously and in real time. This same data can be used to simulate reliability-based asset management strategies on a continual basis, balancing reliability service levels with return on assets and taking more of the guesswork out of decision making.

With the “brain” in place to conduct complex analyses continuously using more current data, additional technologies can be leveraged to automate network operations to an even greater degree. The possibilities include:

  • Remote asset monitoring and control, sensor technology;
  • Large volumes of asset operational information;
  • Consistent use of asset information for risk management;
  • Pre-emptive action in advance of faults; and
  • Dynamic asset reconfiguration.

There are two major aspects to consider in optimizing reliability: restoration and asset management. Both are key and often are considered independently.

Restoration Management

Outage management begins well before the outage (see Figure 1). Outage planning and staging are needed to help ensure rapid deployment. Once service is lost, early, accurate and continuous assessment is key. Many utilities have designated a statistically representative portion of assets within each region to allow for a quick assessment and extrapolation of damage. This assessment initiates predefined plans for resource and equipment deployment. Mobile dispatch systems notify crews and trigger truck rolls. Advanced systems do not stop there; as new information emerges and restoration tasks are completed, crews are re-dispatched and resource deployment is optimized. The number of customers affected is traded off against the duration of each outage. Advanced utilities are considering tools to support continuous assessment during an outage to balance costs with restoration speed.

As work is completed, the work orders are closed. Once restoration is complete, the process starts again. Through post-mortem analysis, outage performance is critiqued and outage plans are refined as needed. Forecast and assessment tools are also revisited and adjusted.
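
The trade-off between customers affected and outage duration can be illustrated with a small sketch. It assumes a single crew working a queue of jobs and uses a standard scheduling rule (sorting by repair hours per customer, often called Smith's rule), which minimizes total customer-hours of interruption in that simplified setting. The `Outage` class, the job names and all figures are hypothetical; this is not a description of any particular dispatch system.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    name: str
    customers: int       # customers without service (assumed > 0)
    repair_hours: float  # estimated crew time to restore

def dispatch_order(outages):
    """Order jobs for one crew by repair hours per customer, lowest first."""
    return sorted(outages, key=lambda o: o.repair_hours / o.customers)

def customer_hours(sequence):
    """Total customer-hours of interruption if jobs are worked in this order."""
    clock = 0.0
    total = 0.0
    for o in sequence:
        clock += o.repair_hours        # this job is restored at time `clock`
        total += o.customers * clock   # its customers waited that long
    return total

# Hypothetical outage queue, in the order the calls came in.
outages = [
    Outage("feeder A", customers=1200, repair_hours=4.0),
    Outage("lateral B", customers=40, repair_hours=0.5),
    Outage("lateral C", customers=300, repair_hours=1.5),
]

plan = dispatch_order(outages)
print([o.name for o in plan])
print(customer_hours(outages), customer_hours(plan))
```

Re-running `dispatch_order` whenever a job closes or a new assessment arrives is one simple way to picture the continuous re-dispatch described above. A real system would also have to account for multiple crews, travel time and safety priorities.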

Asset Management

There are two primary strategies for managing the avoidance of interruptions. The first is a design strategy. By sectionalizing the network (e.g., using more fuses, reclosers and sectionalizers), the number of customers affected by a given asset can be contained. This often entails expensive modifications that must be spread over an extensive period (e.g., three to five years).
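
A rough sketch shows how sectionalizing contains the number of customers affected. It assumes a purely radial feeder modeled as an ordered list of segments, where a fault interrupts every customer from the nearest upstream protective device to the end of the line. The segment data and the function names are hypothetical, and the result is expressed as average interruptions per customer served (the usual definition of the SAIFI index, which the discussion above does not name explicitly).

```python
def expected_interruptions(segments, device_at):
    """Expected customer interruptions per year on a radial feeder.

    segments:  list of (customers, faults_per_year), substation outward.
    device_at: indices of segments with a protective device at their head.
               Segment 0 is always protected by the substation breaker.
    """
    heads = sorted(set(device_at) | {0})
    total = 0.0
    for i, (_, faults) in enumerate(segments):
        # The nearest device at or upstream of the faulted segment operates.
        start = max(h for h in heads if h <= i)
        affected = sum(c for c, _ in segments[start:])
        total += faults * affected
    return total

def interruptions_per_customer(segments, device_at):
    """Average interruptions per customer served per year."""
    served = sum(c for c, _ in segments)
    return expected_interruptions(segments, device_at) / served

# Hypothetical feeder: (customers, expected faults per year) per segment.
feeder = [(500, 0.2), (300, 0.5), (200, 1.0)]

print(interruptions_per_customer(feeder, device_at=set()))  # breaker only
print(interruptions_per_customer(feeder, device_at={2}))    # recloser on last segment
```

In this made-up case, a single device ahead of the most fault-prone segment cuts the index roughly in half. The model ignores momentary interruptions and manual switching, so it only indicates the direction of the effect.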

The second strategy is maintenance-related. Sophisticated operators employ risk assessment mechanisms to identify and target maintenance programs on high-risk assets. This strategy does not require large investments but, instead, necessitates the collection and analysis of significant amounts of data related to asset condition and reliability. Again, a long lead time is required.
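
One common way to express such a risk assessment is to score each asset as probability of failure times impact of failure and then rank the results. The sketch below assumes impact is measured in customer-hours; the `Asset` fields, identifiers and numbers are hypothetical and stand in for whatever condition and reliability data the operator actually collects.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    asset_id: str
    failure_prob: float   # estimated probability of failure in the next year
    customers: int        # customers interrupted if the asset fails
    outage_hours: float   # expected time to restore after a failure

    @property
    def risk(self):
        """Expected customer-hours lost per year: probability times impact."""
        return self.failure_prob * self.customers * self.outage_hours

def top_risks(assets, n):
    """Return the n assets with the highest risk score."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)[:n]

# Hypothetical asset register.
register = [
    Asset("pole P7", failure_prob=0.10, customers=150, outage_hours=3.0),
    Asset("transformer T1", failure_prob=0.02, customers=5000, outage_hours=8.0),
    Asset("cable C3", failure_prob=0.05, customers=2000, outage_hours=6.0),
]

for asset in top_risks(register, 2):
    print(asset.asset_id, round(asset.risk, 1))
```

Note that the asset most likely to fail (the pole) ranks last here, because so few customers depend on it. That is the kind of insight a ranking on age or cost alone would miss.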

These strategies are often applied in a suboptimal manner; that is, the operator does not have the necessary information to prioritize maintenance and design strategies. Often priorities are based on industry “best practices.” Although best practices are a valid source of ideas, the specifics of each network's configuration, condition and demands mean that concepts do not transfer equally from one utility to another.

Optimizing Reliability and Investments

Analytical models can be developed to simulate and test strategies (see Figure 2). By establishing the mathematical relationship between individual reliability and restoration strategies and their potential impact on specific networks, operators can better understand and prioritize investments. To take it a step further, these models can be linked to assess trade-offs between restoration and reliability investments.

The Approach to Optimization

System analysis and development of tools to optimize reliability is a significant undertaking. First of all, there are many aspects of each network that make it unique and limit the transferability of analytical models. Network configuration, asset condition, organizational structure and data availability all factor into program development. A logical set of comprehensive steps is required.

The starting place is where you are. What reliability improvement initiatives are currently under way? How can they be enhanced? The objective of this activity is to understand the current improvement initiatives, planned and under way, and to identify possible enhancements by leveraging supporting tools (e.g., weather forecasting, work management, dispatch and communications technologies), analytical models, industry best practices, business case analyses, and expertise internal and external to the organization.

A root cause analysis of recent interruptions will provide insights regarding cause and effect. Best-practice studies will provide ideas as to how others have addressed these causes. Contrasting this insight with improvement initiatives under way will help identify potential enhancements to the initiative. The goal is to improve the impact of ongoing initiatives and develop more granular knowledge regarding the relationship between causes and improvement initiatives.

Developing the Reliability Analysis System

There are five elements in the development of analytical tools to optimize asset management and restoration management strategies:

  • Develop and optimize requirements: Begin by interviewing key operations staff and other experts to gather information relevant to the specific areas of optimized scheduling and dispatch and asset management. The tasks will be: (1) gather business knowledge, (2) determine data availability and data format, (3) establish an appropriate linear objective function to be used in the optimization, and (4) define hard and soft constraints that need to be enforced on each aspect. There may be other issues that need to be discussed as a result of interviews and meetings with the experts. These issues will be addressed in the design detailing the high-level requirements for the overall route optimization solution.
  • Analyze data related to improved reliability: Review current and historical data that could be used to build analytical models. Utilize expertise in: data mining and modeling; an understanding of the dynamics of risk, reliability and restoration; and the availability of data to develop some potential applications of analytics to improve the utility’s ability to predict reliability.
  • Develop a restoration model: Define the optimized scheduling and dispatch problem in mathematical terms. It is assumed that the objective function and the constraints are linear. Model the problem as a mixed-integer program or as a linear program. The decision will be made after conducting preliminary analysis on available data. Conduct a similar analysis for probability of failure and impact of failure. Develop a draft mathematical model formulation for optimized scheduling and dispatch. This mathematical model will have the capability to serve as a basis for a prototype.
  • Develop an asset management model: Apply asset management concepts (e.g., reliability-centered maintenance, condition-based maintenance and risk/reward capital allocation) in developing optimized investment and maintenance in mathematical terms. The approach is similar to the restoration model described above. The new dimension will be identifying interrelationships between the asset management model and the restoration model.
  • Analyze the potential for integration of other tools into the Reliability Analysis System: The objective of this activity is to analyze the environment to determine the best way to customize the products and usage; to evaluate dependency and integration issues for mathematical optimization component(s) for the decision support system; and to fine-tune modeling.
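
To make the restoration model step more concrete, the sketch below sets up a toy crew-to-job assignment with a linear objective, one hard constraint and one soft constraint. Each crew-job pairing has a fixed cost, so the objective is linear in the yes/no assignment decisions, which is the shape a mixed-integer program would take. The toy problem is small enough to solve by enumerating every assignment; a production model would hand the same formulation to an optimization solver. All data, names and the penalty weight are hypothetical.

```python
from itertools import permutations

# Hypothetical data: three crews, three restoration jobs.
customers = [1200, 300, 40]      # customers out of service, per job
repair_h = [4.0, 1.5, 0.5]       # repair hours, per job
travel_h = [                     # travel_h[crew][job], in hours
    [0.5, 1.0, 2.0],
    [1.5, 0.5, 1.0],
    [1.0, 2.0, 0.5],
]
qualified = [                    # hard constraint: crew may work the job
    [True, True, True],
    [False, True, True],
    [True, True, True],
]
hours_left = [6.0, 3.0, 4.0]     # remaining shift hours, per crew
OVERTIME_PENALTY = 100.0         # soft constraint: cost per overtime hour

def plan_cost(assignment):
    """Cost of a plan where assignment[crew] = job, or None if infeasible."""
    cost = 0.0
    for crew, job in enumerate(assignment):
        if not qualified[crew][job]:
            return None                      # hard constraint violated
        duration = travel_h[crew][job] + repair_h[job]
        cost += customers[job] * duration    # customer-hours of interruption
        cost += OVERTIME_PENALTY * max(0.0, duration - hours_left[crew])
    return cost

def best_plan():
    """Enumerate all one-to-one assignments and return (cost, assignment)."""
    feasible = []
    for assignment in permutations(range(len(customers))):
        cost = plan_cost(assignment)
        if cost is not None:
            feasible.append((cost, assignment))
    return min(feasible)

cost, assignment = best_plan()
print(cost, assignment)
```

The hard constraint removes plans outright, while the soft constraint only makes them more expensive. Deciding which rules belong in which category is exactly the kind of question the requirements interviews in the first step are meant to settle.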

Identify and Prioritize Innovation Solutions

With the insights gained from the optimization tool, the next step is to identify and prioritize potential solutions to reduce outages and outage durations (see Figure 3):

  • Apply insights from reliability analysis system development to enhanced initiatives;
  • Rank targeted solution areas to address reliability performance according to expected “reliability return on investment”;
  • Determine which combination of initiatives will potentially achieve reliability goals;
  • Conduct risk-assessment working session to determine the financial, operational and technical risks of each key solution area;
  • Prioritize and outline solution areas and options; and
  • Develop an initial draft road map.
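
The ranking and combination steps in this list can be sketched as follows. The example assumes each initiative has an estimated cost and an estimated number of outage minutes avoided per customer per year, that benefits simply add up, and that “reliability return on investment” means minutes avoided per dollar. All three are simplifying assumptions, and the initiatives and figures are hypothetical.

```python
from itertools import combinations

# Hypothetical initiatives: name -> (cost in dollars, outage minutes avoided
# per customer per year). Benefits are assumed to be additive.
initiatives = {
    "tree trimming": (400_000, 12.0),
    "recloser program": (900_000, 18.0),
    "cable replacement": (2_500_000, 20.0),
    "mobile dispatch upgrade": (600_000, 9.0),
}

def rank_by_return(items):
    """Rank initiatives by minutes avoided per dollar, best first."""
    return sorted(items, key=lambda n: items[n][1] / items[n][0], reverse=True)

def cheapest_portfolio(items, target_minutes):
    """Lowest-cost combination that meets the target, or None if none does."""
    best = None
    names = list(items)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cost = sum(items[n][0] for n in combo)
            gain = sum(items[n][1] for n in combo)
            if gain >= target_minutes and (best is None or cost < best[0]):
                best = (cost, combo)
    return best

print(rank_by_return(initiatives))
print(cheapest_portfolio(initiatives, target_minutes=30))
```

Enumerating every combination is only practical for a short list of initiatives. In practice, initiatives also interact, so the additive assumption should be tested against the analytical models before the road map is drafted.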

Real-time analysis capabilities will provide a new paradigm for network operations. New tools based on mathematical models will focus decisions on restoration and reliability instead of on organizational silos and budgets. Coupled with new business processes, these new tools will allow utilities to simulate investments to optimize the balance between reliability and return on investment. Operators will be able to “fly by wire” in emergency and restoration activities. And finally, the mountains of data will begin to move.

Don Mak is a partner in the energy and utilities industry practice of IBM Business Consulting Services and is the service-oriented architecture (SOA) lead for the utility industry. He has worked with clients to support their transformation efforts using an SOA approach. He currently leads client SOA engagements to support enterprise resource planning, advanced metering, generation plant work management and distribution operations solutions.

Mark Welch is a senior consultant in the Energy and Utilities Strategy and Change practice of IBM Business Consulting Services. He has more than 25 years’ experience in the utilities and energy industry and has contributed to the success of major corporations in the areas of line management, corporate strategy and organization re-creation.

Editor’s note: This article was adapted from one originally published in Montgomery Research Inc.’s annual utilities and energy project, sponsored by IBM.
