Database Integration: Criteria and Techniques
By Tom Lonski and Tom Helmer, Convergent Group
As the previous article on page 16 suggests, to gain a competitive edge, utilities must fully realize the benefits of their various information systems by making critical information available throughout the company. In practical terms, “sharing information” between systems means integrating databases. Some computer systems at a utility that may be integrated with other systems include GIS, customer information system, materials management system, planned maintenance system, computer-aided dispatch, one call, human resources information system, outage management system, work management system, property accounting, SCADA, engineering analysis/system planning, accounts payable, non-utility billing, Internet or Intranet web server, and land management systems.
A first step toward resolving a database integration problem is to survey the available integration products. Once all of the possible solutions have been identified, they can be compared and the best one chosen. However, this bottom-up approach does not usually work well. Numerous products are typically available for integrating databases, and the survey can turn into a prolonged research effort. Another drawback to this approach is that integration products frequently use different database standards and different network protocols. Most computer science practitioners prefer a top-down approach.
In the course of wrestling with integration problems on a number of projects, Convergent Group has defined 10 general integration criteria that help focus on the functionality required of a database integration, and six general integration techniques that can be used to classify the possible integration solutions. Together, these criteria and techniques provide a top-down approach to integration problems: focusing on the integration functionality simplifies planning and designing a solution.
The integration criteria that the Convergent Group has identified are data currency, performance, cost, encapsulation, data synthesis, data access, administration, side effects, data integrity and scalability.
The data currency criterion focuses on how “fresh” the data from a source database is in a target database. Depending on the integration technique, there can be time delays in making the data in one database (the source) available to users and applications of another database (the target).
The performance criterion focuses on the integration technique's execution speed.
The cost criterion includes not only the cost of the integration products but also the cost of implementation.
The encapsulation criterion considers how well the integration solution hides the physical location of the data from the users and the applications.
The data synthesis criterion examines how users work with the data from the multiple databases.
The data access criterion focuses on what kind of access the integration solution implements for the users and applications. Data access includes create, read, update and delete access, as well as whether the access is bi-directional.
The administration criterion includes the effect the integration solution has on database administration.
The “side-effects” criterion is used to classify the integration techniques' impact on other applications and other parts of the utility's computer environment.
The data integrity criterion examines how well the integration technique manages transactions that apply to multiple databases.
The scalability criterion evaluates how well an integration technique performs as the number of databases being integrated increases.
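How such criteria might be applied can be sketched with a simple weighted-scoring exercise. The Python fragment below weights three of the ten criteria and ranks three of the techniques discussed later; the scores (1 = poor, 5 = good) and weights are illustrative assumptions, not figures from this article.

```python
# Hypothetical weighted-scoring sketch; all numbers are illustrative only.
WEIGHTS = {"data_currency": 3, "cost": 2, "scalability": 2}

TECHNIQUES = {
    "batch_copy":        {"data_currency": 1, "cost": 5, "scalability": 2},
    "application_level": {"data_currency": 5, "cost": 4, "scalability": 1},
    "database_level":    {"data_currency": 5, "cost": 2, "scalability": 4},
}

def weighted_score(scores, weights):
    """Sum each criterion's score multiplied by its weight."""
    return sum(scores[c] * w for c, w in weights.items())

ranked = sorted(TECHNIQUES,
                key=lambda t: weighted_score(TECHNIQUES[t], WEIGHTS),
                reverse=True)
for name in ranked:
    print(name, weighted_score(TECHNIQUES[name], WEIGHTS))
```

In practice the weights would come from the utility's own priorities; a shop that cannot tolerate stale data would weight data currency far more heavily than cost.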
Most of Convergent Group's integration experience has been with integrating various systems into utilities' GIS projects. To provide a more detailed explanation of the six integration techniques identified earlier, the GIS will be the primary system in the explanations, with other systems being integrated into it.
Integration by Batch Copy
The first integration technique is shown in Figure 1. With this approach, data is extracted from the external system and loaded into the GIS. This approach can vary in the sophistication of its implementation. In less sophisticated implementations, the data is manually extracted, transferred by tape or other media and manually loaded. In more sophisticated implementations, this is the basis for data warehousing and the data is automatically extracted, transferred and loaded on a regular and frequent basis.
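A minimal sketch of the batch-copy technique, using SQLite as a stand-in for both the external system and the GIS database; the table and column names are hypothetical:

```python
import sqlite3

def batch_copy(source_path, target_path):
    """Extract all customer rows from the source system and replace the
    target's snapshot table -- the essence of the batch-copy technique."""
    src = sqlite3.connect(source_path)
    tgt = sqlite3.connect(target_path)
    try:
        rows = src.execute(
            "SELECT account_id, name, address FROM customers").fetchall()
        tgt.execute("CREATE TABLE IF NOT EXISTS customers_snapshot "
                    "(account_id INTEGER, name TEXT, address TEXT)")
        tgt.execute("DELETE FROM customers_snapshot")  # discard the stale copy
        tgt.executemany(
            "INSERT INTO customers_snapshot VALUES (?, ?, ?)", rows)
        tgt.commit()
    finally:
        src.close()
        tgt.close()
```

In the simplest implementations such a routine would be run by hand; a data-warehouse-style implementation would schedule it nightly, which is exactly why data currency suffers with this technique.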
This technique has several advantages. The first is cost. This technique tends to be the least expensive form of integration that can be implemented. Often special integration products are not required, and the database management system (DBMS) tools for exporting and importing data can be used. In addition, it requires only minimal database changes.
Another advantage is that, since the data is copied to the relational database management system (RDBMS) component of the GIS, many of the criteria do not apply or are satisfied by this technique: performance, encapsulation, data synthesis, administration and side effects.
However, there are several disadvantages. Typically, utilities use the simpler forms of this technique. As a result, one of the disadvantages is that data currency becomes a key issue. Even when the batch copy is fully automated, it is usually done on an overnight or longer basis. The more that critical or expensive business decisions are based on stale data, the more serious this issue becomes.
The other disadvantages have to do with data access, data integrity and scalability. Only the GIS has access to the shared data; other systems cannot access the GIS or each other's data. Since the GIS works with a snapshot copy, data access for users and applications must be read-only to guarantee data integrity.
Application Level Integration
The second technique, shown in Figure 2, performs the integration at the application level. When the applications have the capability to directly communicate with multiple databases, they can be modified to retrieve information directly from the other systems.
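A minimal sketch of application-level integration, assuming a hypothetical GIS customer table and an outage-management table in separate SQLite files:

```python
import sqlite3

def customer_with_outages(gis_path, oms_path, account_id):
    """Application-level integration: the application itself opens a
    connection to each system and combines the results in code.
    Table and column names are hypothetical."""
    gis = sqlite3.connect(gis_path)
    oms = sqlite3.connect(oms_path)
    try:
        # The application, not the DBMS, knows where each piece of data
        # lives -- the embedded location dependency this technique creates.
        customer = gis.execute(
            "SELECT name, feeder FROM customers WHERE account_id = ?",
            (account_id,)).fetchone()
        outages = oms.execute(
            "SELECT started, restored FROM outages WHERE account_id = ?",
            (account_id,)).fetchall()
        return {"customer": customer, "outages": outages}
    finally:
        gis.close()
        oms.close()
```

Every application that needs this combination of data must repeat the same connection and lookup logic, which is where the maintenance costs accumulate.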
This technique rates well against the data currency, administration and side-effects criteria. Since the applications access the data directly in their “native” databases, the data is always current. The integrated applications should behave like any other applications so, typically, there are no changes in administration. Since this technique requires little or no changes to the database, the side effects are minimal.
The disadvantages to this technique include the encapsulation, data synthesis, data integrity and scalability criteria. This technique has very poor encapsulation, and every application will have embedded dependencies on the location of data. Since the integration is at the application level, any data synthesis has to be implemented within the application. Similarly, if the integrated application is updating multiple databases, the application has the added complexity of managing transactions that apply across databases. Finally, this approach does not scale well. It works well for a few databases but quickly becomes cumbersome and expensive as the number of applications and databases increases.
This technique's costs are backloaded, unlike the other techniques, where the costs are frontloaded. The initial implementation costs are low because many GIS products and database client application tools such as PowerBuilder, Delphi and SQL Windows can simultaneously communicate with multiple databases. While the implementation costs are low, the costs for this technique increase over time because the problems in encapsulation, data synthesis and data integrity all add complexity to the applications, significantly increasing their development and maintenance costs. Data access and performance are rated as average for this technique.
Application Level Integration with Middleware
The third technique also integrates the databases at the application level. It differs from the previous approach by adding the use of middleware. Depending on the DBMS used by the external systems, some applications may not be able to communicate directly with an external system. Middleware provides a mechanism for applications to communicate with a database that is not inherently supported by the application.
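The role middleware plays can be sketched with a hypothetical adapter layer; the class names are invented for illustration, and SQLite stands in for a DBMS the application cannot reach directly.

```python
import sqlite3

class SqliteBackend:
    """Stand-in for a vendor-specific driver; a real middleware product
    would ship adapters like this for many DBMSs and network protocols."""
    def __init__(self, path):
        self._conn = sqlite3.connect(path)

    def run(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()

class Middleware:
    """One uniform query() interface in front of any number of backends."""
    def __init__(self):
        self._backends = {}

    def register(self, name, backend):
        self._backends[name] = backend

    def query(self, name, sql, params=()):
        # This extra layer of indirection is where the performance
        # penalty of middleware comes from.
        return self._backends[name].run(sql, params)

mw = Middleware()
mw.register("gis", SqliteBackend(":memory:"))
print(mw.query("gis", "SELECT 1 + 1"))
```

The application now depends only on `query()`; swapping a backend DBMS means registering a different adapter, not rewriting the application.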
This technique behaves very similarly to the previous one since it only differs in the use of middleware. It is listed separately in order to focus on the effect middleware can have on performance, cost, encapsulation and data synthesis. Since applications are now one more layer removed from directly working with the data, there is some performance penalty. With some middleware products, this performance penalty can be quite significant.
Obviously there will be an increased cost with the purchase of middleware products. The increased implementation costs may be offset by reduced long-term costs since, depending on the middleware product, there can be an effect on encapsulation and data synthesis as well. Some of the middleware products on the market are quite sophisticated and can actually provide a single interface to multiple databases (Figure 3). Further, some middleware products can perform some data synthesis. While middleware will not eliminate the encapsulation and data synthesis problems inherent in integration at the application level, by shifting some of the encapsulation and data synthesis problems from the applications to the middleware, it can reduce some of the software development and maintenance costs.
Database Level Integration
The next integration technique (Figure 4) differs significantly from the previous two by implementing the interface at the database level instead of the application level. With this approach, the GIS system presents the external data to the GIS applications as if the data were stored locally in the GIS. Depending on the databases in the different systems, they may be able to communicate directly, or middleware may be required for the databases to communicate.
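Database-level integration can be illustrated with SQLite's ATTACH statement, standing in for a vendor database link or gateway: the external system's data appears to applications as if it were stored locally. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # the "GIS" database
conn.execute("ATTACH DATABASE ':memory:' AS cis")  # the "external" system
conn.execute("CREATE TABLE cis.customers (account_id INTEGER, name TEXT)")
conn.execute("INSERT INTO cis.customers VALUES (1, 'Acme Water')")

# A view completes the encapsulation, so applications never see the 'cis.'
# prefix (a TEMP view, since SQLite restricts ordinary views to one schema).
conn.execute("CREATE TEMP VIEW customers AS SELECT * FROM cis.customers")
rows = conn.execute("SELECT name FROM customers").fetchall()
```

Because applications issue ordinary queries against what looks like a local table, no application code changes when the external data moves, which is the encapsulation advantage this technique offers.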
This technique offers many advantages in data currency, encapsulation, side effects, data synthesis, data integrity and scalability. Since the data is accessed directly from the external systems, it is never stale. Encapsulation is good since data location is transparent to the applications and the applications access the external data as if it were stored in the GIS database.
Typically there are few side effects on existing applications. Implementing this interface requires database changes, but these usually do not affect the presentation of data to the legacy applications. This technique rates well on the data synthesis and data integrity criteria because, by using the database to perform the integration, this technique leverages the DBMS's built-in data synthesis and transaction management capabilities. Scalability for this technique is much better than that of the previous techniques since the number of applications does not affect the number of interfaces.
When evaluated by the performance and data access criteria, this technique also usually rates well. However, these criteria are highly dependent on the database and middleware products involved. Data access is the capability that varies most by vendor. Depending on the vendor, there can be limitations in create, read, update and delete access, and many times access is not bi-directional.
The two disadvantages to this technique are cost and administration. Middleware products, often required to implement integration at the database level, are much more expensive than the middleware products used with applications. Once the middleware products have been purchased, they typically require a significant amount of database administration effort to implement.
Middleware Infrastructure Integration
The fifth integration technique is shown in Figure 5. With this approach, a middleware toolkit is purchased from a vendor and used to construct a single interface, allowing all applications to access all of the databases. This approach can vary tremendously in its complexity. In the simpler implementations, the middleware infrastructure provides only a single interface. In the more complex implementations, the middleware infrastructure implements a logical database where the middleware infrastructure has intelligence and data processing capabilities similar to those of the databases.
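The "logical database" idea at the heart of a middleware infrastructure can be sketched as a catalog that maps each table to the backend that owns it; all names here are hypothetical, and real toolkits add query decomposition, caching and distributed transactions on top of this idea.

```python
import sqlite3

class LogicalDatabase:
    """Single interface through which every application queries every
    backend; the catalog, not the application, resolves data location."""
    def __init__(self):
        self._catalog = {}  # table name -> owning sqlite3 connection

    def mount(self, table, conn):
        """Record which backend database owns a given table."""
        self._catalog[table] = conn

    def select(self, table, where_sql="1=1", params=()):
        # Table names come from the trusted catalog, not from user input,
        # so interpolating them into the SQL text is safe here.
        conn = self._catalog[table]
        return conn.execute(
            f"SELECT * FROM {table} WHERE {where_sql}", params).fetchall()

cis = sqlite3.connect(":memory:")
cis.execute("CREATE TABLE customers (account_id INTEGER, name TEXT)")
cis.execute("INSERT INTO customers VALUES (1, 'Acme')")
ldb = LogicalDatabase()
ldb.mount("customers", cis)
print(ldb.select("customers", "account_id = ?", (1,)))
```

Moving the `customers` table to a different backend would only require a new `mount()` call; no application issuing `select()` calls would change, which is why this technique scales well.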
By creating a single interface for all of the applications and by being built on top of multiple databases, this technique provides a good solution to the data currency, data access, encapsulation, data synthesis, data integrity and scalability criteria. This technique's evaluation on the performance criterion is not quite as good. The middleware infrastructure will cause some performance delays when compared to direct access to the databases.
The drawbacks to this technique are its side effects, administration and cost. The side effects are very significant, since all existing applications must be changed to use the middleware infrastructure. Implementation typically requires significant changes to the databases as well as to the middleware infrastructure itself, demanding substantial administration work. Both of these drawbacks add to the cost. In addition, the middleware toolkit products used for creating such an infrastructure are significantly more expensive than the middleware used by applications, and the implementation effort required to build the infrastructure adds further cost. As a result, this is one of the most expensive techniques.
Migration to a Common Database
To provide a complete set of integration techniques, the sixth technique eliminates the integration problem by migrating the various computer systems that need to be integrated to a common database. This can be a massive undertaking, depending on the number of databases and applications involved. As a result, this technique is only considered when there are other reasons to use migration: a utility may commit to a large-scale information technology (IT) modernization program, or it may have already decided to standardize on a single database as part of its IT strategy.
These criteria and techniques provide food for thought when trying to solve database integration problems. Unfortunately, no one technique or product can be used for every integration problem. Even if an integration product works at one site, it may not work at another. Even on a single project, a mix of techniques is often needed to integrate databases. While these criteria and techniques can provide the tools to plan and design a solution, it's only the tip of the iceberg.
Thomas Lonski is a senior technical consultant with Convergent Group. He is responsible for a variety of technical AM/FM/GIS consulting assignments with emphasis on computer systems, database design and applications development. His specific expertise includes computer/network hardware, system administration, several CASE tools, data modeling, artificial intelligence, expert systems and human factors engineering.
Thomas Helmer is a senior technical consultant with Convergent Group and is responsible for a variety of systems engineering and technical AM/FM/GIS consulting assignments with an emphasis on systems architecture and integration, database design and applications development. He has 15 years' experience developing large relational databases and table-driven software applications centered on RDBMSs.