The Active Real-time Database: A New Approach to the Design of Complex Distributed Control Systems
By Dave Stoneham and Dr. Nigel Day
Any “real-world” system must take input data of some kind, process it and respond to it. At some point the application needs to store this data. The major difference in real-time systems is that the data storage lifetime can be very short, so it often seems easier simply to hold the data in local or global variables. That approach was acceptable until applications became too complex.
Designs based on distributed client-server configurations are increasingly used in complex real-time embedded application development. Typical examples might be an electric utility's distribution grid monitoring and control system or an automatic meter reading system.
In many cases, servers require information or services from other servers either at a peer or lower level, leading to multiple-tier client-server architectures. Unless databases of some form are adopted as critical system components, data within a system can become so scattered that several major problems surface:
It is difficult to divide system development across a team of engineers if those engineers need to share the stored data.
It is more difficult to build systems with predictable performance and behavior, as individual developers may handle the data in ways that are incompatible with another developer's use of the same data. This is particularly problematic when several items of data must be updated to represent one action, especially if the action fails when only partially complete.
It becomes very difficult to maintain the system, as engineers must determine where any data is stored. If the data is structured, then it is difficult to modify with the assurance that other areas of the system are not being affected.
However, one of the major problems hindering the acceptability of such database techniques in real-time applications is the perception, often correct, that commercial databases are developed for a host environment with large amounts of memory, never-ending disk space and infinite processing resources. This is the complete opposite of the world in which real-time system designers and developers generally live.
Indeed, most commercial systems were originally batch oriented, with no requirement to give instant response to any input. They collected all input data, processed it and output it in some other form usable by either humans or the next batch run. While such systems are now moving towards providing instant and interactive response to users, they still show clear signs of their batch-processing origins.
The main problem with finding a suitable database is that those most commonly available are built for the commercial, high volume data processing applications and are not suitable for real-time work. Requirements of a real-time embeddable database that distinguish it from those commonly available for commercial systems must be specified (Table 1).
Some of the requirements that must be considered for use in real-time applications are:
1. Data Model. With all the various databases in existence, the relational database is still by far the most popular. Object oriented databases have not been readily adopted, as predicted a few years ago, because they are generally more difficult to access and pure object oriented development strategies have failed to deliver the promised savings. It is clear that some mix of the two worlds is the most advantageous approach. There is, however, no clear definition of such a data model. The term object-relational has often been used, but does not define the exact capabilities.
2. Speed. It is often a requirement for embedded systems' operations to be performed in a few milliseconds, even microseconds. Speed is everything. Any database designed for real-time systems must be able to achieve hundreds or even thousands of simple updates per second to be useful. To achieve the desired performance, it is almost certain that the whole database must reside in memory. It is very important for the database to be designed with this in mind. A commercial database running from cache memory is not the same, as it will still be optimized for clustering data on disk to reduce access time. If the database is designed for memory residency, many optimizations can be performed when all the data and index information is available in a consistent manner. Unlike disk-based databases, there are no penalties for positioning. In addition, references (foreign keys, etc.) can be implemented as direct memory pointers and bypass indexes. Any database running in memory must also have the ability to store enough information on disk, or some other permanent storage, so that it can recover data that must persist over a system shutdown or failure. This should be some form of snapshot and journal process.
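The snapshot-and-journal process mentioned above can be illustrated with a minimal sketch. This is not the product's actual mechanism; it assumes a simple in-memory key/value store, with Python standing in and all names (`MemStore`, `put`, `snapshot`) purely illustrative:

```python
# Sketch of snapshot-plus-journal persistence for an in-memory store.
# Updates are journalled before being applied; recovery loads the last
# snapshot and replays the journal on top of it.
import json
import os

class MemStore:
    def __init__(self, snap="db.snap", jnl="db.jnl"):
        self.snap, self.jnl = snap, jnl
        self.data = {}
        self._recover()
        self._journal = open(self.jnl, "a")

    def _recover(self):
        # Load the most recent snapshot, if any...
        if os.path.exists(self.snap):
            with open(self.snap) as f:
                self.data = json.load(f)
        # ...then replay journalled updates made since that snapshot.
        if os.path.exists(self.jnl):
            with open(self.jnl) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Write-ahead: persist the delta before applying it in memory.
        self._journal.write(json.dumps([key, value]) + "\n")
        self._journal.flush()
        self.data[key] = value

    def snapshot(self):
        # Checkpoint the whole store, then truncate the journal.
        with open(self.snap, "w") as f:
            json.dump(self.data, f)
        self._journal.close()
        self._journal = open(self.jnl, "w")
```

After a restart, constructing the store with the same file names recovers the last committed state without the application replaying its own inputs.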
3. Active Queries. Most commercial databases are still passive by nature, in that the only way to note if the data changes from a client point of view is to query (poll) it. This one feature alone makes most databases unsuited for use in real-time systems. The problem is simple: The faster the response required, the more frequently the data must be polled. If no data is changing, then the system is busy processing without achieving anything. What is really needed is a system where the database itself notifies client processes immediately of any changes. Even better is to give the client the data that has changed and thus avoid a subsequent query from it to establish what has changed. This would require an active, or persistent, SQL query that once instantiated, supplies the client with the initial result set and further delta information as, and when, the data changes. These deltas need only include sufficient information to maintain the dataset for the client and would comprise record/object attributes and notification of new or deleted records/objects. Lastly, any internal operations that the database can perform such as incremental housekeeping or application integrity tasks should also be triggered instantaneously by data change rather than relying on periodic data polling.
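The active query behaviour described above can be sketched in a few lines. The sketch assumes a single table keyed by record identifier; Python is used as a stand-in, and the names (`ActiveTable`, `subscribe`, `upsert`) are illustrative, not the product's API:

```python
# Sketch of an active (persistent) query: on subscription the client
# receives the initial result set once, then only deltas as records
# enter, leave, or change within the query's result set.
class ActiveTable:
    def __init__(self):
        self.rows = {}        # record key -> record (a dict of attributes)
        self.queries = []     # persistent (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # Deliver the initial result set, then retain the query.
        callback("initial", {k: r for k, r in self.rows.items() if predicate(r)})
        self.queries.append((predicate, callback))

    def upsert(self, key, row):
        old = self.rows.get(key)
        self.rows[key] = row
        for predicate, callback in self.queries:
            was = old is not None and predicate(old)
            now = predicate(row)
            if not was and now:
                callback("insert", {key: row})    # record entered result set
            elif was and not now:
                callback("delete", {key: None})   # record left result set
            elif was and now:
                callback("update", {key: row})    # record changed in place
```

The client never polls: after the initial result set, it only processes deltas, and an unchanged database costs it nothing.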
4. Accessibility. Any database has to make its data available in an easy-to-use manner. It is best to adopt standards where possible, thus reducing learning curves. SQL is the most widely used query language today and is likely to remain so for many years to come. To allow general access to data, the database should provide various online access methods, such as ODBC, over well-supported protocols like UDP/IP, TCP/IP and HTTP. Application program interfaces must be provided to allow application programmers to connect to and interact with the database, using general-purpose querying and updating techniques with SQL. There should also be some form of optimized access strategy for connecting to known data objects in the database.
5. Transaction Processing. Most transactions in a real-time system are extremely short in duration. Where extended transactions are required, the application can easily ensure there will be no contentions, so the role of the database is more to detect conflict for defensive reasons than to assist in serializing user transactions.
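The conflict-detection role described above is commonly realised with optimistic version stamps on records, which can be sketched as follows (Python as a stand-in; `VersionedStore` and its methods are hypothetical names, not the product's API):

```python
# Sketch of defensive conflict detection for short transactions: every
# record carries a version stamp, and a commit is rejected if the record
# changed since the transaction read it.
class VersionedStore:
    def __init__(self):
        self.rows = {}    # key -> (version, value)

    def read(self, key):
        # Returns the current version along with the value.
        return self.rows.get(key, (0, None))

    def commit(self, key, expected_version, value):
        version, _ = self.rows.get(key, (0, None))
        if version != expected_version:
            return False                      # conflict detected; caller retries
        self.rows[key] = (version + 1, value)
        return True
```

Because real-time transactions are so short, such rejections are rare; the check exists to catch the exceptional conflict rather than to serialize routine work.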
Given that a database conforming to the requirements above exists, it is now possible to build conventional database applications on embedded systems with client processes deployed around a central database server. However, having a single central database can be quite a constraint on system design. Here are some useful additional features to consider (Figure 1):
Putting the Database in Control. In a typical active system, changes to data inside the database trigger a processing routine in a client, which in turn yields new updates back to the database. Imagine how much time could be saved, and the consequent boost to real-time response, if those operations could be performed directly inside the database, eliminating the need for a constant exchange of low-level data updates. Modern relational databases have migrated towards this with “triggers” that cause SQL to be executed on other data in the database when various update operations occur. This is fine as far as it goes, but a full object-oriented programming language to specify object (record) methods (behavior) would better allow complete flexibility.
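A trigger of the kind described looks like this; SQLite is used here purely as a convenient illustration of standard trigger syntax, and the table and trigger names are invented for the example:

```python
# An update on one table causes SQL to run against another, entirely
# inside the database, with no client process in the loop.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (sensor TEXT PRIMARY KEY, value REAL);
CREATE TABLE alarms   (sensor TEXT, value REAL);

-- Fire whenever a reading crosses the alarm threshold.
CREATE TRIGGER high_reading AFTER UPDATE ON readings
WHEN NEW.value > 100.0
BEGIN
    INSERT INTO alarms VALUES (NEW.sensor, NEW.value);
END;
""")
conn.execute("INSERT INTO readings VALUES ('boiler', 20.0)")
conn.execute("UPDATE readings SET value = 120.0 WHERE sensor = 'boiler'")
alarms = conn.execute("SELECT * FROM alarms").fetchall()
```

The limitation the text notes is visible here: the trigger body can only be more SQL, whereas a full in-database programming language could attach arbitrary behavior to the record.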
Distributed Applications. It is often useful to have sub-sets of data located at different places within a distributed multiple-tier embedded application. This would normally require client processes to monitor one database and update others. It is better for one database to be able to communicate data changes to another.
While redundant servers can provide full replication, it is often necessary to replicate sub-sets of the database. Given that an in-database programming language and active queries now exist, many things are possible. Code in a database can manage active client queries to another database server, monitoring either basic data or deriving new data from it.
Class libraries, allowing access to messaging facilities, such as TCP/IP or UDP/IP sockets, can be used. They allow code in one database to communicate with code in another database by passing messages containing actions and data.
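Putting the last two ideas together, sub-set replication can be sketched as follows. A plain in-process queue stands in for the TCP/IP or UDP/IP socket, and all names (`MiniDB`, `watchers`, the `alarm` attribute) are illustrative:

```python
# Sketch of sub-set replication between two databases: code in the
# master watches for changes and forwards only the matching subset as
# messages, which the replica applies on arrival.
from queue import Queue

class MiniDB:
    def __init__(self):
        self.rows, self.watchers = {}, []

    def put(self, key, row):
        self.rows[key] = row
        for watch in self.watchers:
            watch(key, row)

link = Queue()                    # stands in for a socket between databases
master, replica = MiniDB(), MiniDB()

# Replicate only the alarm-flagged subset of the master's data.
master.watchers.append(
    lambda k, r: link.put((k, r)) if r.get("alarm") else None)

master.put("s1", {"value": 10, "alarm": False})   # filtered out
master.put("s2", {"value": 99, "alarm": True})    # forwarded

while not link.empty():           # replica side drains its message link
    key, row = link.get()
    replica.put(key, row)
```

No client process sits between the two databases; the replication logic lives in the master's own code and rides on the active-query machinery.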
Interfacing Directly to External Devices. Interfacing the database directly with external devices can often drastically reduce system response time. It is necessary to run the routines required to perform any filtering or data manipulation within the database itself, using a suitable programming language. In particular, tables in the database can be linked by active drivers to communicate changes to the external device and also receive information from the device allowing the tables to be updated directly. Such a programming language could be used to react to data changes from the device by generating an output value back to the device, thus ensuring any required system actions are performed in real-time by eliminating data communication delays within the system.
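The device-linked table can be sketched like this, assuming a trivial control rule; the `Device` class is a hypothetical stand-in for real digital/analogue I/O, and all other names are invented for the example:

```python
# Sketch of a table linked to a device by an active driver: writes to
# the table are pushed to the device, and device readings update the
# table directly, where in-database code can react at once.
class Device:
    def __init__(self):
        self.output = None

    def write(self, value):
        self.output = value       # e.g. drive an analogue output channel

class DriverTable:
    def __init__(self, device):
        self.rows = {}
        self.device = device

    def set(self, key, value):
        # A change made through the database is forwarded to the device.
        self.rows[key] = value
        self.device.write(value)

    def on_device_input(self, key, value):
        # A reading from the device updates the table, and in-database
        # code reacts immediately, here with a simple control rule.
        self.rows[key] = value
        if key == "temperature" and value > 80.0:
            self.set("fan_speed", 100)
```

The control response never leaves the database process, which is exactly the communication delay the text says this arrangement eliminates.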
Historic Data. Many real-time embedded applications can produce vast amounts of information that trace, or log, system activity such as device status, meter logs, temperature trends, etc. It is desirable for the real-time database to have a facility for logging this time-series data to permanent storage without holding it all in precious system memory. Facilities must be able to log high-speed raw data and also compressed data containing time period averages, maximums, minimums, etc. Any data so logged must be retrievable via normal database access using SQL and thus appear as if it were stored in a conventional disk-based relational database.
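The period-compression part of such logging can be sketched in one function; the function name and the fixed-period bucketing scheme are assumptions for the example:

```python
# Sketch of time-series compression for historic logging: raw
# (timestamp, value) samples are reduced to one average/min/max record
# per fixed time period before being sent to permanent storage.
def compress(samples, period):
    """Group samples into fixed periods and summarise each period."""
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts // period, []).append(value)
    return {
        start * period: {
            "avg": sum(vs) / len(vs),
            "min": min(vs),
            "max": max(vs),
        }
        for start, vs in buckets.items()
    }
```

The compressed records are what get written to disk; system memory holds only the current period's raw samples, and SQL queries against the log see ordinary rows.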
An Alternative Approach to System Design
The approach outlined above has a number of very distinct advantages in embedded system design.
Using this new approach, embedded systems can be implemented directly by an active real-time database with the internal programming language performing all alarm monitoring, device management and system control actions. Any user displays and control stations can be implemented as active query clients, remaining totally event driven, a fundamental requirement for SCADA systems.
This database technology can also be implemented directly inside a programmable controller or telemetry unit to interface to the digital and analogue I/O and perform high-speed control actions. This has the advantage of allowing open access via SQL, or another open standard, to the controller rather than via proprietary closed networks, thus supporting simple and transparent access to higher-level systems.
There is now a way to approach embedded design which inherently increases system performance and maintainability, while greatly reducing system development time and complexity. Rather than applications surrounding a passive central database, it is now possible to design a real-time embedded system as a group of servers implemented as a set of active, programmable databases.
Dave Stoneham is Polyhedra Plc's managing director and Polyhedra Inc's chairman of the board. He has a strong background in real-time computing, with over 20 years' industry experience. Stoneham has been at the forefront of database technology, both in senior positions at large companies and in his own companies, where he developed a number of leading-edge software products for the process control industry.
Dr. Nigel Day is currently Polyhedra Plc's technical director and Polyhedra Inc's engineering vice-president. Since gaining his doctorate from Cambridge University, Day has been involved in a wide range of leading-edge software design and consulting projects. His experience includes compiler, operating system and database design and implementation, and computer security research.