By Chris Sincerbox, Contributing Writer
Just a few years ago, a law requiring publicly held companies to certify their information security practices annually would have been unheard of. Today, that scenario is a reality. NERC standards 1200 and 1300, ISO 17799, SAS 70 and the Sarbanes-Oxley Act all deal in some way with information security. This compliance trend affects the security requirements of electric utility software applications such as energy management systems (EMS), distribution management systems (DMS) and geographic information systems (GIS). Data integrity is a given for these applications, but protecting the company’s intangible assets, critical data and hard assets is not always the overriding concern when these systems are designed, developed and maintained.
A sound security infrastructure mitigates the risk that denial of service attacks, intrusion attacks and information theft pose to any software application. Physical security is not enough. Critical software applications that are responsible for the reliability of bulk electric systems or municipalities can incorporate a variety of security solutions that address these three basic categories of security breaches and still meet the most demanding performance and availability criteria. These solutions can be incorporated into existing applications using open-architecture software techniques that require no major hardware upgrades or customized software drivers. Addressing the three categories of security breaches goes a long way toward becoming security-compliant.
Most existing security plans cover external exploits. For instance, firewalls, security patch management, secure dial-up modem connections, anti-virus software and intrusion detection systems all address external vulnerabilities. However, a good security plan must mitigate internal vulnerabilities as well.
Before discussing methods to improve security for EMS, DMS and GIS systems, two considerations should be noted. The first is the concept of a return on security investment: how does one justify the cost of change? The second is following a structured, repeatable, iterative process when implementing security changes.
To assess whether current software applications are secure enough, one could attempt a return on security investment (ROSI) analysis or a thorough cost-benefit analysis, but this is difficult for software security. Since the intent is to eliminate security breaches, how does one quantify the benefit? If the security solution is implemented properly, no breach will ever be realized. Return on investment is normally calculated from how much money the solution will make once it is implemented; that calculation cannot be made as easily for software security solutions. These solutions should instead be thought of as insurance. If no security breaches are attempted, the insurance is not needed. If breaches are attempted, it is.
A plan for implementing security solutions should follow a series of repeatable steps that define a framework for the improvement process. This iterative process amounts to an overall feasibility study, with a reassessment at each step. This, in turn, leads to a decision point on whether, and in what fashion, the proposed security solutions are to be implemented. The model described below can be used as a good starting point.
Denial of Service
Whether the software architecture for the critical application is distributed or centralized, several security vulnerabilities must be addressed. The first vulnerability area is denial of service: an attempt to deprive an organization of a resource that it expects to be available. Critical electrical software systems, for instance, serve many different functions on a 24x7x365 basis. Functions such as reliability, load balancing, interchange scheduling, transmission service and operation, generation, and load serving have many different types of users: dispatchers and operators located locally, support personnel located remotely, other systems (e.g., ICCP interfaces), other processors on the same LAN or on a WAN, custom-designed programs that run locally, and independently designed programs that are remotely located. Each entry point defined for these users that allows access to system resources, whether via a formal login/authentication process, a remote procedure call or a programmable network interface, is a potential denial of service attack point.
A denial of service attack can be as simple as using a standard operating system utility such as telnet to connect to a network service port, for example one used to service EMS database requests. The telnet session would connect to the port and then send either no data or an unusually large amount of data. The EMS database server would accept the connection and either wait for data or try to read what it assumes is a normal request. If the server is single-threaded, this could prevent any other user from accessing the system. If the server does not check buffer limits properly, it could terminate with a memory access violation.
Another denial of service example is running programs locally that consume unusual amounts of memory or CPU time. Most often, however, denial of service attacks are carried out by packet flooding over a network, which entails sending large numbers of connect requests and/or data packets. Yet another denial of service attack is deliberately failing login attempts until legitimate user accounts are locked out for exceeding the maximum number of failed attempts.
The second security vulnerability area is information theft, which involves stealing or viewing data from a target. The information is then used either to disrupt the running system or to gain an advantage for other purposes. Any data that is transported is subject to theft. Whether the medium is a fiber-optic network, radio, microwave or dedicated modem/phone lines, data leaving or entering the system is vulnerable. For critical electrical applications, this data could be remote terminal unit control, status, analog or accumulator data, GUI data, interchange scheduling information, generation/transmission data, network state estimation information, alarm information or inter-control center communications. This type of data must never be allowed to fall into the wrong hands. Communication protocols designed for EMS/DMS systems emphasize efficiency and performance, but this emphasis can leave data unprotected from theft attempts.
Information theft can be as simple as using a standard utility such as tcpdump to monitor inbound and outbound network traffic. Captured data can reveal detailed protocol sequences and data content. This information could be used to disrupt the system or cause unexpected results by replaying the sequence of data with embedded changes as needed. Another type of information theft is simply gathering all ASCII-encoded data transported via network or inter-process communication channels (e.g., shared memory, pipes, sockets, queues). Some applications send user and password information in clear text (e.g., the Unix “r” commands). This type of theft is classified as passive, because the intruder takes no action. An active attack, on the other hand, attempts to provoke a reaction to an injected event and then monitors the results for patterns. For instance, an intruder could call an organization to inquire about a bill or invoice and then monitor how the organization reacts.
The third type of security vulnerability is the intrusion attack. Whereas denial of service takes away resources and information theft steals data, intrusion attacks allow unauthorized users to use the system and its resources. Any system that uses attended login processes is vulnerable to misuse. For instance, dispatchers and support personnel may open multiple login sessions and keep them open for extended periods, and some users receive certain roles or privileges when they log in. Each login session is a potential point of intrusion. Systems that allow unattended logins are extremely vulnerable. Unattended logins imply that passwords are kept in a database other than the operating system’s, and that these passwords must be converted back to clear text to allow authentication. This is never failsafe, because passwords are meant to be encrypted in a one-way fashion and never decrypted.
An intrusion attack can be as simple as an unauthorized user sitting down at a terminal recently vacated by an authorized user who did not end his or her login session, and then sending commands to the system. Other intrusion attacks involve guessing the passwords of certain users. For example, if a user’s encrypted password value is known, along with the algorithm that produced it, an attacker can encrypt candidate passwords repeatedly until one matches. Another example of an intrusion attack is creating or replacing executables that normally run at higher privileges. For instance, a security policy could allow an executable named “a” to run at an elevated privilege. An intruder could place a different executable named “a” in the executable search path so that this copy runs instead.
Mitigating Denial of Service Attacks
Several basic practices can be used to mitigate denial of service attacks. First, the practice of limiting network port usage should be enforced. The more programs listening for communication requests, the more prone the system is to denial of service attacks. If possible, programs requiring unique ports for communication should be eliminated. Multiple port usage within the same system should be consolidated. All network listeners should be changed to detect denial of service attacks.
The netstat command, available on most operating system platforms, shows the extent of network port usage. Noting which service uses each network port and determining how useful that service is provides a good starting point for limiting port usage. If programs/services cannot be eliminated, port usage can be consolidated by introducing the concept of a single listener. With minimal code changes to the network service programs, this single listener can occupy one network port that is recognized enterprise-wide. Based on the connection request received, the listener would start the target service and pass it the network communication information. The target program would then connect back to the original destination machine using the information passed to it, and this connect-back request would also be serviced by the listener on the destination machine. This differs from other super-server designs (such as inetd) in that only one port is used on all machines.
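A minimal sketch of the single-listener idea follows. It assumes a hypothetical one-line header (`SERVICE <name>`) that each client sends first, and an in-process registry of handler functions; the design described above would instead start the target service as a separate program and pass it the connection details. The port number in the usage comment is arbitrary.

```python
import socket
import socketserver
from typing import Callable, Dict

# Hypothetical handler type: receives the client connection after routing.
Handler = Callable[[socket.socket], None]


def read_header(conn: socket.socket, limit: int = 128) -> bytes:
    """Read one newline-terminated header byte by byte so no payload is consumed."""
    buf = bytearray()
    while len(buf) < limit:
        byte = conn.recv(1)
        if not byte:
            break  # peer closed before sending a full header
        if byte == b"\n":
            return bytes(buf)
        buf += byte
    raise ValueError("header missing, truncated or too long")


def route(conn: socket.socket, registry: Dict[str, Handler]) -> bool:
    """Dispatch a connection to the service named in its 'SERVICE <name>' header."""
    verb, _, name = read_header(conn).decode("ascii", "replace").strip().partition(" ")
    handler = registry.get(name) if verb == "SERVICE" else None
    if handler is None:
        conn.sendall(b"ERR unknown service\n")
        return False
    conn.sendall(b"OK\n")
    handler(conn)
    return True


class _RoutingHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        try:
            route(self.request, self.server.registry)
        except (ValueError, OSError):
            pass  # malformed or dropped connection: close it quietly


class SingleListener(socketserver.ThreadingTCPServer):
    """One enterprise-wide port; every service is reached through it."""
    allow_reuse_address = True

    def __init__(self, address, registry: Dict[str, Handler]):
        self.registry = registry
        super().__init__(address, _RoutingHandler)


# Example (port 5000 and handle_db are hypothetical):
# SingleListener(("0.0.0.0", 5000), {"emsdb": handle_db}).serve_forever()
```

Reading the header one byte at a time is slower than buffered reads, but it guarantees that any payload after the header is left on the socket for the routed service.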
If elimination or consolidation is not feasible, a program that listens for network requests can be improved to detect denial of service attacks. For instance, adding timeouts on received data, checking special protocol handshake details for consistency, checking input buffer limits, and limiting request storms can make a listening program more robust.
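The hardening steps in the previous paragraph can be sketched as follows. The 8-byte frame (a 4-byte magic value `EMS1` followed by a 4-byte big-endian length), the 4,096-byte limit, the 5-second timeout and the rate-limit numbers are all illustrative assumptions, not part of any EMS protocol.

```python
import socket
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

MAGIC = b"EMS1"      # hypothetical handshake marker
MAX_REQUEST = 4096   # assumed input buffer limit, in bytes
READ_TIMEOUT = 5.0   # assumed seconds to wait on each read


def recv_exact(conn: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return bytes(buf)


def read_request(conn: socket.socket, limit: int = MAX_REQUEST,
                 timeout: float = READ_TIMEOUT) -> bytes:
    """Read one framed request, enforcing a timeout, a handshake check and a size limit."""
    conn.settimeout(timeout)      # a silent peer raises socket.timeout instead of blocking forever
    header = recv_exact(conn, 8)  # 4-byte magic + 4-byte big-endian length
    if header[:4] != MAGIC:
        raise ValueError("handshake mismatch")
    length = int.from_bytes(header[4:], "big")
    if length > limit:
        raise ValueError(f"request of {length} bytes exceeds {limit}-byte limit")
    return recv_exact(conn, length)


class RateLimiter:
    """Sliding window: at most `limit` requests per `window` seconds from each address."""

    def __init__(self, limit: int = 20, window: float = 1.0):
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, addr: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[addr]
        while hits and now - hits[0] > self.window:
            hits.popleft()  # forget requests older than the window
        if len(hits) >= self.limit:
            return False    # request storm: reject without reading any data
        hits.append(now)
        return True
```

Checking the declared length before allocating or reading the body is what protects a listener from the oversized-request case; the timeout covers the connect-and-send-nothing case.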
To further reduce the risk of denial of service attacks, public key and certificate exchanges can be implemented. To support X.509 certificates and public key exchanges, a public key infrastructure (PKI) can be created. Setting up a PKI is relatively straightforward and can be done with freely available open-source software such as OpenSSL. A PKI allows an organization to create and maintain customized certificates, generate unique public keys and operate a digital signing service. Certificates can be issued to users, programs and hosts in the enterprise. Think of a certificate as a certified identification mechanism, like a driver’s license. When programs are invoked or communication is started, certificates are exchanged between client and server; if the certificates do not pass validation checks, the intended operation is aborted. X.509 certificates can be extended to contain information related specifically to a machine, such as a serial number or unique machine identification. Certificates can also be used to restrict use to certain users or to certain times.
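As a sketch of certificate exchange between client and server, Python’s standard `ssl` module can require both sides to present certificates issued by the enterprise CA (mutual TLS). The file names and the allow-list of certificate common names are hypothetical; the CA, server and client certificates themselves could be produced with the `openssl` command-line tool.

```python
import ssl
from typing import Iterable, Optional


def server_context(certfile: Optional[str] = None, keyfile: Optional[str] = None,
                   cafile: Optional[str] = None) -> ssl.SSLContext:
    """Server side: reject any client lacking a certificate signed by the CA in cafile."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the client must present a certificate
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def client_context(certfile: Optional[str] = None, keyfile: Optional[str] = None,
                   cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client side: verify the server's certificate and host name, and present our own."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def peer_allowed(peercert: dict, allowed_names: Iterable[str]) -> bool:
    """Extra application check on the dict returned by SSLSocket.getpeercert()."""
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value in allowed_names:
                return True
    return False


# Typical server use (file names and the name "ems-client-01" are hypothetical):
# ctx = server_context("server.pem", "server.key", "enterprise-ca.pem")
# with ctx.wrap_socket(raw_sock, server_side=True) as tls:
#     if not peer_allowed(tls.getpeercert(), {"ems-client-01"}):
#         tls.close()  # abort the intended operation
```

If either certificate fails validation, the handshake inside `wrap_socket` raises `ssl.SSLError`, which is the point at which the intended operation is aborted.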
Public key exchange (asymmetric key agreement) is useful when two sides must agree on a single key value. All encryption/decryption algorithms need a key as input, and both sides must hold the same key to encrypt and decrypt the exchanged data. Public key exchange works as follows: two programs using the same designated input parameters each generate a public/private key pair. These designated parameters can be generated by the PKI. The two programs then exchange their public keys. Each side combines the newly obtained public key with its own private key to compute a secret code, and both sides arrive at the same secret code in this manner. This code can then be used as a key to encrypt and decrypt data. When programs are invoked and communication is started, this key agreement process is attempted. If key agreement fails (i.e., encryption/decryption does not work), the intended operation is aborted.
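The exchange described above matches Diffie-Hellman key agreement. The sketch below uses deliberately small toy parameters (the prime 2^127 - 1 and base 3, chosen only for illustration; real systems use standardized groups of 2048 bits or more, or elliptic curves), derives a 256-bit key from the shared secret with SHA-256, and adds a key-confirmation tag so a failed agreement can be detected and the operation aborted. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Toy public parameters (assumption, NOT secure): a 127-bit Mersenne prime and base 3.
P = 2**127 - 1
G = 3


def generate_keypair() -> Tuple[int, int]:
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    return private, pow(G, private, P)      # (private, public)


def derive_key(my_private: int, their_public: int) -> bytes:
    """Combine our private key with the peer's public key; both sides get the same result."""
    if not 1 < their_public < P - 1:
        raise ValueError("invalid public value from peer")
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes((P.bit_length() + 7) // 8, "big")).digest()


def confirmation_tag(key: bytes, context: bytes) -> bytes:
    """Each side sends this tag; a mismatch means key agreement failed."""
    return hmac.new(key, b"key-confirm|" + context, hashlib.sha256).digest()


def keys_agree(my_key: bytes, peer_tag: bytes, context: bytes) -> bool:
    return hmac.compare_digest(confirmation_tag(my_key, context), peer_tag)
```

Unauthenticated Diffie-Hellman is open to a man-in-the-middle, which is why the public values are normally bound to the certificates issued by the PKI.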
To detect and stop excessive CPU, memory and/or disk usage, a watchdog program can be implemented. If an incident is detected, the watchdog can abort the offending process and generate an alarm. In real-time systems this is especially critical, not only from a security viewpoint but also for detecting anomalies in system behavior.
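A minimal Unix-only watchdog sketch: the supervised process gets a CPU-time limit through `resource.setrlimit(resource.RLIMIT_CPU, ...)`, and the watchdog kills it if it exceeds a wall-clock budget; either event is logged as an alarm. The limits are illustrative assumptions, and memory could be capped the same way with `resource.RLIMIT_AS`. A production watchdog would more likely poll per-process statistics continuously rather than only set limits at start-up.

```python
import logging
import resource
import subprocess
import sys
from typing import Sequence


def _cpu_limit(seconds: int):
    def apply() -> None:
        # Runs in the child just before exec: SIGXCPU at the soft limit, SIGKILL at the hard limit.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds + 1))
    return apply


def supervise(argv: Sequence[str], cpu_seconds: int = 10, wall_seconds: float = 60.0) -> str:
    """Run argv under CPU and wall-clock budgets. Returns "ok", "failed" or "alarm"."""
    proc = subprocess.Popen(list(argv), preexec_fn=_cpu_limit(cpu_seconds))
    try:
        rc = proc.wait(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        logging.error("ALARM: %s exceeded %.1f s wall clock and was killed", argv[0], wall_seconds)
        return "alarm"
    if rc < 0:  # terminated by a signal, e.g. SIGXCPU/SIGKILL from the CPU limit
        logging.error("ALARM: %s terminated by signal %d", argv[0], -rc)
        return "alarm"
    return "ok" if rc == 0 else "failed"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(supervise([sys.executable, "-c", "pass"]))  # a well-behaved child
```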
Mitigating Information Theft
Mitigating information theft revolves around the principle of transformation. Whether data is stored in files, in a relational database or shipped over a network, it is exposed if kept in its original form. Data that is exposed can be stolen. The simplest way to protect data is to transform it from its original form into an unreadable form through the use of encryption/decryption algorithms. This transformation operation is typically performed when storing or transferring data.
Utility applications must meet critical performance benchmarks, and encryption/decryption can slow performance. Depending on the performance requirements, however, encryption/decryption can be implemented successfully, and some basics of cryptography can make the decision easier. A number of symmetric (single-key) encryption algorithms are available, but several issues must be considered. The first is key size: a bigger key means greater security but lower performance. The second is key regeneration frequency: the more often the key is changed, the greater the security and the lower the performance. The third is the type of algorithm: block cipher or stream cipher. Block ciphers break data into fixed-size blocks and encrypt each block. Stream ciphers generate a pseudorandom stream of bits and XOR it with the input data. (XOR stands for “exclusive OR”: if the two input bits differ, the result is 1; otherwise it is 0.) Block ciphers are more conservative and slower than stream ciphers.
The last issue is the cipher mode, which determines how the block/stream data is processed. Electronic code book (ECB) is the most basic mode and the least secure: each block is encrypted independently, so blocks can be processed in parallel, but identical plaintext blocks produce identical ciphertext blocks and reveal patterns in the data. Cipher block chaining (CBC) improves on ECB and is more secure. It XORs each plaintext block with the previous ciphertext block before encrypting it, so encryption cannot be done in parallel. Cipher feedback (CFB) and output feedback (OFB) convert a block cipher into a stream cipher. CFB is more secure than traditional stream ciphers. OFB generates its keystream independently of the data, so the keystream can be precomputed; this makes it fast but not as secure. The accompanying table lists the types, key sizes and modes available for accepted industry-standard ciphers. These choices allow performance and security to be mixed and matched.
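Python’s standard library does not include a block cipher such as AES, so the sketch below illustrates the ECB-versus-CBC difference with a toy 128-bit block cipher built as a four-round Feistel network whose round function is HMAC-SHA256. The toy cipher, its block size and its round count are assumptions made purely for illustration; real systems should use a vetted implementation of an industry-standard cipher. Encrypting two identical plaintext blocks shows the effect: ECB emits two identical ciphertext blocks, while CBC, which chains each block to the previous ciphertext block and a random initialization vector (IV), does not.

```python
import hashlib
import hmac
import secrets
from typing import List

BLOCK = 16  # toy block size in bytes (128 bits)
ROUNDS = 4


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: keyed HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2 :]
    for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


def _pad(data: bytes) -> bytes:  # PKCS#7 padding
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def _unpad(data: bytes) -> bytes:
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("bad padding")
    return data[:-n]


def _blocks(data: bytes) -> List[bytes]:
    return [data[i : i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Each block is encrypted on its own, so equal blocks give equal ciphertext.
    return b"".join(toy_encrypt_block(key, b) for b in _blocks(_pad(data)))


def ecb_decrypt(key: bytes, data: bytes) -> bytes:
    return _unpad(b"".join(toy_decrypt_block(key, b) for b in _blocks(data)))


def cbc_encrypt(key: bytes, data: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block (the IV for block 1).
    prev = secrets.token_bytes(BLOCK)
    out = [prev]
    for b in _blocks(_pad(data)):
        prev = toy_encrypt_block(key, _xor(b, prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, data: bytes) -> bytes:
    chunks = _blocks(data)  # chunks[0] is the IV
    plain = [_xor(toy_decrypt_block(key, c), p) for p, c in zip(chunks, chunks[1:])]
    return _unpad(b"".join(plain))
```

The chaining loop in `cbc_encrypt` is inherently sequential, which is the performance cost the paragraph attributes to CBC.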
Time synchronization over a network, remote distribution of files, remote command executions, scanned data, alarm notifications, and control/checkback information are just a small subset of information that needs to be protected.
Minimizing Intrusion Attacks
There are several ways to minimize intrusion attacks. One is to implement a standard command-level or application-level access/usage control mechanism, which requires passwords to be entered and authenticated before programs can be activated. Program activation can be tied to certificates, as mentioned earlier, and these certificates could be extended to contain user access and control information. For instance, running program “a” may require user “y” to be logged in and authenticated between times “x” and “z” before execution is authorized.
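The program/user/time rule in this example can be expressed as a small default-deny policy check. The policy table, the 06:00 to 18:00 window standing in for the example’s time range, and the function names are hypothetical; in the certificate-based design, the same fields would be carried in an extended certificate rather than a Python dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Rule:
    users: FrozenSet[str]
    start: time
    end: time


# Hypothetical policy: program "a" may run only for user "y" between 06:00 and 18:00.
POLICY: Dict[str, Rule] = {
    "a": Rule(users=frozenset({"y"}), start=time(6, 0), end=time(18, 0)),
}


def authorize(program: str, user: str, authenticated: bool, when: datetime,
              policy: Dict[str, Rule] = POLICY) -> bool:
    """Default deny: run only if a rule exists, the user is listed and the time fits."""
    rule = policy.get(program)
    if rule is None or not authenticated or user not in rule.users:
        return False
    now = when.time()
    if rule.start <= rule.end:
        return rule.start <= now <= rule.end
    return now >= rule.start or now <= rule.end  # window wraps past midnight
```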
Another way to prevent unauthorized use of resources is to digitally sign all executables and/or scripts. This implies the use of a process manager or process controller. Digital signing of binaries can be implemented as part of the PKI. The signing process works as follows: when the binary or script is created during the build process, a signing program reads all of the bytes in the file and computes a hash string (for example, MD5; SHA-256 is now preferred because MD5 is no longer considered collision-resistant). This hash string is then inserted into a certificate. On execution, the process controller recomputes the hash string and compares it to the contents of the associated sign file. For instance, controller “a” would load process “b” only if the contents of sign file “c” match the hash string generated for process “b.”
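The verification half of this scheme can be sketched as a process controller that recomputes a file’s digest and refuses to launch it on a mismatch. This sketch uses SHA-256 and reads the expected digest from a plain sign file; it assumes that sign file is itself protected. In the PKI design the digest would sit inside a signed certificate whose signature is checked first, a step omitted here. The function names and file names are hypothetical.

```python
import hashlib
import hmac
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def file_digest(path: str, chunk_size: int = 65536) -> str:
    """SHA-256 over every byte in the file, as a hex string."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    # Constant-time comparison of the recomputed and recorded digests.
    return hmac.compare_digest(file_digest(path).encode(), expected_hex.strip().lower().encode())


def launch_if_verified(argv: Sequence[str], sign_file: str) -> subprocess.Popen:
    """Start argv[0] only if its digest matches the one recorded in sign_file."""
    expected = Path(sign_file).read_text()
    if not verify(argv[0], expected):
        raise PermissionError(f"{argv[0]}: digest does not match {sign_file}; refusing to run")
    return subprocess.Popen(list(argv))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        prog = Path(d, "b.sh")
        prog.write_text("echo hello\n")
        sign = Path(d, "c.sig")
        sign.write_text(file_digest(str(prog)))     # done once, at build time
        print(verify(str(prog), sign.read_text()))  # True until b.sh is altered
```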
Several other practices can help prevent intrusion attacks. Privileged command usage should be limited, and programs in general should not execute as the system or root user. Passwords should be required for all privileged commands. All passwords should be stored using a one-way encryption (hashing) algorithm, so that they can never be decrypted. A password lifespan and password rules (e.g., at least eight characters long with at least one capital letter and one number) should be enforced. Privileged command executions should be logged and centrally audited. All login sessions should have inactivity timeout values and/or limits on the number of commands executed. Applications should always execute under a user context that does not allow login; this prevents accidental interruption of the application. Whenever possible, do not store username/password combinations in applications. Always disable unauthorized, invalid and expired accounts as soon as possible, and periodically review the rights assigned to user accounts in use.
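A sketch of one-way password storage and the example password rule follows. It uses PBKDF2-HMAC-SHA256 from Python’s `hashlib` with a random per-user salt, so identical passwords produce different stored values and the guessing attack described earlier must be repeated per account at a deliberate cost. The iteration count and the `scheme$iterations$salt$hash` storage format are assumptions to be tuned and chosen locally.

```python
import hashlib
import hmac
import re
import secrets

ITERATIONS = 200_000  # assumed work factor; raise it as hardware allows
SCHEME = "pbkdf2_sha256"


def hash_password(password: str) -> str:
    """One-way: store the salt and the derived key, never the password itself."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{SCHEME}${ITERATIONS}${salt.hex()}${derived.hex()}"


def check_password(password: str, stored: str) -> bool:
    """Re-derive with the stored salt and iteration count, then compare in constant time."""
    scheme, iterations, salt_hex, derived_hex = stored.split("$")
    if scheme != SCHEME:
        raise ValueError(f"unsupported scheme {scheme!r}")
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                  bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(derived.hex(), derived_hex)


def meets_rules(password: str) -> bool:
    """Example rule: at least eight characters, one capital letter and one number."""
    return (len(password) >= 8
            and re.search(r"[A-Z]", password) is not None
            and re.search(r"[0-9]", password) is not None)
```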
Implementing security practices in complicated distributed EMS/DMS systems involves three areas: authorization, authentication and administration. Authorization is usually provided as console modes or areas of responsibility within the system, and these privilege assignments are controlled by the application. However, data that is stored, transferred and acted upon outside the application must also be authorized.
Authentication allows a user to log in to the application. EMS/DMS systems require a strong password storage and retrieval mechanism because of the many users they support. Single sign-on applications can be used to centrally configure and maintain identity. Applications should not store passwords.
The most important area of security, however, is administration. In large organizations, administration techniques can vary from one group of systems to the next. Even under normal practices, audits, databases, applications, events, incidents, infrastructure maintenance and management, remote access, servers and users must all be administered. Adding security policies affects each of these areas and places more of a burden on the organization. For example, when a security incident is detected, a process may be put into place to notify the National Infrastructure Protection Center’s (NIPC) Indications, Analysis and Warning (IAW) program for electric power. Frustrating users with complicated password rules, or adding encryption/decryption methods that make software problems impossible to debug, does not improve efficiency. For security practices to be effective and efficient, their overall benefits must outweigh the burdens they impose, so a balance between sound practices and operating efficiency must be struck. Today, the benefits of good security practices can far outweigh the burdens.
Sincerbox holds a bachelor’s degree from New Mexico State and a master’s degree in Software Engineering from the University of Houston at Clear Lake. He has worked in all software aspects of energy management systems for the last 19 years. This experience includes designing and implementing software security enhancements for existing production EMS systems.