Paper 981107 TTP -- A New Approach to Solving the Interoperability
H. Kopetz Technical University of Vienna, Austria
T. Thurner DaimlerChrysler Research, Stuttgart, Germany
Copyright © 1998 Society of Automotive Engineers
ABSTRACT
This paper investigates the problem of interoperability of ECUs in a distributed control system consisting of a set of ECUs connected by a serial communication channel. If the application tasks executing within an ECU depend on the temporal properties of the data delivered by the communication channel, then a precise specification of the temporal interface properties is a prerequisite for interoperability. Such a precise interface specification can be developed if the communication system is time-triggered and the points in time when the information is updated at the receivers are known a priori. In the second part of the paper the concept of a "temporal firewall" is introduced to precisely specify the input and output interfaces of a node in the value domain and the time domain. Examples of temporal firewalls in a distributed vehicle control system are given.

1. Introduction
The steadily improving price/performance ratio, reliability, and functionality of microelectronic devices are leading to the deployment of more and more computer-based electronic control units (ECUs) in vehicles. At present, more than 50 computer-based ECUs can be found in top-of-the-line automobiles. In the first applications, the microcontrollers were hidden within the ECUs, leaving the external conventional interfaces unchanged. It soon became apparent that new functionality can be realized and additional costs can be saved if the ECUs are connected by serial digital communication links. Because of the unique automotive communication requirements concerning speed, production cost, functionality, and dependability, special communication protocols for the automotive market have been developed. The best known examples of these protocols are CAN (Controller Area Network) and J 1850 (SAE 1995).
These protocols provide a standardized protocol architecture and more or less compatible physical line interfaces such that ECUs designed by different manufacturers can exchange data via the serial communication link. However, the ability to talk to each other is not sufficient for the interoperability of real-time ECUs (SAE 1995a), since the precise point in time when a message is delivered is not contained in the interface specification of event-triggered systems. It is the objective of this paper to investigate the interoperability problem in integrated automotive control systems and to demonstrate how a time-triggered protocol helps in achieving this interoperability.
The rest of the paper is organized as follows: Section 2 analyzes the interoperability problem in the data domain and time domain and demonstrates the fundamental limits of event-triggered protocols in achieving interoperability in hard real-time systems. Section 3 explains the basic concepts of time-triggered protocols, outlines the structure of a time-triggered system, and discusses the communication network interface (CNI) as a stable well-specified interface between the communication system and the host computer within an ECU. Section 4 introduces the concept of temporal firewalls between independently developed ECUs and shows how the CNI of a time-triggered system can be designed to realize such a temporal firewall to support the independent development and error containment in a large distributed automotive system.

2. The Interoperability Problem
Problem Definition - A future onboard distributed electronic system may consist of a set of electronic control units (ECUs) that are interconnected by a serial bus (see Figure 1).
Figure 1: Example of a distributed system consisting of 7 ECUs.
In order to accomplish system functions that cannot be realized on a single ECU, e.g., the tight coordination of the engine, the steering, and the brakes in the four wheels, the ECUs exchange messages via the communication system. Figure 1 depicts an example of a distributed vehicle control system consisting of 7 ECUs. In some architectures, the ECU that implements a particular subsystem, e.g., the braking system in Figure 1, can be designed as a gateway ECU that interfaces to a dedicated brake network as shown in Figure 2 (Hedenetz and Belschner 1998). An ECU contains a microcomputer with its local memory, a communication controller (CC) and a process interface (I/O interface) to connect to sensors and transducers in the periphery. The interface between the microcomputer and the communication controller is called the communication network interface (CNI). The time interval between the offering of a message at the sender's CNI and the delivery of the message at the receiver's CNI is called the transmission time of a message. The variability of this delay, i.e., the interval dmax - dmin, where dmax is the maximum transmission time and dmin is the minimum transmission time, is called the jitter of the communication system. The jitter of the communication system is an important parameter for determining the interoperability of ECUs that exchange time-critical messages.
Figure 2: Example of a Brake by Wire System
A message carries a statement about attributes of significant state variables (e.g., speed, torque) at a particular point in real-time. A significant state variable is called an RT entity. A message is an atomic unit consisting of three parts: (i) the name of the RT entity, (ii) the observed value of the RT entity v(t), and (iii) the time t of observation of the RT entity. Only the value of the RT entity must be explicitly carried in the message. If the message does not contain the observation time, it is often assumed that the time of arrival of the message at the receiver is in a fixed relation to the observation time. This is only true if the jitter of the communication service is sufficiently small. If there is no global notion of time available in a distributed control system, a jitter $\Delta t$ of the transmission time introduces an additional measurement error of the size
$$\Delta v = \dot{v}(t) \, \Delta t$$
where $\dot{v}(t)$ denotes the time variability (rate of change) of the measured value v. It depends on the characteristics of the given application whether this additional measurement error, introduced by the jitter of the communication system, is of concern. A communication service is predictable and timely if the relevant service parameters, such as worst-case transmission time and jitter, are known a priori and are in agreement with the application requirements. It is the application that determines what amount of delay and jitter can be tolerated. Control applications are sensitive to a large delay (increase of the dead time) and even more sensitive to jitter. Interoperability of ECUs depends on the realization of a predictable and timely communication service for the exchange of messages between the ECUs, such that the intended system function can be realized under all operational conditions.
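As a rough illustration of the measurement error discussed above, the following sketch multiplies the rate of change of a measured value by the transmission jitter, which is the first-order estimate of the extra error a receiver sees when it cannot tell how long a message was delayed. The function name and the numbers (a speed signal changing at 10 m/s per second, 1 ms of jitter) are assumptions chosen for illustration and do not come from the paper.

```python
def jitter_error(rate_of_change, jitter):
    """First-order estimate of the extra measurement error caused by jitter.

    rate_of_change: how fast the observed value changes per second
    jitter:         uncertainty of the transmission time in seconds
    """
    return abs(rate_of_change) * jitter


# Hypothetical example: vehicle speed changing at 10 (m/s) per second,
# transmission jitter of 1 ms.
extra_error = jitter_error(rate_of_change=10.0, jitter=0.001)
print(f"additional speed error: about {extra_error:.3f} m/s")
```

Whether an error of this size matters depends on the application, which is why the tolerable jitter has to be part of the interface specification.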
Data Domain - Communicating ECUs require a priori agreement on the syntax and semantics of the messages that are exchanged. The syntax specifies the structure of the messages and the size of the message fields. The semantics determine how the contents of the fields are interpreted by the sender and receiver. For example, if a message contains a name field that is supposed to denote the name of the RT entity whose value is contained in the message, then sender and receiver must agree on the precise meaning of the name. The same is true for the representation of the value or the time.
Time Domain - If the communication jitter is not restricted by the application, i.e., if it does not matter to the application whether a message arrives within a very short time interval or takes a long time until it is delivered, then the agreement on the syntax and semantics of the messages that are exchanged is sufficient to establish interoperability. However, in distributed control applications, communicating ECUs require a guarantee on the maximum jitter of the communication system. There is a fundamental timing problem when a mutually exclusive communication channel is shared among a set of ECUs. At any particular point in time, only one ECU can send a message, while all others have to wait. If there is an access conflict, the media access logic of the communication protocol determines who is allowed to send at a particular point in time and who must wait.
Limits of Event-Triggered Protocols - Different protocols use differing techniques to resolve the access conflicts on the shared communication channel. In the CAN (CAN 1990) protocol the message priority, derived from the message name, determines which message will be sent first. In the LON (LON 1990) protocol, an access conflict is resolved by probabilistic techniques, whereas in a time-triggered protocol, such as TTP (Kopetz and Gruensteidl 1993), the current point in time determines which ECU is allowed to send. Whereas in the event-triggered protocols CAN and LON the occurrences of events in the host computer (the execution of the "send message command" in the host software) determine when the communication system should try to send a message, the progression of time determines when a message will be sent in the time-triggered protocol TTP. In an event-triggered system, where the host computers operate asynchronously at their own pace, it is not possible to control a priori the phase relationships between the execution of the "send message commands" in the different ECUs of Figure 1. Assume that every ECU of Figure 1 sends a sporadic message with a minimum interarrival time of 1 msec. Assume further that all messages are of the same size, and it takes 100 µsec to transmit a message. We define a critical instant (Liu and Layland 1973) as a point in time when all ECUs connected to the channel intend to send a message simultaneously. Since the occurrence of a critical instant, where all 7 ECUs execute the "send message command" at the same time, cannot be ruled out in an event-triggered system, the worst case jitter, in the above example, is 600 µsec.
The actual transmission time in the above example will vary between the minimum of 100 µsec (channel idle) and the maximum of 700 µsec (critical instant) depending dynamically on the load generated by the hosts. In a CAN system, because of the priority-based non-preemptive access control, the jitter of the highest priority message is bounded by the longest message transmission interval. If an ECU continuously sends highest priority messages (e.g., because of a babbling idiot error in the application software of a host computer), then all other ECUs will not be able to access the channel at all. Thus a number of assumptions about ECU behavior must be made to determine the worst-case jitter of non-highest priority CAN messages. For a complete analysis of the worst case jitter in a CAN based system the reader is referred to (Tindell 1995). We cannot imagine an event-triggered protocol that performs better than CAN, because the conflict resolution in CAN takes no time at all. Other event-triggered protocols, like the aforementioned LON, take some extra time for the conflict resolution and thus produce a jitter that is worse than the jitter of the CAN protocol. If no global notion of time is available, the receiver does not even know how long a message has been waiting before it was transmitted. If such a global notion of time is available in an event-triggered system, then the message can contain the point of time of observation in the data field and the receiver knows how "old" the data is when it is received. In this case the receiver can perform a state estimation to reduce the measurement error caused by the jitter. To summarize, in an event-triggered system without a global notion of time, the actual transmission delay will vary between the minimum protocol execution time and some large maximum (occurring very infrequently) determined by the ECU behaviors at the critical instant.
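The arithmetic of the critical-instant example can be sketched as follows. The sketch assumes, as the example does, that all messages have the same length, that each ECU has at most one message pending, and that the message under consideration is the last one to win the channel; the function name is hypothetical. Times are in whatever unit the single-message transmission time is given in.

```python
def transmission_time_bounds(num_ecus, message_time):
    """Best case, worst case and jitter on a shared, mutually exclusive channel.

    Best case:  the channel is idle, so the message goes out immediately.
    Worst case: a critical instant, where all ECUs want to send at once and
                the message in question is transmitted last.
    """
    d_min = message_time
    d_max = num_ecus * message_time
    return d_min, d_max, d_max - d_min


# Values from the example: 7 ECUs, 100 time units per message.
d_min, d_max, jitter = transmission_time_bounds(num_ecus=7, message_time=100)
print(d_min, d_max, jitter)  # 100 700 600
```

The jitter grows linearly with the number of ECUs that can compete for the channel, which is why it depends on the whole ensemble and not only on the sender and the receiver.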
This jitter, which can be a multiple of the actual message transmission time, cannot be predicted and is not even known to the receivers. Applications that are sensitive to the jitter will fail occasionally whenever the jitter is above the application specific threshold. Since the jitter depends not only on the behavior of the sender ECU and the receiver ECU, but on the behavior of all other ECUs in the ensemble, temporal interoperability is not achievable in pure event-triggered systems.

3. The Time-Triggered Solution
Principles of Operation - In a time-triggered system the sending of messages is controlled by the scheduling table of the communication controller (in TTP this scheduling table is called the message descriptor list, MEDL) that contains the information as to which ECU is allowed to access the channel at what point in time. This information, generated a priori (i.e., before run time), is static common knowledge to all ECUs in the system. Using this common knowledge, a host can synchronize the reading of an analog value from the controlled object with this a priori known send time. Since the transmission schedule is free of conflicts (by design), the transmission time to the receiver is constant and known. The uncontrollable jitter in a time-triggered system is reduced to the precision of the global time. In TTP, this precision is in the order of microseconds. In a time-triggered system there is no "send message command" to execute in the host computer. The host computer stores the data it intends to transmit into the provided shared memory of the CNI, knowing a priori at what future point in time the message will be transmitted. The communication system operates autonomously and deterministically without any control command from the host computers.
The temporal properties of the CNI, i.e., the points in time when a particular message will be sent and when a particular message will arrive, are precisely specified and will not change dynamically depending on the system load. This precise specification of the temporal properties of the CNI is the basis for the interoperability of time-triggered systems. If the application software of a host is validated with respect to its local CNI, then it will also operate predictably in a distributed system, since the system integration does not have an effect on the temporal properties of the local CNIs.
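The idea of a static, conflict-free transmission schedule can be illustrated with a small sketch. It is not the actual MEDL format of TTP; the `Slot` type, the ECU names, the slot lengths and the round length are all assumptions made for illustration. The point is only that send times follow from the table and the global time, independent of the load generated by the hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sender: str    # name of the ECU that owns this slot (hypothetical names)
    offset: int    # start of the slot within a round, in microseconds
    duration: int  # length of the slot, in microseconds


def send_time(schedule, round_length, sender, round_index):
    """Global time at which `sender` transmits in the given round."""
    for slot in schedule:
        if slot.sender == sender:
            return round_index * round_length + slot.offset
    raise KeyError(sender)


def sender_at(schedule, round_length, t):
    """ECU that owns the channel at global time t, or None in a gap."""
    phase = t % round_length
    for slot in schedule:
        if slot.offset <= phase < slot.offset + slot.duration:
            return slot.sender
    return None


# Hypothetical three-ECU round of 300 microseconds.
schedule = [Slot("engine", 0, 100), Slot("brake", 100, 100), Slot("steering", 200, 100)]
print(send_time(schedule, 300, "brake", round_index=2))  # 700
print(sender_at(schedule, 300, 750))                     # brake
```

Because every ECU holds the same table, a receiver can compute when the next update from a given sender will arrive without exchanging any control information.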
Structure of an ECU - Figure 3 depicts the structure of an ECU in a time-triggered system. Such an ECU consists of three major subsystems, the host computer, the time-triggered communication controller, and the process I/O subsystem to interface with the signals of the sensors and actuators. These three subsystems are connected by two interfaces: the communication network interface (CNI) between the host computer and the communication controller, and the controlled object interface (COI) between the host computer and the process I/O subsystem. The time-triggered communication controller contains a scheduling table in its local memory that determines at what points in time a particular message is sent or is expected to arrive. In principle, the communication system will operate autonomously and deterministically even if the host computer is not present.
Services of the Time-Triggered Protocol: The Time-Triggered Protocol (TTP) is an example of a communication protocol for a time-triggered system. This protocol has been developed at the Technical University of Vienna with support from Daimler Benz Research in Stuttgart. A consortium of major European automotive companies has proposed to base the emerging safety-critical class C automotive systems on TTP. TTP is a TDMA (time-division multiple access) based protocol organized into a set of rounds. Each ECU is allowed to send a message in each round. TTP provides the following services across the CNI:
(i)
(ii) Distributed fault-tolerant clock synchronization.
(iii) Consistent membership service informing each ECU about which ECUs have been operational in the last TDMA round.
(iv) Consistent and immediate mode change service.
(v) Support of the reconfiguration of ECUs in a fault-tolerant system.
(vi) Support of multiplexed ECUs to increase the bandwidth utilization.
At present a TTP VLSI controller is under development. This controller will support line speeds of 512 kbit/sec, 1 Mbit/sec, and 2 Mbit/sec. With the fastest speed, a typical TDMA round is expected to have a duration that is significantly shorter than 1 msec.
The Communication Network Interface (CNI): The most important interface in a time-triggered architecture is the communication network interface (CNI) between the communication controller and the host processor. ECUs communicate by the exchange of state messages across the CNI. A state message can be viewed as a distributed state variable that always contains the most recent version of the real-time data. A new version of a state message overwrites the previous version; there is no queuing of messages in the CNI. The points in time when an incoming state message is updated or when an outgoing state message will be transmitted, as well as the duration of the transmission, are known a priori to all communicating partners. The CNI is implemented in a dual-ported RAM (DPRAM) where the TTP controller writes/reads the data from one side and the host CPU from the other side. In addition to the state data, the CNI contains a set of status and control words to inform the ECU about the operation of the protocol. There is only one interrupt line from the TTP controller to the host processor, used to signal the ticks of the global time and to inform the processor about the occurrence of significant events, such as a mode change or a transmission error. The interrupt system is under the control of the host, i.e., each interrupt can be individually enabled or disabled under host program control. It is not possible for the host to interfere with the operation of the controller because there is no physical control line from the host to the controller.

4. Temporal Firewalls
On the conceptual level, the CNI between the host computer and the communication network (or the controlled object) can be seen as erecting two unidirectional temporal firewalls (Kopetz and Nossal 1997) that connect the ECU to its environment.
A temporal firewall is a unidirectional data-sharing interface with state-data semantics where at least one of the interfacing subsystems accesses the temporal firewall according to an a priori known schedule and where at all points in time the information contained in the temporal firewall is temporally accurate for at least dacc time units into the future. The subsystem that accesses the temporal firewall according to the a priori known schedule is called the time-triggered (TT) subsystem. No control signal is crossing the temporal firewall. The information provider has to update the RT image in the temporal firewall according to the dynamics of the corresponding RT entity. If the information-providing subsystem ceases to operate, the information in the temporal firewall will be invalidated by the passage of time.
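Two of the properties described above, state-data semantics (a new version overwrites the old one, nothing is queued) and invalidation by the passage of time, can be sketched in a few lines. This is a deliberately simplified model and not the CNI implementation: the class name is hypothetical, and the `validity` parameter is an assumed stand-in for the temporal-accuracy bound, counted here from the time of the most recent observation.

```python
class TemporalFirewallSketch:
    """Simplified model of a state-message buffer with time-based validity."""

    def __init__(self, validity):
        self.validity = validity  # assumed validity interval after an observation
        self._value = None
        self._t_obs = None

    def write(self, value, t_obs):
        # State semantics: the new version overwrites the old one, no queue.
        self._value = value
        self._t_obs = t_obs

    def read(self, now):
        # No control signal tells the reader that the producer has stopped;
        # the data simply becomes invalid once it is too old.
        if self._t_obs is None or now > self._t_obs + self.validity:
            return None
        return self._value


fw = TemporalFirewallSketch(validity=10)
fw.write(50, t_obs=0)
fw.write(52, t_obs=5)   # overwrites 50
print(fw.read(now=12))  # 52, still within the validity interval
print(fw.read(now=16))  # None, the producer has been silent for too long
```

A consumer built this way never blocks on the producer and never receives a backlog of old values, which matches the unidirectional, control-free character of the interface.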
Stable Properties of Temporal Firewalls The following stable properties characterize a temporal firewall. Knowledge about these properties is available a priori to all interfacing subsystems: (i) The addresses (names) and the syntactic structure of the data items in the temporal firewall. The meaning of the data items is associated with these names. (ii) The points on the global time base when the data items in the temporal firewall are accessed by the TT subsystem. This information enables the avoidance of race conditions between the producer and the consumer. A race condition could lead to a loss of replica determinism in replicated temporal firewalls. (iii) The temporal accuracy dacc of the data items in the temporal firewall (Kopetz 1997). This knowledge is important to guide the information consumer about the minimum rate of sampling the temporal firewall. The absolute timepoints when the TT subsystem accesses the temporal firewall are reference points for the temporal accuracy of the information in the temporal firewall. Some of this knowledge is stored in the personalized Message Descriptor List (MEDL) of each communication controller.
Obligations of the Subsystems To ensure the proper information flow across a temporal firewall, the producer and the consumer subsystem must comply with the following obligations.
Producer: The producer of the RT images stored in the temporal firewall is responsible for ensuring that the a priori guaranteed temporal accuracy of the RT images is always maintained. It must update the state information with such a frequency that the guaranteed temporal accuracy is sustained even immediately before the point of update. In case the producer of the information is the TT subsystem, the producer is allowed to access the temporal firewall only at the a priori established time points t to avoid race conditions for access to the temporal firewall. In case the producer of the information is not the TT subsystem, the producer is allowed to access the temporal firewall at any point in time outside a critical interval around t. This critical interval is [t-2g, t+2g], where t is the access time of the TT subsystem and g is the granularity of the global time (Kopetz 1997a).
Consumer: Based on the a priori knowledge about the temporal accuracy of the RT images in the temporal firewall, the consumer must sample the information in the temporal firewall with a sampling rate that ensures that the accessed information is temporally accurate at the time it is used. The consumer is only allowed to access the information in the temporal firewall when it knows (based on the a priori knowledge) that the producer is not accessing it (see above). If the consumer violates these access constraints, replica determinism may be lost, or, in the worst case, the consumed information may be corrupted. (The implementation of protected shared objects can avoid information corruption, but cannot guarantee replica determinism.)
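The access rule for the subsystem that is not time-triggered can be expressed as a simple check against the a priori known access times of the TT subsystem. The sketch below assumes integer timestamps on the global time base and treats the critical interval [t-2g, t+2g] as closed; the function names and the example numbers are hypothetical.

```python
def in_critical_interval(now, t, g):
    """True if `now` lies in [t - 2g, t + 2g] around a TT access time t."""
    return t - 2 * g <= now <= t + 2 * g


def may_access(now, tt_access_times, g):
    """True if a non-TT subsystem may touch the firewall at time `now`."""
    return not any(in_critical_interval(now, t, g) for t in tt_access_times)


# Hypothetical schedule: the TT subsystem accesses the firewall at 110 and 210,
# and the granularity of the global time is 2 units.
tt_times = [110, 210]
print(may_access(100, tt_times, g=2))  # True, well before the first access
print(may_access(107, tt_times, g=2))  # False, inside [106, 114]
print(may_access(115, tt_times, g=2))  # True, just after the interval
```

Since the access times are fixed before run time, this check needs no communication with the other side of the firewall, only the local view of the global time.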
Temporal Firewalls in the Validation Process: A temporal firewall is a small and stable interface that provides understandable abstractions of the relevant properties of the interfacing subsystems. Conceptually, the RT images in the temporal firewall are closely related to the image presented by a sensor of an analog RT entity in the environment. Temporal firewalls are thus based on an accustomed view of the world. A temporal firewall is a rigid interface with a priori known stable attributes. This rigidity is the strength and the weakness of the temporal firewall concept at the same time. It is a strength, because a precisely defined stable interface induces structure into an architecture. Since this structure is time-invariant it can be relied on when building the system and when reasoning about the properties of the system. System validation and error confinement are facilitated if subsystems are encapsulated within a rigid structure. On the other hand, rigid internal interfaces limit the flexibility and the adaptation of the architecture to a changing request pattern from the environment. In some cases, these stable internal interfaces can be the cause for a waste of system resources.
Table 1: Data elements in the input firewall and output firewall of the system depicted in Figure 1.
Preconditions and Postconditions. Assume an ECU that is encapsulated between two temporal firewalls. These two firewalls form the only interfaces of this ECU to its environment. The first firewall, the input firewall, delivers RT images from the environment of this ECU into the ECU under consideration, and the second one, the output firewall, delivers the results from this ECU to the rest of the system. The stable properties of the input firewall form important preconditions for the validation of the ECU under consideration. Many assumptions about the environment are contained in the specification of this input firewall. Since the information flow across a temporal firewall is unidirectional, there is no dependence of the producer subsystem at the producing side of the input firewall on the proper operation of the ECU under consideration. The stable properties of the output firewall form important postconditions of the validation. In the validation process it must be demonstrated that the postconditions, given in the output firewall specification, are always TRUE, provided the preconditions associated with the input firewall hold. Temporal firewalls partition a large distributed real-time system into a set of nearly autonomous subsystems with fully specified interfaces in the temporal domain and in the value domain. Each one of these subsystems can be developed and tested independently from the other subsystems. This systematic decomposition during the design phase and the ensuing constructive composition during the validation and integration phase facilitate the ECU-based development of large distributed real-time systems. The implementation of a synchronous time-triggered communication system, a means to implement the temporal firewalls, facilitates the formal reasoning about the relevant properties of distributed real-time architectures (Rushby 1997).
Error Containment Interface. A temporal firewall is free of control signals.
Therefore there is no possibility of a control-error propagation across a temporal firewall. Since the information flow across a temporal firewall is unidirectional, a data error can only propagate from the producer to the consumer. There cannot be any error propagation from the consumer back to the producer. Thus a temporal firewall acts as an effective error-containment interface. It encapsulates an ECU and restricts the visibility of its internal mechanisms. The static temporal properties of the temporal firewall ensure that the temporal obligations of the partners of a client-server interaction are enforced (Kopetz 1996). Changes made inside an ECU do not affect the static properties of the temporal firewall. These changes are encapsulated within the ECU and cannot ripple through the total architecture.

5. Temporal Firewall Example
In Table 1 we have identified some of the key data elements that are contained in the input firewalls and output firewalls of the distributed vehicle control system depicted in Figure 1. These data elements must be periodically updated by the information provider. The update periods are determined by the application specific temporal accuracy requirements of these real-time data elements. Note that a number of the data elements, e.g., the status information about the vehicle, are used by a number of different ECUs. To realize a consistent system behavior, all ECUs must operate on the same version of these data elements. The a priori knowledge as to when a new version overwrites a previous version (this is part of the definition of the temporal firewall) and the awareness of each ECU about the current global time are sufficient to avoid race conditions and to guarantee a consistent system behavior.

6. Conclusion
The interoperability of electronic control units in a distributed real-time system requires an a priori known precise specification of the ECU interfaces, both in the value domain and in the temporal domain.
If this temporal behavior of the interfaces is not fully specified and is subject to change during system integration, then the behavior of tasks within an ECU that depend on the temporal properties of the data delivered across the interface can deviate from the expected behavior and can cause transient system failures that are difficult to reproduce. In this paper we have introduced the concept of a temporal firewall as a fully specified interface of an ECU in a distributed real-time application. Since the temporal properties of the temporal firewalls are not affected by the system integration, the interoperability (or composability as it is called in other communities) of the independently developed ECUs can be guaranteed by design.

ACKNOWLEDGMENTS
This work has been supported in part by the Brite Euram Project X-by-Wire, and by the ESPRIT LTR project DEVA.

REFERENCES
CAN (1990). Controller Area Network CAN, an In-Vehicle Serial Communication Protocol. SAE Handbook 1992. SAE Press. pp. 20.341-20.355.
Hedenetz, B. and Belschner, R. (1998). "Brake by Wire" without Mechanical Backup by Using a TTP Communication Network. SAE World Congress, Detroit, Michigan. SAE Press, Warrendale, PA, USA.
Kopetz, H. (1996). A Node as a Real-Time Object. Proc. of the IEEE Workshop on Object Oriented Real-Time Systems, Laguna Beach, Cal. IEEE Press. pp. 1-8.
Kopetz, H. (1997). Component Based Design of Large Distributed Real-Time Systems. Proceedings of the Workshop on Distributed Computer Control Systems (DCCS), Seoul, Korea. IFAC. pp. 171-177.
Kopetz, H. (1997a). Real-Time Systems, Design Principles for Distributed Embedded Applications; ISBN: 0-7923-9894-7. Boston. Kluwer Academic Publishers.
Kopetz, H. and G. Gruensteidl (1993). TTP - A Time-Triggered Protocol for Fault-Tolerant Real-Time Systems. Proc. 23rd IEEE International Symposium on Fault-Tolerant Computing (FTCS-23), Toulouse, France. IEEE Press. pp. 524-532.
Kopetz, H. and R. Nossal (1997). Temporal Firewalls in Large Distributed Real-Time Systems. Proceedings of IEEE Workshop on Future Trends in Distributed Computing, Tunis, Tunisia. IEEE Press. pp.
Liu, C. L. and J. W. Layland (1973). Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. J. of the ACM. Vol. 20. pp. 46-61.
LON (1990). LON Protocol Overview. Echelon Systems Corporation, 727 University Avenue, Los Gatos, California.
Rushby, J. (1997). Systematic Formal Verification for Fault-Tolerant Time-Triggered Architectures. Proc. DCCA 6, Garmisch, Germany. IEEE Press. (Preprints) 191-210.
SAE (1995). Class C Application Requirements, Survey of Known Protocols J20056. SAE Handbook. SAE Press, Warrendale, PA. pp. 23.437-23.461.
SAE (1995a). Multiplexing Meeting. Minutes of the SAE Multiplexing Committee, March 1995.
Tindell, K. (1995). Analysis of Hard Real-Time Communications. Real-Time Systems. Vol. 9. pp. 147-171.