
Paper Server : Real-Time Systems Group




78 documents found

35/2012 : Worst-case execution time analysis-driven object cache design
Benedikt Huber, Wolfgang Puffitsch, Martin Schoeberl
Concurrency and Computation: Practice and Experience

Abstract: Hard real-time systems need a time-predictable computing platform to enable static worst-case execution time (WCET) analysis. All performance-enhancing features need to be WCET analyzable. However, standard data caches containing heap-allocated data are very hard to analyze statically. In this paper we explore a new object cache design, which is driven by the capabilities of static WCET analysis. Simulations of standard benchmarks estimating the expected average case performance usually drive computer architecture design. The design decisions derived from this methodology do not necessarily result in a WCET analysis-friendly design. Aiming for a time-predictable design, we therefore propose to employ WCET analysis techniques for the design space exploration of processor architectures. We evaluated different object cache configurations using static analysis techniques. The number of field accesses that can be statically classified as hits is considerable. The analyzed number of cache miss cycles is 3–46% of the access cycles needed without a cache, which agrees with trends obtained using simulations. Standard data caches perform comparably well in the average case, but accesses to heap data result in overly pessimistic WCET estimations. We therefore believe that an early architecture exploration by means of static timing analysis techniques helps to identify configurations suitable for hard real-time systems.


34/2012 : Data Cache Organization for Accurate Timing Analysis
Martin Schoeberl, Benedikt Huber, Wolfgang Puffitsch
Real-Time Systems

Abstract: Caches are essential to bridge the gap between the high latency main memory and the fast processor pipeline. Standard processor architectures implement two first-level caches to avoid a structural hazard in the pipeline: an instruction cache and a data cache. For tight worst-case execution times it is important to classify memory accesses as either cache hit or cache miss. The addresses of instruction fetches are known statically and static cache hit/miss classification is possible for the instruction cache. The access to data that is cached in the data cache is harder to predict statically. Several different data areas, such as stack, global data, and heap allocated data, share the same cache. Some addresses are known statically, other addresses are only known at runtime. With a standard cache organization all those different data areas must be considered by worst-case execution time analysis. In this paper we propose to split the data cache for the different data areas. Data cache analysis can be performed individually for the different areas. Access to an unknown address in the heap does not destroy the abstract cache state for other data areas. Furthermore, we propose to use a small, highly associative cache for the heap area. We designed and implemented a static analysis for this cache, and integrated it into a worst-case execution time analysis tool.
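The effect of splitting the data cache on static hit/miss classification can be illustrated with a minimal sketch. It assumes a fully associative LRU cache and a textbook-style "must" analysis, where the abstract state maps each address to an upper bound on its LRU age. This is not the analysis from the paper; the function names and the address label `static_x` are hypothetical.

```python
def access_known(state, addr, ways):
    """Must-analysis update for an access to a statically known address.

    state maps address -> upper bound on its LRU age (0 = most recent).
    """
    old_age = state.get(addr, ways)  # absent: treat as older than any cached line
    updated = {}
    for other, age in state.items():
        if other == addr:
            continue
        # Only lines younger than the accessed one can age.
        new_age = age + 1 if age < old_age else age
        if new_age < ways:
            updated[other] = new_age
    updated[addr] = 0
    return updated


def access_unknown(state, ways):
    """Access to an unknown address: every tracked line may age by one."""
    return {a: age + 1 for a, age in state.items() if age + 1 < ways}


def always_hit(state, addr):
    """An access is a guaranteed hit only if the address is in the must state."""
    return addr in state


WAYS = 2  # assumed associativity, for illustration only

# Unified cache: two heap accesses with unknown addresses evict the static line
# from the abstract state.
unified = access_known({}, "static_x", WAYS)
unified = access_unknown(unified, WAYS)
unified = access_unknown(unified, WAYS)
unified_hit = always_hit(unified, "static_x")

# Split cache: the same heap accesses only touch the heap cache state.
split_static = access_known({}, "static_x", WAYS)
split_heap = access_unknown(access_unknown({}, WAYS), WAYS)
split_hit = always_hit(split_static, "static_x")
```

In this sketch the unified cache can no longer classify the second access to `static_x` as a hit, while the split organization still can, because the unknown heap accesses never reach the abstract state of the static-data cache.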


8/2011 : Worst-case execution time analysis driven object cache design
Benedikt Huber, Wolfgang Puffitsch, Martin Schoeberl
Concurrency and Computation: Practice and Experience

Abstract: Hard real-time systems need a time-predictable computing platform to enable static worst-case execution time (WCET) analysis. All performance-enhancing features need to be WCET analyzable. However, standard data caches containing heap allocated data are very hard to analyze statically. In this paper we explore a new object cache design, which is driven by the capabilities of static WCET analysis. Simulations of standard benchmarks estimating the expected average case performance usually drive computer architecture design. The design decisions derived from this methodology do not necessarily result in a WCET analysis-friendly design. Aiming for a time-predictable design, we therefore propose to employ WCET analysis techniques for the design space exploration of processor architectures. We evaluated different object cache configurations using static analysis techniques. The number of field accesses that can be statically classified as hits is considerable. The analyzed number of cache miss cycles is 3%–46% of the access cycles needed without a cache, which agrees with trends obtained using simulations. Standard data caches perform comparably well in the average case, but accesses to heap data result in overly pessimistic WCET estimations. We therefore believe that an early architecture exploration by means of static timing analysis techniques helps to identify configurations suitable for hard real-time systems.

Get ocwcet_jnl.pdf (304.9492KB)


38/2010 : Towards a Time-predictable Dual-Issue Microprocessor: The Patmos Approach
Martin Schoeberl, Pascal Schleuniger, Wolfgang Puffitsch, Florian Brandner, Christian W. Probst, Sven Karlsson, Tommy Thorn
First Workshop on Bringing Theory to Practice: Predictability and Performance in Embedded Systems (PPES 2011)

Abstract: Current processors are optimized for average case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase the average case performance are hard to model for WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average case performance. Patmos is a dual-issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average case performance.


29/2010 : Worst-Case Analysis of Heap Allocations
Wolfgang Puffitsch, Benedikt Huber, Martin Schoeberl
Lecture Notes in Computer Science

Abstract: In object oriented languages, dynamic memory allocation is a fundamental concept. When using such a language in hard real-time systems, it becomes important to bound both the worst-case execution time and the worst-case memory consumption. In this paper, we present an analysis to determine the worst-case heap allocations of tasks. The analysis builds upon techniques that are well established for worst-case execution time analysis. The difference is that the cost function is not the execution time of instructions in clock cycles, but the allocation in bytes. In contrast to worst-case execution time analysis, worst-case heap allocation analysis is not processor dependent. However, the cost function depends on the object layout of the runtime system. The analysis is evaluated with several real-time benchmarks to establish the usefulness of the analysis, and to compare the memory consumption of different object layouts.

Get wcmem.pdf (322.1768KB)
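The idea of replacing the cycle cost with an allocation cost in bytes can be sketched as a longest-path computation over an acyclic control-flow graph. This is only an illustration under simplifying assumptions (no loops, one allocation per basic block, 4-byte words); it is not the analysis implemented by the authors, and all names, block labels, and header sizes below are hypothetical.

```python
from functools import lru_cache


def object_bytes(n_fields, header_bytes, word_bytes=4):
    """Size of one object under an assumed layout: header plus one word per field."""
    return header_bytes + n_fields * word_bytes


def worst_case_allocation(cfg, alloc_bytes, entry):
    """Maximum number of bytes allocated on any path starting at `entry`.

    cfg maps a basic block to its successors and must be acyclic;
    alloc_bytes maps a basic block to the bytes it allocates.
    """
    @lru_cache(maxsize=None)
    def worst_from(block):
        tail = max((worst_from(s) for s in cfg.get(block, [])), default=0)
        return alloc_bytes.get(block, 0) + tail

    return worst_from(entry)


# Hypothetical task: a branch where each arm allocates a different object.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
fields = {"entry": 2, "then": 14, "else": 4}  # fields of the object allocated per block


def analyze(header_bytes):
    alloc = {b: object_bytes(n, header_bytes) for b, n in fields.items()}
    return worst_case_allocation(cfg, alloc, "entry")


bound_large_header = analyze(header_bytes=8)  # 16 + 64 bytes on the "then" path
bound_small_header = analyze(header_bytes=4)  # 12 + 60 bytes on the "then" path
```

The two results differ only because of the assumed header size, which mirrors the statement that the cost function depends on the object layout of the runtime system rather than on the processor.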


20/2010 : A real-time Java chip-multiprocessor
Christof Pitter, Martin Schoeberl
ACM Transactions on Embedded Computing Systems (TECS)

Abstract: Chip-multiprocessors are an emerging trend for embedded systems. In this article, we introduce a real-time Java multiprocessor called JopCMP. It is a symmetric shared-memory multiprocessor, and consists of up to eight Java Optimized Processor (JOP) cores, an arbitration control device, and a shared memory. All components are interconnected via a system on chip bus. The arbiter synchronizes the access of multiple CPUs to the shared main memory. In this article, three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed priority, a fair-based, and a time-sliced arbiter. Tasks running on different CPUs of a chip-multiprocessor (CMP) influence each other's execution times when accessing a shared memory. Therefore, the system needs an arbiter that is able to limit the worst-case execution time of a task running on a CPU, even though tasks executing simultaneously on other CPUs access the main memory. Our research shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. The timing analysis of tasks, executing on the CMP using time-sliced memory arbitration, leads to viable worst-case execution time bounds. The time-sliced arbiter divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration scheme allows for a calculation of upper bounds of Java application worst-case execution times, depending on the number of CPUs, the time slot size, and the memory access time. Examples of worst-case execution time calculation are presented, and the analyzed results of a real-world application task are compared to measured execution time results. Finally, we evaluate the tradeoffs when using a time-predictable solution compared to using average-case optimized chip-multiprocessors, applying three different benchmarks. These experiments are carried out by executing the programs on the CMP prototype.

Get jopcmp_tecs.pdf (281.8252KB; Preprint)
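How a time-sliced arbiter makes the memory access time boundable can be shown with a small sketch. The model is a simplification assumed here, not the formula from the article: every CPU owns one slot of equal length per period, an access must start and finish inside the owner's slot, and a request that no longer fits waits for the start of the next own slot. The function and parameter names are hypothetical.

```python
def worst_case_access_cycles(n_cpus, slot_cycles, access_cycles):
    """Worst-case latency (waiting plus access) of one memory access, in cycles."""
    if access_cycles > slot_cycles:
        raise ValueError("an access must fit into one slot")
    period = n_cpus * slot_cycles
    worst = 0
    # CPU 0 owns offsets [0, slot_cycles) of each period; by symmetry the
    # bound is the same for every other CPU.
    for arrival in range(period):
        if arrival + access_cycles <= slot_cycles:
            wait = 0                 # still fits into the current own slot
        else:
            wait = period - arrival  # wait for the start of the next own slot
        worst = max(worst, wait + access_cycles)
    return worst


def task_wcet_bound(compute_cycles, n_accesses, n_cpus, slot_cycles, access_cycles):
    """Pessimistic bound: every memory access is charged the worst-case latency."""
    per_access = worst_case_access_cycles(n_cpus, slot_cycles, access_cycles)
    return compute_cycles + n_accesses * per_access


# Example with assumed numbers: 4 CPUs, 6-cycle slots, 6-cycle memory accesses.
per_access = worst_case_access_cycles(n_cpus=4, slot_cycles=6, access_cycles=6)
bound = task_wcet_bound(1000, 10, n_cpus=4, slot_cycles=6, access_cycles=6)
```

Under this model the bound depends only on the number of CPUs, the slot size, and the memory access time, and not on what the other CPUs are doing. Charging every access the worst-case latency is deliberately pessimistic; an analysis that tracks the position within the arbitration period could give tighter results.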


15/2010 : WCET Driven Design Space Exploration of an Object Cache
Benedikt Huber, Wolfgang Puffitsch, Martin Schoeberl
JTRES’10 August 19–21, 2010 Prague, Czech Republic

Abstract: In order to guarantee that real-time systems meet their timing specification, static execution time bounds need to be calculated. Not considering execution time predictability led to architectures which perform well in the average case, but require very pessimistic assumptions when bounding the worst-case execution time (WCET). Computer architecture design is driven by simulations of standard benchmarks estimating the expected average case performance. The design decisions derived from this design methodology do not necessarily result in a WCET analysis-friendly design. Aiming for a time-predictable computer architecture, we propose to employ WCET analysis techniques for the design space exploration of processor architectures. We exemplify this approach by a WCET driven design of a cache for heap allocated objects. Depending on the main memory properties (latency and bandwidth), different cache organizations result in the lowest WCET. The evaluation reveals that for certain cache configurations, the analyzed hit rate is comparable to the average case hit rate obtained by measurements. We believe that an early architecture exploration by means of static timing analysis techniques helps to identify configurations suitable for hard real-time systems.

Get jtres2010.pdf (694.8691KB)


10/2010 : Scheduling of Hard Real-Time Garbage Collection
Martin Schoeberl
Real-Time Systems

Abstract: Automatic memory management or garbage collection greatly simplifies development of large systems. However, garbage collection is usually not used in real-time systems due to the unpredictable temporal behavior of current implementations of a garbage collector. In this paper we propose a real-time garbage collector that can be scheduled like a normal real-time thread with a deadline monotonic assigned priority. We provide an upper bound for the collector period so that the application threads will never run out of memory. Furthermore, we show that the restricted execution model of the Safety Critical Java standard simplifies root scanning and reduces copying of static data. Our proposal has been implemented and evaluated in the context of the Java processor JOP.

Get hrtsgc.pdf (414.5332KB)


2/2010 : Worst-case execution time analysis for a Java processor
Martin Schoeberl, Wolfgang Puffitsch, Rasmus Pedersen, Benedikt Huber
Software: Practice and Experience

Abstract: In this paper, we propose a solution for a worst-case execution time (WCET) analyzable Java system: a combination of a time predictable Java processor and a tool that performs WCET analysis at Java bytecode level. We present a Java processor, called JOP, designed for time-predictable execution of real-time tasks. The execution time of bytecodes, the instructions of the Java virtual machine, is known cycle accurately for JOP. Therefore, JOP simplifies the low-level WCET analysis. A method cache, which fills whole Java methods into the cache, simplifies cache analysis. The WCET analysis tool is based on integer linear programming. The tool performs the low-level analysis at the bytecode level and integrates the method cache analysis. An integrated data-flow analysis performs receiver type analysis for dynamic method dispatches and loop bound analysis. Furthermore, a model checking approach to WCET analysis is presented where the method cache can be exactly simulated. The combination of the time-predictable Java processor and the WCET analysis tool is evaluated with standard WCET benchmarks and three real-time applications. The WCET friendly architecture of JOP and the integrated method cache analysis yield tight WCET bounds. Comparing the exact, but expensive, model checking based analysis of the method cache with the static approach demonstrates that the static approximation of the method cache is sufficiently tight for practical purposes.

Get wcetana.pdf (601.0771KB)


1/2010 : RTTM: Real-time transactional memory
Florian Brandner, Jan Vitek, Martin Schoeberl
Proceedings of the 25th ACM Symposium on Applied Computing

Abstract: Hardware transactional memory is a promising synchronization technology for chip-multiprocessors. It simplifies programming of concurrent applications and allows for higher concurrency than lock based synchronization. Standard transactional memory is optimized for average case throughput, but for real-time systems we are interested in worst-case execution times. We propose real-time transactional memory (RTTM) as a time-predictable synchronization solution for chip-multiprocessors in real-time systems. We define the hardware for time-predictable transactions and provide a bound for the maximum transaction retries. The proposed RTTM is evaluated with a simulation of a Java chip-multiprocessor.

Get rttm_final.pdf (212.9453KB)


80/2009 : Single-Path Programming on a Chip-Multiprocessor System
Martin Schoeberl, Peter Puschner
Workshop on Reconciling Performance with Predictability (RePP 2009)

Abstract: In this paper we explore a time-predictable chip-multiprocessor (CMP) system based on single-path programming. To keep the timing constant, even in the case of shared memory access for the CMP cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit.

Get spcmp_repp_final.pdf (109.7861KB)


76/2009 : Fun with a Deadline Instruction
Martin Schoeberl, Hiren D. Patel, Edward A. Lee

Abstract: In this paper we present example applications using a deadline instruction. The deadline instruction brings cycle accurate timing information into the application code. We have implemented the mechanism in a time-predictable Java chip-multiprocessor. As a proof of the accuracy that can be gained, a digital to analog conversion of audio signals is implemented completely in software. Furthermore, we show how the deadline instruction can be used to verify bytecode execution times on chip-multiprocessors and how to synchronize tasks to a time-division based memory arbiter.

Get deadline.pdf (203.3379KB)


75/2009 : Multiprocessor JOP Documentation
Martin Schoeberl, Wolfgang Puffitsch
JEOPARD Deliverable D 2.4

Abstract: This document contains the deliverable D2.4 Multiprocessor JOP Documentation of work-package 2 of the JEOPARD project, due 20 months after project start as stated in the Description of Work. This document presents the user documentation of the JEOPARD JOP multi-core platform as described in deliverable D2.1.

Get d2.4.pdf (513.8096KB)


65/2009 : A Hardware Abstraction Layer in Java
Martin Schoeberl, Stephan Korsholm, Tomas Kalibera, Anders P. Ravn
Trans. on Embedded Computing Sys.

Abstract: Embedded systems use specialized hardware devices to interact with their environment, and since they have to be dependable, it is attractive to use a modern, type-safe programming language like Java to develop programs for them. Standard Java, as a platform independent language, delegates access to devices, direct memory access, and interrupt handling to some underlying operating system or kernel, but in the embedded systems domain resources are scarce and a Java virtual machine (JVM) without an underlying middleware is an attractive architecture. The contribution of this paper is a proposal for Java packages with hardware objects and interrupt handlers that interface to such a JVM. We provide implementations of the proposal directly in hardware, as extensions of standard interpreters, and finally with an operating system middleware. The latter solution is mainly seen as a migration path allowing Java programs to coexist with legacy system components. An important aspect of the proposal is that it is compatible with the Real-Time Specification for Java (RTSJ).

Get jhal.pdf (453.7256KB)


63/2009 : Educational Case Studies with an Open Source Embedded Real-Time Java Processor
Rasmus Pedersen, Martin Schoeberl
Proceedings of the Workshop on Embedded Systems Education (WESE 2009)

Abstract: In this paper we show a platform which allows for education and training of a number of essential embedded skills. The Java optimized processor (JOP) is open source and has been used in several educational and training sessions and we cover how each setting has trained a special skill set. The experience covers basics from undergraduate education to Ph.D. level education. At each level different properties of the system are emphasized. Our emphasis on the interdisciplinarity of embedded systems education is based on referenced research findings. This way we provide empirical findings and couple them with academic frameworks.

Get jopedu.pdf (384.3477KB)


62/2009 : A Disruptive Computer Design Idea: Architectures with Repeatable Timing
Stephen A. Edwards, Sungjun Kim, Edward A. Lee, Isaac Liu, Hiren D. Patel, Martin Schoeberl
Proceedings of IEEE International Conference on Computer Design (ICCD 2009)

Abstract: This paper argues that repeatable timing is more important and more achievable than predictable timing. It describes microarchitecture approaches to pipelining and memory hierarchy that deliver repeatable timing and promise comparable or better performance compared to established techniques. Specifically, threads are interleaved in a pipeline to eliminate pipeline hazards, and a hierarchical memory architecture is outlined that hides memory latencies.

Get pret_iccd.pdf (154.6924KB)


61/2009 : Towards Time-predictable Data Caches for Chip-Multiprocessors
Martin Schoeberl, Wolfgang Puffitsch, Benedikt Huber
Proceedings of the Seventh IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS 2009)

Abstract: Future embedded systems are expected to use chip-multiprocessors to provide the execution power for increasingly demanding applications. Multiprocessors increase the pressure on the memory bandwidth and processor local caching is mandatory. However, data caches are known to be very hard to integrate into the worst-case execution time (WCET) analysis. We tackle this issue from the computer architecture side: provide a data cache organization that enables tight WCET analysis. Similar to the cache splitting between instruction and data, we argue to split the data cache for different data areas. In this paper we show cache simulation results for the split-cache organization, propose the modularization of the data cache analysis for the different data areas, and evaluate the implementation costs in a prototype chip-multiprocessor system.

Get dcache_seus.pdf (149.6953KB)


60/2009 : A Single-Path Chip-Multiprocessor System
Martin Schoeberl, Peter Puschner
Proceedings of the Seventh IFIP Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS 2009)

Abstract: In this paper we explore the combination of a time-predictable chip-multiprocessor system with the single-path programming paradigm. Time-sliced arbitration of the main memory access provides time-predictable memory load and store instructions. Single-path programming avoids control flow dependent timing variations. To keep the execution time of tasks constant, even in the case of shared memory access of several processor cores, the tasks on the cores are synchronized with the time-sliced memory arbitration unit.

Get spcmp_seus.pdf (127.5439KB)


59/2009 : JOP Reference Handbook
Martin Schoeberl
CreateSpace

Abstract: This book is about JOP, the Java Optimized Processor. JOP is an implementation of the Java virtual machine (JVM) in hardware. The main implementation platform is a field programmable gate array (FPGA). JOP began as a research project for a PhD thesis. In the meantime, JOP has been used in several industrial applications and as a research platform. JOP is a time-predictable processor for hard real-time systems implemented in Java. JOP is open-source under the GNU GPL and has a growing user base. This book is written for all of you who build this lively community. For a long time the PhD thesis, some research papers, and the web site have been the main documentation for JOP. A PhD thesis focuses on research results, and implementation details are usually omitted. This book complements the thesis and provides insight into the implementation of JOP and the accompanying JVM. Furthermore, it gives you an idea of how to build an embedded real-time system based on JOP.


57/2009 : Using Hardware Methods to Improve Time-predictable Performance in Real-time Java Systems
Jack Whitham, Neil Audsley, Martin Schoeberl
Proceedings of the 7th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2009)

Abstract: This paper describes hardware methods, a lightweight and platform-independent scheme for linking real-time Java code to co-processors implemented using a hardware description language (HDL). Intended for use in embedded systems, hardware methods have similar semantics to the native methods used to interface Java code to legacy C/C++ software, but are also time-predictable, facilitating accurate worst-case execution time (WCET) analysis. By reference to several examples, the paper demonstrates the applicability of hardware methods and shows that they can (1) reduce the WCET of embedded real-time Java, and (2) improve the quality of WCET estimates in the presence of infeasible paths.

Get hwmethods.pdf (287.8213KB)


56/2009 : Cross-profiling for Java processors
Walter Binder, Martin Schoeberl, Philippe Moret, Alex Villazon
Software: Practice and Experience

Abstract: Performance evaluation of embedded software is essential in an early development phase so as to ensure that the software will run on the embedded device's limited computing resources. Prevailing approaches either require the deployment of the software on the embedded target, which can be tedious and may be impossible in an early development phase, or rely on simulation, which can be very slow. In this article, we introduce a customizable cross-profiling framework for embedded Java processors, including processors featuring a method cache. The developer profiles the embedded software in the host environment, completely decoupled from the target system, on any standard Java virtual machine, but the generated profiles represent the execution time metric of the target system. Our cross-profiling framework is based on bytecode instrumentation. We identify several pointcuts in the execution of bytecode that need to be instrumented in order to estimate the CPU cycle consumption on the target system. An evaluation using the JOP embedded Java processor as target confirms that our approach reconciles high profile accuracy with moderate overhead. Our cross-profiling framework also enables the performance evaluation of new processor architectures before they are implemented. As a case study, we explore the performance impact of various processor design choices and optimizations, such as different cache sizes or pipeline organizations, and come up with an improved processor design that yields speedups of up to 40% on standard Java benchmarks.

Get cprof_spe.pdf (293.0625KB)


54/2009 : A Real-Time Java Chip-Multiprocessor
Christof Pitter, Martin Schoeberl
ACM Transactions on Embedded Computing Systems (TECS)

Abstract: Chip-multiprocessors are an emerging trend for embedded systems. In this article, we introduce a real-time Java multiprocessor called JopCMP. It is a symmetric shared-memory multiprocessor, and consists of up to eight Java Optimized Processor (JOP) cores, an arbitration control device, and a shared memory. All components are interconnected via a system on chip bus. The arbiter synchronizes the access of multiple CPUs to the shared main memory. In this article, three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed priority, a fair-based, and a time-sliced arbiter. Tasks running on different CPUs of a chip-multiprocessor (CMP) influence each other's execution times when accessing a shared memory. Therefore, the system needs an arbiter that is able to limit the worst-case execution time of a task running on a CPU, even though tasks executing simultaneously on other CPUs access the main memory. Our research shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. The timing analysis of tasks, executing on the CMP using time-sliced memory arbitration, leads to viable worst-case execution time bounds. The time-sliced arbiter divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration scheme allows for a calculation of upper bounds of Java application worst-case execution times, depending on the number of CPUs, the time slot size, and the memory access time. Examples of worst-case execution time calculation are presented, and the analyzed results of a real-world application task are compared to measured execution time results. Finally, we evaluate the tradeoffs when using a time-predictable solution compared to using average-case optimized chip-multiprocessors, applying three different benchmarks. These experiments are carried out by executing the programs on the CMP prototype.

Get jopcmp_tecs.pdf (281.8252KB; Preprint)


52/2009 : Analyzing Performance and Dynamic Behavior of Embedded Java Software with Calling-Context Cross-Profiling
Philippe Moret, Walter Binder, Martin Schoeberl, Alex Villazon, Danilo Ansaloni
Proceedings of the 7th International Conference on the Principles and Practice of Programming in Java (PPPJ 2009)

Abstract: Prevailing approaches to analyze embedded software performance either require the deployment of the software on the embedded target, which can be tedious and may be impossible in an early development phase, or rely on simulation, which can be extremely slow. We promote cross-profiling as an alternative approach, which is particularly well suited for embedded Java processors. The embedded software is profiled in any standard Java Virtual Machine in a host environment, but the generated cross-profile estimates the execution time on the target. We implemented our approach in the customizable cross-profiler CProf, which generates calling-context cross-profiles. Each calling-context stores dynamic metrics, such as the estimated CPU cycle consumption on the target. We visualize the generated calling-context cross-profiles as ring charts, where callee methods are represented in segments surrounding the caller’s segment. As the size of each segment corresponds to the relative CPU consumption of the corresponding calling-context, the visualization eases the location of performance bottlenecks in embedded Java software, revealing hot methods, as well as their callers and callees, at one glance.

Get pppj09-cprof-final.pdf (98.0479KB)


50/2009 : Design Space Exploration for Java Processors with Cross-Profiling
Martin Schoeberl, Walter Binder, Philippe Moret, Alex Villazon
Proceedings of the 6th International Conference on the Quantitative Evaluation of SysTems (QEST 2009)

Abstract: Most processors are used in embedded systems, where the processor architectures are diverse due to optimizations for different application domains. The main challenge for embedded system processors is the right balance between performance and chip size, which directly relates to cost. An early estimation of the performance for a new design is of paramount importance. In this paper we propose cross-profiling for that performance estimation, which can be accomplished very early in the design phase. We evaluate our approach in the context of a Java processor for embedded systems using cross-profiling on a standard desktop Java virtual machine. We explore the performance impact of various processor design choices and optimizations, such as different cache strategies or pipeline organizations, and come up with an improved processor design that yields speedups of up to 40% on standard Java benchmarks. Comparing the generated cross-profiles with the execution of benchmarks in real hardware confirms that our approach is sound.

Get profarch_qest2009.pdf (142.0234KB)


48/2009 : Comparison of Implicit Path Enumeration and Model Checking based WCET Analysis
Benedikt Huber, Martin Schoeberl
Proceedings of the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis

Abstract: In this paper, we present our new worst-case execution time (WCET) analysis tool for Java processors, supporting both implicit path enumeration (IPET) and model checking based execution time estimation. Even though model checking is significantly more expensive than IPET, it simplifies accurate modeling of pipelines and caches. Experimental results using the UPPAAL model checker indicate that model checking is fast enough for typical tasks in embedded applications, though large loop bounds may lead to long analysis times. To obtain a tool which is able to cope with larger applications, we recommend to use model checking for more important code fragments, and combine it with the IPET approach.

Get wcetmc_wcet2009.pdf (279.6602KB)


46/2009 : Is Chip-Multiprocessing the End of Real-Time Scheduling?
Martin Schoeberl, Peter Puschner
Proceedings of the 9th International Workshop on Worst-Case Execution Time (WCET) Analysis

Abstract: Chip-multiprocessing is considered the future path for performance enhancements in computer architecture. Eight processor cores on a single chip are state of the art, and several hundreds of cores on a single die are expected in the near future. General purpose computing is facing the challenge of how to use the many cores. However, in embedded real-time systems thread-level parallelism is naturally used. In this paper we assume a system where we can dedicate a single core for each thread. In that case classic real-time scheduling disappears. However, the threads, running on their dedicated core, still compete for a shared resource, the main memory. A time-sliced memory arbiter is used to avoid timing influences between threads. The schedule of the arbiter is integrated into the worst-case execution time (WCET) analysis. The WCET results are used as a feedback to regenerate the arbiter schedule. Therefore, we schedule memory access instead of CPU time.

Get cmpwcet.pdf (157.1904KB)


39/2009 : Nonblocking Real-Time Garbage Collection
Martin Schoeberl, Wolfgang Puffitsch
ACM Transactions on Embedded Computing Systems

Abstract: A real-time garbage collector has to fulfill two basic properties: ensure that programs with bounded allocation rates do not run out of memory and provide short blocking times. Even for incremental garbage collectors, two major sources of blocking exist, namely root scanning and heap compaction. Finding root nodes of an object graph is an integral part of tracing garbage collectors and cannot be circumvented. Heap compaction is necessary to avoid probably unbounded heap fragmentation, which in turn would lead to unacceptably high memory consumption. In this paper, we propose solutions to both issues. Thread stacks are local to a thread, and root scanning therefore only needs to be atomic with respect to the thread whose stack is scanned. This fact can be utilized by either blocking only the thread whose stack is scanned, or by delegating the responsibility for root scanning to the application threads. The latter solution eliminates blocking due to root scanning completely. The impact of this solution on the execution time of a garbage collector is shown for two different variants of such a root scanning algorithm. During heap compaction, objects are copied. Copying is usually performed atomically to avoid interference with application threads, which could render the state of an object inconsistent. Copying of large objects and especially large arrays introduces long blocking times that are unacceptable for real-time systems. In this paper an interruptible copy unit is presented that implements non-blocking object copy. The unit can be interrupted after a single word move. We evaluate a real-time garbage collector that uses the proposed techniques on a Java processor. With this garbage collector, it is possible to run high priority hard real-time tasks at 10 kHz parallel to the garbage collection task on a 100 MHz system.

Get nbgc.pdf (292.7764KB)


38/2009 : Multiprocessor JOP Architecture Design
Martin Schoeberl, Wolfgang Puffitsch, Christof Pitter, Andy Wellings
JEOPARD Deliverable D 2.1

Abstract: Work-package 2 of the JEOPARD project is devoted to the design and evaluation of hardware support for Java based CMP systems. In this document the architecture design of the JOP CMP system is described. To the best of our knowledge, the presented system is the first time-predictable CMP system that is analyzable and supported by a worst-case execution time (WCET) analysis tool.

Get d2_1_final.pdf (1459.9512KB)


29/2009 : Time-predictable Computer Architecture
Martin Schoeberl
EURASIP Journal on Embedded Systems

Abstract: Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET). Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated by implementing some of its features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.

Get ca4rts.pdf (353.7920KB)


20/2009 : Java for Safety-Critical Applications
Thomas Henties, James J. Hunt, Doug Locke, Kelvin Nilsen, Martin Schoeberl, Jan Vitek
2nd International Workshop on the Certification of Safety-Critical Software Controlled Systems (SafeCert 2009)

Abstract: In recent years, various approaches to real-time execution of Java have proven their worth in numerous commercial and defense applications. The Real-time Specification for Java has extended the Java platform with a range of features needed for real-time computing. As the use of real-time Java has become more widespread, the demand for Java in real-time applications with safety requirements has led to an effort to define a new standard: JSR-302 Safety-Critical Java (SCJ). The goal of this standard is to facilitate the creation of safety-critical Java applications capable of certification under standards such as DO-178B level A or IEC 61508 for SIL 4. JSR-302 is nearing completion and will soon be released for public review. This paper introduces some of the primary goals, challenges, and proposed solutions for safety-critical Java and its relationship with the Real-time Specification for Java.

Get safecert2009_final.pdf (675.8955KB)


19/2009 : Towards Transactional Memory for Real-Time Systems
Martin Schoeberl, Bent Thomsen, Lone Leth Thomsen

Abstract: In this paper, we explore a new synchronization paradigm for real-time systems: transactional memory. Transactional memory is considered a solution for parallel programs on shared-memory chip multiprocessors. It simplifies the programming model and increases the average case throughput. However, in real-time systems we are interested in the worst-case execution time. In this paper we propose formulas to bound the maximum number of transaction retries. Furthermore, we propose a possible hardware implementation in the context of a Java processor and show first results in a multiprocessor simulation.

Get rr-2009-019.pdf (155.3223KB)


7/2009 : Embedded JIT Compilation with CACAO on YARI
Florian Brandner, Tommy Thorn, Martin Schoeberl
Proceedings of the 12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2009)

Abstract: Java is one of the most popular programming languages for the development of portable workstation and server applications available today. Because of its clean design and type safety, it is also becoming attractive in the domain of embedded systems. Unfortunately, the dynamic features of the language and its rich class library cause considerable overhead in terms of runtime and memory consumption. Efficient techniques to implement Java Virtual Machines (JVMs) that are suitable for use in resource-constrained environments are thus needed. In this work we present a solution for very restricted environments based on CACAO. CACAO is a just-in-time (JIT) compiling JVM implementation, combining high speed and small size. We have modified the original JVM to run without an underlying operating system within only 1 MB of memory. In addition we present a new technique to selectively precompile methods during the initialization phase of real-time Java applications to prevent unwanted interaction between the JIT compilation and critical tasks. Furthermore we present the YARI soft-core as the execution platform of CACAO within an FPGA. We compare our implementation with two well known Java processors, JOP and Sun's picoJava-II, on the same FPGA technology. Although JOP achieves a higher clock frequency and picoJava-II occupies nearly 4 times the resources of YARI, our solution is able to outperform both of them by a factor of up to 2.8 and 2.2 respectively.

Get isorc-09-final.pdf (180.0537KB)


6/2009 : Thread-local Scope Caching for Real-time Java
Andy Wellings, Martin Schoeberl
Proceedings of the 12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2009)

Abstract: There is increasing convergence between the fields of parallel and embedded computing. The demand for more functionality in embedded devices means that complex multicore architectures will be used. In order to promote scalability and obtain predictability, on-chip processor-local private memory subsystems will be used. Whilst at the hardware level this is technically feasible, the more pressing problem is how such memory is presented to the programmer and how its local access is policed. In this paper we illustrate how Java augmented by the Real-time Specification for Java can be used to present the abstraction of a thread-local scoped memory area. We show how to restrict access to the memory area to a single real-time thread. We implement the model on the JOP multiprocessor system and report on our experiences.

Get local_scopes_final.pdf (315.8223KB)


5/2009 : Time-predictable Cache Organization
Martin Schoeberl
Proceedings of the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009)

Abstract: Caches are a mandatory feature of current processors to deliver instructions and data to a fast processor pipeline. However, standard cache organizations are designed to increase the average case performance. They are hard to model for worst-case execution time (WCET) analysis. Unknown abstract cache states during the analysis result in conservative WCET bounds. Therefore, we propose to adapt the cache organization to simplify the analysis. The data cache is split into several independent caches for the stack, static data, constants, and heap allocated data.

Get tpcache_final.pdf (88.1201KB)


72/2008 : Comparison of ILP and Model Checking based WCET Analysis
Benedikt Huber, Martin Schoeberl

Get wcetmctr.pdf (206.2617KB)


45/2008 : JOP: A Java Optimized Processor for Embedded Real-Time Systems
Martin Schoeberl
VDM Verlag


39/2008 : Non-blocking Root Scanning for Real-Time Garbage Collection
Wolfgang Puffitsch, Martin Schoeberl
Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008)

Abstract: Root scanning is a well known source of blocking times due to garbage collection. In this paper, we show that root scanning only needs to be atomic with respect to the thread whose stack is scanned. We propose two solutions to utilize this fact: (a) block only the thread whose stack is scanned, or (b) shift the responsibility for root scanning from the garbage collector to the application threads. The latter solution eliminates blocking due to root scanning completely. Furthermore, we show that a snapshot-at-beginning write barrier is sufficient to ensure the consistency of the root set even if local root sets are scanned independently of each other. The impact of solution (b) on the execution time of a garbage collector is shown for two different variants of the root scanning algorithm. Finally, we evaluate the resulting real-time garbage collector in a real system to confirm our theoretical findings.

Get nbrs.pdf (156.9834KB)


38/2008 : Non-blocking Object Copy for Real-Time Garbage Collection
Martin Schoeberl, Wolfgang Puffitsch
Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008)

Abstract: A real-time garbage collector has to fulfill two conflicting properties: avoid heap fragmentation and provide short blocking time. The heap needs to be compacted to avoid probably unbounded fragmentation. During compaction all objects are copied; copying is usually performed atomically to avoid interference with mutator threads. Copying of large objects and especially large arrays introduces long blocking times that are unacceptable for real-time systems. In this paper an interruptible copy unit is presented that implements non-blocking object copy. The unit intercepts object and array field access and redirects the access either to the source or destination part of the moving object. The unit can be interrupted after a single word move. The resulting maximum blocking time is the time for a memory word read and write. We have implemented the proposed non-blocking copy unit in the Java processor JOP and are able to run high priority real-time tasks at 10 kHz parallel to the garbage collection task on a 100 MHz system.

Get gchwcp.pdf (230.2871KB)
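
Editorial illustration for the entry above: the redirection of field accesses during an interruptible copy can be sketched as a small software simulation. This is not the hardware design from the paper. The class name, the flat word-addressed memory list, and the rule that offsets below the copy position are served from the destination are assumptions made for this sketch.

```python
class CopyUnit:
    """Software model of an interruptible object copy (hypothetical sketch)."""

    def __init__(self, memory, src, dst, length):
        self.memory = memory    # flat list of words standing in for the heap
        self.src = src          # start address of the old object location
        self.dst = dst          # start address of the new object location
        self.length = length    # object size in words
        self.copied = 0         # number of words already moved

    def step(self):
        """Move a single word; return True while words remain to be copied."""
        if self.copied < self.length:
            self.memory[self.dst + self.copied] = self.memory[self.src + self.copied]
            self.copied += 1
        return self.copied < self.length

    def _translate(self, offset):
        # Words already moved live in the destination, the rest in the source.
        if not 0 <= offset < self.length:
            raise IndexError("offset outside the object")
        base = self.dst if offset < self.copied else self.src
        return base + offset

    def read(self, offset):
        return self.memory[self._translate(offset)]

    def write(self, offset, value):
        self.memory[self._translate(offset)] = value
```

In this model the copy can pause after any `step()` call, and reads and writes issued between steps still see a consistent object, because each access is routed to whichever copy currently holds that word. A write to a word that has not been moved yet lands in the source and is carried over by a later step.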


37/2008 : On Composable System Timing, Task Timing, and WCET Analysis
Peter Puschner, Martin Schoeberl
8th International Workshop on Worst-Case Execution Time (WCET) Analysis

Abstract: The complexity of hardware and software architectures used in today's embedded systems makes a hierarchical, composable timing analysis impossible. This paper describes the source of this complexity in terms of mechanisms and side effects that determine variations in the timing of single tasks and entire applications. Based on these observations, the paper proposes strategies to reduce the complexity. It shows the positive effects of these strategies on the timing of tasks and on WCET analysis.

Get wcet2008.pdf (102.4111KB)


35/2008 : Embedded JIT Compilation with CACAO on YARI
Florian Brandner, Tommy Thorn, Martin Schoeberl

Abstract: Java is one of the most popular programming languages for the development of portable workstation and server applications available today. Because of its clean design and type safety, it is also becoming attractive in the domain of embedded systems. Unfortunately, the dynamic features of the language and its rich class library cause considerable overhead in terms of runtime and memory consumption. Efficient techniques to implement Java Virtual Machines (JVMs) that are suitable for use in resource-constrained environments are thus needed. In this work we present a solution for very restricted environments based on CACAO. CACAO is a just-in-time (JIT) compiling JVM implementation, combining high speed and small size. We have modified the original JVM to run without an underlying operating system within only 1 MB of memory. In addition we present a new technique to selectively precompile methods during the initialization phase of real-time Java applications to prevent unwanted interaction between the JIT compilation and critical tasks. Furthermore we present the YARI soft-core as the execution platform of CACAO within an FPGA. We compare our implementation with two well known Java processors, JOP and Sun's picoJava-II, on the same FPGA technology. Although JOP achieves a higher clock frequency and picoJava-II occupies nearly 4 times the resources of YARI, our solution is able to outperform both of them by a factor of up to 2.2 and 1.7 respectively.

Get embcacao_techrep.pdf (228.0684KB)


34/2008 : A Java Processor Architecture for Embedded Real-Time Systems
Martin Schoeberl
Journal of Systems Architecture

Abstract: Architectural advancements in modern processor designs increase average performance with features such as pipelines, caches, branch prediction, and out-of-order execution. However, these features complicate worst-case execution time analysis and lead to very conservative estimates. JOP (Java Optimized Processor) tackles this problem from the architectural perspective by introducing a processor architecture in which simpler and more accurate WCET analysis is more important than average case performance. This paper presents a Java processor designed for time-predictable execution of real-time tasks. JOP is the implementation of the Java virtual machine in hardware. JOP is intended for applications in embedded real-time systems and the primary implementation technology is in a field programmable gate array. This paper demonstrates that a hardware implementation of the Java virtual machine results in a small design for resource-constrained devices.

Get rtarch.pdf (288.8213KB)


33/2008 : Cache-aware Cross-profiling for Java Processors
Walter Binder, Alex Villazon, Martin Schoeberl, Philippe Moret
Proceedings of the 2008 international conference on Compilers, architecture, and synthesis for embedded systems (CASES 2008)

Abstract: Performance evaluation of embedded software is essential in an early development phase so as to ensure that the software will run on the embedded device's limited computing resources. Prevailing approaches either require the deployment of the software on the embedded target, which can be tedious and may be impossible in an early development phase, or rely on simulation, which can be very slow. In this paper, we introduce a customizable cross-profiling framework for embedded Java processors, including processors featuring a method cache. The developer profiles the embedded software in the host environment, completely decoupled from the target system, on any standard Java Virtual Machine, but the generated profiles represent the execution time metric of the target system. Our cross-profiling framework is based on bytecode instrumentation. We identify several pointcuts in the execution of bytecode that need to be instrumented in order to estimate the CPU cycle consumption on the target system. An evaluation using the JOP embedded Java processor as target confirms that our approach reconciles high profile accuracy with moderate overhead. Our cross-profiling framework also enables the rapid evaluation of the performance impact of possible optimizations, such as different caching strategies.

Get crossprofiling_cases2008.pdf (363.7227KB)


27/2008 : Cross-Profiling for Embedded Java Processors
Walter Binder, Martin Schoeberl, Philippe Moret, Alex Villazon
Proceedings of the 5th International Conference on the Quantitative Evaluation of SysTems (QEST 2008)

Abstract: Profiling is essential for finding execution time hot spots in applications. However, in embedded systems resources are usually scarce and profiling is not an option, although the detection and optimization of hot spots is particularly important in such resource-constrained systems. In this paper we propose cross-profiling for embedded systems equipped with a Java processor; the cross-profiles are collected in any standard Java environment, but represent the execution time metrics of the embedded target platform. We present a novel cross-profiler that relies on Java bytecode instrumentation and generates calling-context-sensitive cross-profiles with CPU cycle estimations for each calling context. Our cross-profiler reconciles platform independence, portability, compatibility with standard Java runtime systems, complete bytecode coverage, moderate profiling overhead, and high accuracy of the generated cross-profiles.

Get crossprofiling_qest2008.pdf (241.0508KB)


25/2008 : Performance Evaluation of a Java Chip-Multiprocessor
Christof Pitter, Martin Schoeberl
IEEE Third Symposium on Industrial Embedded Systems (SIES’2008)

Abstract: Chip multiprocessing design is an emerging trend for embedded systems. In this paper, we introduce a Java multiprocessor system-on-chip called JopCMP. It is a symmetric shared-memory multiprocessor and consists of up to 8 Java Optimized Processor (JOP) cores, an arbitration control device, and a global shared memory. All components are interconnected with a system-on-chip bus. This paper focuses on the performance evaluation of different hardware configurations of the multicore system. Therefore, we vary the instruction cache sizes, the number of processors and the memory bandwidth. Within our experiments, we measure the performance by running three benchmarks on real hardware: an embedded application from industry, a computationally intensive matrix multiplication and a synthetic benchmark that continuously accesses a shared data structure. Two different field-programmable gate arrays are used for the presented experiments. Our results illustrate the promises and limits of the proposed multiprocessor architecture concerning synchronization, memory bandwidth and caching. Furthermore, we compare the performance and size of JopCMP with a complex Java processor.

Get sies_paper_final.pdf (193.4033KB)


9/2008 : Application Experiences with a Real-Time Java Processor
Martin Schoeberl
Proceedings of the 17th IFAC World Congress

Abstract: In this paper we present three different industrial real-time applications that are based on an embedded Java processor. Although they come from different application domains, all three projects have one topic in common: communication. Today's embedded systems are networked systems. Either a proprietary protocol is used, because of legacy applications or real-time requirements, or standard Internet protocols are used. We present the challenges and solutions for this variety of protocols in small, memory-constrained embedded devices.

Get jop_app.pdf (303.4043KB)


5/2008 : Java Interrupt Handling
Stephan Korsholm, Martin Schoeberl, Anders P. Ravn
11th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2008)

Abstract: An important part of implementing device drivers is to control the interrupt facilities of the hardware platform and to program interrupt handlers. Current methods for handling interrupts in Java use a server thread waiting for the VM to signal an interrupt occurrence. It means that the interrupt is handled at a later time, which has some disadvantages. We present constructs that allow interrupts to be handled directly and not at a later point decided by a scheduler. A desirable feature of our approach is that we do not require a native middleware layer but can handle interrupts entirely with Java code. We have implemented our approach using an interpreter and a Java processor, and give an example demonstrating its use.

Get ihjava_isorc2008.pdf (111.4346KB)


4/2008 : Toward Libraries for Real-time Java
Trevor Harmon, Martin Schoeberl, Raymond Klefstad
11th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2008)

Abstract: Reusable libraries are problematic for real-time software in Java. Using Java's standard class library, for example, demands meticulous coding and testing to avoid response time spikes and garbage collection. We propose two design requirements for reusable libraries in real-time systems: worst-case execution time (WCET) bounds and worst-case memory consumption bounds. Furthermore, WCET cannot be known if blocking method calls are used. We have applied these requirements to the design of three Java-based prototypes: a set of collection classes, a networking stack, and trigonometric functions. Our prototypes show that reusable libraries can meet these requirements and thus be viable for real-time systems.

Get rtlib_isorc2008.pdf (671.9307KB)


3/2008 : Hardware Objects for Java
Martin Schoeberl, Christian Thalinger, Stephan Korsholm, Anders P. Ravn
11th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2008)

Abstract: Java, as a safe and platform independent language, avoids access to low-level I/O devices or direct memory access. In standard Java, low-level I/O is not a concern; it is handled by the operating system. However, in the embedded domain resources are scarce and a Java virtual machine (JVM) without an underlying middleware is an attractive architecture. When running the JVM on bare metal, we need access to I/O devices from Java; therefore we investigate a safe and efficient mechanism to represent I/O devices as first class Java objects, where device registers are represented by object fields. Access to those registers is safe as Java's type system regulates the access. The access is also fast as it is directly performed by the bytecodes getfield and putfield. Hardware objects thus provide an object-oriented abstraction of low-level hardware devices. As a proof of concept, we have implemented hardware objects in three quite different JVMs: in the Java processor JOP, the JIT compiler CACAO, and in the interpreting embedded JVM SimpleRTJ.

Get hwobj_isorc2008_final.pdf (209.9131KB)


2/2008 : A Modular Worst-case Execution Time Analysis Tool for Java Processors
Trevor Harmon, Martin Schoeberl, Raymond Klefstad
Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2008)

Abstract: Recent technologies such as the Real-Time Specification for Java promise to bring Java's advantages to real-time systems. While these technologies have made Java more predictable, they lack a crucial element: support for determining the worst-case execution time (WCET). Without knowledge of WCET, the correct temporal behavior of a Java program cannot be guaranteed. Although considerable research has been applied to the theory of WCET analysis, implementations are much less common, particularly for Java. Recognizing this deficiency, we have created an open-source, extensible tool that supports WCET analysis of Java programs. Designed for flexibility, it is built around a plug-in model that allows features to be incorporated as needed. Users can plug in various processor models, loop bound detectors, and WCET analysis algorithms without having to understand or alter the tool's internals. By default, the tool provides plug-ins for an annotation-based loop bound detector, a timing model for the Java Optimized Processor (JOP), and both tree- and graph-based analysis algorithms.

Get paper_subm_RTAS2008_final.pdf (336.7510KB)


58/2007 : SimpCon - a Simple and Efficient SoC Interconnect
Martin Schoeberl
Proceedings of the 15th Austrian Workshop on Microelectronics, Austrochip 2007

Abstract: To build a system-on-chip (SoC) a common interface standard is necessary to connect ready-to-use components (IPs) from different vendors. Today several SoC interconnect standards, such as AMBA, Wishbone, OPB, and Avalon, are in use. We show in this paper that those standards have a common drawback for on-chip interconnections: They are built on the model of a common back-plane bus that does not fit on-chip interconnections very well. We provide a new, simple on-chip interconnect specification for the well accepted master/slave model. It is intended to provide pipelined access to devices such as on-chip peripherals and on-chip memory controllers with minimum hardware resources.

Get simpcon_austrochip2007.pdf (147.9219KB)


53/2007 : Garbage Collection for Safety Critical Java
Martin Schoeberl, Jan Vitek
Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007)

Abstract: The Real-time Specification for Java and the upcoming, and more restricted, Safety Critical Java standard have been designed to allow programmers to avoid pauses caused by automatic memory management algorithms. Dynamic memory is user-managed using a region-based allocation scheme known as scoped memory areas. However, usage of those scoped memories is cumbersome and often leads to runtime errors. In this paper we focus on the safety critical subset of the Real-time Specification for Java and propose a real-time garbage collector that can be scheduled like a normal real-time thread with a deadline monotonic assigned priority. The restricted programming model offered by Safety Critical Java allows us to substantially simplify the collector. Our proposal has been implemented and evaluated in the context of the JOP project. JOP is a Java processor especially designed for embedded real-time systems. The architecture is optimized for worst-case execution time (WCET) instead of the usual optimization for average case execution time. The execution time of each bytecode is known with cycle accuracy.

Get scjgc_final.pdf (167.1768KB)


52/2007 : Towards a Java Multiprocessor
Christof Pitter, Martin Schoeberl
Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007)

Abstract: This paper describes the first steps towards a Java multiprocessor system on a single chip for embedded systems. The chip multiprocessing (CMP) system consists of a homogeneous set of processing elements and a shared memory. Each processor core is based on the Java Optimized Processor (JOP). A major challenge in CMP is the shared memory access of multiple CPUs. The proposed memory arbiter resolves possible emerging conflicts of parallel accesses to the shared memory using a fixed priority scheme. Furthermore, the paper describes the boot-up of the CMP. We verify the proposed CMP architecture by the implementation of the prototype called JopCMP. JopCMP consists of multiple JOPs and a shared memory. Finally, the first implementation of the CMP composed of two/three JOPs in an FPGA enables us to present a comparison of the performance between a single-chip JOP and the CMP version by running real applications.

Get jopcmp_jtres07.pdf (252.4600KB)


51/2007 : Architecture for Object Oriented Programming Languages
Martin Schoeberl
Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007)

Abstract: In this paper we investigate the overheads of object-oriented operations, such as virtual method dispatch and field access, in the context of an embedded processor for real-time systems. As an example we use a Java processor that implements those operations in microcode similar to the way those operations are compiled to a RISC processor. As this processor is a soft-core, implemented in an FPGA, an optimization of those operations is a valuable option. Significant application speedup is possible by providing an architecture for object-oriented programming languages. We also evaluate the hardware cost of this optimization with respect to the application speedup.

Get oohw_final.pdf (134.7061KB)


50/2007 : picoJava-II in an FPGA
Wolfgang Puffitsch, Martin Schoeberl
Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007)

Abstract: picoJava is a Java microprocessor developed by Sun to speed up execution of Java in embedded systems and an often-cited reference design for other Java processors. Information about implementations of picoJava is rare, however. In contrast to a number of new Java processors which are targeted at FPGAs, picoJava was designed for ASICs, and no implementation in an FPGA is known to date. In this paper we show the implementation and evaluation of Sun's picoJava-II microprocessor in an FPGA.

Get pjfpga_final.pdf (305.7549KB)


38/2007 : Time Predictable CPU and DMA Shared Memory Access
Christof Pitter, Martin Schoeberl
International Conference on Field-Programmable Logic and its Applications (FPL 2007)

Abstract: In this paper we propose a first step towards a time predictable computer architecture for single-chip multiprocessing (CMP). CMP is the current trend in server and desktop systems. CMP is even considered for embedded real-time systems, where worst-case execution time (WCET) estimates are of primary importance. We attack the problem of WCET analysis for several processing units accessing a shared resource (the main memory) with support from the hardware. In this paper we combine a time predictable Java processor and a direct memory access (DMA) unit with a regular access pattern (VGA controller). We analyze and evaluate different arbitration schemes with respect to schedulability analysis and WCET analysis. We also implement the various combinations in an FPGA. An FPGA is the ideal platform to verify the different concepts and evaluate the results by running applications with industrial background in real hardware.


37/2007 : A Time-Triggered Network-on-Chip
Martin Schoeberl
International Conference on Field-Programmable Logic and its Applications (FPL 2007)

Abstract: In this paper we propose a time-triggered network-on-chip (NoC) for on-chip real-time systems. The NoC provides time predictable on- and off-chip communication, a mandatory feature for dependable real-time systems. A regular structured NoC with a pseudo-static communication schedule allows for a high bandwidth. In this paper we argue for a simple, time-triggered NoC structure to achieve maximum bandwidth. We have implemented the proposed TT-NoC in a low-cost FPGA. The base bandwidth is 29 Gbit/s and the peak bandwidth 230 Gbit/s for eight nodes. The idea is in line with current on-chip multiprocessor designs, such as the Cell processor. The simple design of the network and the network interface eases certification of the proposed NoC for safety critical applications.

Get ttnoc_fpl2007.pdf (174.5605KB)


29/2007 : Modeling the Function Cache for Worst-Case Execution Time Analysis
Martin Schoeberl
44th ACM Design Automation Conference (DAC'07)

Abstract: Static worst-case execution time (WCET) analysis is done by modeling the hardware behavior. In this paper we describe a WCET analysis technique to analyze systems with "function caches", a special kind of instruction cache that caches whole functions only. This cache was designed with the aim of being more predictable in the worst case than existing instruction caches. In this paper we develop a cache analysis technique for the function cache. One of the new concepts of this analysis technique is the "local persistence" analysis, which allows the function cache to be modeled precisely.

Get rr-2007-29_dac07.pdf (145.2773KB)


9/2007 : A Profile for Safety Critical Java
Martin Schoeberl, Hans Sondergaard, Bent Thomsen, Anders P. Ravn
10th IEEE International Symposium on Object and component-oriented Real-time distributed Computing (ISORC2007)

Abstract: In this paper we propose a new, minimal specification for real-time Java for safety critical applications. The intention is to provide a profile that supports programming of applications that can be validated against safety critical standards such as DO-178B. The proposed profile is in line with the Java specification request JSR-302: Safety Critical Java Technology, which is still under discussion. In contrast to the current direction of the expert group for the JSR-302 we do not subset the rather complex Real-Time Specification for Java (RTSJ). Nevertheless, our profile can be implemented on top of an RTSJ compliant JVM.

Get scjava_isorc2007.pdf (128.5771KB)


4/2007 : Mission Modes for Safety Critical Java
Martin Schoeberl
5th IFIP Workshop on Software Technologies for Future Embedded & Ubiquitous Systems

Abstract: Java is now considered as a language for the domain of safety critical applications. A restricted version of the Real-Time Specification for Java (RTSJ) is currently under development within the Java Specification Request (JSR) 302. The application model follows the Ravenscar Ada approach with a fixed number of threads during the mission phase. This static approach simplifies certification against safety critical standards such as DO-178B. In this paper we extend this restrictive model by mission modes. Mission modes are intended to cover different modes of a real-time application during runtime without a complete restart. Mission modes are still simpler to analyze with respect to WCET and schedulability than the full dynamic RTSJ model. Furthermore our approach to thread stopping during a mode change provides a clean coordination between the runtime system and the application threads.

Get scjava_modes.pdf (88.7012KB)


114/2006 : A Time-Triggered Network-on-Chip
Martin Schoeberl

Abstract: In this paper we propose a time-triggered network-on-chip (NoC) for on-chip real-time systems. The NoC provides time predictable on- and off-chip communication, a mandatory feature for dependable real-time systems. A regular structured NoC with a pseudo-static communication schedule allows for a high bandwidth. It is even possible to implement an on-chip bus with a broadcast bandwidth of 29 Gbit/s (and a peak bandwidth of 230 Gbit/s for eight nodes) inside a low-cost FPGA. In this paper we argue for a simple, time-triggered NoC structure to achieve maximum bandwidth. This is in line with current on-chip multiprocessor designs, such as the Cell processor. The simple design of the network and the network interface allows certification of the proposed NoC for safety critical applications.

Get ttnoc_tr.pdf (139.3574KB)


66/2006 : Exact Roots for a Real-Time Garbage Collector
Rasmus Pedersen, Martin Schoeberl
The 4th Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2006)

Abstract: Garbage collection is traditionally not used in real-time systems due to the unpredictable temporal behavior of current implementations of a garbage collector. However, without garbage collection the programming model is very different from standard Java. It is the opinion of the authors that garbage collection algorithms can be adapted to meet even the requirements for hard real-time systems. One important property of a real-time garbage collector is to identify only the real roots on the root scan. Misinterpreting primitive values as false root pointers can result in an unpredictable worst case memory consumption. In this paper we propose a method to add information on the stack layout to the runtime data structure in order to find the roots exactly. Furthermore, interpreting this information during the collection process is implemented to be worst-case execution time analyzable.

Get gcroots_jtres2006.pdf (178.1475KB)
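
The exact root scan described in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the stack layout information is a per-frame bit mask in which bit i is set when slot i holds a reference; the names `scan_roots`, `frames`, and `ref_mask` are hypothetical and not taken from the paper.

```python
def scan_roots(stack, frames):
    """Return the reference values found on the stack, using layout info.

    stack  -- flat list of slot values (integers)
    frames -- list of (base, slots, ref_mask) tuples (hypothetical layout):
              base     index of the frame's first slot in `stack`
              slots    number of slots in the frame
              ref_mask bit i is 1 if slot i holds a reference
    """
    roots = []
    for base, slots, ref_mask in frames:
        # The loop bound is the frame size, so the work per frame is bounded.
        for i in range(slots):
            if (ref_mask >> i) & 1:
                value = stack[base + i]
                if value != 0:  # 0 stands for a null reference in this sketch
                    roots.append(value)
    return roots


# Two frames of three slots each; 42 and 7 are primitive values.
stack = [0x1000, 42, 0x2000, 0x1000, 0, 7]
frames = [(0, 3, 0b101), (3, 3, 0b011)]
roots = scan_roots(stack, frames)
```

In this sketch the primitives 42 and 7 are never reported as roots, which is the property a conservative scan cannot guarantee. The scan cost depends only on the number of slots, not on the values stored in them.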


65/2006 : WCET Analysis for a Java Processor
Rasmus Pedersen, Martin Schoeberl
The 4th Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2006)

Abstract: In this paper we propose a solution for a worst-case execution time (WCET) analyzable Java system: a combination of a time-predictable Java processor and a tool that performs WCET analysis of Java bytecode. We present a Java processor, called JOP, designed for time-predictable execution of real-time tasks. JOP is an implementation of the Java virtual machine (JVM) in hardware. The execution time of bytecodes, the instructions of the JVM, is known cycle accurate for JOP. Therefore, JOP simplifies the low-level WCET analysis. A method cache that loads whole Java methods is analyzable with respect to the WCET. The WCET analysis tool is based on integer linear programming. The tool performs the low-level analysis at the bytecode level and integrates the method cache analysis for a two-block cache.

Get wcet_jtres2006.pdf (413.8965KB)


64/2006 : An Embedded Support Vector Machine
Rasmus Pedersen, Martin Schoeberl
Fourth International Workshop on Intelligent Solutions in Embedded Systems

Abstract: In this paper we work on the balance between hardware and software implementation of a machine learning algorithm, which belongs to the area of statistical learning theory. We use system-on-chip technology to demonstrate the potential usefulness of moving the critical sections of an algorithm into HW: the so-called hardware/software balance. Our experiments show that the approach can achieve speedups using a complex machine learning algorithm called a support vector machine. The experiments are conducted on a real-time Java Virtual Machine named Java Optimized Processor.

Get rtsvm_final.pdf (99.0537KB)


46/2006 : Instruction Cache für Echtzeitsysteme
Martin Schoeberl
Patent Specification No. 500858

Get patent_pct_korr_20051104.pdf (27.1807KB)


44/2006 : Real-Time Garbage Collection for Java
Martin Schoeberl
ISORC 2006

Abstract: Automatic memory management or garbage collection greatly simplifies the development of large systems. However, garbage collection is usually not used in real-time systems due to the unpredictable temporal behavior of current implementations of a garbage collector. In this paper we propose a concurrent collector that is scheduled periodically in the same way as ordinary application threads. We provide an upper bound for the collector period so that the application threads never run out of memory.

Get rtgc_sched.pdf (278.0029KB)


22/2006 : A Time Predictable Java Processor
Martin Schoeberl
DATE 2006

Abstract: This paper presents a Java processor, called JOP, designed for time-predictable execution of real-time tasks. JOP is the implementation of the Java virtual machine in hardware. We propose a processor architecture that favors low worst-case execution time (WCET) over average case performance. The resulting processor is an easy target for the low-level WCET analysis.

Get jop_wcet.pdf (86.4365KB)


52/2005 : Evaluation of a Java Processor
Martin Schoeberl
Austrochip 2005

Abstract: In this paper, we present the evaluation results for a Java processor with respect to size and performance. The Java Optimized Processor (JOP) is an implementation of the Java virtual machine (JVM) in a low-cost FPGA. JOP is the smallest hardware realization of the JVM available to date. Due to the efficient implementation of the stack architecture, JOP is also smaller than a comparable RISC processor in an FPGA. Although JOP is intended as a processor for embedded real-time systems, where accurate worst-case execution time analysis is more important than average case performance, its general performance is still important. We will see that a real-time processor architecture does not need to be slow.


44/2005 : Automatic Generation of Application-Specific Systems Based on a Micro-programmed Java Core
Martin Schoeberl, Flavius Gruian, Per Andersson, Krzysztof Kuchcinski
Proceedings of the 2005 ACM symposium on Applied computing

Abstract: This paper describes a co-design based approach for automatic generation of application specific systems, suitable for FPGA-centric embedded applications. The approach augments a processor core with hardware accelerators extracted automatically from a high-level specification (Java) of the application, to obtain a custom system, optimised for the target application. We advocate herein the use of a microprogrammed core as the basis for system generation in order to hide the hardware access operations in the micro-code, while conserving the core data-path (and clock frequency). To prove the feasibility of our approach, we also present an implementation based on a modified version of the Java Optimized Processor soft core on a Xilinx Virtex-II FPGA.


43/2005 : Design and Implementation of an Efficient Stack Machine
Martin Schoeberl
Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International (IPDPS)

Abstract: Although virtually every processor today uses a load-store register architecture, stack architectures attract attention again due to the success of Java. The intermediate language of Java, the Java bytecodes, is stack based and therefore a hardware realization of the Java Virtual Machine (JVM), a Java processor, is also stack based. In this paper two different architectures, found in Java processors, are presented. Detailed analysis of the JVM access patterns to the stack proves that a simpler and faster solution is possible. The proposed solution is a stack with two levels of on-chip cache.


/2005 : JOP: A Java Optimized Processor for Embedded Real-Time Systems
Martin Schoeberl

Abstract: Compared to software development for desktop systems, current software design practice for embedded systems is still archaic. C/C++ and even assembler are used on top of a small real-time operating system. Many of the benefits of Java, such as safe object references, the notion of concurrency as a first-class language construct, and its portability, have the potential to make embedded systems much safer and simpler to program. However, Java technology is seldom used in embedded systems, due to the lack of acceptable real-time performance. This thesis presents a Java processor designed for time-predictable execution of real-time tasks. JOP (Java Optimized Processor) is the implementation of the Java virtual machine in hardware. JOP is intended for applications in embedded real-time systems and the primary implementation technology is a field programmable gate array. This research demonstrates that a hardware implementation of the Java virtual machine results in a small design for resource-constrained devices. Architectural advancements in modern processor designs increase average performance with features such as pipelines, caches and branch prediction. However, these features complicate worst-case execution time (WCET) analysis and lead to very conservative WCET estimates. This thesis tackles this problem from the architectural perspective – by introducing a processor architecture in which simpler and more accurate WCET analysis is more important than average case performance. This thesis evaluates the issues surrounding the use of standard Java for real-time applications. In order to overcome some of the issues with standard Java, a profile for real-time Java is defined. Tight integration of the real-time scheduler with the supporting processor results in an efficient platform for Java in embedded real-time systems. The proposed processor and the Java real-time profile have been used with success to implement several commercial real-time applications.

Get schoeberl_thesis.pdf (1737.1377KB)


113/2004 : A Time Predictable Instruction Cache for a Java Processor
Martin Schoeberl
On the Move to Meaningful Internet Systems 2004: Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2004)

Abstract: Cache memories are mandatory to bridge the growing gap between CPU speed and main memory access time. Standard cache organizations improve the average execution time but are difficult to predict for worst case execution time (WCET) analysis. This paper proposes a different cache architecture, intended to ease WCET analysis. The cache stores complete methods and cache misses occur only on method invocation and return. Cache block replacement depends on the call tree, instead of instruction addresses.

Get jtres_cache.pdf (67.0010KB)
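
The key property stated in the abstract above, that a cache holding complete methods can only miss on invocation and return, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the design from the paper: the FIFO replacement of whole methods, the word-based capacity, and the names `MethodCache` and `count_misses` are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class MethodCache:
    """Toy cache that stores whole methods and is consulted only on invoke/return."""

    def __init__(self, capacity_words):
        self.capacity = capacity_words
        self.resident = OrderedDict()  # method name -> size in words, oldest first
        self.misses = 0

    def _ensure_loaded(self, name, size):
        if name in self.resident:
            return  # hit: the complete method is already cached
        if size > self.capacity:
            raise ValueError("method larger than cache")
        self.misses += 1
        # Assumed policy: evict the oldest whole methods until the new one fits.
        while sum(self.resident.values()) + size > self.capacity:
            self.resident.popitem(last=False)
        self.resident[name] = size

    def invoke(self, callee, size):
        self._ensure_loaded(callee, size)

    def ret(self, caller, size):
        self._ensure_loaded(caller, size)


def count_misses(trace, sizes, capacity_words):
    """Replay a trace of ('invoke' | 'return', method) events and count misses."""
    cache = MethodCache(capacity_words)
    for event, method in trace:
        if event == "invoke":
            cache.invoke(method, sizes[method])
        elif event == "return":
            cache.ret(method, sizes[method])
        else:
            raise ValueError("unknown event: " + event)
    return cache.misses


# Hypothetical method sizes (in words) and a short call sequence.
sizes = {"main": 40, "a": 30, "b": 50}
trace = [("invoke", "main"), ("invoke", "a"), ("return", "main"),
         ("invoke", "b"), ("return", "main")]
small = count_misses(trace, sizes, 100)
large = count_misses(trace, sizes, 120)
```

Because instruction fetches inside a method never consult the cache in this model, the number of possible miss points is bounded by the number of invoke and return events in the trace, which is what makes this style of cache attractive for static analysis.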


112/2004 : Java Technology in an FPGA
Martin Schoeberl
International Conference on Field-Programmable Logic and its Applications (FPL 2004)

Abstract: The application of Field Programmable Gate Arrays (FPGA) has moved from simple glue logic to complete systems. The potential for FPGA use in embedded systems is steadily increasing, continuously opening up new application areas. Low-cost FPGA devices are available in logic densities where the CPU with the necessary peripheral devices can be integrated in a single device. Java, with its pragmatic approach to object orientation and enhancements over C, has become very popular for desktop and server application development. Some features of Java, such as thread support in the language, could greatly simplify development of embedded systems. However, due to resource constraints in embedded systems, the common implementations of the Java Virtual Machine (JVM), as interpreter or just-in-time compiler, are not practical. This paper describes an alternative approach: JOP (a Java Optimized Processor) is a hardware implementation of the JVM with short and predictable execution time of most bytecodes. JOP is implemented as a configurable soft core in an FPGA. With JOP it is possible to develop applications in pure Java on resource-constrained devices.

Get fpl2004.pdf (128.7607KB)


111/2004 : Design Rationale of a Processor Architecture for Predictable Real-Time Execution of Java Programs
Martin Schoeberl
10th International Conference on Real-Time and Embedded Computing Systems and Applications (RTCSA 2004)

Abstract: Many of the benefits of Java, such as safe object references, the notion of concurrency as a first-class language construct, and its portability, have the potential to make embedded systems much safer and simpler to program. However, Java technology is seldom used in embedded systems due to the lack of acceptable real-time performance. This paper provides a short overview of the issues with Java in real-time systems and the Real-Time Specification for Java (RTSJ) that addresses most of these problems. A simple real-time profile is presented and the implementation of this profile on top of a Java processor, designed for real-time systems, is described in detail. A performance comparison between this solution and the reference implementation of RTSJ on top of Linux shows that a dedicated Java processor, without an underlying operating system, is more time predictable than an adaptation of a general purpose OS for real-time systems.

Get design.pdf (201.7305KB)


110/2004 : Real-Time Scheduling on a Java Processor
Martin Schoeberl
10th International Conference on Real-Time and Embedded Computing Systems and Applications (RTCSA 2004)

Abstract: This paper presents the lessons learned by implementing a real-time scheduler for Java on a Java processor. A pure Java system, without an underlying RTOS, is an unusual system with some interesting new properties. Java is a safer execution environment than C (e.g. no pointers) and the boundary between kernel and user space can become quite loose. Scheduling, usually part of the operating system or the Java Virtual Machine, is implemented in Java and executed in the same context as the application. This property provides an easy path to a framework for user-defined scheduling.

Get javasched.pdf (167.1406KB)


109/2004 : Restrictions of Java for Embedded Real-Time Systems
Martin Schoeberl
7th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2004)

Abstract: Java, with its pragmatic approach to object orientation and enhancements over C, has become very popular for desktop and server application development. The productivity increase of up to 40% compared with C++ [1] also attracts embedded systems programmers. However, standard Java is not practical on these usually small devices. This paper presents the status of restricted Java environments for embedded and real-time systems. For missing definitions, additional profiles are proposed. Results of the implementation on a Java processor show that it is possible to develop applications in pure Java on resource-constrained devices.

Get rtjava.pdf (193.9238KB)


68/2003 : Using a Java Optimized Processor in a Real World Application
Martin Schoeberl
Proceedings of the First Workshop on Intelligent Solutions in Embedded Systems (WISES 2003)

Abstract: Java, a popular programming language on desktop systems, is rarely used in embedded systems. Some features of Java, like thread support in the language, could greatly simplify development of embedded systems, but the common implementations of the JVM (Java Virtual Machine), as interpreter or just-in-time compiler, are not practical. This paper describes an alternative approach: JOP (a Java Optimized Processor) is a hardware implementation of the JVM with short and predictable execution time of most bytecodes. JOP is implemented as a configurable soft core in an FPGA. The experiences of the first application of JOP and the benefits from using an FPGA in an embedded distributed control system are described in the second part of this paper.

Get wises03.pdf (209.8125KB)


67/2003 : JOP: A Java Optimized Processor
Martin Schoeberl
On the Move to Meaningful Internet Systems 2003: Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2003)

Abstract: Java is still not a common language for embedded systems. It possesses language features, like thread support, that can improve embedded system development, but common implementations as interpreter or just-in-time compiler are not practical. JOP is a hardware implementation of the Java Virtual Machine with a focus on real-time applications. This paper describes the architecture of JOP and proposes a simple real-time extension of Java for JOP. A first application in an industrial system showed that JOP is one way to use Java in the embedded world.

Get jtres03.pdf (219.1455KB)


66/2003 : Design Decisions for a Java Processor
Martin Schoeberl
Proceedings of Austrochip 2003

Abstract: This paper describes design decisions for JOP, a Java Optimized Processor, implemented in an FPGA. The density-price relationship of FPGAs now makes it possible to consider them not only for prototyping of processor designs but also as the final implementation technology. However, when an FPGA is used as the target platform for a processor, different constraints influence the CPU architecture. Digital building blocks that map well in an ASIC can result in poor resource usage in an FPGA. Considering these constraints in the architecture can result in a tiny soft-core processor.

Get austrochip03.pdf (219.4297KB)


