Browsing by Author "Nelis, Vincent"
Now showing 1 - 10 of 10
Results Per Page
Sort Options
- An extensible framework for multicore response time analysisPublication . David, Robert I.; Altmeyer, Sebastian; Indrusiak, Leandro S.; Maiza, Claire; Nelis, Vincent; Reineke, JanIn this paper, we introduce a multicore response time analysis (MRTA) framework, which decouples response time analysis from a reliance on context-independent WCET values. Instead, the analysis formulates response times directly from the demands placed on different hardware resources. The MRTA framework is extensible to different multicore architectures, with a variety of arbitration policies for the common interconnects, and different types and arrangements of local memory. We instantiate the framework for single level local data and instruction memories (cache or scratchpads), for a variety of memory bus arbitration policies, including: Round-Robin, FIFO, Fixed-Priority, Processor-Priority, and TDMA, and account for DRAM refreshes. The MRTA framework provides a general approach to timing verification for multicore systems that is parametric in the hardware configuration and so can be used at the architectural design stage to compare the guaranteed levels of real-time performance that can be obtained with different hardware configurations. We use the framework in this way to evaluate the performance of multicore systems with a variety of different architectural components and policies. These results are then used to compose a predictable architecture, which is compared against a reference architecture designed for good average-case behaviour. This comparison shows that the predictable architecture has substantially better guaranteed real-time performance, with the precision of the analysis verified using cycle-accurate simulation.
- Contention-Free Execution of Automotive Applications on a Clustered Many-Core PlatformPublication . Becker, Matthias; Dasari, Dakshina; Nikolic, Borislav; Åkesson, Benny; Nelis, Vincent; Nolte, ThomasNext generations of compute-intensive real-time applications in automotive systems will require more powerful computing platforms. One promising power-efficient solution for such applications is to use clustered many-core architectures. However, ensuring that real-time requirements are satisfied in the presence of contention in shared resources, such as memories, remains an open issue. This work presents a novel contention-free execution framework to execute automotive applications on such platforms. Privatization of memory banks together with defined access phases to shared memory resources is the backbone of the framework. An Integer Linear Programming (ILP) formulation is presented to find the optimal time-triggered schedule for the on-core execution as well as for the access to shared memory. Additionally a heuristic solution is presented that generates the schedule in a fraction of the time required by the ILP. Extensive evaluations show that the proposed heuristic performs only 0.5% away from the optimal solution while it outperforms a baseline heuristic by 67%. The applicability of the approach to industrially sized problems is demonstrated in a case study of a software for Engine Management Systems.
- A framework for memory contention analysis in multi-core platformsPublication . Dasari, Dakshina; Nelis, Vincent; Akesson, BennyThe last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus and its impact on the execution time of the tasks running on the cores, considering different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and independent stages in the analysis of these upper bounds. The arbiter-dependent phase takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent phase determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparison with a state-of-the-art TDM analysis approach and consistently showing a considerable reduction in maximum interference.
- Global and Partitioned Multiprocessor Fixed Priority Scheduling with Deferred Pre-emptionPublication . Davis, Robert I.; Burns, Alan; Marinho, José; Nelis, Vincent; Petters, Stefan M.; Bertogna, MarkoThis article introduces schedulability analysis for global fixed priority scheduling with deferred preemption (gFPDS) for homogeneous multiprocessor systems. gFPDS is a superset of global fixed priority pre-emptive scheduling (gFPPS) and global fixed priority non-pre-emptive scheduling (gFPNS). We show how schedulability can be improved using gFPDS via appropriate choice of priority assignment and final non-pre-emptive region lengths, and provide algorithms which optimize schedulability in this way. Via an experimental evaluation we compare the performance of multiprocessor scheduling using global approaches: gFPDS, gFPPS, and gFPNS, and also partitioned approaches employing FPDS, FPPS, and FPNS on each processor.
- High-performance parallelisation of real-time applicationsPublication . Pinho, Luís Miguel; Nelis, Vincent; Quinoñes, Eduardo; Burgio, Paolo; Marongiu, Andrea; Gai, Paolo; Sancho, JuanThis paper presents an overview of the P-SOCRATES methodology and tools, instantiated in the UpScale SDK (Software Development Kit) for the development of time-predictable high-performance applications. The proposed methodology was designed to provide an integrated SDK to fully exploit the huge performance opportunities brought by the most advanced many-core processors, whilst ensuring a predictable performance and maintaining (or even reducing) development costs of applications. The paper also provides the performance results of the application of the SDK in relevant embedded usecases.
- Investigation on AUTOSAR-Compliant Solutions for Many-Core ArchitecturesPublication . Becker, Matthias; Dasari, Dakshina; Nelis, Vincent; Behnam, Moris; Pinho, Luís Miguel; Nolte, ThomasAs of today, AUTOSAR is the de facto standard in the automotive industry, providing a common software architec- ture and development process for automotive applications. While this standard is originally written for singlecore operated Elec- tronic Control Units (ECU), new guidelines and recommendations have been added recently to provide support for multicore archi- tectures. This update came as a response to the steady increase of the number and complexity of the software functions embedded in modern vehicles, which call for the computing power of multicore execution environments. In this paper, we enumerate and analyze the design options and the challenges of porting AUTOSAR-based automotive applications onto multicore platforms. In particular, we investigate those options when considering the emerging many- core architectures that provide a more scalable environment than the traditional multicore systems. Such platforms are suitable to enable massive parallel execution, and their design is more suitable for partitioning and isolating the software components.
- NoC contention analysis using a branch-and-prune algorithmPublication . Dasari, Dakshina; Nikolic, Borislav; Nelis, Vincent; Petters, Stefan M.“Many-core” systems based on a Network-on-Chip (NoC) architecture offer various opportunities in terms of performance and computing capabilities, but at the same time they pose many challenges for the deployment of real-time systems, which must fulfill specific timing requirements at runtime. It is therefore essential to identify, at design time, the parameters that have an impact on the execution time of the tasks deployed on these systems and the upper bounds on the other key parameters. The focus of this work is to determine an upper bound on the traversal time of a packet when it is transmitted over the NoC infrastructure. Towards this aim, we first identify and explore some limitations in the existing recursive-calculus-based approaches to compute the Worst-Case Traversal Time (WCTT) of a packet. Then, we extend the existing model by integrating the characteristics of the tasks that generate the packets. For this extended model, we propose an algorithm called “Branch and Prune” (BP). Our proposed method provides tighter and safe estimates than the existing recursive-calculus-based approaches. Finally, we introduce a more general approach, namely “Branch, Prune and Collapse” (BPC) which offers a configurable parameter that provides a flexible trade-off between the computational complexity and the tightness of the computed estimate. The recursive-calculus methods and BP present two special cases of BPC when a trade-off parameter is 1 or ∞, respectively. Through simulations, we analyze this trade-off, reason about the implications of certain choices, and also provide some case studies to observe the impact of task parameters on the WCTT estimates.
- Response Time Analysis of Sporadic DAG Tasks under Partitioned SchedulingPublication . Fonseca, José; Nelissen, Geoffrey; Nelis, Vincent; Pinho, Luís MiguelSeveral schedulability analyses have been proposed for a variety of parallel task systems with real-time constraints. However, these analyses are mostly restricted to global scheduling policies. The problem with global scheduling is that it adds uncertainty to the lower-level timing analysis which on multicore systems are heavily context-dependent. As parallel tasks typically exhibit intense communication and concurrency among their sequential computational units, this problem is further exacerbated. This paper considers instead the schedulability of partitioned parallel tasks. More precisely, we present a response time analysis for sporadic DAG tasks atop multiprocessors under partitioned fixed-priority scheduling. We assume the partitioning to be given. We show that a partitioned DAG task can be modeled as a set of self-suspending tasks. We then propose an algorithm to traverse a DAG and characterize such worst-case scheduling scenario. With minor modifications, any state-of-the-art technique for sporadic self-suspending tasks can thus be used to derived the worstcase response time of a partitioned DAG task. Experiments show that the proposed approach significantly tightens the worst-case response time of partitioned parallel tasks comparatively to the state-of-the-art when the most accurate technique is chosen.
- The variability of application execution times on a multi-core platformPublication . Nelis, Vincent; Yomsi, Patrick; Pinho, Luís MiguelIt is a known fact that processes running concurrently on different cores in a multicore environment interfere with each other on the processor shared resources. The contention on these shared resources considerably slows down the execution on every core since sometimes the cores must stall while their requests to access the resources are being served. But by how much the execution may b e s lowed down due to this interference? In this pap er we answer this question with numbers coming from experimentation. That is, we quantify the magnitude of the impact of the interference on the execution time by running programs taken from the TACLeBench benchmark suite, a popular benchmark suite in the real-time research community, on the first generation of Kalray manycore processor family, the MPPA-256 (the development board) that goes by the code name “Andey”.
- Using Quicktrace to collect runtime execution traces easily and automaticallyPublication . Nelis, Vincent; Pinho, Luís MiguelIn many application domains, it is an elementary step in the design of a software, typically before its deployment, to exercise parts of its functionality by running some of its code on the target platform and collect informations about its runtime behaviour. Those informations may be used for debugging purpose or to assess the responsiveness of the application for example.
