Browsing by Author "Dasari, Dakshina"
- An analysis of the impact of bus contention on the WCET in multicores
  Publication. Dasari, Dakshina; Nélis, Vincent
  The use of multicores is becoming widespread in the field of embedded systems, many of which have real-time requirements. Hence, ensuring that real-time applications meet their timing constraints is a prerequisite before deploying them on these systems. This necessitates the consideration of the impact of the contention for shared low-level hardware resources, such as the front-side bus (FSB), on the Worst-Case Execution Time (WCET) of the tasks. Towards this aim, this paper proposes a method to determine an upper bound on the number of bus requests that tasks executing on a core can generate in a given time interval. We show that our method yields tighter upper bounds than the state of the art. We then apply our method to compute the extra contention delay incurred by tasks when they are co-scheduled on different cores and access the shared main memory over a shared bus, access to which is granted using a round-robin (RR) arbitration protocol.
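  The abstract above reasons about bounding round-robin bus contention. The sketch below is a hypothetical illustration of the typical shape of such a bound, not the paper's actual analysis: under round-robin arbitration, each request from the analysed core can be blocked by at most one pending request from every other core, and no other core can interfere more often than the number of requests it can itself issue in the interval.

  ```python
  def rr_contention_delay(own_requests, other_core_requests, request_service_time):
      """Upper bound on the extra bus delay under round-robin arbitration.

      own_requests          -- bound on requests the analysed core issues in the interval
      other_core_requests   -- list of request bounds, one per competing core
      request_service_time  -- worst-case time the bus takes to serve one request
      """
      delay = 0
      for other in other_core_requests:
          # Each of my requests waits for at most one request of this core, and this
          # core cannot interfere more often than it issues requests in the interval.
          delay += min(own_requests, other) * request_service_time
      return delay


  # Example: 100 own requests, three competing cores, 40 ns per bus transaction.
  print(rr_contention_delay(100, [80, 120, 50], request_service_time=40))
  ```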
- Contention-Free Execution of Automotive Applications on a Clustered Many-Core Platform
  Publication. Becker, Matthias; Dasari, Dakshina; Nikolic, Borislav; Åkesson, Benny; Nelis, Vincent; Nolte, Thomas
  Next generations of compute-intensive real-time applications in automotive systems will require more powerful computing platforms. One promising power-efficient solution for such applications is to use clustered many-core architectures. However, ensuring that real-time requirements are satisfied in the presence of contention in shared resources, such as memories, remains an open issue. This work presents a novel contention-free execution framework to execute automotive applications on such platforms. Privatization of memory banks together with defined access phases to shared memory resources is the backbone of the framework. An Integer Linear Programming (ILP) formulation is presented to find the optimal time-triggered schedule for the on-core execution as well as for the access to shared memory. Additionally, a heuristic solution is presented that generates the schedule in a fraction of the time required by the ILP. Extensive evaluations show that the proposed heuristic performs only 0.5% away from the optimal solution while outperforming a baseline heuristic by 67%. The applicability of the approach to industrially sized problems is demonstrated in a case study of an Engine Management System software.
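  As a rough illustration of the core constraint behind such a contention-free framework (not the paper's ILP formulation), the snippet below checks that time-triggered access phases to shared memory never overlap; the phase representation and common time base are assumptions of this sketch.

  ```python
  def phases_are_contention_free(access_phases):
      """Check that time-triggered shared-memory access phases never overlap.

      access_phases -- list of (start, duration) pairs on a common time line,
                       one pair per scheduled access phase.
      """
      ordered = sorted(access_phases)
      for (start, duration), (next_start, _) in zip(ordered, ordered[1:]):
          if start + duration > next_start:
              return False  # two phases would use the shared memory at the same time
      return True


  # Example: three back-to-back phases of 2 time units each, then an overlapping pair.
  print(phases_are_contention_free([(0, 2), (2, 2), (4, 2)]))  # True
  print(phases_are_contention_free([(0, 3), (2, 2)]))          # False
  ```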
- A framework for memory contention analysis in multi-core platforms
  Publication. Dasari, Dakshina; Nelis, Vincent; Akesson, Benny
  The last decade has witnessed a major shift towards the deployment of embedded applications on multi-core platforms. However, real-time applications have not been able to fully benefit from this transition, as the computational gains offered by multi-cores are often offset by performance degradation due to shared resources, such as main memory. To efficiently use multi-core platforms for real-time systems, it is hence essential to tightly bound the interference when accessing shared resources. Although there has been much recent work in this area, a remaining key problem is to address the diversity of memory arbiters in the analysis to make it applicable to a wide range of systems. This work handles diverse arbiters by proposing a general framework to compute the maximum interference caused by the shared memory bus, and its impact on the execution time of the tasks running on the cores, for different bus arbiters. Our novel approach clearly demarcates the arbiter-dependent and arbiter-independent stages in the analysis of these upper bounds. The arbiter-dependent stage takes the arbiter and the task memory-traffic pattern as inputs and produces a model of the availability of the bus to a given task. Then, based on the availability of the bus, the arbiter-independent stage determines the worst-case request-release scenario that maximizes the interference experienced by the tasks due to the contention for the bus. We show that the framework addresses the diversity problem by applying it to a memory bus shared by a fixed-priority arbiter, a time-division multiplexing (TDM) arbiter, and an unspecified work-conserving arbiter, using applications from the MediaBench test suite. We also experimentally evaluate the quality of the analysis by comparing it with a state-of-the-art TDM analysis approach, consistently showing a considerable reduction in maximum interference.
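  For one concrete arbiter, the availability of the bus to a core can be modelled as the minimum service it receives in any time window. The sketch below gives such a lower bound for a simple TDM arbiter with one equal-length slot per core; it is an illustrative model under those assumptions, not the framework's general formulation.

  ```python
  def tdm_min_bus_service(window, n_cores, slot_len):
      """Lower bound on the bus time available to one core in ANY window of
      length `window`, under a TDM arbiter with one slot of `slot_len` per
      core (frame length = n_cores * slot_len).

      Worst case: the window starts just after the core's own slot ends, so
      the first (frame - slot_len) time units provide no service at all.
      """
      frame = n_cores * slot_len
      effective = max(0, window - (frame - slot_len))
      full_frames, rest = divmod(effective, frame)
      return full_frames * slot_len + min(rest, slot_len)


  # Example: 4 cores, 10-unit slots; guaranteed bus time in any window of length 100.
  print(tdm_min_bus_service(100, n_cores=4, slot_len=10))  # 20
  ```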
- Investigation on AUTOSAR-Compliant Solutions for Many-Core Architectures
  Publication. Becker, Matthias; Dasari, Dakshina; Nelis, Vincent; Behnam, Moris; Pinho, Luís Miguel; Nolte, Thomas
  As of today, AUTOSAR is the de facto standard in the automotive industry, providing a common software architecture and development process for automotive applications. While this standard was originally written for single-core operated Electronic Control Units (ECUs), new guidelines and recommendations have been added recently to provide support for multicore architectures. This update came as a response to the steady increase in the number and complexity of the software functions embedded in modern vehicles, which call for the computing power of multicore execution environments. In this paper, we enumerate and analyze the design options and the challenges of porting AUTOSAR-based automotive applications onto multicore platforms. In particular, we investigate those options when considering the emerging many-core architectures that provide a more scalable environment than traditional multicore systems. Such platforms are suitable for enabling massive parallel execution, and their design is more suitable for partitioning and isolating the software components.
- NoC contention analysis using a branch-and-prune algorithm
  Publication. Dasari, Dakshina; Nikolic, Borislav; Nelis, Vincent; Petters, Stefan M.
  “Many-core” systems based on a Network-on-Chip (NoC) architecture offer various opportunities in terms of performance and computing capabilities, but at the same time they pose many challenges for the deployment of real-time systems, which must fulfill specific timing requirements at runtime. It is therefore essential to identify, at design time, the parameters that have an impact on the execution time of the tasks deployed on these systems, and the upper bounds on the other key parameters. The focus of this work is to determine an upper bound on the traversal time of a packet when it is transmitted over the NoC infrastructure. Towards this aim, we first identify and explore some limitations in the existing recursive-calculus-based approaches to compute the Worst-Case Traversal Time (WCTT) of a packet. Then, we extend the existing model by integrating the characteristics of the tasks that generate the packets. For this extended model, we propose an algorithm called “Branch and Prune” (BP). Our proposed method provides tighter, yet still safe, estimates than the existing recursive-calculus-based approaches. Finally, we introduce a more general approach, namely “Branch, Prune and Collapse” (BPC), which offers a configurable parameter that provides a flexible trade-off between the computational complexity and the tightness of the computed estimate. The recursive-calculus methods and BP are two special cases of BPC, obtained when the trade-off parameter is set to 1 or ∞, respectively. Through simulations, we analyze this trade-off, reason about the implications of certain choices, and also provide some case studies to observe the impact of task parameters on the WCTT estimates.
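  The abstract names a branch-and-prune exploration of interference scenarios. The skeleton below is a generic, hypothetical branch-and-prune maximization over candidate scenarios, not the paper's BP algorithm: branches whose optimistic bound cannot improve on the best complete scenario found so far are pruned.

  ```python
  def branch_and_prune(partial, extend, cost, upper_bound, best=0):
      """Generic branch-and-prune maximization skeleton.

      partial      -- current partial scenario
      extend       -- function returning the children of a partial scenario
                      (an empty list means the scenario is complete)
      cost         -- cost of a complete scenario
      upper_bound  -- optimistic bound on the best completion of a partial scenario
      """
      children = extend(partial)
      if not children:
          return max(best, cost(partial))
      for child in children:
          if upper_bound(child) <= best:
              continue  # prune: this branch cannot beat the best scenario found so far
          best = branch_and_prune(child, extend, cost, upper_bound, best)
      return best


  # Toy usage: pick one value per stage to maximize their sum.
  stages = [[3, 1], [2, 5], [4, 0]]
  result = branch_and_prune(
      partial=[],
      extend=lambda p: [p + [v] for v in stages[len(p)]] if len(p) < len(stages) else [],
      cost=sum,
      upper_bound=lambda p: sum(p) + sum(max(s) for s in stages[len(p):]),
  )
  print(result)  # 12
  ```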
- Partitioning and Analysis of the Network-on-Chip on a COTS Many-Core Platform
  Publication. Becker, Matthias; Nikolic, Borislav; Dasari, Dakshina; Åkesson, Benny; Nélis, Vincent; Behnam, Moris; Nolte, Thomas
  Many-core processors can provide the computational power required by future complex embedded systems. However, their adoption is not trivial, since several sources of interference on COTS many-core platforms have adverse effects on the resulting performance. One main source of performance degradation is the contention on the Network-on-Chip (NoC), which is used for communication among the compute cores via the off-chip memory. Available analysis techniques for the traversal time of messages on the NoC do not consider many of the architectural features found on COTS platforms. In this work, we target a state-of-the-art many-core processor, the Kalray MPPA. A novel partitioning strategy for reducing the contention on the NoC is proposed. Further, we present an analysis technique dedicated to the proposed partitioning strategy, which considers all architectural features of the COTS NoC. Additionally, it is shown how to configure the parameters for flow regulation on the NoC such that the Worst-Case Traversal Time (WCTT) is minimal and buffers never overflow. The benefits of our approach are evaluated based on extensive experiments, which show that contention is significantly reduced compared to the unconstrained case, while the proposed analysis outperforms a state-of-the-art analysis for the same platform. An industrial case study shows the tightness of the proposed analysis.
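  Flow regulation on such a NoC is commonly described with (σ, ρ) traffic envelopes. The sketch below shows textbook network-calculus bounds for a (σ, ρ)-regulated flow crossing a rate-latency link; it only illustrates how regulation parameters relate to traversal delay and buffer backlog in general, and is not the paper's MPPA-specific analysis (all parameter names are assumptions of this sketch).

  ```python
  def regulated_flow_bounds(sigma, rho, link_rate, link_latency):
      """Textbook network-calculus bounds for a (sigma, rho)-regulated flow
      served by a rate-latency link beta(t) = link_rate * max(0, t - link_latency).

      Returns (worst-case delay, worst-case backlog) for the flow at this link.
      """
      assert rho <= link_rate, "the flow's long-term rate must fit within the link rate"
      worst_case_delay = link_latency + sigma / link_rate
      worst_case_backlog = sigma + rho * link_latency
      return worst_case_delay, worst_case_backlog


  # Example: burst of 8 flits at 0.25 flits/cycle; link serves 1 flit/cycle after 3 cycles.
  print(regulated_flow_bounds(sigma=8, rho=0.25, link_rate=1.0, link_latency=3))
  ```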
- Partitioning the Network-on-Chip to Enable Virtualization on Many-Core Processors
  Publication. Becker, Matthias; Dasari, Dakshina; Nélis, Vincent; Behnam, Moris; Nolte, Thomas
  In this paper, we highlight some key problems in NoC-based architectures that must be addressed before the deployment of real-time applications onto these platforms becomes possible. A paradigm shift from function-centric to data- and communication-centric approaches is required. Combining hardware- and software-based flow regulation seems to be the only way to ensure that NoCs go beyond best-effort service and address the requirements of diverse applications.
- Response time analysis of COTS-based multicores considering the contention on the shared memory bus
  Publication. Dasari, Dakshina; Andersson, Björn; Nélis, Vincent; Petters, Stefan M.; Easwaran, Arvind; Lee, Jinkyu
  The current industry trend is towards using Commercial Off-The-Shelf (COTS) multicores for developing real-time embedded systems, as opposed to using custom-made hardware. In a typical implementation of such COTS-based multicores, multiple cores access the main memory via a shared bus. This often leads to contention on this shared channel, which results in an increase in the response time of the tasks. Analyzing this increased response time, considering the contention on the shared bus, is challenging on COTS-based systems, mainly because bus arbitration protocols are often undocumented and the exact instants at which the shared bus is accessed by tasks are not explicitly controlled by the operating system scheduler; they are instead a result of cache misses. This paper makes three contributions towards analyzing tasks scheduled on COTS-based multicores. Firstly, we describe a method to model the memory access patterns of a task. Secondly, we apply this model to analyze the worst-case response time for a set of tasks. Although the parameters required to obtain the request profile can be derived by static analysis, we also provide an alternative method to obtain them experimentally using performance monitoring counters (PMCs). Finally, we compare our work against an existing approach and show that our approach outperforms it by providing a tighter upper bound on the number of bus requests generated by a task.
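  To give a feel for the shape of such an analysis (a simplified sketch, not the paper's exact method), the code below inflates a task's WCET with bus contention through the usual fixed-point iteration: the window length determines how many interfering requests the other cores can issue, which in turn lengthens the window. The interfering-request model passed in is the kind of profile that static analysis or PMC measurements would provide; the concrete numbers in the example are made up.

  ```python
  def response_time_with_bus(wcet, own_requests, interfering_requests, bus_latency,
                             max_iterations=1000):
      """Fixed-point response-time sketch for one task on a shared-bus multicore.

      wcet                  -- execution time in isolation, excluding bus accesses
      own_requests          -- bound on the task's own bus requests (e.g. cache misses)
      interfering_requests  -- function: window length -> bound on requests that the
                               other cores can issue in a window of that length
      bus_latency           -- worst-case bus service time of one request

      Assumes a work-conserving arbiter where every interfering request issued in
      the window can delay the task by at most one bus transaction.
      """
      response = wcet + own_requests * bus_latency  # contention-free starting point
      for _ in range(max_iterations):
          contention = interfering_requests(response) * bus_latency
          new_response = wcet + own_requests * bus_latency + contention
          if new_response == response:
              return response  # fixed point reached
          response = new_response
      raise RuntimeError("response-time iteration did not converge")


  # Example: a hypothetical profile where other cores issue at most 3 requests per 100 time units.
  print(response_time_with_bus(5000, own_requests=200, bus_latency=4,
                               interfering_requests=lambda w: 3 * (w // 100)))
  ```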
- A tighter analysis of the worst-case end-to-end communication delay in massive multicores
  Publication. Nélis, Vincent; Dasari, Dakshina; Nikolic, Borislav; Petters, Stefan M.
  “Many-core” systems based on the Network-on-Chip (NoC) architecture have brought to the forefront various opportunities and challenges for the deployment of real-time systems. Such real-time systems need timing guarantees to be fulfilled; therefore, calculating upper bounds on the end-to-end communication delay between system components is of primary interest. In this work, we identify the limitations of an existing approach proposed in [1] and propose different techniques to overcome these limitations.
- Time-Triggered Co-Scheduling of Computation and Communication with Jitter Requirements
  Publication. Minaeva, Anna; Åkesson, Benny; Hanzálek, Zdeněk; Dasari, Dakshina
  The complexity of embedded application design is increasing with growing user demands. In particular, automotive embedded systems are highly complex in nature, and their functionality is realized by a set of periodic tasks. These tasks may have hard real-time requirements and communicate over an interconnect. The problem is to efficiently co-schedule task execution on cores and message transmission on the interconnect so that timing constraints are satisfied. Contemporary works typically deal with zero-jitter scheduling, which results in lower resource utilization but has lower memory requirements. This article focuses on jitter-constrained scheduling, which puts constraints on the tasks' jitter and increases schedulability over zero-jitter scheduling. The contributions of this article are: 1) Integer Linear Programming and Satisfiability Modulo Theories models that exploit problem-specific information to reduce the formulation's complexity, used to schedule small applications. 2) A heuristic approach, employing three levels of scheduling, that scales to real-world use cases with 10,000 tasks and messages. 3) An experimental evaluation of the proposed approaches on a case study and on synthetic data sets, showing the efficiency of both zero-jitter and jitter-constrained scheduling. It shows that up to 28 percent higher resource utilization can be achieved, at the cost of up to 10 times longer computation time, with relaxed jitter requirements.
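  As a small illustration of what "jitter-constrained" means for a time-triggered schedule (a hedged sketch, not the article's ILP/SMT model), the check below verifies that the spread of a periodic task's start-time offsets within its periods stays within a jitter bound; zero-jitter scheduling is the special case with a bound of zero.

  ```python
  def satisfies_jitter_bound(start_times, period, jitter_bound):
      """Check the jitter constraint for one periodic task in a time-triggered schedule.

      start_times[j] -- start time of the j-th job within the hyperperiod
      The offset of job j within its own period is start_times[j] - j * period;
      the constraint holds if the spread of these offsets is at most jitter_bound.
      Zero-jitter scheduling corresponds to jitter_bound == 0.
      """
      offsets = [start - job * period for job, start in enumerate(start_times)]
      return max(offsets) - min(offsets) <= jitter_bound


  # Example: period 10, jobs started at 2, 13, 21 -> offsets 2, 3, 1 -> jitter 2.
  print(satisfies_jitter_bound([2, 13, 21], period=10, jitter_bound=2))  # True
  print(satisfies_jitter_bound([2, 13, 21], period=10, jitter_bound=1))  # False
  ```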