ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation

Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns require many independent injection experiments, which add up to long run times, especially if we aim for high coverage of the fault space. Besides reducing the number of pilot injections in the first place (e.g., with def-use pruning), we can also shorten the overall campaign by speeding up individual experiments. Our experiments show that the timeout failure class is especially important here: for QSort, timeouts account for only 8% of the injections but consume 32% of the campaign run time.

In this paper, we analyze and discuss the nature of timeouts as a failure class, and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, ACTOR reduces the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. Throughout, the absolute error in classifying experiments as timeouts was always less than 0.5%.
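
To make the idea of autocorrelation on a branch-target history more concrete, the following is a minimal sketch and not the detector described in the paper. It assumes the history is available as a list of branch-target addresses and treats an experiment as a candidate for early abort when the most recent entries repeat exactly with some small period. The names `match_ratio` and `detect_period` and the parameters `window`, `max_lag`, and `threshold` are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def match_ratio(history: Sequence[int], lag: int) -> float:
    """Fraction of entries equal to the entry `lag` positions earlier.

    This is a simple equality-based autocorrelation for one lag.
    """
    pairs = len(history) - lag
    if lag <= 0 or pairs <= 0:
        return 0.0
    matches = sum(
        1 for i in range(lag, len(history)) if history[i] == history[i - lag]
    )
    return matches / pairs


def detect_period(
    history: Sequence[int],
    window: int,
    max_lag: int,
    threshold: float = 1.0,
) -> Optional[int]:
    """Return the smallest lag at which the last `window` branch targets
    repeat with a match ratio of at least `threshold`, or None.

    Assumption (illustrative only): `window` is positive, and a lag is
    considered only if the window holds at least two full periods.
    """
    if window <= 0 or len(history) < window:
        return None  # not enough history observed yet
    recent = history[-window:]
    for lag in range(1, max_lag + 1):
        if 2 * lag <= len(recent) and match_ratio(recent, lag) >= threshold:
            return lag
    return None


# Hypothetical trace: some start-up branches, then a three-branch loop.
startup = list(range(10))
loop = [0x100, 0x120, 0x140] * 20
print(detect_period(startup + loop, window=30, max_lag=8))
```

A periodic branch history alone does not prove that an experiment is stuck, since a legitimate long-running loop looks the same over a short window. In a sketch like this, `window` and `threshold` therefore decide how aggressively experiments are aborted and how many are misclassified as timeouts.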

DDC Class

000: General works, science
