The neuro vector engine: Flexibility to improve convolutional net efficiency for wearable vision

Deep Convolutional Networks (ConvNets) currently achieve the best benchmark performance, but their demands on computation and data transfer prohibit a straightforward mapping onto energy-constrained wearable platforms. The computational burden can be overcome by dedicated hardware accelerators, but it is the sheer amount of data transfer and the level of utilization that determine the energy efficiency of these implementations. This paper presents the Neuro Vector Engine (NVE), a SIMD accelerator for ConvNets for visual object classification, targeting portable and wearable devices. Our accelerator is very flexible thanks to its VLIW ISA, at the cost of instruction fetch overhead. We show that this overhead is insignificant when the extra flexibility enables advanced data locality optimizations and improves hardware utilization across ConvNet vision applications. By co-optimizing the accelerator architecture and the algorithm loop structure, 30 Gops is achieved within a power envelope of 54 mW and a silicon footprint of only 0.26 mm² in TSMC 40 nm technology, enabling high-end visual object recognition on portable and even wearable devices.
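
For readers who want the headline figures as efficiency metrics, the following is a minimal sketch that derives them from the three numbers quoted in the abstract. It assumes the 30 Gops throughput and the 54 mW power envelope refer to the same operating point, which the abstract does not state explicitly. The function name `derived_metrics` and its dictionary keys are hypothetical and chosen only for illustration.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 30.0  # Gops
POWER_MW = 54.0         # mW
AREA_MM2 = 0.26         # mm^2 at TSMC 40 nm


def derived_metrics(gops, power_mw, area_mm2):
    """Derive efficiency metrics from throughput, power and area.

    Assumption: throughput and power describe the same operating point.
    """
    power_w = power_mw / 1000.0
    return {
        # Throughput per watt of power.
        "gops_per_watt": gops / power_w,
        # Throughput per unit of silicon area.
        "gops_per_mm2": gops / area_mm2,
        # mW / Gops = (1e-3 J/s) / (1e9 op/s) = 1e-12 J/op = pJ/op.
        "pj_per_op": power_mw / gops,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(THROUGHPUT_GOPS, POWER_MW, AREA_MM2).items():
        print(f"{name}: {value:.2f}")
```

Under that assumption the quoted figures correspond to roughly 556 Gops/W, about 115 Gops/mm², and 1.8 pJ per operation.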
