
PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit
Low-precision formats have proven to be an efficient way to reduce not only the memory footprint but also the hardware resources and power consumption of deep learning computations. Under this premise, the posit numerical format appears to be a highly viable substitute for IEEE floating-point, but its application to neural network training still requires further research. Preliminary results have shown that 8-bit (and even smaller) posits may be used for inference, and 16-bit posits for training, while maintaining model accuracy.
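To make the idea concrete, here is a minimal sketch of posit quantization in software, assuming a posit(8, 0) format; the helper names `posit_decode` and `posit_quantize` and the brute-force table lookup are illustrative choices, not taken from the paper or from any posit library.

```python
import bisect

def posit_decode(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode an n-bit posit pattern: (-1)^s * useed^k * 2^e * (1 + f),
    with useed = 2^(2^es) and k given by the regime run length."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                  # NaR (Not a Real)
    sign = bits >> (n - 1)
    if sign:                                 # negative posits are two's complements
        bits = (-bits) & ((1 << n) - 1)
    body = bits & ((1 << (n - 1)) - 1)
    first, run, i = (body >> (n - 2)) & 1, 0, n - 2
    while i >= 0 and ((body >> i) & 1) == first:   # regime: run of identical bits
        run, i = run + 1, i - 1
    k = run - 1 if first else -run
    i -= 1                                   # skip the regime-terminating bit
    left = max(i + 1, 0)                     # bits left for exponent + fraction
    e_bits = min(es, left)
    e = ((body >> (left - e_bits)) & ((1 << e_bits) - 1)) << (es - e_bits) if e_bits else 0
    f_bits = left - e_bits
    frac = 1.0 + (body & ((1 << f_bits) - 1)) / (1 << f_bits) if f_bits else 1.0
    val = (2.0 ** (2 ** es)) ** k * 2.0 ** e * frac
    return -val if sign else val

# All 2^8 - 1 finite posit(8, 0) values, sorted for nearest-neighbour rounding.
TABLE = sorted(posit_decode(b) for b in range(256) if b != 128)

def posit_quantize(x: float) -> float:
    """Round x to the nearest posit(8, 0) value (saturates at +/- maxpos = 64)."""
    j = bisect.bisect_left(TABLE, x)
    if j == 0:
        return TABLE[0]
    if j == len(TABLE):
        return TABLE[-1]
    lo, hi = TABLE[j - 1], TABLE[j]
    return lo if x - lo <= hi - x else hi

print(posit_quantize(0.3))    # 0.296875 -- fine resolution near magnitude 1
print(posit_quantize(100.0))  # 64.0     -- saturates far from magnitude 1
```

The tapered accuracy is visible in the example: posits concentrate precision around magnitude 1, where most trained weights and activations live, and trade it away at the extremes.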

dMazeRunner: Optimizing Convolutions on Dataflow Accelerators

Processing Convolutional Neural Networks on Cache
With the advent of Big Data application domains, several Machine Learning (ML) signal-processing algorithms, such as Convolutional Neural Networks (CNNs), are required to process progressively larger datasets, at great cost in both compute power and memory bandwidth. Although dedicated accelerators have been developed to address this issue, they usually either require moving massive amounts of data across the memory hierarchy to the processing cores, or demand low-level knowledge of how data is laid out in the memory devices to enable in-/near-memory processing solutions.

Simplified Dynamic SC-Flip Polar Decoding
SC-Flip (SCF) decoding is a low-complexity alternative to SC-List (SCL) polar decoding with small list sizes. To match the performance of the SCL algorithm with large list sizes, the Dynamic SC-Flip (DSCF) algorithm was proposed. However, DSCF involves logarithmic and exponential computations that are not suitable for practical hardware implementations. In this work, we propose a simple approximation that replaces the transcendental computations of DSCF decoding. Moreover, we show how to incorporate fast decoding techniques into the DSCF algorithm.
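For context, the original DSCF flip metric of Chandesris et al. adds a logarithmic penalty term to the flipped bit's LLR magnitude. The sketch below contrasts that exact metric (for order-1 flips) with a generic hard-threshold approximation of the transcendental term; the constants `alpha`, `beta`, and `threshold`, and the function names, are illustrative assumptions, not the paper's actual simplification.

```python
import math

def dscf_metric_exact(abs_llrs, flip_idx, alpha=0.3):
    """Exact DSCF metric for flipping the bit at flip_idx (order-1 flip):
    M = |L[flip_idx]| + (1/alpha) * sum_{j <= flip_idx} ln(1 + exp(-alpha*|L[j]|)).
    abs_llrs holds |LLR| of the information bits in decoding order."""
    penalty = sum(math.log1p(math.exp(-alpha * l)) for l in abs_llrs[: flip_idx + 1])
    return abs_llrs[flip_idx] + penalty / alpha

def dscf_metric_approx(abs_llrs, flip_idx, beta=0.4, threshold=5.0):
    """Hardware-friendly variant: the transcendental term is replaced by a
    constant beta whenever |LLR| falls below threshold, and by 0 otherwise,
    so the metric reduces to additions and comparisons."""
    penalty = sum(beta for l in abs_llrs[: flip_idx + 1] if l <= threshold)
    return abs_llrs[flip_idx] + penalty

abs_llrs = [2.1, 0.3, 4.7, 0.9, 6.2]
print(dscf_metric_exact(abs_llrs, 3), dscf_metric_approx(abs_llrs, 3))
```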

Lowering Dynamic Power of a Stream-based CNN Hardware Accelerator
Custom hardware accelerators for Convolutional Neural Networks (CNNs) provide a promising solution to meet real-time constraints for a wide range of applications on low-cost embedded devices. In this work, we aim to lower the dynamic power of a stream-based CNN hardware accelerator by reducing the computational redundancies in the CNN layers. In particular, we investigate the redundancies introduced by the downsampling effect of the max pooling layers that are prevalent in state-of-the-art CNNs, and propose an approximation method to reduce the overall computation.
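To see where the redundancy comes from: with 2x2 max pooling, three out of every four convolution outputs are computed and then discarded. The NumPy sketch below illustrates one generic way to exploit this, predicting each pooling window's winner from a cheap partial sum over a few input channels and computing only that position exactly; the prediction heuristic and all names here are illustrative assumptions, not the paper's specific method.

```python
import numpy as np

def exact_conv_at(x, w, r, c):
    """Full all-channel dot product for one output position (r, c)."""
    kh, kw = w.shape[1], w.shape[2]
    return float(np.sum(x[:, r:r + kh, c:c + kw] * w))

def approx_conv_maxpool(x, w, pool=2, probe_channels=4):
    """Approximate conv + max-pool for one output feature map.
    x: input (C, H, W); w: one filter (C, kh, kw).
    A partial sum over the first probe_channels channels picks the likely
    winner of each pooling window; only the winner gets the exact
    convolution, skipping ~3/4 of the full dot products."""
    C, H, W = x.shape
    kh, kw = w.shape[1], w.shape[2]
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((oh // pool, ow // pool))
    xp, wp = x[:probe_channels], w[:probe_channels]
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            best, best_rc = -np.inf, (0, 0)
            for dr in range(pool):
                for dc in range(pool):
                    r, c = i * pool + dr, j * pool + dc
                    est = float(np.sum(xp[:, r:r + kh, c:c + kw] * wp))
                    if est > best:
                        best, best_rc = est, (r, c)
            out[i, j] = exact_conv_at(x, w, *best_rc)  # exact conv for winner only
    return out

x = np.random.randn(16, 10, 10)
w = np.random.randn(16, 3, 3)
print(approx_conv_maxpool(x, w).shape)  # (4, 4)
```

When the cheap prediction picks the wrong position, the output is only approximately the true max; accepting that small error in exchange for fewer multiply-accumulates is the accuracy/power trade-off such approximation methods make.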

Solving Memory Access Conflicts in LTE-4G Standard

FFTTA presentation
This paper describes a low-power processor tailored for fast Fourier transform (FFT) computations that exploits the transport-triggered architecture template. The processor is software-programmable while retaining an energy efficiency comparable to existing fixed-function implementations. The power savings are achieved by compressing the computation kernel into a single instruction word, which is stored in an instruction loop buffer, a more power-efficient alternative to regular instruction memory.
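The "computation kernel" in question is essentially the FFT butterfly loop that the processor iterates thousands of times. The sketch below is a generic iterative radix-2 decimation-in-time FFT in Python, shown only to make the shape of that kernel concrete, one short multiply-add pattern repeated (N/2)·log2(N) times, which is why it compresses well into a single instruction word and a loop buffer; it does not reproduce the FFTTA datapath.

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 DIT FFT, in place; len(x) must be a power of two.
    The innermost loop body is the 'computation kernel': one complex
    multiply plus an add/subtract pair per iteration."""
    n = len(x)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(start, start + size // 2):
                t = w * x[k + size // 2]       # butterfly: one complex multiply...
                x[k + size // 2] = x[k] - t    # ...and an add/subtract pair
                x[k] = x[k] + t
                w *= w_step
        size *= 2
    return x

print(fft_radix2([1, 0, 0, 0]))  # FFT of a unit impulse: four ones
```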