In the last few years, deep learning has achieved dramatic success in a wide range of domains, including computer vision, artificial intelligence, speech recognition, natural language processing and reinforcement learning.

However, good performance comes at a significant computational cost. This makes scaling training expensive, but an even more pertinent issue is inference, in particular for real-time applications (where runtime latency is critical) and edge devices (where computational and storage resources may be limited).

This talk will explore common techniques and emerging advances for dealing with these challenges, including best practices for batching; quantization and other methods for trading off computational cost at training vs inference performance; architecture optimization and graph manipulation approaches.



vbuzz 1
Berlin Buzzwords
08.06.2020 18:40 – 19:20