Review of methods for deep learning inference speed-up on CPU
It is well known that the explosive growth in computational power driven by the evolution of GPUs ushered in a new era of AI. But even though GPUs have become an essential part of training, using them for inference can still be impractical due to economic or technical constraints. So what alternatives do we have?
In this talk, I want to show that with some effort it is feasible to achieve a very competitive latency/cost ratio by using CPUs for inference. We will review various optimization approaches in detail and see how they work under the hood. The talk will also include many practical hints on how to prepare your DL model for production and get maximum value from your hardware.