A nice high-level overview of trends in DL hardware. My top takeaways are that mobile hardware for inference is becoming ubiquitous, and that the near future of meta-learning will probably require hardware that unites training and inference. If my understanding is correct, most current hardware focuses on one mode or the other, though that is less true of RL.

Hardware for Deep Learning. Part 1: Introduction

Almost two years ago I started including a Hardware section in my Deep Learning presentations. It was dedicated to a review of the current state of the field and the trends for the next 1–5+ years.

Here is a version from April 2016, and here is an update from October 2017. Last year we saw a lot of interesting announcements, I gave several talks with updated slides, and now I am updating the material for February/March 2018. I will publish it soon as a separate presentation, and this series of texts will be a companion to the slides, with the goal of making the material more readable and useful as a reference.

I started to write it as a single post, but it soon became too large, so I decided to split it into a series of bite-sized posts.

I will regularly update the texts to fix errors and include recent news and announcements. See the release notes at the bottom of the current post. Feel free to comment on the posts and/or drop me an email at [email protected].


TL;DR / Executive Summary

Here is a short summary of what the series will be about.

  • There are two distinct working modes for the neural networks (NN, aka Deep Learning, DL) in use today: Training (learning a set of weights for a NN designed to solve a specific task) and Inference (using an already trained NN). Training is a much more compute-intensive process than inference. Many applications separate these two modes, but some tasks (like Deep Reinforcement Learning) may require tight integration of both. There is, in principle, a third mode, Meta-Learning (finding the right architecture, parameters and so on), but let’s set it aside for now.
  • Deep Learning is following the same path Bitcoin has already traveled. Training (mining, in Bitcoin’s case) started on CPUs (Central Processing Units, ordinary processors from Intel/AMD such as the Core i7 or Ryzen), then moved to GPUs (Graphics Processing Units from NVIDIA/AMD, such as the NVIDIA GTX 1080 Ti), and is now moving to FPGAs (Field-Programmable Gate Arrays, integrated circuits designed to be programmed by the customer) and ASICs (Application-Specific Integrated Circuits, chips customized for specific calculations rather than general-purpose computing).
  • Right now most training happens on GPUs, and specifically on NVIDIA GPUs. AMD has almost lost this battle: their GPUs have very good performance but very poor support in DL frameworks. FPGAs and ASICs are on the rise (a recent example is Google’s TPU, the Tensor Processing Unit).
  • One of the next big things in DL hardware is mobile processors suited for DL, and almost every vendor is adding ML/DL capabilities in the form of special instructions, optimized DSPs, and dedicated NPUs (Neural Processing Units). This is mostly about inference, not training. Fast (and energy-efficient) mobile processors will allow already trained models to process data instantly without sending it to the cloud, reducing latency and increasing privacy/security, and this will probably lead to another Cambrian explosion of AI applications.
  • Increased computing power at the edge (mobile, wearables, home devices, IoT, etc.) could also advance distributed training. This topic requires more research and experiments.
  • There is an interesting field called neuromorphic computing, and interesting things are happening there: IBM already has its TrueNorth chip, and Intel has announced its Loihi chip. Neuromorphic architectures have two main advantages: 1) they are better suited to brain-like computations; 2) they are potentially much more energy efficient (e.g. TrueNorth consumes only 70 mW, compared to 250 W for the top NVIDIA GPUs).
  • Memristors are being researched, and they could advance neuromorphic computing even further: they provide completely different circuitry compared to the neuromorphic processors mentioned above, which are still based on transistors. Right now, though, it is hard to expect anything production-ready.
  • Quantum computers (QC) are on the rise; last year we saw a great deal of achievements from all the major companies, including Google, IBM, Microsoft, Intel and others. Quantum computing can advance the ML field in different ways, from speeding up many classical algorithms to enabling completely new ones and developing the field of Quantum Machine Learning. But there are still many obstacles to overcome, and there is a long way to go before we can use QCs on real-life large datasets.
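A rough way to see why training is so much more compute-hungry than inference (as noted in the first bullet): a training step runs the forward pass plus a backward pass that costs about twice as much, and then repeats over many epochs. Here is a minimal back-of-the-envelope sketch for a single dense layer; the layer sizes and batch size are made-up numbers for illustration, not figures from this post.

```python
# Toy FLOP count for one dense layer, y = x @ W.
# All sizes below are illustrative assumptions, not from the post.
batch, d_in, d_out = 32, 512, 256

# Inference: one forward matmul, ~2 * batch * d_in * d_out FLOPs
# (one multiply and one add per output element per input feature).
flops_forward = 2 * batch * d_in * d_out

# Training: forward pass plus backward pass. The backward pass needs
# two matmuls of the same shape (gradient wrt inputs and gradient wrt
# weights), so one training step costs roughly 3x a forward pass --
# before counting optimizer updates and the many passes over the data.
flops_backward = 2 * flops_forward
flops_train_step = flops_forward + flops_backward

print(flops_train_step / flops_forward)  # prints 3.0
```

And that 3x is per step: multiplied by the dataset size and the number of epochs, it explains why training and inference put such different demands on hardware.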

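One concrete technique behind the fast, energy-efficient mobile inference mentioned above is low-precision arithmetic: mobile DSPs and NPUs often run already trained models with 8-bit integer weights instead of 32-bit floats. Below is a hedged NumPy sketch of simple symmetric post-training quantization; the weight values and array size are made up for illustration.

```python
import numpy as np

# Illustrative sketch of symmetric int8 post-training quantization.
# The "trained" weights here are just random numbers, an assumption
# for the example.
np.random.seed(0)
w = np.random.randn(1000).astype(np.float32)

# Map [-max|w|, max|w|] linearly onto the int8 range [-127, 127].
scale = np.abs(w).max() / 127.0
w_int8 = np.round(w / scale).astype(np.int8)

# Dequantize to measure the rounding error we introduced.
w_restored = w_int8.astype(np.float32) * scale
max_err = np.abs(w - w_restored).max()

# 4x smaller in memory; max error is bounded by half a quantization step.
print(w.nbytes, w_int8.nbytes)  # prints 4000 1000
assert max_err <= scale / 2 + 1e-6
```

The 4x memory saving (and the cheaper integer arithmetic that comes with it) is a large part of why inference fits on phones and IoT devices while training still mostly lives on big GPUs.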


Part 1: Introduction and Executive summary (this post)

Part 2: CPU

Part 3: GPU

Part 4: FPGA

Part 5: ASIC

Part 6: Mobile AI

Part 7: Neuromorphic computing

Part 8: Quantum computing

Release Notes

2018/02/26: “Part 1: Introduction” published.

2018/02/26: “Part 2: CPU” published.

2018/03/14: “Part 3: GPU” published.