
Tor Aamodt

Professor – Department of Electrical and Computer Engineering – University of British Columbia

Invited Talk: Faster Learning on Slow Hardware

Slides

Video

Abstract:

Advances in hardware performance have yielded improvements in machine learning by enabling deep neural networks to be trained on large datasets in a reasonable amount of time. However, scaling of existing hardware technologies is slowing and may (eventually) reach limits. This talk describes our recent work exploring approaches to further accelerate the training of deep neural networks despite this. Encouraging sparsity during training can reduce computation requirements. Two techniques for accomplishing this will be described: SWAT encourages sparsity in weights and activations, while ReSprop avoids computations by selectively reusing gradients between successive training iterations. Recent parallel hardware provides limited support for exploiting the sparsity generated by techniques like SWAT and ReSprop. Thus, we propose and evaluate a hardware mechanism, ANT, for anticipating and eliminating redundant non-zero multiplications during training that result from mapping convolutions onto outer-product-based systolic architectures. The talk will also describe our recent work on hardware and software techniques for employing lossy compression to enable larger model sizes.
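As a rough illustration of the sparsity idea mentioned in the abstract, the sketch below applies top-k magnitude pruning to weights and activations before a matrix multiply. This is a generic PyTorch illustration of magnitude-based sparsification, not the SWAT or ReSprop implementation; the function name `topk_sparsify` and the keep fractions are assumptions chosen for the example.

```python
import torch

def topk_sparsify(x: torch.Tensor, keep_frac: float) -> torch.Tensor:
    """Zero out all but the largest-magnitude `keep_frac` fraction of entries.

    Illustrative only; real sparse-training methods pick thresholds and
    granularities tuned per layer and per iteration.
    """
    k = max(1, int(keep_frac * x.numel()))
    # k-th largest magnitude == (numel - k + 1)-th smallest magnitude
    threshold = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))

# Hypothetical usage: sparsify weights and activations before a forward pass.
w = torch.randn(64, 128)           # dense weight matrix
a = torch.randn(32, 128)           # dense activations for a mini-batch
w_sparse = topk_sparsify(w, 0.1)   # keep 10% of weights (assumed fraction)
a_sparse = topk_sparsify(a, 0.5)   # keep 50% of activations (assumed fraction)
out = a_sparse @ w_sparse.t()      # multiply now involves mostly zero operands
```

On hardware with support for structured or unstructured sparsity, the zeroed operands in such a product can be skipped; the talk's point is that current parallel hardware offers only limited support for this, motivating mechanisms like ANT.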

Biography:

Tor Aamodt is a Professor in the Department of Electrical and Computer Engineering at the University of British Columbia, where he has been a faculty member since 2006. His current research focuses on the architecture of general-purpose GPUs and energy-efficient computing. Three of his papers related to the architecture of general-purpose GPUs have been selected as “Top Picks” by IEEE Micro Magazine and one as a “Research Highlight” by Communications of the ACM. He is in the MICRO Conference Hall of Fame. He was an Associate Editor for IEEE Computer Architecture Letters from 2012 to 2015 and for the International Journal of High Performance Computing Applications, was Program Chair for ISPASS 2013 and General Chair for ISPASS 2014, and has served on numerous conference technical program committees. He was a Visiting Associate Professor in the Computer Science Department at Stanford University during his 2012-2013 sabbatical. He was awarded an NVIDIA Academic Partnership Award in 2010, an NSERC Discovery Accelerator for 2016-2019, and a 2016 Google Faculty Research Award (the first at UBC since 2012). From late 2004 to early 2006 he worked at NVIDIA on the memory system architecture (“framebuffer”) of the GeForce 8 Series GPU, the first NVIDIA GPU to support CUDA. He received his BASc (in Engineering Science), MASc, and PhD from the University of Toronto.

He is a Professional Engineer in the province of British Columbia (registered with APEGBC).


FastPath 2022 Program