Tuesday, April 29, 2014
Over the course of the past four decades, consistent and exponential improvement in transistor scaling, coupled with continuous advances in general-purpose processor design, has exponentially reduced the cost of the raw material of computing, i.e., performance, making it a pervasive commodity. In 1971, at the dawn of microprocessors, 1 MIPS (Million Instructions Per Second) cost $5000; today the same level of performance costs less than 5¢. This exponential reduction in cost has made the computing industry an industry of new capabilities. We are not an industry of replacement, whose economic cycle relies on consumers replacing inventory when they run out of products. It is hard even to conceive of running out of a software app such as Microsoft Office. Instead, the computing industry’s economic ecosystem relies on continuously providing new capabilities at both the device and the service level.
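As a rough back-of-the-envelope check on the figures above, the short sketch below computes the implied cost reduction factor and average halving time. It assumes the "today" price is exactly 5¢ (the text says only "less than 5¢", so the true reduction is at least this large) and that "today" is 2014, the year of this post.

```python
import math

# Figures quoted in the text; the 2014 end point is an assumption taken from the post date.
cost_1971_usd = 5000.0   # cost of 1 MIPS in 1971
cost_2014_usd = 0.05     # upper bound on the cost of 1 MIPS "today"
years = 2014 - 1971

# Overall reduction factor (a lower bound, since the real cost is below 5 cents).
reduction_factor = cost_1971_usd / cost_2014_usd

# Number of halvings needed to reach that factor, and the average time per halving.
halvings = math.log2(reduction_factor)
years_per_halving = years / halvings

print(f"reduction factor: {reduction_factor:,.0f}x")
print(f"average halving time: {years_per_halving:.2f} years")
```

Under these assumptions the cost of 1 MIPS fell by a factor of at least 100,000, which corresponds to halving roughly every two and a half years on average.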
Before the effective end of Dennard scaling, we consistently improved performance and efficiency while maintaining generality in general-purpose computing. As the benefits from scaling diminish and the current paradigm of microprocessor design, multicore processors, falls significantly short of the traditional cadence of performance improvement, we are facing an “iron triangle”: we can choose only two of performance, efficiency, and generality, at the expense of the third. Energy efficiency now fundamentally limits microprocessor performance gains. These shortcomings may drastically curtail the computing industry’s ability to continuously deliver new capabilities, the backbone of its economic ecosystem. Hence, developing solutions that improve performance and efficiency, while retaining as much generality as possible, is of utmost importance.
Radical departures from conventional approaches are necessary to provide energy efficiency and large performance gains for a wide range of applications and domains. One such departure is general-purpose approximate computing, where error in computation is acceptable and the traditional robust digital abstraction of near-perfect accuracy is relaxed. Conventional techniques in energy-efficient computing navigate a design space defined by the two dimensions of performance and energy, and traditionally trade one for the other. General-purpose approximate computing explores a third dimension, error, and trades the accuracy of computation for gains in both energy and performance.
This radical approach to general-purpose computing will only be beneficial if a large body of applications can tolerate error during execution. Fortunately, as the landscape of computing shifts toward providing a more personalized and more targeted experience for users, a vast number of emerging applications are inherently approximate and error-resilient. These applications can be categorized into four classes:
(1) Applications with analog inputs (wearable electronics, voice recognition, scene reconstructions).
(2) Applications with analog output (multimedia).
(3) Applications with multiple possible answers (machine learning, web search, heuristics).
(4) Convergent applications (big data analytics, optimizations).
More importantly, in this realm of computing, the rate of data generation and collection is growing far beyond what conventional computing platforms can process. By trading off computation accuracy for gains in performance and efficiency, general-purpose approximate computing aims to exploit this emerging opportunity at the application level to tackle the aforementioned fundamental challenges at the transistor and architecture levels. One may visualize these trade-offs as finding the Pareto-optimal points in the processor design space, as shown below. Traditionally, for any set of workloads, the set of possible processor implementations may be plotted with energy efficiency on one axis and performance on the other, with the best implementations residing on the two-dimensional frontier. When approximation is supported, the degree of permissible error represents a third axis. The Pareto surface in this three-dimensional space represents the best points of performance, efficiency, and error. However, this surface is not yet well understood. Navigating this three-dimensional space provides many opportunities for innovation across the entire system stack.
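To make the notion of a three-dimensional Pareto surface concrete, here is a minimal sketch that filters a handful of design points down to the non-dominated ones. The design names and all numbers are hypothetical, made up purely for illustration; only the dominance rule (higher performance and efficiency are better, lower error is better) follows from the description above.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    name: str
    performance: float  # higher is better
    efficiency: float   # higher is better
    error: float        # lower is better


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    no_worse = (a.performance >= b.performance
                and a.efficiency >= b.efficiency
                and a.error <= b.error)
    strictly_better = (a.performance > b.performance
                       or a.efficiency > b.efficiency
                       or a.error < b.error)
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical design points, normalized to a precise baseline.
designs = [
    Design("precise-fast", 1.0, 1.0, 0.00),
    Design("precise-slow", 0.8, 0.9, 0.00),  # dominated by precise-fast
    Design("approx-a", 2.0, 3.0, 0.05),
    Design("approx-b", 1.5, 2.5, 0.08),      # dominated by approx-a
    Design("approx-c", 3.5, 6.0, 0.10),
]

front_names = [d.name for d in pareto_front(designs)]
print(front_names)
```

With error ignored, only the fastest and most efficient point would survive; once error is a third axis, the precise design stays on the surface alongside the approximate ones, because nothing else matches its accuracy.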
For instance, analog circuits inherently trade accuracy for significant gains in energy efficiency. However, it is challenging to utilize them in a way that is both programmable and generally useful. In our most recent work, which will be presented at the International Symposium on Computer Architecture (ISCA) in June 2014, we propose a solution, spanning circuit to compiler, that enables general-purpose use of limited-precision analog hardware to accelerate “approximable” code, that is, code that can tolerate imprecise execution. We utilize an algorithmic transformation that automatically converts approximable regions of code from a von Neumann model to an “analog” neural model. The core idea is to learn how a region of approximable code behaves and automatically replace the original code with an efficient computation of the learned model. The neural transformation of general-purpose approximable code provides an avenue for realizing the benefits of analog computation while targeting code written in conventional languages. The insights from this work show that it is crucial to expose analog circuit characteristics to the compilation and neural network training phases. At run time, while the processor executes the program, it invokes a reconfigurable accelerator, which we named the Neural Processing Unit (NPU), instead of running the original region of code. Our most recent work reports on the design and integration of a mixed-signal NPU for general-purpose code execution. The NPU model offers a way to exploit analog efficiencies, despite their challenges, for a wider range of applications than is typically possible. Further, mixed-signal execution delivers much larger savings for NPUs than digital execution. Analog neural acceleration provides whole-application speedup of 3.7× and energy savings of 6.3× with quality loss of around 10% or less. Even though the results are very encouraging, there are still several challenges that need to be overcome.
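The core idea of learning how a region of code behaves and then calling the learned model instead can be illustrated in software with a very small sketch. This is not the compiler workflow or the NPU described above: the target function, the network topology, the training loop, and all names are hypothetical, and nothing here models analog circuits or limited precision. It only shows a tiny neural network being trained on input/output samples of a function so that it could stand in for the original code.

```python
import math
import random


def approximable_region(x: float) -> float:
    """Hypothetical stand-in for a code region that tolerates imprecise results."""
    return math.sin(math.pi * x)


class TinyMLP:
    """1 input -> `hidden` tanh units -> 1 linear output, trained by plain SGD."""

    def __init__(self, hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.w2 = [0.0] * hidden  # start from a constant-zero output
        self.b2 = 0.0

    def _hidden(self, x: float) -> list:
        return [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]

    def predict(self, x: float) -> float:
        h = self._hidden(x)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x: float, target: float, lr: float) -> None:
        h = self._hidden(x)
        err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - target
        for i, hi in enumerate(h):
            # Backpropagate through tanh using the pre-update output weight.
            grad_pre = err * self.w2[i] * (1.0 - hi * hi)
            self.w2[i] -= lr * err * hi
            self.w1[i] -= lr * grad_pre * x
            self.b1[i] -= lr * grad_pre
        self.b2 -= lr * err


def mse(model: TinyMLP, xs: list) -> float:
    return sum((model.predict(x) - approximable_region(x)) ** 2 for x in xs) / len(xs)


# "Observe" the region on sample inputs, then train the model to mimic it.
xs = [i / 20 for i in range(21)]
model = TinyMLP()
error_before = mse(model, xs)
for _ in range(1000):
    for x in xs:
        model.train_step(x, approximable_region(x), lr=0.05)
error_after = mse(model, xs)

print(f"mean squared error before training: {error_before:.4f}")
print(f"mean squared error after training:  {error_after:.4f}")
```

After training, a call to `model.predict(x)` plays the role of the invocation that replaces the original region; in the work described above that invocation goes to a hardware accelerator rather than to software, which is where the performance and energy gains come from.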
The full range of applications that can exploit mixed-signal NPUs is still unknown, as is whether it will be sufficiently large to drive adoption in high-volume microprocessors. It is still an open question how developers might reason about the acceptable level of error when an application undergoes approximate execution, including analog acceleration. Finally, it is unclear whether an analog NPU would be adversely affected by a noisy, high-performance microprocessor environment. However, the significant gains from A-NPU acceleration and the diversity of the studied applications suggest a potentially promising path forward. This work also shows how relaxing the abstraction of near-perfect accuracy can provide a bridge between two disjoint models of computing, neuromorphic and von Neumann.
Despite its great potential, practical and prevalent use of general-purpose approximate computing requires techniques that seamlessly integrate with the current well-established practices of programming and system design and provide a smooth and evolutionary adaptation path for this revolutionary paradigm in computing.
In general, when conventional approaches run out of steam, it is time for extreme creativity. In fact, we may be living in the most exciting era of computing!
Hadi Esmaeilzadeh is the Catherine M. and James E. Allchin Early Career Professor of Computer Science at the Georgia Institute of Technology. His dissertation received the 2013 William Chan Memorial Dissertation Award from the University of Washington. He founded the Alternative Computing Technologies (ACT) Lab, where he works with his students on new technologies and cross-stack solutions for building next-generation computing systems for emerging applications. Hadi received his Ph.D. in Computer Science and Engineering from the University of Washington in 2013. He has a Master’s degree in Computer Science from The University of Texas at Austin (2010), and a Master’s degree in Electrical and Computer Engineering from the University of Tehran (2005). Hadi received the Google Research Faculty Award in 2013.
Hadi’s research has been recognized by three Communications of the ACM Research Highlights and three IEEE Micro Top Picks. His work on dark silicon has been profiled in The New York Times.