
Dynamic Power Management in Data Centers: Theory & Practice

Mor Harchol-Balter, Computer Science Department, Carnegie Mellon University

Abstract:


Energy costs for data centers continue to rise, already exceeding ten billion dollars yearly. Sadly, much of this power is wasted. Servers are busy only 10-30% of the time, but they are often left on while idle, drawing 60% or more of their peak power in the idle state. The obvious solution is dynamic power management: turning servers off, or repurposing them, when they are idle. The drawback is a prohibitive "setup cost" to bring servers back on. The purpose of this talk is to understand the effect of this setup cost and whether dynamic power management makes sense.
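To see why idle power matters so much, here is a back-of-the-envelope sketch using the figures above. It assumes, for illustration only, that a server draws exactly its peak power when busy and exactly 60% of peak when idle, with nothing in between; the function name `idle_energy_fraction` is hypothetical.

```python
def idle_energy_fraction(utilization, idle_power_ratio=0.6):
    """Fraction of a server's energy that is consumed while it is idle.

    Illustrative assumption: power is 1.0 (peak) while busy and
    idle_power_ratio (relative to peak) while idle.
    """
    busy_energy = utilization * 1.0
    idle_energy = (1.0 - utilization) * idle_power_ratio
    return idle_energy / (busy_energy + idle_energy)


for u in (0.10, 0.30):
    print(f"utilization {u:.0%}: {idle_energy_fraction(u):.1%} of energy spent idle")
```

Under these assumptions, roughly 58% (at 30% utilization) to 84% (at 10% utilization) of the energy goes to servers that are doing no work.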

We first turn to theory and study the effect of setup cost in an M/M/k queue. We present the first analysis of the M/M/k/setup queueing system. We do this by introducing a new technique for analyzing infinite, repeating Markov chains, which we call Recursive Renewal Reward (RRR).
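The exact RRR analysis is beyond a short sketch, but the M/M/k/setup model itself is easy to simulate, which is a useful sanity check on any analytical result. The sketch below is a plain continuous-time Markov chain simulation, not the RRR technique. It assumes one common formulation of the model: a server that finishes a job and finds nothing queued turns off immediately, and the number of servers in setup always equals min(queued jobs, k - busy servers). The function name `simulate_mmk_setup` and its parameters are hypothetical.

```python
import random


def simulate_mmk_setup(lam, mu, alpha, k, num_events=1_000_000, seed=1):
    """Estimate the mean response time E[T] of an M/M/k/setup queue by simulation.

    Assumed model (illustrative, not necessarily the talk's exact formulation):
      * Poisson arrivals (rate lam), Exp(mu) service, Exp(alpha) setup times;
      * a server that finishes a job and finds no queued job turns off at once;
      * the number of servers in setup is always min(queued jobs, k - busy servers),
        so a setup is abandoned when a freed server takes the job it was meant for.
    All times are exponential, so the state (busy, queued) is a CTMC,
    which is simulated one transition at a time.
    """
    rng = random.Random(seed)
    busy, queued = 0, 0        # servers serving a job; jobs waiting for a server
    elapsed, area = 0.0, 0.0   # total time; time-integral of jobs in system
    for _ in range(num_events):
        in_setup = min(queued, k - busy)
        rate_arrival = lam
        rate_service = busy * mu
        rate_setup = in_setup * alpha
        total = rate_arrival + rate_service + rate_setup
        dt = rng.expovariate(total)          # time until the next transition
        area += (busy + queued) * dt
        elapsed += dt
        u = rng.random() * total             # choose which transition fired
        if u < rate_arrival:
            queued += 1                      # job waits; a server may start setup
        elif u < rate_arrival + rate_service:
            if queued > 0:
                queued -= 1                  # freed server takes the next queued job
            else:
                busy -= 1                    # nothing queued: server turns off
        else:
            busy += 1                        # setup done: server takes a queued job
            queued -= 1
    mean_jobs = area / elapsed
    return mean_jobs / lam                   # Little's law: E[T] = E[N] / lambda
```

For k = 1 this model reduces to an M/M/1 queue with exponential setup, whose mean response time has the known closed form E[T] = 1/(mu - lam) + 1/alpha; for example, `simulate_mmk_setup(0.5, 1.0, 0.5, k=1)` should come out close to 4. Letting alpha grow large recovers the ordinary M/M/k queue without setup.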

We then turn to implementation, where we implement and evaluate dynamic power management in a multi-tier data center with a key-value store workload, reminiscent of those at Facebook or Amazon. We propose a new dynamic algorithm, AutoScale, which is ideally suited to unpredictable, time-varying load, and we show that AutoScale dramatically reduces power consumption in data centers.

Joint work with: Anshul Gandhi, Alan Scheller-Wolf, and Mike Kozuch.

Biography:

Mor Harchol-Balter is a Professor of Computer Science at Carnegie Mellon University. From 2008 to 2011, she served as the Associate Department Head for Computer Science. Mor received her doctorate in 1996 from the Computer Science Department at U.C. Berkeley, under the direction of Manuel Blum, and then spent three years at MIT under the NSF Postdoctoral Fellowship in the Mathematical Sciences before joining CMU. She is a recipient of the McCandless Chair, the NSF CAREER award, multiple best paper awards, and several teaching awards, including the Herbert A. Simon Award for Teaching Excellence.

Mor is heavily involved in the ACM SIGMETRICS research community, where she served as Technical Program Chair for SIGMETRICS 2007 and as General Chair for SIGMETRICS 2013. Mor's work focuses on designing new resource allocation policies (load balancing policies, power management policies, and scheduling policies) for server farms. In 2013, Mor authored her first textbook, "Performance Modeling and Design of Computer Systems," published by Cambridge University Press. Mor is known for her work in both queueing analysis and systems implementation, but she is perhaps best known for her many successful PhD students, the majority of whom are professors at top universities.