Cloud Control Systems - Real Time Analytics

Simon Tuffs, consultant

Abstract:

Presentation slides

Event-Driven, Service-Oriented Cloud Systems can be viewed as systems of multi-variable time-series data. This makes them amenable to analysis using discrete-time signal-processing and feedback control system techniques. Some illustrative examples from a real-world Cloud System are presented: anomaly detection and classification using statistical tests, correlation, and causal inference based on service dependencies; qualification of new code deployments into production, with automated failure detection and recovery; and auto-scaling challenges and remedies, including the use of feed-forward predictors to survive outages. A general architecture for Cloud Systems Analytics is presented, together with a view of the interesting challenges ahead.
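
To make the idea of anomaly detection with a statistical test on a metric time series concrete, here is a minimal sketch in Python. It is not the method used in the talk: the rolling z-score test, the function name zscore_anomalies, and the window and threshold parameters are all assumptions chosen for illustration.

```python
from collections import deque
from math import sqrt


def zscore_anomalies(samples, window=20, threshold=3.0):
    """Flag samples that sit far outside the recent history of a metric.

    Hypothetical helper for illustration: a sample is reported when it lies
    more than `threshold` sample standard deviations away from the mean of
    the preceding `window` samples.
    """
    history = deque(maxlen=window)  # sliding window of the most recent samples
    anomalies = []
    for i, x in enumerate(samples):
        # Only test once a full window of history is available.
        if len(history) == window:
            mean = sum(history) / window
            var = sum((h - mean) ** 2 for h in history) / (window - 1)
            std = sqrt(var)
            # A zero standard deviation (flat history) is skipped rather
            # than dividing by zero.
            if std > 0 and abs(x - mean) > threshold * std:
                anomalies.append(i)
        # The sample joins the history even if it was flagged, so one
        # outlier temporarily widens the band for the samples after it.
        history.append(x)
    return anomalies
```

Called on a series of request rates or error counts sampled at a fixed interval, the function returns the indices of the samples that were flagged. A production detector would also need to handle seasonality and correlated metrics, which this sketch ignores.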

Biography:

Dr. P. Simon Tuffs obtained his D.Phil. degree in Self-Tuning Control Systems at the University of Oxford, and for the past 25 years has worked in a wide range of positions spanning Control and Signal Processing Systems and Software Engineering. He is currently engaged at Netflix performing real-time analytics: automated decision making based on real-time data to diagnose, adapt, and repair the large-scale Netflix Cloud system.