Last week, while working on customer engagement, I learned a new method of quantifying behavior of time-series data. The method is called “Control Chart” and credit to Josh Wills, our director of data science, for pointing it out. I thought I’ll share it with my readers as its easy to understand, easy to implement, flexible and very useful in many situations.
The problem is ages old – you collect measurements over time and want to know when your measurements indicate abnormal behavior. “Abnormal” is not well defined, and thats on purpose – we want our method to be flexible enough to match what you define as an issue.
For example, lets say Facebook are interested in tracking usage trend for each user, catching those with decreasing use
There are few steps to the Control Chart method:
Those zones can be used to define rules for normal and abnormal behaviors of the system. These rules are what makes the system valuable.
Examples of rules that define abnormal behavior can be:
Note how flexible the method is – you can use any combination of rules that will highlight abnormalities you are interested in highlighting.
Also note that while the traditional use indeed involves charts, the values and rules are very easy to calculate programmatically and visualization can be useful but not mandatory.
If you measure CPU utilization on few servers, visualizing the chart and actually seeing the server behavior can be useful. If you are Facebook and monitor user behavior, visualizing a time series of every one of their millions of users is hopeless. Calculating baselines, standard deviations and rules for each use is trivial.
Also note how this problem is “embarrassingly parallel“. To calculate behavior for each user, you only need to look at data for that particular user. Parallel, share-nothing platform like Hadoop can be used to scale the calculation indefinitely simply by throwing increasing number of servers on the problem. The only limit is the time it takes to calculate the rules for a single user.
Naturally, I didn’t dive into some of the complexities in using Control Charts. Such as how to select a good baseline, how to calculate standard deviation (or whether to use another statistic to define zones) and how many measurements should be examined before a behavior signals a trend. If you think this tool is useful for you, I encourage you to investigate more deeply.
Chen (Gwen) Shapira