Tracking down the Villains: Outlier Detection at Netflix

Half of our reliability team is online searching for the root cause of why Netflix streaming isn’t working. None of our systems are obviously broken, but something is amiss and we’re not seeing it. After an hour of searching we realize there is one rogue server in our farm causing the problem. We missed it amongst the thousands of other servers because we were looking for a clearly visible problem, not an insidious deviant.

In Netflix’s Marvel’s Daredevil, Matt Murdock uses his heightened senses to detect when a person’s actions are abnormal. This allows him to go beyond what others see to determine the non-obvious, like when someone is lying. Similar to this, we set out to build a system that could look beyond the obvious and find the subtle differences in servers that could be causing production problems. In this post we’ll describe our automated outlier detection and remediation for unhealthy servers that has saved us from countless hours of late-night heroics.

The Netflix service currently runs on tens of thousands of servers; typically less than one percent of those become unhealthy. For example, a server’s network performance might degrade and cause elevated request processing latency. The unhealthy server will respond to health checks and show normal system-level metrics but still be operating in a suboptimal state.

A slow or unhealthy server is worse than a down server because its effects can be small enough to stay within the tolerances of our monitoring system and be overlooked by an on-call engineer scanning through graphs, but still have a customer impact and drive calls to customer service. Somewhere out there a few unhealthy servers lurk among thousands of healthy ones.

To solve this problem we use cluster analysis, which is an unsupervised machine learning technique. The goal of cluster analysis is to group objects in such a way that objects in the same cluster are more similar to each other than those in other clusters. The advantage of using an unsupervised technique is that we do not need to have labeled data, i.e., we do not need to create a training dataset that contains examples of outliers. While there are many different clustering algorithms, each with their own tradeoffs, we use Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to determine which servers are not performing like the others.

How DBSCAN Works

DBSCAN is a clustering algorithm originally proposed in 1996 by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu. This technique iterates over a set of points and marks as clusters points that are in regions with many nearby neighbors, while marking those in lower density regions as outliers.
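The density-based labelling that DBSCAN performs, as described in this post, can be sketched in a few lines of Python. This is a minimal illustration of the textbook algorithm, not the system Netflix runs: the metric values, the distance threshold `eps`, and the neighbor count `min_pts` are all hypothetical, and a real deployment would also need to scale metrics so that no single one dominates the distance.

```python
import math

NOISE = -1  # label for points that fall in low-density regions (outliers)


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # points[i] is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Hypothetical per-server metrics: (request latency in ms, errors per second).
servers = [
    (100.0, 1.0),
    (102.0, 1.1),
    (99.0, 0.9),
    (101.0, 1.2),
    (98.0, 1.0),
    (180.0, 1.3),  # a server with elevated latency
]

labels = dbscan(servers, eps=5.0, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
print(labels)
print(outliers)
```

With these made-up numbers the five similar servers land in one cluster and the high-latency server is left unclustered, which is the property the post relies on. Note that no labeled examples of bad servers were needed to get that result.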