As networks grow larger and more complex, network professionals are faced with an ever-increasing number of network incidents. Identifying potential issues is essential to maintaining uptime and performance, but with thousands of monitors, creating thresholds for each one isn't practical. That's where our dynamic threshold analysis comes in.
Network Node Manager i (NNMi) uses dynamic and static threshold violation algorithms to detect anomalies in performance data collected across discovered network infrastructure devices. While static thresholds are useful, they can be difficult to set and maintain in certain scenarios, even for operators with deep domain knowledge and tight control over the monitored network infrastructure. This is where the need for dynamic thresholds arises. The success of dynamic thresholds depends on the effectiveness of the algorithm in keeping false positives extremely low. This blog describes NNMi’s dynamic threshold capability and why it is significantly more powerful and accurate than more traditional statistical methods.
Dynamic Threshold Analysis
The process of detecting dynamic threshold violations begins with baselining the input data. Upper and lower threshold limits (also known as the baseline sleeve) are then established relative to the baselined value. Streamed data is subsequently validated against these limits, and a violation incident is generated whenever a limit is breached. The effectiveness of dynamic threshold analysis depends on how accurately the algorithm identifies seasonality and trend, and on how well it minimizes false positive threshold violations.
The Standard Deviation (SD) based threshold computation is a popular statistical method used to establish threshold upper/lower limits. However, this method has limited effectiveness as it does not consider factors such as trend and seasonal patterns in input data. On the other hand, NNMi uses an advanced smoothing method for threshold assessment. This algorithm considers seasonality, level, and trend patterns available in streamed data.
The example below demonstrates, through analysis of our test data, the effectiveness of NNMi’s thresholding capability in reducing false positive threshold violations by more than 70%. For the analysis, we considered interface utilization data from a switch. Figure 1 shows the interface utilization trace for a 24-hour period at 5-minute intervals. The server runs a load operation every quarter for an hour (utilization varies randomly between 20% and 40%). The server also runs a heavier load at the end of every four-hour period (utilization varies randomly between 70% and 90%), generating the pattern shown in Figure 1.
FIGURE 1
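A trace with a similar shape can be synthesized for experimentation. The sketch below is an assumption, not the actual test data: it reads the description above as three hours of moderate load followed by one hour of heavy load in each four-hour cycle, sampled every 5 minutes.

```python
import random

SAMPLES_PER_HOUR = 12  # 60 min / 5-min sampling interval

def generate_utilization_trace(hours=24, seed=42):
    """Synthesize an interface-utilization trace resembling the pattern
    described above (one plausible reading of it): within each 4-hour
    cycle, three hours vary randomly between 20%-40% and the final hour
    varies randomly between 70%-90%."""
    random.seed(seed)
    trace = []
    for hour in range(hours):
        heavy = (hour % 4) == 3  # last hour of each 4-hour cycle
        low, high = (70, 90) if heavy else (20, 40)
        trace.extend(random.uniform(low, high) for _ in range(SAMPLES_PER_HOUR))
    return trace

trace = generate_utilization_trace()
print(len(trace))  # 288 samples for a 24-hour period
```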
Traditional Standard Deviation based threshold analysis
The upper and lower threshold breach limits using the Standard Deviation method can be calculated as follows:
1. Calculate the mean of the data in the time slice
2. Calculate the Standard Deviation of the samples in the time slice
3. Calculate the Standard Deviation Error
4. High = Mean + (Standard Deviation Error * Limit)
5. Low = Mean - (Standard Deviation Error * Limit)
For our analysis, the limit is set to 2.
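The steps above can be sketched in a few lines. Note that the text does not define "Standard Deviation Error" precisely; this sketch assumes it means the standard error of the mean (standard deviation divided by the square root of the sample count).

```python
import statistics

def sd_threshold_limits(window, limit=2):
    """Upper/lower breach limits for one time slice, following the
    steps above: mean, standard deviation, standard-deviation error,
    then mean +/- (error * limit).

    'Standard Deviation Error' is interpreted here as the standard
    error of the mean -- an assumption, since the post does not give
    its exact formula."""
    mean = statistics.fmean(window)
    sd = statistics.stdev(window)        # sample standard deviation
    error = sd / len(window) ** 0.5      # standard error of the mean
    return mean + error * limit, mean - error * limit

# Example: limits for one slice of utilization samples
high, low = sd_threshold_limits([30, 25, 28, 35, 80])
```

Any sample above `high` or below `low` would then be counted as a threshold violation for that slice.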
The resulting Standard Deviation-based threshold analysis over a 4-hour period is shown in Figure 2. The analysis detects approximately 34 upper threshold violations and even more lower threshold violations. Because the pattern repeats every 4 hours, the same violations are raised again and again: the algorithm does not recognize the pattern present in the data. This behavior introduces significant noise into the generated incidents, increasing operator load, as a large number of incidents must be managed and the false positives weeded out.
FIGURE 2
NNMi’s Triple Exponential Smoothing based threshold analysis
Exponential smoothing is a method for forecasting univariate time series data. The algorithm forecasts a value as a weighted average of past observations, where the weights of older observations decrease exponentially. A smoothing coefficient controls how quickly those weights decay.
Triple exponential smoothing involves analysis across level, trend, and seasonality. The first order of smoothing is applied to the level, which is the weighted average of data in the analysis window; the level is smoothed using the coefficient α (alpha). The second order of exponential smoothing analyzes the time series with respect to linear trend, where the coefficient β (beta) controls the decay of the influence of changes in trend. The third order of exponential smoothing applies to time series data that have a linear trend with seasonal patterns, where the coefficient γ (gamma) controls the influence of the seasonal component.
The formula for level, trend, and seasonality computation is as follows:
Level:    ℓ(x) = α(y(x) − s(x−L)) + (1 − α)(ℓ(x−1) + b(x−1))
Trend:    b(x) = β(ℓ(x) − ℓ(x−1)) + (1 − β)b(x−1)
Seasonal: s(x) = γ(y(x) − ℓ(x)) + (1 − γ)s(x−L)
where y(x) is the current value for which the baseline is being computed, and L is the seasonal length.
The level, trend, and seasonal values are used to derive baseline and deviation values. The baseline and deviation values, in turn, derive the upper and lower limits that define the normal operating range, or baseline sleeve. The width of the baseline sleeve is a multiple of the historical deviation observed in the data. The chosen multiplier tunes the selectivity of the range: a multiplier of 2 captures about 95% of samples, while a multiplier of 3 captures about 99% of samples.
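The mechanics can be sketched as follows. This is a minimal additive triple-exponential-smoothing implementation following the level/trend/seasonal formulas above; the deviation tracking (an exponentially weighted absolute forecast error) is an assumption, since NNMi's exact deviation formula is not published, and the seasonal-component initialization is a common textbook choice, not necessarily NNMi's.

```python
def holt_winters_sleeve(y, season_len, alpha=0.5, beta=0.1, gamma=0.1,
                        multiplier=2):
    """Additive triple exponential smoothing with a baseline sleeve.

    Follows the level/trend/seasonal updates given in the formulas
    above. The deviation is tracked with an exponentially weighted
    absolute error (an assumption); the sleeve half-width is
    multiplier * deviation. Requires at least 2 * season_len samples."""
    L = season_len
    # Initialize level, trend, and seasonal components from the first season.
    level = sum(y[:L]) / L
    trend = (sum(y[L:2 * L]) - sum(y[:L])) / (L * L)
    seasonal = [y[i] - level for i in range(L)]
    deviation = 0.0
    sleeves = []  # (baseline, upper, lower) per sample after warm-up
    for x in range(L, len(y)):
        baseline = level + trend + seasonal[x % L]  # one-step forecast
        deviation = gamma * abs(y[x] - baseline) + (1 - gamma) * deviation
        sleeves.append((baseline,
                        baseline + multiplier * deviation,
                        baseline - multiplier * deviation))
        last_level = level
        level = alpha * (y[x] - seasonal[x % L]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[x % L] = gamma * (y[x] - level) + (1 - gamma) * seasonal[x % L]
    return sleeves
```

On a perfectly repeating seasonal series the forecast matches the data exactly and the sleeve collapses onto the baseline; on noisy data the sleeve widens in proportion to the observed deviation, and samples outside it would be flagged as violations.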
NNMi’s Baseline analysis for the same period with a multiplier of 2 is shown in Figure 3. The upper and lower threshold limits form the baseline sleeve.
FIGURE 3
In comparison to traditional Standard Deviation (SD) based threshold analysis, NNMi generated approximately 70% fewer upper threshold violations and no lower threshold violations. The 10 detected upper threshold violations captured the significantly higher utilization peaks in the sample data, which varied randomly across each interval. The algorithm thus ensures that NNMi can effectively differentiate between repeated high/low utilization and ad-hoc peaks and valleys in streamed data.
NNMi’s algorithm is statistically capable of providing a better threshold analysis that is dynamic in nature, learning and adjusting from the information available in incoming data. The algorithm understands the seasonal pattern in the data, adjusting the lower and upper threshold limits every season. NNMi treats 2016 records at a 5-minute interval (7 days × 24 hours × 12 samples per hour, i.e., one week of data) as one season, with a configurable count of historical seasons to include in the exponentially weighted analysis.
Ultimately, NNMi’s Dynamic Threshold Analysis helps your team cut through the noise and get to the incidents that matter. With increasing attention on the network and growing demand on network teams, NNMi allows you to keep pace with business needs and do the critical work necessary to ensure network availability and performance.
written by Venkatesh Ramteke
Head over to our Practitioner Portal to learn more about:
Configuring Baseline Settings for Interfaces
Configuring Baseline Settings for Nodes
Network Node Manager i (NNMi)
Network Automation (NA)
Network Operations Management (NOM)