Thresholds

Threshold basics

NannyML performance metrics and drift methods have thresholds associated with them so that they can raise alerts when necessary. The Threshold class is responsible for calculating those thresholds. Its thresholds() method takes a numpy.ndarray of values as input, typically the metric or method values calculated on reference data, and returns two values: a lower and an upper threshold value.

The threshold values are calculated as follows. When the calculator or estimator is fitted, it uses the reference data to compute the values of the related method or metric for each chunk. The thresholds() method then uses those values to calculate the associated lower and upper threshold values.

When the calculator or estimator then runs on analysis data, the method or metric value for each chunk is compared with the lower and upper threshold values. If a value breaches either threshold, the alert flag for that chunk is set to True.
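The fit-then-compare flow described above can be sketched in plain Python. This is a minimal illustration, not NannyML's implementation: the function name flag_alerts and its threshold_fn argument are hypothetical, with threshold_fn standing in for a thresholds() method that maps reference values to a (lower, upper) pair. It also assumes that "breaching" means strictly below the lower value or strictly above the upper value, and treats a None bound as disabled.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


def flag_alerts(
    reference_values: Sequence[float],
    analysis_values: Sequence[float],
    threshold_fn: Callable[[Sequence[float]], Bounds],
) -> List[bool]:
    """Hypothetical sketch: derive thresholds from reference chunk values,
    then flag each analysis chunk whose value breaches either bound."""
    # Thresholds are computed once, from the reference chunk values only.
    lower, upper = threshold_fn(reference_values)

    alerts = []
    for value in analysis_values:
        # A None bound means that threshold is disabled and can never be breached.
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        alerts.append(below or above)
    return alerts


# Usage: a constant band of (0.5, 0.9) that ignores the reference values.
print(flag_alerts([0.7, 0.8], [0.6, 0.95, 0.4], lambda ref: (0.5, 0.9)))
# [False, True, True]
```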

All NannyML calculators and estimators have a threshold property that lets you inspect their thresholds or set custom thresholds for their metrics.

Some metrics have mathematical boundaries. The F1 score, for example, is limited to \([0, 1]\). To enforce these boundaries, some metrics and drift methods within NannyML have lower and upper limits. When calculating the threshold values during fitting, NannyML checks whether the calculated threshold values fall within these limits. If they don't, the breaching threshold value(s) are overridden by the theoretical limit.
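A minimal sketch of this override step, assuming the limits are plain optional floats. The function name clip_to_limits is hypothetical and is not part of NannyML's API; it only illustrates the rule that a threshold outside the theoretical range is replaced by the limit it breaches.

```python
from typing import Optional, Tuple


def clip_to_limits(
    lower: Optional[float],
    upper: Optional[float],
    lower_limit: Optional[float] = None,
    upper_limit: Optional[float] = None,
) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical sketch: replace any threshold value that falls outside
    the metric's theoretical range with the corresponding limit."""
    # Disabled thresholds (None) stay disabled; only breaching values change.
    if lower is not None and lower_limit is not None and lower < lower_limit:
        lower = lower_limit
    if upper is not None and upper_limit is not None and upper > upper_limit:
        upper = upper_limit
    return lower, upper


# F1 is bounded by [0, 1], so a computed band of (-0.2, 1.3) becomes (0, 1).
print(clip_to_limits(-0.2, 1.3, lower_limit=0, upper_limit=1))  # (0, 1)
```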

NannyML also supports disabling the lower, upper or both thresholds. We’ll illustrate this in the following examples.

Constant thresholds

The ConstantThreshold class is a very basic threshold. It is given a lower and an upper value when initialized, and it returns these as the lower and upper threshold values, regardless of the reference data passed to it.

The ConstantThreshold can be configured using the following parameters:

lower: an optional float that sets the constant lower threshold value. Defaults to None, which disables the lower threshold.
upper: an optional float that sets the constant upper threshold value. Defaults to None, which disables the upper threshold.

>>> import numpy as np
>>> import nannyml as nml
>>> ct = nml.thresholds.ConstantThreshold(lower=0.5, upper=0.9)
>>> ct.thresholds(np.asarray(range(3)))
(0.5, 0.9)

Both lower and upper default to None, and NannyML interprets a missing value as "do not apply this threshold". For example, omitting lower disables the lower threshold:

>>> js = nml.thresholds.ConstantThreshold(upper=0.1)
>>> js.thresholds(np.asarray(range(3)))
(None, 0.1)
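To make this behavior concrete, here is a pure-Python sketch of a constant threshold. ConstantThresholdSketch is a hypothetical stand-in, not NannyML's class; it only mirrors the documented behavior: the reference values are ignored, and None disables a bound.

```python
from typing import Optional, Sequence, Tuple


class ConstantThresholdSketch:
    """Hypothetical stand-in for a constant threshold."""

    def __init__(self, lower: Optional[float] = None, upper: Optional[float] = None):
        self.lower = lower
        self.upper = upper

    def thresholds(self, data: Sequence[float]) -> Tuple[Optional[float], Optional[float]]:
        # The data is deliberately unused: constant thresholds do not adapt to reference values.
        return self.lower, self.upper


print(ConstantThresholdSketch(lower=0.5, upper=0.9).thresholds(range(3)))  # (0.5, 0.9)
print(ConstantThresholdSketch(upper=0.1).thresholds(range(100)))  # (None, 0.1)
```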

Standard deviation thresholds

The StandardDeviationThreshold class uses the mean of the data it is given as a baseline. It then adds the standard deviation of the data, scaled by an upper multiplier, to that baseline to calculate the upper threshold value, and subtracts the standard deviation, scaled by a lower multiplier, from the baseline to calculate the lower threshold value.

This is easier to illustrate in code:

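import numpy as np

data = np.asarray(range(10))
baseline = np.mean(data)   # baseline: the mean of the data
offset = np.std(data)      # offset: the standard deviation of the data
upper_offset = offset * 3  # scaled by the upper multiplier (default 3)
lower_offset = offset * 3  # scaled by the lower multiplier (default 3)
lower_threshold, upper_threshold = baseline - lower_offset, baseline + upper_offset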

The StandardDeviationThreshold can be configured using the following parameters:

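std_lower_multiplier: an optional float that scales the offset for the lower threshold value. Defaults to 3. Setting this to None disables the lower threshold.
std_upper_multiplier: an optional float that scales the offset for the upper threshold value. Defaults to 3. Setting this to None disables the upper threshold.
offset_from: a function used to aggregate the given data into the baseline value. Defaults to the mean.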

These examples show how to create a StandardDeviationThreshold. This first example demonstrates the default usage.

>>> stdt = nml.thresholds.StandardDeviationThreshold()
>>> stdt.thresholds(np.asarray(range(3)))
(-1.4494897427831779, 3.449489742783178)

This next example shows how to configure the StandardDeviationThreshold. The multipliers make the offsets smaller or larger, and offset_from replaces the mean with another aggregation function, here np.max.

>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=0.1, std_upper_multiplier=5, offset_from=np.max)
>>> stdt.thresholds(np.asarray(range(3)))
(1.9183503419072274, 6.08248290463863)
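Working through the numbers for this configured example shows where the output comes from. With offset_from=np.max, the baseline \(b\) is the maximum of \([0, 1, 2]\) instead of the mean, while the offset \(\sigma\) is still the standard deviation of the data around its mean (the np.std default, which divides by \(n\)); the output above is consistent with this reading.

```latex
\begin{aligned}
b &= \max(0, 1, 2) = 2, \qquad
\sigma = \sqrt{\tfrac{(0-1)^2 + (1-1)^2 + (2-1)^2}{3}} = \sqrt{\tfrac{2}{3}} \approx 0.8165,\\
\text{lower} &= b - 0.1\,\sigma \approx 2 - 0.0816 = 1.9184,\\
\text{upper} &= b + 5\,\sigma \approx 2 + 4.0825 = 6.0825.
\end{aligned}
```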

Providing None disables a threshold, so you can turn off one or both of them. The following example disables the lower threshold by setting std_lower_multiplier to None.

>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=None)
>>> stdt.thresholds(np.asarray(range(3)))
(None, 3.449489742783178)
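The same computation can be sketched without NannyML. std_thresholds below is a hypothetical function, not NannyML's implementation. It assumes the offset is the population standard deviation (what np.std computes by default) and uses the standard library's statistics module in place of NumPy, so results may differ from the outputs above in the last few digits.

```python
import statistics
from typing import Callable, Optional, Sequence, Tuple


def std_thresholds(
    data: Sequence[float],
    std_lower_multiplier: Optional[float] = 3,
    std_upper_multiplier: Optional[float] = 3,
    offset_from: Callable[[Sequence[float]], float] = statistics.fmean,
) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical sketch of a standard deviation threshold."""
    values = list(data)
    baseline = offset_from(values)   # mean by default; any aggregation function works
    std = statistics.pstdev(values)  # population std, same as np.std with ddof=0

    # A None multiplier disables the corresponding threshold.
    lower = None if std_lower_multiplier is None else baseline - std_lower_multiplier * std
    upper = None if std_upper_multiplier is None else baseline + std_upper_multiplier * std
    return lower, upper


print(std_thresholds(range(3)))                            # approx (-1.4495, 3.4495)
print(std_thresholds(range(3), 0.1, 5, offset_from=max))   # approx (1.9184, 6.0825)
print(std_thresholds(range(3), std_lower_multiplier=None)) # (None, approx 3.4495)
```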

Warning

The Chi-squared (\(\chi^2\)) drift detection method for categorical data does not support custom thresholds yet. It currently uses p-values for thresholding, and replacing them with, or incorporating them into, the custom thresholding system requires further research.

For now it will continue to function as it did before.

When you specify a custom threshold for Chi-squared in the UnivariateDriftCalculator, NannyML logs a warning message to clarify that the custom threshold will be ignored.