NannyML performance metrics and drift methods have thresholds associated with them in order to generate
alerts when necessary. The
Threshold class is responsible for calculating those thresholds. Its
thresholds() method returns two values: a lower and an upper threshold value.
It takes a
numpy.ndarray of values as input. These are typically the metric or method values
calculated on reference data.
The process of calculating the threshold values is as follows.
The calculator or estimator runs and uses the reference data to compute the values
for the related method or metric for each chunk. Those values are used by the
thresholds() method to calculate the associated lower and upper threshold values.
When the calculator or estimator runs on analysis data,
the method or metric value for each chunk is compared with the lower and upper threshold values
to see whether it breaches either of them.
If so, the alert flag will be set to
True for that chunk.
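The comparison step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not NannyML's implementation: the helper name flag_alerts and the sample values are hypothetical, strict inequalities are assumed, and a threshold of None is treated as disabled.

```python
from typing import List, Optional, Sequence


def flag_alerts(
    values: Sequence[float],
    lower: Optional[float],
    upper: Optional[float],
) -> List[bool]:
    """Return one alert flag per chunk value.

    Hypothetical helper: a value raises an alert when it falls below the
    lower threshold or above the upper threshold. A threshold of None is
    treated as disabled and never triggers an alert.
    """
    alerts = []
    for value in values:
        below = lower is not None and value < lower
        above = upper is not None and value > upper
        alerts.append(below or above)
    return alerts


# Made-up metric values for four analysis chunks.
analysis_values = [0.82, 0.79, 0.55, 0.95]
alerts = flag_alerts(analysis_values, lower=0.6, upper=0.9)
print(alerts)  # [False, False, True, True]
```

The thresholds passed in here stand in for the values that the thresholds() method would derive from the reference chunks.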
All NannyML calculators and estimators have a
threshold property that allows you to set a custom threshold for
their metrics or inspect them.
Some metrics have mathematical boundaries. The
F1 score, for example, is limited to \([0, 1]\).
To enforce these boundaries some metrics and drift methods within NannyML have lower and upper limits.
When calculating the threshold values during fitting, NannyML will check if the calculated threshold values fall within
these limits. If they don’t, the breaching threshold value(s) will be overridden by the theoretical limit.
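The override described above amounts to clamping each calculated threshold to the metric's theoretical range. A minimal sketch, assuming a hypothetical helper apply_limits (not part of NannyML's API) and made-up threshold values:

```python
from typing import Optional, Tuple


def apply_limits(
    lower: Optional[float],
    upper: Optional[float],
    lower_limit: Optional[float] = None,
    upper_limit: Optional[float] = None,
) -> Tuple[Optional[float], Optional[float]]:
    """Replace thresholds that fall outside the theoretical limits.

    Hypothetical helper. A threshold of None (disabled) is left untouched,
    and a limit of None means the metric is unbounded on that side.
    """
    if lower is not None and lower_limit is not None and lower < lower_limit:
        lower = lower_limit
    if upper is not None and upper_limit is not None and upper > upper_limit:
        upper = upper_limit
    return lower, upper


# F1 is bounded by [0, 1]. The calculated upper threshold of 1.07 breaches
# the upper limit and is overridden; the lower threshold is kept.
print(apply_limits(0.62, 1.07, lower_limit=0.0, upper_limit=1.0))  # (0.62, 1.0)
```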
NannyML also supports disabling the lower, upper or both thresholds. We’ll illustrate this in the following examples.
The ConstantThreshold class is a very basic threshold. It is given a lower and upper value
when initialized and these will be returned as the lower and upper threshold values, independent of what reference data
is passed to it.
ConstantThreshold can be configured using the following parameters:
lower: an optional float that sets the constant lower threshold value. Defaults to None. Setting this to None disables the lower threshold.
upper: an optional float that sets the constant upper threshold value. Defaults to None. Setting this to None disables the upper threshold.
>>> import numpy as np
>>> import nannyml as nml
>>> ct = nml.thresholds.ConstantThreshold(lower=0.5, upper=0.9)
>>> ct.thresholds(np.asarray(range(3)))
(0.5, 0.9)
The lower and upper parameters have a default value of
None. For example,
NannyML interprets providing no
lower threshold value as meaning that no lower threshold should be applied.
>>> js = nml.thresholds.ConstantThreshold(upper=0.1)
>>> js.thresholds(np.asarray(range(3)))
(None, 0.1)
Standard deviation thresholds
The StandardDeviationThreshold class uses the mean of the data it is given as
a baseline. It adds the standard deviation of the given data, scaled by a multiplier, to that baseline to
calculate the upper threshold value. By subtracting the standard deviation, scaled by a multiplier, from the baseline
it calculates the lower threshold value.
This is easier to illustrate in code:
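import numpy as np

data = np.asarray(range(10))
baseline = np.mean(data)  # the mean of the data is the baseline
offset = np.std(data)  # one standard deviation of the data
upper_offset = offset * 3  # scaled by the upper multiplier
lower_offset = offset * 3  # scaled by the lower multiplier
lower_threshold, upper_threshold = baseline - lower_offset, baseline + upper_offset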
StandardDeviationThreshold can be configured using the following parameters:
std_lower_multiplier: an optional float that scales the offset for the lower threshold value. Defaults to 3.
std_upper_multiplier: an optional float that scales the offset for the upper threshold value. Defaults to 3.
offset_from: a function used to aggregate the given data into the baseline value. The mean is used by default.
These examples show how to create a StandardDeviationThreshold.
This first example demonstrates the default usage.
>>> stdt = nml.thresholds.StandardDeviationThreshold()
>>> stdt.thresholds(np.asarray(range(3)))
(-1.4494897427831779, 3.449489742783178)
This next example shows how to configure the StandardDeviationThreshold.
Multipliers can make the offset smaller or larger, and alternatives to the mean may be provided as well.
>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=0.1, std_upper_multiplier=5, offset_from=np.max)
>>> stdt.thresholds(np.asarray(range(3)))
(1.9183503419072274, 6.08248290463863)
By providing a None value you can disable one or more thresholds. The following example shows how to disable the lower threshold by setting the appropriate multiplier to None.
>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=None)
>>> stdt.thresholds(np.asarray(range(3)))
(None, 3.449489742783178)
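The three results above can be reproduced with a small stand-alone function, which may help to see how the multipliers, the aggregation function and None interact. This is a sketch written in plain Python so that it runs without NannyML or NumPy installed. The function name std_thresholds is hypothetical, its parameter names simply mirror the ones documented above, and it is not NannyML's actual implementation.

```python
import statistics
from typing import Callable, Optional, Sequence, Tuple


def std_thresholds(
    data: Sequence[float],
    std_lower_multiplier: Optional[float] = 3,
    std_upper_multiplier: Optional[float] = 3,
    offset_from: Callable[[Sequence[float]], float] = statistics.mean,
) -> Tuple[Optional[float], Optional[float]]:
    """Stand-alone sketch of the calculation, not NannyML's implementation."""
    baseline = offset_from(data)
    # Population standard deviation, matching the default of numpy.std.
    std = statistics.pstdev(data)
    # A multiplier of None disables the corresponding threshold.
    lower = None if std_lower_multiplier is None else baseline - std * std_lower_multiplier
    upper = None if std_upper_multiplier is None else baseline + std * std_upper_multiplier
    return lower, upper


data = [0, 1, 2]
# Defaults: mean baseline, three standard deviations on either side.
print(std_thresholds(data))
# Custom multipliers and the maximum as baseline.
print(std_thresholds(data, std_lower_multiplier=0.1, std_upper_multiplier=5, offset_from=max))
# Lower threshold disabled.
print(std_thresholds(data, std_lower_multiplier=None))
```

Up to floating-point rounding, the printed tuples should match the values returned by StandardDeviationThreshold in the examples above.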
The Chi-squared, \(\chi^2\), drift detection method for categorical data does not support custom thresholds yet. It currently uses p-values for thresholding, and replacing them with the custom thresholding system, or incorporating them into it, requires further research.
For now it will continue to function as it did before.
When a custom threshold is specified for Chi-squared,
NannyML will log a warning message to clarify that the custom threshold will be ignored.