Thresholds

NannyML calculators and estimators allow you to configure alerting thresholds, giving more fine-grained control over the alerts they generate.

This tutorial will walk you through threshold basics and how to use them to customize the behavior of NannyML.

Just the code

>>> import numpy as np

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> display(reference_df.head())

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['f1'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimator.thresholds['f1']

>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> columns = [('chunk', 'key'), ('chunk', 'period'), ('f1', 'value'), ('f1', 'upper_threshold'), ('f1', 'lower_threshold'), ('f1', 'alert')]
>>> display(results.to_df()[columns])

>>> metric_fig = results.plot()
>>> metric_fig.show()

>>> constant_threshold = nml.thresholds.ConstantThreshold(lower=None, upper=0.93)
>>> constant_threshold.thresholds(results.filter(period='reference').to_df()[('f1', 'value')])

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['f1'],
...     chunk_size=5000,
...     problem_type='classification_binary',
...     thresholds={
...         'f1': constant_threshold
...     }
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.to_df()[columns])

>>> metric_fig = results.plot()
>>> metric_fig.show()

Walkthrough

We’ll use an F1-score estimation as an example use case. But first, let’s dive into some of the basics.

NannyML compares the metric values it calculates to lower and upper threshold values. If a metric value falls outside the range bounded by these thresholds, NannyML flags it as an alert.

To determine the lower and upper threshold values for a certain metric, NannyML will take the reference data, split it into chunks and calculate the metric value for each of those chunks. NannyML then applies a calculation that transforms this array of chunked reference metric values into a single lower threshold value and a single upper threshold value.
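
To make this concrete, here is a minimal sketch of that calculation (not NannyML's internal code), using the default behavior described later on this page: the mean of the chunked reference metric values as the baseline, offset by three standard deviations. The chunk values are made up for illustration.

>>> # Hypothetical per-chunk F1 values calculated on reference data.
>>> chunk_values = np.array([0.943, 0.941, 0.943, 0.943, 0.942])
>>> baseline = np.mean(chunk_values)
>>> offset = 3 * np.std(chunk_values)
>>> lower_threshold, upper_threshold = baseline - offset, baseline + offset
>>> # A value outside [lower_threshold, upper_threshold] is flagged as an alert.
>>> 0.91 < lower_threshold
True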

NannyML provides simple classes to customize this calculation.

Constant thresholds

The ConstantThreshold class is the most basic threshold. It is given fixed lower and upper values at initialization and returns these as the lower and upper threshold values, regardless of the reference data passed to it.

The ConstantThreshold can be configured using the parameters lower and upper. They represent the constant lower and upper values used as thresholds when evaluating alerts. Either parameter can be set to None, which disables the corresponding threshold.

This snippet shows how to create an instance of the ConstantThreshold:

>>> ct = nml.thresholds.ConstantThreshold(lower=0.5, upper=0.9)
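
Since the configured values are returned unchanged, calling the thresholds() method on any array of reference metric values (made up here for illustration) simply echoes them back:

>>> ct.thresholds(np.array([0.1, 0.2, 0.3]))  # the data is ignored
(0.5, 0.9)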

Standard deviation thresholds

The StandardDeviationThreshold class will use the mean of the data it is given as a baseline. It will then add the standard deviation of the given data, scaled by a multiplier, to that baseline to calculate the upper threshold value. By subtracting the standard deviation, scaled by a multiplier, from the baseline it calculates the lower threshold value.

The StandardDeviationThreshold can be configured using the following parameters. The std_lower_multiplier and std_upper_multiplier parameters set the multipliers applied to the standard deviation of the given data when computing the lower and upper threshold values, respectively. Either can be set to None, which disables the corresponding threshold.

The offset_from parameter takes any function that aggregates an array of numbers into a single number. This function is applied to the given data, and the resulting value serves as the baseline from which the scaled standard deviation is added or subtracted.

This snippet shows how to create an instance of the StandardDeviationThreshold:

>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=3, std_upper_multiplier=3, offset_from=np.mean)
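
As a variation, here is a sketch using the median as the baseline instead of the mean; the input array is made up for illustration:

>>> # Hypothetical chunked reference metric values.
>>> chunk_values = np.array([0.92, 0.94, 0.96])
>>> stdt_median = nml.thresholds.StandardDeviationThreshold(
...     std_lower_multiplier=3,
...     std_upper_multiplier=3,
...     offset_from=np.median,
... )
>>> # Returns (median - 3 * std, median + 3 * std) for the given data.
>>> stdt_median.thresholds(chunk_values)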

Setting custom thresholds for calculators and estimators

All calculators and estimators in NannyML support custom thresholds. You can specify a custom threshold for each drift detection method and performance metric.
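Custom thresholds are passed as a dictionary keyed by metric or method name. As a sketch, a dictionary mixing both threshold types for two CBPE metrics (the specific values here are illustrative, not recommendations):

>>> # One Threshold instance per metric; metrics not listed keep their defaults.
>>> custom_thresholds = {
...     'f1': nml.thresholds.ConstantThreshold(lower=0.9),
...     'roc_auc': nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=2, std_upper_multiplier=2),
... }
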

Warning

The Chi-squared (χ²) drift detection method for categorical data does not support custom thresholds yet. It currently uses p-values for thresholding; replacing them with, or incorporating them into, the custom thresholding system requires further research.

For now it will continue to function as it did before.

When a custom threshold is specified for Chi-squared in the UnivariateDriftCalculator, NannyML will log a warning message to clarify that the custom threshold will be ignored.

We’ll illustrate this by means of performance estimation using CBPE. First we load our datasets.

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> display(reference_df.head())

   car_value salary_range  debt_to_income_ratio  loan_length repaid_loan_on_prev_car size_of_downpayment  driver_tenure  repaid                timestamp  y_pred_proba  y_pred
0      39811  40K - 60K €               0.63295           19                   False                 40%       0.212653       1  2018-01-01 00:00:00.000          0.99       1
1      12679  40K - 60K €              0.718627            7                    True                 10%        4.92755       0  2018-01-01 00:08:43.152          0.07       0
2      19847  40K - 60K €              0.721724           17                   False                  0%       0.520817       1  2018-01-01 00:17:26.304          1.00       1
3      22652  20K - 40K €              0.705992           16                   False                 10%       0.453649       1  2018-01-01 00:26:09.456          0.98       1
4      21268       60K+ €              0.671888           21                    True                 30%        5.69526       1  2018-01-01 00:34:52.608          0.99       1

Next we’ll set up the CBPE estimator. Note that we’re not providing any threshold specification for now. Let’s inspect the default threshold for the f1 metric:

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['f1'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimator.thresholds['f1']
StandardDeviationThreshold{'std_lower_multiplier': 3, 'std_upper_multiplier': 3, 'offset_from': <function mean at 0x7fda4d54fdc0>}

After running the estimation we can see some alerts popping up, meaning the threshold values have been breached in a couple of chunks.

>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> columns = [('chunk', 'key'), ('chunk', 'period'), ('f1', 'value'), ('f1', 'upper_threshold'), ('f1', 'lower_threshold'), ('f1', 'alert')]
>>> display(results.to_df()[columns])

    ('chunk', 'key') ('chunk', 'period')  ('f1', 'value')  ('f1', 'upper_threshold')  ('f1', 'lower_threshold')  ('f1', 'alert')
0           [0:4999]           reference          0.94296                    0.95085                    0.93466            False
1        [5000:9999]           reference         0.940827                    0.95085                    0.93466            False
2      [10000:14999]           reference         0.943211                    0.95085                    0.93466            False
3      [15000:19999]           reference         0.942901                    0.95085                    0.93466            False
4      [20000:24999]           reference         0.943178                    0.95085                    0.93466            False
5      [25000:29999]           reference         0.942702                    0.95085                    0.93466            False
6      [30000:34999]           reference         0.940858                    0.95085                    0.93466            False
7      [35000:39999]           reference         0.944588                    0.95085                    0.93466            False
8      [40000:44999]           reference         0.944518                    0.95085                    0.93466            False
9      [45000:49999]           reference          0.94443                    0.95085                    0.93466            False
10          [0:4999]            analysis          0.94303                    0.95085                    0.93466            False
11       [5000:9999]            analysis         0.941324                    0.95085                    0.93466            False
12     [10000:14999]            analysis         0.943574                    0.95085                    0.93466            False
13     [15000:19999]            analysis         0.943159                    0.95085                    0.93466            False
14     [20000:24999]            analysis         0.944204                    0.95085                    0.93466            False
15     [25000:29999]            analysis         0.911753                    0.95085                    0.93466             True
16     [30000:34999]            analysis         0.911766                    0.95085                    0.93466             True
17     [35000:39999]            analysis         0.911661                    0.95085                    0.93466             True
18     [40000:44999]            analysis         0.913763                    0.95085                    0.93466             True
19     [45000:49999]            analysis         0.914751                    0.95085                    0.93466             True

The plots clearly illustrate this:

>>> metric_fig = results.plot()
>>> metric_fig.show()
[Figure: estimated F1 with the default thresholds (est_f1_default_thresholds.svg)]

Now let’s set a threshold that inverts this result by fixing the upper threshold at a constant value and dropping the lower threshold.

>>> constant_threshold = nml.thresholds.ConstantThreshold(lower=None, upper=0.93)
>>> constant_threshold.thresholds(results.filter(period='reference').to_df()[('f1', 'value')])
(None, 0.93)

Let’s use this new custom threshold for our performance estimation. Note that we pass custom thresholds as a dictionary, mapping a metric name to a Threshold instance. We only need to provide the single override; any other metrics will use their default thresholds.

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['f1'],
...     chunk_size=5000,
...     problem_type='classification_binary',
...     thresholds={
...         'f1': constant_threshold
...     }
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.to_df()[columns])

    ('chunk', 'key') ('chunk', 'period')  ('f1', 'value')  ('f1', 'upper_threshold')  ('f1', 'lower_threshold')  ('f1', 'alert')
0           [0:4999]           reference          0.94296                       0.93                       None             True
1        [5000:9999]           reference         0.940827                       0.93                       None             True
2      [10000:14999]           reference         0.943211                       0.93                       None             True
3      [15000:19999]           reference         0.942901                       0.93                       None             True
4      [20000:24999]           reference         0.943178                       0.93                       None             True
5      [25000:29999]           reference         0.942702                       0.93                       None             True
6      [30000:34999]           reference         0.940858                       0.93                       None             True
7      [35000:39999]           reference         0.944588                       0.93                       None             True
8      [40000:44999]           reference         0.944518                       0.93                       None             True
9      [45000:49999]           reference          0.94443                       0.93                       None             True
10          [0:4999]            analysis          0.94303                       0.93                       None             True
11       [5000:9999]            analysis         0.941324                       0.93                       None             True
12     [10000:14999]            analysis         0.943574                       0.93                       None             True
13     [15000:19999]            analysis         0.943159                       0.93                       None             True
14     [20000:24999]            analysis         0.944204                       0.93                       None             True
15     [25000:29999]            analysis         0.911753                       0.93                       None            False
16     [30000:34999]            analysis         0.911766                       0.93                       None            False
17     [35000:39999]            analysis         0.911661                       0.93                       None            False
18     [40000:44999]            analysis         0.913763                       0.93                       None            False
19     [45000:49999]            analysis         0.914751                       0.93                       None            False

If we check the plots, we can see that the alerts have now been inverted.

>>> metric_fig = results.plot()
>>> metric_fig.show()
[Figure: estimated F1 with the inverted constant threshold (est_f1_inverted_thresholds.svg)]

Default thresholds

Performance metrics and drift detection methods have the following default threshold:

StandardDeviationThreshold(std_lower_multiplier=3, std_upper_multiplier=3, offset_from=np.mean)

Some drift detection methods are exceptions to this rule. They have default thresholds more attuned to distances:

Calculator                   Drift method    Default threshold
Univariate drift calculator  jensen_shannon  ConstantThreshold(upper=0.1)
Univariate drift calculator  hellinger       ConstantThreshold(upper=0.1)
Univariate drift calculator  l_infinity      ConstantThreshold(upper=0.1)
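
These defaults can be overridden in the same way as the performance metric thresholds, by passing a thresholds dictionary keyed by method name. A sketch for the univariate drift calculator, assuming column names from the synthetic car loan dataset (the other constructor arguments follow the usual univariate drift setup):

>>> # Relax the Jensen-Shannon threshold from its 0.1 default to 0.2.
>>> calc = nml.UnivariateDriftCalculator(
...     column_names=['car_value', 'debt_to_income_ratio'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['jensen_shannon'],
...     thresholds={'jensen_shannon': nml.thresholds.ConstantThreshold(upper=0.2)},
... )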

What’s next?

You can read more about the threshold inner workings in the how it works article, or review the API reference documentation.