Thresholds
NannyML calculators and estimators allow a user to configure alerting thresholds for more fine-grained control over the alerts generated by NannyML.
This tutorial will walk you through threshold basics and how to use them to customize the behavior of NannyML.
Just the code
>>> import numpy as np
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> display(reference_df.head())
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['f1'],
... chunk_size=5000,
... problem_type='classification_binary',
... )
>>> estimator.thresholds['f1']
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> columns = [('chunk', 'key'), ('chunk', 'period'), ('f1', 'value'), ('f1', 'upper_threshold'), ('f1', 'lower_threshold'), ('f1', 'alert')]
>>> display(results.to_df()[columns])
>>> metric_fig = results.plot()
>>> metric_fig.show()
>>> constant_threshold = nml.thresholds.ConstantThreshold(lower=None, upper=0.93)
>>> constant_threshold.thresholds(results.filter(period='reference').to_df()[('f1', 'value')])
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['f1'],
... chunk_size=5000,
... problem_type='classification_binary',
... thresholds={
... 'f1': constant_threshold
... }
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.to_df()[columns])
>>> metric_fig = results.plot()
>>> metric_fig.show()
Walkthrough
We will use F1-score estimation as an example use case. But first, let's dive into some of the basics.
NannyML compares the metric values it calculates to lower and upper threshold values. If a metric value falls outside the range these thresholds define, NannyML flags it as an alert.
To determine the lower and upper threshold values for a certain metric, NannyML will take the reference data, split it into chunks and calculate the metric value for each of those chunks. NannyML then applies a calculation that transforms this array of chunked reference metric values into single lower and upper threshold values.
NannyML provides simple classes to customize this calculation.
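To make this concrete, here is a minimal sketch of that calculation using the default threshold class described later in this tutorial. The per-chunk metric values below are made up purely for illustration.
>>> import numpy as np
>>> import nannyml as nml

>>> # Made-up per-chunk metric values, standing in for the chunked reference metrics
>>> chunked_reference_metrics = np.array([0.81, 0.79, 0.80, 0.82, 0.78])

>>> # The default calculation turns this array into a single (lower, upper) pair
>>> default_threshold = nml.thresholds.StandardDeviationThreshold()
>>> default_threshold.thresholds(chunked_reference_metrics)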
Constant thresholds
The ConstantThreshold class is a very basic threshold. It is given a lower and upper value when initialized, which will be returned as the lower and upper threshold values, independent of what reference data is passed to it.
The ConstantThreshold can be configured using the parameters lower and upper. They represent the constant lower and upper values used as thresholds when evaluating alerts. One or both parameters can be set to None, disabling the corresponding threshold.
This snippet shows how to create an instance of the ConstantThreshold:
>>> ct = nml.thresholds.ConstantThreshold(lower=0.5, upper=0.9)
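Since the configured constants are returned regardless of the data, calling its thresholds() method (used later in this tutorial) on any array should simply hand back the configured pair. The array below is just a placeholder.
>>> # The data passed in is ignored; the configured constants are returned as-is
>>> ct.thresholds(np.array([0.1, 0.2, 0.3]))  # expected: (0.5, 0.9)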
Standard deviation thresholds
The StandardDeviationThreshold class will use the mean of the given data as a baseline. It will then add the standard deviation of the given data, scaled by a multiplier, to that baseline to calculate the upper threshold value. Subtracting the standard deviation, scaled by a multiplier, from the baseline calculates the lower threshold value.
The StandardDeviationThreshold can be configured using the following parameters. The std_lower_multiplier and std_upper_multiplier parameters allow you to set a custom value for the multiplier applied to the standard deviation of the given data, determining the lower and upper threshold values respectively. Both can be set to None, which disables the respective threshold. The offset_from parameter takes any function that aggregates an array of numbers into a single number. This function is applied to the given data, and the resulting value serves as the baseline to which the calculated offsets are added or subtracted.
This snippet shows how to create an instance of the StandardDeviationThreshold:
>>> stdt = nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=3, std_upper_multiplier=3, offset_from=np.mean)
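Based on the description above, the resulting values should correspond to the following manual calculation. This is a sketch for intuition, not library code, and the data array is made up.
>>> # Sketch: the thresholds should roughly equal baseline ± multiplier * standard deviation
>>> data = np.array([0.93, 0.95, 0.94, 0.96, 0.92])
>>> baseline = np.mean(data)              # offset_from
>>> lower = baseline - 3 * np.std(data)   # std_lower_multiplier = 3
>>> upper = baseline + 3 * np.std(data)   # std_upper_multiplier = 3
>>> stdt.thresholds(data)                 # should be approximately (lower, upper)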
Setting custom thresholds for calculators and estimators
All calculators and estimators in NannyML support custom thresholds. You can specify a custom threshold for each drift detection method and performance metric.
Warning
The Chi-squared, \(\chi^2\), drift detection method for categorical data does not support custom thresholds yet. It currently uses p-values for thresholding; replacing them with, or incorporating them into, the custom thresholding system requires further research. For now, it will continue to function as it did before.
When specifying a custom threshold for Chi-squared in the UnivariateDriftCalculator, NannyML will log a warning message to clarify that the custom threshold will be ignored.
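For other drift detection methods, a custom threshold can be provided in the same way as for performance metrics. The sketch below is illustrative only: it assumes the UnivariateDriftCalculator accepts a thresholds mapping keyed by method name, and uses column names from the synthetic car loan dataset introduced below.
>>> # Illustrative sketch (assumed API): custom threshold for the Jensen-Shannon method
>>> drift_calculator = nml.UnivariateDriftCalculator(
...     column_names=['car_value', 'debt_to_income_ratio'],
...     continuous_methods=['jensen_shannon'],
...     chunk_size=5000,
...     thresholds={'jensen_shannon': nml.thresholds.ConstantThreshold(upper=0.2)},
... )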
We will illustrate this through performance estimation using CBPE. But first, we load our datasets.
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> display(reference_df.head())
| | id | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | repaid | timestamp | y_pred_proba | y_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 39811 | 40K - 60K € | 0.63295 | 19 | False | 40% | 0.212653 | 1 | 2018-01-01 00:00:00.000 | 0.99 | 1 |
| 1 | 1 | 12679 | 40K - 60K € | 0.718627 | 7 | True | 10% | 4.92755 | 0 | 2018-01-01 00:08:43.152 | 0.07 | 0 |
| 2 | 2 | 19847 | 40K - 60K € | 0.721724 | 17 | False | 0% | 0.520817 | 1 | 2018-01-01 00:17:26.304 | 1 | 1 |
| 3 | 3 | 22652 | 20K - 40K € | 0.705992 | 16 | False | 10% | 0.453649 | 1 | 2018-01-01 00:26:09.456 | 0.98 | 1 |
| 4 | 4 | 21268 | 60K+ € | 0.671888 | 21 | True | 30% | 5.69526 | 1 | 2018-01-01 00:34:52.608 | 0.99 | 1 |
Next, we will set up the CBPE estimator. Note that we are not providing any threshold specifications for now. Let's check out the default value for the f1 metric:
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['f1'],
... chunk_size=5000,
... problem_type='classification_binary',
... )
>>> estimator.thresholds['f1']
StandardDeviationThreshold{'std_lower_multiplier': 3, 'std_upper_multiplier': 3, 'offset_from': <function nanmean at 0x7f97d01c6160>}
After running the estimation, we can see some alerts popping up. This means a couple of threshold values have been breached.
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> columns = [('chunk', 'key'), ('chunk', 'period'), ('f1', 'value'), ('f1', 'upper_threshold'), ('f1', 'lower_threshold'), ('f1', 'alert')]
>>> display(results.to_df()[columns])
| | ('chunk', 'key') | ('chunk', 'period') | ('f1', 'value') | ('f1', 'upper_threshold') | ('f1', 'lower_threshold') | ('f1', 'alert') |
|---|---|---|---|---|---|---|
| 0 | [0:4999] | reference | 0.94296 | 0.95085 | 0.93466 | False |
| 1 | [5000:9999] | reference | 0.940827 | 0.95085 | 0.93466 | False |
| 2 | [10000:14999] | reference | 0.943211 | 0.95085 | 0.93466 | False |
| 3 | [15000:19999] | reference | 0.942901 | 0.95085 | 0.93466 | False |
| 4 | [20000:24999] | reference | 0.943178 | 0.95085 | 0.93466 | False |
| 5 | [25000:29999] | reference | 0.942702 | 0.95085 | 0.93466 | False |
| 6 | [30000:34999] | reference | 0.940858 | 0.95085 | 0.93466 | False |
| 7 | [35000:39999] | reference | 0.944588 | 0.95085 | 0.93466 | False |
| 8 | [40000:44999] | reference | 0.944518 | 0.95085 | 0.93466 | False |
| 9 | [45000:49999] | reference | 0.94443 | 0.95085 | 0.93466 | False |
| 10 | [0:4999] | analysis | 0.94303 | 0.95085 | 0.93466 | False |
| 11 | [5000:9999] | analysis | 0.941324 | 0.95085 | 0.93466 | False |
| 12 | [10000:14999] | analysis | 0.943574 | 0.95085 | 0.93466 | False |
| 13 | [15000:19999] | analysis | 0.943159 | 0.95085 | 0.93466 | False |
| 14 | [20000:24999] | analysis | 0.944204 | 0.95085 | 0.93466 | False |
| 15 | [25000:29999] | analysis | 0.911753 | 0.95085 | 0.93466 | True |
| 16 | [30000:34999] | analysis | 0.911766 | 0.95085 | 0.93466 | True |
| 17 | [35000:39999] | analysis | 0.911661 | 0.95085 | 0.93466 | True |
| 18 | [40000:44999] | analysis | 0.913763 | 0.95085 | 0.93466 | True |
| 19 | [45000:49999] | analysis | 0.914751 | 0.95085 | 0.93466 | True |
The plots clearly illustrate this:
>>> metric_fig = results.plot()
>>> metric_fig.show()
Now let's set a threshold that inverts this result by fixing the upper threshold and dropping the lower one.
>>> constant_threshold = nml.thresholds.ConstantThreshold(lower=None, upper=0.93)
>>> constant_threshold.thresholds(results.filter(period='reference').to_df()[('f1', 'value')])
(None, 0.93)
Let's use this new custom threshold for our performance estimation now. Note that we are passing our custom thresholds as a dictionary, mapping the metric name to a Threshold instance. We only have to provide our single override value; the other metrics will use the default values.
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['f1'],
... chunk_size=5000,
... problem_type='classification_binary',
... thresholds={
... 'f1': constant_threshold
... }
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.to_df()[columns])
| | ('chunk', 'key') | ('chunk', 'period') | ('f1', 'value') | ('f1', 'upper_threshold') | ('f1', 'lower_threshold') | ('f1', 'alert') |
|---|---|---|---|---|---|---|
| 0 | [0:4999] | reference | 0.94296 | 0.93 | | True |
| 1 | [5000:9999] | reference | 0.940827 | 0.93 | | True |
| 2 | [10000:14999] | reference | 0.943211 | 0.93 | | True |
| 3 | [15000:19999] | reference | 0.942901 | 0.93 | | True |
| 4 | [20000:24999] | reference | 0.943178 | 0.93 | | True |
| 5 | [25000:29999] | reference | 0.942702 | 0.93 | | True |
| 6 | [30000:34999] | reference | 0.940858 | 0.93 | | True |
| 7 | [35000:39999] | reference | 0.944588 | 0.93 | | True |
| 8 | [40000:44999] | reference | 0.944518 | 0.93 | | True |
| 9 | [45000:49999] | reference | 0.94443 | 0.93 | | True |
| 10 | [0:4999] | analysis | 0.94303 | 0.93 | | True |
| 11 | [5000:9999] | analysis | 0.941324 | 0.93 | | True |
| 12 | [10000:14999] | analysis | 0.943574 | 0.93 | | True |
| 13 | [15000:19999] | analysis | 0.943159 | 0.93 | | True |
| 14 | [20000:24999] | analysis | 0.944204 | 0.93 | | True |
| 15 | [25000:29999] | analysis | 0.911753 | 0.93 | | False |
| 16 | [30000:34999] | analysis | 0.911766 | 0.93 | | False |
| 17 | [35000:39999] | analysis | 0.911661 | 0.93 | | False |
| 18 | [40000:44999] | analysis | 0.913763 | 0.93 | | False |
| 19 | [45000:49999] | analysis | 0.914751 | 0.93 | | False |
If we check the plots, we can see that the alerts have now inverted.
>>> metric_fig = results.plot()
>>> metric_fig.show()
Default thresholds
Performance metrics, drift detection methods, and the missing values data quality metric all have the following default threshold:
nml.thresholds.StandardDeviationThreshold(std_lower_multiplier=3, std_upper_multiplier=3, offset_from=np.mean)
The unseen values data quality metric is an exception to this rule. It has default thresholds more attuned to its specific role and properties:
| Module | Functionality | Default threshold |
|---|---|---|
| Data Quality | Unseen Values Calculator | |
What’s next?
You can read more about the threshold’s inner workings in the how it works article or review the API reference documentation.