Creating and Estimating a Custom Binary Classification Metric
This tutorial explains how to use NannyML to estimate a custom metric based on the confusion matrix for binary classification models in the absence of target data. In particular, we will be creating a balanced accuracy metric. To find out how CBPE estimates the confusion matrix components, read the explanation of Confidence-based Performance Estimation.
Just the Code
>>> import nannyml as nml
>>> from IPython.display import display
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]
>>> display(reference_df.head(3))
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... metrics=['confusion_matrix'],
... problem_type='classification_binary',
... normalize_confusion_matrix="all",
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> results_data = results.to_df()
>>> display(results_data)
>>> true_pos_rate = results_data['true_positive']['value'].values
>>> false_pos_rate = results_data['false_positive']['value'].values
>>> true_neg_rate = results_data['true_negative']['value'].values
>>> false_neg_rate = results_data['false_negative']['value'].values
>>> sensitivity = true_pos_rate / (true_pos_rate + false_neg_rate)
>>> specificity = true_neg_rate / (true_neg_rate + false_pos_rate)
>>> balanced_accuracy = (sensitivity + specificity) / 2
>>> num_ref_chunks = len(results.filter(period='reference').to_df())
>>> reference_index = np.arange(num_ref_chunks)
>>> analysis_index = np.arange(num_ref_chunks, len(results_data))
>>> plt.plot(reference_index, balanced_accuracy[:num_ref_chunks], label='Reference', marker='o')
>>> plt.plot(analysis_index, balanced_accuracy[num_ref_chunks:], label='Analysis', marker='o')
>>> plt.axvline(x=num_ref_chunks-0.5, color='gray')
>>> plt.xlabel('Chunk Number')
>>> plt.ylabel('Estimated Balanced Accuracy')
>>> plt.title('Estimated Balanced Accuracy')
>>> plt.legend()
>>> plt.show()
Walkthrough
While NannyML offers out-of-the-box support for the estimation of a number of metrics (see the full list on our Estimating Performance for Binary Classification page), it is also possible to create custom metrics. In this tutorial we will be creating a balanced accuracy metric, using the confusion matrix as a building block.
For simplicity this guide is based on a synthetic dataset included in the library, where the monitored model predicts whether a customer will repay a loan to buy a car. You can read more about this synthetic dataset here.
In order to monitor a model, NannyML needs to learn about it from a reference dataset. Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset. You can read more about this in our section on data periods.
We start by importing the libraries we’ll need and loading the dataset we’ll be using:
>>> import nannyml as nml
>>> from IPython.display import display
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]
>>> display(reference_df.head(3))
| | id | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | repaid | timestamp | y_pred_proba | y_pred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 39811 | 40K - 60K € | 0.63295 | 19 | False | 40% | 0.212653 | 1 | 2018-01-01 00:00:00.000 | 0.99 | 1 |
| 1 | 1 | 12679 | 40K - 60K € | 0.718627 | 7 | True | 10% | 4.92755 | 0 | 2018-01-01 00:08:43.152 | 0.07 | 0 |
| 2 | 2 | 19847 | 40K - 60K € | 0.721724 | 17 | False | 0% | 0.520817 | 1 | 2018-01-01 00:17:26.304 | 1 | 1 |
Next we create the Confidence-based Performance Estimation (CBPE) estimator to estimate the confusion matrix elements that we will need for our custom metric. To estimate the confusion matrix elements we specify the metrics parameter as ['confusion_matrix']. We also set the normalize_confusion_matrix parameter to "all" to get a rate instead of a count for each cell.
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... metrics=['confusion_matrix'],
... problem_type='classification_binary',
... normalize_confusion_matrix="all",
... )
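To make the effect of normalize_confusion_matrix="all" concrete: each cell is divided by the total number of observations, so the four rates always sum to 1. A minimal sketch on toy labels (purely illustrative, not drawn from the car loan dataset):

```python
import numpy as np

# Toy labels, for illustration only:
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

# "all" normalization: every cell divided by the total observation
# count, so the four rates sum to 1
total = len(y_true)
rates = {"tp": tp / total, "tn": tn / total,
         "fp": fp / total, "fn": fn / total}
print(rates)
```

With normalize_confusion_matrix=None the counts themselves would be returned instead.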
The CBPE estimator is then fitted using the fit() method on the reference data.
>>> estimator.fit(reference_df)
The fitted estimator can be used to estimate performance on other data, for which performance cannot be calculated directly. Typically, this would be used on the latest production data, where targets are missing. In our example this is the analysis_df data. NannyML can then output a dataframe that contains all the results.
>>> results = estimator.estimate(analysis_df)
>>> results_data = results.to_df()
>>> display(results_data)
The results dataframe uses a two-level column index: a `chunk` group (`key`, `chunk_index`, `start_index`, `end_index`, `start_date`, `end_date`, `period`) plus, for each confusion matrix element (`true_positive`, `true_negative`, `false_positive`, `false_negative`), the columns `value`, `sampling_error`, `realized`, `upper_confidence_boundary`, `lower_confidence_boundary`, `upper_threshold`, `lower_threshold` and `alert`. The estimated `value` of each element per chunk, with the elements whose `alert` flag is True, is shown condensed below:

| key | period | true_positive | true_negative | false_positive | false_negative | alerts |
|---|---|---|---|---|---|---|
| [0:4999] | reference | 0.458185 | 0.486383 | 0.0204154 | 0.0350166 | |
| [5000:9999] | reference | 0.456855 | 0.485678 | 0.0207453 | 0.0367222 | |
| [10000:14999] | reference | 0.469963 | 0.473446 | 0.0208371 | 0.035754 | |
| [15000:19999] | reference | 0.46226 | 0.481754 | 0.0207396 | 0.035246 | |
| [20000:24999] | reference | 0.468431 | 0.475128 | 0.0209695 | 0.0354715 | |
| [25000:29999] | reference | 0.459727 | 0.484389 | 0.0208731 | 0.0350115 | |
| [30000:34999] | reference | 0.465254 | 0.476255 | 0.0201459 | 0.0383451 | |
| [35000:39999] | reference | 0.469571 | 0.475337 | 0.0210291 | 0.0340635 | |
| [40000:44999] | reference | 0.465682 | 0.479609 | 0.0207181 | 0.033991 | |
| [45000:49999] | reference | 0.466762 | 0.47831 | 0.0214382 | 0.03349 | |
| [0:4999] | analysis | 0.481766 | 0.460026 | 0.0212337 | 0.0369745 | TP, TN |
| [5000:9999] | analysis | 0.454646 | 0.488676 | 0.0199543 | 0.0367245 | |
| [10000:14999] | analysis | 0.455756 | 0.489736 | 0.0198442 | 0.0346643 | |
| [15000:19999] | analysis | 0.457828 | 0.486988 | 0.0205719 | 0.0346121 | |
| [20000:24999] | analysis | 0.468372 | 0.476273 | 0.020428 | 0.034927 | |
| [25000:29999] | analysis | 0.461246 | 0.449469 | 0.0287544 | 0.0605314 | TN, FP, FN |
| [30000:34999] | analysis | 0.459067 | 0.452083 | 0.0283335 | 0.060517 | TN, FP, FN |
| [35000:39999] | analysis | 0.458246 | 0.452947 | 0.0295542 | 0.0592531 | TN, FP, FN |
| [40000:44999] | analysis | 0.453561 | 0.460828 | 0.0272388 | 0.0583718 | TN, FP, FN |
| [45000:49999] | analysis | 0.473578 | 0.438153 | 0.0296219 | 0.0586468 | TN, FP, FN |

For the reference period the `realized` column also contains the actual rates computed from targets; for the analysis period, where targets are absent, it is `nan`.
From these results we need the value of each confusion matrix component for each chunk of data. To get them, we index into the results dataframe as is done below:
>>> true_pos_rate = results_data['true_positive']['value'].values
>>> false_pos_rate = results_data['false_positive']['value'].values
>>> true_neg_rate = results_data['true_negative']['value'].values
>>> false_neg_rate = results_data['false_negative']['value'].values
Now that we have these values, we can use them to calculate the sensitivity and specificity for each chunk of data, and from those the balanced accuracy per chunk.
As a reminder, the balanced accuracy is defined as:

\[\text{balanced accuracy} = \frac{\text{sensitivity} + \text{specificity}}{2}\]

and the sensitivity and specificity are defined as:

\[\text{sensitivity} = \frac{TP}{TP + FN} \qquad \text{specificity} = \frac{TN}{TN + FP}\]

where \(TP\) is the number of true positives (or true positive rate), \(TN\) is the number of true negatives (or true negative rate), \(FP\) is the number of false positives (or false positive rate), and \(FN\) is the number of false negatives (or false negative rate).
>>> sensitivity = true_pos_rate / (true_pos_rate + false_neg_rate)
>>> specificity = true_neg_rate / (true_neg_rate + false_pos_rate)
>>> balanced_accuracy = (sensitivity + specificity) / 2
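As a quick sanity check, plugging in the estimated rates from the first reference chunk of the results above gives a balanced accuracy of roughly 0.944:

```python
# Estimated rates for the first reference chunk, taken from the
# results shown earlier in this tutorial:
tp_rate, tn_rate = 0.458185, 0.486383
fp_rate, fn_rate = 0.0204154, 0.0350166

sensitivity = tp_rate / (tp_rate + fn_rate)   # ~0.929
specificity = tn_rate / (tn_rate + fp_rate)   # ~0.960
balanced_accuracy = (sensitivity + specificity) / 2
print(round(balanced_accuracy, 3))  # → 0.944
```

Because the rates were normalized by a common denominator, using rates rather than raw counts leaves sensitivity and specificity unchanged.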
To distinguish between the balanced accuracy for the reference data and the analysis data, we can get the number of chunks in each period and use it to index the balanced_accuracy array.
>>> num_ref_chunks = len(results.filter(period='reference').to_df())
>>> reference_index = np.arange(num_ref_chunks)
>>> analysis_index = np.arange(num_ref_chunks, len(results_data))
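An equivalent way to count the reference chunks is to read the `period` column straight off the two-level column index of the results dataframe. A minimal sketch on a toy frame mimicking that layout (hypothetical values, for illustration only):

```python
import pandas as pd

# Toy frame with the same two-level column layout as the results
# dataframe: a ('chunk', 'period') column plus metric columns
results_data = pd.DataFrame({
    ("chunk", "period"): ["reference"] * 3 + ["analysis"] * 2,
    ("true_positive", "value"): [0.46, 0.46, 0.47, 0.48, 0.45],
})

# Count reference chunks directly from the period column:
num_ref_chunks = int((results_data[("chunk", "period")] == "reference").sum())
print(num_ref_chunks)  # → 3
```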
Since balanced accuracy is not supported out of the box with NannyML, we will create a custom plot to visualize the performance estimation results.
>>> plt.plot(reference_index, balanced_accuracy[:num_ref_chunks], label='Reference', marker='o')
>>> plt.plot(analysis_index, balanced_accuracy[num_ref_chunks:], label='Analysis', marker='o')
>>> plt.axvline(x=num_ref_chunks-0.5, color='gray')
>>> plt.xlabel('Chunk Number')
>>> plt.ylabel('Estimated Balanced Accuracy')
>>> plt.title('Estimated Balanced Accuracy')
>>> plt.legend()
>>> plt.show()
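NannyML's built-in metrics raise alerts when an estimated value falls outside thresholds computed on the reference chunks (by default, the mean plus or minus three standard deviations). We can apply the same idea to our custom metric. The sketch below uses illustrative balanced accuracy values rather than NannyML's API:

```python
import numpy as np

# Illustrative balanced accuracy values; the first five play the role
# of reference chunks, the rest of analysis chunks (made-up numbers):
balanced_accuracy = np.array([0.94, 0.95, 0.94, 0.945, 0.95,
                              0.94, 0.93, 0.90, 0.89, 0.88])
num_ref_chunks = 5

# Thresholds from the reference period: mean +/- 3 standard
# deviations, mirroring NannyML's default standard-deviation
# thresholding
ref_values = balanced_accuracy[:num_ref_chunks]
lower = ref_values.mean() - 3 * ref_values.std()
upper = ref_values.mean() + 3 * ref_values.std()

# Flag chunks whose estimated balanced accuracy leaves the band:
alerts = (balanced_accuracy < lower) | (balanced_accuracy > upper)
print(alerts[num_ref_chunks:])
```

With these numbers, the last four analysis chunks fall below the lower threshold and would be flagged.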
Insights
After reviewing the performance estimation results, we should be able to see any indications of performance change that NannyML has detected, based on the model's inputs and outputs alone.
What’s next
The Data Drift functionality can help us understand whether data drift is causing the performance problem. When the target values become available, we can compare realized and estimated results for our custom performance metric.