Creating and Estimating a Custom Binary Classification Metric

This tutorial explains how to use NannyML to estimate a custom metric based on the confusion matrix for binary classification models in the absence of target data. In particular, we will create a balanced accuracy metric. To find out how CBPE estimates the confusion matrix components, read the explanation of Confidence-based Performance Estimation.

Just the Code

>>> import nannyml as nml
>>> from IPython.display import display
>>> import numpy as np
>>> import matplotlib.pyplot as plt

>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]

>>> display(reference_df.head(3))

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     metrics=['confusion_matrix'],
...     problem_type='classification_binary',
...     normalize_confusion_matrix="all",
... )

>>> estimator.fit(reference_df)

>>> results = estimator.estimate(analysis_df)

>>> results_data = results.to_df()
>>> display(results_data)

>>> true_pos_rate = results_data['true_positive']['value'].values
>>> false_pos_rate = results_data['false_positive']['value'].values
>>> true_neg_rate = results_data['true_negative']['value'].values
>>> false_neg_rate = results_data['false_negative']['value'].values

>>> sensitivity = true_pos_rate / (true_pos_rate + false_neg_rate)
>>> specificity = true_neg_rate / (true_neg_rate + false_pos_rate)

>>> balanced_accuracy = (sensitivity + specificity) / 2

>>> num_ref_chunks = len(results.filter(period='reference').to_df())

>>> reference_index = np.arange(num_ref_chunks)
>>> analysis_index = np.arange(num_ref_chunks, len(results_data))

>>> plt.plot(reference_index, balanced_accuracy[:num_ref_chunks], label='Reference', marker='o')
>>> plt.plot(analysis_index, balanced_accuracy[num_ref_chunks:], label='Analysis', marker='o')

>>> plt.axvline(x=num_ref_chunks-0.5, color='gray')

>>> plt.xlabel('Chunk Number')
>>> plt.ylabel('Estimated Balanced Accuracy')
>>> plt.title('Estimated Balanced Accuracy')

>>> plt.legend()

>>> plt.show()

Walkthrough

While NannyML offers out-of-the-box support for the estimation of a number of metrics (see the full list in our Estimating Performance for Binary Classification page), it is also possible to create custom metrics. In this tutorial we will create a balanced accuracy metric, using the confusion matrix as a building block.

For simplicity this guide is based on a synthetic dataset included in the library, where the monitored model predicts whether a customer will repay a loan to buy a car. You can read more about this synthetic dataset here.

In order to monitor a model, NannyML needs to learn about it from a reference dataset. Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset. You can read more about this in our section on data periods.

We start by importing the libraries we’ll need and loading the dataset we’ll be using:

>>> import nannyml as nml
>>> from IPython.display import display
>>> import numpy as np
>>> import matplotlib.pyplot as plt

>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]

>>> display(reference_df.head(3))

|   | id | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | repaid | timestamp | y_pred_proba | y_pred |
|---|----|-----------|--------------|----------------------|-------------|-------------------------|---------------------|---------------|--------|-----------|--------------|--------|
| 0 | 0 | 39811 | 40K - 60K € | 0.63295 | 19 | False | 40% | 0.212653 | 1 | 2018-01-01 00:00:00.000 | 0.99 | 1 |
| 1 | 1 | 12679 | 40K - 60K € | 0.718627 | 7 | True | 10% | 4.92755 | 0 | 2018-01-01 00:08:43.152 | 0.07 | 0 |
| 2 | 2 | 19847 | 40K - 60K € | 0.721724 | 17 | False | 0% | 0.520817 | 1 | 2018-01-01 00:17:26.304 | 1 | 1 |
Next we create the Confidence-based Performance Estimation (CBPE) estimator to estimate the confusion matrix elements that we will need for our custom metric. To estimate the confusion matrix elements, we specify the metrics parameter as ['confusion_matrix']. We also set the normalize_confusion_matrix parameter to "all" to get the rate instead of the count for each cell.

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     metrics=['confusion_matrix'],
...     problem_type='classification_binary',
...     normalize_confusion_matrix="all",
... )
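To build intuition for what the "all" normalization means, here is a minimal sketch using scikit-learn's confusion_matrix rather than NannyML itself (the toy labels are ours, purely for illustration): with normalize='all', every cell is divided by the total number of samples, so the four rates always sum to 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy labels, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# normalize='all' divides each cell by the total sample count,
# mirroring normalize_confusion_matrix="all" in the CBPE estimator
cm = confusion_matrix(y_true, y_pred, normalize='all')
print(cm)        # rows: actual 0/1, columns: predicted 0/1
print(cm.sum())  # the four rates sum to 1.0
```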

The CBPE estimator is then fitted using the fit() method on the reference data.

>>> estimator.fit(reference_df)

The fitted estimator can then be used to estimate performance on other data, for which performance cannot be calculated. Typically, this would be the latest production data, where targets are missing. In our example this is the analysis_df data.

NannyML can then output a dataframe that contains all the results.

>>> results = estimator.estimate(analysis_df)

>>> results_data = results.to_df()
>>> display(results_data)

The results dataframe has two-level columns: a chunk group (key, chunk_index, start_index, end_index, start_date, end_date, period) and, for each confusion matrix element (true_positive, true_negative, false_positive, false_negative), the columns value, sampling_error, realized, upper_confidence_boundary, lower_confidence_boundary, upper_threshold, lower_threshold and alert. The chunk metadata and per-element results are shown below (upper_conf/lower_conf abbreviate the confidence boundaries, upper_thr/lower_thr the thresholds; realized is nan for analysis chunks, where targets are unavailable).

Chunk metadata:

| row | key | chunk_index | start_index | end_index | period |
|-----|-----|-------------|-------------|-----------|--------|
| 0 | [0:4999] | 0 | 0 | 4999 | reference |
| 1 | [5000:9999] | 1 | 5000 | 9999 | reference |
| 2 | [10000:14999] | 2 | 10000 | 14999 | reference |
| 3 | [15000:19999] | 3 | 15000 | 19999 | reference |
| 4 | [20000:24999] | 4 | 20000 | 24999 | reference |
| 5 | [25000:29999] | 5 | 25000 | 29999 | reference |
| 6 | [30000:34999] | 6 | 30000 | 34999 | reference |
| 7 | [35000:39999] | 7 | 35000 | 39999 | reference |
| 8 | [40000:44999] | 8 | 40000 | 44999 | reference |
| 9 | [45000:49999] | 9 | 45000 | 49999 | reference |
| 10 | [0:4999] | 0 | 0 | 4999 | analysis |
| 11 | [5000:9999] | 1 | 5000 | 9999 | analysis |
| 12 | [10000:14999] | 2 | 10000 | 14999 | analysis |
| 13 | [15000:19999] | 3 | 15000 | 19999 | analysis |
| 14 | [20000:24999] | 4 | 20000 | 24999 | analysis |
| 15 | [25000:29999] | 5 | 25000 | 29999 | analysis |
| 16 | [30000:34999] | 6 | 30000 | 34999 | analysis |
| 17 | [35000:39999] | 7 | 35000 | 39999 | analysis |
| 18 | [40000:44999] | 8 | 40000 | 44999 | analysis |
| 19 | [45000:49999] | 9 | 45000 | 49999 | analysis |

true_positive:

| row | value | sampling_error | realized | upper_conf | lower_conf | upper_thr | lower_thr | alert |
|-----|-------|----------------|----------|------------|------------|-----------|-----------|-------|
| 0 | 0.458185 | 0.00705286 | 0.4596 | 0.479343 | 0.437026 | 0.478879 | 0.449401 | False |
| 1 | 0.456855 | 0.00705286 | 0.455 | 0.478013 | 0.435696 | 0.478879 | 0.449401 | False |
| 2 | 0.469963 | 0.00705286 | 0.471 | 0.491121 | 0.448804 | 0.478879 | 0.449401 | False |
| 3 | 0.46226 | 0.00705286 | 0.4634 | 0.483419 | 0.441102 | 0.478879 | 0.449401 | False |
| 4 | 0.468431 | 0.00705286 | 0.4674 | 0.489589 | 0.447272 | 0.478879 | 0.449401 | False |
| 5 | 0.459727 | 0.00705286 | 0.458 | 0.480885 | 0.438568 | 0.478879 | 0.449401 | False |
| 6 | 0.465254 | 0.00705286 | 0.4648 | 0.486413 | 0.444096 | 0.478879 | 0.449401 | False |
| 7 | 0.469571 | 0.00705286 | 0.469 | 0.49073 | 0.448412 | 0.478879 | 0.449401 | False |
| 8 | 0.465682 | 0.00705286 | 0.4682 | 0.48684 | 0.444523 | 0.478879 | 0.449401 | False |
| 9 | 0.466762 | 0.00705286 | 0.465 | 0.48792 | 0.445603 | 0.478879 | 0.449401 | False |
| 10 | 0.481766 | 0.00705286 | nan | 0.502925 | 0.460608 | 0.478879 | 0.449401 | True |
| 11 | 0.454646 | 0.00705286 | nan | 0.475804 | 0.433487 | 0.478879 | 0.449401 | False |
| 12 | 0.455756 | 0.00705286 | nan | 0.476914 | 0.434597 | 0.478879 | 0.449401 | False |
| 13 | 0.457828 | 0.00705286 | nan | 0.478987 | 0.43667 | 0.478879 | 0.449401 | False |
| 14 | 0.468372 | 0.00705286 | nan | 0.489531 | 0.447213 | 0.478879 | 0.449401 | False |
| 15 | 0.461246 | 0.00705286 | nan | 0.482404 | 0.440087 | 0.478879 | 0.449401 | False |
| 16 | 0.459067 | 0.00705286 | nan | 0.480225 | 0.437908 | 0.478879 | 0.449401 | False |
| 17 | 0.458246 | 0.00705286 | nan | 0.479404 | 0.437087 | 0.478879 | 0.449401 | False |
| 18 | 0.453561 | 0.00705286 | nan | 0.47472 | 0.432403 | 0.478879 | 0.449401 | False |
| 19 | 0.473578 | 0.00705286 | nan | 0.494737 | 0.45242 | 0.478879 | 0.449401 | False |

true_negative:

| row | value | sampling_error | realized | upper_conf | lower_conf | upper_thr | lower_thr | alert |
|-----|-------|----------------|----------|------------|------------|-----------|-----------|-------|
| 0 | 0.486383 | 0.00706512 | 0.4866 | 0.507579 | 0.465188 | 0.494119 | 0.464881 | False |
| 1 | 0.485678 | 0.00706512 | 0.4844 | 0.506873 | 0.464482 | 0.494119 | 0.464881 | False |
| 2 | 0.473446 | 0.00706512 | 0.4752 | 0.494641 | 0.452251 | 0.494119 | 0.464881 | False |
| 3 | 0.481754 | 0.00706512 | 0.4808 | 0.502949 | 0.460559 | 0.494119 | 0.464881 | False |
| 4 | 0.475128 | 0.00706512 | 0.4708 | 0.496324 | 0.453933 | 0.494119 | 0.464881 | False |
| 5 | 0.484389 | 0.00706512 | 0.4862 | 0.505584 | 0.463193 | 0.494119 | 0.464881 | False |
| 6 | 0.476255 | 0.00706512 | 0.4802 | 0.49745 | 0.45506 | 0.494119 | 0.464881 | False |
| 7 | 0.475337 | 0.00706512 | 0.476 | 0.496532 | 0.454141 | 0.494119 | 0.464881 | False |
| 8 | 0.479609 | 0.00706512 | 0.4768 | 0.500804 | 0.458414 | 0.494119 | 0.464881 | False |
| 9 | 0.47831 | 0.00706512 | 0.478 | 0.499505 | 0.457115 | 0.494119 | 0.464881 | False |
| 10 | 0.460026 | 0.00706512 | nan | 0.481221 | 0.43883 | 0.494119 | 0.464881 | True |
| 11 | 0.488676 | 0.00706512 | nan | 0.509871 | 0.46748 | 0.494119 | 0.464881 | False |
| 12 | 0.489736 | 0.00706512 | nan | 0.510931 | 0.46854 | 0.494119 | 0.464881 | False |
| 13 | 0.486988 | 0.00706512 | nan | 0.508183 | 0.465793 | 0.494119 | 0.464881 | False |
| 14 | 0.476273 | 0.00706512 | nan | 0.497468 | 0.455078 | 0.494119 | 0.464881 | False |
| 15 | 0.449469 | 0.00706512 | nan | 0.470664 | 0.428273 | 0.494119 | 0.464881 | True |
| 16 | 0.452083 | 0.00706512 | nan | 0.473278 | 0.430888 | 0.494119 | 0.464881 | True |
| 17 | 0.452947 | 0.00706512 | nan | 0.474142 | 0.431752 | 0.494119 | 0.464881 | True |
| 18 | 0.460828 | 0.00706512 | nan | 0.482024 | 0.439633 | 0.494119 | 0.464881 | True |
| 19 | 0.438153 | 0.00706512 | nan | 0.459349 | 0.416958 | 0.494119 | 0.464881 | True |

false_positive:

| row | value | sampling_error | realized | upper_conf | lower_conf | upper_thr | lower_thr | alert |
|-----|-------|----------------|----------|------------|------------|-----------|-----------|-------|
| 0 | 0.0204154 | 0.00202397 | 0.019 | 0.0264873 | 0.0143435 | 0.025818 | 0.016022 | False |
| 1 | 0.0207453 | 0.00202397 | 0.0226 | 0.0268172 | 0.0146733 | 0.025818 | 0.016022 | False |
| 2 | 0.0208371 | 0.00202397 | 0.0198 | 0.0269091 | 0.0147652 | 0.025818 | 0.016022 | False |
| 3 | 0.0207396 | 0.00202397 | 0.0196 | 0.0268115 | 0.0146677 | 0.025818 | 0.016022 | False |
| 4 | 0.0209695 | 0.00202397 | 0.022 | 0.0270414 | 0.0148976 | 0.025818 | 0.016022 | False |
| 5 | 0.0208731 | 0.00202397 | 0.0226 | 0.026945 | 0.0148012 | 0.025818 | 0.016022 | False |
| 6 | 0.0201459 | 0.00202397 | 0.0206 | 0.0262178 | 0.014074 | 0.025818 | 0.016022 | False |
| 7 | 0.0210291 | 0.00202397 | 0.0216 | 0.027101 | 0.0149572 | 0.025818 | 0.016022 | False |
| 8 | 0.0207181 | 0.00202397 | 0.0182 | 0.02679 | 0.0146462 | 0.025818 | 0.016022 | False |
| 9 | 0.0214382 | 0.00202397 | 0.0232 | 0.0275101 | 0.0153662 | 0.025818 | 0.016022 | False |
| 10 | 0.0212337 | 0.00202397 | nan | 0.0273056 | 0.0151617 | 0.025818 | 0.016022 | False |
| 11 | 0.0199543 | 0.00202397 | nan | 0.0260262 | 0.0138824 | 0.025818 | 0.016022 | False |
| 12 | 0.0198442 | 0.00202397 | nan | 0.0259161 | 0.0137723 | 0.025818 | 0.016022 | False |
| 13 | 0.0205719 | 0.00202397 | nan | 0.0266438 | 0.0145 | 0.025818 | 0.016022 | False |
| 14 | 0.020428 | 0.00202397 | nan | 0.0264999 | 0.014356 | 0.025818 | 0.016022 | False |
| 15 | 0.0287544 | 0.00202397 | nan | 0.0348263 | 0.0226825 | 0.025818 | 0.016022 | True |
| 16 | 0.0283335 | 0.00202397 | nan | 0.0344054 | 0.0222616 | 0.025818 | 0.016022 | True |
| 17 | 0.0295542 | 0.00202397 | nan | 0.0356261 | 0.0234823 | 0.025818 | 0.016022 | True |
| 18 | 0.0272388 | 0.00202397 | nan | 0.0333107 | 0.0211669 | 0.025818 | 0.016022 | True |
| 19 | 0.0296219 | 0.00202397 | nan | 0.0356938 | 0.02355 | 0.025818 | 0.016022 | True |

false_negative:

| row | value | sampling_error | realized | upper_conf | lower_conf | upper_thr | lower_thr | alert |
|-----|-------|----------------|----------|------------|------------|-----------|-----------|-------|
| 0 | 0.0350166 | 0.00261473 | 0.0348 | 0.0428607 | 0.0271724 | 0.0416915 | 0.0291885 | False |
| 1 | 0.0367222 | 0.00261473 | 0.038 | 0.0445664 | 0.028878 | 0.0416915 | 0.0291885 | False |
| 2 | 0.035754 | 0.00261473 | 0.034 | 0.0435982 | 0.0279098 | 0.0416915 | 0.0291885 | False |
| 3 | 0.035246 | 0.00261473 | 0.0362 | 0.0430901 | 0.0274018 | 0.0416915 | 0.0291885 | False |
| 4 | 0.0354715 | 0.00261473 | 0.0398 | 0.0433157 | 0.0276274 | 0.0416915 | 0.0291885 | False |
| 5 | 0.0350115 | 0.00261473 | 0.0332 | 0.0428557 | 0.0271673 | 0.0416915 | 0.0291885 | False |
| 6 | 0.0383451 | 0.00261473 | 0.0344 | 0.0461892 | 0.0305009 | 0.0416915 | 0.0291885 | False |
| 7 | 0.0340635 | 0.00261473 | 0.0334 | 0.0419076 | 0.0262193 | 0.0416915 | 0.0291885 | False |
| 8 | 0.033991 | 0.00261473 | 0.0368 | 0.0418352 | 0.0261468 | 0.0416915 | 0.0291885 | False |
| 9 | 0.03349 | 0.00261473 | 0.0338 | 0.0413341 | 0.0256458 | 0.0416915 | 0.0291885 | False |
| 10 | 0.0369745 | 0.00261473 | nan | 0.0448186 | 0.0291303 | 0.0416915 | 0.0291885 | False |
| 11 | 0.0367245 | 0.00261473 | nan | 0.0445687 | 0.0288803 | 0.0416915 | 0.0291885 | False |
| 12 | 0.0346643 | 0.00261473 | nan | 0.0425084 | 0.0268201 | 0.0416915 | 0.0291885 | False |
| 13 | 0.0346121 | 0.00261473 | nan | 0.0424563 | 0.0267679 | 0.0416915 | 0.0291885 | False |
| 14 | 0.034927 | 0.00261473 | nan | 0.0427712 | 0.0270829 | 0.0416915 | 0.0291885 | False |
| 15 | 0.0605314 | 0.00261473 | nan | 0.0683756 | 0.0526873 | 0.0416915 | 0.0291885 | True |
| 16 | 0.060517 | 0.00261473 | nan | 0.0683612 | 0.0526729 | 0.0416915 | 0.0291885 | True |
| 17 | 0.0592531 | 0.00261473 | nan | 0.0670972 | 0.0514089 | 0.0416915 | 0.0291885 | True |
| 18 | 0.0583718 | 0.00261473 | nan | 0.066216 | 0.0505277 | 0.0416915 | 0.0291885 | True |
| 19 | 0.0586468 | 0.00261473 | nan | 0.066491 | 0.0508026 | 0.0416915 | 0.0291885 | True |

From these results we extract the estimated value of each confusion matrix component for each chunk of data. To do so, we simply index into the results dataframe:

>>> true_pos_rate = results_data['true_positive']['value'].values
>>> false_pos_rate = results_data['false_positive']['value'].values
>>> true_neg_rate = results_data['true_negative']['value'].values
>>> false_neg_rate = results_data['false_negative']['value'].values
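The chained indexing above works because the results dataframe has two-level column names: the first key selects a metric group, the second a column within it. A minimal illustration with a toy two-level dataframe (the values here are made up for demonstration only):

```python
import pandas as pd

# build a small dataframe with two-level (MultiIndex) columns,
# mimicking the layout of the CBPE results dataframe
columns = pd.MultiIndex.from_tuples([
    ('true_positive', 'value'),
    ('true_positive', 'alert'),
    ('false_negative', 'value'),
])
df = pd.DataFrame([[0.46, False, 0.035],
                   [0.48, True, 0.037]], columns=columns)

# the first key selects the metric group, the second the sub-column
values = df['true_positive']['value'].values
print(values)  # one estimate per chunk
```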

Now that we have these values, we can use them to calculate the sensitivity and specificity for each chunk of data, and from those the balanced accuracy.

As a reminder, the balanced accuracy is defined as:

\[\text{balanced accuracy} = \frac{1}{2} \left( \text{sensitivity} + \text{specificity} \right)\]

and the sensitivity and specificity are defined as:

\[\text{sensitivity} = \frac{TP}{TP + FN}\]
\[\text{specificity} = \frac{TN}{TN + FP}\]

where \(TP\) is the number of true positives (or true positive rate), \(TN\) is the number of true negatives (or true negative rate), \(FP\) is the number of false positives (or false positive rate), and \(FN\) is the number of false negatives (or false negative rate).

>>> sensitivity = true_pos_rate / (true_pos_rate + false_neg_rate)
>>> specificity = true_neg_rate / (true_neg_rate + false_pos_rate)

>>> balanced_accuracy = (sensitivity + specificity) / 2
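As a quick sanity check on these formulas (a toy example, not drawn from the tutorial's dataset): with the "all" normalization each cell is count / total, and the shared total cancels in the ratios, so sensitivity and specificity computed from rates equal those computed from raw counts, and the resulting balanced accuracy matches scikit-learn's reference implementation.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# toy labels, for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])

n = len(y_true)
# confusion-matrix rates, as produced by normalize_confusion_matrix="all"
tp_rate = np.sum((y_true == 1) & (y_pred == 1)) / n
fn_rate = np.sum((y_true == 1) & (y_pred == 0)) / n
tn_rate = np.sum((y_true == 0) & (y_pred == 0)) / n
fp_rate = np.sum((y_true == 0) & (y_pred == 1)) / n

# the shared denominator n cancels, so rates behave like counts here
sensitivity = tp_rate / (tp_rate + fn_rate)
specificity = tn_rate / (tn_rate + fp_rate)
bal_acc = (sensitivity + specificity) / 2

# agrees with sklearn's reference implementation
assert np.isclose(bal_acc, balanced_accuracy_score(y_true, y_pred))
```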

To distinguish between the balanced accuracy for the reference data and the analysis data, we get the number of chunks in the reference period and use it to index the balanced_accuracy array.

>>> num_ref_chunks = len(results.filter(period='reference').to_df())

>>> reference_index = np.arange(num_ref_chunks)
>>> analysis_index = np.arange(num_ref_chunks, len(results_data))

Since balanced accuracy is not supported out of the box with NannyML, we will create a custom plot to visualize the performance estimation results.

>>> plt.plot(reference_index, balanced_accuracy[:num_ref_chunks], label='Reference', marker='o')
>>> plt.plot(analysis_index, balanced_accuracy[num_ref_chunks:], label='Analysis', marker='o')

>>> plt.axvline(x=num_ref_chunks-0.5, color='gray')

>>> plt.xlabel('Chunk Number')
>>> plt.ylabel('Estimated Balanced Accuracy')
>>> plt.title('Estimated Balanced Accuracy')

>>> plt.legend()

>>> plt.show()
[Figure: estimated balanced accuracy per chunk for the reference and analysis periods]

Insights

After reviewing the performance estimation results, we should be able to see any indications of performance change that NannyML has detected based upon the model’s inputs and outputs alone.

What’s next

The Data Drift functionality can help us understand whether data drift is causing the performance problem. When the target values become available, we can compare the realized and estimated custom performance metric results.
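For example, once the targets arrive for the analysis period, the realized balanced accuracy per chunk can be computed directly and overlaid on the estimated values. A hedged sketch (the helper name realized_balanced_accuracy and the toy stand-in data are ours, not part of NannyML; in the tutorial the inputs would be analysis_df['repaid'] and analysis_df['y_pred'], and the chunk size of 5000 matches this tutorial's chunking):

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def realized_balanced_accuracy(y_true, y_pred, chunk_size=5000):
    """Realized balanced accuracy per chunk of consecutive rows."""
    scores = []
    for start in range(0, len(y_true), chunk_size):
        end = start + chunk_size
        scores.append(balanced_accuracy_score(y_true[start:end],
                                              y_pred[start:end]))
    return np.array(scores)

# toy stand-in data: a model that is right ~90% of the time
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=20_000)
y_pred = np.where(rng.random(20_000) < 0.9, y_true, 1 - y_true)

realized = realized_balanced_accuracy(y_true, y_pred)
print(realized.shape)  # one score per 5000-row chunk
```

These per-chunk realized scores can then be plotted alongside the estimated balanced_accuracy array from the walkthrough above to see how well the estimation tracked reality.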