Comparing Estimated and Realized Performance

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()

>>> analysis_target_df.head(3)

>>> analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> display(analysis_with_targets.head(3))

>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )

>>> estimator.fit(reference_df)

>>> results = estimator.estimate(analysis_df)

>>> display(results.filter(period='analysis').to_df())

>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())

>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()

Walkthrough

Once targets become available, the quality of the estimates provided by NannyML can be evaluated.

The beginning of the code below is similar to that in the tutorial on performance calculation with binary classification data.

For simplicity, this guide is based on a synthetic dataset included in the library, where the monitored model predicts whether a customer will repay a loan taken out to buy a car. Check out the Car Loan Dataset to learn more about this dataset.

The dataset provided contains targets for the analysis period: the target values for the monitored model are stored in the repaid column.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()

>>> analysis_target_df.head(3)

   repaid
0       1
1       1
2       1

For this example, the analysis targets and the analysis data frame are joined on their index.

>>> analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> display(analysis_with_targets.head(3))

   car_value salary_range  debt_to_income_ratio  loan_length repaid_loan_on_prev_car size_of_downpayment  driver_tenure                timestamp  y_pred_proba  y_pred  repaid
0      12638    0 - 20K €              0.487926           21                   False                 10%        4.22463  2018-10-30 18:00:00.000          0.99       1       1
1      52425  20K - 40K €              0.672183           20                   False                 40%         4.9631  2018-10-30 18:08:43.152          0.98       1       1
2      20369  40K - 60K €               0.70309           19                    True                 40%        4.58895  2018-10-30 18:17:26.304          0.98       1       1

Estimating performance without targets

We create the Confidence-based Performance Estimation (CBPE) estimator with a list of metrics and an optional chunking specification. For more information about chunking, check out the chunking tutorial; two alternative chunking specifications are also sketched after the code below.

>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
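
The chunking used here is size-based, but the same estimator accepts other chunking specifications. Below is a minimal sketch of two alternatives, using the chunk_number and chunk_period parameters; all other arguments stay the same as above.

>>> # Sketch: the same estimator with a fixed number of chunks instead of a fixed size
>>> estimator_by_count = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_number=10,  # split the data into 10 equally sized chunks
...     problem_type='classification_binary',
... )

>>> # Sketch: calendar-based chunks, one chunk per month of the timestamp column
>>> estimator_by_month = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_period='M',  # monthly chunks
...     problem_type='classification_binary',
... )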

The CBPE estimator is then fitted on the reference data using the fit() method.

The fitted estimator is used to estimate performance on the analysis data, for which no targets are needed.

We filter the results to the analysis period so that only the estimated values are shown.

>>> estimator.fit(reference_df)

>>> results = estimator.estimate(analysis_df)

>>> display(results.filter(period='analysis').to_df())

           chunk                                                                                          roc_auc
             key  chunk_index  start_index  end_index           start_date                    end_date    period     value  sampling_error  realized  upper_confidence_boundary  lower_confidence_boundary  upper_threshold  lower_threshold  alert
0       [0:4999]            0            0       4999  2018-10-30 18:00:00  2018-11-30 00:27:16.848000  analysis  0.968631      0.00181072       nan                   0.974063                   0.963198          0.97866         0.963317  False
1    [5000:9999]            1         5000       9999  2018-11-30 00:36:00  2018-12-30 07:03:16.848000  analysis  0.969044      0.00181072       nan                   0.974476                   0.963612          0.97866         0.963317  False
2  [10000:14999]            2        10000      14999  2018-12-30 07:12:00  2019-01-29 13:39:16.848000  analysis  0.969444      0.00181072       nan                   0.974876                   0.964012          0.97866         0.963317  False
3  [15000:19999]            3        15000      19999  2019-01-29 13:48:00  2019-02-28 20:15:16.848000  analysis  0.969047      0.00181072       nan                   0.974479                   0.963615          0.97866         0.963317  False
4  [20000:24999]            4        20000      24999  2019-02-28 20:24:00  2019-03-31 02:51:16.848000  analysis  0.968873      0.00181072       nan                   0.974305                   0.963441          0.97866         0.963317  False
5  [25000:29999]            5        25000      29999  2019-03-31 03:00:00  2019-04-30 09:27:16.848000  analysis  0.960478      0.00181072       nan                    0.96591                   0.955046          0.97866         0.963317   True
6  [30000:34999]            6        30000      34999  2019-04-30 09:36:00  2019-05-30 16:03:16.848000  analysis  0.961134      0.00181072       nan                   0.966566                   0.955701          0.97866         0.963317   True
7  [35000:39999]            7        35000      39999  2019-05-30 16:12:00  2019-06-29 22:39:16.848000  analysis  0.960536      0.00181072       nan                   0.965968                   0.955104          0.97866         0.963317   True
8  [40000:44999]            8        40000      44999  2019-06-29 22:48:00  2019-07-30 05:15:16.848000  analysis  0.961869      0.00181072       nan                   0.967301                   0.956437          0.97866         0.963317   True
9  [45000:49999]            9        45000      49999  2019-07-30 05:24:00  2019-08-29 11:51:16.848000  analysis  0.960537      0.00181072       nan                   0.965969                   0.955104          0.97866         0.963317   True
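
Note that to_df() returns a frame with a two-level column index, as visible in the table header above, so individual columns are addressed with tuples. A minimal sketch of pulling out the estimated values:

>>> est_df = results.filter(period='analysis').to_df()
>>> # Column names are (metric, statistic) tuples on the two-level index
>>> est_df[('roc_auc', 'value')].head(3)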

Comparing to realized performance

We’ll first calculate the realized performance:

>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())

           chunk                                                                                                                roc_auc
             key  chunk_index  start_index  end_index           start_date                    end_date    period  targets_missing_rate  sampling_error     value  upper_threshold  lower_threshold  alert
0       [0:4999]            0            0       4999  2018-10-30 18:00:00  2018-11-30 00:27:16.848000  analysis                     0      0.00181072  0.970962          0.97866         0.963317  False
1    [5000:9999]            1         5000       9999  2018-11-30 00:36:00  2018-12-30 07:03:16.848000  analysis                     0      0.00181072  0.970248          0.97866         0.963317  False
2  [10000:14999]            2        10000      14999  2018-12-30 07:12:00  2019-01-29 13:39:16.848000  analysis                     0      0.00181072  0.976282          0.97866         0.963317  False
3  [15000:19999]            3        15000      19999  2019-01-29 13:48:00  2019-02-28 20:15:16.848000  analysis                     0      0.00181072  0.967721          0.97866         0.963317  False
4  [20000:24999]            4        20000      24999  2019-02-28 20:24:00  2019-03-31 02:51:16.848000  analysis                     0      0.00181072  0.969886          0.97866         0.963317  False
5  [25000:29999]            5        25000      29999  2019-03-31 03:00:00  2019-04-30 09:27:16.848000  analysis                     0      0.00181072   0.96005          0.97866         0.963317   True
6  [30000:34999]            6        30000      34999  2019-04-30 09:36:00  2019-05-30 16:03:16.848000  analysis                     0      0.00181072   0.95853          0.97866         0.963317   True
7  [35000:39999]            7        35000      39999  2019-05-30 16:12:00  2019-06-29 22:39:16.848000  analysis                     0      0.00181072  0.959041          0.97866         0.963317   True
8  [40000:44999]            8        40000      44999  2019-06-29 22:48:00  2019-07-30 05:15:16.848000  analysis                     0      0.00181072  0.963094          0.97866         0.963317   True
9  [45000:49999]            9        45000      49999  2019-07-30 05:24:00  2019-08-29 11:51:16.848000  analysis                     0      0.00181072  0.957556          0.97866         0.963317   True
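
Beyond inspecting the tables, the per-chunk estimation error can be computed directly, since both result frames share the same chunking. The following is a sketch using plain pandas on the two-level columns shown above.

>>> est = results.filter(period='analysis').to_df()
>>> real = realized_results.filter(period='analysis').to_df()
>>> # Difference between estimated and realized ROC AUC per chunk
>>> diff = est[('roc_auc', 'value')] - real[('roc_auc', 'value')]
>>> diff.abs().mean()  # mean absolute estimation error across the analysis chunks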

We can then visualize both estimated and realized performance in a single comparison plot.

>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()
[Image: comparison_plot.svg, the comparison plot of estimated and realized ROC AUC per chunk]
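
The comparison plot is a Plotly figure, so it can also be kept for reporting. A sketch, assuming the optional kaleido package is installed for static image export:

>>> fig = results.filter(period='analysis').compare(realized_results).plot()
>>> fig.write_html('comparison_plot.html')  # interactive HTML, no extra dependencies
>>> fig.write_image('comparison_plot.svg')  # static image export, requires kaleido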