Comparing Estimated and Realized Performance
Just the code
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_targets_df.head(3)
>>> analysis_with_targets = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)
>>> display(analysis_with_targets.head(3))
>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())
>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()
Walkthrough
When targets become available, the quality of the estimations provided by NannyML can be evaluated.
The beginning of the code below is similar to the code in the tutorial on performance calculation with binary classification data. While this tutorial uses the roc_auc metric, any metric that NannyML can both estimate and calculate can be used for the comparison.
For simplicity, this guide is based on a synthetic dataset included in the library, where the monitored model predicts whether a customer will repay a loan to buy a car. Check out Car Loan Dataset to learn more about this dataset.
The dataset provided contains targets for the analysis period: the target values for the monitored model are stored in the repaid column.
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_targets_df.head(3)
|   | id    | repaid |
|---|-------|--------|
| 0 | 50000 | 1      |
| 1 | 50001 | 1      |
| 2 | 50002 | 1      |
For this example, the analysis targets and the analysis frame are joined on their index.
>>> analysis_with_targets = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)
>>> display(analysis_with_targets.head(3))
|   | id_x  | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | timestamp | y_pred_proba | y_pred | id_y  | repaid |
|---|-------|-----------|--------------|----------------------|-------------|-------------------------|---------------------|---------------|-----------|--------------|--------|-------|--------|
| 0 | 50000 | 12638     | 0 - 20K €    | 0.487926             | 21          | False                   | 10%                 | 4.22463       | 2018-10-30 18:00:00.000 | 0.99 | 1 | 50000 | 1 |
| 1 | 50001 | 52425     | 20K - 40K €  | 0.672183             | 20          | False                   | 40%                 | 4.9631        | 2018-10-30 18:08:43.152 | 0.98 | 1 | 50001 | 1 |
| 2 | 50002 | 20369     | 40K - 60K €  | 0.70309              | 19          | True                    | 40%                 | 4.58895       | 2018-10-30 18:17:26.304 | 0.98 | 1 | 50002 | 1 |
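Because the merge is done on the index, the shared id column shows up twice, as id_x and id_y. If a single id column is preferred, the frames could instead be merged on that column; a minimal sketch, assuming id is present in both frames as the id_x/id_y columns above suggest:
>>> # Hypothetical alternative: merge on the shared 'id' column rather than the index,
>>> # avoiding the duplicated id_x / id_y columns shown above.
>>> analysis_with_targets = analysis_df.merge(analysis_targets_df, on='id')
>>> display(analysis_with_targets.head(3))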
Estimating performance without targets
We create the Confidence-based Performance Estimation (CBPE) estimator with a list of metrics and an optional chunking specification. For more information about chunking, check the chunking tutorial.
>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
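As an aside, the metric list and the chunking specification can be varied independently of this example. A minimal sketch, assuming the additional metric names and the chunk_period argument behave as described in the NannyML metrics and chunking documentation:
>>> # Sketch of an alternative configuration: several metrics and monthly, period-based chunks.
>>> # chunk_period replaces chunk_size here; both choices are assumptions taken from the chunking tutorial.
>>> multi_metric_estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc', 'f1', 'accuracy'],
...     chunk_period='M',
...     problem_type='classification_binary',
... )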
The CBPE estimator is then fitted on the reference data using the fit() method, and used to estimate performance on the analysis data. We filter the results to the analysis period before displaying them.
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
|   | chunk key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc value | sampling_error | realized | upper_confidence_boundary | lower_confidence_boundary | upper_threshold | lower_threshold | alert |
|---|-----------|-------------|-------------|-----------|------------|----------|--------|---------------|----------------|----------|---------------------------|---------------------------|-----------------|-----------------|-------|
| 0 | [0:4999]      | 0 | 0     | 4999  | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.970744 | 0.00181072 | nan | 0.976176 | 0.965311 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999]   | 1 | 5000  | 9999  | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.971011 | 0.00181072 | nan | 0.976443 | 0.965578 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.971407 | 0.00181072 | nan | 0.976839 | 0.965975 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.971091 | 0.00181072 | nan | 0.976524 | 0.965659 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.971123 | 0.00181072 | nan | 0.976555 | 0.965691 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.96109  | 0.00181072 | nan | 0.966522 | 0.955658 | 0.97866 | 0.963317 | True  |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961825 | 0.00181072 | nan | 0.967257 | 0.956393 | 0.97866 | 0.963317 | True  |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.961073 | 0.00181072 | nan | 0.966506 | 0.955641 | 0.97866 | 0.963317 | True  |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.962533 | 0.00181072 | nan | 0.967966 | 0.957101 | 0.97866 | 0.963317 | True  |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.961316 | 0.00181072 | nan | 0.966748 | 0.955884 | 0.97866 | 0.963317 | True  |
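The results can also be inspected programmatically rather than visually. A minimal sketch, assuming the two-level column layout shown above (a chunk group and a roc_auc group), that lists the chunks for which the estimator raised an alert:
>>> # Sketch: select the chunks where the estimated roc_auc triggered an alert.
>>> est_df = results.filter(period='analysis').to_df()
>>> alerting_chunks = est_df[est_df[('roc_auc', 'alert')]]
>>> display(alerting_chunks[[('chunk', 'key'), ('roc_auc', 'value')]])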
Comparing to realized performance
We’ll first calculate the realized performance:
>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())
|   | chunk key | chunk_index | start_index | end_index | start_date | end_date | period | targets_missing_rate | roc_auc sampling_error | value | upper_threshold | lower_threshold | alert |
|---|-----------|-------------|-------------|-----------|------------|----------|--------|----------------------|------------------------|-------|-----------------|-----------------|-------|
| 0 | [0:4999]      | 0 | 0     | 4999  | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0 | 0.00181072 | 0.970962 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999]   | 1 | 5000  | 9999  | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0 | 0.00181072 | 0.970248 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0 | 0.00181072 | 0.976282 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0 | 0.00181072 | 0.967721 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0 | 0.00181072 | 0.969886 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0 | 0.00181072 | 0.96005  | 0.97866 | 0.963317 | True  |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0 | 0.00181072 | 0.95853  | 0.97866 | 0.963317 | True  |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0 | 0.00181072 | 0.959041 | 0.97866 | 0.963317 | True  |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0 | 0.00181072 | 0.963094 | 0.97866 | 0.963317 | True  |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0 | 0.00181072 | 0.957556 | 0.97866 | 0.963317 | True  |
We can then visualize both estimated and realized performance in a single comparison plot.
>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()
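Beyond the plot, the gap between estimated and realized performance can be tabulated directly. A minimal sketch, assuming the column layouts shown in the result tables above and that both results cover the same chunks in the same order:
>>> # Sketch: place estimated and realized roc_auc side by side and compute the per-chunk difference.
>>> import pandas as pd
>>> est_df = results.filter(period='analysis').to_df()
>>> real_df = realized_results.filter(period='analysis').to_df()
>>> comparison = pd.DataFrame({
...     'chunk': est_df[('chunk', 'key')].to_numpy(),
...     'estimated_roc_auc': est_df[('roc_auc', 'value')].to_numpy(),
...     'realized_roc_auc': real_df[('roc_auc', 'value')].to_numpy(),
... })
>>> comparison['estimation_error'] = comparison['estimated_roc_auc'] - comparison['realized_roc_auc']
>>> display(comparison)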