Comparing Estimated and Realized Performance
Just the code
>>> import pandas as pd
>>> import nannyml as nml
>>> from IPython.display import display
>>> from sklearn.metrics import roc_auc_score
>>> import matplotlib.pyplot as plt
>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_target_df.head(3)
>>> analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)
>>> display(analysis_with_targets.head(3))
>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())
>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()
Walkthrough
Once the targets become available, the quality of the estimations provided by NannyML can be evaluated.
The beginning of the code below is similar to the one in the tutorial on performance calculation with binary classification data.
The synthetic datasets provided with the library contain targets for the analysis period: the repaid
column holds the target values for the monitored model.
>>> import pandas as pd
>>> import nannyml as nml
>>> from IPython.display import display
>>> from sklearn.metrics import roc_auc_score
>>> import matplotlib.pyplot as plt
>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_target_df.head(3)
| | repaid |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
For this example, the analysis targets and the analysis data are joined on their index.
>>> analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)
>>> display(analysis_with_targets.head(3))
| | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | timestamp | y_pred_proba | y_pred | repaid |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12638 | 0 - 20K € | 0.487926 | 21 | False | 10% | 4.22463 | 2018-10-30 18:00:00.000 | 0.99 | 1 | 1 |
| 1 | 52425 | 20K - 40K € | 0.672183 | 20 | False | 40% | 4.9631 | 2018-10-30 18:08:43.152 | 0.98 | 1 | 1 |
| 2 | 20369 | 40K - 60K € | 0.70309 | 19 | True | 40% | 4.58895 | 2018-10-30 18:17:26.304 | 0.98 | 1 | 1 |
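The index-based merge above assumes the target frame shares row order with the analysis frame. When late-arriving targets are instead keyed by an identifier, the same join can be expressed on that column. A minimal pandas sketch, using a hypothetical `id` column that is not part of the tutorial dataset:

```python
import pandas as pd

# Hypothetical analysis rows and late-arriving targets, keyed by an `id` column
analysis = pd.DataFrame({
    'id': [101, 102, 103],
    'y_pred_proba': [0.99, 0.98, 0.98],
    'y_pred': [1, 1, 1],
})
targets = pd.DataFrame({'id': [103, 101, 102], 'repaid': [1, 1, 0]})

# Joining on the identifier is robust to row order, unlike an index-based merge
analysis_with_targets = analysis.merge(targets, on='id')
print(analysis_with_targets)
```

The default inner join silently drops predictions whose targets have not arrived yet; use `how='left'` if those rows should be kept with a missing target instead.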
Estimating performance without targets
We create the Confidence-based Performance Estimation (CBPE) estimator with a list of metrics and an optional chunking specification. For more information about chunking, see the chunking tutorial.
>>> # Estimate performance without targets
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
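With `chunk_size=5000`, NannyML evaluates each metric over consecutive blocks of 5,000 rows. Conceptually (a pandas-only sketch of the idea, not the library's actual implementation), the chunk assignment looks like:

```python
import pandas as pd

df = pd.DataFrame({'y_pred': range(12)})
chunk_size = 5  # stands in for the tutorial's 5000

# Integer-divide the positional index to assign each row to a chunk
df['chunk_index'] = df.reset_index(drop=True).index // chunk_size

# The last chunk may be smaller: here 5, 5 and 2 rows
print(df.groupby('chunk_index').size())
```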
The CBPE estimator is then fitted on the reference data using the fit()
method.
We then estimate the performance on the analysis data with the estimate() method, and filter the results to the analysis period so that only the estimated values are shown.
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
| | key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc | sampling_error | realized | upper_confidence_boundary | lower_confidence_boundary | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.968631 | 0.00181072 | nan | 0.974063 | 0.963198 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.969044 | 0.00181072 | nan | 0.974476 | 0.963612 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.969444 | 0.00181072 | nan | 0.974876 | 0.964012 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.969047 | 0.00181072 | nan | 0.974479 | 0.963615 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.968873 | 0.00181072 | nan | 0.974305 | 0.963441 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.960478 | 0.00181072 | nan | 0.96591 | 0.955046 | 0.97866 | 0.963317 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961134 | 0.00181072 | nan | 0.966566 | 0.955701 | 0.97866 | 0.963317 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.960536 | 0.00181072 | nan | 0.965968 | 0.955104 | 0.97866 | 0.963317 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.961869 | 0.00181072 | nan | 0.967301 | 0.956437 | 0.97866 | 0.963317 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.960537 | 0.00181072 | nan | 0.965969 | 0.955104 | 0.97866 | 0.963317 | True |
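Once targets are available, the imported `roc_auc_score` can be used to sanity-check per-chunk numbers like these by computing the metric yourself over consecutive 5,000-row blocks. A self-contained sketch on synthetic stand-in arrays (in practice, substitute the `repaid` and `y_pred_proba` columns of `analysis_with_targets`):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n, chunk_size = 20_000, 5_000

# Synthetic stand-ins for the `repaid` target and `y_pred_proba` score columns
y_true = rng.integers(0, 2, size=n)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.3, size=n), 0, 1)

# One ROC AUC per consecutive chunk, mirroring chunk_size=5000 above
chunk_aucs = [
    roc_auc_score(y_true[i:i + chunk_size], y_score[i:i + chunk_size])
    for i in range(0, n, chunk_size)
]
print(chunk_aucs)
```

Each element of `chunk_aucs` should line up with one row of the per-chunk table produced by `PerformanceCalculator`.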
Comparing to realized performance
We’ll first calculate the realized performance:
>>> # Calculate realized performance using targets
>>> calculator = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... ).fit(reference_df)
>>> realized_results = calculator.calculate(analysis_with_targets)
>>> display(realized_results.filter(period='analysis').to_df())
| | key | chunk_index | start_index | end_index | start_date | end_date | period | targets_missing_rate | sampling_error | roc_auc | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0 | 0.00181072 | 0.970962 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0 | 0.00181072 | 0.970248 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0 | 0.00181072 | 0.976282 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0 | 0.00181072 | 0.967721 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0 | 0.00181072 | 0.969886 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0 | 0.00181072 | 0.96005 | 0.97866 | 0.963317 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0 | 0.00181072 | 0.95853 | 0.97866 | 0.963317 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0 | 0.00181072 | 0.959041 | 0.97866 | 0.963317 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0 | 0.00181072 | 0.963094 | 0.97866 | 0.963317 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0 | 0.00181072 | 0.957556 | 0.97866 | 0.963317 | True |
We can then visualize both estimated and realized performance in a single comparison plot.
>>> # Show comparison plot
>>> results.filter(period='analysis').compare(realized_results).plot().show()
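The comparison need not stay visual. With the per-chunk ROC AUC values copied from the two result tables above, the estimation error can be quantified directly; a plain-Python sketch:

```python
# Per-chunk ROC AUC values copied from the estimated and realized tables above
estimated = [0.968631, 0.969044, 0.969444, 0.969047, 0.968873,
             0.960478, 0.961134, 0.960536, 0.961869, 0.960537]
realized = [0.970962, 0.970248, 0.976282, 0.967721, 0.969886,
            0.960050, 0.958530, 0.959041, 0.963094, 0.957556]

# Mean absolute estimation error across the ten analysis chunks
mae = sum(abs(e - r) for e, r in zip(estimated, realized)) / len(estimated)
print(f'mean absolute error: {mae:.4f}')  # → mean absolute error: 0.0021
```

The error is roughly the size of the reported sampling error (0.00181072), which is what a well-calibrated estimator should deliver.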