Estimating Business Value for Binary Classification
This tutorial explains how to use NannyML to estimate business value for binary classification models in the absence of target data. To find out how CBPE estimates performance, read the explanation of Confidence-based Performance Estimation.
Note
The following example uses timestamps. These are optional but have an impact on the way data is chunked and results are plotted. You can read more about them in the data requirements.
Just The Code
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]
>>> display(reference_df.head(3))
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['business_value'],
... chunk_size=5000,
... problem_type='classification_binary',
... business_value_matrix=[[5, 10], [50, 50]],
... normalize_business_value="per_prediction",
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
>>> metric_fig = results.plot()
>>> metric_fig.show()
Walkthrough
For simplicity this guide is based on a synthetic dataset included in the library, where the monitored model predicts whether a customer will repay a loan to buy a car. Check out Car Loan Dataset to learn more about this dataset.
In order to monitor a model, NannyML needs to learn about it from a reference dataset. Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset. You can read more about this in our section on data periods.
We start by loading the dataset we’ll be using:
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]
>>> display(reference_df.head(3))
   car_value  salary_range  debt_to_income_ratio  loan_length  repaid_loan_on_prev_car  size_of_downpayment  driver_tenure  repaid  timestamp                y_pred_proba  y_pred
0  39811      40K - 60K €   0.63295               19           False                    40%                  0.212653       1       2018-01-01 00:00:00.000  0.99          1
1  12679      40K - 60K €   0.718627              7            True                     10%                  4.92755        0       2018-01-01 00:08:43.152  0.07          0
2  19847      40K - 60K €   0.721724              17           False                    0%                   0.520817       1       2018-01-01 00:17:26.304  1             1
Next we create the Confidence-based Performance Estimation (CBPE) estimator. To initialize an estimator that estimates business_value, we specify the following parameters:
y_pred_proba: the name of the column in the reference data that contains the predicted probabilities.
y_pred: the name of the column in the reference data that contains the predicted classes.
y_true: the name of the column in the reference data that contains the true classes.
timestamp_column_name (Optional): the name of the column in the reference data that contains timestamps.
metrics: a list of metrics to estimate. In this example we will estimate the business_value metric.
chunk_size (Optional): the number of observations in each chunk of data used to estimate performance. For more information about chunking configurations check out the chunking tutorial.
problem_type: the type of problem being monitored. In this example we will monitor a binary classification problem.
business_value_matrix: a 2x2 matrix that specifies the value of each cell in the confusion matrix where the top left cell is the value of a true negative, the top right cell is the value of a false positive, the bottom left cell is the value of a false negative, and the bottom right cell is the value of a true positive.
normalize_business_value (Optional): how to normalize the business value. The normalization options are:
None: returns the total value per chunk.
"per_prediction": returns the total value for the chunk divided by the number of observations in that chunk.
thresholds (Optional): the thresholds used to calculate the alert flag. For more information about thresholds, check out the thresholds tutorial.
Note
When estimating business_value, the business_value_matrix parameter is required. The business value matrix must be specified as [[value_of_TN, value_of_FP], [value_of_FN, value_of_TP]]. For more information about the business value matrix, check out the Business Value "How it Works" page.
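To make the value matrix concrete, here is a small illustrative sketch of how a [[TN, FP], [FN, TP]] value matrix would score one chunk of predictions, and what the "per_prediction" normalization does. This is not NannyML's internal code, and the confusion-matrix counts are made up for illustration.

```python
import numpy as np

# Per-cell values, laid out as [[value_of_TN, value_of_FP],
#                               [value_of_FN, value_of_TP]].
business_value_matrix = np.array([[5, 10], [50, 50]])

# Hypothetical confusion-matrix counts for one chunk, same layout:
# [[n_TN, n_FP], [n_FN, n_TP]].
confusion_matrix = np.array([[700, 50], [30, 220]])

# Total value for the chunk: element-wise product of counts and values.
total_value = (business_value_matrix * confusion_matrix).sum()

# normalize_business_value="per_prediction" divides the total by the
# number of observations in the chunk.
n_observations = confusion_matrix.sum()
value_per_prediction = total_value / n_observations

print(total_value, value_per_prediction)  # 16500 16.5
```

With these made-up counts, true positives and false negatives dominate the total because their per-cell values (50) are much larger than those of the negative class, which is why choosing the matrix carefully matters.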
>>> estimator = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['business_value'],
... chunk_size=5000,
... problem_type='classification_binary',
... business_value_matrix=[[5, 10], [50, 50]],
... normalize_business_value="per_prediction",
... )
The CBPE estimator is then fitted on the reference data using the fit() method.
>>> estimator.fit(reference_df)
The fitted estimator can be used to estimate performance on other data, for which performance cannot be calculated. Typically, this would be used on the latest production data where targets are missing. In our example this is the analysis_df data.
NannyML can then output a dataframe that contains all the results. Let's have a look at the results for the analysis period only.
>>> results = estimator.estimate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
   chunk                                                                                                 business_value
   key            chunk_index  start_index  end_index  start_date           end_date                     period    value    sampling_error  realized  upper_confidence_boundary  lower_confidence_boundary  upper_threshold  lower_threshold  alert
0  [0:4999]       0            0            4999       2018-10-30 18:00:00  2018-11-30 00:27:16.848000   analysis  24.3274  0.375491        nan       25.4539                    23.2009                    24.4203          22.8263          False
1  [5000:9999]    1            5000         9999       2018-11-30 00:36:00  2018-12-30 07:03:16.848000   analysis  23.1399  0.375491        nan       24.2664                    22.0134                    24.4203          22.8263          False
2  [10000:14999]  2            10000        14999      2018-12-30 07:12:00  2019-01-29 13:39:16.848000   analysis  23.3048  0.375491        nan       24.4313                    22.1783                    24.4203          22.8263          False
3  [15000:19999]  3            15000        19999      2019-01-29 13:48:00  2019-02-28 20:15:16.848000   analysis  23.39    0.375491        nan       24.5165                    22.2635                    24.4203          22.8263          False
4  [20000:24999]  4            20000        24999      2019-02-28 20:24:00  2019-03-31 02:51:16.848000   analysis  23.8493  0.375491        nan       24.9758                    22.7229                    24.4203          22.8263          False
5  [25000:29999]  5            25000        29999      2019-03-31 03:00:00  2019-04-30 09:27:16.848000   analysis  21.9955  0.375491        nan       23.122                     20.869                     24.4203          22.8263          True
6  [30000:34999]  6            30000        34999      2019-04-30 09:36:00  2019-05-30 16:03:16.848000   analysis  21.9046  0.375491        nan       23.031                     20.7781                    24.4203          22.8263          True
7  [35000:39999]  7            35000        39999      2019-05-30 16:12:00  2019-06-29 22:39:16.848000   analysis  21.9188  0.375491        nan       23.0453                    20.7924                    24.4203          22.8263          True
8  [40000:44999]  8            40000        44999      2019-06-29 22:48:00  2019-07-30 05:15:16.848000   analysis  21.7912  0.375491        nan       22.9177                    20.6647                    24.4203          22.8263          True
9  [45000:49999]  9            45000        49999      2019-07-30 05:24:00  2019-08-29 11:51:16.848000   analysis  22.6411  0.375491        nan       23.7676                    21.5146                    24.4203          22.8263          True
Apart from chunk-related data, the results data have the following columns for each metric that was estimated:
value - the estimate of a metric for a specific chunk.
sampling_error - the estimate of the sampling error.
realized - when target values are available for a chunk, the realized performance metric will also be calculated and included within the results.
upper_confidence_boundary and lower_confidence_boundary - these values show the confidence band of the relevant metric and are equal to the estimated value +/- 3 times the estimated sampling error.
upper_threshold and lower_threshold - crossing these thresholds will raise an alert on significant performance change. The thresholds are calculated based on the actual performance of the monitored model on chunks in the reference partition: they are set 3 standard deviations away from the mean performance calculated on the reference chunks. The thresholds are calculated during the fit phase.
alert - flag indicating potentially significant performance change. True if the estimated performance crosses the upper or lower threshold.
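The arithmetic behind these columns can be sketched as follows. This is an illustration of the rules described above, not NannyML's implementation: the reference-chunk values are hypothetical, while the estimate and sampling error are taken from chunk 5 of the results table.

```python
import numpy as np

# Hypothetical per-chunk business value on the reference partition.
reference_chunk_values = np.array([23.9, 23.5, 23.7, 23.4, 23.6])

# Thresholds: mean +/- 3 standard deviations over the reference chunks,
# computed once during the fit phase.
mean = reference_chunk_values.mean()
std = reference_chunk_values.std()
upper_threshold = mean + 3 * std
lower_threshold = mean - 3 * std

# Confidence band for one analysis chunk: estimate +/- 3 * sampling error.
# These two numbers come from chunk 5 in the results above.
estimated_value, sampling_error = 21.9955, 0.375491
upper_confidence_boundary = estimated_value + 3 * sampling_error
lower_confidence_boundary = estimated_value - 3 * sampling_error

# Alert when the estimate crosses either threshold.
alert = bool(estimated_value > upper_threshold
             or estimated_value < lower_threshold)
print(alert)
```

Note that the computed upper confidence boundary (21.9955 + 3 * 0.375491, roughly 23.122) reproduces the value shown for chunk 5 in the results table; the thresholds here do not match the table because the reference-chunk values are invented.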
These results can also be plotted. Our plot contains several key elements.
The purple step plot shows the estimated performance in each chunk of the analysis period. Thick squared point markers indicate the middle of these chunks.
The low-saturated purple area around the estimated performance in the analysis period corresponds to the confidence band, which is calculated as the estimated performance +/- 3 times the estimated Sampling Error.
The gray vertical line splits the reference and analysis periods.
The red horizontal dashed lines show upper and lower thresholds for alerting purposes.
The red diamond-shaped point markers in the middle of a chunk indicate that an alert has been raised. Alerts are caused by the estimated performance crossing the upper or lower threshold.
>>> metric_fig = results.plot()
>>> metric_fig.show()
Additional information such as the chunk index range and chunk date range (if timestamps were provided) is shown in the hover for each chunk (these are interactive plots, though only static views are included here).
Insights
After reviewing the performance estimation results, we should be able to see any indications of performance change that NannyML has detected based upon the model’s inputs and outputs alone.
What’s next
The Data Drift functionality can help us to understand whether data drift is causing the performance problem. When the target values become available they can be compared with the estimated results.
You can learn more about the Confidence Based Performance Estimation and its limitations in the How it Works page.