Ranking
NannyML uses ranking to order columns in univariate drift results. The resulting order can help prioritize what to investigate further to fully address any issues with the model being monitored.
There are currently two ranking methods in NannyML: alert count and correlation ranking.
Just The Code
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)
>>> column_names = [
... 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]
>>> univ_calc = nml.UnivariateDriftCalculator(
... column_names=column_names,
... treat_as_categorical=['y_pred', 'repaid'],
... timestamp_column_name='timestamp',
... continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
... categorical_methods=['chi2', 'jensen_shannon'],
... chunk_size=5000
... )
>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())
>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
... univariate_results.filter(methods=['jensen_shannon']),
... only_drifting=False)
>>> display(alert_count_ranked_features)
>>> estimated_calc = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['roc_auc', 'recall'],
... chunk_size=5000,
... problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())
>>> realized_calc = nml.PerformanceCalculator(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... problem_type='classification_binary',
... metrics=['roc_auc', 'recall'],
... chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())
>>> ranker1 = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
... estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
... univariate_results.filter(methods=['jensen_shannon']),
... estimated_perf_results.filter(metrics=['roc_auc']),
... only_drifting=False)
>>> display(correlation_ranked_features1)
>>> ranker2 = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
... realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
... univariate_results.filter(period='analysis', methods=['jensen_shannon']),
... realized_perf_results.filter(period='analysis', metrics=['recall']),
... only_drifting=False)
>>> display(correlation_ranked_features2)
Walkthrough
Ranking methods use univariate drift calculation results and performance estimation or realized performance results to rank features.
Note
The univariate drift calculation results need to be created or filtered so that there is only one drift method used for each feature. Similarly, the performance estimation or realized performance results need to be created or filtered so that only one performance metric is present in them.
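To illustrate what this requirement means, here is a minimal pandas sketch (with hypothetical numbers) of keeping a single drift method per feature in a multi-level result frame; on a real NannyML Result object, filter(methods=['jensen_shannon']) does this for you.

```python
import pandas as pd

# A toy frame mirroring NannyML's multi-level result columns: a continuous
# feature ('car_value', hypothetical numbers) carries two drift methods.
cols = pd.MultiIndex.from_tuples([
    ('car_value', 'kolmogorov_smirnov', 'value'),
    ('car_value', 'jensen_shannon', 'value'),
])
df = pd.DataFrame([[0.017, 0.031]], columns=cols)

# Keep a single method per feature; on a real Result object,
# filter(methods=['jensen_shannon']) achieves the same thing.
one_method = df.loc[:, df.columns.get_level_values(1) == 'jensen_shannon']
print(one_method.shape)
```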
Below we look in more detail at how to use each ranking method.
Alert Count Ranking
Let’s look deeper into our first ranking method. Alert count ranking ranks features according to the number of alerts generated within the ranking period. It is based on the univariate drift results of the features or data columns considered.
The first thing we need before using the alert count ranker is to create the univariate drift results.
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)
>>> column_names = [
... 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]
>>> univ_calc = nml.UnivariateDriftCalculator(
... column_names=column_names,
... treat_as_categorical=['y_pred', 'repaid'],
... timestamp_column_name='timestamp',
... continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
... categorical_methods=['chi2', 'jensen_shannon'],
... chunk_size=5000
... )
>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())
All drift columns below refer to the debt_to_income_ratio feature; the lower_threshold columns are None for both methods and are omitted.

| | key | chunk_index | start_index | end_index | start_date | end_date | period | kolmogorov_smirnov value | upper_threshold | alert | jensen_shannon value | upper_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.01576 | 0.0185838 | False | 0.0316611 | 0.0393276 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.01268 | 0.0185838 | False | 0.0300113 | 0.0393276 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.01734 | 0.0185838 | False | 0.0311286 | 0.0393276 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.0128 | 0.0185838 | False | 0.0294644 | 0.0393276 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.01918 | 0.0185838 | True | 0.0308095 | 0.0393276 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.00824 | 0.0185838 | False | 0.0286811 | 0.0393276 | False |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.01058 | 0.0185838 | False | 0.0436276 | 0.0393276 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.01002 | 0.0185838 | False | 0.0292533 | 0.0393276 | False |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.01068 | 0.0185838 | False | 0.0306276 | 0.0393276 | False |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.0068 | 0.0185838 | False | 0.0283303 | 0.0393276 | False |
To illustrate the results, we filter and display the analysis period results for the debt_to_income_ratio feature.
The next step is to instantiate the ranker and instruct it to rank()
the provided results. Notice that the univariate results are filtered to ensure they contain only one drift method
per categorical and continuous feature, as required.
>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
... univariate_results.filter(methods=['jensen_shannon']),
... only_drifting=False)
>>> display(alert_count_ranked_features)
| | number_of_alerts | column_name | rank |
|---|---|---|---|
| 0 | 6 | car_value | 1 |
| 1 | 5 | y_pred_proba | 2 |
| 2 | 5 | salary_range | 3 |
| 3 | 5 | repaid_loan_on_prev_car | 4 |
| 4 | 5 | loan_length | 5 |
| 5 | 2 | y_pred | 6 |
| 6 | 2 | repaid | 7 |
| 7 | 1 | debt_to_income_ratio | 8 |
| 8 | 0 | size_of_downpayment | 9 |
| 9 | 0 | driver_tenure | 10 |
The alert count ranker results give a simple and concise view of features that tend to break univariate drift thresholds more than others.
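To make the mechanics concrete, here is a minimal pandas sketch of how counting per-column alerts yields this kind of ranking. The alert flags below are hypothetical, and this is not NannyML's actual implementation.

```python
import pandas as pd

# Hypothetical per-chunk alert flags, mimicking the boolean 'alert' columns of
# univariate drift results (already filtered to one drift method per feature).
alerts = pd.DataFrame({
    'car_value':            [True, True, False, True],
    'debt_to_income_ratio': [False, False, False, True],
    'driver_tenure':        [False, False, False, False],
})

# Count alerts per column and order features by descending alert count.
counts = alerts.sum().sort_values(ascending=False)
ranking = pd.DataFrame({
    'number_of_alerts': counts.to_numpy(),
    'column_name': counts.index,
    'rank': range(1, len(counts) + 1),
})
print(ranking)
```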
Correlation Ranking
Let’s continue to the second ranking method. Correlation ranking ranks features according to how strongly they correlate with absolute changes in the selected performance metric.
Therefore, we first need to create the performance results we will use in our ranking. The estimated performance results are created below.
>>> estimated_calc = nml.CBPE(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... metrics=['roc_auc', 'recall'],
... chunk_size=5000,
... problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())
For readability, the chunk metadata with the roc_auc results is shown first, followed by the recall results for the same chunks.

| | key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc value | sampling_error | realized | upper_confidence_boundary | lower_confidence_boundary | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.970744 | 0.00181072 | 0.970962 | 0.976176 | 0.965311 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.971011 | 0.00181072 | 0.970248 | 0.976443 | 0.965578 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.971407 | 0.00181072 | 0.976282 | 0.976839 | 0.965975 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.971091 | 0.00181072 | 0.967721 | 0.976524 | 0.965659 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.971123 | 0.00181072 | 0.969886 | 0.976555 | 0.965691 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.96109 | 0.00181072 | 0.96005 | 0.966522 | 0.955658 | 0.97866 | 0.963317 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961825 | 0.00181072 | 0.95853 | 0.967257 | 0.956393 | 0.97866 | 0.963317 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.961073 | 0.00181072 | 0.959041 | 0.966506 | 0.955641 | 0.97866 | 0.963317 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.962533 | 0.00181072 | 0.963094 | 0.967966 | 0.957101 | 0.97866 | 0.963317 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.961316 | 0.00181072 | 0.957556 | 0.966748 | 0.955884 | 0.97866 | 0.963317 | True |

| | key | recall value | sampling_error | realized | upper_confidence_boundary | lower_confidence_boundary | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0.928723 | 0.00513664 | 0.930394 | 0.944133 | 0.913313 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 0.925261 | 0.00513664 | 0.923922 | 0.940671 | 0.909851 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 0.929317 | 0.00513664 | 0.938246 | 0.944727 | 0.913907 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 0.929713 | 0.00513664 | 0.92506 | 0.945123 | 0.914303 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 0.930604 | 0.00513664 | 0.927577 | 0.946014 | 0.915194 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 0.88399 | 0.00513664 | 0.905086 | 0.8994 | 0.86858 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 0.883528 | 0.00513664 | 0.89901 | 0.898938 | 0.868118 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 0.885501 | 0.00513664 | 0.901718 | 0.900911 | 0.870091 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 0.885978 | 0.00513664 | 0.906124 | 0.901388 | 0.870568 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 0.889808 | 0.00513664 | 0.905823 | 0.905218 | 0.874398 | 0.941033 | 0.9171 | True |
The estimated performance results for the analysis period are shown.
We also calculate realized performance, since either estimated or realized results can be used depending on the use case.
>>> realized_calc = nml.PerformanceCalculator(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... problem_type='classification_binary',
... metrics=['roc_auc', 'recall'],
... chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())
For readability, the chunk metadata with the roc_auc results is shown first, followed by the recall results for the same chunks.

| | key | chunk_index | start_index | end_index | start_date | end_date | period | targets_missing_rate | roc_auc sampling_error | roc_auc value | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0 | 0.00181072 | 0.970962 | 0.97866 | 0.963317 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0 | 0.00181072 | 0.970248 | 0.97866 | 0.963317 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0 | 0.00181072 | 0.976282 | 0.97866 | 0.963317 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0 | 0.00181072 | 0.967721 | 0.97866 | 0.963317 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0 | 0.00181072 | 0.969886 | 0.97866 | 0.963317 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0 | 0.00181072 | 0.96005 | 0.97866 | 0.963317 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0 | 0.00181072 | 0.95853 | 0.97866 | 0.963317 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0 | 0.00181072 | 0.959041 | 0.97866 | 0.963317 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0 | 0.00181072 | 0.963094 | 0.97866 | 0.963317 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0 | 0.00181072 | 0.957556 | 0.97866 | 0.963317 | True |

| | key | recall sampling_error | recall value | upper_threshold | lower_threshold | alert |
|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0.00513664 | 0.930394 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 0.00513664 | 0.923922 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 0.00513664 | 0.938246 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 0.00513664 | 0.92506 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 0.00513664 | 0.927577 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 0.00513664 | 0.905086 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 0.00513664 | 0.89901 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 0.00513664 | 0.901718 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 0.00513664 | 0.906124 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 0.00513664 | 0.905823 | 0.941033 | 0.9171 | True |
The realized performance results for the analysis period are shown.
We can now proceed to correlation ranking. First, let’s correlate the drift results with the estimated roc_auc.
A key difference here is that, after instantiation, we need to fit()
the ranker on results that come from the reference period only and that contain only the performance metric we want
the correlation ranker to use. You can read more about why this is needed on the
Correlation Ranking, How it Works page.
Then, after fitting, we can call rank(),
providing appropriately filtered univariate and performance results.
>>> ranker1 = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
... estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
... univariate_results.filter(methods=['jensen_shannon']),
... estimated_perf_results.filter(metrics=['roc_auc']),
... only_drifting=False)
>>> display(correlation_ranked_features1)
| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.998626 | 1.6537e-24 | True | 1 |
| 1 | y_pred_proba | 0.998586 | 2.1415e-24 | True | 2 |
| 2 | salary_range | 0.997379 | 5.48711e-22 | True | 3 |
| 3 | loan_length | 0.997314 | 6.83346e-22 | True | 4 |
| 4 | car_value | 0.997213 | 9.52945e-22 | True | 5 |
| 5 | size_of_downpayment | 0.311427 | 0.181355 | False | 6 |
| 6 | debt_to_income_ratio | 0.256911 | 0.274199 | True | 7 |
| 7 | y_pred | 0.0665713 | 0.780356 | True | 8 |
| 8 | repaid | -0.127146 | 0.593218 | True | 9 |
| 9 | driver_tenure | -0.141105 | 0.55292 | False | 10 |
Depending on circumstances, it may be appropriate to consider the correlation
of drift results over just the analysis period, or against different metrics.
Below we can see the correlation between the same drift results and the realized recall.
>>> ranker2 = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
... realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
... univariate_results.filter(period='analysis', methods=['jensen_shannon']),
... realized_perf_results.filter(period='analysis', metrics=['recall']),
... only_drifting=False)
>>> display(correlation_ranked_features2)
| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.96897 | 3.90719e-06 | True | 1 |
| 1 | y_pred_proba | 0.966157 | 5.50918e-06 | True | 2 |
| 2 | loan_length | 0.965298 | 6.08385e-06 | True | 3 |
| 3 | car_value | 0.963623 | 7.33185e-06 | True | 4 |
| 4 | salary_range | 0.963456 | 7.46561e-06 | True | 5 |
| 5 | size_of_downpayment | 0.308948 | 0.385072 | False | 6 |
| 6 | debt_to_income_ratio | 0.307373 | 0.387627 | True | 7 |
| 7 | y_pred | -0.357571 | 0.310383 | True | 8 |
| 8 | repaid | -0.395842 | 0.257495 | True | 9 |
| 9 | driver_tenure | -0.575807 | 0.0815202 | False | 10 |
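As a rough sketch of the underlying idea (as we understand it, not NannyML's exact implementation), correlation ranking boils down to a Pearson correlation between per-chunk drift values and the absolute change in the performance metric relative to its reference level. All numbers below are hypothetical.

```python
import numpy as np

# Hypothetical per-chunk values: a drift statistic for one feature and the
# (estimated or realized) performance metric over the same chunks.
drift_values = np.array([0.031, 0.030, 0.044, 0.029, 0.042])
performance = np.array([0.970, 0.971, 0.958, 0.971, 0.959])

# Mean performance over the reference period (assumed here).
reference_mean = 0.971

# Correlate drift with the absolute change in performance from that mean;
# scipy.stats.pearsonr would additionally return a p-value.
abs_change = np.abs(performance - reference_mean)
r = np.corrcoef(drift_values, abs_change)[0, 1]
print(round(r, 3))
```

Features whose drift statistic moves in step with performance degradation get a correlation near 1 and therefore a top rank.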
Insights
Ranking results are intended to suggest a prioritization for further investigation of drift results.
If other information is available, such as feature importance, it can also be used to prioritize which drifted features to investigate first.
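For example, one simple way to combine alert counts with feature importance is to merge the two and sort by a combined score. The importances and the scoring rule below are illustrative assumptions, not a NannyML API.

```python
import pandas as pd

# Hypothetical alert-count ranking output (column subset of what
# AlertCountRanker.rank returns).
ranked = pd.DataFrame({
    'column_name': ['car_value', 'salary_range', 'driver_tenure'],
    'number_of_alerts': [6, 5, 0],
    'rank': [1, 2, 3],
})

# Hypothetical feature importances taken from the monitored model.
importance = pd.DataFrame({
    'column_name': ['car_value', 'salary_range', 'driver_tenure'],
    'importance': [0.10, 0.45, 0.05],
})

# One illustrative scoring rule: weight alert counts by importance, then
# investigate the highest-scoring features first.
combined = ranked.merge(importance, on='column_name')
combined['priority'] = combined['number_of_alerts'] * combined['importance']
combined = combined.sort_values('priority', ascending=False).reset_index(drop=True)
print(combined[['column_name', 'priority']])
```

Under this rule a moderately drifting but highly important feature can outrank the feature with the most alerts.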
What’s Next
More information about the specifics of how ranking works can be found on the How it Works, Ranking page.