Ranking

NannyML uses ranking to order columns in univariate drift results. The resulting order can help prioritize what to investigate further in order to fully address any issues with the monitored model.

There are currently two ranking methods in NannyML: alert count ranking and correlation ranking.

Just The Code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> column_names = [
...     'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]

>>> univ_calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred', 'repaid'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
...     chunk_size=5000
... )

>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())

>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     only_drifting = False)
>>> display(alert_count_ranked_features)

>>> estimated_calc = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc', 'recall'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())

>>> realized_calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc', 'recall',],
...     chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())

>>> ranker1 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
...     estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     estimated_perf_results.filter(metrics=['roc_auc']),
...     only_drifting = False)

>>> display(correlation_ranked_features1)

>>> ranker2 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
...     realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
...     univariate_results.filter(period='analysis', methods=['jensen_shannon']),
...     realized_perf_results.filter(period='analysis', metrics=['recall']),
...     only_drifting = False)

>>> display(correlation_ranked_features2)

Walkthrough

Ranking methods use the univariate drift calculation results, together with either estimated or realized performance results, to rank features.

Note

The univariate drift calculation results need to be created or filtered so that only one drift method is used for each feature. Similarly, the performance estimation or realized performance results need to be created or filtered so that they contain only one performance metric.
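For example, assuming the univariate_results and estimated_perf_results objects created in the code above, such filtering could look like this:

>>> # keep a single drift method per column before ranking
>>> drift_for_ranking = univariate_results.filter(methods=['jensen_shannon'])
>>> # keep a single performance metric before ranking
>>> performance_for_ranking = estimated_perf_results.filter(metrics=['roc_auc'])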

Below we look in more detail at how to use each ranking method.

Alert Count Ranking

Let’s take a closer look at our first ranking method. Alert count ranking ranks features according to the number of alerts they generated within the ranking period. It is based on the univariate drift results of the features, or data columns, considered.

The first thing we need before using the alert count ranker is to create the univariate drift results.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> column_names = [
...     'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]

>>> univ_calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred', 'repaid'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
...     chunk_size=5000
... )

>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())

The drift method columns (kolmogorov_smirnov, jensen_shannon) below refer to the debt_to_income_ratio column.

| | chunk_index | end_date | end_index | key | period | start_date | start_index | kolmogorov_smirnov alert | kolmogorov_smirnov lower_threshold | kolmogorov_smirnov upper_threshold | kolmogorov_smirnov value | jensen_shannon alert | jensen_shannon lower_threshold | jensen_shannon upper_threshold | jensen_shannon value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-11-30 00:27:16.848000 | 4999 | [0:4999] | analysis | 2018-10-30 18:00:00 | 0 | False | | | 0.01576 | False | | 0.1 | 0.0316611 |
| 1 | 1 | 2018-12-30 07:03:16.848000 | 9999 | [5000:9999] | analysis | 2018-11-30 00:36:00 | 5000 | False | | | 0.01268 | False | | 0.1 | 0.0300113 |
| 2 | 2 | 2019-01-29 13:39:16.848000 | 14999 | [10000:14999] | analysis | 2018-12-30 07:12:00 | 10000 | False | | | 0.01734 | False | | 0.1 | 0.0311286 |
| 3 | 3 | 2019-02-28 20:15:16.848000 | 19999 | [15000:19999] | analysis | 2019-01-29 13:48:00 | 15000 | False | | | 0.0128 | False | | 0.1 | 0.0294644 |
| 4 | 4 | 2019-03-31 02:51:16.848000 | 24999 | [20000:24999] | analysis | 2019-02-28 20:24:00 | 20000 | False | | | 0.01918 | False | | 0.1 | 0.0308095 |
| 5 | 5 | 2019-04-30 09:27:16.848000 | 29999 | [25000:29999] | analysis | 2019-03-31 03:00:00 | 25000 | False | | | 0.00824 | False | | 0.1 | 0.0286811 |
| 6 | 6 | 2019-05-30 16:03:16.848000 | 34999 | [30000:34999] | analysis | 2019-04-30 09:36:00 | 30000 | False | | | 0.01058 | False | | 0.1 | 0.0436276 |
| 7 | 7 | 2019-06-29 22:39:16.848000 | 39999 | [35000:39999] | analysis | 2019-05-30 16:12:00 | 35000 | False | | | 0.01002 | False | | 0.1 | 0.0292533 |
| 8 | 8 | 2019-07-30 05:15:16.848000 | 44999 | [40000:44999] | analysis | 2019-06-29 22:48:00 | 40000 | False | | | 0.01068 | False | | 0.1 | 0.0306276 |
| 9 | 9 | 2019-08-29 11:51:16.848000 | 49999 | [45000:49999] | analysis | 2019-07-30 05:24:00 | 45000 | False | | | 0.0068 | False | | 0.1 | 0.0283303 |

To illustrate the results, we filter and display the analysis period results for the debt_to_income_ratio feature. The next step is to instantiate the ranker and instruct it to rank() the provided results. Notice that the univariate results are filtered to ensure they only contain one drift method per categorical and continuous feature, as required.

>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     only_drifting = False)
>>> display(alert_count_ranked_features)

| | number_of_alerts | column_name | rank |
|---|---|---|---|
| 0 | 5 | y_pred_proba | 1 |
| 1 | 5 | salary_range | 2 |
| 2 | 5 | repaid_loan_on_prev_car | 3 |
| 3 | 5 | loan_length | 4 |
| 4 | 5 | car_value | 5 |
| 5 | 0 | y_pred | 6 |
| 6 | 0 | size_of_downpayment | 7 |
| 7 | 0 | repaid | 8 |
| 8 | 0 | driver_tenure | 9 |
| 9 | 0 | debt_to_income_ratio | 10 |

The alert count ranker results give a simple and concise view of which features tend to breach the univariate drift thresholds more often than others.
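For intuition, a roughly equivalent ordering can be reproduced by hand by summing the per-chunk alert flags for each column in the drift results DataFrame. The sketch below is illustrative only, not NannyML’s implementation, and assumes the three-level column index shown in the tables above:

>>> # illustrative only: count per-chunk jensen_shannon alerts for each column and sort descending
>>> drift_df = univariate_results.filter(period='analysis', methods=['jensen_shannon']).to_df()
>>> alert_counts = {
...     column: int(drift_df[(column, 'jensen_shannon', 'alert')].sum())
...     for column in column_names
... }
>>> sorted(alert_counts.items(), key=lambda item: item[1], reverse=True)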

Correlation Ranking

Let’s continue to the second ranking method. Correlation ranking ranks features according to how strongly their drift correlates with absolute changes in the selected performance metric.
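Roughly speaking, the ranker learns the mean value of the chosen metric on the reference period during fit(), and then computes a Pearson correlation between each column’s per-chunk drift values and the absolute per-chunk deviation of the metric from that mean. The toy snippet below, with made-up numbers, only illustrates that idea and is not NannyML’s code:

>>> # toy illustration of the idea behind correlation ranking (made-up numbers)
>>> import numpy as np
>>> from scipy.stats import pearsonr

>>> mean_reference_perf = 0.97  # learned during fit() on reference period results
>>> perf_per_chunk = np.array([0.969, 0.968, 0.960, 0.959])  # metric per analysis chunk
>>> drift_per_chunk = np.array([0.03, 0.03, 0.28, 0.29])     # one column's drift values per chunk

>>> abs_perf_change = np.abs(perf_per_chunk - mean_reference_perf)
>>> correlation, p_value = pearsonr(drift_per_chunk, abs_perf_change)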

Therefore, we first need to create the performance results we will use in our ranking. The estimated performance results are created below.

>>> estimated_calc = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc', 'recall'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())

| | key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc value | roc_auc sampling_error | roc_auc realized | roc_auc upper_confidence_boundary | roc_auc lower_confidence_boundary | roc_auc upper_threshold | roc_auc lower_threshold | roc_auc alert | recall value | recall sampling_error | recall realized | recall upper_confidence_boundary | recall lower_confidence_boundary | recall upper_threshold | recall lower_threshold | recall alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.968631 | 0.00181072 | 0.970962 | 0.974063 | 0.963198 | 0.97866 | 0.963317 | False | 0.928723 | 0.00513664 | 0.930394 | 0.944133 | 0.913313 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.969044 | 0.00181072 | 0.970248 | 0.974476 | 0.963612 | 0.97866 | 0.963317 | False | 0.925261 | 0.00513664 | 0.923922 | 0.940671 | 0.909851 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.969444 | 0.00181072 | 0.976282 | 0.974876 | 0.964012 | 0.97866 | 0.963317 | False | 0.929317 | 0.00513664 | 0.938246 | 0.944727 | 0.913907 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.969047 | 0.00181072 | 0.967721 | 0.974479 | 0.963615 | 0.97866 | 0.963317 | False | 0.929713 | 0.00513664 | 0.92506 | 0.945123 | 0.914303 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.968873 | 0.00181072 | 0.969886 | 0.974305 | 0.963441 | 0.97866 | 0.963317 | False | 0.930604 | 0.00513664 | 0.927577 | 0.946014 | 0.915194 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.960478 | 0.00181072 | 0.96005 | 0.96591 | 0.955046 | 0.97866 | 0.963317 | True | 0.88399 | 0.00513664 | 0.905086 | 0.8994 | 0.86858 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961134 | 0.00181072 | 0.95853 | 0.966566 | 0.955701 | 0.97866 | 0.963317 | True | 0.883528 | 0.00513664 | 0.89901 | 0.898938 | 0.868118 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.960536 | 0.00181072 | 0.959041 | 0.965968 | 0.955104 | 0.97866 | 0.963317 | True | 0.885501 | 0.00513664 | 0.901718 | 0.900911 | 0.870091 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.961869 | 0.00181072 | 0.963094 | 0.967301 | 0.956437 | 0.97866 | 0.963317 | True | 0.885978 | 0.00513664 | 0.906124 | 0.901388 | 0.870568 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.960537 | 0.00181072 | 0.957556 | 0.965969 | 0.955104 | 0.97866 | 0.963317 | True | 0.889808 | 0.00513664 | 0.905823 | 0.905218 | 0.874398 | 0.941033 | 0.9171 | True |

The analysis period estimates are shown above.
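If you prefer a visual check over the tabular view, the performance result objects can also be plotted, for example:

>>> # optional: visualize the estimated roc_auc per chunk as a plotly figure
>>> figure = estimated_perf_results.filter(period='analysis', metrics=['roc_auc']).plot()
>>> figure.show()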

The realized performance results are also created, since either can be used depending on the use case being addressed.

>>> realized_calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc', 'recall',],
...     chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())

| | key | chunk_index | start_index | end_index | start_date | end_date | period | targets_missing_rate | roc_auc sampling_error | roc_auc value | roc_auc upper_threshold | roc_auc lower_threshold | roc_auc alert | recall sampling_error | recall value | recall upper_threshold | recall lower_threshold | recall alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0 | 0.00181072 | 0.970962 | 0.97866 | 0.963317 | False | 0.00513664 | 0.930394 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0 | 0.00181072 | 0.970248 | 0.97866 | 0.963317 | False | 0.00513664 | 0.923922 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0 | 0.00181072 | 0.976282 | 0.97866 | 0.963317 | False | 0.00513664 | 0.938246 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0 | 0.00181072 | 0.967721 | 0.97866 | 0.963317 | False | 0.00513664 | 0.92506 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0 | 0.00181072 | 0.969886 | 0.97866 | 0.963317 | False | 0.00513664 | 0.927577 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0 | 0.00181072 | 0.96005 | 0.97866 | 0.963317 | True | 0.00513664 | 0.905086 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0 | 0.00181072 | 0.95853 | 0.97866 | 0.963317 | True | 0.00513664 | 0.89901 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0 | 0.00181072 | 0.959041 | 0.97866 | 0.963317 | True | 0.00513664 | 0.901718 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0 | 0.00181072 | 0.963094 | 0.97866 | 0.963317 | True | 0.00513664 | 0.906124 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0 | 0.00181072 | 0.957556 | 0.97866 | 0.963317 | True | 0.00513664 | 0.905823 | 0.941033 | 0.9171 | True |

The analysis period results are shown.

We can now proceed to correlation ranking. Let’s correlate the drift results with the estimated roc_auc. A key difference here is that, after instantiation, we need to fit() the ranker with the relevant results, filtered to the reference period and containing only the performance metric we want the correlation ranker to use. You can read more about why this is needed on the Correlation Ranking, How it Works page. After fitting, we can call rank(), providing appropriately filtered univariate and performance results.

>>> ranker1 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
...     estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     estimated_perf_results.filter(metrics=['roc_auc']),
...     only_drifting = False)

>>> display(correlation_ranked_features1)

| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.99829 | 1.17771e-23 | True | 1 |
| 1 | y_pred_proba | 0.998072 | 3.47458e-23 | True | 2 |
| 2 | loan_length | 0.996876 | 2.66146e-21 | True | 3 |
| 3 | salary_range | 0.996512 | 7.16292e-21 | True | 4 |
| 4 | car_value | 0.996148 | 1.74676e-20 | True | 5 |
| 5 | size_of_downpayment | 0.307497 | 0.18722 | False | 6 |
| 6 | debt_to_income_ratio | 0.250211 | 0.287342 | False | 7 |
| 7 | y_pred | 0.0752823 | 0.752426 | False | 8 |
| 8 | repaid | -0.117004 | 0.623245 | False | 9 |
| 9 | driver_tenure | -0.134447 | 0.571988 | False | 10 |

Depending on the circumstances, it may be appropriate to correlate the drift results with performance on just the analysis period, or with a different metric. Below we see the correlation of the same drift results with the realized recall results.

>>> ranker2 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
...     realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
...     univariate_results.filter(period='analysis', methods=['jensen_shannon']),
...     realized_perf_results.filter(period='analysis', metrics=['recall']),
...     only_drifting = False)

>>> display(correlation_ranked_features2)

| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.96897 | 3.90719e-06 | True | 1 |
| 1 | y_pred_proba | 0.966157 | 5.50918e-06 | True | 2 |
| 2 | loan_length | 0.965298 | 6.08385e-06 | True | 3 |
| 3 | car_value | 0.963623 | 7.33185e-06 | True | 4 |
| 4 | salary_range | 0.963456 | 7.46561e-06 | True | 5 |
| 5 | size_of_downpayment | 0.308948 | 0.385072 | False | 6 |
| 6 | debt_to_income_ratio | 0.307373 | 0.387627 | False | 7 |
| 7 | y_pred | -0.357571 | 0.310383 | False | 8 |
| 8 | repaid | -0.395842 | 0.257495 | False | 9 |
| 9 | driver_tenure | -0.575807 | 0.0815202 | False | 10 |

Insights

Ranking results are intended to suggest which drift results to prioritize for further investigation.

If other information is available, such as feature importance, it can also be used to prioritize which drifted features to investigate, as sketched below.
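For instance, if you have feature importances available from your model, you could join them onto the ranking output and sort on both. The sketch below assumes a hypothetical feature_importances DataFrame with column_name and importance columns that you supply yourself:

>>> # hypothetical: combine alert count ranking with externally supplied feature importances
>>> import pandas as pd

>>> feature_importances = pd.DataFrame({
...     'column_name': ['salary_range', 'car_value', 'loan_length'],
...     'importance': [0.35, 0.30, 0.15],
... })
>>> prioritized = alert_count_ranked_features.merge(feature_importances, on='column_name', how='left')
>>> prioritized.sort_values(['number_of_alerts', 'importance'], ascending=False)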

What’s Next

More information about the specifics of how ranking works can be found on the How it Works, Ranking page.