Ranking
As mentioned in the ranking tutorial, NannyML uses ranking to order columns in univariate drift results. The resulting order can help prioritize what to investigate further in order to fully address any issues with the model. Let's take a deep dive into how the ranking options work.
Alert Count Ranking
The alert count ranker is straightforward. It uses the provided univariate drift results to count all the alerts each feature has raised over the period in question. It then ranks the features in decreasing order of alert count, so the feature with the highest alert count is ranked first.
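The counting-and-sorting logic can be sketched in a few lines. This is a simplified illustration with made-up per-chunk alert flags, not the ranker's actual implementation, which reads these flags from the univariate drift results:

```python
# A minimal sketch of alert count ranking. The alert flags below are
# made up for illustration: one boolean per chunk per feature.
alert_flags = {
    'salary_range':         [True, True, False, True],
    'car_value':            [True, False, False, False],
    'debt_to_income_ratio': [False, False, False, False],
}

# Count alerts per feature, then sort features by decreasing count.
counts = {feature: sum(flags) for feature, flags in alert_flags.items()}
ranked = sorted(counts, key=counts.get, reverse=True)

for rank, feature in enumerate(ranked, start=1):
    print(rank, feature, counts[feature])
```

With these flags, salary_range (three alerts) comes first and debt_to_income_ratio (no alerts) last.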
Correlation Ranking
The correlation ranker is a bit more complex. In addition to the univariate drift results, it also uses performance results, which can be either estimated or realized. It then looks for correlation between the univariate drift results and the absolute performance change, ranking features with higher correlation first.
Let's look in detail at how correlation ranking works to get a better understanding. We'll use an example from the Correlation Ranking Tutorial:
>>> import nannyml as nml
>>> import matplotlib.pyplot as plt
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_df = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)
>>> column_names = [
... 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure',
... ]
>>> univ_calc = nml.UnivariateDriftCalculator(
... column_names=column_names,
... timestamp_column_name='timestamp',
... continuous_methods=['jensen_shannon'],
... categorical_methods=['jensen_shannon'],
... chunk_size=5000
... )
>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_df)
>>> realized_calc = nml.PerformanceCalculator(
... y_pred_proba='y_pred_proba',
... y_pred='y_pred',
... y_true='repaid',
... timestamp_column_name='timestamp',
... problem_type='classification_binary',
... metrics=['roc_auc'],
... chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_df)
>>> ranker = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker.fit(
... realized_perf_results.filter(period='reference'))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features = ranker.rank(
... univariate_results,
... realized_perf_results,
... only_drifting=False)
>>> display(correlation_ranked_features)
|   | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.92971 | 3.07647e-09 | True | 1 |
| 1 | loan_length | 0.926671 | 4.45233e-09 | True | 2 |
| 2 | salary_range | 0.921556 | 8.01487e-09 | True | 3 |
| 3 | car_value | 0.920795 | 8.71793e-09 | True | 4 |
| 4 | debt_to_income_ratio | 0.31739 | 0.172699 | False | 5 |
| 5 | size_of_downpayment | 0.154622 | 0.515113 | False | 6 |
| 6 | driver_tenure | -0.177018 | 0.455305 | False | 7 |
We see that after initializing a correlation ranker, the next step is to fit() it by providing performance results from the reference period. From those results the ranker calculates the average performance during the reference period. This value is saved in the mean_perf_value property of the ranker. Then we proceed with the rank() method, where we provide the chosen univariate drift and performance results. The performance results are preprocessed to calculate the absolute difference between the observed performance values and the mean performance on reference. We can see how this transformation affects the performance values below:
>>> fig, ax1 = plt.subplots()
>>> ax2 = ax1.twinx()
>>> ax1.plot(realized_perf_results.filter(period='all', metrics=['roc_auc']).to_df()[('roc_auc', 'value')].to_numpy(), 'g-')
>>> ax2.plot(ranker.absolute_performance_change, 'b-')
>>> ax1.set_xlabel('Chunk')
>>> ax1.set_ylabel('realized performance', color='g')
>>> ax2.set_ylabel('absolute performance difference', color='b')
>>> plt.title("From realized performance to absolute performance change")
>>> plt.savefig("../_static/how-it-works/ranking-abs-perf.svg", bbox_inches='tight')
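The preprocessing itself reduces to a single operation: take the absolute deviation of each chunk's performance from the reference mean. Here is a minimal sketch with made-up roc_auc values; the real ranker reads the mean from the fitted reference results and the observed values from the performance results passed to rank():

```python
# A minimal sketch of the preprocessing applied by rank().
# Both values below are made up for illustration.
mean_perf_value = 0.97                     # average roc_auc over reference
observed = [0.971, 0.968, 0.945, 0.930]    # per-chunk roc_auc on analysis data

# Absolute deviation of each chunk's performance from the reference mean.
absolute_performance_change = [abs(v - mean_perf_value) for v in observed]
```

Note that chunks where performance improved and chunks where it degraded both yield positive values, so the ranker rewards drift that tracks the magnitude of performance change in either direction.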
The next step is to calculate the Pearson correlation between the drift results and the calculated absolute performance changes. To build an intuition for how the Pearson correlation ranks features this way, we can compare the drift values of two features, repaid_loan_on_prev_car and debt_to_income_ratio, with the absolute performance difference, as shown in the plot below.
>>> fig, ax1 = plt.subplots()
>>> ax2 = ax1.twinx()
>>> ax2.plot(ranker.absolute_performance_change, 'b-')
>>> ax1.plot(
... univariate_results.filter(
... period='all', column_names=['repaid_loan_on_prev_car']
... ).to_df().loc[
... :, ('repaid_loan_on_prev_car', slice(None), 'value')
... ].to_numpy().ravel(),
... 'g-', label='repaid_loan_on_prev_car'
... )
>>> ax1.plot(
... univariate_results.filter(
... period='all', column_names=['debt_to_income_ratio']
... ).to_df().loc[
... :, ('debt_to_income_ratio', slice(None), 'value')
... ].to_numpy().ravel(),
... 'm-', label='debt to income ratio'
... )
>>> ax1.set_xlabel('Chunk')
>>> ax1.set_ylabel('drift results', color='g')
>>> ax2.set_ylabel('absolute performance difference', color='b')
>>> fig.legend(loc="center")
>>> plt.title("Drift Results vs Absolute Performance Change")
>>> plt.savefig("../_static/how-it-works/ranking-abs-perf-features-compare.svg", bbox_inches='tight')
In its results, the correlation ranker outputs not only the Pearson correlation coefficient but also the associated p-value for testing non-correlation, which can help interpret the results when needed.
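To illustrate how this ranking criterion behaves, here is a minimal pure-Python sketch: a feature whose drift values track the absolute performance change gets a coefficient near 1 and ranks first, while a roughly flat feature does not. All values below are made up for illustration, and the coefficient is computed by hand; the result column names (pearsonr_correlation, pearsonr_pvalue) suggest the actual ranker relies on scipy.stats.pearsonr, which also supplies the p-value.

```python
import math

def pearson_r(xs, ys):
    """Plain-Python Pearson correlation coefficient (no p-value)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up per-chunk values: one feature's drift tracks the absolute
# performance change closely, the other stays roughly flat.
abs_perf_change = [0.001, 0.002, 0.025, 0.040]
drift = {
    'repaid_loan_on_prev_car': [0.02, 0.03, 0.30, 0.45],  # tracks the change
    'debt_to_income_ratio':    [0.04, 0.02, 0.03, 0.02],  # roughly flat
}

# Rank features by decreasing correlation with absolute performance change.
corr = {f: pearson_r(v, abs_perf_change) for f, v in drift.items()}
ranked = sorted(corr, key=corr.get, reverse=True)
```

This mirrors the table above: strongly correlated features receive ranks 1 through 4, while weakly or negatively correlated features fall to the bottom.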