Ranking

As mentioned in the ranking tutorial, NannyML uses ranking to sort the columns in univariate drift results. The resulting order can help prioritize what to investigate further in order to fully address any issues with the model. Let's take a deep dive into how the ranking options work.

Alert Count Ranking

The alert count ranker is quite simple. First, it uses the provided univariate drift results to count all the alerts each feature has raised over the period in question. It then ranks the features in decreasing order of alert count, starting with the feature that raised the most alerts.
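As a rough sketch of the idea (plain Python with made-up alert flags, not NannyML's actual implementation), counting per-feature alerts and sorting in decreasing order could look like this:

```python
# Hypothetical per-chunk alert flags for each feature, as a univariate
# drift calculator might produce them (True = drift alert in that chunk).
alerts_per_chunk = {
    'salary_range':         [True, True, False, True],
    'car_value':            [True, False, False, False],
    'debt_to_income_ratio': [False, False, False, False],
}

# Count alerts per feature, then rank from most to fewest alerts.
alert_counts = {feature: sum(flags) for feature, flags in alerts_per_chunk.items()}
ranked = sorted(alert_counts, key=alert_counts.get, reverse=True)

for rank, feature in enumerate(ranked, start=1):
    print(rank, feature, alert_counts[feature])
```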

Correlation Ranking

The correlation ranker is a bit more complex. Instead of looking only at the univariate drift results, it also uses performance results, which can be either Estimated Performance results or Realized Performance results. It then looks for correlation between the univariate drift results and the absolute performance change, ranking features with higher correlation higher.

Let's look in detail at how correlation ranking works to get a better understanding. We'll use an example from the Correlation Ranking Tutorial:

>>> import nannyml as nml
>>> import matplotlib.pyplot as plt
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()

>>> analysis_df = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)

>>> column_names = [
...     'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure',
... ]
>>> univ_calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
...     continuous_methods=['jensen_shannon'],
...     categorical_methods=['jensen_shannon'],
...     chunk_size=5000
... )

>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_df)

>>> realized_calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc'],
...     chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_df)

>>> ranker = nml.CorrelationRanker()
>>> # ranker fits on one metric and reference period data only
>>> ranker.fit(
...     realized_perf_results.filter(period='reference'))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features = ranker.rank(
...     univariate_results,
...     realized_perf_results,
...     only_drifting=False)
>>> display(correlation_ranked_features)

  column_name              pearsonr_correlation  pearsonr_pvalue  has_drifted  rank
0 repaid_loan_on_prev_car  0.92971               3.07647e-09      True         1
1 loan_length              0.926671              4.45233e-09      True         2
2 salary_range             0.921556              8.01487e-09      True         3
3 car_value                0.920795              8.71793e-09      True         4
4 debt_to_income_ratio     0.31739               0.172699         True         5
5 size_of_downpayment      0.154622              0.515113         False        6
6 driver_tenure            -0.177018             0.455305         False        7

We see that after initializing the CorrelationRanker, the next step is to fit() it on performance results from the reference period. From those results, the ranker calculates the average performance during the reference period. This value is stored in the mean_reference_performance property of the ranker.

Then we proceed with the rank() method, where we provide the chosen univariate drift and performance results. The performance results are preprocessed to calculate the absolute difference between the observed performance values and the mean performance on reference. We can see how this transformation affects the performance values below:

>>> fig, ax1 = plt.subplots()
>>> ax2 = ax1.twinx()
>>> ax1.plot(realized_perf_results.filter(period='all', metrics=['roc_auc']).to_df()[('roc_auc', 'value')].to_numpy(), 'g-')
>>> ax2.plot(ranker.absolute_performance_change, 'b-')
>>> ax1.set_xlabel('Chunk')
>>> ax1.set_ylabel('realized performance', color='g')
>>> ax2.set_ylabel('absolute performance difference', color='b')
>>> plt.title("From realized performance to absolute performance change")
>>> plt.savefig("../_static/how-it-works/ranking-abs-perf.svg", bbox_inches='tight')
[Figure: From realized performance to absolute performance change]
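The preprocessing itself is straightforward. A minimal sketch with toy ROC AUC numbers (illustrative values, not the library's code):

```python
# Per-chunk performance over the reference period; fit() computes its mean.
reference_roc_auc = [0.970, 0.971, 0.969, 0.970]
mean_reference_performance = sum(reference_roc_auc) / len(reference_roc_auc)

# Observed per-chunk performance over the period being ranked.
observed_roc_auc = [0.970, 0.968, 0.942, 0.921]

# Absolute difference from the reference mean, per chunk.
absolute_performance_change = [
    abs(value - mean_reference_performance) for value in observed_roc_auc
]
print(absolute_performance_change)
```

This turns a performance drop in either direction into a positive magnitude, which is what gets correlated against the drift values in the next step.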

The next step is to calculate the Pearson correlation between the drift results and the calculated absolute performance changes.
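The Pearson coefficient is computed per feature between the two series. A self-contained sketch of that computation, with made-up per-chunk values:

```python
import math

def pearsonr(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Toy series: per-chunk drift values for one feature, and the absolute
# performance change over the same chunks (both rise together here).
drift_values = [0.01, 0.02, 0.15, 0.30]
abs_perf_change = [0.00, 0.002, 0.028, 0.049]

print(pearsonr(drift_values, abs_perf_change))  # close to 1: strongly correlated
```

A feature whose drift values move in lockstep with the performance change gets a coefficient near 1 and therefore a high rank; an unrelated feature's coefficient stays near 0.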

To build an intuition about how the Pearson correlation ranks features this way, we can compare the drift values of two features, repaid_loan_on_prev_car and debt_to_income_ratio, with the absolute performance difference, as shown in the plot below.

>>> fig, ax1 = plt.subplots()
>>> ax2 = ax1.twinx()
>>> ax2.plot(ranker.absolute_performance_change, 'b-')
>>> ax1.plot(
...     univariate_results.filter(
...         period='all', column_names=['repaid_loan_on_prev_car']
...     ).to_df().loc[
...         :, ('repaid_loan_on_prev_car', slice(None), 'value')
...     ].to_numpy().ravel(),
...     'g-', label='repaid_loan_on_prev_car'
... )
>>> ax1.plot(
...     univariate_results.filter(
...         period='all', column_names=['debt_to_income_ratio']
...     ).to_df().loc[
...         :, ('debt_to_income_ratio', slice(None), 'value')
...     ].to_numpy().ravel(),
...     'm-', label='debt to income ratio'
... )
>>> ax1.set_xlabel('Chunk')
>>> ax1.set_ylabel('drift results', color='g')
>>> ax2.set_ylabel('absolute performance difference', color='b')
>>> fig.legend(loc="center")
>>> plt.title("Drift Results vs Absolute Performance Change")
>>> plt.savefig("../_static/how-it-works/ranking-abs-perf-features-compare.svg", bbox_inches='tight')
[Figure: Drift Results vs Absolute Performance Change]

In its results, the correlation ranker outputs not only the Pearson correlation coefficient but also the associated p-value for testing non-correlation. This helps interpret the results when needed.
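As an illustration, using values from the ranking table above, one might treat only small p-values as evidence of genuine correlation (the 5% threshold here is a conventional choice, not something NannyML enforces):

```python
# (coefficient, p-value) pairs taken from the ranking table above.
results = {
    'repaid_loan_on_prev_car': (0.92971, 3.07647e-09),
    'debt_to_income_ratio': (0.31739, 0.172699),
    'driver_tenure': (-0.177018, 0.455305),
}

# Keep only features whose correlation is significant at the 5% level;
# a large p-value means the apparent correlation may well be noise.
significant = [f for f, (r, p) in results.items() if p < 0.05]
print(significant)
```

Here debt_to_income_ratio's moderate coefficient comes with a p-value of 0.17, so its correlation with the performance change should not be over-interpreted.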