Ranking

NannyML uses ranking to order columns in univariate drift results. The resulting order can help prioritize what to investigate further in order to fully address any issues with the monitored model.

There are currently two ranking methods in NannyML: alert count ranking and correlation ranking.

Just The Code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> column_names = [
...     'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]

>>> univ_calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred', 'repaid'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
...     chunk_size=5000
... )

>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())

>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     only_drifting = False)
>>> display(alert_count_ranked_features)

>>> estimated_calc = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc', 'recall'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())

>>> realized_calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc', 'recall',],
...     chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())

>>> ranker1 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
...     estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     estimated_perf_results.filter(metrics=['roc_auc']),
...     only_drifting = False)

>>> display(correlation_ranked_features1)

>>> ranker2 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
...     realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
...     univariate_results.filter(period='analysis', methods=['jensen_shannon']),
...     realized_perf_results.filter(period='analysis', metrics=['recall']),
...     only_drifting = False)

>>> display(correlation_ranked_features2)

Walkthrough

Ranking methods use the univariate drift calculation results, together with either estimated or realized performance results, to rank features.

Note

The univariate drift calculation results need to be created or filtered so that only one drift method is used for each feature. Similarly, the performance estimation or realized performance results need to be created or filtered so that they contain only one performance metric.
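For example, assuming the univariate_results and estimated_perf_results objects created in the code above, such filtering could look like this:

>>> # keep a single drift method per column before ranking
>>> drift_for_ranking = univariate_results.filter(methods=['jensen_shannon'])
>>> # keep a single performance metric before ranking
>>> performance_for_ranking = estimated_perf_results.filter(metrics=['roc_auc'])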

Below we look in more detail at how to use each ranking method.

Alert Count Ranking

Let’s take a closer look at our first ranking method. Alert count ranking ranks features according to the number of alerts they generated within the ranking period. It is based on the univariate drift results of the features, or data columns, considered.

The first thing we need before using the alert count ranker is to create the univariate drift results.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_full_df = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

>>> column_names = [
...     'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred', 'repaid'
... ]

>>> univ_calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred', 'repaid'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
...     chunk_size=5000
... )

>>> univ_calc.fit(reference_df)
>>> univariate_results = univ_calc.calculate(analysis_full_df)
>>> display(univariate_results.filter(period='analysis', column_names=['debt_to_income_ratio']).to_df())

The drift method columns (kolmogorov_smirnov, jensen_shannon) below refer to the debt_to_income_ratio column.

| | chunk_index | end_date | end_index | key | period | start_date | start_index | kolmogorov_smirnov alert | kolmogorov_smirnov lower_threshold | kolmogorov_smirnov upper_threshold | kolmogorov_smirnov value | jensen_shannon alert | jensen_shannon lower_threshold | jensen_shannon upper_threshold | jensen_shannon value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-11-30 00:27:16.848000 | 4999 | [0:4999] | analysis | 2018-10-30 18:00:00 | 0 | False | | | 0.01576 | False | | 0.1 | 0.0316611 |
| 1 | 1 | 2018-12-30 07:03:16.848000 | 9999 | [5000:9999] | analysis | 2018-11-30 00:36:00 | 5000 | False | | | 0.01268 | False | | 0.1 | 0.0300113 |
| 2 | 2 | 2019-01-29 13:39:16.848000 | 14999 | [10000:14999] | analysis | 2018-12-30 07:12:00 | 10000 | False | | | 0.01734 | False | | 0.1 | 0.0311286 |
| 3 | 3 | 2019-02-28 20:15:16.848000 | 19999 | [15000:19999] | analysis | 2019-01-29 13:48:00 | 15000 | False | | | 0.0128 | False | | 0.1 | 0.0294644 |
| 4 | 4 | 2019-03-31 02:51:16.848000 | 24999 | [20000:24999] | analysis | 2019-02-28 20:24:00 | 20000 | False | | | 0.01918 | False | | 0.1 | 0.0308095 |
| 5 | 5 | 2019-04-30 09:27:16.848000 | 29999 | [25000:29999] | analysis | 2019-03-31 03:00:00 | 25000 | False | | | 0.00824 | False | | 0.1 | 0.0286811 |
| 6 | 6 | 2019-05-30 16:03:16.848000 | 34999 | [30000:34999] | analysis | 2019-04-30 09:36:00 | 30000 | False | | | 0.01058 | False | | 0.1 | 0.0436276 |
| 7 | 7 | 2019-06-29 22:39:16.848000 | 39999 | [35000:39999] | analysis | 2019-05-30 16:12:00 | 35000 | False | | | 0.01002 | False | | 0.1 | 0.0292533 |
| 8 | 8 | 2019-07-30 05:15:16.848000 | 44999 | [40000:44999] | analysis | 2019-06-29 22:48:00 | 40000 | False | | | 0.01068 | False | | 0.1 | 0.0306276 |
| 9 | 9 | 2019-08-29 11:51:16.848000 | 49999 | [45000:49999] | analysis | 2019-07-30 05:24:00 | 45000 | False | | | 0.0068 | False | | 0.1 | 0.0283303 |

To illustrate the results, we filter and display the analysis period results for the debt_to_income_ratio feature. The next step is to instantiate the ranker and instruct it to rank() the provided results. Notice that the univariate results are filtered to ensure they only contain one drift method per categorical and continuous feature, as required.

>>> alert_count_ranker = nml.AlertCountRanker()
>>> alert_count_ranked_features = alert_count_ranker.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     only_drifting = False)
>>> display(alert_count_ranked_features)

| | number_of_alerts | column_name | rank |
|---|---|---|---|
| 0 | 5 | y_pred_proba | 1 |
| 1 | 5 | salary_range | 2 |
| 2 | 5 | repaid_loan_on_prev_car | 3 |
| 3 | 5 | loan_length | 4 |
| 4 | 5 | car_value | 5 |
| 5 | 0 | y_pred | 6 |
| 6 | 0 | size_of_downpayment | 7 |
| 7 | 0 | repaid | 8 |
| 8 | 0 | driver_tenure | 9 |
| 9 | 0 | debt_to_income_ratio | 10 |

The alert count ranker results give a simple and concise view of which features tend to breach the univariate drift thresholds more often than others.
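For intuition, a roughly equivalent ordering can be reproduced by hand by summing the per-chunk alert flags for each column in the drift results DataFrame. The sketch below is illustrative only, not NannyML’s implementation, and assumes the three-level column index shown in the tables above:

>>> # illustrative only: count per-chunk jensen_shannon alerts for each column and sort descending
>>> drift_df = univariate_results.filter(period='analysis', methods=['jensen_shannon']).to_df()
>>> alert_counts = {
...     column: int(drift_df[(column, 'jensen_shannon', 'alert')].sum())
...     for column in column_names
... }
>>> sorted(alert_counts.items(), key=lambda item: item[1], reverse=True)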

Correlation Ranking

Let’s continue to the second ranking method. Correlation ranking ranks features according to how strongly their drift correlates with absolute changes in the selected performance metric.
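Roughly speaking, the ranker learns the mean value of the chosen metric on the reference period during fit(), and then computes a Pearson correlation between each column’s per-chunk drift values and the absolute per-chunk deviation of the metric from that mean. The toy snippet below, with made-up numbers, only illustrates that idea and is not NannyML’s code:

>>> # toy illustration of the idea behind correlation ranking (made-up numbers)
>>> import numpy as np
>>> from scipy.stats import pearsonr

>>> mean_reference_perf = 0.97  # learned during fit() on reference period results
>>> perf_per_chunk = np.array([0.969, 0.968, 0.960, 0.959])  # metric per analysis chunk
>>> drift_per_chunk = np.array([0.03, 0.03, 0.28, 0.29])     # one column's drift values per chunk

>>> abs_perf_change = np.abs(perf_per_chunk - mean_reference_perf)
>>> correlation, p_value = pearsonr(drift_per_chunk, abs_perf_change)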

Therefore, we first need to create the performance results we will use in our ranking. The estimated performance results are created below.

>>> estimated_calc = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc', 'recall'],
...     chunk_size=5000,
...     problem_type='classification_binary',
... )
>>> estimated_calc.fit(reference_df)
>>> estimated_perf_results = estimated_calc.estimate(analysis_full_df)
>>> display(estimated_perf_results.filter(period='analysis').to_df())

| | key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc value | roc_auc sampling_error | roc_auc realized | roc_auc upper_confidence_boundary | roc_auc lower_confidence_boundary | roc_auc upper_threshold | roc_auc lower_threshold | roc_auc alert | recall value | recall sampling_error | recall realized | recall upper_confidence_boundary | recall lower_confidence_boundary | recall upper_threshold | recall lower_threshold | recall alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.968631 | 0.00181072 | 0.970962 | 0.974063 | 0.963198 | 0.97866 | 0.963317 | False | 0.928723 | 0.00513664 | 0.930394 | 0.944133 | 0.913313 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.969044 | 0.00181072 | 0.970248 | 0.974476 | 0.963612 | 0.97866 | 0.963317 | False | 0.925261 | 0.00513664 | 0.923922 | 0.940671 | 0.909851 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.969444 | 0.00181072 | 0.976282 | 0.974876 | 0.964012 | 0.97866 | 0.963317 | False | 0.929317 | 0.00513664 | 0.938246 | 0.944727 | 0.913907 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.969047 | 0.00181072 | 0.967721 | 0.974479 | 0.963615 | 0.97866 | 0.963317 | False | 0.929713 | 0.00513664 | 0.92506 | 0.945123 | 0.914303 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.968873 | 0.00181072 | 0.969886 | 0.974305 | 0.963441 | 0.97866 | 0.963317 | False | 0.930604 | 0.00513664 | 0.927577 | 0.946014 | 0.915194 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.960478 | 0.00181072 | 0.96005 | 0.96591 | 0.955046 | 0.97866 | 0.963317 | True | 0.88399 | 0.00513664 | 0.905086 | 0.8994 | 0.86858 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961134 | 0.00181072 | 0.95853 | 0.966566 | 0.955701 | 0.97866 | 0.963317 | True | 0.883528 | 0.00513664 | 0.89901 | 0.898938 | 0.868118 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.960536 | 0.00181072 | 0.959041 | 0.965968 | 0.955104 | 0.97866 | 0.963317 | True | 0.885501 | 0.00513664 | 0.901718 | 0.900911 | 0.870091 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.961869 | 0.00181072 | 0.963094 | 0.967301 | 0.956437 | 0.97866 | 0.963317 | True | 0.885978 | 0.00513664 | 0.906124 | 0.901388 | 0.870568 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.960537 | 0.00181072 | 0.957556 | 0.965969 | 0.955104 | 0.97866 | 0.963317 | True | 0.889808 | 0.00513664 | 0.905823 | 0.905218 | 0.874398 | 0.941033 | 0.9171 | True |

The analysis period estimates are shown above.
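If you prefer a visual check over the tabular view, the performance result objects can also be plotted, for example:

>>> # optional: visualize the estimated roc_auc per chunk as a plotly figure
>>> figure = estimated_perf_results.filter(period='analysis', metrics=['roc_auc']).plot()
>>> figure.show()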

The realized performance results are also created, since either can be used depending on the use case being addressed.

>>> realized_calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc', 'recall',],
...     chunk_size=5000)
>>> realized_calc.fit(reference_df)
>>> realized_perf_results = realized_calc.calculate(analysis_full_df)
>>> display(realized_perf_results.filter(period='analysis').to_df())

| | key | chunk_index | start_index | end_index | start_date | end_date | period | targets_missing_rate | roc_auc sampling_error | roc_auc value | roc_auc upper_threshold | roc_auc lower_threshold | roc_auc alert | recall sampling_error | recall value | recall upper_threshold | recall lower_threshold | recall alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0 | 0.00181072 | 0.970962 | 0.97866 | 0.963317 | False | 0.00513664 | 0.930394 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0 | 0.00181072 | 0.970248 | 0.97866 | 0.963317 | False | 0.00513664 | 0.923922 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0 | 0.00181072 | 0.976282 | 0.97866 | 0.963317 | False | 0.00513664 | 0.938246 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0 | 0.00181072 | 0.967721 | 0.97866 | 0.963317 | False | 0.00513664 | 0.92506 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0 | 0.00181072 | 0.969886 | 0.97866 | 0.963317 | False | 0.00513664 | 0.927577 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0 | 0.00181072 | 0.96005 | 0.97866 | 0.963317 | True | 0.00513664 | 0.905086 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0 | 0.00181072 | 0.95853 | 0.97866 | 0.963317 | True | 0.00513664 | 0.89901 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0 | 0.00181072 | 0.959041 | 0.97866 | 0.963317 | True | 0.00513664 | 0.901718 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0 | 0.00181072 | 0.963094 | 0.97866 | 0.963317 | True | 0.00513664 | 0.906124 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0 | 0.00181072 | 0.957556 | 0.97866 | 0.963317 | True | 0.00513664 | 0.905823 | 0.941033 | 0.9171 | True |

The analysis period results are shown.

We can now proceed to correlation ranking. Let’s correlate the drift results with the estimated roc_auc. A key difference here is that, after instantiation, we need to fit() the ranker with the relevant results, filtered to the reference period and containing only the performance metric we want the correlation ranker to use. You can read more about why this is needed on the Correlation Ranking, How it Works page. After fitting, we can call rank(), providing appropriately filtered univariate and performance results.

>>> ranker1 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker1.fit(
...     estimated_perf_results.filter(period='reference', metrics=['roc_auc']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features1 = ranker1.rank(
...     univariate_results.filter(methods=['jensen_shannon']),
...     estimated_perf_results.filter(metrics=['roc_auc']),
...     only_drifting = False)

>>> display(correlation_ranked_features1)

| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.99829 | 1.17771e-23 | True | 1 |
| 1 | y_pred_proba | 0.998072 | 3.47458e-23 | True | 2 |
| 2 | loan_length | 0.996876 | 2.66146e-21 | True | 3 |
| 3 | salary_range | 0.996512 | 7.16292e-21 | True | 4 |
| 4 | car_value | 0.996148 | 1.74676e-20 | True | 5 |
| 5 | size_of_downpayment | 0.307497 | 0.18722 | False | 6 |
| 6 | debt_to_income_ratio | 0.250211 | 0.287342 | False | 7 |
| 7 | y_pred | 0.0752823 | 0.752426 | False | 8 |
| 8 | repaid | -0.117004 | 0.623245 | False | 9 |
| 9 | driver_tenure | -0.134447 | 0.571988 | False | 10 |

Depending on the circumstances, it may be appropriate to correlate the drift results with performance on just the analysis period, or with a different metric. Below we see the correlation of the same drift results with the realized recall results.

>>> ranker2 = nml.CorrelationRanker()

>>> # ranker fits on one metric and reference period data only
>>> ranker2.fit(
...     realized_perf_results.filter(period='reference', metrics=['recall']))
>>> # ranker ranks on one drift method and one performance metric
>>> correlation_ranked_features2 = ranker2.rank(
...     univariate_results.filter(period='analysis', methods=['jensen_shannon']),
...     realized_perf_results.filter(period='analysis', metrics=['recall']),
...     only_drifting = False)

>>> display(correlation_ranked_features2)

| | column_name | pearsonr_correlation | pearsonr_pvalue | has_drifted | rank |
|---|---|---|---|---|---|
| 0 | repaid_loan_on_prev_car | 0.96897 | 3.90719e-06 | True | 1 |
| 1 | y_pred_proba | 0.966157 | 5.50918e-06 | True | 2 |
| 2 | loan_length | 0.965298 | 6.08385e-06 | True | 3 |
| 3 | car_value | 0.963623 | 7.33185e-06 | True | 4 |
| 4 | salary_range | 0.963456 | 7.46561e-06 | True | 5 |
| 5 | size_of_downpayment | 0.308948 | 0.385072 | False | 6 |
| 6 | debt_to_income_ratio | 0.307373 | 0.387627 | False | 7 |
| 7 | y_pred | -0.357571 | 0.310383 | False | 8 |
| 8 | repaid | -0.395842 | 0.257495 | False | 9 |
| 9 | driver_tenure | -0.575807 | 0.0815202 | False | 10 |

Insights

Ranking results are intended to suggest which drift results to prioritize for further investigation.

If other information is available, such as feature importance, it can also be used to prioritize which drifted features to investigate, as sketched below.
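For instance, if you have feature importances available from your model, you could join them onto the ranking output and sort on both. The sketch below assumes a hypothetical feature_importances DataFrame with column_name and importance columns that you supply yourself:

>>> # hypothetical: combine alert count ranking with externally supplied feature importances
>>> import pandas as pd

>>> feature_importances = pd.DataFrame({
...     'column_name': ['salary_range', 'car_value', 'loan_length'],
...     'importance': [0.35, 0.30, 0.15],
... })
>>> prioritized = alert_count_ranked_features.merge(feature_importances, on='column_name', how='left')
>>> prioritized.sort_values(['number_of_alerts', 'importance'], ascending=False)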

What’s Next

More information about the specifics of how ranking works can be found on the How it Works, Ranking page.