Monitoring Realized Performance for Regression

Note

The following example uses timestamps. These are optional but have an impact on the way data is chunked and results are plotted. You can read more about them in the data requirements.

Just The Code

>>> import pandas as pd
>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_car_price_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_price_dataset()[1]
>>> analysis_target_df = nml.load_synthetic_car_price_dataset()[2]
>>> analysis_df = analysis_df.join(analysis_target_df)

>>> display(reference_df.head(3))

>>> calc = nml.PerformanceCalculator(
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     problem_type='regression',
...     metrics=['mae', 'mape', 'mse', 'msle', 'rmse', 'rmsle'],
...     chunk_size=6000)

>>> calc.fit(reference_df)

>>> results = calc.calculate(analysis_df)
>>> display(results.data)

>>> display(results.calculator.previous_reference_results)

>>> for metric in calc.metrics:
...     figure = results.plot(kind='performance', plot_reference=True, metric=metric)
...     figure.show()

Walkthrough

For simplicity the guide is based on a synthetic dataset where the monitored model predicts the selling price of a used car. You can learn more about this dataset.

In order to monitor a model, NannyML needs to learn about it from a reference dataset. Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset. You can read more about this in our section on data periods.

The analysis_targets dataframe contains the target values for the analysis period. It is kept separate in the synthetic data because it is not used during performance estimation. Since targets are required to calculate realized performance, the first step here is to join the analysis target values with the analysis data.

>>> import pandas as pd
>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_car_price_dataset()[0]
>>> analysis_df = nml.load_synthetic_car_price_dataset()[1]
>>> analysis_target_df = nml.load_synthetic_car_price_dataset()[2]
>>> analysis_df = analysis_df.join(analysis_target_df)

>>> display(reference_df.head(3))

|   | car_age | km_driven | price_new | accident_count | door_count | fuel | transmission | y_true | y_pred | timestamp |
|---|---------|-----------|-----------|----------------|------------|------|--------------|--------|--------|-----------|
| 0 | 15 | 144020 | 42810 | 4 | 3 | diesel | automatic | 569 | 1246 | 2017-01-24 08:00:00.000 |
| 1 | 12 | 57078 | 31835 | 3 | 3 | electric | automatic | 4277 | 4924 | 2017-01-24 08:00:33.600 |
| 2 | 2 | 76288 | 31851 | 3 | 5 | diesel | automatic | 7011 | 5744 | 2017-01-24 08:01:07.200 |

Next a PerformanceCalculator is created using a list of metrics to calculate (or just one metric), the data columns required for these metrics, an optional chunking specification and the type of machine learning problem being addressed.

The list of metrics specifies which performance metrics of the monitored model will be calculated. The following metrics are currently supported:

  • mae - mean absolute error

  • mape - mean absolute percentage error

  • mse - mean squared error

  • rmse - root mean squared error

  • msle - mean squared logarithmic error

  • rmsle - root mean squared logarithmic error

For more information on metrics, check the metrics module.

>>> calc = nml.PerformanceCalculator(
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     problem_type='regression',
...     metrics=['mae', 'mape', 'mse', 'msle', 'rmse', 'rmsle'],
...     chunk_size=6000)

>>> calc.fit(reference_df)

The new PerformanceCalculator is fitted using the fit() method on the reference data.

The fitted PerformanceCalculator can then be used to calculate realized performance metrics on all data which has target values available with the calculate() method. NannyML can output a dataframe that contains all the results of the analysis data.

>>> results = calc.calculate(analysis_df)
>>> display(results.data)

|   | key | start_index | end_index | start_date | end_date | period | targets_missing_rate | mae | mae_lower_threshold | mae_upper_threshold | mae_sampling_error | mae_alert | mape | mape_lower_threshold | mape_upper_threshold | mape_sampling_error | mape_alert | mse | mse_lower_threshold | mse_upper_threshold | mse_sampling_error | mse_alert | msle | msle_lower_threshold | msle_upper_threshold | msle_sampling_error | msle_alert | rmse | rmse_lower_threshold | rmse_upper_threshold | rmse_sampling_error | rmse_alert | rmsle | rmsle_lower_threshold | rmsle_upper_threshold | rmsle_sampling_error | rmsle_alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:5999] | 0 | 5999 | 2017-02-16 16:00:00 | 2017-02-18 23:59:26.400000 |  | 0 | 853.4 | 817.855 | 874.805 | 8.21576 | False | 0.228707 | 0.229456 | 0.237019 | 0.00248466 | True | 1.14313e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0704883 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1069.17 | 1014.28 | 1103.31 | 10.348 | False | 0.265496 | 0.263948 | 0.271511 | 0.002239 | False |
| 1 | [6000:11999] | 6000 | 11999 | 2017-02-19 00:00:00 | 2017-02-21 07:59:26.400000 |  | 0 | 853.137 | 817.855 | 874.805 | 8.21576 | False | 0.230818 | 0.229456 | 0.237019 | 0.00248466 | False | 1.13987e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0699896 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1067.65 | 1014.28 | 1103.31 | 10.348 | False | 0.264556 | 0.263948 | 0.271511 | 0.002239 | False |
| 2 | [12000:17999] | 12000 | 17999 | 2017-02-21 08:00:00 | 2017-02-23 15:59:26.400000 |  | 0 | 846.304 | 817.855 | 874.805 | 8.21576 | False | 0.229042 | 0.229456 | 0.237019 | 0.00248466 | True | 1.12872e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0696923 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1062.41 | 1014.28 | 1103.31 | 10.348 | False | 0.263993 | 0.263948 | 0.271511 | 0.002239 | False |
| 3 | [18000:23999] | 18000 | 23999 | 2017-02-23 16:00:00 | 2017-02-25 23:59:26.400000 |  | 0 | 855.495 | 817.855 | 874.805 | 8.21576 | False | 0.233624 | 0.229456 | 0.237019 | 0.00248466 | False | 1.15829e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0719322 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1076.24 | 1014.28 | 1103.31 | 10.348 | False | 0.268202 | 0.263948 | 0.271511 | 0.002239 | False |
| 4 | [24000:29999] | 24000 | 29999 | 2017-02-26 00:00:00 | 2017-02-28 07:59:26.400000 |  | 0 | 849.33 | 817.855 | 874.805 | 8.21576 | False | 0.233887 | 0.229456 | 0.237019 | 0.00248466 | False | 1.12429e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0724877 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1060.32 | 1014.28 | 1103.31 | 10.348 | False | 0.269235 | 0.263948 | 0.271511 | 0.002239 | False |
| 5 | [30000:35999] | 30000 | 35999 | 2017-02-28 08:00:00 | 2017-03-02 15:59:26.400000 |  | 0 | 702.518 | 817.855 | 874.805 | 8.21576 | True | 0.262864 | 0.229456 | 0.237019 | 0.00248466 | True | 829589 | 1.02681e+06 | 1.21572e+06 | 21915 | True | 0.104949 | 0.0696521 | 0.0737091 | 0.0011989 | True | 910.818 | 1014.28 | 1103.31 | 10.348 | True | 0.323958 | 0.263948 | 0.271511 | 0.002239 | True |
| 6 | [36000:41999] | 36000 | 41999 | 2017-03-02 16:00:00 | 2017-03-04 23:59:26.400000 |  | 0 | 700.736 | 817.855 | 874.805 | 8.21576 | True | 0.26346 | 0.229456 | 0.237019 | 0.00248466 | True | 829693 | 1.02681e+06 | 1.21572e+06 | 21915 | True | 0.104814 | 0.0696521 | 0.0737091 | 0.0011989 | True | 910.875 | 1014.28 | 1103.31 | 10.348 | True | 0.32375 | 0.263948 | 0.271511 | 0.002239 | True |
| 7 | [42000:47999] | 42000 | 47999 | 2017-03-05 00:00:00 | 2017-03-07 07:59:26.400000 |  | 0 | 684.702 | 817.855 | 874.805 | 8.21576 | True | 0.26095 | 0.229456 | 0.237019 | 0.00248466 | True | 792287 | 1.02681e+06 | 1.21572e+06 | 21915 | True | 0.104347 | 0.0696521 | 0.0737091 | 0.0011989 | True | 890.105 | 1014.28 | 1103.31 | 10.348 | True | 0.323027 | 0.263948 | 0.271511 | 0.002239 | True |
| 8 | [48000:53999] | 48000 | 53999 | 2017-03-07 08:00:00 | 2017-03-09 15:59:26.400000 |  | 0 | 705.814 | 817.855 | 874.805 | 8.21576 | True | 0.265371 | 0.229456 | 0.237019 | 0.00248466 | True | 835917 | 1.02681e+06 | 1.21572e+06 | 21915 | True | 0.104714 | 0.0696521 | 0.0737091 | 0.0011989 | True | 914.285 | 1014.28 | 1103.31 | 10.348 | True | 0.323596 | 0.263948 | 0.271511 | 0.002239 | True |
| 9 | [54000:59999] | 54000 | 59999 | 2017-03-09 16:00:00 | 2017-03-11 23:59:26.400000 |  | 0 | 698.344 | 817.855 | 874.805 | 8.21576 | True | 0.265757 | 0.229456 | 0.237019 | 0.00248466 | True | 825936 | 1.02681e+06 | 1.21572e+06 | 21915 | True | 0.105882 | 0.0696521 | 0.0737091 | 0.0011989 | True | 908.81 | 1014.28 | 1103.31 | 10.348 | True | 0.325394 | 0.263948 | 0.271511 | 0.002239 | True |

The results from the reference data are also available.

>>> display(results.calculator.previous_reference_results)

|   | key | start_index | end_index | start_date | end_date | period | targets_missing_rate | mae | mae_lower_threshold | mae_upper_threshold | mae_sampling_error | mae_alert | mape | mape_lower_threshold | mape_upper_threshold | mape_sampling_error | mape_alert | mse | mse_lower_threshold | mse_upper_threshold | mse_sampling_error | mse_alert | msle | msle_lower_threshold | msle_upper_threshold | msle_sampling_error | msle_alert | rmse | rmse_lower_threshold | rmse_upper_threshold | rmse_sampling_error | rmse_alert | rmsle | rmsle_lower_threshold | rmsle_upper_threshold | rmsle_sampling_error | rmsle_alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:5999] | 0 | 5999 | 2017-01-24 08:00:00 | 2017-01-26 15:59:26.400000 |  | 0 | 863.932 | 817.855 | 874.805 | 8.21576 | False | 0.23274 | 0.229456 | 0.237019 | 0.00248466 | False | 1.18007e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0715427 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1086.31 | 1014.28 | 1103.31 | 10.348 | False | 0.267475 | 0.263948 | 0.271511 | 0.002239 | False |
| 1 | [6000:11999] | 6000 | 11999 | 2017-01-26 16:00:00 | 2017-01-28 23:59:26.400000 |  | 0 | 844.491 | 817.855 | 874.805 | 8.21576 | False | 0.234282 | 0.229456 | 0.237019 | 0.00248466 | False | 1.12407e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0721316 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1060.22 | 1014.28 | 1103.31 | 10.348 | False | 0.268573 | 0.263948 | 0.271511 | 0.002239 | False |
| 2 | [12000:17999] | 12000 | 17999 | 2017-01-29 00:00:00 | 2017-01-31 07:59:26.400000 |  | 0 | 830.578 | 817.855 | 874.805 | 8.21576 | False | 0.231986 | 0.229456 | 0.237019 | 0.00248466 | False | 1.07831e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0709387 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1038.42 | 1014.28 | 1103.31 | 10.348 | False | 0.266343 | 0.263948 | 0.271511 | 0.002239 | False |
| 3 | [18000:23999] | 18000 | 23999 | 2017-01-31 08:00:00 | 2017-02-02 15:59:26.400000 |  | 0 | 838.746 | 817.855 | 874.805 | 8.21576 | False | 0.231618 | 0.229456 | 0.237019 | 0.00248466 | False | 1.07827e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0709489 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1038.4 | 1014.28 | 1103.31 | 10.348 | False | 0.266362 | 0.263948 | 0.271511 | 0.002239 | False |
| 4 | [24000:29999] | 24000 | 29999 | 2017-02-02 16:00:00 | 2017-02-04 23:59:26.400000 |  | 0 | 857.765 | 817.855 | 874.805 | 8.21576 | False | 0.235091 | 0.229456 | 0.237019 | 0.00248466 | False | 1.14923e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0727984 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1072.02 | 1014.28 | 1103.31 | 10.348 | False | 0.269812 | 0.263948 | 0.271511 | 0.002239 | False |
| 5 | [30000:35999] | 30000 | 35999 | 2017-02-05 00:00:00 | 2017-02-07 07:59:26.400000 |  | 0 | 852.697 | 817.855 | 874.805 | 8.21576 | False | 0.232364 | 0.229456 | 0.237019 | 0.00248466 | False | 1.15555e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0712554 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1074.97 | 1014.28 | 1103.31 | 10.348 | False | 0.266937 | 0.263948 | 0.271511 | 0.002239 | False |
| 6 | [36000:41999] | 36000 | 41999 | 2017-02-07 08:00:00 | 2017-02-09 15:59:26.400000 |  | 0 | 842.253 | 817.855 | 874.805 | 8.21576 | False | 0.232789 | 0.229456 | 0.237019 | 0.00248466 | False | 1.12037e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0715653 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1058.48 | 1014.28 | 1103.31 | 10.348 | False | 0.267517 | 0.263948 | 0.271511 | 0.002239 | False |
| 7 | [42000:47999] | 42000 | 47999 | 2017-02-09 16:00:00 | 2017-02-11 23:59:26.400000 |  | 0 | 837.9 | 817.855 | 874.805 | 8.21576 | False | 0.235516 | 0.229456 | 0.237019 | 0.00248466 | False | 1.10396e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0729194 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1050.7 | 1014.28 | 1103.31 | 10.348 | False | 0.270036 | 0.263948 | 0.271511 | 0.002239 | False |
| 8 | [48000:53999] | 48000 | 53999 | 2017-02-12 00:00:00 | 2017-02-14 07:59:26.400000 |  | 0 | 844.266 | 817.855 | 874.805 | 8.21576 | False | 0.232423 | 0.229456 | 0.237019 | 0.00248466 | False | 1.09914e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0711648 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1048.4 | 1014.28 | 1103.31 | 10.348 | False | 0.266767 | 0.263948 | 0.271511 | 0.002239 | False |
| 9 | [54000:59999] | 54000 | 59999 | 2017-02-14 08:00:00 | 2017-02-16 15:59:26.400000 |  | 0 | 850.673 | 817.855 | 874.805 | 8.21576 | False | 0.233561 | 0.229456 | 0.237019 | 0.00248466 | False | 1.12369e+06 | 1.02681e+06 | 1.21572e+06 | 21915 | False | 0.0715405 | 0.0696521 | 0.0737091 | 0.0011989 | False | 1060.04 | 1014.28 | 1103.31 | 10.348 | False | 0.267471 | 0.263948 | 0.271511 | 0.002239 | False |

Apart from the chunk- and period-related columns, the results data contain a set of columns for each calculated metric. Taking mae as an example:

  • targets_missing_rate - The fraction of missing target data.

  • <metric> - The value of the metric for a specific chunk.

  • <metric>_lower_threshold and <metric>_upper_threshold - Lower and upper thresholds for the performance metric. Crossing them raises an alert that there is a significant metric change. The thresholds are calculated during the fit phase from the realized performance of the chunks in the reference period: they lie 3 standard deviations away from the mean performance over those reference chunks (see the sketch after this list).

  • <metric>_alert - A flag indicating a potentially significant performance change. True when the realized performance crosses the upper or lower threshold.

  • <metric>_sampling_error - An estimate of the sampling error for the relevant metric.
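
The threshold logic can be reproduced by hand. The sketch below is illustrative only and is not how NannyML computes the thresholds internally: it recomputes per-chunk MAE on the reference data with numpy and scikit-learn (both assumed to be installed) and derives the 3-standard-deviation band that drives the alert flag.

>>> import numpy as np
>>> from sklearn.metrics import mean_absolute_error

>>> # split the reference data into the same 6000-row chunks used by the calculator
>>> chunk_maes = [
...     mean_absolute_error(chunk['y_true'], chunk['y_pred'])
...     for _, chunk in reference_df.groupby(np.arange(len(reference_df)) // 6000)
... ]
>>> lower_threshold = np.mean(chunk_maes) - 3 * np.std(chunk_maes)
>>> upper_threshold = np.mean(chunk_maes) + 3 * np.std(chunk_maes)
>>> # a chunk's mae_alert is True when its realized MAE falls outside [lower, upper]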

The results can be plotted for visual inspection:

>>> for metric in calc.metrics:
...     figure = results.plot(kind='performance', plot_reference=True, metric=metric)
...     figure.show()

(Figures: realized performance plots for MAE, MAPE, MSE, MSLE, RMSE and RMSLE over the reference and analysis periods.)

Insights

Looking at the RMSE and RMSLE results we can observe an interesting effect. RMSE penalizes mispredictions symmetrically, while RMSLE penalizes underprediction more than overprediction. Hence, while our model has become slightly more accurate according to RMSE, the increase in RMSLE tells us that it is now underpredicting more than it was before!
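
To make that asymmetry concrete, here is a small self-contained illustration (not part of the original example; it uses scikit-learn's metric functions) comparing how RMSE and RMSLE score a prediction that overshoots versus one that undershoots the true value by the same absolute amount.

>>> import numpy as np
>>> from sklearn.metrics import mean_squared_error, mean_squared_log_error

>>> y_true = np.array([1000.0, 1000.0])
>>> over = np.array([1200.0, 1200.0])    # overpredicts by 200
>>> under = np.array([800.0, 800.0])     # underpredicts by 200

>>> # RMSE scores both errors identically (200.0 in each case) ...
>>> np.sqrt(mean_squared_error(y_true, over)), np.sqrt(mean_squared_error(y_true, under))
>>> # ... while RMSLE penalizes the underprediction more heavily (roughly 0.18 vs 0.22)
>>> np.sqrt(mean_squared_log_error(y_true, over)), np.sqrt(mean_squared_log_error(y_true, under))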

What Next

If we decide further investigation is needed, the Data Drift functionality can help us to see what feature changes may be contributing to any performance changes.
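
If you want to start that investigation directly in code, a sketch along the following lines could work on the same dataset. Note that the calculator class and its arguments below are assumptions that may differ between NannyML versions; consult the data drift tutorials for the exact API.

>>> # Hedged sketch: univariate drift detection on the same dataset.
>>> # The class name and arguments may differ depending on your NannyML version.
>>> feature_column_names = [
...     'car_age', 'km_driven', 'price_new', 'accident_count',
...     'door_count', 'fuel', 'transmission']
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(
...     feature_column_names=feature_column_names,
...     timestamp_column_name='timestamp',
...     chunk_size=6000)
>>> drift_calc.fit(reference_df)
>>> drift_results = drift_calc.calculate(analysis_df)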

It is also wise to check whether the model’s performance is satisfactory according to business requirements. This is an ad-hoc investigation that is not covered by NannyML.