Working with results

What are NannyML Results?

In NannyML, any calculation will return a Result object. Not returning a DataFrame directly allows NannyML to separate the concerns of storing calculation results and having users interact with them. It also means we can provide additional useful methods, such as filtering and plotting, on top of the results.

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

>>> display(results.to_df())

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))

>>> display(filtered_results.to_df())

>>> display(filtered_results.to_df(multilevel=False))

>>> results.plot().show()

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()

Walkthrough

The data structure

In order to obtain results, we first have to perform some calculation. We will start by loading the reference and analysis sample data for binary classification. Then, we will perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)
['car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']

We then set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we would like to use.

Next, we fit the calculator on the reference data and use the fitted calculator to evaluate drift on the analysis data. The outcome is stored in the variable results.

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.

>>> display(results.to_df())

We can immediately see that a multilevel index is used to store the data. One group of columns contains chunk information; the other groups contain the numerical results of the drift calculations.

In the case of the UnivariateDriftCalculator, there are two degrees of freedom. You can specify columns to include in the calculation, and each column might be evaluated by different methods.

This structure is visible in the column index. The top level represents the column names. The middle level represents the specific methods used to evaluate a column. Finally, the bottom level contains the information relevant to each method: a value, upper and lower thresholds for alerts, and whether the evaluated method crossed the thresholds for that chunk, leading to an alert.

The displayed DataFrame has 20 rows, one per chunk: index 0 to 9 for the reference period and index 10 to 19 for the analysis period. Its three-level column index is laid out as follows (top level, then method, then the fields of each method):

chunk
    chunk: chunk_index, end_date, end_index, key, period, start_date, start_index
car_value, debt_to_income_ratio, driver_tenure, loan_length, y_pred_proba
    jensen_shannon: alert, lower_threshold, upper_threshold, value
    kolmogorov_smirnov: alert, lower_threshold, upper_threshold, value
repaid_loan_on_prev_car, salary_range, size_of_downpayment, y_pred
    chi2: alert, lower_threshold, upper_threshold, value
    jensen_shannon: alert, lower_threshold, upper_threshold, value
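The relation between value, the thresholds, and alert can be sketched in a few lines. This is a minimal illustration, not NannyML's implementation: the helper name crosses_threshold is hypothetical, and it assumes strict comparisons and that an empty threshold means that side is not checked. The numbers are taken from two kolmogorov_smirnov cells of the first analysis chunk in the output above. The chi2 columns show empty thresholds in the analysis rows, so this check does not describe how their alerts are raised.

```python
from typing import Optional


def crosses_threshold(
    value: float,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> bool:
    """Return True when value falls outside the [lower, upper] band.

    A threshold of None means that side is not checked. Hypothetical
    helper for illustration only; not part of the NannyML API.
    """
    below = lower is not None and value < lower
    above = upper is not None and value > upper
    return below or above


# Only an upper threshold is set for these two cells.
print(crosses_threshold(0.02114, upper=0.0173417))  # True
print(crosses_threshold(0.00884, upper=0.0166909))  # False
```

Both results agree with the alert flags reported for those cells (driver_tenure and loan_length respectively).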

Filtering

Working with multilevel indexes can be powerful but also challenging. The following snippet selects the value field of every calculated method in our results and prints the resulting column index.

>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([(              'car_value',     'jensen_shannon', 'value'),
            (              'car_value', 'kolmogorov_smirnov', 'value'),
            (   'debt_to_income_ratio',     'jensen_shannon', 'value'),
            (   'debt_to_income_ratio', 'kolmogorov_smirnov', 'value'),
            (          'driver_tenure',     'jensen_shannon', 'value'),
            (          'driver_tenure', 'kolmogorov_smirnov', 'value'),
            (            'loan_length',     'jensen_shannon', 'value'),
            (            'loan_length', 'kolmogorov_smirnov', 'value'),
            ('repaid_loan_on_prev_car',               'chi2', 'value'),
            ('repaid_loan_on_prev_car',     'jensen_shannon', 'value'),
            (           'salary_range',               'chi2', 'value'),
            (           'salary_range',     'jensen_shannon', 'value'),
            (    'size_of_downpayment',               'chi2', 'value'),
            (    'size_of_downpayment',     'jensen_shannon', 'value'),
            (                 'y_pred',               'chi2', 'value'),
            (                 'y_pred',     'jensen_shannon', 'value'),
            (           'y_pred_proba',     'jensen_shannon', 'value'),
            (           'y_pred_proba', 'kolmogorov_smirnov', 'value')],
           )
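The same selection pattern can be tried on a small toy DataFrame without running any drift calculation. The sketch below uses only standard pandas calls; the frame, its column names, and all numbers are made up for illustration and merely mirror the (column name, method, field) layout.

```python
import pandas as pd

# Toy three-level columns mirroring (column name, method, field).
# All values are invented for illustration.
columns = pd.MultiIndex.from_tuples(
    [
        ("car_value", "jensen_shannon", "alert"),
        ("car_value", "jensen_shannon", "value"),
        ("salary_range", "chi2", "alert"),
        ("salary_range", "chi2", "value"),
    ]
)
toy = pd.DataFrame(
    [[False, 0.02, False, 1.0], [False, 0.03, True, 450.0]],
    columns=columns,
)

# Tuple-of-slices pattern: keep every 'value' field.
values = toy.loc[:, (slice(None), slice(None), "value")]

# pd.IndexSlice is a more readable spelling of the same selection.
idx = pd.IndexSlice
values_alt = toy.loc[:, idx[:, :, "value"]]

# xs() selects on a single level and drops that level from the result.
chi2_only = toy.xs("chi2", axis=1, level=1)

print(list(values.columns))
print(list(chi2_only.columns))
```

Whether to use tuples of slices, pd.IndexSlice, or xs() is mostly a matter of readability; all three operate on the column index levels.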

To improve this experience, we have introduced a helper method that allows you to filter the result data to easily retrieve the information you want. Since the UnivariateDriftCalculator has two degrees of freedom, we have included both in the filter() method. Additionally, you can filter on the data period, i.e., reference or analysis.

The filter() method returns a new Result instance, allowing you to chain methods such as filter(), to_df(), and plot().

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>
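Why returning a new instance makes chaining possible can be shown with a small stand-in class. This is a sketch under stated assumptions: ToyResult, its row layout, and to_records() are hypothetical and do not reflect how NannyML stores results internally. Only the idea that filter() hands back an object of the same type is taken from the text above. The sample values come from the salary_range and car_value columns of the drift output.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical stand-in for a result object; not NannyML's class."""

    # Each row: (period, column_name, method, value)
    rows: Tuple[Tuple[str, str, str, float], ...]

    def filter(
        self,
        period: str = "all",
        methods: Optional[List[str]] = None,
        column_names: Optional[List[str]] = None,
    ) -> "ToyResult":
        """Return a new ToyResult; the original is left unchanged."""
        kept = tuple(
            row
            for row in self.rows
            if (period == "all" or row[0] == period)
            and (methods is None or row[2] in methods)
            and (column_names is None or row[1] in column_names)
        )
        return replace(self, rows=kept)

    def to_records(self) -> List[Dict[str, object]]:
        """Convert the stored rows into a list of dictionaries."""
        keys = ("period", "column_name", "method", "value")
        return [dict(zip(keys, row)) for row in self.rows]


toy = ToyResult(
    rows=(
        ("reference", "salary_range", "chi2", 2.89878),
        ("analysis", "salary_range", "chi2", 1.03368),
        ("analysis", "salary_range", "jensen_shannon", 0.00639674),
        ("analysis", "car_value", "jensen_shannon", 0.0261935),
    )
)

# Because filter() returns the same type, calls can be chained.
records = toy.filter(period="analysis").filter(methods=["chi2"]).to_records()
print(records)
print(len(toy.rows))  # the original object still holds all four rows
```

Keeping the original object untouched means one calculation can feed several differently filtered views.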

When looking at the results after filtering, you can see only the chi2 data for the salary_range column during the analysis period is included.

>>> display(filtered_results.to_df())

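+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
|    | chunk       | chunk                      | chunk     | chunk         | chunk    | chunk               | chunk       | salary_range | salary_range    | salary_range    | salary_range |
|    | chunk       | chunk                      | chunk     | chunk         | chunk    | chunk               | chunk       | chi2         | chi2            | chi2            | chi2         |
|    | chunk_index | end_date                   | end_index | key           | period   | start_date          | start_index | alert        | lower_threshold | upper_threshold | value        |
+====+=============+============================+===========+===============+==========+=====================+=============+==============+=================+=================+==============+
| 0  | 0           | 2018-11-30 00:27:16.848000 | 4999      | [0:4999]      | analysis | 2018-10-30 18:00:00 | 0           | False        |                 |                 | 1.03368      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 1  | 1           | 2018-12-30 07:03:16.848000 | 9999      | [5000:9999]   | analysis | 2018-11-30 00:36:00 | 5000        | False        |                 |                 | 5.76241      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 2  | 2           | 2019-01-29 13:39:16.848000 | 14999     | [10000:14999] | analysis | 2018-12-30 07:12:00 | 10000       | False        |                 |                 | 2.65396      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 3  | 3           | 2019-02-28 20:15:16.848000 | 19999     | [15000:19999] | analysis | 2019-01-29 13:48:00 | 15000       | False        |                 |                 | 0.0708428    |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 4  | 4           | 2019-03-31 02:51:16.848000 | 24999     | [20000:24999] | analysis | 2019-02-28 20:24:00 | 20000       | False        |                 |                 | 1.00542      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 5  | 5           | 2019-04-30 09:27:16.848000 | 29999     | [25000:29999] | analysis | 2019-03-31 03:00:00 | 25000       | True         |                 |                 | 455.622      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 6  | 6           | 2019-05-30 16:03:16.848000 | 34999     | [30000:34999] | analysis | 2019-04-30 09:36:00 | 30000       | True         |                 |                 | 428.633      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 7  | 7           | 2019-06-29 22:39:16.848000 | 39999     | [35000:39999] | analysis | 2019-05-30 16:12:00 | 35000       | True         |                 |                 | 453.247      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 8  | 8           | 2019-07-30 05:15:16.848000 | 44999     | [40000:44999] | analysis | 2019-06-29 22:48:00 | 40000       | True         |                 |                 | 438.26       |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+
| 9  | 9           | 2019-08-29 11:51:16.848000 | 49999     | [45000:49999] | analysis | 2019-07-30 05:24:00 | 45000       | True         |                 |                 | 474.892      |
+----+-------------+----------------------------+-----------+---------------+----------+---------------------+-------------+--------------+-----------------+-----------------+--------------+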

To avoid the use of a Multilevel index, we have provided a switch in the to_df() method.

>>> display(filtered_results.to_df(multilevel=False))

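+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
|    | chunk_index | chunk_end_date             | chunk_end_index | chunk_key     | chunk_period | chunk_start_date    | chunk_start_index | salary_range_chi2_alert | salary_range_chi2_lower_threshold | salary_range_chi2_upper_threshold | salary_range_chi2_value |
+====+=============+============================+=================+===============+==============+=====================+===================+=========================+===================================+===================================+=========================+
| 0  | 0           | 2018-11-30 00:27:16.848000 | 4999            | [0:4999]      | analysis     | 2018-10-30 18:00:00 | 0                 | False                   |                                   |                                   | 1.03368                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 1  | 1           | 2018-12-30 07:03:16.848000 | 9999            | [5000:9999]   | analysis     | 2018-11-30 00:36:00 | 5000              | False                   |                                   |                                   | 5.76241                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 2  | 2           | 2019-01-29 13:39:16.848000 | 14999           | [10000:14999] | analysis     | 2018-12-30 07:12:00 | 10000             | False                   |                                   |                                   | 2.65396                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 3  | 3           | 2019-02-28 20:15:16.848000 | 19999           | [15000:19999] | analysis     | 2019-01-29 13:48:00 | 15000             | False                   |                                   |                                   | 0.0708428               |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 4  | 4           | 2019-03-31 02:51:16.848000 | 24999           | [20000:24999] | analysis     | 2019-02-28 20:24:00 | 20000             | False                   |                                   |                                   | 1.00542                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 5  | 5           | 2019-04-30 09:27:16.848000 | 29999           | [25000:29999] | analysis     | 2019-03-31 03:00:00 | 25000             | True                    |                                   |                                   | 455.622                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 6  | 6           | 2019-05-30 16:03:16.848000 | 34999           | [30000:34999] | analysis     | 2019-04-30 09:36:00 | 30000             | True                    |                                   |                                   | 428.633                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 7  | 7           | 2019-06-29 22:39:16.848000 | 39999           | [35000:39999] | analysis     | 2019-05-30 16:12:00 | 35000             | True                    |                                   |                                   | 453.247                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 8  | 8           | 2019-07-30 05:15:16.848000 | 44999           | [40000:44999] | analysis     | 2019-06-29 22:48:00 | 40000             | True                    |                                   |                                   | 438.26                  |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+
| 9  | 9           | 2019-08-29 11:51:16.848000 | 49999           | [45000:49999] | analysis     | 2019-07-30 05:24:00 | 45000             | True                    |                                   |                                   | 474.892                 |
+----+-------------+----------------------------+-----------------+---------------+--------------+---------------------+-------------------+-------------------------+-----------------------------------+-----------------------------------+-------------------------+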
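The flat column names shown above can be related back to the three-level tuples with a small helper. The rule below is an assumption inferred only from the names in this output, and flatten_column is a hypothetical function, not the library's own code.

```python
from typing import Tuple


def flatten_column(levels: Tuple[str, str, str]) -> str:
    """Build a flat column name from a (top, method, field) tuple.

    Assumed rule, inferred from the names shown above:
    - chunk columns become 'chunk_<field>' without a doubled prefix;
    - all other columns join their three levels with underscores.
    """
    top, _method, field = levels
    if top == "chunk":
        return field if field.startswith("chunk_") else f"chunk_{field}"
    return "_".join(levels)


for levels in [
    ("chunk", "chunk", "chunk_index"),
    ("chunk", "chunk", "end_date"),
    ("salary_range", "chi2", "value"),
]:
    print(levels, "->", flatten_column(levels))
```

Flat names like these are convenient when the DataFrame is passed on to tools that expect plain string column names.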

Plotting

Results can be visualized using the built-in plotting functionality. A call to the plot() method creates a Plotly Figure.

>>> figure = results.plot()

To render it in our notebook we can call the show() method:

>>> results.plot().show()
[Figure: ../_images/result_plot.svg]

The image can also be exported to disk by using the following snippet:

>>> results.plot().write_image('../_static/tutorials/working_with_results/result_plot.svg')

The full figure contains a lot of plots, so we may want to reduce their number. To do so, we first apply filtering and then plot the filtered result.

>>> filtered_results.plot().show()
[Figure: ../_images/filtered_result_plot.svg]

Some result classes offer multiple ways of visualizing them. These are listed in their associated API reference docs. For example, the docs for univariate drift results list two kinds: drift (the default) and distribution. We can change the visualization by specifying the kind parameter.

>>> filtered_results.plot(kind='distribution').show()
[Figure: ../_images/distribution_plot.svg]

Comparing

We can also plot a comparison between multiple results. For example, suppose we want to visualize the estimated performance alongside the univariate drift metrics for the salary_range column. We first compute the estimated performance result.

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)

Now we can compare our estimated performance to the univariate drift on features:

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()
[Figure: ../_images/comparison_plot1.svg]

We can immediately spot how the estimated performance plummets when the chi2 drift statistic for salary_range picks up!

Note

To reduce complexity, we only support comparing a single metric to another one.

As illustrated in the code snippet above, you can use filtering to select a single metric from your result before comparing it.

Exporting

Results can also be exported to external storage using a Writer. Three writers are currently supported: the RawFilesWriter writes results to disk, the PickleFileWriter serializes the Result into a Python pickle file and stores it on disk, and the DatabaseWriter stores calculation results in a database. This example shows how to use the DatabaseWriter.

We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method, all results will be written into the database, in this case, an SQLite database.

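>>> # Assumed usage; the connection string is an example value pointing at a local SQLite file.
>>> writer = nml.DatabaseWriter(connection_string='sqlite:///db.sqlite3')
>>> writer.write(results)

A quick inspection shows that the database was populated and contains the univariate drift calculation results.

>>> import sqlite3
>>> # List the tables in the example SQLite file written above.
>>> with sqlite3.connect('db.sqlite3') as db:
...     tables = db.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
...     print(tables)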