Working with results

What are NannyML Results?

In NannyML, any calculation will return a Result object. Not returning a DataFrame directly allows NannyML to separate the concerns of storing calculation results and having users interact with them. It also means we can provide additional useful methods, such as filtering and plotting, on top of the results.

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

>>> display(results.to_df())

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))

>>> display(filtered_results.to_df())

>>> display(filtered_results.to_df(multilevel=False))

>>> results.plot().show()

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()

Walkthrough

The data structure

In order to obtain results, we first have to perform some calculation. We will start by loading the reference and analysis sample data for binary classification. Then, we will perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)
['id', 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']

Next, we set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we would like to use.

We fit the calculator on our reference data and then use the fitted calculator to evaluate drift in the analysis data. The outcome is stored in the variable results.

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.

>>> display(results.to_df())

We can immediately see that a multilevel index is used for the columns. The first part contains chunk information and is followed by the numerical results of the drift calculations.

In the case of the UnivariateDriftCalculator, there are two degrees of freedom. You can specify columns to include in the calculation, and each column might be evaluated by different methods.

This structure is visible in the column index. The top level represents the column names. The middle level represents the specific methods used to evaluate a column. Finally, the bottom level contains the information relevant to each method: a value, upper and lower thresholds for alerts, and whether the evaluated method crossed the thresholds for that chunk, leading to an alert.

Output, shown in parts (the leftmost number is the row index):

chunk

|    | chunk |  |  |  |  |  |  |
|    | key | chunk_index | start_index | end_index | start_date | end_date | period |
|----|----|----|----|----|----|----|----|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-01-01 00:00:00 | 2018-01-31 06:27:16.848000 | reference |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-01-31 06:36:00 | 2018-03-02 13:03:16.848000 | reference |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-03-02 13:12:00 | 2018-04-01 19:39:16.848000 | reference |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2018-04-01 19:48:00 | 2018-05-02 02:15:16.848000 | reference |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2018-05-02 02:24:00 | 2018-06-01 08:51:16.848000 | reference |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2018-06-01 09:00:00 | 2018-07-01 15:27:16.848000 | reference |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2018-07-01 15:36:00 | 2018-07-31 22:03:16.848000 | reference |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2018-07-31 22:12:00 | 2018-08-31 04:39:16.848000 | reference |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2018-08-31 04:48:00 | 2018-09-30 11:15:16.848000 | reference |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2018-09-30 11:24:00 | 2018-10-30 17:51:16.848000 | reference |
| 10 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis |
| 11 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis |
| 12 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis |
| 13 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis |
| 14 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis |
| 15 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis |
| 16 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis |
| 17 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis |
| 18 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis |
| 19 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis |

id

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.9 | 1 |  | False | 0.854338 | 0.888887 |  | False |
| 1 | 0.8 | 1 |  | False | 0.810064 | 0.888887 |  | False |
| 2 | 0.7 | 1 |  | False | 0.822917 | 0.888887 |  | False |
| 3 | 0.6 | 1 |  | False | 0.853731 | 0.888887 |  | False |
| 4 | 0.5 | 1 |  | False | 0.813675 | 0.888887 |  | False |
| 5 | 0.5 | 1 |  | False | 0.813675 | 0.888887 |  | False |
| 6 | 0.6 | 1 |  | False | 0.853731 | 0.888887 |  | False |
| 7 | 0.7 | 1 |  | False | 0.822917 | 0.888887 |  | False |
| 8 | 0.8 | 1 |  | False | 0.810064 | 0.888887 |  | False |
| 9 | 0.9 | 1 |  | False | 0.854338 | 0.888887 |  | False |
| 10 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 11 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 12 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 13 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 14 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 15 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 16 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 17 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 18 | 1 | 1 |  | False | 1 | 0.888887 |  | True |
| 19 | 1 | 1 |  | False | 1 | 0.888887 |  | True |

car_value

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.0103 | 0.0194257 |  | False | 0.0296736 | 0.0352619 |  | False |
| 1 | 0.00732 | 0.0194257 |  | False | 0.0237846 | 0.0352619 |  | False |
| 2 | 0.00802 | 0.0194257 |  | False | 0.0264685 | 0.0352619 |  | False |
| 3 | 0.0085 | 0.0194257 |  | False | 0.0217468 | 0.0352619 |  | False |
| 4 | 0.00892 | 0.0194257 |  | False | 0.024108 | 0.0352619 |  | False |
| 5 | 0.01456 | 0.0194257 |  | False | 0.0275587 | 0.0352619 |  | False |
| 6 | 0.01284 | 0.0194257 |  | False | 0.0267818 | 0.0352619 |  | False |
| 7 | 0.01348 | 0.0194257 |  | False | 0.0312131 | 0.0352619 |  | False |
| 8 | 0.01572 | 0.0194257 |  | False | 0.0273013 | 0.0352619 |  | False |
| 9 | 0.00924 | 0.0194257 |  | False | 0.0296982 | 0.0352619 |  | False |
| 10 | 0.01308 | 0.0194257 |  | False | 0.0261935 | 0.0352619 |  | False |
| 11 | 0.01106 | 0.0194257 |  | False | 0.0201778 | 0.0352619 |  | False |
| 12 | 0.01662 | 0.0194257 |  | False | 0.0210184 | 0.0352619 |  | False |
| 13 | 0.01434 | 0.0194257 |  | False | 0.0363554 | 0.0352619 |  | True |
| 14 | 0.01116 | 0.0194257 |  | False | 0.0287119 | 0.0352619 |  | False |
| 15 | 0.4353 | 0.0194257 |  | True | 0.464759 | 0.0352619 |  | True |
| 16 | 0.43028 | 0.0194257 |  | True | 0.460057 | 0.0352619 |  | True |
| 17 | 0.43772 | 0.0194257 |  | True | 0.466777 | 0.0352619 |  | True |
| 18 | 0.43602 | 0.0194257 |  | True | 0.466199 | 0.0352619 |  | True |
| 19 | 0.43838 | 0.0194257 |  | True | 0.467827 | 0.0352619 |  | True |

debt_to_income_ratio

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.01112 | 0.0185838 |  | False | 0.0333679 | 0.0393276 |  | False |
| 1 | 0.01218 | 0.0185838 |  | False | 0.028066 | 0.0393276 |  | False |
| 2 | 0.00878 | 0.0185838 |  | False | 0.0225969 | 0.0393276 |  | False |
| 3 | 0.0095 | 0.0185838 |  | False | 0.0315869 | 0.0393276 |  | False |
| 4 | 0.00754 | 0.0185838 |  | False | 0.0310501 | 0.0393276 |  | False |
| 5 | 0.0103 | 0.0185838 |  | False | 0.0316479 | 0.0393276 |  | False |
| 6 | 0.01094 | 0.0185838 |  | False | 0.0258014 | 0.0393276 |  | False |
| 7 | 0.01736 | 0.0185838 |  | False | 0.0325098 | 0.0393276 |  | False |
| 8 | 0.00842 | 0.0185838 |  | False | 0.0248975 | 0.0393276 |  | False |
| 9 | 0.00786 | 0.0185838 |  | False | 0.0284742 | 0.0393276 |  | False |
| 10 | 0.01576 | 0.0185838 |  | False | 0.0316611 | 0.0393276 |  | False |
| 11 | 0.01268 | 0.0185838 |  | False | 0.0300113 | 0.0393276 |  | False |
| 12 | 0.01734 | 0.0185838 |  | False | 0.0311286 | 0.0393276 |  | False |
| 13 | 0.0128 | 0.0185838 |  | False | 0.0294644 | 0.0393276 |  | False |
| 14 | 0.01918 | 0.0185838 |  | True | 0.0308095 | 0.0393276 |  | False |
| 15 | 0.00824 | 0.0185838 |  | False | 0.0286811 | 0.0393276 |  | False |
| 16 | 0.01058 | 0.0185838 |  | False | 0.0436276 | 0.0393276 |  | True |
| 17 | 0.01002 | 0.0185838 |  | False | 0.0292533 | 0.0393276 |  | False |
| 18 | 0.01068 | 0.0185838 |  | False | 0.0306276 | 0.0393276 |  | False |
| 19 | 0.0068 | 0.0185838 |  | False | 0.0283303 | 0.0393276 |  | False |

loan_length

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.00818 | 0.0166909 |  | False | 0.0242899 | 0.0366378 |  | False |
| 1 | 0.00868 | 0.0166909 |  | False | 0.0177897 | 0.0366378 |  | False |
| 2 | 0.0139 | 0.0166909 |  | False | 0.0240002 | 0.0366378 |  | False |
| 3 | 0.0083 | 0.0166909 |  | False | 0.0292131 | 0.0366378 |  | False |
| 4 | 0.00544 | 0.0166909 |  | False | 0.0165946 | 0.0366378 |  | False |
| 5 | 0.01112 | 0.0166909 |  | False | 0.0271572 | 0.0366378 |  | False |
| 6 | 0.00464 | 0.0166909 |  | False | 0.0259338 | 0.0366378 |  | False |
| 7 | 0.00548 | 0.0166909 |  | False | 0.0185372 | 0.0366378 |  | False |
| 8 | 0.01062 | 0.0166909 |  | False | 0.0291086 | 0.0366378 |  | False |
| 9 | 0.00608 | 0.0166909 |  | False | 0.0207199 | 0.0366378 |  | False |
| 10 | 0.00884 | 0.0166909 |  | False | 0.0244278 | 0.0366378 |  | False |
| 11 | 0.01418 | 0.0166909 |  | False | 0.0258391 | 0.0366378 |  | False |
| 12 | 0.0124 | 0.0166909 |  | False | 0.0293725 | 0.0366378 |  | False |
| 13 | 0.01298 | 0.0166909 |  | False | 0.0290784 | 0.0366378 |  | False |
| 14 | 0.01022 | 0.0166909 |  | False | 0.0287925 | 0.0366378 |  | False |
| 15 | 0.17992 | 0.0166909 |  | True | 0.233935 | 0.0366378 |  | True |
| 16 | 0.18032 | 0.0166909 |  | True | 0.231747 | 0.0366378 |  | True |
| 17 | 0.19572 | 0.0166909 |  | True | 0.234016 | 0.0366378 |  | True |
| 18 | 0.18212 | 0.0166909 |  | True | 0.231484 | 0.0366378 |  | True |
| 19 | 0.19872 | 0.0166909 |  | True | 0.24262 | 0.0366378 |  | True |

driver_tenure

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.00974 | 0.0173417 |  | False | 0.0228713 | 0.039192 |  | False |
| 1 | 0.01186 | 0.0173417 |  | False | 0.0335415 | 0.039192 |  | False |
| 2 | 0.01262 | 0.0173417 |  | False | 0.029597 | 0.039192 |  | False |
| 3 | 0.01056 | 0.0173417 |  | False | 0.0286826 | 0.039192 |  | False |
| 4 | 0.00922 | 0.0173417 |  | False | 0.0209876 | 0.039192 |  | False |
| 5 | 0.00794 | 0.0173417 |  | False | 0.0229349 | 0.039192 |  | False |
| 6 | 0.0112 | 0.0173417 |  | False | 0.0226753 | 0.039192 |  | False |
| 7 | 0.0074 | 0.0173417 |  | False | 0.025517 | 0.039192 |  | False |
| 8 | 0.01458 | 0.0173417 |  | False | 0.0244145 | 0.039192 |  | False |
| 9 | 0.01304 | 0.0173417 |  | False | 0.032928 | 0.039192 |  | False |
| 10 | 0.02114 | 0.0173417 |  | True | 0.0309355 | 0.039192 |  | False |
| 11 | 0.00994 | 0.0173417 |  | False | 0.0383534 | 0.039192 |  | False |
| 12 | 0.02362 | 0.0173417 |  | True | 0.034176 | 0.039192 |  | False |
| 13 | 0.0143 | 0.0173417 |  | False | 0.0332968 | 0.039192 |  | False |
| 14 | 0.00906 | 0.0173417 |  | False | 0.0263609 | 0.039192 |  | False |
| 15 | 0.00698 | 0.0173417 |  | False | 0.0288384 | 0.039192 |  | False |
| 16 | 0.00826 | 0.0173417 |  | False | 0.0265918 | 0.039192 |  | False |
| 17 | 0.01382 | 0.0173417 |  | False | 0.0275949 | 0.039192 |  | False |
| 18 | 0.0088 | 0.0173417 |  | False | 0.0232423 | 0.039192 |  | False |
| 19 | 0.0062 | 0.0173417 |  | False | 0.0279191 | 0.039192 |  | False |

y_pred_proba

|    | kolmogorov_smirnov |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.00922 | 0.0145647 |  | False | 0.0133555 | 0.0365875 |  | False |
| 1 | 0.01042 | 0.0145647 |  | False | 0.0211292 | 0.0365875 |  | False |
| 2 | 0.0091 | 0.0145647 |  | False | 0.02237 | 0.0365875 |  | False |
| 3 | 0.00872 | 0.0145647 |  | False | 0.0178289 | 0.0365875 |  | False |
| 4 | 0.00852 | 0.0145647 |  | False | 0.0216622 | 0.0365875 |  | False |
| 5 | 0.01028 | 0.0145647 |  | False | 0.017256 | 0.0365875 |  | False |
| 6 | 0.01248 | 0.0145647 |  | False | 0.0253217 | 0.0365875 |  | False |
| 7 | 0.0089 | 0.0145647 |  | False | 0.0275068 | 0.0365875 |  | False |
| 8 | 0.00768 | 0.0145647 |  | False | 0.0243225 | 0.0365875 |  | False |
| 9 | 0.00498 | 0.0145647 |  | False | 0.0303947 | 0.0365875 |  | False |
| 10 | 0.0253 | 0.0145647 |  | True | 0.0289329 | 0.0365875 |  | False |
| 11 | 0.0123 | 0.0145647 |  | False | 0.0221389 | 0.0365875 |  | False |
| 12 | 0.01642 | 0.0145647 |  | True | 0.0310428 | 0.0365875 |  | False |
| 13 | 0.01058 | 0.0145647 |  | False | 0.0228333 | 0.0365875 |  | False |
| 14 | 0.01408 | 0.0145647 |  | False | 0.0237474 | 0.0365875 |  | False |
| 15 | 0.1307 | 0.0145647 |  | True | 0.225486 | 0.0365875 |  | True |
| 16 | 0.1273 | 0.0145647 |  | True | 0.208815 | 0.0365875 |  | True |
| 17 | 0.1311 | 0.0145647 |  | True | 0.224282 | 0.0365875 |  | True |
| 18 | 0.1197 | 0.0145647 |  | True | 0.205352 | 0.0365875 |  | True |
| 19 | 0.13752 | 0.0145647 |  | True | 0.215539 | 0.0365875 |  | True |

salary_range

|    | chi2 |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 2.89878 |  |  | False | 0.010811 | 0.0177088 |  | False |
| 1 | 3.14439 |  |  | False | 0.01124 | 0.0177088 |  | False |
| 2 | 2.45188 |  |  | False | 0.00980904 | 0.0177088 |  | False |
| 3 | 4.06262 |  |  | False | 0.0127697 | 0.0177088 |  | False |
| 4 | 2.41399 |  |  | False | 0.00968817 | 0.0177088 |  | False |
| 5 | 3.79606 |  |  | False | 0.0122934 | 0.0177088 |  | False |
| 6 | 3.22884 |  |  | False | 0.0112358 | 0.0177088 |  | False |
| 7 | 1.3933 |  |  | False | 0.00739444 | 0.0177088 |  | False |
| 8 | 0.304785 |  |  | False | 0.00347061 | 0.0177088 |  | False |
| 9 | 2.98758 |  |  | False | 0.0108121 | 0.0177088 |  | False |
| 10 | 1.03368 |  |  | False | 0.00639674 | 0.0177088 |  | False |
| 11 | 5.76241 |  |  | False | 0.0153757 | 0.0177088 |  | False |
| 12 | 2.65396 |  |  | False | 0.0102823 | 0.0177088 |  | False |
| 13 | 0.0708428 |  |  | False | 0.00167698 | 0.0177088 |  | False |
| 14 | 1.00542 |  |  | False | 0.00633255 | 0.0177088 |  | False |
| 15 | 455.622 |  |  | True | 0.183143 | 0.0177088 |  | True |
| 16 | 428.633 |  |  | True | 0.174226 | 0.0177088 |  | True |
| 17 | 453.247 |  |  | True | 0.182913 | 0.0177088 |  | True |
| 18 | 438.26 |  |  | True | 0.177985 | 0.0177088 |  | True |
| 19 | 474.892 |  |  | True | 0.19035 | 0.0177088 |  | True |

repaid_loan_on_prev_car

|    | chi2 |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.414606 |  |  | False | 0.00415143 | 0.0147343 |  | False |
| 1 | 0.0334857 |  |  | False | 0.00124668 | 0.0147343 |  | False |
| 2 | 0.168656 |  |  | False | 0.00267997 | 0.0147343 |  | False |
| 3 | 0.0562698 |  |  | False | 0.00158831 | 0.0147343 |  | False |
| 4 | 0.242059 |  |  | False | 0.00319188 | 0.0147343 |  | False |
| 5 | 3.61457 |  |  | False | 0.0120561 | 0.0147343 |  | False |
| 6 | 0.0757052 |  |  | False | 0.00182666 | 0.0147343 |  | False |
| 7 | 0.414606 |  |  | False | 0.00415143 | 0.0147343 |  | False |
| 8 | 0.0126564 |  |  | False | 0.000802461 | 0.0147343 |  | False |
| 9 | 2.20383 |  |  | False | 0.00945409 | 0.0147343 |  | False |
| 10 | 1.70319 |  |  | False | 0.0083078 | 0.0147343 |  | False |
| 11 | 0.242059 |  |  | False | 0.00319188 | 0.0147343 |  | False |
| 12 | 3.17862 |  |  | False | 0.0113376 | 0.0147343 |  | False |
| 13 | 0.0242988 |  |  | False | 0.00107588 | 0.0147343 |  | False |
| 14 | 0.487381 |  |  | False | 0.00449331 | 0.0147343 |  | False |
| 15 | 1179.9 |  |  | True | 0.231198 | 0.0147343 |  | True |
| 16 | 1162.99 |  |  | True | 0.229333 | 0.0147343 |  | True |
| 17 | 1170.49 |  |  | True | 0.230161 | 0.0147343 |  | True |
| 18 | 1023.35 |  |  | True | 0.213579 | 0.0147343 |  | True |
| 19 | 1227.54 |  |  | True | 0.236408 | 0.0147343 |  | True |

size_of_downpayment

|    | chi2 |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 4.00124 |  |  | False | 0.0125401 | 0.0214812 |  | False |
| 1 | 1.28891 |  |  | False | 0.00713799 | 0.0214812 |  | False |
| 2 | 5.11796 |  |  | False | 0.0142803 | 0.0214812 |  | False |
| 3 | 1.84901 |  |  | False | 0.0085587 | 0.0214812 |  | False |
| 4 | 0.470551 |  |  | False | 0.00433131 | 0.0214812 |  | False |
| 5 | 0.137868 |  |  | False | 0.00233712 | 0.0214812 |  | False |
| 6 | 4.19999 |  |  | False | 0.0129223 | 0.0214812 |  | False |
| 7 | 0.716349 |  |  | False | 0.00533433 | 0.0214812 |  | False |
| 8 | 0.596009 |  |  | False | 0.00485967 | 0.0214812 |  | False |
| 9 | 5.08023 |  |  | False | 0.0142629 | 0.0214812 |  | False |
| 10 | 1.6025 |  |  | False | 0.00796199 | 0.0214812 |  | False |
| 11 | 5.71897 |  |  | False | 0.0150859 | 0.0214812 |  | False |
| 12 | 2.08186 |  |  | False | 0.00907089 | 0.0214812 |  | False |
| 13 | 0.489515 |  |  | False | 0.00440901 | 0.0214812 |  | False |
| 14 | 3.15856 |  |  | False | 0.0112076 | 0.0214812 |  | False |
| 15 | 4.66135 |  |  | False | 0.0135741 | 0.0214812 |  | False |
| 16 | 2.52181 |  |  | False | 0.0100123 | 0.0214812 |  | False |
| 17 | 3.41534 |  |  | False | 0.0116206 | 0.0214812 |  | False |
| 18 | 6.88171 |  |  | False | 0.0164851 | 0.0214812 |  | False |
| 19 | 1.63759 |  |  | False | 0.00809379 | 0.0214812 |  | False |

y_pred

|    | chi2 |  |  |  | jensen_shannon |  |  |  |
|    | value | upper_threshold | lower_threshold | alert | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|
| 0 | 0.733844 |  |  | False | 0.00549026 | 0.00909686 |  | False |
| 1 | 0.983187 |  |  | False | 0.00634039 | 0.00909686 |  | False |
| 2 | 0.576787 |  |  | False | 0.00487654 | 0.00909686 |  | False |
| 3 | 0.0691997 |  |  | False | 0.0017505 | 0.00909686 |  | False |
| 4 | 0.325601 |  |  | False | 0.00368727 | 0.00909686 |  | False |
| 5 | 0.34437 |  |  | False | 0.00379022 | 0.00909686 |  | False |
| 6 | 0.000962674 |  |  | False | 0.000288895 | 0.00909686 |  | False |
| 7 | 0.536536 |  |  | False | 0.00470665 | 0.00909686 |  | False |
| 8 | 0.0275315 |  |  | False | 0.00113856 | 0.00909686 |  | False |
| 9 | 0.167069 |  |  | False | 0.00266783 | 0.00909686 |  | False |
| 10 | 5.78426 |  |  | True | 0.0152383 | 0.00909686 |  | True |
| 11 | 1.94965 |  |  | False | 0.00889123 | 0.00909686 |  | False |
| 12 | 1.59109 |  |  | False | 0.00804087 | 0.00909686 |  | False |
| 13 | 0.7808 |  |  | False | 0.00566028 | 0.00909686 |  | False |
| 14 | 0.239784 |  |  | False | 0.00317755 | 0.00909686 |  | False |
| 15 | 0.424518 |  |  | False | 0.00419696 | 0.00909686 |  | False |
| 16 | 0.0904949 |  |  | False | 0.00198817 | 0.00909686 |  | False |
| 17 | 0.12587 |  |  | False | 0.002328 | 0.00909686 |  | False |
| 18 | 0.313431 |  |  | False | 0.00362023 | 0.00909686 |  | False |
| 19 | 5.91474 |  |  | True | 0.0154082 | 0.00909686 |  | True |
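The alert column described above can be illustrated with a small sketch. It assumes that an alert is raised when a value lies strictly outside its thresholds and that a missing threshold is simply ignored; this matches the threshold-based rows shown in the table, but it is not NannyML's own code, and the function name is_alert is hypothetical. The chi2 columns show no thresholds in the table, so their alerts are evidently decided differently and are not covered here.

```python
from typing import Optional


def is_alert(value: float, upper: Optional[float] = None, lower: Optional[float] = None) -> bool:
    """Return True when value lies strictly outside the band; a missing bound is ignored."""
    above = upper is not None and value > upper
    below = lower is not None and value < lower
    return above or below


# Values copied from the driver_tenure / kolmogorov_smirnov column of the table.
reference_chunk = is_alert(0.00974, upper=0.0173417)  # first reference chunk
analysis_chunk = is_alert(0.02114, upper=0.0173417)   # first analysis chunk

# id / kolmogorov_smirnov in the analysis period: the value equals the threshold.
on_threshold = is_alert(1, upper=1)

print(reference_chunk, analysis_chunk, on_threshold)
```

The third call shows why the comparison is assumed to be strict: in the table, a value of 1 against an upper threshold of 1 does not produce an alert.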

Filtering

Working with multilevel indexes can be very powerful, yet also quite challenging. The following snippet retrieves the value column of every calculated method from our results.

>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([(                     'id', 'kolmogorov_smirnov', 'value'),
            (                     'id',     'jensen_shannon', 'value'),
            (              'car_value', 'kolmogorov_smirnov', 'value'),
            (              'car_value',     'jensen_shannon', 'value'),
            (   'debt_to_income_ratio', 'kolmogorov_smirnov', 'value'),
            (   'debt_to_income_ratio',     'jensen_shannon', 'value'),
            (            'loan_length', 'kolmogorov_smirnov', 'value'),
            (            'loan_length',     'jensen_shannon', 'value'),
            (          'driver_tenure', 'kolmogorov_smirnov', 'value'),
            (          'driver_tenure',     'jensen_shannon', 'value'),
            (           'y_pred_proba', 'kolmogorov_smirnov', 'value'),
            (           'y_pred_proba',     'jensen_shannon', 'value'),
            (           'salary_range',               'chi2', 'value'),
            (           'salary_range',     'jensen_shannon', 'value'),
            ('repaid_loan_on_prev_car',               'chi2', 'value'),
            ('repaid_loan_on_prev_car',     'jensen_shannon', 'value'),
            (    'size_of_downpayment',               'chi2', 'value'),
            (    'size_of_downpayment',     'jensen_shannon', 'value'),
            (                 'y_pred',               'chi2', 'value'),
            (                 'y_pred',     'jensen_shannon', 'value')],
           )
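To make the selection logic less opaque, here is a plain-Python sketch of the same idea. It treats each column label as a (column_name, method, field) tuple and uses None as a wildcard, playing the role that slice(None) plays in the pandas snippet. The select helper is hypothetical and only mimics the matching; it is not part of pandas or NannyML.

```python
# Column labels as (column_name, method, field) tuples, like the MultiIndex entries above.
columns = [
    ('salary_range', 'chi2', 'value'),
    ('salary_range', 'chi2', 'alert'),
    ('salary_range', 'jensen_shannon', 'value'),
    ('y_pred', 'chi2', 'value'),
    ('y_pred', 'chi2', 'alert'),
]


def select(labels, pattern):
    """Keep the labels matching pattern; None in the pattern matches any entry at that level."""
    return [
        label for label in labels
        if all(want is None or want == got for want, got in zip(pattern, label))
    ]


# Every 'value' column, whatever the column name or method.
value_columns = select(columns, (None, None, 'value'))
# Only the alert flags produced by the chi2 method.
chi2_alerts = select(columns, (None, 'chi2', 'alert'))

print(value_columns)
print(chi2_alerts)
```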

To improve this experience, we have introduced a helper method that allows you to filter the result data to easily retrieve the information you want. Since the UnivariateDriftCalculator has two degrees of freedom, we have included both in the filter() method. Additionally, you can filter on the data period, i.e., reference or analysis.

The filter() method returns a new Result instance, allowing you to chain methods such as filter(), to_df(), and plot().

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>

Looking at the results after filtering, you can see that only the chi2 data for the salary_range column during the analysis period is included.

>>> display(filtered_results.to_df())

|    | chunk |  |  |  |  |  |  | salary_range |  |  |  |
|    | chunk |  |  |  |  |  |  | chi2 |  |  |  |
|    | key | chunk_index | start_index | end_index | start_date | end_date | period | value | upper_threshold | lower_threshold | alert |
|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 1.03368 |  |  | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 5.76241 |  |  | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 2.65396 |  |  | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.0708428 |  |  | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 1.00542 |  |  | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 455.622 |  |  | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 428.633 |  |  | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 453.247 |  |  | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 438.26 |  |  | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 474.892 |  |  | True |
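The chaining behaviour of filter() can be pictured with a toy class. This is a hypothetical stand-in written for illustration only: ToyResult, Row, and to_rows are invented names, and the real Result class stores its data differently. The point is the design choice that filter() returns a new object rather than modifying the existing one. The numbers are taken from the first reference and first analysis chunks of the tables above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    period: str
    column_name: str
    method: str
    value: float


class ToyResult:
    """Hypothetical stand-in for a Result object; not NannyML's implementation."""

    def __init__(self, rows: List[Row]):
        self.rows = list(rows)

    def filter(self, period: str = 'all', methods: Optional[List[str]] = None,
               column_names: Optional[List[str]] = None) -> 'ToyResult':
        kept = [
            r for r in self.rows
            if (period == 'all' or r.period == period)
            and (methods is None or r.method in methods)
            and (column_names is None or r.column_name in column_names)
        ]
        # Returning a new object leaves self untouched and allows chaining.
        return ToyResult(kept)

    def to_rows(self) -> List[tuple]:
        return [(r.period, r.column_name, r.method, r.value) for r in self.rows]


results = ToyResult([
    Row('reference', 'salary_range', 'chi2', 2.89878),
    Row('analysis', 'salary_range', 'chi2', 1.03368),
    Row('analysis', 'salary_range', 'jensen_shannon', 0.00639674),
    Row('analysis', 'y_pred', 'chi2', 5.78426),
])

filtered = results.filter(period='analysis', methods=['chi2']).filter(column_names=['salary_range'])
print(filtered.to_rows())
print(len(results.rows))  # the original object still holds every row
```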

If you prefer not to work with a multilevel index, the to_df() method accepts a multilevel argument. Setting it to False returns a DataFrame with flat column names.

>>> display(filtered_results.to_df(multilevel=False))

|    | chunk_key | chunk_index | chunk_start_index | chunk_end_index | chunk_start_date | chunk_end_date | chunk_period | salary_range_chi2_value | salary_range_chi2_upper_threshold | salary_range_chi2_lower_threshold | salary_range_chi2_alert |
|----|----|----|----|----|----|----|----|----|----|----|----|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 1.03368 |  |  | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 5.76241 |  |  | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 2.65396 |  |  | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.0708428 |  |  | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 1.00542 |  |  | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 455.622 |  |  | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 428.633 |  |  | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 453.247 |  |  | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 438.26 |  |  | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 474.892 |  |  | True |
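The relationship between the multilevel labels and the flat column names can be sketched as follows. The rule below (join the levels with an underscore, then collapse the repeated chunk prefix) is an assumption chosen because it reproduces the flat names shown above; it is not necessarily how NannyML derives them, and the flatten helper is hypothetical.

```python
def flatten(label):
    """Join the levels with '_' and collapse the repeated 'chunk' prefix."""
    name = '_'.join(label)
    return name.replace('chunk_chunk_chunk', 'chunk').replace('chunk_chunk', 'chunk')


# Labels as they appear in the multilevel header: (top, middle, bottom).
labels = [
    ('chunk', 'chunk', 'key'),
    ('chunk', 'chunk', 'chunk_index'),
    ('chunk', 'chunk', 'start_date'),
    ('salary_range', 'chi2', 'value'),
    ('salary_range', 'chi2', 'alert'),
]

flat_names = [flatten(label) for label in labels]
print(flat_names)
```

Flat names are convenient when exporting to CSV or SQL, where a single header row is expected, at the cost of having to parse the column name to recover the column, method, and field.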

Plotting

Results can be visualized using the built-in plotting functionality. A quick call to the plot() method creates a Plotly Figure.

>>> figure = results.plot()

To render it in our notebook we can call the show() method:

>>> results.plot().show()
../_images/result_plot.svg

The image can also be exported to disk by using the following snippet:

>>> results.plot().write_image('../_static/tutorials/working_with_results/result_plot.svg')

The figure above is quite crowded, so we might want to reduce the number of plots. To do so, we first filter the results and then plot them.

>>> filtered_results.plot().show()
../_images/filtered_result_plot.svg

Some result classes offer multiple ways of visualizing them. These are listed in their associated API reference docs. For example, the docs for univariate drift results list two kinds: drift (the default) and distribution. We can change the visualization by specifying the kind parameter.

>>> filtered_results.plot(kind='distribution').show()
../_images/distribution_plot.svg

Comparing

Another neat feature is that we can plot a comparison between multiple results. For example, suppose we want to visualize the estimated performance alongside the univariate drift metrics for the salary_range column. We will first get our estimated performance result.

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)
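The chunk_size=5000 argument groups the rows into consecutive chunks of 5000 observations, which is why the drift and performance results line up chunk by chunk. As a rough sketch of size-based chunking (not NannyML's chunker; the chunk_keys helper is hypothetical, and keeping a shorter final chunk is an assumption), the chunk keys seen in the tables earlier can be reproduced like this:

```python
def chunk_keys(n_rows: int, chunk_size: int):
    """Return '[start:end]' keys (end inclusive) for consecutive fixed-size chunks."""
    keys = []
    for start in range(0, n_rows, chunk_size):
        # The last chunk may be shorter when n_rows is not a multiple of chunk_size.
        end = min(start + chunk_size, n_rows) - 1
        keys.append(f'[{start}:{end}]')
    return keys


# 50,000 rows, matching the end_index of 49999 in the tables above.
keys = chunk_keys(n_rows=50_000, chunk_size=5000)
print(len(keys), keys[0], keys[-1])
```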

Now we can compare our estimated performance to the univariate drift on features:

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()
../_images/comparison_plot1.svg

We can immediately spot how the estimated performance plummets when the chi-squared statistic for salary_range shoots up!

Note

To reduce complexity, we only support comparing a single metric to another one.

As illustrated in the code snippet above, you can use filtering to select a single metric from your result before comparing it.

Exporting

Results can also be exported to external storage using a Writer. Three writers are currently supported: the RawFilesWriter writes results to disk, the PickleFileWriter serializes the Result into a Python pickle file and stores that on disk, and the DatabaseWriter stores calculation results in a database. This example shows how to use the DatabaseWriter.

In order to get the dependencies required for database access, please ensure you’ve installed the optional db dependency. Check the installation instructions for more information.

We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method, all results will be written into the database, in this case, an SQLite database.

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

A quick inspection shows that the database was populated and contains the univariate drift calculation results.

>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',), ('unseen_values_metrics',), ('missing_values_metrics',)]
>>> cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2018-10-30 18:00:00.000000', '2018-11-30 00:27:16.848000', '2018-11-14 21:13:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (2, None, 1, '2018-11-30 00:36:00.000000', '2018-12-30 07:03:16.848000', '2018-12-15 03:49:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (3, None, 1, '2018-12-30 07:12:00.000000', '2019-01-29 13:39:16.848000', '2019-01-14 10:25:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id')]
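The raw tuples above are hard to read without column names. The sketch below shows two standard sqlite3 techniques for recovering them: PRAGMA table_info and the sqlite3.Row row factory. It runs against an in-memory table whose schema is a simplified, hypothetical stand-in (the column names column_name, metric_name, value, and alert are assumptions); run the same PRAGMA against nml.db to see the schema NannyML actually wrote. The two inserted values come from the salary_range chi2 results shown earlier.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# Hypothetical, simplified stand-in for the univariate_drift_metrics table.
conn.execute(
    'CREATE TABLE univariate_drift_metrics ('
    'id INTEGER PRIMARY KEY, column_name TEXT, metric_name TEXT, value REAL, alert INTEGER)'
)
conn.executemany(
    'INSERT INTO univariate_drift_metrics (column_name, metric_name, value, alert) '
    'VALUES (?, ?, ?, ?)',
    [('salary_range', 'chi2', 1.03368, 0), ('salary_range', 'chi2', 455.622, 1)],
)

# PRAGMA table_info yields one row per column; position 1 holds the column name.
column_names = [row[1] for row in conn.execute('PRAGMA table_info(univariate_drift_metrics)')]
print(column_names)

# With sqlite3.Row, rows fetched from now on can be accessed by column name.
conn.row_factory = sqlite3.Row
alert_rows = [
    dict(row)
    for row in conn.execute('SELECT * FROM univariate_drift_metrics WHERE alert = 1')
]
print(alert_rows)

conn.close()
```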