Working with results

What are NannyML Results?

In NannyML, every calculation returns a Result object. Not returning a DataFrame directly lets NannyML separate two concerns: storing calculation results and letting users interact with them. It also means we can provide additional useful methods on top of the results, such as filtering and plotting.
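To make the idea concrete, here is a minimal, hypothetical sketch in plain Python of an object that stores results and exposes helpers on top of them. ToyResult, its fields and to_records() are invented for illustration. They are not NannyML's Result class or API, and the numbers are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (column name, method name) pair used as a key into the stored values.
Key = Tuple[str, str]


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical result object: stores per-chunk values, exposes helpers."""

    periods: List[str]  # period label of each chunk, e.g. "reference" or "analysis"
    values: Dict[Key, List[float]] = field(default_factory=dict)

    def filter(
        self,
        period: Optional[str] = None,
        column_names: Optional[List[str]] = None,
        methods: Optional[List[str]] = None,
    ) -> "ToyResult":
        # Positions of the chunks belonging to the requested period.
        rows = [i for i, p in enumerate(self.periods) if period is None or p == period]
        kept = {
            (col, method): [series[i] for i in rows]
            for (col, method), series in self.values.items()
            if (column_names is None or col in column_names)
            and (methods is None or method in methods)
        }
        # A new object is returned, so the original stays intact and calls can be chained.
        return ToyResult([self.periods[i] for i in rows], kept)

    def to_records(self) -> List[Dict[str, object]]:
        # One flat record per chunk; a real library might build a DataFrame instead.
        return [
            {"period": p, **{f"{c}_{m}_value": s[i] for (c, m), s in self.values.items()}}
            for i, p in enumerate(self.periods)
        ]


# Made-up numbers: one reference chunk and one analysis chunk.
result = ToyResult(
    periods=["reference", "analysis"],
    values={
        ("salary_range", "chi2"): [3.0, 450.0],
        ("salary_range", "jensen_shannon"): [0.01, 0.18],
    },
)
print(result.filter(period="analysis", methods=["chi2"]).to_records())
# [{'period': 'analysis', 'salary_range_chi2_value': 450.0}]
```

Because the stored data stays inside the object, the export format (records here, a DataFrame in a real library) can change without affecting how filtering works.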

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

>>> display(results.to_df())

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))

>>> display(filtered_results.to_df())

>>> display(filtered_results.to_df(multilevel=False))

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

Walkthrough

To obtain results, we first have to perform a calculation. We’ll start by loading the reference and analysis sample data for binary classification. We’ll perform univariate drift detection on a number of columns, whose names are printed below. Knowing them will make the rest of this walkthrough easier to follow.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)
['distance_from_office', 'salary_range', 'gas_price_per_litre', 'public_transportation_cost', 'wfh_prev_workday', 'workday', 'tenure', 'y_pred_proba', 'y_pred']

We then set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we want to use. Next, we fit the calculator on our reference data. The fitted calculator is then used to evaluate drift on the analysis data, and its output is stored in the variable results.

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.

>>> display(results.to_df())

We can immediately see that a MultiIndex is used for the columns. The first group of columns contains chunk information, and the numerical results of the drift calculations follow it. The UnivariateDriftCalculator has two degrees of freedom: you specify which columns to include in the calculation, and each column can be evaluated by several methods.

This structure is visible in the column index. The top level holds the column names. The middle level holds the methods used to evaluate each column. The bottom level holds the information for each method: its value, the lower and upper alert thresholds, and an alert flag indicating whether the method crossed a threshold for that chunk.

The resulting DataFrame has one row per chunk and 79 columns. The 10 reference chunks come first, followed by the 10 analysis chunks. It is shown transposed below, with one table per period, so each line corresponds to one column of the DataFrame. Within every (column, method) group, lower_threshold is empty. upper_threshold is 0.1 for jensen_shannon and empty for kolmogorov_smirnov and chi2. The threshold columns are therefore left out.

Reference period (DataFrame rows 0 to 9):

| (chunk, chunk, chunk_index) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| (chunk, chunk, end_date) | 2014-09-09 08:18:27 | 2015-01-09 00:02:51 | 2015-05-09 15:54:26 | 2015-09-07 07:14:37 | 2016-01-08 16:02:05 | 2016-05-09 11:09:39 | 2016-09-04 03:30:35 | 2017-01-03 18:48:21 | 2017-05-03 02:34:24 | 2017-08-31 03:10:29 |
| (chunk, chunk, end_index) | 4999 | 9999 | 14999 | 19999 | 24999 | 29999 | 34999 | 39999 | 44999 | 49999 |
| (chunk, chunk, key) | [0:4999] | [5000:9999] | [10000:14999] | [15000:19999] | [20000:24999] | [25000:29999] | [30000:34999] | [35000:39999] | [40000:44999] | [45000:49999] |
| (chunk, chunk, period) | reference | reference | reference | reference | reference | reference | reference | reference | reference | reference |
| (chunk, chunk, start_date) | 2014-05-09 22:27:20 | 2014-09-09 09:13:35 | 2015-01-09 00:04:43 | 2015-05-09 16:02:08 | 2015-09-07 07:27:47 | 2016-01-08 17:22:00 | 2016-05-09 11:19:36 | 2016-09-04 04:09:35 | 2017-01-03 19:00:51 | 2017-05-03 02:49:38 |
| (chunk, chunk, start_index) | 0 | 5000 | 10000 | 15000 | 20000 | 25000 | 30000 | 35000 | 40000 | 45000 |
| (distance_from_office, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (distance_from_office, jensen_shannon, value) | 0.0294645 | 0.0236588 | 0.0264403 | 0.0217733 | 0.0239721 | 0.0275768 | 0.0268749 | 0.0312645 | 0.0273523 | 0.0296272 |
| (distance_from_office, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (distance_from_office, kolmogorov_smirnov, value) | 0.01034 | 0.0075 | 0.0082 | 0.0086 | 0.0091 | 0.01458 | 0.0129 | 0.0138 | 0.01586 | 0.00924 |
| (gas_price_per_litre, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (gas_price_per_litre, jensen_shannon, value) | 0.0277569 | 0.0292061 | 0.02533 | 0.0264593 | 0.0295752 | 0.028514 | 0.0228658 | 0.0304354 | 0.0243664 | 0.0282426 |
| (gas_price_per_litre, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (gas_price_per_litre, kolmogorov_smirnov, value) | 0.01122 | 0.01222 | 0.00886 | 0.00956 | 0.00758 | 0.01032 | 0.01094 | 0.01736 | 0.00842 | 0.00786 |
| (public_transportation_cost, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (public_transportation_cost, jensen_shannon, value) | 0.0267158 | 0.0193572 | 0.0299168 | 0.0333228 | 0.0222416 | 0.0303899 | 0.0279513 | 0.0215885 | 0.0293265 | 0.0235042 |
| (public_transportation_cost, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (public_transportation_cost, kolmogorov_smirnov, value) | 0.00998 | 0.01046 | 0.01706 | 0.0122 | 0.00662 | 0.01186 | 0.00636 | 0.00832 | 0.01176 | 0.0082 |
| (salary_range, chi2, alert) | False | False | False | False | False | False | False | False | False | False |
| (salary_range, chi2, value) | 2.89878 | 3.14439 | 2.45188 | 4.06262 | 2.41399 | 3.79606 | 3.22884 | 1.3933 | 0.304785 | 2.98758 |
| (salary_range, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (salary_range, jensen_shannon, value) | 0.010811 | 0.01124 | 0.00980904 | 0.0127697 | 0.00968817 | 0.0122934 | 0.0112358 | 0.00739444 | 0.00347061 | 0.0108121 |
| (tenure, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (tenure, jensen_shannon, value) | 0.0228713 | 0.0335415 | 0.029597 | 0.0286826 | 0.0209876 | 0.0229349 | 0.0226753 | 0.025517 | 0.0244145 | 0.032928 |
| (tenure, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (tenure, kolmogorov_smirnov, value) | 0.00978 | 0.01192 | 0.01268 | 0.01074 | 0.00924 | 0.00794 | 0.0112 | 0.0074 | 0.01464 | 0.01306 |
| (wfh_prev_workday, chi2, alert) | False | False | False | False | False | False | False | False | False | False |
| (wfh_prev_workday, chi2, value) | 0.414606 | 0.0334857 | 0.168656 | 0.0562698 | 0.242059 | 3.61457 | 0.0757052 | 0.414606 | 0.0126564 | 2.20383 |
| (wfh_prev_workday, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (wfh_prev_workday, jensen_shannon, value) | 0.00415143 | 0.00124668 | 0.00267997 | 0.00158831 | 0.00319188 | 0.0120561 | 0.00182666 | 0.00415143 | 0.000802461 | 0.00945409 |
| (workday, chi2, alert) | False | False | False | False | False | False | False | False | False | False |
| (workday, chi2, value) | 4.00124 | 1.28891 | 5.11796 | 1.84901 | 0.470551 | 0.137868 | 4.19999 | 0.716349 | 0.596009 | 5.08023 |
| (workday, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (workday, jensen_shannon, value) | 0.0125401 | 0.00713799 | 0.0142803 | 0.0085587 | 0.00433131 | 0.00233712 | 0.0129223 | 0.00533433 | 0.00485967 | 0.0142629 |
| (y_pred, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (y_pred, jensen_shannon, value) | 0.00684559 | 0.00463736 | 0.00419613 | 0.000220834 | 0.00368645 | 0.00480722 | 0.00385634 | 0.00453593 | 0.000220834 | 0.00045866 |
| (y_pred, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (y_pred, kolmogorov_smirnov, value) | 0.00806 | 0.00546 | 0.00494 | 0.00026 | 0.00434 | 0.00566 | 0.00454 | 0.00534 | 0.00026 | 0.00054 |
| (y_pred_proba, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (y_pred_proba, jensen_shannon, value) | 0.0133555 | 0.0211292 | 0.02237 | 0.0178289 | 0.0216622 | 0.017256 | 0.0253217 | 0.0275068 | 0.0243225 | 0.0303947 |
| (y_pred_proba, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (y_pred_proba, kolmogorov_smirnov, value) | 0.00922 | 0.01042 | 0.0091 | 0.00872 | 0.00852 | 0.01028 | 0.01248 | 0.0089 | 0.00768 | 0.00498 |

Analysis period (DataFrame rows 10 to 19):

| (chunk, chunk, chunk_index) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| (chunk, chunk, end_date) | 2018-01-02 00:45:44 | 2018-05-01 13:10:10 | 2018-09-01 15:40:40 | 2018-12-31 10:11:21 | 2019-04-30 11:01:30 | 2019-09-01 00:24:27 | 2019-12-31 09:09:12 | 2020-04-30 11:46:53 | 2020-09-01 02:46:02 | 2021-01-01 04:29:32 |
| (chunk, chunk, end_index) | 4999 | 9999 | 14999 | 19999 | 24999 | 29999 | 34999 | 39999 | 44999 | 49999 |
| (chunk, chunk, key) | [0:4999] | [5000:9999] | [10000:14999] | [15000:19999] | [20000:24999] | [25000:29999] | [30000:34999] | [35000:39999] | [40000:44999] | [45000:49999] |
| (chunk, chunk, period) | analysis | analysis | analysis | analysis | analysis | analysis | analysis | analysis | analysis | analysis |
| (chunk, chunk, start_date) | 2017-08-31 04:20:00 | 2018-01-02 01:13:11 | 2018-05-01 14:25:25 | 2018-09-01 16:19:07 | 2018-12-31 10:38:45 | 2019-04-30 11:02:00 | 2019-09-01 00:28:54 | 2019-12-31 10:07:15 | 2020-04-30 12:04:32 | 2020-09-01 02:46:13 |
| (chunk, chunk, start_index) | 0 | 5000 | 10000 | 15000 | 20000 | 25000 | 30000 | 35000 | 40000 | 45000 |
| (distance_from_office, jensen_shannon, alert) | False | False | False | False | False | True | True | True | True | True |
| (distance_from_office, jensen_shannon, value) | 0.0261007 | 0.0202971 | 0.0210957 | 0.0362101 | 0.0287082 | 0.464732 | 0.460044 | 0.466746 | 0.4663 | 0.467798 |
| (distance_from_office, kolmogorov_smirnov, alert) | False | False | False | False | False | True | True | True | True | True |
| (distance_from_office, kolmogorov_smirnov, value) | 0.0131 | 0.01124 | 0.01682 | 0.01436 | 0.01116 | 0.43548 | 0.43032 | 0.43786 | 0.43608 | 0.43852 |
| (gas_price_per_litre, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (gas_price_per_litre, jensen_shannon, value) | 0.0314247 | 0.0271235 | 0.0319369 | 0.0289334 | 0.0305991 | 0.0301321 | 0.0412587 | 0.0283644 | 0.0244792 | 0.0283063 |
| (gas_price_per_litre, kolmogorov_smirnov, alert) | False | False | False | False | False | False | False | False | False | False |
| (gas_price_per_litre, kolmogorov_smirnov, value) | 0.01576 | 0.01272 | 0.01746 | 0.01282 | 0.01922 | 0.00824 | 0.01068 | 0.01002 | 0.0107 | 0.007 |
| (public_transportation_cost, jensen_shannon, alert) | False | False | False | False | False | True | True | True | True | True |
| (public_transportation_cost, jensen_shannon, value) | 0.0281611 | 0.0269486 | 0.0381738 | 0.0344702 | 0.0322846 | 0.262577 | 0.264073 | 0.267208 | 0.265218 | 0.270583 |
| (public_transportation_cost, kolmogorov_smirnov, alert) | False | False | False | False | False | True | True | True | True | True |
| (public_transportation_cost, kolmogorov_smirnov, value) | 0.00956 | 0.01488 | 0.0129 | 0.01598 | 0.01136 | 0.18346 | 0.18334 | 0.20062 | 0.1874 | 0.20018 |
| (salary_range, chi2, alert) | False | False | False | False | False | True | True | True | True | True |
| (salary_range, chi2, value) | 1.03368 | 5.76241 | 2.65396 | 0.0708428 | 1.00542 | 455.622 | 428.633 | 453.247 | 438.26 | 474.892 |
| (salary_range, jensen_shannon, alert) | False | False | False | False | False | True | True | True | True | True |
| (salary_range, jensen_shannon, value) | 0.00639674 | 0.0153757 | 0.0102823 | 0.00167698 | 0.00633255 | 0.183143 | 0.174226 | 0.182913 | 0.177985 | 0.19035 |
| (tenure, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (tenure, jensen_shannon, value) | 0.0309355 | 0.0383534 | 0.034176 | 0.0332968 | 0.0263609 | 0.0288384 | 0.0265918 | 0.0275949 | 0.0232423 | 0.0279191 |
| (tenure, kolmogorov_smirnov, alert) | True | False | True | False | False | False | False | False | False | False |
| (tenure, kolmogorov_smirnov, value) | 0.02124 | 0.01006 | 0.0237 | 0.01446 | 0.00912 | 0.00702 | 0.00826 | 0.01398 | 0.00896 | 0.00632 |
| (wfh_prev_workday, chi2, alert) | False | False | False | False | False | True | True | True | True | True |
| (wfh_prev_workday, chi2, value) | 1.70319 | 0.242059 | 3.17862 | 0.0242988 | 0.487381 | 1179.9 | 1162.99 | 1170.49 | 1023.35 | 1227.54 |
| (wfh_prev_workday, jensen_shannon, alert) | False | False | False | False | False | True | True | True | True | True |
| (wfh_prev_workday, jensen_shannon, value) | 0.0083078 | 0.00319188 | 0.0113376 | 0.00107588 | 0.00449331 | 0.231198 | 0.229333 | 0.230161 | 0.213579 | 0.236408 |
| (workday, chi2, alert) | False | False | False | False | False | False | False | False | False | False |
| (workday, chi2, value) | 1.6025 | 5.71897 | 2.08186 | 0.489515 | 3.15856 | 4.66135 | 2.52181 | 3.41534 | 6.88171 | 1.63759 |
| (workday, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (workday, jensen_shannon, value) | 0.00796199 | 0.0150859 | 0.00907089 | 0.00440901 | 0.0112076 | 0.0135741 | 0.0100123 | 0.0116206 | 0.0164851 | 0.00809379 |
| (y_pred, jensen_shannon, alert) | False | False | False | False | False | False | False | False | False | False |
| (y_pred, jensen_shannon, value) | 0.0172838 | 0.00854425 | 0.00837438 | 0.00803465 | 0.0016478 | 0.0223873 | 0.0213664 | 0.0198352 | 0.0123531 | 0.0334576 |
| (y_pred, kolmogorov_smirnov, alert) | True | False | False | False | False | True | True | True | False | True |
| (y_pred, kolmogorov_smirnov, value) | 0.02034 | 0.01006 | 0.00986 | 0.00946 | 0.00194 | 0.02634 | 0.02514 | 0.02334 | 0.01454 | 0.03934 |
| (y_pred_proba, jensen_shannon, alert) | False | False | False | False | False | True | True | True | True | True |
| (y_pred_proba, jensen_shannon, value) | 0.0289329 | 0.0221389 | 0.0310428 | 0.0228333 | 0.0237474 | 0.225486 | 0.208815 | 0.224282 | 0.205352 | 0.215539 |
| (y_pred_proba, kolmogorov_smirnov, alert) | True | False | False | False | False | True | True | True | True | True |
| (y_pred_proba, kolmogorov_smirnov, value) | 0.0253 | 0.0123 | 0.01642 | 0.01058 | 0.01408 | 0.1307 | 0.1273 | 0.1311 | 0.1197 | 0.13752 |
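You can reproduce the same three-level layout by hand with pandas, which is useful for experimenting with selections before running a real calculation. This is a sketch. The frame below is hypothetical and copies only two rows and a few salary_range columns from the table above: reference chunk 0 and analysis chunk 5.

```python
import pandas as pd

# Hypothetical miniature of results.to_df(): three column levels
# (column name, method, field); chunk information lives under ('chunk', 'chunk', ...).
columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("chunk", "chunk", "period"),
    ("salary_range", "chi2", "alert"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "jensen_shannon", "alert"),
    ("salary_range", "jensen_shannon", "upper_threshold"),
    ("salary_range", "jensen_shannon", "value"),
])
df = pd.DataFrame(
    [
        ["[0:4999]", "reference", False, 2.89878, False, 0.1, 0.010811],
        ["[25000:29999]", "analysis", True, 455.622, True, 0.1, 0.183143],
    ],
    columns=columns,
)

# A full (column, method, field) tuple selects exactly one column.
chi2_values = df[("salary_range", "chi2", "value")].tolist()

# Boolean alert columns can be used directly as row masks.
alert_mask = df[("salary_range", "chi2", "alert")]
alerting_keys = df.loc[alert_mask, ("chunk", "chunk", "key")].tolist()

print(df.columns.nlevels, chi2_values, alerting_keys)
```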

Working with a MultiIndex can be very powerful, but it can also be quite challenging. The following snippet shows how to retrieve the value column for every column and method in our results.

>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([(      'distance_from_office',     'jensen_shannon', 'value'),
            (      'distance_from_office', 'kolmogorov_smirnov', 'value'),
            (       'gas_price_per_litre',     'jensen_shannon', 'value'),
            (       'gas_price_per_litre', 'kolmogorov_smirnov', 'value'),
            ('public_transportation_cost',     'jensen_shannon', 'value'),
            ('public_transportation_cost', 'kolmogorov_smirnov', 'value'),
            (              'salary_range',               'chi2', 'value'),
            (              'salary_range',     'jensen_shannon', 'value'),
            (                    'tenure',     'jensen_shannon', 'value'),
            (                    'tenure', 'kolmogorov_smirnov', 'value'),
            (          'wfh_prev_workday',               'chi2', 'value'),
            (          'wfh_prev_workday',     'jensen_shannon', 'value'),
            (                   'workday',               'chi2', 'value'),
            (                   'workday',     'jensen_shannon', 'value'),
            (                    'y_pred',     'jensen_shannon', 'value'),
            (                    'y_pred', 'kolmogorov_smirnov', 'value'),
            (              'y_pred_proba',     'jensen_shannon', 'value'),
            (              'y_pred_proba', 'kolmogorov_smirnov', 'value')],
           )
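Two standard pandas idioms express the same selection more readably: pd.IndexSlice and DataFrame.xs(). Below is a minimal sketch on a hypothetical two-chunk frame. The values are copied from analysis chunks 0 and 1 of the tenure and y_pred kolmogorov_smirnov columns above.

```python
import pandas as pd

# Hypothetical two-chunk frame with the same three column levels as results.to_df().
columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("tenure", "kolmogorov_smirnov", "alert"),
    ("tenure", "kolmogorov_smirnov", "value"),
    ("y_pred", "kolmogorov_smirnov", "alert"),
    ("y_pred", "kolmogorov_smirnov", "value"),
])
df = pd.DataFrame(
    [
        ["[0:4999]", True, 0.02124, True, 0.02034],
        ["[5000:9999]", False, 0.01006, False, 0.01006],
    ],
    columns=columns,
)

# Same selection as (slice(None), slice(None), 'value'), written with IndexSlice.
idx = pd.IndexSlice
values = df.loc[:, idx[:, :, "value"]]

# xs() picks one label on a level and drops that level, leaving (column, method) pairs.
values_xs = df.xs("value", axis=1, level=2)

print(list(values_xs.columns))
```

IndexSlice keeps all three levels, while xs() removes the selected level. Which one is more convenient depends on whether you still need the field name afterwards.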

To improve this experience, we’ve introduced a helper method, filter(), that lets you easily select just the results you need. Since the UnivariateDriftCalculator has two degrees of freedom, filter() accepts both: column_names selects the evaluated columns and methods selects the drift methods. You can additionally filter on the data period, i.e. reference or analysis, using period.

The filter() method returns a new Result instance, so you can chain calls such as filter(), to_df() and plot().

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>
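Conceptually, filtering comes down to masking the column MultiIndex and the rows. The function below is a hypothetical sketch of that idea on a plain DataFrame, not NannyML's implementation. In particular, always keeping the chunk columns and resetting the row index are assumptions chosen to mirror the filtered output in this walkthrough.

```python
from typing import Optional, Sequence

import pandas as pd


def filter_drift_frame(
    df: pd.DataFrame,
    period: Optional[str] = None,
    column_names: Optional[Sequence[str]] = None,
    methods: Optional[Sequence[str]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: select columns, methods and period from a 3-level frame."""
    top = df.columns.get_level_values(0)
    middle = df.columns.get_level_values(1)
    column_ok = top.isin(column_names) if column_names is not None else True
    method_ok = middle.isin(methods) if methods is not None else True
    # Chunk information is always kept; other columns must match both filters.
    out = df.loc[:, (top == "chunk") | (column_ok & method_ok)]
    if period is not None:
        out = out[out[("chunk", "chunk", "period")] == period]
    # Return a new frame with a fresh 0..n-1 row index; df itself is not modified.
    return out.reset_index(drop=True)


# Hypothetical frame; values copied from the table above.
columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("chunk", "chunk", "period"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "jensen_shannon", "value"),
    ("tenure", "kolmogorov_smirnov", "value"),
])
df = pd.DataFrame(
    [
        ["[0:4999]", "reference", 2.89878, 0.010811, 0.00978],
        ["[0:4999]", "analysis", 1.03368, 0.00639674, 0.02124],
        ["[25000:29999]", "analysis", 455.622, 0.183143, 0.00702],
    ],
    columns=columns,
)
filtered = filter_drift_frame(df, period="analysis", methods=["chi2"], column_names=["salary_range"])
print(filtered)
```

Returning a new object rather than modifying the input is what makes chained calls safe.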

After filtering, the results contain only the chi2 data for the salary_range column during the analysis period, alongside the chunk information.

>>> display(filtered_results.to_df())

| | (chunk, chunk, chunk_index) | (chunk, chunk, end_date) | (chunk, chunk, end_index) | (chunk, chunk, key) | (chunk, chunk, period) | (chunk, chunk, start_date) | (chunk, chunk, start_index) | (salary_range, chi2, alert) | (salary_range, chi2, lower_threshold) | (salary_range, chi2, upper_threshold) | (salary_range, chi2, value) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-01-02 00:45:44 | 4999 | [0:4999] | analysis | 2017-08-31 04:20:00 | 0 | False | | | 1.03368 |
| 1 | 1 | 2018-05-01 13:10:10 | 9999 | [5000:9999] | analysis | 2018-01-02 01:13:11 | 5000 | False | | | 5.76241 |
| 2 | 2 | 2018-09-01 15:40:40 | 14999 | [10000:14999] | analysis | 2018-05-01 14:25:25 | 10000 | False | | | 2.65396 |
| 3 | 3 | 2018-12-31 10:11:21 | 19999 | [15000:19999] | analysis | 2018-09-01 16:19:07 | 15000 | False | | | 0.0708428 |
| 4 | 4 | 2019-04-30 11:01:30 | 24999 | [20000:24999] | analysis | 2018-12-31 10:38:45 | 20000 | False | | | 1.00542 |
| 5 | 5 | 2019-09-01 00:24:27 | 29999 | [25000:29999] | analysis | 2019-04-30 11:02:00 | 25000 | True | | | 455.622 |
| 6 | 6 | 2019-12-31 09:09:12 | 34999 | [30000:34999] | analysis | 2019-09-01 00:28:54 | 30000 | True | | | 428.633 |
| 7 | 7 | 2020-04-30 11:46:53 | 39999 | [35000:39999] | analysis | 2019-12-31 10:07:15 | 35000 | True | | | 453.247 |
| 8 | 8 | 2020-09-01 02:46:02 | 44999 | [40000:44999] | analysis | 2020-04-30 12:04:32 | 40000 | True | | | 438.26 |
| 9 | 9 | 2021-01-01 04:29:32 | 49999 | [45000:49999] | analysis | 2020-09-01 02:46:13 | 45000 | True | | | 474.892 |

To avoid working with a MultiIndex, the to_df() method provides a switch. With multilevel=False, the column levels are joined into single, flat column names.

>>> display(filtered_results.to_df(multilevel=False))

| | chunk_index | chunk_end_date | chunk_end_index | chunk_key | chunk_period | chunk_start_date | chunk_start_index | salary_range_chi2_alert | salary_range_chi2_lower_threshold | salary_range_chi2_upper_threshold | salary_range_chi2_value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-01-02 00:45:44 | 4999 | [0:4999] | analysis | 2017-08-31 04:20:00 | 0 | False | | | 1.03368 |
| 1 | 1 | 2018-05-01 13:10:10 | 9999 | [5000:9999] | analysis | 2018-01-02 01:13:11 | 5000 | False | | | 5.76241 |
| 2 | 2 | 2018-09-01 15:40:40 | 14999 | [10000:14999] | analysis | 2018-05-01 14:25:25 | 10000 | False | | | 2.65396 |
| 3 | 3 | 2018-12-31 10:11:21 | 19999 | [15000:19999] | analysis | 2018-09-01 16:19:07 | 15000 | False | | | 0.0708428 |
| 4 | 4 | 2019-04-30 11:01:30 | 24999 | [20000:24999] | analysis | 2018-12-31 10:38:45 | 20000 | False | | | 1.00542 |
| 5 | 5 | 2019-09-01 00:24:27 | 29999 | [25000:29999] | analysis | 2019-04-30 11:02:00 | 25000 | True | | | 455.622 |
| 6 | 6 | 2019-12-31 09:09:12 | 34999 | [30000:34999] | analysis | 2019-09-01 00:28:54 | 30000 | True | | | 428.633 |
| 7 | 7 | 2020-04-30 11:46:53 | 39999 | [35000:39999] | analysis | 2019-12-31 10:07:15 | 35000 | True | | | 453.247 |
| 8 | 8 | 2020-09-01 02:46:02 | 44999 | [40000:44999] | analysis | 2020-04-30 12:04:32 | 40000 | True | | | 438.26 |
| 9 | 9 | 2021-01-01 04:29:32 | 49999 | [45000:49999] | analysis | 2020-09-01 02:46:13 | 45000 | True | | | 474.892 |
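The flat names above appear to follow a simple rule. Repeated levels are collapsed, the remaining levels are joined with underscores, and "chunk" is not repeated when the last level already starts with it. The sketch below is a hypothetical reimplementation of that rule on plain column tuples. It is inferred only from the names shown here and is not NannyML's code; flatten_column_name and flatten_columns are invented names. You could apply it to any three-level frame with df.columns = flatten_columns(df.columns).

```python
from typing import Iterable, List, Tuple


def flatten_column_name(levels: Tuple[str, ...]) -> str:
    """Hypothetical rule that reproduces the flat names shown above."""
    parts: List[str] = []
    for level in levels:
        if parts and parts[-1] == level:
            continue  # collapse repeated levels, e.g. ('chunk', 'chunk', ...)
        parts.append(level)
    # Drop a level that is already a prefix of the next one: 'chunk' + 'chunk_index' -> 'chunk_index'.
    merged = [
        p for i, p in enumerate(parts)
        if not (i + 1 < len(parts) and parts[i + 1].startswith(p + "_"))
    ]
    return "_".join(merged)


def flatten_columns(columns: Iterable[Tuple[str, ...]]) -> List[str]:
    """Apply flatten_column_name to every column tuple."""
    return [flatten_column_name(c) for c in columns]


print(flatten_columns([
    ("chunk", "chunk", "chunk_index"),
    ("chunk", "chunk", "end_date"),
    ("salary_range", "chi2", "value"),
]))
# ['chunk_index', 'chunk_end_date', 'salary_range_chi2_value']
```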

Results can also be exported to external storage using a Writer. We currently support three writers:

- the RawFilesWriter writes results to disk as raw files;
- the PickleFileWriter serializes the Result into a Python pickle file and stores it on disk;
- the DatabaseWriter stores calculation results in a database.

This example shows how to use the DatabaseWriter.

We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method all results will be written into the database, in this case a SQLite database.

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

A quick inspection shows the database was populated and contains the univariate drift calculation results.

>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> _ = cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',)]
>>> _ = cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2017-08-31 04:20:00.000000', '2018-01-02 00:45:44.000000', '2017-11-01 02:32:52.000000', 'kolmogorov_smirnov', 0.0131, 0, 'distance_from_office'), (2, None, 1, '2018-01-02 01:13:11.000000', '2018-05-01 13:10:10.000000', '2018-03-02 19:11:40.500000', 'kolmogorov_smirnov', 0.011239999999999972, 0, 'distance_from_office'), (3, None, 1, '2018-05-01 14:25:25.000000', '2018-09-01 15:40:40.000000', '2018-07-02 03:03:02.500000', 'kolmogorov_smirnov', 0.01682, 0, 'distance_from_office')]
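The raw rows above do not reveal the column names of univariate_drift_metrics. A small standard-library helper built on SQLite's PRAGMA table_info can list them before you write your own queries. describe_tables() is a hypothetical helper, and the demo table below is an in-memory stand-in. Pass sqlite3.connect('nml.db') instead to inspect the file written by the DatabaseWriter.

```python
import sqlite3
from typing import Dict, List


def describe_tables(connection: sqlite3.Connection) -> Dict[str, List[str]]:
    """Hypothetical helper: map each table name to its column names."""
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    tables = [row[0] for row in cursor.fetchall()]
    schema: Dict[str, List[str]] = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # Table names come from sqlite_master; embedded double quotes are escaped anyway.
        quoted = table.replace('"', '""')
        cursor.execute(f'PRAGMA table_info("{quoted}")')
        schema[table] = [row[1] for row in cursor.fetchall()]
    return schema


# In-memory stand-in; use sqlite3.connect('nml.db') to inspect the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, metric TEXT, value REAL)")
schema = describe_tables(conn)
print(schema)  # {'demo': ['id', 'metric', 'value']}
conn.close()
```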