Working with results
What are NannyML Results?
In NannyML, any calculation returns a Result object. Not returning a DataFrame directly allows NannyML to separate the concerns of storing calculation results and having users interact with them. It also means we can provide some additional useful methods on top of the results, for example filtering and plotting.
Just the code
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)
>>> calc = nml.UnivariateDriftCalculator(
... column_names=column_names,
... timestamp_column_name='timestamp',
... continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
... categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
>>> display(results.to_df())
>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
>>> display(filtered_results.to_df())
>>> display(filtered_results.to_df(multilevel=False))
>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)
Walkthrough
In order to obtain results we first have to perform some calculation. We’ll start by loading the reference and analysis sample data for binary classification. We’ll perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)
['distance_from_office', 'salary_range', 'gas_price_per_litre', 'public_transportation_cost', 'wfh_prev_workday', 'workday', 'tenure', 'y_pred_proba', 'y_pred']
We then set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we would like to use. Next, we fit the calculator on our reference data. The fitted calculator is then used to evaluate drift on the analysis data, and the outcome is stored in the variable results.
>>> calc = nml.UnivariateDriftCalculator(
... column_names=column_names,
... timestamp_column_name='timestamp',
... continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
... categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.
>>> display(results.to_df())
We can immediately see that a multilevel column index (a pandas MultiIndex) is being used to store the data. There is a part containing chunk information, followed by the numerical results of the drift calculations.
In the case of the UnivariateDriftCalculator there are two degrees of freedom. You can specify columns to include in the calculation, and each column might be evaluated by different methods.
This structure is visible in the column index. The top level represents the column names. The middle level represents the specific methods used to evaluate a column. The bottom level contains the information relevant to each method: a value, upper and lower thresholds for alerts and whether the evaluated method crossed the thresholds for that chunk, leading to an alert.
(‘chunk’, ‘chunk’, ‘chunk_index’) |
(‘chunk’, ‘chunk’, ‘end_date’) |
(‘chunk’, ‘chunk’, ‘end_index’) |
(‘chunk’, ‘chunk’, ‘key’) |
(‘chunk’, ‘chunk’, ‘period’) |
(‘chunk’, ‘chunk’, ‘start_date’) |
(‘chunk’, ‘chunk’, ‘start_index’) |
(‘distance_from_office’, ‘jensen_shannon’, ‘alert’) |
(‘distance_from_office’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘distance_from_office’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘distance_from_office’, ‘jensen_shannon’, ‘value’) |
(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘value’) |
(‘gas_price_per_litre’, ‘jensen_shannon’, ‘alert’) |
(‘gas_price_per_litre’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘gas_price_per_litre’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘gas_price_per_litre’, ‘jensen_shannon’, ‘value’) |
(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘value’) |
(‘public_transportation_cost’, ‘jensen_shannon’, ‘alert’) |
(‘public_transportation_cost’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘public_transportation_cost’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘public_transportation_cost’, ‘jensen_shannon’, ‘value’) |
(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘value’) |
(‘salary_range’, ‘chi2’, ‘alert’) |
(‘salary_range’, ‘chi2’, ‘lower_threshold’) |
(‘salary_range’, ‘chi2’, ‘upper_threshold’) |
(‘salary_range’, ‘chi2’, ‘value’) |
(‘salary_range’, ‘jensen_shannon’, ‘alert’) |
(‘salary_range’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘salary_range’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘salary_range’, ‘jensen_shannon’, ‘value’) |
(‘tenure’, ‘jensen_shannon’, ‘alert’) |
(‘tenure’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘tenure’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘tenure’, ‘jensen_shannon’, ‘value’) |
(‘tenure’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘tenure’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘tenure’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘tenure’, ‘kolmogorov_smirnov’, ‘value’) |
(‘wfh_prev_workday’, ‘chi2’, ‘alert’) |
(‘wfh_prev_workday’, ‘chi2’, ‘lower_threshold’) |
(‘wfh_prev_workday’, ‘chi2’, ‘upper_threshold’) |
(‘wfh_prev_workday’, ‘chi2’, ‘value’) |
(‘wfh_prev_workday’, ‘jensen_shannon’, ‘alert’) |
(‘wfh_prev_workday’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘wfh_prev_workday’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘wfh_prev_workday’, ‘jensen_shannon’, ‘value’) |
(‘workday’, ‘chi2’, ‘alert’) |
(‘workday’, ‘chi2’, ‘lower_threshold’) |
(‘workday’, ‘chi2’, ‘upper_threshold’) |
(‘workday’, ‘chi2’, ‘value’) |
(‘workday’, ‘jensen_shannon’, ‘alert’) |
(‘workday’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘workday’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘workday’, ‘jensen_shannon’, ‘value’) |
(‘y_pred’, ‘jensen_shannon’, ‘alert’) |
(‘y_pred’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘y_pred’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘y_pred’, ‘jensen_shannon’, ‘value’) |
(‘y_pred’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘y_pred’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘y_pred’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘y_pred’, ‘kolmogorov_smirnov’, ‘value’) |
(‘y_pred_proba’, ‘jensen_shannon’, ‘alert’) |
(‘y_pred_proba’, ‘jensen_shannon’, ‘lower_threshold’) |
(‘y_pred_proba’, ‘jensen_shannon’, ‘upper_threshold’) |
(‘y_pred_proba’, ‘jensen_shannon’, ‘value’) |
(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘alert’) |
(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘lower_threshold’) |
(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘upper_threshold’) |
(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘value’) |
|
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 |
0 |
2014-09-09 08:18:27 |
4999 |
[0:4999] |
reference |
2014-05-09 22:27:20 |
0 |
False |
0.1 |
0.0294645 |
False |
0.01034 |
False |
0.1 |
0.0277569 |
False |
0.01122 |
False |
0.1 |
0.0267158 |
False |
0.00998 |
False |
2.89878 |
False |
0.1 |
0.010811 |
False |
0.1 |
0.0228713 |
False |
0.00978 |
False |
0.414606 |
False |
0.1 |
0.00415143 |
False |
4.00124 |
False |
0.1 |
0.0125401 |
False |
0.1 |
0.00684559 |
False |
0.00806 |
False |
0.1 |
0.0133555 |
False |
0.00922 |
|||||||||||||||||||||||||||
1 |
1 |
2015-01-09 00:02:51 |
9999 |
[5000:9999] |
reference |
2014-09-09 09:13:35 |
5000 |
False |
0.1 |
0.0236588 |
False |
0.0075 |
False |
0.1 |
0.0292061 |
False |
0.01222 |
False |
0.1 |
0.0193572 |
False |
0.01046 |
False |
3.14439 |
False |
0.1 |
0.01124 |
False |
0.1 |
0.0335415 |
False |
0.01192 |
False |
0.0334857 |
False |
0.1 |
0.00124668 |
False |
1.28891 |
False |
0.1 |
0.00713799 |
False |
0.1 |
0.00463736 |
False |
0.00546 |
False |
0.1 |
0.0211292 |
False |
0.01042 |
|||||||||||||||||||||||||||
2 |
2 |
2015-05-09 15:54:26 |
14999 |
[10000:14999] |
reference |
2015-01-09 00:04:43 |
10000 |
False |
0.1 |
0.0264403 |
False |
0.0082 |
False |
0.1 |
0.02533 |
False |
0.00886 |
False |
0.1 |
0.0299168 |
False |
0.01706 |
False |
2.45188 |
False |
0.1 |
0.00980904 |
False |
0.1 |
0.029597 |
False |
0.01268 |
False |
0.168656 |
False |
0.1 |
0.00267997 |
False |
5.11796 |
False |
0.1 |
0.0142803 |
False |
0.1 |
0.00419613 |
False |
0.00494 |
False |
0.1 |
0.02237 |
False |
0.0091 |
|||||||||||||||||||||||||||
3 |
3 |
2015-09-07 07:14:37 |
19999 |
[15000:19999] |
reference |
2015-05-09 16:02:08 |
15000 |
False |
0.1 |
0.0217733 |
False |
0.0086 |
False |
0.1 |
0.0264593 |
False |
0.00956 |
False |
0.1 |
0.0333228 |
False |
0.0122 |
False |
4.06262 |
False |
0.1 |
0.0127697 |
False |
0.1 |
0.0286826 |
False |
0.01074 |
False |
0.0562698 |
False |
0.1 |
0.00158831 |
False |
1.84901 |
False |
0.1 |
0.0085587 |
False |
0.1 |
0.000220834 |
False |
0.00026 |
False |
0.1 |
0.0178289 |
False |
0.00872 |
|||||||||||||||||||||||||||
4 |
4 |
2016-01-08 16:02:05 |
24999 |
[20000:24999] |
reference |
2015-09-07 07:27:47 |
20000 |
False |
0.1 |
0.0239721 |
False |
0.0091 |
False |
0.1 |
0.0295752 |
False |
0.00758 |
False |
0.1 |
0.0222416 |
False |
0.00662 |
False |
2.41399 |
False |
0.1 |
0.00968817 |
False |
0.1 |
0.0209876 |
False |
0.00924 |
False |
0.242059 |
False |
0.1 |
0.00319188 |
False |
0.470551 |
False |
0.1 |
0.00433131 |
False |
0.1 |
0.00368645 |
False |
0.00434 |
False |
0.1 |
0.0216622 |
False |
0.00852 |
|||||||||||||||||||||||||||
5 |
5 |
2016-05-09 11:09:39 |
29999 |
[25000:29999] |
reference |
2016-01-08 17:22:00 |
25000 |
False |
0.1 |
0.0275768 |
False |
0.01458 |
False |
0.1 |
0.028514 |
False |
0.01032 |
False |
0.1 |
0.0303899 |
False |
0.01186 |
False |
3.79606 |
False |
0.1 |
0.0122934 |
False |
0.1 |
0.0229349 |
False |
0.00794 |
False |
3.61457 |
False |
0.1 |
0.0120561 |
False |
0.137868 |
False |
0.1 |
0.00233712 |
False |
0.1 |
0.00480722 |
False |
0.00566 |
False |
0.1 |
0.017256 |
False |
0.01028 |
|||||||||||||||||||||||||||
6 |
6 |
2016-09-04 03:30:35 |
34999 |
[30000:34999] |
reference |
2016-05-09 11:19:36 |
30000 |
False |
0.1 |
0.0268749 |
False |
0.0129 |
False |
0.1 |
0.0228658 |
False |
0.01094 |
False |
0.1 |
0.0279513 |
False |
0.00636 |
False |
3.22884 |
False |
0.1 |
0.0112358 |
False |
0.1 |
0.0226753 |
False |
0.0112 |
False |
0.0757052 |
False |
0.1 |
0.00182666 |
False |
4.19999 |
False |
0.1 |
0.0129223 |
False |
0.1 |
0.00385634 |
False |
0.00454 |
False |
0.1 |
0.0253217 |
False |
0.01248 |
|||||||||||||||||||||||||||
7 |
7 |
2017-01-03 18:48:21 |
39999 |
[35000:39999] |
reference |
2016-09-04 04:09:35 |
35000 |
False |
0.1 |
0.0312645 |
False |
0.0138 |
False |
0.1 |
0.0304354 |
False |
0.01736 |
False |
0.1 |
0.0215885 |
False |
0.00832 |
False |
1.3933 |
False |
0.1 |
0.00739444 |
False |
0.1 |
0.025517 |
False |
0.0074 |
False |
0.414606 |
False |
0.1 |
0.00415143 |
False |
0.716349 |
False |
0.1 |
0.00533433 |
False |
0.1 |
0.00453593 |
False |
0.00534 |
False |
0.1 |
0.0275068 |
False |
0.0089 |
|||||||||||||||||||||||||||
8 |
8 |
2017-05-03 02:34:24 |
44999 |
[40000:44999] |
reference |
2017-01-03 19:00:51 |
40000 |
False |
0.1 |
0.0273523 |
False |
0.01586 |
False |
0.1 |
0.0243664 |
False |
0.00842 |
False |
0.1 |
0.0293265 |
False |
0.01176 |
False |
0.304785 |
False |
0.1 |
0.00347061 |
False |
0.1 |
0.0244145 |
False |
0.01464 |
False |
0.0126564 |
False |
0.1 |
0.000802461 |
False |
0.596009 |
False |
0.1 |
0.00485967 |
False |
0.1 |
0.000220834 |
False |
0.00026 |
False |
0.1 |
0.0243225 |
False |
0.00768 |
|||||||||||||||||||||||||||
9 |
9 |
2017-08-31 03:10:29 |
49999 |
[45000:49999] |
reference |
2017-05-03 02:49:38 |
45000 |
False |
0.1 |
0.0296272 |
False |
0.00924 |
False |
0.1 |
0.0282426 |
False |
0.00786 |
False |
0.1 |
0.0235042 |
False |
0.0082 |
False |
2.98758 |
False |
0.1 |
0.0108121 |
False |
0.1 |
0.032928 |
False |
0.01306 |
False |
2.20383 |
False |
0.1 |
0.00945409 |
False |
5.08023 |
False |
0.1 |
0.0142629 |
False |
0.1 |
0.00045866 |
False |
0.00054 |
False |
0.1 |
0.0303947 |
False |
0.00498 |
|||||||||||||||||||||||||||
10 |
0 |
2018-01-02 00:45:44 |
4999 |
[0:4999] |
analysis |
2017-08-31 04:20:00 |
0 |
False |
0.1 |
0.0261007 |
False |
0.0131 |
False |
0.1 |
0.0314247 |
False |
0.01576 |
False |
0.1 |
0.0281611 |
False |
0.00956 |
False |
1.03368 |
False |
0.1 |
0.00639674 |
False |
0.1 |
0.0309355 |
True |
0.02124 |
False |
1.70319 |
False |
0.1 |
0.0083078 |
False |
1.6025 |
False |
0.1 |
0.00796199 |
False |
0.1 |
0.0172838 |
True |
0.02034 |
False |
0.1 |
0.0289329 |
True |
0.0253 |
|||||||||||||||||||||||||||
11 |
1 |
2018-05-01 13:10:10 |
9999 |
[5000:9999] |
analysis |
2018-01-02 01:13:11 |
5000 |
False |
0.1 |
0.0202971 |
False |
0.01124 |
False |
0.1 |
0.0271235 |
False |
0.01272 |
False |
0.1 |
0.0269486 |
False |
0.01488 |
False |
5.76241 |
False |
0.1 |
0.0153757 |
False |
0.1 |
0.0383534 |
False |
0.01006 |
False |
0.242059 |
False |
0.1 |
0.00319188 |
False |
5.71897 |
False |
0.1 |
0.0150859 |
False |
0.1 |
0.00854425 |
False |
0.01006 |
False |
0.1 |
0.0221389 |
False |
0.0123 |
|||||||||||||||||||||||||||
12 |
2 |
2018-09-01 15:40:40 |
14999 |
[10000:14999] |
analysis |
2018-05-01 14:25:25 |
10000 |
False |
0.1 |
0.0210957 |
False |
0.01682 |
False |
0.1 |
0.0319369 |
False |
0.01746 |
False |
0.1 |
0.0381738 |
False |
0.0129 |
False |
2.65396 |
False |
0.1 |
0.0102823 |
False |
0.1 |
0.034176 |
True |
0.0237 |
False |
3.17862 |
False |
0.1 |
0.0113376 |
False |
2.08186 |
False |
0.1 |
0.00907089 |
False |
0.1 |
0.00837438 |
False |
0.00986 |
False |
0.1 |
0.0310428 |
False |
0.01642 |
|||||||||||||||||||||||||||
13 |
3 |
2018-12-31 10:11:21 |
19999 |
[15000:19999] |
analysis |
2018-09-01 16:19:07 |
15000 |
False |
0.1 |
0.0362101 |
False |
0.01436 |
False |
0.1 |
0.0289334 |
False |
0.01282 |
False |
0.1 |
0.0344702 |
False |
0.01598 |
False |
0.0708428 |
False |
0.1 |
0.00167698 |
False |
0.1 |
0.0332968 |
False |
0.01446 |
False |
0.0242988 |
False |
0.1 |
0.00107588 |
False |
0.489515 |
False |
0.1 |
0.00440901 |
False |
0.1 |
0.00803465 |
False |
0.00946 |
False |
0.1 |
0.0228333 |
False |
0.01058 |
|||||||||||||||||||||||||||
14 |
4 |
2019-04-30 11:01:30 |
24999 |
[20000:24999] |
analysis |
2018-12-31 10:38:45 |
20000 |
False |
0.1 |
0.0287082 |
False |
0.01116 |
False |
0.1 |
0.0305991 |
False |
0.01922 |
False |
0.1 |
0.0322846 |
False |
0.01136 |
False |
1.00542 |
False |
0.1 |
0.00633255 |
False |
0.1 |
0.0263609 |
False |
0.00912 |
False |
0.487381 |
False |
0.1 |
0.00449331 |
False |
3.15856 |
False |
0.1 |
0.0112076 |
False |
0.1 |
0.0016478 |
False |
0.00194 |
False |
0.1 |
0.0237474 |
False |
0.01408 |
|||||||||||||||||||||||||||
15 |
5 |
2019-09-01 00:24:27 |
29999 |
[25000:29999] |
analysis |
2019-04-30 11:02:00 |
25000 |
True |
0.1 |
0.464732 |
True |
0.43548 |
False |
0.1 |
0.0301321 |
False |
0.00824 |
True |
0.1 |
0.262577 |
True |
0.18346 |
True |
455.622 |
True |
0.1 |
0.183143 |
False |
0.1 |
0.0288384 |
False |
0.00702 |
True |
1179.9 |
True |
0.1 |
0.231198 |
False |
4.66135 |
False |
0.1 |
0.0135741 |
False |
0.1 |
0.0223873 |
True |
0.02634 |
True |
0.1 |
0.225486 |
True |
0.1307 |
|||||||||||||||||||||||||||
16 |
6 |
2019-12-31 09:09:12 |
34999 |
[30000:34999] |
analysis |
2019-09-01 00:28:54 |
30000 |
True |
0.1 |
0.460044 |
True |
0.43032 |
False |
0.1 |
0.0412587 |
False |
0.01068 |
True |
0.1 |
0.264073 |
True |
0.18334 |
True |
428.633 |
True |
0.1 |
0.174226 |
False |
0.1 |
0.0265918 |
False |
0.00826 |
True |
1162.99 |
True |
0.1 |
0.229333 |
False |
2.52181 |
False |
0.1 |
0.0100123 |
False |
0.1 |
0.0213664 |
True |
0.02514 |
True |
0.1 |
0.208815 |
True |
0.1273 |
|||||||||||||||||||||||||||
17 |
7 |
2020-04-30 11:46:53 |
39999 |
[35000:39999] |
analysis |
2019-12-31 10:07:15 |
35000 |
True |
0.1 |
0.466746 |
True |
0.43786 |
False |
0.1 |
0.0283644 |
False |
0.01002 |
True |
0.1 |
0.267208 |
True |
0.20062 |
True |
453.247 |
True |
0.1 |
0.182913 |
False |
0.1 |
0.0275949 |
False |
0.01398 |
True |
1170.49 |
True |
0.1 |
0.230161 |
False |
3.41534 |
False |
0.1 |
0.0116206 |
False |
0.1 |
0.0198352 |
True |
0.02334 |
True |
0.1 |
0.224282 |
True |
0.1311 |
|||||||||||||||||||||||||||
18 |
8 |
2020-09-01 02:46:02 |
44999 |
[40000:44999] |
analysis |
2020-04-30 12:04:32 |
40000 |
True |
0.1 |
0.4663 |
True |
0.43608 |
False |
0.1 |
0.0244792 |
False |
0.0107 |
True |
0.1 |
0.265218 |
True |
0.1874 |
True |
438.26 |
True |
0.1 |
0.177985 |
False |
0.1 |
0.0232423 |
False |
0.00896 |
True |
1023.35 |
True |
0.1 |
0.213579 |
False |
6.88171 |
False |
0.1 |
0.0164851 |
False |
0.1 |
0.0123531 |
False |
0.01454 |
True |
0.1 |
0.205352 |
True |
0.1197 |
|||||||||||||||||||||||||||
19 |
9 |
2021-01-01 04:29:32 |
49999 |
[45000:49999] |
analysis |
2020-09-01 02:46:13 |
45000 |
True |
0.1 |
0.467798 |
True |
0.43852 |
False |
0.1 |
0.0283063 |
False |
0.007 |
True |
0.1 |
0.270583 |
True |
0.20018 |
True |
474.892 |
True |
0.1 |
0.19035 |
False |
0.1 |
0.0279191 |
False |
0.00632 |
True |
1227.54 |
True |
0.1 |
0.236408 |
False |
1.63759 |
False |
0.1 |
0.00809379 |
False |
0.1 |
0.0334576 |
True |
0.03934 |
True |
0.1 |
0.215539 |
True |
0.13752 |
Working with multilevel indexes can be very powerful, yet also quite challenging. The following snippet illustrates how to retrieve all calculated method values from our results.
>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([( 'distance_from_office', 'jensen_shannon', 'value'),
( 'distance_from_office', 'kolmogorov_smirnov', 'value'),
( 'gas_price_per_litre', 'jensen_shannon', 'value'),
( 'gas_price_per_litre', 'kolmogorov_smirnov', 'value'),
('public_transportation_cost', 'jensen_shannon', 'value'),
('public_transportation_cost', 'kolmogorov_smirnov', 'value'),
( 'salary_range', 'chi2', 'value'),
( 'salary_range', 'jensen_shannon', 'value'),
( 'tenure', 'jensen_shannon', 'value'),
( 'tenure', 'kolmogorov_smirnov', 'value'),
( 'wfh_prev_workday', 'chi2', 'value'),
( 'wfh_prev_workday', 'jensen_shannon', 'value'),
( 'workday', 'chi2', 'value'),
( 'workday', 'jensen_shannon', 'value'),
( 'y_pred', 'jensen_shannon', 'value'),
( 'y_pred', 'kolmogorov_smirnov', 'value'),
( 'y_pred_proba', 'jensen_shannon', 'value'),
( 'y_pred_proba', 'kolmogorov_smirnov', 'value')],
)
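For readers less familiar with this kind of selection, here is a minimal sketch on a hand-built toy frame. The frame and its numbers are made up for illustration and are not NannyML output. It shows two standard pandas spellings of the selection above, pd.IndexSlice and DataFrame.xs, which should pick the same columns.

```python
import pandas as pd

# Toy three-level columns mimicking the (column, method, property) layout.
# The values are invented for illustration; they are not NannyML output.
columns = pd.MultiIndex.from_tuples(
    [
        ("chunk", "chunk", "key"),
        ("salary_range", "chi2", "alert"),
        ("salary_range", "chi2", "value"),
        ("tenure", "jensen_shannon", "alert"),
        ("tenure", "jensen_shannon", "value"),
    ]
)
toy_df = pd.DataFrame(
    [
        ["[0:4999]", False, 1.0, False, 0.02],
        ["[5000:9999]", True, 450.0, False, 0.03],
    ],
    columns=columns,
)

# pd.IndexSlice is a more readable spelling of (slice(None), slice(None), 'value').
idx = pd.IndexSlice
values_by_slice = toy_df.loc[:, idx[:, :, "value"]]

# DataFrame.xs selects on a single level; drop_level=False keeps all three levels.
values_by_xs = toy_df.xs("value", axis=1, level=2, drop_level=False)

print(list(values_by_slice.columns))
print(list(values_by_xs.columns))
```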
To improve this experience we’ve introduced a helper method that allows you to filter the result data so you can easily retrieve the information you want. Since the UnivariateDriftCalculator has two degrees of freedom, we’ve included both in the filter() method. Additionally you can filter on the data period, i.e. reference or analysis.
The filter() method will return a new Result instance, allowing you to chain methods like filter(), to_df() and plot().
>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>
When looking at the results after filtering, you can see only the chi2 data for the salary_range column during the analysis period is included.
>>> display(filtered_results.to_df())
| | ('chunk', 'chunk', 'chunk_index') | ('chunk', 'chunk', 'end_date') | ('chunk', 'chunk', 'end_index') | ('chunk', 'chunk', 'key') | ('chunk', 'chunk', 'period') | ('chunk', 'chunk', 'start_date') | ('chunk', 'chunk', 'start_index') | ('salary_range', 'chi2', 'alert') | ('salary_range', 'chi2', 'lower_threshold') | ('salary_range', 'chi2', 'upper_threshold') | ('salary_range', 'chi2', 'value') |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-01-02 00:45:44 | 4999 | [0:4999] | analysis | 2017-08-31 04:20:00 | 0 | False | | | 1.03368 |
| 1 | 1 | 2018-05-01 13:10:10 | 9999 | [5000:9999] | analysis | 2018-01-02 01:13:11 | 5000 | False | | | 5.76241 |
| 2 | 2 | 2018-09-01 15:40:40 | 14999 | [10000:14999] | analysis | 2018-05-01 14:25:25 | 10000 | False | | | 2.65396 |
| 3 | 3 | 2018-12-31 10:11:21 | 19999 | [15000:19999] | analysis | 2018-09-01 16:19:07 | 15000 | False | | | 0.0708428 |
| 4 | 4 | 2019-04-30 11:01:30 | 24999 | [20000:24999] | analysis | 2018-12-31 10:38:45 | 20000 | False | | | 1.00542 |
| 5 | 5 | 2019-09-01 00:24:27 | 29999 | [25000:29999] | analysis | 2019-04-30 11:02:00 | 25000 | True | | | 455.622 |
| 6 | 6 | 2019-12-31 09:09:12 | 34999 | [30000:34999] | analysis | 2019-09-01 00:28:54 | 30000 | True | | | 428.633 |
| 7 | 7 | 2020-04-30 11:46:53 | 39999 | [35000:39999] | analysis | 2019-12-31 10:07:15 | 35000 | True | | | 453.247 |
| 8 | 8 | 2020-09-01 02:46:02 | 44999 | [40000:44999] | analysis | 2020-04-30 12:04:32 | 40000 | True | | | 438.26 |
| 9 | 9 | 2021-01-01 04:29:32 | 49999 | [45000:49999] | analysis | 2020-09-01 02:46:13 | 45000 | True | | | 474.892 |
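Chaining works because filter() hands back a new object instead of modifying the one it was called on. The sketch below illustrates that pattern in plain Python. MiniResult, its record layout and its to_rows() method are hypothetical stand-ins invented for this example; they do not reflect how NannyML implements Result. The three values are copied from the tables above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MiniResult:
    """Hypothetical stand-in for a result object; not NannyML's Result class."""

    records: List[Dict[str, object]]

    def filter(
        self,
        period: Optional[str] = None,
        methods: Optional[List[str]] = None,
        column_names: Optional[List[str]] = None,
    ) -> "MiniResult":
        # Build a new instance; the records of the original are left untouched.
        kept = [
            r
            for r in self.records
            if (period is None or r["period"] == period)
            and (methods is None or r["method"] in methods)
            and (column_names is None or r["column"] in column_names)
        ]
        return MiniResult(kept)

    def to_rows(self) -> List[Dict[str, object]]:
        # Plays the role of to_df() in this sketch: hand the data to the user.
        return list(self.records)


full = MiniResult(
    [
        {"period": "reference", "column": "salary_range", "method": "chi2", "value": 2.89878},
        {"period": "analysis", "column": "salary_range", "method": "chi2", "value": 1.03368},
        {"period": "analysis", "column": "salary_range", "method": "jensen_shannon", "value": 0.00639674},
    ]
)

# Every filter() call returns a MiniResult, so calls can be chained.
rows = full.filter(period="analysis").filter(methods=["chi2"]).to_rows()
print(rows)
```

Returning a new object keeps the unfiltered results available, so several filtered views can be derived from a single calculation.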
To avoid the use of a multilevel index, we’ve provided a switch in the to_df() method.
>>> display(filtered_results.to_df(multilevel=False))
| | chunk_index | chunk_end_date | chunk_end_index | chunk_key | chunk_period | chunk_start_date | chunk_start_index | salary_range_chi2_alert | salary_range_chi2_lower_threshold | salary_range_chi2_upper_threshold | salary_range_chi2_value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2018-01-02 00:45:44 | 4999 | [0:4999] | analysis | 2017-08-31 04:20:00 | 0 | False | | | 1.03368 |
| 1 | 1 | 2018-05-01 13:10:10 | 9999 | [5000:9999] | analysis | 2018-01-02 01:13:11 | 5000 | False | | | 5.76241 |
| 2 | 2 | 2018-09-01 15:40:40 | 14999 | [10000:14999] | analysis | 2018-05-01 14:25:25 | 10000 | False | | | 2.65396 |
| 3 | 3 | 2018-12-31 10:11:21 | 19999 | [15000:19999] | analysis | 2018-09-01 16:19:07 | 15000 | False | | | 0.0708428 |
| 4 | 4 | 2019-04-30 11:01:30 | 24999 | [20000:24999] | analysis | 2018-12-31 10:38:45 | 20000 | False | | | 1.00542 |
| 5 | 5 | 2019-09-01 00:24:27 | 29999 | [25000:29999] | analysis | 2019-04-30 11:02:00 | 25000 | True | | | 455.622 |
| 6 | 6 | 2019-12-31 09:09:12 | 34999 | [30000:34999] | analysis | 2019-09-01 00:28:54 | 30000 | True | | | 428.633 |
| 7 | 7 | 2020-04-30 11:46:53 | 39999 | [35000:39999] | analysis | 2019-12-31 10:07:15 | 35000 | True | | | 453.247 |
| 8 | 8 | 2020-09-01 02:46:02 | 44999 | [40000:44999] | analysis | 2020-04-30 12:04:32 | 40000 | True | | | 438.26 |
| 9 | 9 | 2021-01-01 04:29:32 | 49999 | [45000:49999] | analysis | 2020-09-01 02:46:13 | 45000 | True | | | 474.892 |
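Comparing the flat column names with the three-level tuples shown earlier suggests how the two relate. The helper below is a sketch that reproduces the flat names from the tuples. flatten_column is a hypothetical name, and the rule is inferred from the output on this page, not taken from NannyML's source.

```python
from typing import Tuple


def flatten_column(column: Tuple[str, str, str]) -> str:
    """Map a three-level column tuple to the flat name seen in the output above."""
    top, middle, bottom = column
    if top == "chunk":
        # Chunk columns: ('chunk', 'chunk', 'key') -> 'chunk_key'.
        # 'chunk_index' already carries the prefix, so it is kept as is.
        return bottom if bottom.startswith("chunk_") else f"chunk_{bottom}"
    # Drift columns: join column name, method and property with underscores.
    return f"{top}_{middle}_{bottom}"


examples = [
    ("chunk", "chunk", "chunk_index"),
    ("chunk", "chunk", "end_date"),
    ("salary_range", "chi2", "value"),
]
for column in examples:
    print(column, "->", flatten_column(column))
```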
Results can also be exported to external storage using a Writer. We currently support three options: writing results to disk using a RawFilesWriter, serializing the Result into a Python pickle file and storing that on disk using the PickleFileWriter, or storing calculation results in a database using the DatabaseWriter. This example will show how to use the DatabaseWriter.
We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method all results will be written into the database, in this case a SQLite database.
>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)
A quick inspection shows the database was populated and contains the univariate drift calculation results.
>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',)]
>>> cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2017-08-31 04:20:00.000000', '2018-01-02 00:45:44.000000', '2017-11-01 02:32:52.000000', 'kolmogorov_smirnov', 0.0131, 0, 'distance_from_office'), (2, None, 1, '2018-01-02 01:13:11.000000', '2018-05-01 13:10:10.000000', '2018-03-02 19:11:40.500000', 'kolmogorov_smirnov', 0.011239999999999972, 0, 'distance_from_office'), (3, None, 1, '2018-05-01 14:25:25.000000', '2018-09-01 15:40:40.000000', '2018-07-02 03:03:02.500000', 'kolmogorov_smirnov', 0.01682, 0, 'distance_from_office')]
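The tuples returned by fetchall() are easier to read with column names attached. The sketch below uses only the standard-library sqlite3 module. To stay runnable without nml.db, it builds an in-memory table; the table name drift_demo and its column names are assumptions made up for this example, not the schema NannyML creates. The single row reuses values from the output above.

```python
import sqlite3

# In-memory database so the sketch does not depend on nml.db.
connection = sqlite3.connect(":memory:")
# sqlite3.Row lets each row be accessed by column name.
connection.row_factory = sqlite3.Row

# Hypothetical table and column names, chosen for illustration only.
connection.execute(
    """CREATE TABLE drift_demo (
        id INTEGER PRIMARY KEY,
        column_name TEXT,
        method_name TEXT,
        metric_value REAL,
        alert INTEGER
    )"""
)
connection.execute(
    "INSERT INTO drift_demo (column_name, method_name, metric_value, alert) VALUES (?, ?, ?, ?)",
    ("distance_from_office", "kolmogorov_smirnov", 0.0131, 0),
)

# Parameterised query: the method name is bound, not formatted into the SQL string.
rows = connection.execute(
    "SELECT * FROM drift_demo WHERE method_name = ?", ("kolmogorov_smirnov",)
).fetchall()

# Convert to plain dictionaries before closing the connection.
records = [dict(row) for row in rows]
connection.close()

for record in records:
    print(record)
```

Against a real database, cursor.description or PRAGMA table_info can be used to look up the actual column names before writing such a query.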