Working with results

What are NannyML Results?

In NannyML, every calculation returns a Result object rather than a DataFrame. This lets NannyML separate the concern of storing calculation results from the concern of having users interact with them. It also means we can provide additional useful methods on top of the results, such as filtering and plotting.

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

>>> display(results.to_df())

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))

>>> display(filtered_results.to_df())

>>> display(filtered_results.to_df(multilevel=False))

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

Walkthrough

In order to obtain results we first have to perform some calculation. We’ll start by loading the reference and analysis sample data for binary classification. We’ll perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> column_names = [col for col in reference_df.columns if col not in ['timestamp', 'identifier', 'period', 'work_home_actual']]
>>> print(column_names)
['distance_from_office', 'salary_range', 'gas_price_per_litre', 'public_transportation_cost', 'wfh_prev_workday', 'workday', 'tenure', 'y_pred_proba', 'y_pred']

Next we set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we would like to use. We fit the calculator on our reference data, then use the fitted calculator to evaluate drift on the analysis data. The outcome is stored in the variable results.

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.

>>> display(results.to_df())

We can immediately see that a multilevel index is being used to store the data. There is a part containing chunk information, followed by the numerical results of the drift calculations. In the case of the UnivariateDriftCalculator there are two degrees of freedom. You can specify columns to include in the calculation, and each column might be evaluated by different methods.

This structure is visible in the column index. The top level represents the column names. The middle level represents the specific methods used to evaluate a column. The bottom level contains the information relevant to each method: a value, upper and lower thresholds for alerts and whether the evaluated method crossed the thresholds for that chunk, leading to an alert.

	(‘chunk’, ‘chunk’, ‘chunk_index’)	(‘chunk’, ‘chunk’, ‘end_date’)	(‘chunk’, ‘chunk’, ‘end_index’)	(‘chunk’, ‘chunk’, ‘key’)	(‘chunk’, ‘chunk’, ‘period’)	(‘chunk’, ‘chunk’, ‘start_date’)	(‘chunk’, ‘chunk’, ‘start_index’)	(‘distance_from_office’, ‘jensen_shannon’, ‘alert’)	(‘distance_from_office’, ‘jensen_shannon’, ‘upper_threshold’)	(‘distance_from_office’, ‘jensen_shannon’, ‘value’)	(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘alert’)	(‘distance_from_office’, ‘kolmogorov_smirnov’, ‘value’)	(‘gas_price_per_litre’, ‘jensen_shannon’, ‘alert’)	(‘gas_price_per_litre’, ‘jensen_shannon’, ‘upper_threshold’)	(‘gas_price_per_litre’, ‘jensen_shannon’, ‘value’)	(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘alert’)	(‘gas_price_per_litre’, ‘kolmogorov_smirnov’, ‘value’)	(‘public_transportation_cost’, ‘jensen_shannon’, ‘alert’)	(‘public_transportation_cost’, ‘jensen_shannon’, ‘upper_threshold’)	(‘public_transportation_cost’, ‘jensen_shannon’, ‘value’)	(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘alert’)	(‘public_transportation_cost’, ‘kolmogorov_smirnov’, ‘value’)	(‘salary_range’, ‘chi2’, ‘alert’)	(‘salary_range’, ‘chi2’, ‘value’)	(‘salary_range’, ‘jensen_shannon’, ‘alert’)	(‘salary_range’, ‘jensen_shannon’, ‘upper_threshold’)	(‘salary_range’, ‘jensen_shannon’, ‘value’)	(‘tenure’, ‘jensen_shannon’, ‘alert’)	(‘tenure’, ‘jensen_shannon’, ‘upper_threshold’)	(‘tenure’, ‘jensen_shannon’, ‘value’)	(‘tenure’, ‘kolmogorov_smirnov’, ‘alert’)	(‘tenure’, ‘kolmogorov_smirnov’, ‘value’)	(‘wfh_prev_workday’, ‘chi2’, ‘alert’)	(‘wfh_prev_workday’, ‘chi2’, ‘value’)	(‘wfh_prev_workday’, ‘jensen_shannon’, ‘alert’)	(‘wfh_prev_workday’, ‘jensen_shannon’, ‘upper_threshold’)	(‘wfh_prev_workday’, ‘jensen_shannon’, ‘value’)	(‘workday’, ‘chi2’, ‘alert’)	(‘workday’, ‘chi2’, ‘value’)	(‘workday’, ‘jensen_shannon’, ‘alert’)	(‘workday’, ‘jensen_shannon’, ‘upper_threshold’)	(‘workday’, ‘jensen_shannon’, ‘value’)	(‘y_pred’, ‘jensen_shannon’, ‘alert’)	(‘y_pred’, ‘jensen_shannon’, ‘upper_threshold’)	(‘y_pred’, ‘jensen_shannon’, ‘value’)	(‘y_pred’, ‘kolmogorov_smirnov’, ‘alert’)	(‘y_pred’, ‘kolmogorov_smirnov’, ‘value’)	(‘y_pred_proba’, ‘jensen_shannon’, ‘alert’)	(‘y_pred_proba’, ‘jensen_shannon’, ‘upper_threshold’)	(‘y_pred_proba’, ‘jensen_shannon’, ‘value’)	(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘alert’)	(‘y_pred_proba’, ‘kolmogorov_smirnov’, ‘value’)
0	0	2014-09-09 08:18:27	4999	[0:4999]	reference	2014-05-09 22:27:20	0	False	0.1	0.0294645	False	0.01034	False	0.1	0.0277569	False	0.01122	False	0.1	0.0267158	False	0.00998	False	2.89878	False	0.1	0.010811	False	0.1	0.0228713	False	0.00978	False	0.414606	False	0.1	0.00415143	False	4.00124	False	0.1	0.0125401	False	0.1	0.00684559	False	0.00806	False	0.1	0.0133555	False	0.00922
1	1	2015-01-09 00:02:51	9999	[5000:9999]	reference	2014-09-09 09:13:35	5000	False	0.1	0.0236588	False	0.0075	False	0.1	0.0292061	False	0.01222	False	0.1	0.0193572	False	0.01046	False	3.14439	False	0.1	0.01124	False	0.1	0.0335415	False	0.01192	False	0.0334857	False	0.1	0.00124668	False	1.28891	False	0.1	0.00713799	False	0.1	0.00463736	False	0.00546	False	0.1	0.0211292	False	0.01042
2	2	2015-05-09 15:54:26	14999	[10000:14999]	reference	2015-01-09 00:04:43	10000	False	0.1	0.0264403	False	0.0082	False	0.1	0.02533	False	0.00886	False	0.1	0.0299168	False	0.01706	False	2.45188	False	0.1	0.00980904	False	0.1	0.029597	False	0.01268	False	0.168656	False	0.1	0.00267997	False	5.11796	False	0.1	0.0142803	False	0.1	0.00419613	False	0.00494	False	0.1	0.02237	False	0.0091
3	3	2015-09-07 07:14:37	19999	[15000:19999]	reference	2015-05-09 16:02:08	15000	False	0.1	0.0217733	False	0.0086	False	0.1	0.0264593	False	0.00956	False	0.1	0.0333228	False	0.0122	False	4.06262	False	0.1	0.0127697	False	0.1	0.0286826	False	0.01074	False	0.0562698	False	0.1	0.00158831	False	1.84901	False	0.1	0.0085587	False	0.1	0.000220834	False	0.00026	False	0.1	0.0178289	False	0.00872
4	4	2016-01-08 16:02:05	24999	[20000:24999]	reference	2015-09-07 07:27:47	20000	False	0.1	0.0239721	False	0.0091	False	0.1	0.0295752	False	0.00758	False	0.1	0.0222416	False	0.00662	False	2.41399	False	0.1	0.00968817	False	0.1	0.0209876	False	0.00924	False	0.242059	False	0.1	0.00319188	False	0.470551	False	0.1	0.00433131	False	0.1	0.00368645	False	0.00434	False	0.1	0.0216622	False	0.00852
5	5	2016-05-09 11:09:39	29999	[25000:29999]	reference	2016-01-08 17:22:00	25000	False	0.1	0.0275768	False	0.01458	False	0.1	0.028514	False	0.01032	False	0.1	0.0303899	False	0.01186	False	3.79606	False	0.1	0.0122934	False	0.1	0.0229349	False	0.00794	False	3.61457	False	0.1	0.0120561	False	0.137868	False	0.1	0.00233712	False	0.1	0.00480722	False	0.00566	False	0.1	0.017256	False	0.01028
6	6	2016-09-04 03:30:35	34999	[30000:34999]	reference	2016-05-09 11:19:36	30000	False	0.1	0.0268749	False	0.0129	False	0.1	0.0228658	False	0.01094	False	0.1	0.0279513	False	0.00636	False	3.22884	False	0.1	0.0112358	False	0.1	0.0226753	False	0.0112	False	0.0757052	False	0.1	0.00182666	False	4.19999	False	0.1	0.0129223	False	0.1	0.00385634	False	0.00454	False	0.1	0.0253217	False	0.01248
7	7	2017-01-03 18:48:21	39999	[35000:39999]	reference	2016-09-04 04:09:35	35000	False	0.1	0.0312645	False	0.0138	False	0.1	0.0304354	False	0.01736	False	0.1	0.0215885	False	0.00832	False	1.3933	False	0.1	0.00739444	False	0.1	0.025517	False	0.0074	False	0.414606	False	0.1	0.00415143	False	0.716349	False	0.1	0.00533433	False	0.1	0.00453593	False	0.00534	False	0.1	0.0275068	False	0.0089
8	8	2017-05-03 02:34:24	44999	[40000:44999]	reference	2017-01-03 19:00:51	40000	False	0.1	0.0273523	False	0.01586	False	0.1	0.0243664	False	0.00842	False	0.1	0.0293265	False	0.01176	False	0.304785	False	0.1	0.00347061	False	0.1	0.0244145	False	0.01464	False	0.0126564	False	0.1	0.000802461	False	0.596009	False	0.1	0.00485967	False	0.1	0.000220834	False	0.00026	False	0.1	0.0243225	False	0.00768
9	9	2017-08-31 03:10:29	49999	[45000:49999]	reference	2017-05-03 02:49:38	45000	False	0.1	0.0296272	False	0.00924	False	0.1	0.0282426	False	0.00786	False	0.1	0.0235042	False	0.0082	False	2.98758	False	0.1	0.0108121	False	0.1	0.032928	False	0.01306	False	2.20383	False	0.1	0.00945409	False	5.08023	False	0.1	0.0142629	False	0.1	0.00045866	False	0.00054	False	0.1	0.0303947	False	0.00498
10	0	2018-01-02 00:45:44	4999	[0:4999]	analysis	2017-08-31 04:20:00	0	False	0.1	0.0261007	False	0.0131	False	0.1	0.0314247	False	0.01576	False	0.1	0.0281611	False	0.00956	False	1.03368	False	0.1	0.00639674	False	0.1	0.0309355	True	0.02124	False	1.70319	False	0.1	0.0083078	False	1.6025	False	0.1	0.00796199	False	0.1	0.0172838	True	0.02034	False	0.1	0.0289329	True	0.0253
11	1	2018-05-01 13:10:10	9999	[5000:9999]	analysis	2018-01-02 01:13:11	5000	False	0.1	0.0202971	False	0.01124	False	0.1	0.0271235	False	0.01272	False	0.1	0.0269486	False	0.01488	False	5.76241	False	0.1	0.0153757	False	0.1	0.0383534	False	0.01006	False	0.242059	False	0.1	0.00319188	False	5.71897	False	0.1	0.0150859	False	0.1	0.00854425	False	0.01006	False	0.1	0.0221389	False	0.0123
12	2	2018-09-01 15:40:40	14999	[10000:14999]	analysis	2018-05-01 14:25:25	10000	False	0.1	0.0210957	False	0.01682	False	0.1	0.0319369	False	0.01746	False	0.1	0.0381738	False	0.0129	False	2.65396	False	0.1	0.0102823	False	0.1	0.034176	True	0.0237	False	3.17862	False	0.1	0.0113376	False	2.08186	False	0.1	0.00907089	False	0.1	0.00837438	False	0.00986	False	0.1	0.0310428	False	0.01642
13	3	2018-12-31 10:11:21	19999	[15000:19999]	analysis	2018-09-01 16:19:07	15000	False	0.1	0.0362101	False	0.01436	False	0.1	0.0289334	False	0.01282	False	0.1	0.0344702	False	0.01598	False	0.0708428	False	0.1	0.00167698	False	0.1	0.0332968	False	0.01446	False	0.0242988	False	0.1	0.00107588	False	0.489515	False	0.1	0.00440901	False	0.1	0.00803465	False	0.00946	False	0.1	0.0228333	False	0.01058
14	4	2019-04-30 11:01:30	24999	[20000:24999]	analysis	2018-12-31 10:38:45	20000	False	0.1	0.0287082	False	0.01116	False	0.1	0.0305991	False	0.01922	False	0.1	0.0322846	False	0.01136	False	1.00542	False	0.1	0.00633255	False	0.1	0.0263609	False	0.00912	False	0.487381	False	0.1	0.00449331	False	3.15856	False	0.1	0.0112076	False	0.1	0.0016478	False	0.00194	False	0.1	0.0237474	False	0.01408
15	5	2019-09-01 00:24:27	29999	[25000:29999]	analysis	2019-04-30 11:02:00	25000	True	0.1	0.464732	True	0.43548	False	0.1	0.0301321	False	0.00824	True	0.1	0.262577	True	0.18346	True	455.622	True	0.1	0.183143	False	0.1	0.0288384	False	0.00702	True	1179.9	True	0.1	0.231198	False	4.66135	False	0.1	0.0135741	False	0.1	0.0223873	True	0.02634	True	0.1	0.225486	True	0.1307
16	6	2019-12-31 09:09:12	34999	[30000:34999]	analysis	2019-09-01 00:28:54	30000	True	0.1	0.460044	True	0.43032	False	0.1	0.0412587	False	0.01068	True	0.1	0.264073	True	0.18334	True	428.633	True	0.1	0.174226	False	0.1	0.0265918	False	0.00826	True	1162.99	True	0.1	0.229333	False	2.52181	False	0.1	0.0100123	False	0.1	0.0213664	True	0.02514	True	0.1	0.208815	True	0.1273
17	7	2020-04-30 11:46:53	39999	[35000:39999]	analysis	2019-12-31 10:07:15	35000	True	0.1	0.466746	True	0.43786	False	0.1	0.0283644	False	0.01002	True	0.1	0.267208	True	0.20062	True	453.247	True	0.1	0.182913	False	0.1	0.0275949	False	0.01398	True	1170.49	True	0.1	0.230161	False	3.41534	False	0.1	0.0116206	False	0.1	0.0198352	True	0.02334	True	0.1	0.224282	True	0.1311
18	8	2020-09-01 02:46:02	44999	[40000:44999]	analysis	2020-04-30 12:04:32	40000	True	0.1	0.4663	True	0.43608	False	0.1	0.0244792	False	0.0107	True	0.1	0.265218	True	0.1874	True	438.26	True	0.1	0.177985	False	0.1	0.0232423	False	0.00896	True	1023.35	True	0.1	0.213579	False	6.88171	False	0.1	0.0164851	False	0.1	0.0123531	False	0.01454	True	0.1	0.205352	True	0.1197
19	9	2021-01-01 04:29:32	49999	[45000:49999]	analysis	2020-09-01 02:46:13	45000	True	0.1	0.467798	True	0.43852	False	0.1	0.0283063	False	0.007	True	0.1	0.270583	True	0.20018	True	474.892	True	0.1	0.19035	False	0.1	0.0279191	False	0.00632	True	1227.54	True	0.1	0.236408	False	1.63759	False	0.1	0.00809379	False	0.1	0.0334576	True	0.03934	True	0.1	0.215539	True	0.13752

Working with a multilevel index can be very powerful, yet also quite challenging. The following snippet illustrates how to retrieve all calculated method values from our results.

>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([(      'distance_from_office',     'jensen_shannon', 'value'),
            (      'distance_from_office', 'kolmogorov_smirnov', 'value'),
            (       'gas_price_per_litre',     'jensen_shannon', 'value'),
            (       'gas_price_per_litre', 'kolmogorov_smirnov', 'value'),
            ('public_transportation_cost',     'jensen_shannon', 'value'),
            ('public_transportation_cost', 'kolmogorov_smirnov', 'value'),
            (              'salary_range',               'chi2', 'value'),
            (              'salary_range',     'jensen_shannon', 'value'),
            (                    'tenure',     'jensen_shannon', 'value'),
            (                    'tenure', 'kolmogorov_smirnov', 'value'),
            (          'wfh_prev_workday',               'chi2', 'value'),
            (          'wfh_prev_workday',     'jensen_shannon', 'value'),
            (                   'workday',               'chi2', 'value'),
            (                   'workday',     'jensen_shannon', 'value'),
            (                    'y_pred',     'jensen_shannon', 'value'),
            (                    'y_pred', 'kolmogorov_smirnov', 'value'),
            (              'y_pred_proba',     'jensen_shannon', 'value'),
            (              'y_pred_proba', 'kolmogorov_smirnov', 'value')],
           )
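The same kind of selection can be tried on a small stand-in frame without running a calculation first. The sketch below assumes only pandas is available. The name toy_df and the numbers in it are made up for illustration; only the three-level column layout (column name, method, property) mirrors the results shown above.

```python
import pandas as pd

# Toy frame with a (column name, method, property) column index.
# The values are invented; only the layout matches the real results.
columns = pd.MultiIndex.from_tuples([
    ('chunk', 'chunk', 'period'),
    ('salary_range', 'chi2', 'alert'),
    ('salary_range', 'chi2', 'value'),
    ('tenure', 'kolmogorov_smirnov', 'alert'),
    ('tenure', 'kolmogorov_smirnov', 'value'),
])
toy_df = pd.DataFrame(
    [['reference', False, 1.0, False, 0.01],
     ['analysis', True, 400.0, False, 0.02]],
    columns=columns,
)

# pd.IndexSlice is a more readable spelling of (slice(None), slice(None), 'value').
idx = pd.IndexSlice
values = toy_df.loc[:, idx[:, :, 'value']]

# xs() selects on a single level; drop_level=False keeps all three levels.
values_via_xs = toy_df.xs('value', axis=1, level=2, drop_level=False)

print(list(values.columns))
```

Both selections keep every column whose bottom level is 'value', whatever the column name or method.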

To improve this experience we’ve introduced a helper method that allows you to filter the result data so you can easily retrieve the information you want. Since the UnivariateDriftCalculator has two degrees of freedom we’ve included both in the filter() method. Additionally you can filter on the data period, i.e. reference or analysis.

The filter() method returns a new Result instance, allowing you to chain methods such as filter(), to_df() and plot().

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>
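The idea of a filter that returns a new object of the same type, so that calls can be chained, can be sketched with a toy class. This is not NannyML's implementation: ToyResult, its rows attribute and to_records() are hypothetical names used only to illustrate the pattern. The three sample values are taken from the first reference and analysis chunks in the table above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical stand-in for a result object (not NannyML's Result)."""
    rows: List[Dict[str, object]]

    def filter(self, period: Optional[str] = None,
               methods: Optional[List[str]] = None) -> "ToyResult":
        # Build a new instance instead of mutating this one,
        # so the unfiltered results stay available.
        kept = [
            row for row in self.rows
            if (period is None or row['period'] == period)
            and (methods is None or row['method'] in methods)
        ]
        return ToyResult(rows=kept)

    def to_records(self) -> List[Dict[str, object]]:
        return list(self.rows)


toy = ToyResult(rows=[
    {'period': 'reference', 'method': 'chi2', 'value': 2.89878},
    {'period': 'analysis', 'method': 'chi2', 'value': 1.03368},
    {'period': 'analysis', 'method': 'kolmogorov_smirnov', 'value': 0.0131},
])

# Each filter() call returns a ToyResult, so calls can be chained.
chained = toy.filter(period='analysis').filter(methods=['chi2']).to_records()
print(chained)
```

Because every step returns the same type, the original object is left untouched and can be filtered again in a different way.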

When looking at the results after filtering, you can see only the chi2 data for the salary_range column during the analysis period is included.

>>> display(filtered_results.to_df())

	(‘chunk’, ‘chunk’, ‘chunk_index’)	(‘chunk’, ‘chunk’, ‘end_date’)	(‘chunk’, ‘chunk’, ‘end_index’)	(‘chunk’, ‘chunk’, ‘key’)	(‘chunk’, ‘chunk’, ‘period’)	(‘chunk’, ‘chunk’, ‘start_date’)	(‘chunk’, ‘chunk’, ‘start_index’)	(‘salary_range’, ‘chi2’, ‘alert’)	(‘salary_range’, ‘chi2’, ‘value’)
0	0	2018-01-02 00:45:44	4999	[0:4999]	analysis	2017-08-31 04:20:00	0	False	1.03368
1	1	2018-05-01 13:10:10	9999	[5000:9999]	analysis	2018-01-02 01:13:11	5000	False	5.76241
2	2	2018-09-01 15:40:40	14999	[10000:14999]	analysis	2018-05-01 14:25:25	10000	False	2.65396
3	3	2018-12-31 10:11:21	19999	[15000:19999]	analysis	2018-09-01 16:19:07	15000	False	0.0708428
4	4	2019-04-30 11:01:30	24999	[20000:24999]	analysis	2018-12-31 10:38:45	20000	False	1.00542
5	5	2019-09-01 00:24:27	29999	[25000:29999]	analysis	2019-04-30 11:02:00	25000	True	455.622
6	6	2019-12-31 09:09:12	34999	[30000:34999]	analysis	2019-09-01 00:28:54	30000	True	428.633
7	7	2020-04-30 11:46:53	39999	[35000:39999]	analysis	2019-12-31 10:07:15	35000	True	453.247
8	8	2020-09-01 02:46:02	44999	[40000:44999]	analysis	2020-04-30 12:04:32	40000	True	438.26
9	9	2021-01-01 04:29:32	49999	[45000:49999]	analysis	2020-09-01 02:46:13	45000	True	474.892

To avoid the use of a multilevel index, we’ve provided a switch in the to_df() method.

>>> display(filtered_results.to_df(multilevel=False))

	chunk_index	chunk_end_date	chunk_end_index	chunk_key	chunk_period	chunk_start_date	chunk_start_index	salary_range_chi2_alert	salary_range_chi2_value
0	0	2018-01-02 00:45:44	4999	[0:4999]	analysis	2017-08-31 04:20:00	0	False	1.03368
1	1	2018-05-01 13:10:10	9999	[5000:9999]	analysis	2018-01-02 01:13:11	5000	False	5.76241
2	2	2018-09-01 15:40:40	14999	[10000:14999]	analysis	2018-05-01 14:25:25	10000	False	2.65396
3	3	2018-12-31 10:11:21	19999	[15000:19999]	analysis	2018-09-01 16:19:07	15000	False	0.0708428
4	4	2019-04-30 11:01:30	24999	[20000:24999]	analysis	2018-12-31 10:38:45	20000	False	1.00542
5	5	2019-09-01 00:24:27	29999	[25000:29999]	analysis	2019-04-30 11:02:00	25000	True	455.622
6	6	2019-12-31 09:09:12	34999	[30000:34999]	analysis	2019-09-01 00:28:54	30000	True	428.633
7	7	2020-04-30 11:46:53	39999	[35000:39999]	analysis	2019-12-31 10:07:15	35000	True	453.247
8	8	2020-09-01 02:46:02	44999	[40000:44999]	analysis	2020-04-30 12:04:32	40000	True	438.26
9	9	2021-01-01 04:29:32	49999	[45000:49999]	analysis	2020-09-01 02:46:13	45000	True	474.892
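With flat column names, ordinary pandas boolean indexing is enough to pull out the alerting chunks. The sketch below assumes only pandas and rebuilds three rows of the table above by hand in a frame called flat_df (a name chosen here for illustration), so it runs without performing a calculation.

```python
import pandas as pd

# Three rows copied from the flattened output above (chunks 4, 5 and 6).
flat_df = pd.DataFrame({
    'chunk_key': ['[20000:24999]', '[25000:29999]', '[30000:34999]'],
    'salary_range_chi2_alert': [False, True, True],
    'salary_range_chi2_value': [1.00542, 455.622, 428.633],
})

# Keep only the chunks that raised an alert, with the columns of interest.
alerting = flat_df.loc[
    flat_df['salary_range_chi2_alert'],
    ['chunk_key', 'salary_range_chi2_value'],
]
print(alerting)
```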

Results can also be exported to external storage using a Writer. Three writers are currently supported: the RawFilesWriter writes results to disk, the PickleFileWriter serializes the Result into a Python pickle file and stores it on disk, and the DatabaseWriter stores calculation results in a database. This example shows how to use the DatabaseWriter.

We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method all results will be written into the database, in this case a SQLite database.

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

A quick inspection shows the database was populated and contains the univariate drift calculation results.

>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',)]

>>> cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2017-08-31 04:20:00.000000', '2018-01-02 00:45:44.000000', '2017-11-01 02:32:52.000000', 'kolmogorov_smirnov', 0.0131, 0, 'distance_from_office'), (2, None, 1, '2018-01-02 01:13:11.000000', '2018-05-01 13:10:10.000000', '2018-03-02 19:11:40.500000', 'kolmogorov_smirnov', 0.011239999999999972, 0, 'distance_from_office'), (3, None, 1, '2018-05-01 14:25:25.000000', '2018-09-01 15:40:40.000000', '2018-07-02 03:03:02.500000', 'kolmogorov_smirnov', 0.01682, 0, 'distance_from_office')]
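The raw tuples above are hard to read without column names. One way to recover them with the standard library is cursor.description, sketched below on an in-memory database. The demo_metrics table and its columns are hypothetical and do not reflect the actual schema written by the DatabaseWriter. Against the real nml.db file, the same pattern should apply to a SELECT on univariate_drift_metrics.

```python
import sqlite3

# In-memory stand-in database; table and column names are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE demo_metrics ("
    "id INTEGER PRIMARY KEY, method TEXT, value REAL, alert INTEGER)"
)
conn.executemany(
    "INSERT INTO demo_metrics (method, value, alert) VALUES (?, ?, ?)",
    [('kolmogorov_smirnov', 0.0131, 0), ('chi2', 455.622, 1)],
)

cursor = conn.execute("SELECT * FROM demo_metrics ORDER BY id")

# After a SELECT, cursor.description holds one 7-tuple per column;
# the first element of each tuple is the column name.
column_names = [description[0] for description in cursor.description]
rows = [dict(zip(column_names, row)) for row in cursor.fetchall()]
conn.close()

print(column_names)
print(rows)
```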