Working with results

What are NannyML Results?

In NannyML, every calculation returns a Result object rather than a DataFrame. This lets NannyML separate storing calculation results from how users interact with them. It also means we can provide additional useful methods, such as filtering and plotting, on top of the results.

Just the code

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

>>> display(results.to_df())

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))

>>> display(filtered_results.to_df())

>>> display(filtered_results.to_df(multilevel=False))

>>> results.plot().show()

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()

Walkthrough

The data structure

In order to obtain results, we first have to perform some calculation. We will start by loading the reference and analysis sample data for binary classification. Then, we will perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.

>>> import nannyml as nml
>>> from IPython.display import display

>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()

>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)
['id', 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']

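Next, we set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate, the timestamp column, and the continuous and categorical methods we would like to use. The y_pred column contains predicted class labels, so we list it in treat_as_categorical to have it evaluated with the categorical methods.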

We then fit the calculator on the reference data and use the fitted calculator to evaluate drift in the analysis data. The outcome is stored in the variable results.
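Conceptually, fit learns what it needs from the reference data, and calculate applies that to new data chunk by chunk. A hypothetical ToyKSDrift class can illustrate this split in miniature. It is not NannyML's implementation: it uses only the Python standard library, computes a two-sample Kolmogorov-Smirnov statistic for each chunk against the full reference sample, and assumes an alert threshold of the mean plus three standard deviations of the reference-chunk statistics.

```python
import random
import statistics
from bisect import bisect_right


def ks_statistic(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    gap = 0.0
    for x in a + b:  # the supremum of the step-function gap is reached at an observed point
        gap = max(gap, abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)))
    return gap


def split_into_chunks(values, size):
    return [values[i:i + size] for i in range(0, len(values), size)]


class ToyKSDrift:
    """Hypothetical calculator showing the fit/calculate split (not NannyML code)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size

    def fit(self, reference):
        # Remember the reference sample and derive an alert threshold from it.
        self.reference = list(reference)
        ref_stats = [ks_statistic(c, self.reference)
                     for c in split_into_chunks(self.reference, self.chunk_size)]
        # Assumed rule: mean + 3 standard deviations of the reference-chunk statistics.
        self.upper_threshold = statistics.mean(ref_stats) + 3 * statistics.pstdev(ref_stats)
        return self

    def calculate(self, analysis):
        # Compare every analysis chunk with the stored reference sample.
        rows = []
        for chunk in split_into_chunks(list(analysis), self.chunk_size):
            value = ks_statistic(chunk, self.reference)
            rows.append({"value": value,
                         "upper_threshold": self.upper_threshold,
                         "alert": value > self.upper_threshold})
        return rows


rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(2000)]
# Synthetic analysis data whose mean shifts from 0 to 2 halfway through.
analysis = [rng.gauss(0, 1) for _ in range(1000)] + [rng.gauss(2, 1) for _ in range(1000)]

rows = ToyKSDrift(chunk_size=500).fit(reference).calculate(analysis)
for row in rows:
    print(row)
```

Because the synthetic analysis data shifts halfway through, the last two chunks get much larger statistics and raise alerts. The real calculator handles many columns, several methods and NannyML's own threshold logic.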

>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)

This variable is an instance of the Result class. To turn this object into a DataFrame, use its to_df() method. Let’s see what this DataFrame looks like.

>>> display(results.to_df())

We can immediately see that the DataFrame uses a MultiIndex for its columns. The first group of columns contains chunk information, followed by the numerical results of the drift calculations.

In the case of the UnivariateDriftCalculator, there are two degrees of freedom: you choose which columns to include in the calculation, and each column can be evaluated by several methods.

This structure is visible in the column index. The top level holds the column names, and the middle level holds the methods used to evaluate each column. The bottom level holds the information relevant to each method: the calculated value, the alert threshold (in the output below only upper thresholds appear, and chi2 has no threshold column), and whether the value crossed the threshold for that chunk, leading to an alert.

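	chunk/chunk/key	chunk/chunk/chunk_index	chunk/chunk/start_index	chunk/chunk/end_index	chunk/chunk/start_date	chunk/chunk/end_date	chunk/chunk/period	id/kolmogorov_smirnov/value	id/kolmogorov_smirnov/upper_threshold	id/kolmogorov_smirnov/alert	id/jensen_shannon/value	id/jensen_shannon/upper_threshold	id/jensen_shannon/alert	car_value/kolmogorov_smirnov/value	car_value/kolmogorov_smirnov/upper_threshold	car_value/kolmogorov_smirnov/alert	car_value/jensen_shannon/value	car_value/jensen_shannon/upper_threshold	car_value/jensen_shannon/alert	salary_range/jensen_shannon/value	salary_range/jensen_shannon/upper_threshold	salary_range/jensen_shannon/alert	salary_range/chi2/value	salary_range/chi2/alert	debt_to_income_ratio/kolmogorov_smirnov/value	debt_to_income_ratio/kolmogorov_smirnov/upper_threshold	debt_to_income_ratio/kolmogorov_smirnov/alert	debt_to_income_ratio/jensen_shannon/value	debt_to_income_ratio/jensen_shannon/upper_threshold	debt_to_income_ratio/jensen_shannon/alert	loan_length/kolmogorov_smirnov/value	loan_length/kolmogorov_smirnov/upper_threshold	loan_length/kolmogorov_smirnov/alert	loan_length/jensen_shannon/value	loan_length/jensen_shannon/upper_threshold	loan_length/jensen_shannon/alert	repaid_loan_on_prev_car/jensen_shannon/value	repaid_loan_on_prev_car/jensen_shannon/upper_threshold	repaid_loan_on_prev_car/jensen_shannon/alert	repaid_loan_on_prev_car/chi2/value	repaid_loan_on_prev_car/chi2/alert	size_of_downpayment/jensen_shannon/value	size_of_downpayment/jensen_shannon/upper_threshold	size_of_downpayment/jensen_shannon/alert	size_of_downpayment/chi2/value	size_of_downpayment/chi2/alert	driver_tenure/kolmogorov_smirnov/value	driver_tenure/kolmogorov_smirnov/upper_threshold	driver_tenure/kolmogorov_smirnov/alert	driver_tenure/jensen_shannon/value	driver_tenure/jensen_shannon/upper_threshold	driver_tenure/jensen_shannon/alert	y_pred_proba/kolmogorov_smirnov/value	y_pred_proba/kolmogorov_smirnov/upper_threshold	y_pred_proba/kolmogorov_smirnov/alert	y_pred_proba/jensen_shannon/value	y_pred_proba/jensen_shannon/upper_threshold	y_pred_proba/jensen_shannon/alert	y_pred/jensen_shannon/value	y_pred/jensen_shannon/upper_threshold	y_pred/jensen_shannon/alert	y_pred/chi2/value	y_pred/chi2/alert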
0	[0:4999]	0	0	4999	2018-01-01 00:00:00	2018-01-31 06:27:16.848000	reference	0.9	1	False	0.854338	0.888887	False	0.0103	0.0194257	False	0.0296736	0.0352619	False	0.010811	0.0177088	False	2.89878	False	0.01112	0.0185838	False	0.0333679	0.0393276	False	0.00818	0.0166909	False	0.0242899	0.0366378	False	0.00415143	0.0147343	False	0.414606	False	0.0125401	0.0214812	False	4.00124	False	0.00974	0.0173417	False	0.0228713	0.039192	False	0.00922	0.0145647	False	0.0133555	0.0365875	False	0.00549026	0.00909686	False	0.733844	False
1	[5000:9999]	1	5000	9999	2018-01-31 06:36:00	2018-03-02 13:03:16.848000	reference	0.8	1	False	0.810064	0.888887	False	0.00732	0.0194257	False	0.0237846	0.0352619	False	0.01124	0.0177088	False	3.14439	False	0.01218	0.0185838	False	0.028066	0.0393276	False	0.00868	0.0166909	False	0.0177897	0.0366378	False	0.00124668	0.0147343	False	0.0334857	False	0.00713799	0.0214812	False	1.28891	False	0.01186	0.0173417	False	0.0335415	0.039192	False	0.01042	0.0145647	False	0.0211292	0.0365875	False	0.00634039	0.00909686	False	0.983187	False
2	[10000:14999]	2	10000	14999	2018-03-02 13:12:00	2018-04-01 19:39:16.848000	reference	0.7	1	False	0.822917	0.888887	False	0.00802	0.0194257	False	0.0264685	0.0352619	False	0.00980904	0.0177088	False	2.45188	False	0.00878	0.0185838	False	0.0225969	0.0393276	False	0.0139	0.0166909	False	0.0240002	0.0366378	False	0.00267997	0.0147343	False	0.168656	False	0.0142803	0.0214812	False	5.11796	False	0.01262	0.0173417	False	0.029597	0.039192	False	0.0091	0.0145647	False	0.02237	0.0365875	False	0.00487654	0.00909686	False	0.576787	False
3	[15000:19999]	3	15000	19999	2018-04-01 19:48:00	2018-05-02 02:15:16.848000	reference	0.6	1	False	0.853731	0.888887	False	0.0085	0.0194257	False	0.0217468	0.0352619	False	0.0127697	0.0177088	False	4.06262	False	0.0095	0.0185838	False	0.0315869	0.0393276	False	0.0083	0.0166909	False	0.0292131	0.0366378	False	0.00158831	0.0147343	False	0.0562698	False	0.0085587	0.0214812	False	1.84901	False	0.01056	0.0173417	False	0.0286826	0.039192	False	0.00872	0.0145647	False	0.0178289	0.0365875	False	0.0017505	0.00909686	False	0.0691997	False
4	[20000:24999]	4	20000	24999	2018-05-02 02:24:00	2018-06-01 08:51:16.848000	reference	0.5	1	False	0.813675	0.888887	False	0.00892	0.0194257	False	0.024108	0.0352619	False	0.00968817	0.0177088	False	2.41399	False	0.00754	0.0185838	False	0.0310501	0.0393276	False	0.00544	0.0166909	False	0.0165946	0.0366378	False	0.00319188	0.0147343	False	0.242059	False	0.00433131	0.0214812	False	0.470551	False	0.00922	0.0173417	False	0.0209876	0.039192	False	0.00852	0.0145647	False	0.0216622	0.0365875	False	0.00368727	0.00909686	False	0.325601	False
5	[25000:29999]	5	25000	29999	2018-06-01 09:00:00	2018-07-01 15:27:16.848000	reference	0.5	1	False	0.813675	0.888887	False	0.01456	0.0194257	False	0.0275587	0.0352619	False	0.0122934	0.0177088	False	3.79606	False	0.0103	0.0185838	False	0.0316479	0.0393276	False	0.01112	0.0166909	False	0.0271572	0.0366378	False	0.0120561	0.0147343	False	3.61457	False	0.00233712	0.0214812	False	0.137868	False	0.00794	0.0173417	False	0.0229349	0.039192	False	0.01028	0.0145647	False	0.017256	0.0365875	False	0.00379022	0.00909686	False	0.34437	False
6	[30000:34999]	6	30000	34999	2018-07-01 15:36:00	2018-07-31 22:03:16.848000	reference	0.6	1	False	0.853731	0.888887	False	0.01284	0.0194257	False	0.0267818	0.0352619	False	0.0112358	0.0177088	False	3.22884	False	0.01094	0.0185838	False	0.0258014	0.0393276	False	0.00464	0.0166909	False	0.0259338	0.0366378	False	0.00182666	0.0147343	False	0.0757052	False	0.0129223	0.0214812	False	4.19999	False	0.0112	0.0173417	False	0.0226753	0.039192	False	0.01248	0.0145647	False	0.0253217	0.0365875	False	0.000288895	0.00909686	False	0.000962674	False
7	[35000:39999]	7	35000	39999	2018-07-31 22:12:00	2018-08-31 04:39:16.848000	reference	0.7	1	False	0.822917	0.888887	False	0.01348	0.0194257	False	0.0312131	0.0352619	False	0.00739444	0.0177088	False	1.3933	False	0.01736	0.0185838	False	0.0325098	0.0393276	False	0.00548	0.0166909	False	0.0185372	0.0366378	False	0.00415143	0.0147343	False	0.414606	False	0.00533433	0.0214812	False	0.716349	False	0.0074	0.0173417	False	0.025517	0.039192	False	0.0089	0.0145647	False	0.0275068	0.0365875	False	0.00470665	0.00909686	False	0.536536	False
8	[40000:44999]	8	40000	44999	2018-08-31 04:48:00	2018-09-30 11:15:16.848000	reference	0.8	1	False	0.810064	0.888887	False	0.01572	0.0194257	False	0.0273013	0.0352619	False	0.00347061	0.0177088	False	0.304785	False	0.00842	0.0185838	False	0.0248975	0.0393276	False	0.01062	0.0166909	False	0.0291086	0.0366378	False	0.000802461	0.0147343	False	0.0126564	False	0.00485967	0.0214812	False	0.596009	False	0.01458	0.0173417	False	0.0244145	0.039192	False	0.00768	0.0145647	False	0.0243225	0.0365875	False	0.00113856	0.00909686	False	0.0275315	False
9	[45000:49999]	9	45000	49999	2018-09-30 11:24:00	2018-10-30 17:51:16.848000	reference	0.9	1	False	0.854338	0.888887	False	0.00924	0.0194257	False	0.0296982	0.0352619	False	0.0108121	0.0177088	False	2.98758	False	0.00786	0.0185838	False	0.0284742	0.0393276	False	0.00608	0.0166909	False	0.0207199	0.0366378	False	0.00945409	0.0147343	False	2.20383	False	0.0142629	0.0214812	False	5.08023	False	0.01304	0.0173417	False	0.032928	0.039192	False	0.00498	0.0145647	False	0.0303947	0.0365875	False	0.00266783	0.00909686	False	0.167069	False
10	[0:4999]	0	0	4999	2018-10-30 18:00:00	2018-11-30 00:27:16.848000	analysis	1	1	False	1	0.888887	True	0.01308	0.0194257	False	0.0261935	0.0352619	False	0.00639674	0.0177088	False	1.03368	False	0.01576	0.0185838	False	0.0316611	0.0393276	False	0.00884	0.0166909	False	0.0244278	0.0366378	False	0.0083078	0.0147343	False	1.70319	False	0.00796199	0.0214812	False	1.6025	False	0.02114	0.0173417	True	0.0309355	0.039192	False	0.0253	0.0145647	True	0.0289329	0.0365875	False	0.0152383	0.00909686	True	5.78426	True
11	[5000:9999]	1	5000	9999	2018-11-30 00:36:00	2018-12-30 07:03:16.848000	analysis	1	1	False	1	0.888887	True	0.01106	0.0194257	False	0.0201778	0.0352619	False	0.0153757	0.0177088	False	5.76241	False	0.01268	0.0185838	False	0.0300113	0.0393276	False	0.01418	0.0166909	False	0.0258391	0.0366378	False	0.00319188	0.0147343	False	0.242059	False	0.0150859	0.0214812	False	5.71897	False	0.00994	0.0173417	False	0.0383534	0.039192	False	0.0123	0.0145647	False	0.0221389	0.0365875	False	0.00889123	0.00909686	False	1.94965	False
12	[10000:14999]	2	10000	14999	2018-12-30 07:12:00	2019-01-29 13:39:16.848000	analysis	1	1	False	1	0.888887	True	0.01662	0.0194257	False	0.0210184	0.0352619	False	0.0102823	0.0177088	False	2.65396	False	0.01734	0.0185838	False	0.0311286	0.0393276	False	0.0124	0.0166909	False	0.0293725	0.0366378	False	0.0113376	0.0147343	False	3.17862	False	0.00907089	0.0214812	False	2.08186	False	0.02362	0.0173417	True	0.034176	0.039192	False	0.01642	0.0145647	True	0.0310428	0.0365875	False	0.00804087	0.00909686	False	1.59109	False
13	[15000:19999]	3	15000	19999	2019-01-29 13:48:00	2019-02-28 20:15:16.848000	analysis	1	1	False	1	0.888887	True	0.01434	0.0194257	False	0.0363554	0.0352619	True	0.00167698	0.0177088	False	0.0708428	False	0.0128	0.0185838	False	0.0294644	0.0393276	False	0.01298	0.0166909	False	0.0290784	0.0366378	False	0.00107588	0.0147343	False	0.0242988	False	0.00440901	0.0214812	False	0.489515	False	0.0143	0.0173417	False	0.0332968	0.039192	False	0.01058	0.0145647	False	0.0228333	0.0365875	False	0.00566028	0.00909686	False	0.7808	False
14	[20000:24999]	4	20000	24999	2019-02-28 20:24:00	2019-03-31 02:51:16.848000	analysis	1	1	False	1	0.888887	True	0.01116	0.0194257	False	0.0287119	0.0352619	False	0.00633255	0.0177088	False	1.00542	False	0.01918	0.0185838	True	0.0308095	0.0393276	False	0.01022	0.0166909	False	0.0287925	0.0366378	False	0.00449331	0.0147343	False	0.487381	False	0.0112076	0.0214812	False	3.15856	False	0.00906	0.0173417	False	0.0263609	0.039192	False	0.01408	0.0145647	False	0.0237474	0.0365875	False	0.00317755	0.00909686	False	0.239784	False
15	[25000:29999]	5	25000	29999	2019-03-31 03:00:00	2019-04-30 09:27:16.848000	analysis	1	1	False	1	0.888887	True	0.4353	0.0194257	True	0.464759	0.0352619	True	0.183143	0.0177088	True	455.622	True	0.00824	0.0185838	False	0.0286811	0.0393276	False	0.17992	0.0166909	True	0.233935	0.0366378	True	0.231198	0.0147343	True	1179.9	True	0.0135741	0.0214812	False	4.66135	False	0.00698	0.0173417	False	0.0288384	0.039192	False	0.1307	0.0145647	True	0.225486	0.0365875	True	0.00419696	0.00909686	False	0.424518	False
16	[30000:34999]	6	30000	34999	2019-04-30 09:36:00	2019-05-30 16:03:16.848000	analysis	1	1	False	1	0.888887	True	0.43028	0.0194257	True	0.460057	0.0352619	True	0.174226	0.0177088	True	428.633	True	0.01058	0.0185838	False	0.0436276	0.0393276	True	0.18032	0.0166909	True	0.231747	0.0366378	True	0.229333	0.0147343	True	1162.99	True	0.0100123	0.0214812	False	2.52181	False	0.00826	0.0173417	False	0.0265918	0.039192	False	0.1273	0.0145647	True	0.208815	0.0365875	True	0.00198817	0.00909686	False	0.0904949	False
17	[35000:39999]	7	35000	39999	2019-05-30 16:12:00	2019-06-29 22:39:16.848000	analysis	1	1	False	1	0.888887	True	0.43772	0.0194257	True	0.466777	0.0352619	True	0.182913	0.0177088	True	453.247	True	0.01002	0.0185838	False	0.0292533	0.0393276	False	0.19572	0.0166909	True	0.234016	0.0366378	True	0.230161	0.0147343	True	1170.49	True	0.0116206	0.0214812	False	3.41534	False	0.01382	0.0173417	False	0.0275949	0.039192	False	0.1311	0.0145647	True	0.224282	0.0365875	True	0.002328	0.00909686	False	0.12587	False
18	[40000:44999]	8	40000	44999	2019-06-29 22:48:00	2019-07-30 05:15:16.848000	analysis	1	1	False	1	0.888887	True	0.43602	0.0194257	True	0.466199	0.0352619	True	0.177985	0.0177088	True	438.26	True	0.01068	0.0185838	False	0.0306276	0.0393276	False	0.18212	0.0166909	True	0.231484	0.0366378	True	0.213579	0.0147343	True	1023.35	True	0.0164851	0.0214812	False	6.88171	False	0.0088	0.0173417	False	0.0232423	0.039192	False	0.1197	0.0145647	True	0.205352	0.0365875	True	0.00362023	0.00909686	False	0.313431	False
19	[45000:49999]	9	45000	49999	2019-07-30 05:24:00	2019-08-29 11:51:16.848000	analysis	1	1	False	1	0.888887	True	0.43838	0.0194257	True	0.467827	0.0352619	True	0.19035	0.0177088	True	474.892	True	0.0068	0.0185838	False	0.0283303	0.0393276	False	0.19872	0.0166909	True	0.24262	0.0366378	True	0.236408	0.0147343	True	1227.54	True	0.00809379	0.0214812	False	1.63759	False	0.0062	0.0173417	False	0.0279191	0.039192	False	0.13752	0.0145647	True	0.215539	0.0365875	True	0.0154082	0.00909686	True	5.91474	True
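The three column levels can be used directly with pandas. The sketch below rebuilds the same layout on a tiny, hand-made DataFrame (two chunks with made-up values, not real NannyML output), selects every alert field with DataFrame.xs, and counts the alerts per column and method. The same xs call should also work on the output of results.to_df(), assuming it has the three-level column index shown above.

```python
import pandas as pd

# Column index mirroring the three levels: (column name, method, field).
# All values below are made up for illustration.
columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("chunk", "chunk", "period"),
    ("car_value", "kolmogorov_smirnov", "value"),
    ("car_value", "kolmogorov_smirnov", "upper_threshold"),
    ("car_value", "kolmogorov_smirnov", "alert"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "chi2", "alert"),
])
df = pd.DataFrame(
    [
        ["[0:4999]", "analysis", 0.013, 0.019, False, 1.03, False],
        ["[5000:9999]", "analysis", 0.435, 0.019, True, 455.6, True],
    ],
    columns=columns,
)

# Select the 'alert' field (bottom level) for every column and method;
# the bottom level is dropped, leaving (column name, method) labels.
alerts = df.xs("alert", axis=1, level=2)
# Count how many chunks raised an alert per (column name, method).
alert_counts = alerts.astype(int).sum()
print(alert_counts)
```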

Filtering

Working with MultiIndex columns can be very powerful, yet also quite challenging. The following snippet retrieves the labels of all columns that hold a calculated method value.

>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([(                     'id', 'kolmogorov_smirnov', 'value'),
            (                     'id',     'jensen_shannon', 'value'),
            (              'car_value', 'kolmogorov_smirnov', 'value'),
            (              'car_value',     'jensen_shannon', 'value'),
            (           'salary_range',     'jensen_shannon', 'value'),
            (           'salary_range',               'chi2', 'value'),
            (   'debt_to_income_ratio', 'kolmogorov_smirnov', 'value'),
            (   'debt_to_income_ratio',     'jensen_shannon', 'value'),
            (            'loan_length', 'kolmogorov_smirnov', 'value'),
            (            'loan_length',     'jensen_shannon', 'value'),
            ('repaid_loan_on_prev_car',     'jensen_shannon', 'value'),
            ('repaid_loan_on_prev_car',               'chi2', 'value'),
            (    'size_of_downpayment',     'jensen_shannon', 'value'),
            (    'size_of_downpayment',               'chi2', 'value'),
            (          'driver_tenure', 'kolmogorov_smirnov', 'value'),
            (          'driver_tenure',     'jensen_shannon', 'value'),
            (           'y_pred_proba', 'kolmogorov_smirnov', 'value'),
            (           'y_pred_proba',     'jensen_shannon', 'value'),
            (                 'y_pred',     'jensen_shannon', 'value'),
            (                 'y_pred',               'chi2', 'value')],
           )
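pandas also provides pd.IndexSlice, which expresses the same selection more compactly and makes it easy to pick out a single method across all columns. The sketch below runs on a hypothetical, empty stand-in DataFrame built with MultiIndex.from_product, since only the column labels matter here; with real results you would index results.to_df() instead.

```python
import pandas as pd

# Hypothetical stand-in for results.to_df(): every combination of
# column name, method and field, with no rows (only the labels matter).
columns = pd.MultiIndex.from_product([
    ["car_value", "loan_length"],
    ["kolmogorov_smirnov", "jensen_shannon"],
    ["value", "upper_threshold", "alert"],
])
df = pd.DataFrame(columns=columns)

idx = pd.IndexSlice
# Equivalent to (slice(None), slice(None), 'value'): every 'value' field.
value_cols = df.loc[:, idx[:, :, "value"]].columns
# Every field of one method, across all columns.
js_cols = df.loc[:, idx[:, "jensen_shannon", :]].columns

print(list(value_cols))
print(list(js_cols))
```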

To make this easier, results provide a filter() helper method that lets you retrieve exactly the information you want. Since the UnivariateDriftCalculator has two degrees of freedom, filter() accepts both: column_names and methods. Additionally, you can filter on the data period, i.e., reference or analysis, using the period parameter.

The filter() method returns a new Result instance, allowing you to chain methods such as filter(), to_df(), and plot().

>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>

After filtering, only the chi2 data for the salary_range column during the analysis period is included.

>>> display(filtered_results.to_df())

	chunk/chunk/key	chunk/chunk/chunk_index	chunk/chunk/start_index	chunk/chunk/end_index	chunk/chunk/start_date	chunk/chunk/end_date	chunk/chunk/period	salary_range/chi2/value	salary_range/chi2/alert
0	[0:4999]	0	0	4999	2018-10-30 18:00:00	2018-11-30 00:27:16.848000	analysis	1.03368	False
1	[5000:9999]	1	5000	9999	2018-11-30 00:36:00	2018-12-30 07:03:16.848000	analysis	5.76241	False
2	[10000:14999]	2	10000	14999	2018-12-30 07:12:00	2019-01-29 13:39:16.848000	analysis	2.65396	False
3	[15000:19999]	3	15000	19999	2019-01-29 13:48:00	2019-02-28 20:15:16.848000	analysis	0.0708428	False
4	[20000:24999]	4	20000	24999	2019-02-28 20:24:00	2019-03-31 02:51:16.848000	analysis	1.00542	False
5	[25000:29999]	5	25000	29999	2019-03-31 03:00:00	2019-04-30 09:27:16.848000	analysis	455.622	True
6	[30000:34999]	6	30000	34999	2019-04-30 09:36:00	2019-05-30 16:03:16.848000	analysis	428.633	True
7	[35000:39999]	7	35000	39999	2019-05-30 16:12:00	2019-06-29 22:39:16.848000	analysis	453.247	True
8	[40000:44999]	8	40000	44999	2019-06-29 22:48:00	2019-07-30 05:15:16.848000	analysis	438.26	True
9	[45000:49999]	9	45000	49999	2019-07-30 05:24:00	2019-08-29 11:51:16.848000	analysis	474.892	True
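Because filter() returns a new result object rather than modifying the existing one, calls can be chained and the original results stay intact. The sketch below illustrates that design with a hypothetical, simplified MiniResult class operating on a small hand-made DataFrame. It is not NannyML's Result implementation, and its filtering rules (always keep the chunk columns, then match the top two column levels) are assumptions for illustration.

```python
import pandas as pd


class MiniResult:
    """Hypothetical, simplified result wrapper (not NannyML's Result class)."""

    def __init__(self, data: pd.DataFrame):
        self._data = data  # stored internally; users go through the methods below

    def to_df(self) -> pd.DataFrame:
        # Hand out a copy so callers cannot modify the stored results.
        return self._data.copy()

    def filter(self, period=None, column_names=None, methods=None) -> "MiniResult":
        data = self._data
        if period is not None:
            # Keep only the rows (chunks) belonging to the requested period.
            data = data[data[("chunk", "chunk", "period")] == period]
        # Always keep chunk columns; keep metric columns matching both criteria.
        keep = [
            col for col in data.columns
            if col[0] == "chunk"
            or ((column_names is None or col[0] in column_names)
                and (methods is None or col[1] in methods))
        ]
        # Return a new instance; self is left untouched, which enables chaining.
        return MiniResult(data.loc[:, keep].reset_index(drop=True))


columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "period"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "jensen_shannon", "value"),
    ("car_value", "kolmogorov_smirnov", "value"),
])
result = MiniResult(pd.DataFrame(
    [["reference", 2.9, 0.011, 0.010], ["analysis", 455.6, 0.183, 0.435]],
    columns=columns,
))

filtered = result.filter(period="analysis", methods=["chi2"], column_names=["salary_range"])
print(filtered.to_df())
```

Since filter() never mutates self, result.filter(period='analysis').filter(methods=['chi2']) leaves result unchanged, mirroring the chaining described above.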

To get a flat column index instead of a MultiIndex, pass multilevel=False to the to_df() method. Each column then has a single name, such as salary_range_chi2_value.

>>> display(filtered_results.to_df(multilevel=False))

	chunk_key	chunk_index	chunk_start_index	chunk_end_index	chunk_start_date	chunk_end_date	chunk_period	salary_range_chi2_value	salary_range_chi2_alert
0	[0:4999]	0	0	4999	2018-10-30 18:00:00	2018-11-30 00:27:16.848000	analysis	1.03368	False
1	[5000:9999]	1	5000	9999	2018-11-30 00:36:00	2018-12-30 07:03:16.848000	analysis	5.76241	False
2	[10000:14999]	2	10000	14999	2018-12-30 07:12:00	2019-01-29 13:39:16.848000	analysis	2.65396	False
3	[15000:19999]	3	15000	19999	2019-01-29 13:48:00	2019-02-28 20:15:16.848000	analysis	0.0708428	False
4	[20000:24999]	4	20000	24999	2019-02-28 20:24:00	2019-03-31 02:51:16.848000	analysis	1.00542	False
5	[25000:29999]	5	25000	29999	2019-03-31 03:00:00	2019-04-30 09:27:16.848000	analysis	455.622	True
6	[30000:34999]	6	30000	34999	2019-04-30 09:36:00	2019-05-30 16:03:16.848000	analysis	428.633	True
7	[35000:39999]	7	35000	39999	2019-05-30 16:12:00	2019-06-29 22:39:16.848000	analysis	453.247	True
8	[40000:44999]	8	40000	44999	2019-06-29 22:48:00	2019-07-30 05:15:16.848000	analysis	438.26	True
9	[45000:49999]	9	45000	49999	2019-07-30 05:24:00	2019-08-29 11:51:16.848000	analysis	474.892	True
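If you already hold a MultiIndex DataFrame, for instance from a default to_df() call, plain pandas can flatten it as well. The sketch below uses a hypothetical flatten_columns() helper that joins the levels of each column label with underscores and collapses consecutive repeated levels. This naming rule is an assumption: it reproduces names such as chunk_key and salary_range_chi2_value from the table above, but not every NannyML name (it would produce chunk_chunk_index instead of chunk_index), so multilevel=False remains the way to get NannyML's own names.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Return a copy of df whose MultiIndex column labels are joined into strings.

    Consecutive repeated levels, as in ('chunk', 'chunk', 'key'), are collapsed.
    """
    flat = df.copy()
    names = []
    for label in flat.columns:  # each label is a tuple of level values
        parts = [p for i, p in enumerate(label) if i == 0 or p != label[i - 1]]
        names.append(sep.join(parts))
    flat.columns = names
    return flat


columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("chunk", "chunk", "period"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "chi2", "alert"),
])
df = pd.DataFrame([["[0:4999]", "analysis", 1.03368, False]], columns=columns)
print(flatten_columns(df).columns.tolist())
# ['chunk_key', 'chunk_period', 'salary_range_chi2_value', 'salary_range_chi2_alert']
```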

Plotting

Results can be visualized using the built-in plotting functionality. A quick call of the plot() method creates a Plotly Figure.

>>> figure = results.plot()

To render it in our notebook we can call the show() method:

>>> results.plot().show()

The image can also be exported to disk using Plotly's write_image() method, which relies on an image export engine such as the kaleido package:

>>> results.plot().write_image('../_static/tutorials/working_with_results/result_plot.svg')

There is a lot happening in this visualization, so we might want to reduce the number of plots. To do so, we can apply filtering first and then plot the filtered results.

>>> filtered_results.plot().show()

Some result classes offer multiple kinds of visualization, which are listed in their associated API reference docs. For example, univariate drift results support the default drift kind and the distribution kind. We can change the visualization by specifying the kind parameter.

>>> filtered_results.plot(kind='distribution').show()

Comparing

Another useful feature is plotting a comparison between two results. For example, suppose we want to visualize the estimated performance alongside the univariate drift results for the salary_range column. We will first get our estimated performance result.

>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)

Now we can compare our estimated performance to the univariate drift on features:

>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()

We can immediately spot how the estimated performance plummets when the chi2 statistic for salary_range picks up!

Note

To reduce complexity, we only support comparing a single metric to another one.

As illustrated in the code snippet above, you can use filtering to select a single metric from your result before comparing it.

Exporting

Results can also be exported to external storage using a Writer. Three writers are currently supported: the RawFilesWriter writes results to disk, the PickleFileWriter serializes the Result into a Python pickle file and stores it on disk, and the DatabaseWriter stores calculation results in a database. This example shows how to use the DatabaseWriter.

We construct the DatabaseWriter by providing a database connection string. Calling its write() method writes all results into the database, in this case a local SQLite database file named nml.db.

>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)

A quick inspection shows that the database was populated and contains the univariate drift calculation results.

>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',), ('unseen_values_metrics',), ('missing_values_metrics',)]

>>> cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2018-10-30 18:00:00.000000', '2018-11-30 00:27:16.848000', '2018-11-14 21:13:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (2, None, 1, '2018-11-30 00:36:00.000000', '2018-12-30 07:03:16.848000', '2018-12-15 03:49:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (3, None, 1, '2018-12-30 07:12:00.000000', '2019-01-29 13:39:16.848000', '2019-01-14 10:25:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id')]
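The positional tuples above are hard to read without their column names. SQLite's PRAGMA table_info returns a table's schema, and a cursor's description holds the names of the selected columns, so two small helpers can label each value. The helpers (quote_identifier, table_columns, fetch_as_dicts) are hypothetical and use only the standard library; the demo runs on an in-memory database with a made-up table so that it is self-contained.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    # Table names cannot be bound as SQL parameters, so quote them safely instead.
    return '"' + name.replace('"', '""') + '"'


def table_columns(conn: sqlite3.Connection, table: str) -> list:
    """Column names of a table, in schema order, via PRAGMA table_info."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_identifier(table)})")]


def fetch_as_dicts(conn: sqlite3.Connection, table: str, limit: int = 3) -> list:
    """The first `limit` rows of a table as {column name: value} dictionaries."""
    cursor = conn.execute(f"SELECT * FROM {quote_identifier(table)} LIMIT ?", (limit,))
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# Self-contained demo: an in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_metrics (id INTEGER PRIMARY KEY, metric_name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO demo_metrics (metric_name, value) VALUES (?, ?)",
    [("kolmogorov_smirnov", 0.013), ("chi2", 1.03)],
)
cols = table_columns(conn, "demo_metrics")
rows = fetch_as_dicts(conn, "demo_metrics")
conn.close()
print(cols)
print(rows)
```

Against the file written above, fetch_as_dicts(sqlite3.connect('nml.db'), 'univariate_drift_metrics') should return the same three rows as the query above, with each value labelled by its column name.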