Working with results
What are NannyML Results?
In NannyML, any calculation will return a Result object. Not returning a DataFrame directly allows NannyML to separate the concerns of storing calculation results and having users interact with them. It also means we can provide additional useful methods, such as filtering and plotting, on top of the results.
Just the code
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)
>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
>>> display(results.to_df())
>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
>>> display(filtered_results.to_df())
>>> display(filtered_results.to_df(multilevel=False))
>>> results.plot().show()
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)
>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()
Walkthrough
The data structure
In order to obtain results, we first have to perform some calculation. We will start by loading the reference and analysis sample data for binary classification. Then, we will perform univariate drift detection on a number of columns whose names are printed below. Knowing the column names will help you understand this walkthrough better.
>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'repaid']
... ]
>>> print(column_names)
['id', 'car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']
We then set up the UnivariateDriftCalculator by specifying the names of the columns to evaluate and the continuous and categorical methods we would like to use. Next, we fit the calculator on our reference data. The fitted calculator is then used to evaluate drift for the analysis data, and the outcome is stored in the variable results.
>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
This variable is an instance of the Result class. To turn this object into a DataFrame you can use the to_df() method. Let’s see what this DataFrame looks like.
>>> display(results.to_df())
We can immediately see that a multilevel index is being used to store the data. There is a part containing chunk information, followed by the numerical results of the drift calculations.
In the case of the UnivariateDriftCalculator, there are two degrees of freedom. You can specify columns to include in the calculation, and each column might be evaluated by different methods.
This structure is visible in the column index. The top level represents the column names. The middle level represents the specific methods used to evaluate a column. Finally, the bottom level contains the information relevant to each method: a value, upper and lower thresholds for alerts, and whether the evaluated method crossed the thresholds for that chunk, leading to an alert.
chunk
chunk
key
|
chunk_index
|
start_index
|
end_index
|
start_date
|
end_date
|
period
|
id
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
car_value
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
debt_to_income_ratio
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
loan_length
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
driver_tenure
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
y_pred_proba
kolmogorov_smirnov
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
salary_range
chi2
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
repaid_loan_on_prev_car
chi2
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
size_of_downpayment
chi2
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
y_pred
chi2
value
|
upper_threshold
|
lower_threshold
|
alert
|
jensen_shannon
value
|
upper_threshold
|
lower_threshold
|
alert
|
|
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 |
[0:4999] |
0 |
0 |
4999 |
2018-01-01 00:00:00 |
2018-01-31 06:27:16.848000 |
reference |
0.9 |
1 |
False |
0.854338 |
0.888887 |
False |
0.0103 |
0.0194257 |
False |
0.0296736 |
0.0352619 |
False |
0.01112 |
0.0185838 |
False |
0.0333679 |
0.0393276 |
False |
0.00818 |
0.0166909 |
False |
0.0242899 |
0.0366378 |
False |
0.00974 |
0.0173417 |
False |
0.0228713 |
0.039192 |
False |
0.00922 |
0.0145647 |
False |
0.0133555 |
0.0365875 |
False |
2.89878 |
False |
0.010811 |
0.0177088 |
False |
0.414606 |
False |
0.00415143 |
0.0147343 |
False |
4.00124 |
False |
0.0125401 |
0.0214812 |
False |
0.733844 |
False |
0.00549026 |
0.00909686 |
False |
||||||||||||||||||||||||
1 |
[5000:9999] |
1 |
5000 |
9999 |
2018-01-31 06:36:00 |
2018-03-02 13:03:16.848000 |
reference |
0.8 |
1 |
False |
0.810064 |
0.888887 |
False |
0.00732 |
0.0194257 |
False |
0.0237846 |
0.0352619 |
False |
0.01218 |
0.0185838 |
False |
0.028066 |
0.0393276 |
False |
0.00868 |
0.0166909 |
False |
0.0177897 |
0.0366378 |
False |
0.01186 |
0.0173417 |
False |
0.0335415 |
0.039192 |
False |
0.01042 |
0.0145647 |
False |
0.0211292 |
0.0365875 |
False |
3.14439 |
False |
0.01124 |
0.0177088 |
False |
0.0334857 |
False |
0.00124668 |
0.0147343 |
False |
1.28891 |
False |
0.00713799 |
0.0214812 |
False |
0.983187 |
False |
0.00634039 |
0.00909686 |
False |
||||||||||||||||||||||||
2 |
[10000:14999] |
2 |
10000 |
14999 |
2018-03-02 13:12:00 |
2018-04-01 19:39:16.848000 |
reference |
0.7 |
1 |
False |
0.822917 |
0.888887 |
False |
0.00802 |
0.0194257 |
False |
0.0264685 |
0.0352619 |
False |
0.00878 |
0.0185838 |
False |
0.0225969 |
0.0393276 |
False |
0.0139 |
0.0166909 |
False |
0.0240002 |
0.0366378 |
False |
0.01262 |
0.0173417 |
False |
0.029597 |
0.039192 |
False |
0.0091 |
0.0145647 |
False |
0.02237 |
0.0365875 |
False |
2.45188 |
False |
0.00980904 |
0.0177088 |
False |
0.168656 |
False |
0.00267997 |
0.0147343 |
False |
5.11796 |
False |
0.0142803 |
0.0214812 |
False |
0.576787 |
False |
0.00487654 |
0.00909686 |
False |
||||||||||||||||||||||||
3 |
[15000:19999] |
3 |
15000 |
19999 |
2018-04-01 19:48:00 |
2018-05-02 02:15:16.848000 |
reference |
0.6 |
1 |
False |
0.853731 |
0.888887 |
False |
0.0085 |
0.0194257 |
False |
0.0217468 |
0.0352619 |
False |
0.0095 |
0.0185838 |
False |
0.0315869 |
0.0393276 |
False |
0.0083 |
0.0166909 |
False |
0.0292131 |
0.0366378 |
False |
0.01056 |
0.0173417 |
False |
0.0286826 |
0.039192 |
False |
0.00872 |
0.0145647 |
False |
0.0178289 |
0.0365875 |
False |
4.06262 |
False |
0.0127697 |
0.0177088 |
False |
0.0562698 |
False |
0.00158831 |
0.0147343 |
False |
1.84901 |
False |
0.0085587 |
0.0214812 |
False |
0.0691997 |
False |
0.0017505 |
0.00909686 |
False |
||||||||||||||||||||||||
4 |
[20000:24999] |
4 |
20000 |
24999 |
2018-05-02 02:24:00 |
2018-06-01 08:51:16.848000 |
reference |
0.5 |
1 |
False |
0.813675 |
0.888887 |
False |
0.00892 |
0.0194257 |
False |
0.024108 |
0.0352619 |
False |
0.00754 |
0.0185838 |
False |
0.0310501 |
0.0393276 |
False |
0.00544 |
0.0166909 |
False |
0.0165946 |
0.0366378 |
False |
0.00922 |
0.0173417 |
False |
0.0209876 |
0.039192 |
False |
0.00852 |
0.0145647 |
False |
0.0216622 |
0.0365875 |
False |
2.41399 |
False |
0.00968817 |
0.0177088 |
False |
0.242059 |
False |
0.00319188 |
0.0147343 |
False |
0.470551 |
False |
0.00433131 |
0.0214812 |
False |
0.325601 |
False |
0.00368727 |
0.00909686 |
False |
||||||||||||||||||||||||
5 |
[25000:29999] |
5 |
25000 |
29999 |
2018-06-01 09:00:00 |
2018-07-01 15:27:16.848000 |
reference |
0.5 |
1 |
False |
0.813675 |
0.888887 |
False |
0.01456 |
0.0194257 |
False |
0.0275587 |
0.0352619 |
False |
0.0103 |
0.0185838 |
False |
0.0316479 |
0.0393276 |
False |
0.01112 |
0.0166909 |
False |
0.0271572 |
0.0366378 |
False |
0.00794 |
0.0173417 |
False |
0.0229349 |
0.039192 |
False |
0.01028 |
0.0145647 |
False |
0.017256 |
0.0365875 |
False |
3.79606 |
False |
0.0122934 |
0.0177088 |
False |
3.61457 |
False |
0.0120561 |
0.0147343 |
False |
0.137868 |
False |
0.00233712 |
0.0214812 |
False |
0.34437 |
False |
0.00379022 |
0.00909686 |
False |
||||||||||||||||||||||||
6 |
[30000:34999] |
6 |
30000 |
34999 |
2018-07-01 15:36:00 |
2018-07-31 22:03:16.848000 |
reference |
0.6 |
1 |
False |
0.853731 |
0.888887 |
False |
0.01284 |
0.0194257 |
False |
0.0267818 |
0.0352619 |
False |
0.01094 |
0.0185838 |
False |
0.0258014 |
0.0393276 |
False |
0.00464 |
0.0166909 |
False |
0.0259338 |
0.0366378 |
False |
0.0112 |
0.0173417 |
False |
0.0226753 |
0.039192 |
False |
0.01248 |
0.0145647 |
False |
0.0253217 |
0.0365875 |
False |
3.22884 |
False |
0.0112358 |
0.0177088 |
False |
0.0757052 |
False |
0.00182666 |
0.0147343 |
False |
4.19999 |
False |
0.0129223 |
0.0214812 |
False |
0.000962674 |
False |
0.000288895 |
0.00909686 |
False |
||||||||||||||||||||||||
7 |
[35000:39999] |
7 |
35000 |
39999 |
2018-07-31 22:12:00 |
2018-08-31 04:39:16.848000 |
reference |
0.7 |
1 |
False |
0.822917 |
0.888887 |
False |
0.01348 |
0.0194257 |
False |
0.0312131 |
0.0352619 |
False |
0.01736 |
0.0185838 |
False |
0.0325098 |
0.0393276 |
False |
0.00548 |
0.0166909 |
False |
0.0185372 |
0.0366378 |
False |
0.0074 |
0.0173417 |
False |
0.025517 |
0.039192 |
False |
0.0089 |
0.0145647 |
False |
0.0275068 |
0.0365875 |
False |
1.3933 |
False |
0.00739444 |
0.0177088 |
False |
0.414606 |
False |
0.00415143 |
0.0147343 |
False |
0.716349 |
False |
0.00533433 |
0.0214812 |
False |
0.536536 |
False |
0.00470665 |
0.00909686 |
False |
||||||||||||||||||||||||
8 |
[40000:44999] |
8 |
40000 |
44999 |
2018-08-31 04:48:00 |
2018-09-30 11:15:16.848000 |
reference |
0.8 |
1 |
False |
0.810064 |
0.888887 |
False |
0.01572 |
0.0194257 |
False |
0.0273013 |
0.0352619 |
False |
0.00842 |
0.0185838 |
False |
0.0248975 |
0.0393276 |
False |
0.01062 |
0.0166909 |
False |
0.0291086 |
0.0366378 |
False |
0.01458 |
0.0173417 |
False |
0.0244145 |
0.039192 |
False |
0.00768 |
0.0145647 |
False |
0.0243225 |
0.0365875 |
False |
0.304785 |
False |
0.00347061 |
0.0177088 |
False |
0.0126564 |
False |
0.000802461 |
0.0147343 |
False |
0.596009 |
False |
0.00485967 |
0.0214812 |
False |
0.0275315 |
False |
0.00113856 |
0.00909686 |
False |
||||||||||||||||||||||||
9 |
[45000:49999] |
9 |
45000 |
49999 |
2018-09-30 11:24:00 |
2018-10-30 17:51:16.848000 |
reference |
0.9 |
1 |
False |
0.854338 |
0.888887 |
False |
0.00924 |
0.0194257 |
False |
0.0296982 |
0.0352619 |
False |
0.00786 |
0.0185838 |
False |
0.0284742 |
0.0393276 |
False |
0.00608 |
0.0166909 |
False |
0.0207199 |
0.0366378 |
False |
0.01304 |
0.0173417 |
False |
0.032928 |
0.039192 |
False |
0.00498 |
0.0145647 |
False |
0.0303947 |
0.0365875 |
False |
2.98758 |
False |
0.0108121 |
0.0177088 |
False |
2.20383 |
False |
0.00945409 |
0.0147343 |
False |
5.08023 |
False |
0.0142629 |
0.0214812 |
False |
0.167069 |
False |
0.00266783 |
0.00909686 |
False |
||||||||||||||||||||||||
10 |
[0:4999] |
0 |
0 |
4999 |
2018-10-30 18:00:00 |
2018-11-30 00:27:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.01308 |
0.0194257 |
False |
0.0261935 |
0.0352619 |
False |
0.01576 |
0.0185838 |
False |
0.0316611 |
0.0393276 |
False |
0.00884 |
0.0166909 |
False |
0.0244278 |
0.0366378 |
False |
0.02114 |
0.0173417 |
True |
0.0309355 |
0.039192 |
False |
0.0253 |
0.0145647 |
True |
0.0289329 |
0.0365875 |
False |
1.03368 |
False |
0.00639674 |
0.0177088 |
False |
1.70319 |
False |
0.0083078 |
0.0147343 |
False |
1.6025 |
False |
0.00796199 |
0.0214812 |
False |
5.78426 |
True |
0.0152383 |
0.00909686 |
True |
||||||||||||||||||||||||
11 |
[5000:9999] |
1 |
5000 |
9999 |
2018-11-30 00:36:00 |
2018-12-30 07:03:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.01106 |
0.0194257 |
False |
0.0201778 |
0.0352619 |
False |
0.01268 |
0.0185838 |
False |
0.0300113 |
0.0393276 |
False |
0.01418 |
0.0166909 |
False |
0.0258391 |
0.0366378 |
False |
0.00994 |
0.0173417 |
False |
0.0383534 |
0.039192 |
False |
0.0123 |
0.0145647 |
False |
0.0221389 |
0.0365875 |
False |
5.76241 |
False |
0.0153757 |
0.0177088 |
False |
0.242059 |
False |
0.00319188 |
0.0147343 |
False |
5.71897 |
False |
0.0150859 |
0.0214812 |
False |
1.94965 |
False |
0.00889123 |
0.00909686 |
False |
||||||||||||||||||||||||
12 |
[10000:14999] |
2 |
10000 |
14999 |
2018-12-30 07:12:00 |
2019-01-29 13:39:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.01662 |
0.0194257 |
False |
0.0210184 |
0.0352619 |
False |
0.01734 |
0.0185838 |
False |
0.0311286 |
0.0393276 |
False |
0.0124 |
0.0166909 |
False |
0.0293725 |
0.0366378 |
False |
0.02362 |
0.0173417 |
True |
0.034176 |
0.039192 |
False |
0.01642 |
0.0145647 |
True |
0.0310428 |
0.0365875 |
False |
2.65396 |
False |
0.0102823 |
0.0177088 |
False |
3.17862 |
False |
0.0113376 |
0.0147343 |
False |
2.08186 |
False |
0.00907089 |
0.0214812 |
False |
1.59109 |
False |
0.00804087 |
0.00909686 |
False |
||||||||||||||||||||||||
13 |
[15000:19999] |
3 |
15000 |
19999 |
2019-01-29 13:48:00 |
2019-02-28 20:15:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.01434 |
0.0194257 |
False |
0.0363554 |
0.0352619 |
True |
0.0128 |
0.0185838 |
False |
0.0294644 |
0.0393276 |
False |
0.01298 |
0.0166909 |
False |
0.0290784 |
0.0366378 |
False |
0.0143 |
0.0173417 |
False |
0.0332968 |
0.039192 |
False |
0.01058 |
0.0145647 |
False |
0.0228333 |
0.0365875 |
False |
0.0708428 |
False |
0.00167698 |
0.0177088 |
False |
0.0242988 |
False |
0.00107588 |
0.0147343 |
False |
0.489515 |
False |
0.00440901 |
0.0214812 |
False |
0.7808 |
False |
0.00566028 |
0.00909686 |
False |
||||||||||||||||||||||||
14 |
[20000:24999] |
4 |
20000 |
24999 |
2019-02-28 20:24:00 |
2019-03-31 02:51:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.01116 |
0.0194257 |
False |
0.0287119 |
0.0352619 |
False |
0.01918 |
0.0185838 |
True |
0.0308095 |
0.0393276 |
False |
0.01022 |
0.0166909 |
False |
0.0287925 |
0.0366378 |
False |
0.00906 |
0.0173417 |
False |
0.0263609 |
0.039192 |
False |
0.01408 |
0.0145647 |
False |
0.0237474 |
0.0365875 |
False |
1.00542 |
False |
0.00633255 |
0.0177088 |
False |
0.487381 |
False |
0.00449331 |
0.0147343 |
False |
3.15856 |
False |
0.0112076 |
0.0214812 |
False |
0.239784 |
False |
0.00317755 |
0.00909686 |
False |
||||||||||||||||||||||||
15 |
[25000:29999] |
5 |
25000 |
29999 |
2019-03-31 03:00:00 |
2019-04-30 09:27:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.4353 |
0.0194257 |
True |
0.464759 |
0.0352619 |
True |
0.00824 |
0.0185838 |
False |
0.0286811 |
0.0393276 |
False |
0.17992 |
0.0166909 |
True |
0.233935 |
0.0366378 |
True |
0.00698 |
0.0173417 |
False |
0.0288384 |
0.039192 |
False |
0.1307 |
0.0145647 |
True |
0.225486 |
0.0365875 |
True |
455.622 |
True |
0.183143 |
0.0177088 |
True |
1179.9 |
True |
0.231198 |
0.0147343 |
True |
4.66135 |
False |
0.0135741 |
0.0214812 |
False |
0.424518 |
False |
0.00419696 |
0.00909686 |
False |
||||||||||||||||||||||||
16 |
[30000:34999] |
6 |
30000 |
34999 |
2019-04-30 09:36:00 |
2019-05-30 16:03:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.43028 |
0.0194257 |
True |
0.460057 |
0.0352619 |
True |
0.01058 |
0.0185838 |
False |
0.0436276 |
0.0393276 |
True |
0.18032 |
0.0166909 |
True |
0.231747 |
0.0366378 |
True |
0.00826 |
0.0173417 |
False |
0.0265918 |
0.039192 |
False |
0.1273 |
0.0145647 |
True |
0.208815 |
0.0365875 |
True |
428.633 |
True |
0.174226 |
0.0177088 |
True |
1162.99 |
True |
0.229333 |
0.0147343 |
True |
2.52181 |
False |
0.0100123 |
0.0214812 |
False |
0.0904949 |
False |
0.00198817 |
0.00909686 |
False |
||||||||||||||||||||||||
17 |
[35000:39999] |
7 |
35000 |
39999 |
2019-05-30 16:12:00 |
2019-06-29 22:39:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.43772 |
0.0194257 |
True |
0.466777 |
0.0352619 |
True |
0.01002 |
0.0185838 |
False |
0.0292533 |
0.0393276 |
False |
0.19572 |
0.0166909 |
True |
0.234016 |
0.0366378 |
True |
0.01382 |
0.0173417 |
False |
0.0275949 |
0.039192 |
False |
0.1311 |
0.0145647 |
True |
0.224282 |
0.0365875 |
True |
453.247 |
True |
0.182913 |
0.0177088 |
True |
1170.49 |
True |
0.230161 |
0.0147343 |
True |
3.41534 |
False |
0.0116206 |
0.0214812 |
False |
0.12587 |
False |
0.002328 |
0.00909686 |
False |
||||||||||||||||||||||||
18 |
[40000:44999] |
8 |
40000 |
44999 |
2019-06-29 22:48:00 |
2019-07-30 05:15:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.43602 |
0.0194257 |
True |
0.466199 |
0.0352619 |
True |
0.01068 |
0.0185838 |
False |
0.0306276 |
0.0393276 |
False |
0.18212 |
0.0166909 |
True |
0.231484 |
0.0366378 |
True |
0.0088 |
0.0173417 |
False |
0.0232423 |
0.039192 |
False |
0.1197 |
0.0145647 |
True |
0.205352 |
0.0365875 |
True |
438.26 |
True |
0.177985 |
0.0177088 |
True |
1023.35 |
True |
0.213579 |
0.0147343 |
True |
6.88171 |
False |
0.0164851 |
0.0214812 |
False |
0.313431 |
False |
0.00362023 |
0.00909686 |
False |
||||||||||||||||||||||||
19 |
[45000:49999] |
9 |
45000 |
49999 |
2019-07-30 05:24:00 |
2019-08-29 11:51:16.848000 |
analysis |
1 |
1 |
False |
1 |
0.888887 |
True |
0.43838 |
0.0194257 |
True |
0.467827 |
0.0352619 |
True |
0.0068 |
0.0185838 |
False |
0.0283303 |
0.0393276 |
False |
0.19872 |
0.0166909 |
True |
0.24262 |
0.0366378 |
True |
0.0062 |
0.0173417 |
False |
0.0279191 |
0.039192 |
False |
0.13752 |
0.0145647 |
True |
0.215539 |
0.0365875 |
True |
474.892 |
True |
0.19035 |
0.0177088 |
True |
1227.54 |
True |
0.236408 |
0.0147343 |
True |
1.63759 |
False |
0.00809379 |
0.0214812 |
False |
5.91474 |
True |
0.0154082 |
0.00909686 |
True |
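As a reading aid for the alert columns, the following is a minimal sketch of a threshold check. It is not NannyML's implementation: the function name crosses_threshold is hypothetical, and both the strict comparisons and the treatment of a missing threshold as "no bound" are assumptions that happen to agree with the rows shown above (for example, an id value of 1 against an upper threshold of 1 is not flagged).

```python
from typing import Optional


def crosses_threshold(value: float,
                      lower: Optional[float] = None,
                      upper: Optional[float] = None) -> bool:
    """Return True when value lies strictly outside the [lower, upper] band.

    A bound of None is treated as "no bound on that side" (an assumption).
    """
    above = upper is not None and value > upper
    below = lower is not None and value < lower
    return above or below


# Values taken from the driver_tenure / kolmogorov_smirnov columns above.
reference_chunk = crosses_threshold(0.00974, upper=0.0173417)
analysis_chunk = crosses_threshold(0.02114, upper=0.0173417)
print(reference_chunk, analysis_chunk)
```

The chi2 columns in the output above show no thresholds at all, so their alerts must come from a different rule that this sketch does not cover.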
Filtering
Working with multilevel indexes can be very powerful, yet also quite challenging. The following snippet illustrates retrieving all calculated method values from our results.
>>> print(results.to_df().loc[:, (slice(None), slice(None), 'value')].columns)
MultiIndex([( 'id', 'kolmogorov_smirnov', 'value'),
( 'id', 'jensen_shannon', 'value'),
( 'car_value', 'kolmogorov_smirnov', 'value'),
( 'car_value', 'jensen_shannon', 'value'),
( 'debt_to_income_ratio', 'kolmogorov_smirnov', 'value'),
( 'debt_to_income_ratio', 'jensen_shannon', 'value'),
( 'loan_length', 'kolmogorov_smirnov', 'value'),
( 'loan_length', 'jensen_shannon', 'value'),
( 'driver_tenure', 'kolmogorov_smirnov', 'value'),
( 'driver_tenure', 'jensen_shannon', 'value'),
( 'y_pred_proba', 'kolmogorov_smirnov', 'value'),
( 'y_pred_proba', 'jensen_shannon', 'value'),
( 'salary_range', 'chi2', 'value'),
( 'salary_range', 'jensen_shannon', 'value'),
('repaid_loan_on_prev_car', 'chi2', 'value'),
('repaid_loan_on_prev_car', 'jensen_shannon', 'value'),
( 'size_of_downpayment', 'chi2', 'value'),
( 'size_of_downpayment', 'jensen_shannon', 'value'),
( 'y_pred', 'chi2', 'value'),
( 'y_pred', 'jensen_shannon', 'value')],
)
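To make the slicing above easier to experiment with, here is a small pandas-only sketch that rebuilds a tiny frame with the same three-level column layout and selects from it in a few ways. The frame toy is a hand-made stand-in (two analysis chunks, two columns) rather than real NannyML output, and pd.IndexSlice and xs are offered as alternatives under the assumption that the real DataFrame behaves like any other pandas frame with MultiIndex columns.

```python
import pandas as pd

# Hand-made stand-in for results.to_df(): (column name, method, field).
columns = pd.MultiIndex.from_tuples([
    ("chunk", "chunk", "key"),
    ("car_value", "kolmogorov_smirnov", "value"),
    ("car_value", "kolmogorov_smirnov", "alert"),
    ("salary_range", "chi2", "value"),
    ("salary_range", "chi2", "alert"),
])
toy = pd.DataFrame(
    [
        ["[0:4999]", 0.01308, False, 1.03368, False],
        ["[25000:29999]", 0.4353, True, 455.622, True],
    ],
    columns=columns,
)

idx = pd.IndexSlice

# Same selection as the tuple of slice(None) objects, written more compactly.
values = toy.loc[:, idx[:, :, "value"]]

# Selecting on the top level returns the sub-frame for one column name.
salary = toy["salary_range"]

# xs picks one label from a given level and drops that level from the result.
alerts = toy.xs("alert", axis=1, level=2)

print(list(values.columns))
print(list(salary.columns))
print(list(alerts.columns))
```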
To improve this experience, we have introduced a helper method that allows you to filter the result data to easily retrieve the information you want. Since the UnivariateDriftCalculator has two degrees of freedom, we have included both in the filter() method. Additionally, you can filter on the data period, i.e., reference or analysis.
The filter() method will return a new Result instance, allowing you to chain methods like filter(), to_df(), and plot().
>>> filtered_results = results.filter(period='analysis', methods=['chi2'], column_names=['salary_range'])
>>> print(type(filtered_results))
<class 'nannyml.drift.univariate.result.Result'>
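The idea that a filter returning a new object of the same type enables chaining can be sketched in plain Python. ToyResult below is a hypothetical stand-in written for this illustration only; it is not part of NannyML and says nothing about how the real Result class is implemented. The values are copied from the tables above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# One record per (period, column name, method, value).
Record = Tuple[str, str, str, float]


@dataclass(frozen=True)
class ToyResult:
    """Hypothetical stand-in for a result object; not part of NannyML."""
    records: Tuple[Record, ...]

    def filter(self,
               period: Optional[str] = None,
               methods: Optional[Iterable[str]] = None,
               column_names: Optional[Iterable[str]] = None) -> "ToyResult":
        wanted_methods = None if methods is None else set(methods)
        wanted_columns = None if column_names is None else set(column_names)
        kept = tuple(
            r for r in self.records
            if (period is None or r[0] == period)
            and (wanted_columns is None or r[1] in wanted_columns)
            and (wanted_methods is None or r[2] in wanted_methods)
        )
        # Returning a new instance leaves the original untouched and keeps
        # every method available on the filtered object.
        return ToyResult(kept)

    def to_rows(self) -> List[Record]:
        return list(self.records)


full = ToyResult((
    ("reference", "salary_range", "chi2", 2.89878),
    ("analysis", "salary_range", "chi2", 1.03368),
    ("analysis", "salary_range", "jensen_shannon", 0.00639674),
    ("analysis", "car_value", "kolmogorov_smirnov", 0.01308),
))

# Two chained filters followed by a conversion, all on the same type.
subset = full.filter(period="analysis").filter(methods=["chi2"])
print(type(subset).__name__, subset.to_rows())
```

Because the original object is never modified, the unfiltered result stays available for other filters or plots.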
When looking at the results after filtering, you can see only the chi2 data for the salary_range column during the analysis period is included.
>>> display(filtered_results.to_df())
|  | chunk / chunk / key | chunk / chunk / chunk_index | chunk / chunk / start_index | chunk / chunk / end_index | chunk / chunk / start_date | chunk / chunk / end_date | chunk / chunk / period | salary_range / chi2 / value | salary_range / chi2 / upper_threshold | salary_range / chi2 / lower_threshold | salary_range / chi2 / alert |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 1.03368 |  |  | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 5.76241 |  |  | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 2.65396 |  |  | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.0708428 |  |  | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 1.00542 |  |  | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 455.622 |  |  | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 428.633 |  |  | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 453.247 |  |  | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 438.26 |  |  | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 474.892 |  |  | True |
To avoid the use of a multilevel index, we have provided a switch in the to_df() method.
>>> display(filtered_results.to_df(multilevel=False))
|  | chunk_key | chunk_index | chunk_start_index | chunk_end_index | chunk_start_date | chunk_end_date | chunk_period | salary_range_chi2_value | salary_range_chi2_upper_threshold | salary_range_chi2_lower_threshold | salary_range_chi2_alert |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 1.03368 |  |  | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 5.76241 |  |  | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 2.65396 |  |  | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.0708428 |  |  | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 1.00542 |  |  | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 455.622 |  |  | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 428.633 |  |  | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 453.247 |  |  | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 438.26 |  |  | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 474.892 |  |  | True |
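For readers who already hold a DataFrame with multilevel columns, a generic pandas way to get flat names is to join the levels with underscores. The helper flatten_columns below is hypothetical and only a sketch of that idea; it is not a description of what to_df(multilevel=False) does internally. It reproduces names such as salary_range_chi2_value, while the chunk columns in the output above follow their own naming, which this sketch does not attempt to reproduce.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Return a copy of df whose MultiIndex columns are joined into strings."""
    flat = df.copy()
    flat.columns = [sep.join(str(part) for part in parts) for parts in df.columns]
    return flat


# Hand-made two-row frame using values from the filtered output above.
multi = pd.DataFrame(
    [[1.03368, False], [455.622, True]],
    columns=pd.MultiIndex.from_tuples([
        ("salary_range", "chi2", "value"),
        ("salary_range", "chi2", "alert"),
    ]),
)

flat = flatten_columns(multi)
print(list(flat.columns))

# Flat names allow plain column access and boolean filtering.
alerting = flat[flat["salary_range_chi2_alert"]]
print(len(alerting))
```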
Plotting
Results can be visualized by using the built-in plotting functionality. With a quick call of the plot() function we can create a Plotly Figure.
>>> figure = results.plot()
To render it in our notebook we can call the show() method:
>>> results.plot().show()
The image can also be exported to disk by using the following snippet:
>>> results.plot().write_image('../_static/tutorials/working_with_results/result_plot.svg')
We might want to reduce the number of plots, since there is a lot happening on the visualization right now. Therefore, we can first apply filtering and then perform the plotting.
>>> filtered_results.plot().show()
Some result classes offer multiple ways of visualizing them. These are listed in their associated API reference docs. For example, the docs for univariate drift results list two kinds: drift, which is the default, and distribution. We can change the visualization by specifying the kind parameter.
>>> filtered_results.plot(kind='distribution').show()
Comparing
Another neat feature is that we can plot a comparison between multiple results. For example, suppose we want to visualize the estimated performance with respect to the univariate drift metrics for the salary_range column. We will first get our estimated performance result.
>>> estimator = nml.CBPE(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     metrics=['roc_auc'],
...     chunk_size=5000,
...     problem_type='classification_binary'
... ).fit(reference_df)
>>> est_perf_results = estimator.estimate(analysis_df)
Now we can compare our estimated performance to the univariate drift on features:
>>> est_perf_results.compare(results.filter(methods=['chi2'], column_names=['salary_range'])).plot().show()
We can immediately spot how the estimated performance plummets when the chi2 statistic for salary_range picks up!
Note
To reduce complexity, we only support comparing a single metric to another one.
As illustrated in the code snippet above, you can use filtering to select a single metric from your result before comparing it.
Exporting
Results can also be exported to external storage using a Writer. We currently support writing results to disk using a RawFilesWriter, serializing the Result into a Python pickle file and storing that to disk using the PickleFileWriter, or storing calculation results in a database using the DatabaseWriter. This example will show how to use the DatabaseWriter.
In order to get the dependencies required for database access, please ensure you’ve installed the optional db dependency. Check the installation instructions for more information.
We construct the DatabaseWriter by providing a database connection string. Upon calling the write() method, all results will be written into the database, in this case, an SQLite database.
>>> database_writer = nml.DatabaseWriter(connection_string='sqlite:///nml.db')
>>> database_writer.write(results)
A quick inspection shows that the database was populated and contains the univariate drift calculation results.
>>> import sqlite3
>>> cursor = sqlite3.connect('nml.db').cursor()
>>> cursor.execute("""SELECT name FROM sqlite_master WHERE type='table'""")
>>> print(cursor.fetchall())
[('model',), ('run',), ('univariate_drift_metrics',), ('data_reconstruction_feature_drift_metrics',), ('realized_performance_metrics',), ('cbpe_performance_metrics',), ('dle_performance_metrics',), ('unseen_values_metrics',), ('missing_values_metrics',)]
>>> cursor.execute("""SELECT * FROM univariate_drift_metrics LIMIT 3""")
>>> print(cursor.fetchall())
[(1, None, 1, '2018-10-30 18:00:00.000000', '2018-11-30 00:27:16.848000', '2018-11-14 21:13:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (2, None, 1, '2018-11-30 00:36:00.000000', '2018-12-30 07:03:16.848000', '2018-12-15 03:49:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id'), (3, None, 1, '2018-12-30 07:12:00.000000', '2019-01-29 13:39:16.848000', '2019-01-14 10:25:38.424000', 'kolmogorov_smirnov', 0.9999999999999062, 0, 'id')]
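The raw tuples above are hard to read without column names. With the standard sqlite3 module, cursor.description exposes the names of the selected columns, so each row can be turned into a dictionary. The sketch below runs against an in-memory database with a made-up table toy_drift_metrics; its column names are placeholders chosen for this illustration and should not be taken as the schema that DatabaseWriter creates. The same pattern can be pointed at nml.db to discover the real names.

```python
import sqlite3

# In-memory database with a made-up table; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_drift_metrics ("
    "id INTEGER PRIMARY KEY, method_name TEXT, column_name TEXT, "
    "value REAL, alert INTEGER)"
)
conn.executemany(
    "INSERT INTO toy_drift_metrics (method_name, column_name, value, alert) "
    "VALUES (?, ?, ?, ?)",
    [
        ("kolmogorov_smirnov", "id", 0.9999999999999062, 0),
        ("chi2", "salary_range", 455.622, 1),
    ],
)

# cursor.description holds one 7-tuple per selected column; item 0 is the name.
cursor = conn.execute("SELECT * FROM toy_drift_metrics WHERE alert = 1")
names = [column[0] for column in cursor.description]
rows = [dict(zip(names, row)) for row in cursor.fetchall()]

print(names)
print(rows)
conn.close()
```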