Storing and loading calculators
Fitting a calculator or estimator is only required when the reference data for a monitored model changes.
To avoid unnecessary calculations and speed up repeated runs of NannyML, you can store the fitted calculators
in a Store.
Note
We currently support persisting objects to a local or remote filesystem such as S3, Google Cloud Storage buckets or Azure Blob Storage. You can find some examples in the walkthrough.
Note
For more information on how to use this functionality with the CLI or container, check the configuration file documentation.
Just the code
Create the calculator and fit it on reference. Store the fitted calculator to local disk.
>>> import nannyml as nml
>>> reference_df, _, _ = nml.load_synthetic_car_loan_dataset()
>>> column_names = ['car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']
>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
>>> store = nml.io.store.FilesystemStore(root_path='/tmp/nml-cache')
>>> store.store(calc, path='example/calc.pkl')
In a new session, load the stored calculator and use it.
>>> import nannyml as nml
>>> _, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> store = nml.io.store.FilesystemStore(root_path='/tmp/nml-cache')
>>> loaded_calc = store.load(path='example/calc.pkl', as_type=nml.UnivariateDriftCalculator)
>>> result = loaded_calc.calculate(analysis_df)
>>> display(result.to_df())
Walkthrough
In the first part we create a new UnivariateDriftCalculator and fit it to the reference data.
>>> import nannyml as nml
>>> reference_df, _, _ = nml.load_synthetic_car_loan_dataset()
>>> column_names = ['car_value', 'salary_range', 'debt_to_income_ratio', 'loan_length', 'repaid_loan_on_prev_car', 'size_of_downpayment', 'driver_tenure', 'y_pred_proba', 'y_pred']
>>> calc = nml.UnivariateDriftCalculator(
...     column_names=column_names,
...     treat_as_categorical=['y_pred'],
...     timestamp_column_name='timestamp',
...     continuous_methods=['kolmogorov_smirnov', 'jensen_shannon'],
...     categorical_methods=['chi2', 'jensen_shannon'],
... )
>>> calc.fit(reference_df)
In this snippet we’ll set up the FilesystemStore. It is a class responsible for
storing objects on a filesystem and retrieving them later.
We’ll first illustrate creating a store using the local filesystem. The root_path parameter configures the directory
on the filesystem that will be used as the root of our store. Additional directories and files can be created below it
when actually storing objects.
Here we provide a directory on the local filesystem.
>>> store = nml.io.store.FilesystemStore(root_path='/tmp/nml-cache')
Because we’re using the fsspec library under the hood, we also support many remote filesystems out of the box.
The following snippet shows how to use S3 as a backing filesystem. See https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html to learn more about the required access key id and secret access key credentials.
>>> store = nml.io.store.FilesystemStore(
...     root_path='s3://my-bucket-name/some/path',
...     credentials={
...         'client_kwargs': {
...             'aws_access_key_id': '<ACCESS_KEY_ID>',
...             'aws_secret_access_key': '<SECRET_ACCESS_KEY>'
...         }
...     }
... )
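Hardcoding access keys in source code is easy to get wrong. As a minimal sketch, the credentials dictionary from the S3 snippet can be assembled from environment variables instead. The helper name s3_credentials_from_env is hypothetical and not part of NannyML; the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the conventional AWS ones and are an assumption about your environment.

```python
import os


def s3_credentials_from_env() -> dict:
    """Hypothetical helper: build a credentials dict shaped like the S3 example.

    Raises KeyError if either environment variable is missing, so a
    misconfigured environment fails before any remote call is attempted.
    """
    return {
        'client_kwargs': {
            'aws_access_key_id': os.environ['AWS_ACCESS_KEY_ID'],
            'aws_secret_access_key': os.environ['AWS_SECRET_ACCESS_KEY'],
        }
    }
```

The returned dictionary has the same shape as the literal in the S3 snippet, so it could be passed as the credentials argument of the FilesystemStore in its place.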
This is how to use Google Cloud Storage as a backing filesystem. See https://cloud.google.com/iam/docs/creating-managing-service-account-keys to learn more about the required service account key credentials.
>>> store = nml.io.store.FilesystemStore(
...     root_path='gs://my-bucket-name/some/path',
...     credentials={'token': 'service-account-access-key.json'}
... )
This snippet illustrates how to do this using Azure Blob Storage. See https://github.com/fsspec/adlfs#setting-credentials to learn more about the required credentials.
>>> store = nml.io.store.FilesystemStore(
...     root_path='abfs://my-container-name/some/path',
...     credentials={'account_name': '<ACCOUNT_NAME>', 'account_key': '<ACCOUNT_KEY>'}
... )
The next step is using the FilesystemStore to store our fitted calculator.
When doing so we can provide an optional path string parameter, which sets a custom subdirectory and file name
relative to the root of the store.
If no path is provided, a file will be created using a standard name within the root directory of the store.
>>> store.store(calc, path='example/calc.pkl')
This concludes the first part: storing the fitted calculator.
When running NannyML in a new session to perform calculations on analysis data (e.g. repeated on a daily basis) we can load the pre-fitted calculator from the store. First we define the analysis data and declare the store:
>>> import nannyml as nml
>>> _, analysis_df, _ = nml.load_synthetic_car_loan_dataset()
>>> store = nml.io.store.FilesystemStore(root_path='/tmp/nml-cache')
Now we’ll use the store to load the pre-fitted calculator from disk. By providing the optional as_type parameter
we can have the store check the type of the loaded object before returning it. If it is not an instance of as_type, the
load() method will raise a StoreException.
If nothing is found at the given path, the load() method will return None.
>>> loaded_calc = store.load(path='example/calc.pkl', as_type=nml.UnivariateDriftCalculator)
>>> result = loaded_calc.calculate(analysis_df)
>>> display(result.to_df())
| | chunk key | chunk_index | start_index | end_index | start_date | end_date | period | roc_auc value | roc_auc sampling_error | roc_auc realized | roc_auc upper_confidence_boundary | roc_auc lower_confidence_boundary | roc_auc upper_threshold | roc_auc lower_threshold | roc_auc alert | recall value | recall sampling_error | recall realized | recall upper_confidence_boundary | recall lower_confidence_boundary | recall upper_threshold | recall lower_threshold | recall alert |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [0:4999] | 0 | 0 | 4999 | 2018-10-30 18:00:00 | 2018-11-30 00:27:16.848000 | analysis | 0.968631 | 0.00181072 | 0.970962 | 0.974063 | 0.963198 | 0.97866 | 0.963317 | False | 0.928723 | 0.00513664 | 0.930394 | 0.944133 | 0.913313 | 0.941033 | 0.9171 | False |
| 1 | [5000:9999] | 1 | 5000 | 9999 | 2018-11-30 00:36:00 | 2018-12-30 07:03:16.848000 | analysis | 0.969044 | 0.00181072 | 0.970248 | 0.974476 | 0.963612 | 0.97866 | 0.963317 | False | 0.925261 | 0.00513664 | 0.923922 | 0.940671 | 0.909851 | 0.941033 | 0.9171 | False |
| 2 | [10000:14999] | 2 | 10000 | 14999 | 2018-12-30 07:12:00 | 2019-01-29 13:39:16.848000 | analysis | 0.969444 | 0.00181072 | 0.976282 | 0.974876 | 0.964012 | 0.97866 | 0.963317 | False | 0.929317 | 0.00513664 | 0.938246 | 0.944727 | 0.913907 | 0.941033 | 0.9171 | False |
| 3 | [15000:19999] | 3 | 15000 | 19999 | 2019-01-29 13:48:00 | 2019-02-28 20:15:16.848000 | analysis | 0.969047 | 0.00181072 | 0.967721 | 0.974479 | 0.963615 | 0.97866 | 0.963317 | False | 0.929713 | 0.00513664 | 0.92506 | 0.945123 | 0.914303 | 0.941033 | 0.9171 | False |
| 4 | [20000:24999] | 4 | 20000 | 24999 | 2019-02-28 20:24:00 | 2019-03-31 02:51:16.848000 | analysis | 0.968873 | 0.00181072 | 0.969886 | 0.974305 | 0.963441 | 0.97866 | 0.963317 | False | 0.930604 | 0.00513664 | 0.927577 | 0.946014 | 0.915194 | 0.941033 | 0.9171 | False |
| 5 | [25000:29999] | 5 | 25000 | 29999 | 2019-03-31 03:00:00 | 2019-04-30 09:27:16.848000 | analysis | 0.960478 | 0.00181072 | 0.96005 | 0.96591 | 0.955046 | 0.97866 | 0.963317 | True | 0.88399 | 0.00513664 | 0.905086 | 0.8994 | 0.86858 | 0.941033 | 0.9171 | True |
| 6 | [30000:34999] | 6 | 30000 | 34999 | 2019-04-30 09:36:00 | 2019-05-30 16:03:16.848000 | analysis | 0.961134 | 0.00181072 | 0.95853 | 0.966566 | 0.955701 | 0.97866 | 0.963317 | True | 0.883528 | 0.00513664 | 0.89901 | 0.898938 | 0.868118 | 0.941033 | 0.9171 | True |
| 7 | [35000:39999] | 7 | 35000 | 39999 | 2019-05-30 16:12:00 | 2019-06-29 22:39:16.848000 | analysis | 0.960536 | 0.00181072 | 0.959041 | 0.965968 | 0.955104 | 0.97866 | 0.963317 | True | 0.885501 | 0.00513664 | 0.901718 | 0.900911 | 0.870091 | 0.941033 | 0.9171 | True |
| 8 | [40000:44999] | 8 | 40000 | 44999 | 2019-06-29 22:48:00 | 2019-07-30 05:15:16.848000 | analysis | 0.961869 | 0.00181072 | 0.963094 | 0.967301 | 0.956437 | 0.97866 | 0.963317 | True | 0.885978 | 0.00513664 | 0.906124 | 0.901388 | 0.870568 | 0.941033 | 0.9171 | True |
| 9 | [45000:49999] | 9 | 45000 | 49999 | 2019-07-30 05:24:00 | 2019-08-29 11:51:16.848000 | analysis | 0.960537 | 0.00181072 | 0.957556 | 0.965969 | 0.955104 | 0.97866 | 0.963317 | True | 0.889808 | 0.00513664 | 0.905823 | 0.905218 | 0.874398 | 0.941033 | 0.9171 | True |
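The documented behaviour of load() (None when nothing is found, an exception when the type does not match as_type) lends itself to a "load, or fit and store" pattern for scheduled runs. The following is a minimal, standard-library sketch of that pattern only. TinyStore, StoreError and load_or_fit are hypothetical stand-ins written for illustration; they are not NannyML's FilesystemStore or StoreException, and a plain dict stands in for a fitted calculator.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


class StoreError(Exception):
    """Hypothetical stand-in for a store exception."""


class TinyStore:
    """Illustrative pickle-backed store; not NannyML's implementation."""

    def __init__(self, root_path: str) -> None:
        self.root = Path(root_path)

    def store(self, obj: Any, path: str) -> None:
        target = self.root / path
        # Create intermediate directories such as 'example/' on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(pickle.dumps(obj))

    def load(self, path: str, as_type: Optional[type] = None) -> Any:
        target = self.root / path
        if not target.exists():
            # Nothing stored at this path: signal it with None.
            return None
        obj = pickle.loads(target.read_bytes())
        if as_type is not None and not isinstance(obj, as_type):
            raise StoreError(
                f"expected {as_type.__name__}, got {type(obj).__name__}"
            )
        return obj


def load_or_fit(store: TinyStore, path: str, reference: list) -> dict:
    """Reuse a stored 'fitted' object, fitting and storing it only if absent."""
    fitted = store.load(path, as_type=dict)
    if fitted is None:
        # Stand-in for an expensive fit on reference data.
        fitted = {"reference_mean": sum(reference) / len(reference)}
        store.store(fitted, path)
    return fitted


store = TinyStore(tempfile.mkdtemp())
# First run: nothing is stored yet, so the object is fitted and persisted.
first = load_or_fit(store, "example/calc.pkl", [1.0, 2.0, 3.0])
# Later run: the stored object is returned and the new reference is ignored.
second = load_or_fit(store, "example/calc.pkl", [10.0, 20.0])
```

With a real FilesystemStore the same control flow would apply under the behaviour described in this walkthrough: attempt the load first, and fit and store only when it returns None. Refitting remains necessary whenever the reference data changes.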
What’s Next
The FilesystemStore can also be used when running NannyML using the CLI or as
a container. You can learn how in the configuration file documentation.