nannyml.drift.model_inputs.univariate.statistical.results module

Module containing univariate statistical drift calculation results and associated plotting implementations.

class nannyml.drift.model_inputs.univariate.statistical.results.UnivariateDriftResult(analysis_data: List[nannyml.chunk.Chunk], drift_data: pandas.core.frame.DataFrame, model_metadata: nannyml.metadata.base.ModelMetadata)[source]

Bases: nannyml.drift.base.DriftResult

Contains the univariate statistical drift calculation results and provides additional plotting functionality.

Creates a new DriftResult instance.

Parameters

analysis_data (List[Chunk]) – The data that was provided to calculate drift on. This is required in order to plot distributions.
drift_data (pd.DataFrame) – The results of the drift calculation.
model_metadata (ModelMetadata) – The metadata describing the monitored model. Used to

__repr__()[source]: Represent the DriftResults object as the data it contains.

plot(kind: str = 'feature', metric: str = 'statistic', feature_label: Optional[str] = None, feature_column_name: Optional[str] = None, class_label: Optional[str] = None, *args, **kwargs) → plotly.graph_objs._figure.Figure[source]

Renders a line plot for a chosen metric of the univariate statistical drift calculation results.

Given either a feature label (see model_metadata.features) or the actual feature column name, and a metric (either statistic or p_value), this function renders a line plot displaying the metric value for the selected feature per chunk. Chunks are placed on a time-based X-axis according to the period containing their observations. If the drift results contain multiple partitions (reference and analysis), chunks of different partitions are shown in different colors with a vertical separation between them.

The different plot kinds that are available:

feature_drift: plots drift per Chunk for a single feature of a chunked data set.
prediction_drift: plots drift per Chunk for the predictions of a chunked data set.
feature_distribution: plots feature distribution per Chunk. Joyplot for continuous features, stacked bar charts for categorical features.
prediction_distribution: plots the prediction distribution per Chunk of a chunked data set as a joyplot.

Parameters

kind (str, default=feature_drift) – The kind of plot to render. Value must be one of feature_drift, prediction_drift, feature_distribution or prediction_distribution.
metric (str, default=statistic) – The metric to plot. Value must be one of statistic or p_value.
feature_label (str) – Feature label identifying a feature according to the preset model metadata. The function will raise an exception when no feature of that label was found in the metadata. Either feature_label or feature_column_name should be specified.
feature_column_name (str) – Column name identifying a feature according to the preset model metadata. The function will raise an exception when no feature using that column name was found in the metadata. Either feature_column_name or feature_label should be specified.
class_label (str, default=None) – The label of the class to plot the prediction distribution for. Only required in case of multiclass models.
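
The constraints listed above can be checked before calling plot(). The following is a minimal sketch, not part of nannyml: the helper check_plot_args and the constants PLOT_KINDS and METRICS are hypothetical names introduced for illustration, and the rule that only the feature_* kinds need a feature identifier is an assumption drawn from the parameter descriptions.

```python
from typing import Optional

# Allowed values as listed in the parameter descriptions above.
PLOT_KINDS = (
    "feature_drift",
    "prediction_drift",
    "feature_distribution",
    "prediction_distribution",
)
METRICS = ("statistic", "p_value")


def check_plot_args(
    kind: str,
    metric: str = "statistic",
    feature_label: Optional[str] = None,
    feature_column_name: Optional[str] = None,
) -> None:
    """Hypothetical pre-check mirroring the documented constraints of plot().

    Raises ValueError when an argument falls outside the documented values.
    """
    if kind not in PLOT_KINDS:
        raise ValueError(f"kind must be one of {PLOT_KINDS}, got {kind!r}")
    if metric not in METRICS:
        raise ValueError(f"metric must be one of {METRICS}, got {metric!r}")
    # Assumption: only the feature_* plot kinds require a feature identifier.
    if kind.startswith("feature_") and not (feature_label or feature_column_name):
        raise ValueError(
            "either feature_label or feature_column_name should be specified"
        )
```

Calling check_plot_args('feature_drift', metric='p_value', feature_label=...) with any label passes silently, while check_plot_args('feature') raises a ValueError. Note that the signature shows kind='feature' as the default whereas the parameter list documents feature_drift, so passing kind explicitly may be the safer choice.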

Returns

fig – A Figure object containing the requested drift plot. Can be saved to disk or rendered on screen using fig.show().

Return type

plotly.graph_objects.Figure

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W')
>>> drift_calc.fit(ref_df)
>>> drifts = drift_calc.calculate(ana_df)
>>> # Loop over all features and plot the feature drift and feature distribution for each.
>>> for f in metadata.features:
...     drifts.plot(kind='feature_drift', feature_label=f.label).show()
...     drifts.plot(kind='feature_distribution', feature_label=f.label).show()