nannyml.drift.model_inputs.univariate.statistical.results module
Module containing univariate statistical drift calculation results and associated plotting implementations.
- class nannyml.drift.model_inputs.univariate.statistical.results.UnivariateDriftResult(analysis_data: List[nannyml.chunk.Chunk], drift_data: pandas.core.frame.DataFrame, model_metadata: nannyml.metadata.base.ModelMetadata)
Bases: nannyml.drift.base.DriftResult
Contains the univariate statistical drift calculation results and provides additional plotting functionality.
Creates a new DriftResult instance.
- Parameters
analysis_data (List[Chunk]) – The data that was provided to calculate drift on. This is required in order to plot distributions.
drift_data (pd.DataFrame) – The results of the drift calculation.
model_metadata (ModelMetadata) – The metadata describing the monitored model. Used to
- plot(kind: str = 'feature', metric: str = 'statistic', feature_label: Optional[str] = None, feature_column_name: Optional[str] = None, class_label: Optional[str] = None, *args, **kwargs) → plotly.graph_objs._figure.Figure
Renders a line plot for a chosen metric of the univariate statistical drift calculation results.
Given either a feature label (check model_metadata.features) or the actual feature column name, and a metric (either statistic or p_value), this function renders a line plot displaying the metric value for the selected feature per chunk. Chunks are placed on a time-based X-axis using the period containing their observations. Chunks of different partitions (reference and analysis) are represented using different colors and a vertical separation if the drift results contain multiple partitions.
The different plot kinds that are available:
feature_drift: plots drift per Chunk for a single feature of a chunked data set.
prediction_drift: plots drift per Chunk for the predictions of a chunked data set.
feature_distribution: plots feature distribution per Chunk. Joyplot for continuous features, stacked bar charts for categorical features.
prediction_distribution: plots the prediction distribution per Chunk of a chunked data set as a joyplot.
- Parameters
kind (str, default=`feature_drift`) – The kind of plot you want to have. Value must be one of feature_drift, prediction_drift, feature_distribution or prediction_distribution.
metric (str, default=`statistic`) – The metric to plot. Value must be one of statistic or p_value.
feature_label (str) – Feature label identifying a feature according to the preset model metadata. The function will raise an exception when no feature of that label was found in the metadata. Either feature_label or feature_column_name should be specified.
feature_column_name (str) – Column name identifying a feature according to the preset model metadata. The function will raise an exception when no feature using that column name was found in the metadata. Either feature_column_name or feature_label should be specified.
class_label (str, default=None) – The label of the class to plot the prediction distribution for. Only required in case of multiclass models.
- Returns
fig – A Figure object containing the requested drift plot. Can be saved to disk or shown rendered on screen using fig.show().
- Return type
plotly.graph_objects.Figure
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W')
>>> drift_calc.fit(ref_df)
>>> drifts = drift_calc.calculate(ana_df)
>>> # loop over all features and plot the feature drift and feature distribution for each
>>> for f in metadata.features:
...     drifts.plot(kind='feature_drift', feature_label=f.label).show()
...     drifts.plot(kind='feature_distribution', feature_label=f.label).show()
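The argument rules described for plot() can be checked before calling it. The following is a minimal sketch of such a check. The helper name check_plot_args is hypothetical and not part of nannyml. The allowed values come from the parameter descriptions of plot(). The rule that only the feature_* plot kinds need a feature identifier is an assumption, since the documentation only says that either feature_label or feature_column_name should be specified.

```python
from typing import Optional

# Allowed values as listed in the plot() parameter descriptions.
PLOT_KINDS = (
    "feature_drift",
    "prediction_drift",
    "feature_distribution",
    "prediction_distribution",
)
METRICS = ("statistic", "p_value")


def check_plot_args(
    kind: str,
    metric: str = "statistic",
    feature_label: Optional[str] = None,
    feature_column_name: Optional[str] = None,
) -> None:
    """Hypothetical pre-flight check for plot() arguments (not a nannyml API)."""
    if kind not in PLOT_KINDS:
        raise ValueError(f"kind must be one of {PLOT_KINDS}, got {kind!r}")
    if metric not in METRICS:
        raise ValueError(f"metric must be one of {METRICS}, got {metric!r}")
    # Assumption: a feature identifier is only needed for feature_* plot kinds.
    needs_feature = kind.startswith("feature_")
    if needs_feature and feature_label is None and feature_column_name is None:
        raise ValueError(
            "either feature_label or feature_column_name should be specified"
        )
```

Calling check_plot_args(kind='feature_drift', feature_label=f.label) before drifts.plot(...) would surface a misspelled kind or metric as a ValueError with the list of accepted values, rather than relying on whatever error the library itself raises.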