nannyml.drift.model_inputs.univariate.statistical.results module
Module containing univariate statistical drift calculation results and associated plotting implementations.
- class nannyml.drift.model_inputs.univariate.statistical.results.UnivariateDriftResult(analysis_data: List[nannyml.chunk.Chunk], drift_data: pandas.core.frame.DataFrame, model_metadata: nannyml.metadata.base.ModelMetadata)
Bases: nannyml.drift.base.DriftResult
Contains the univariate statistical drift calculation results and provides additional plotting functionality.
Creates a new DriftResult instance.
- Parameters
analysis_data (List[Chunk]) – The data that was provided to calculate drift on. This is required in order to plot distributions.
drift_data (pd.DataFrame) – The results of the drift calculation.
model_metadata (ModelMetadata) – The metadata describing the monitored model. Used to
- plot(kind: str = 'feature', metric: str = 'statistic', feature_label: Optional[str] = None, feature_column_name: Optional[str] = None, class_label: Optional[str] = None, *args, **kwargs) → plotly.graph_objs._figure.Figure
Renders a line plot for a chosen metric of the univariate statistical drift calculation results.
Given either a feature label (check model_metadata.features) or the actual feature column name, and a metric (statistic or p_value), this function renders a line plot displaying the metric value for the selected feature per chunk. Chunks are set on a time-based X-axis by using the period containing their observations. Chunks of different partitions (reference and analysis) are represented using different colors and a vertical separation if the drift results contain multiple partitions.
The different plot kinds that are available:
- feature_drift: plots drift per Chunk for a single feature of a chunked data set.
- prediction_drift: plots drift per Chunk for the predictions of a chunked data set.
- feature_distribution: plots feature distribution per Chunk. Joyplot for continuous features, stacked bar charts for categorical features.
- prediction_distribution: plots the prediction distribution per Chunk of a chunked data set as a joyplot.
- Parameters
kind (str, default='feature_drift') – The kind of plot you want to have. Value must be one of feature_drift, prediction_drift, feature_distribution or prediction_distribution.
metric (str, default='statistic') – The metric to plot. Value must be one of statistic or p_value.
feature_label (str) – Feature label identifying a feature according to the preset model metadata. The function will raise an exception when no feature of that label was found in the metadata. Either feature_label or feature_column_name should be specified.
feature_column_name (str) – Column name identifying a feature according to the preset model metadata. The function will raise an exception when no feature using that column name was found in the metadata. Either feature_column_name or feature_label should be specified.
class_label (str, default=None) – The label of the class to plot the prediction distribution for. Only required in case of multiclass models.
- Returns
fig – A Figure object containing the requested drift plot. Can be saved to disk or rendered on screen using fig.show().
- Return type
plotly.graph_objects.Figure
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W')
>>> drift_calc.fit(ref_df)
>>> drifts = drift_calc.calculate(ana_df)
>>> # loop over all features and plot the feature drift and feature distribution for each
>>> for f in metadata.features:
...     drifts.plot(kind='feature_drift', feature_label=f.label).show()
...     drifts.plot(kind='feature_distribution', feature_label=f.label).show()
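The argument constraints documented for plot() can be summarized in a small standalone check. This is only a sketch: check_plot_args is a hypothetical helper, not part of NannyML, and it raises a plain ValueError rather than whatever exception the library uses. It also assumes that only the feature-level plot kinds need a feature identifier, which this page does not state explicitly.

```python
from typing import Optional

# Allowed values, copied from the parameter descriptions of plot().
PLOT_KINDS = (
    "feature_drift",
    "prediction_drift",
    "feature_distribution",
    "prediction_distribution",
)
METRICS = ("statistic", "p_value")


def check_plot_args(
    kind: str,
    metric: str = "statistic",
    feature_label: Optional[str] = None,
    feature_column_name: Optional[str] = None,
) -> None:
    """Hypothetical helper (not part of NannyML): reject invalid plot() arguments."""
    if kind not in PLOT_KINDS:
        raise ValueError(f"kind must be one of {PLOT_KINDS}, got {kind!r}")
    if metric not in METRICS:
        raise ValueError(f"metric must be one of {METRICS}, got {metric!r}")
    # Assumption: prediction-level plots do not need a feature identifier.
    needs_feature = kind.startswith("feature_")
    if needs_feature and feature_label is None and feature_column_name is None:
        raise ValueError(
            "either feature_label or feature_column_name should be specified"
        )


# Valid combinations pass silently.
check_plot_args("feature_drift", metric="p_value", feature_label="some_feature")
check_plot_args("prediction_distribution")
```

Under this check, the value 'feature' that appears as the default for kind in the signature would be rejected, since it is not among the four documented kinds. Passing kind explicitly, as the example above does, avoids relying on that default.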