nannyml.drift.model_outputs.univariate.statistical.results module

Module containing univariate statistical drift calculation results and associated plotting implementations.

class nannyml.drift.model_outputs.univariate.statistical.results.UnivariateDriftResult(results_data: DataFrame, calculator: AbstractCalculator)[source]

Bases: AbstractCalculatorResult

Contains the results of the model output statistical drift calculation and provides plotting functionality.

Creates a new AbstractCalculatorResult instance.

Parameters:

results_data (pd.DataFrame) – The data returned by the Calculator.

property calculator_name: str
plot(kind: str = 'prediction_drift', metric: str = 'statistic', class_label: Optional[str] = None, plot_reference: bool = False, *args, **kwargs) → Optional[Figure][source]

Renders plots for metrics returned by the univariate statistical drift calculator.

For both model predictions (y_pred) and model outputs (y_pred_proba), you can render the statistic or the p-value as a step plot, or create a distribution plot. For multiclass use cases, a class_label must be provided when rendering model output plots.

Select a plot using the kind parameter:

  • prediction_drift

    plots the drift metric per Chunk for the model predictions y_pred.

  • prediction_distribution

    plots the distribution per Chunk for the model predictions y_pred.

  • score_drift

    plots the drift metric per Chunk for the model outputs y_pred_proba.

  • score_distribution

    plots the distribution per Chunk for the model outputs y_pred_proba.

Parameters:
  • kind (str, default='prediction_drift') – The kind of plot to render. Allowed values are prediction_drift, prediction_distribution, score_drift and score_distribution.

  • metric (str, default='statistic') – The metric to plot. Allowed values are statistic and p_value. Not applicable when plotting distributions.

  • plot_reference (bool, default=False) – Whether to include the reference period in the plot.

  • class_label (str, default=None) – The label of the class to plot the prediction distribution for. Only required for multiclass use cases.
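
The parameter rules above can be summarised as a small pre-flight check. This is a minimal sketch, not NannyML's own validation: the helper name check_plot_args and its multiclass flag are hypothetical and exist only for illustration. The allowed values are the ones listed above.

```python
from typing import Optional

# Allowed values, copied from the parameter descriptions above.
ALLOWED_KINDS = (
    "prediction_drift",
    "prediction_distribution",
    "score_drift",
    "score_distribution",
)
ALLOWED_METRICS = ("statistic", "p_value")


def check_plot_args(
    kind: str = "prediction_drift",
    metric: str = "statistic",
    class_label: Optional[str] = None,
    multiclass: bool = False,  # hypothetical flag, not a plot() parameter
) -> None:
    """Hypothetical helper: raise ValueError if the arguments break the documented rules."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind {kind!r}, expected one of {ALLOWED_KINDS}")

    # The metric only matters for the *_drift plots; distributions ignore it.
    if kind.endswith("_drift") and metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric {metric!r}, expected one of {ALLOWED_METRICS}")

    # Model output (score_*) plots need a class label in multiclass use cases.
    if multiclass and kind.startswith("score_") and class_label is None:
        raise ValueError("class_label is required for model output plots in multiclass use cases")
```

Calling check_plot_args(kind='score_drift', multiclass=True) raises a ValueError because no class_label was given, while the same call with a class_label passes silently.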

Returns:

fig – A Figure object containing the requested drift plot.

It can be saved to disk using the write_image() method or rendered on screen using the show() method.

Return type:

plotly.graph_objs._figure.Figure

Examples

>>> import nannyml as nml
>>>
>>> reference_df, analysis_df, _ = nml.load_synthetic_binary_classification_dataset()
>>>
>>> calc = nml.StatisticalOutputDriftCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     timestamp_column_name='timestamp'
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
>>>
>>> print(results.data)  # check the numbers
             key  start_index  ...  y_pred_proba_alert y_pred_proba_threshold
0       [0:4999]            0  ...                True                   0.05
1    [5000:9999]         5000  ...               False                   0.05
2  [10000:14999]        10000  ...               False                   0.05
3  [15000:19999]        15000  ...               False                   0.05
4  [20000:24999]        20000  ...               False                   0.05
5  [25000:29999]        25000  ...                True                   0.05
6  [30000:34999]        30000  ...                True                   0.05
7  [35000:39999]        35000  ...                True                   0.05
8  [40000:44999]        40000  ...                True                   0.05
9  [45000:49999]        45000  ...                True                   0.05
>>>
>>> results.plot(kind='score_drift', plot_reference=True).show()
>>> results.plot(kind='score_distribution', plot_reference=True).show()
>>> results.plot(kind='prediction_drift', plot_reference=True).show()
>>> results.plot(kind='prediction_distribution', plot_reference=True).show()
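
To see which chunks raised an alert without reading the table by eye, the key and y_pred_proba_alert columns can be filtered. The sketch below avoids any dependency by transcribing those two columns from the printed output above into plain tuples. The rows list is a stand-in for results.data, not something NannyML provides.

```python
# (key, y_pred_proba_alert) pairs transcribed from the printed results above.
rows = [
    ("[0:4999]", True),
    ("[5000:9999]", False),
    ("[10000:14999]", False),
    ("[15000:19999]", False),
    ("[20000:24999]", False),
    ("[25000:29999]", True),
    ("[30000:34999]", True),
    ("[35000:39999]", True),
    ("[40000:44999]", True),
    ("[45000:49999]", True),
]

# Keep only the chunk keys whose alert flag is set.
alerting_chunks = [key for key, alert in rows if alert]

for key in alerting_chunks:
    print(key)
```

On the actual DataFrame, a boolean mask on the same column should give the equivalent selection, for example results.data.loc[results.data['y_pred_proba_alert'], 'key'].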