nannyml.drift.model_inputs.univariate.statistical.results module

Contains the results of the univariate statistical drift calculation and provides plotting functionality.

class nannyml.drift.model_inputs.univariate.statistical.results.UnivariateStatisticalDriftCalculatorResult(results_data: DataFrame, calculator)

Bases: AbstractCalculatorResult

Contains the results of the univariate statistical drift calculation and provides plotting functionality.

Creates a new AbstractCalculatorResult instance.

Parameters:

results_data (pd.DataFrame) – The data returned by the Calculator.

property calculator_name: str

plot(kind: str = 'feature', metric: str = 'statistic', feature_column_name: Optional[str] = None, plot_reference: bool = False, *args, **kwargs) → Optional[Figure]

Renders plots for metrics returned by the univariate statistical drift calculator.

For any feature you can render the statistic values or the p-values as a step plot, or create a distribution plot. Select a plot using the kind parameter:

  • feature_drift

    plots drift per Chunk for a single feature of a chunked data set.

  • feature_distribution

    plots feature distribution per Chunk. Joyplot for continuous features, stacked bar charts for categorical features.

Parameters:
  • kind (str, default=feature_drift) – The kind of plot to render. Allowed values are feature_drift and feature_distribution.

  • metric (str, default=statistic) – The metric to plot. Allowed values are statistic and p_value. Not applicable when plotting distributions.

  • feature_column_name (Optional[str], default=None) – Column name identifying a feature according to the preset model metadata. An exception is raised when no feature with that column name is found in the metadata. Either feature_column_name or feature_label should be specified.

  • plot_reference (bool, default=False) – Whether to include the reference period in the plot.
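
The allowed values listed above can be checked before calling plot(). The following is a minimal sketch only: check_plot_args is a hypothetical helper that is not part of NannyML, and it encodes nothing beyond the values documented in this parameter list. It drops metric when a distribution plot is requested, since the metric does not apply there.

```python
from typing import Dict

# Values copied from the parameter descriptions above.
ALLOWED_KINDS = ("feature_drift", "feature_distribution")
ALLOWED_METRICS = ("statistic", "p_value")


def check_plot_args(kind: str, metric: str = "statistic") -> Dict[str, str]:
    """Hypothetical helper: validate kind/metric and return keyword arguments.

    Raises ValueError for any value that is not documented as allowed.
    """
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"kind must be one of {ALLOWED_KINDS}, got {kind!r}")

    kwargs = {"kind": kind}
    if kind == "feature_drift":
        # The metric only matters for drift plots.
        if metric not in ALLOWED_METRICS:
            raise ValueError(f"metric must be one of {ALLOWED_METRICS}, got {metric!r}")
        kwargs["metric"] = metric
    return kwargs
```

The returned dictionary could then be unpacked into a call such as results.plot(feature_column_name=..., **kwargs), where results is a calculator result as in the Examples section.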

Returns:

fig – A Figure object containing the requested drift plot.

It can be saved to disk with the write_image() method or rendered on screen with the show() method.
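
Because the signature declares the return value as Optional[Figure], calling code may want to guard against None before saving or showing the figure. The sketch below is an illustration under that assumption: export_figure is a hypothetical helper, not a NannyML function, and it relies only on the write_image() and show() methods named above. Static image export in Plotly usually needs an additional package such as kaleido to be installed.

```python
from typing import Any, Optional


def export_figure(fig: Any, path: Optional[str] = None) -> str:
    """Hypothetical helper: save the figure when a path is given, else show it.

    Returns a short status string so callers can log what happened.
    """
    if fig is None:
        # plot() is annotated as Optional[Figure], so nothing may come back.
        return "skipped"
    if path is not None:
        fig.write_image(path)  # write a static image to disk
        return "saved"
    fig.show()  # render on screen
    return "shown"
```

A call such as export_figure(results.plot(kind='feature_drift', feature_column_name='some_feature'), 'drift.png') would then save the plot, where 'some_feature' and 'drift.png' are placeholder names.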

Return type:

plotly.graph_objs._figure.Figure

Examples

>>> import nannyml as nml
>>>
>>> reference_df, analysis_df, _ = nml.load_synthetic_binary_classification_dataset()
>>>
>>> feature_column_names = [col for col in reference_df.columns
...                         if col not in ['y_pred', 'y_pred_proba', 'work_home_actual', 'timestamp']]
>>> calc = nml.UnivariateStatisticalDriftCalculator(
...     feature_column_names=feature_column_names,
...     timestamp_column_name='timestamp'
... )
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
>>> print(results.data)  # check the numbers
             key  start_index  ...  identifier_alert identifier_threshold
0       [0:4999]            0  ...              True                 0.05
1    [5000:9999]         5000  ...              True                 0.05
2  [10000:14999]        10000  ...              True                 0.05
3  [15000:19999]        15000  ...              True                 0.05
4  [20000:24999]        20000  ...              True                 0.05
5  [25000:29999]        25000  ...              True                 0.05
6  [30000:34999]        30000  ...              True                 0.05
7  [35000:39999]        35000  ...              True                 0.05
8  [40000:44999]        40000  ...              True                 0.05
9  [45000:49999]        45000  ...              True                 0.05
>>> for feature in calc.feature_column_names:
...     fig = results.plot(kind='feature_drift', metric='statistic', plot_reference=True,
...                        feature_column_name=feature)
...     fig.show()
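
The key column in the printed results.data above labels each Chunk with a string such as [0:4999]. The sketch below shows one way to turn such a label back into integers. It assumes the [start:end] format seen in that output and treats the end index as inclusive, which matches the contiguous ranges printed there. parse_chunk_key and chunk_size are hypothetical helpers, not part of NannyML.

```python
import re
from typing import Tuple

# Matches labels of the form "[start:end]" as printed in the key column.
_KEY_PATTERN = re.compile(r"^\[(\d+):(\d+)\]$")


def parse_chunk_key(key: str) -> Tuple[int, int]:
    """Hypothetical helper: split a chunk key into (start_index, end_index)."""
    match = _KEY_PATTERN.match(key.strip())
    if match is None:
        raise ValueError(f"unexpected chunk key format: {key!r}")
    return int(match.group(1)), int(match.group(2))


def chunk_size(key: str) -> int:
    """Number of rows in the chunk, assuming an inclusive end index."""
    start, end = parse_chunk_key(key)
    return end - start + 1
```

For the first row of the output above, parse_chunk_key('[0:4999]') gives (0, 4999) and chunk_size('[0:4999]') gives 5000.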