nannyml.performance_estimation.confidence_based.cbpe module

Implementation of the CBPE estimator.

class nannyml.performance_estimation.confidence_based.cbpe.CBPE(model_metadata: nannyml.metadata.base.ModelMetadata, *args, **kwargs)[source]

Bases: nannyml.performance_estimation.base.PerformanceEstimator

Performance estimator using the Confidence Based Performance Estimation (CBPE) technique.
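
As a rough illustration of the idea behind confidence-based estimation, the sketch below estimates accuracy without any target values, assuming the model scores are already well-calibrated probabilities of the positive class. It is not NannyML's implementation: the helper name `expected_accuracy` and the 0.5 decision threshold are assumptions made for this example only.

```python
from typing import Sequence


def expected_accuracy(calibrated_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Estimate accuracy from calibrated positive-class probabilities alone.

    Hypothetical helper for illustration; not part of the nannyml API.
    """
    if not calibrated_probs:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p in calibrated_probs:
        predicted_positive = p >= threshold
        # With a calibrated score, a positive prediction is correct with
        # probability p and a negative prediction with probability 1 - p.
        total += p if predicted_positive else 1.0 - p
    return total / len(calibrated_probs)


# Four predictions: two confident positives, two fairly confident negatives.
print(expected_accuracy([0.9, 0.8, 0.3, 0.1]))
```

The same reasoning extends to other metrics by accumulating expected confusion-matrix counts instead of expected correct predictions. The estimate is only as good as the calibration, which is why the estimator is fitted on reference data with known targets.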

Initializes a new CBPE performance estimator.

Parameters
  • model_metadata (ModelMetadata) – Metadata telling the estimator which columns are required for performance estimation.

  • metrics (List[str]) – A list of metrics to calculate.

  • features (List[str], default=None) – An optional list of feature column names. When set, only these columns will be included in the estimation. If not set, all feature columns will be used.

  • chunk_size (int, default=None) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunk_number (int, default=None) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunk_period (str, default=None) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunker (Chunker, default=None) – The Chunker used to split the data sets into a list of chunks.

  • calibration (str, default='isotonic') – Determines which calibration will be applied to the model predictions. Defaults to isotonic, currently the only supported value.

  • calibrator (Calibrator, default=None) – A specific instance of a Calibrator to be applied to the model predictions. If not set, NannyML will use the value of the calibration parameter instead.
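
The chunking parameters above are mutually exclusive. The sketch below illustrates what size-based and number-based chunking mean on plain row positions. It is a simplified stand-in, not NannyML's Chunker: the helper name `split_rows`, the requirement that exactly one option is given, and the way leftover rows are distributed are all assumptions made for this example.

```python
from typing import List, Optional


def split_rows(n_rows: int,
               chunk_size: Optional[int] = None,
               chunk_number: Optional[int] = None) -> List[range]:
    """Hypothetical helper: split row positions 0..n_rows-1 into chunks."""
    # Reject both-given and neither-given; this sketch has no default chunking.
    if (chunk_size is None) == (chunk_number is None):
        raise ValueError("give exactly one of chunk_size or chunk_number")

    if chunk_size is not None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        # Fixed number of observations per chunk; the last chunk may be shorter.
        return [range(start, min(start + chunk_size, n_rows))
                for start in range(0, n_rows, chunk_size)]

    if chunk_number <= 0:
        raise ValueError("chunk_number must be positive")
    # Fixed number of chunks of near-equal size; early chunks absorb the remainder.
    base, extra = divmod(n_rows, chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


print([len(c) for c in split_rows(10, chunk_size=4)])
print([len(c) for c in split_rows(10, chunk_number=3)])
```

Period-based chunking (chunk_period) groups rows by timestamp instead and is not covered by this sketch.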

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df)
>>> # create a new estimator, chunking by week
>>> estimator = nml.CBPE(model_metadata=metadata, chunk_period='W')

static __new__(cls, model_metadata: nannyml.metadata.base.ModelMetadata, *args, **kwargs)[source]

Creates a new CBPE subclass instance based on the type of the provided model_metadata.

abstract estimate(data: pandas.core.frame.DataFrame) → nannyml.performance_estimation.confidence_based.results.CBPEPerformanceEstimatorResult[source]

Estimates the performance metrics for a given data set.

Parameters

data (pd.DataFrame) – The dataset to estimate performance for.

Returns

estimates – A result object where each row represents a Chunk, containing Chunk properties and the estimated metrics for that Chunk.

Return type

CBPEPerformanceEstimatorResult

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # create a new estimator and fit it on reference data
>>> estimator = nml.CBPE(model_metadata=metadata, chunk_period='W').fit(ref_df)
>>> # estimate performance on the analysis data
>>> estimates = estimator.estimate(ana_df)

abstract fit(reference_data: pandas.core.frame.DataFrame) → nannyml.performance_estimation.base.PerformanceEstimator[source]

Fits the estimator using a set of reference data.

Parameters

reference_data (pd.DataFrame) – A reference data set containing predictions (labels and/or probabilities) and target values.

Returns

estimator – The fitted estimator.

Return type

PerformanceEstimator

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # create a new estimator and fit it on reference data
>>> estimator = nml.CBPE(model_metadata=metadata, chunk_period='W').fit(ref_df)