nannyml.performance_estimation.confidence_based.cbpe module

Implementation of the CBPE estimator.

class nannyml.performance_estimation.confidence_based.cbpe.CBPE(model_metadata: nannyml.metadata.base.ModelMetadata, *args, **kwargs)

Bases: nannyml.performance_estimation.base.PerformanceEstimator

Performance estimator using the Confidence Based Performance Estimation (CBPE) technique.
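The core idea behind confidence-based estimation can be sketched in plain Python. If predicted probabilities are well calibrated, each prediction contributes its probability of being correct to an expected confusion matrix, and metrics such as accuracy follow without any targets. This is a minimal sketch, assuming binary classification, calibrated scores for the positive class and a hypothetical decision threshold of 0.5. The function names are illustrative only and are not part of the NannyML API.

```python
from typing import List, Tuple


# Hypothetical helpers illustrating the CBPE idea; not NannyML's API.
def expected_confusion_matrix(
    calibrated_probas: List[float], threshold: float = 0.5
) -> Tuple[float, float, float, float]:
    """Return expected (TP, FP, TN, FN) counts from calibrated P(y=1) scores."""
    tp = fp = tn = fn = 0.0
    for p in calibrated_probas:
        if p >= threshold:
            # Predicted positive: correct (TP) with probability p, otherwise FP.
            tp += p
            fp += 1.0 - p
        else:
            # Predicted negative: correct (TN) with probability 1 - p, otherwise FN.
            tn += 1.0 - p
            fn += p
    return tp, fp, tn, fn


def expected_accuracy(calibrated_probas: List[float], threshold: float = 0.5) -> float:
    """Expected accuracy (TP + TN) / total, computed from expected counts."""
    tp, fp, tn, fn = expected_confusion_matrix(calibrated_probas, threshold)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("cannot estimate accuracy on an empty chunk")
    return (tp + tn) / total
```

For the scores [0.9, 0.8, 0.2, 0.4], the expected counts are TP = 1.7, FP = 0.3, TN = 1.4 and FN = 0.6, giving an expected accuracy of about 0.775.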

Initializes a new CBPE performance estimator.

Parameters

model_metadata (ModelMetadata) – Metadata telling the estimator which columns are required for performance estimation.
metrics (List[str]) – A list of metrics to calculate.
features (List[str], default=None) – An optional list of feature column names. When set, only these columns will be included in the performance estimation. If not set, all feature columns will be used.
chunk_size (int, default=None) – Splits the data into chunks containing chunk_size observations each. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int, default=None) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str, default=None) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker, default=None) – The Chunker used to split the data sets into a list of chunks.
calibration (str, default='isotonic') – Determines which calibration will be applied to the model predictions. Defaults to isotonic, currently the only supported value.
calibrator (Calibrator, default=None) – A specific instance of a Calibrator to be applied to the model predictions. If not set, NannyML will use the value of the calibration parameter instead.
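The three chunking parameters can be illustrated with a minimal, dependency-free sketch. The helper names are hypothetical and not part of NannyML. The sketch assumes that a smaller final chunk is kept when rows do not divide evenly and that weekly periods follow ISO calendar weeks; the library's own handling of leftover rows and period boundaries may differ.

```python
from datetime import datetime
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


# Hypothetical helpers illustrating the chunking parameters; not NannyML's API.
def check_chunk_args(
    chunk_size: Optional[int] = None,
    chunk_number: Optional[int] = None,
    chunk_period: Optional[str] = None,
) -> None:
    """Reject calls that set more than one chunking option."""
    given = [arg for arg in (chunk_size, chunk_number, chunk_period) if arg is not None]
    if len(given) > 1:
        raise ValueError("Only one of chunk_size, chunk_number or chunk_period should be given.")


def chunk_by_size(rows: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Consecutive chunks of chunk_size rows; the last chunk may be smaller."""
    return [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]


def chunk_by_number(rows: Sequence[T], chunk_number: int) -> List[List[T]]:
    """Split rows into chunk_number consecutive chunks of near-equal size."""
    base, extra = divmod(len(rows), chunk_number)
    chunks: List[List[T]] = []
    start = 0
    for i in range(chunk_number):
        # The first `extra` chunks each take one additional row.
        end = start + base + (1 if i < extra else 0)
        chunks.append(list(rows[start:end]))
        start = end
    return chunks


def chunk_by_week(timestamps: Sequence[datetime]) -> List[List[datetime]]:
    """Group timestamps by ISO (year, week), similar in spirit to chunk_period='W'."""
    groups: Dict[Tuple[int, int], List[datetime]] = {}
    for ts in timestamps:
        year, week, _ = ts.isocalendar()
        groups.setdefault((year, week), []).append(ts)
    return [groups[key] for key in sorted(groups)]
```

For example, chunk_by_size(list(range(10)), 4) yields chunks of 4, 4 and 2 rows, while chunk_by_number(list(range(10)), 3) yields chunks of 4, 3 and 3 rows.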

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # create a new estimator, chunking by week
>>> estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_period='W')

static __new__(cls, model_metadata: nannyml.metadata.base.ModelMetadata, *args, **kwargs)

Creates a new CBPE subclass instance based on the type of the provided model_metadata.
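Returning a subclass instance from __new__ is a standard Python factory pattern. The sketch below assumes a hypothetical registry keyed on a model_type attribute; every class in it is a stand-in, and NannyML may select its subclass differently (for example with isinstance checks on the metadata).

```python
from typing import Dict, Optional, Type


# Hypothetical stand-ins; these are not NannyML classes.
class ModelMetadata:
    model_type = "unknown"


class BinaryClassificationMetadata(ModelMetadata):
    model_type = "classification_binary"


class MulticlassClassificationMetadata(ModelMetadata):
    model_type = "classification_multiclass"


class Estimator:
    # Maps a model type to the subclass that handles it.
    _registry: Dict[str, Type["Estimator"]] = {}

    def __init_subclass__(cls, model_type: Optional[str] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        if model_type is not None:
            Estimator._registry[model_type] = cls

    def __new__(cls, model_metadata: ModelMetadata, *args, **kwargs):
        subclass = Estimator._registry.get(model_metadata.model_type)
        if subclass is None:
            raise NotImplementedError(
                f"No estimator available for model type '{model_metadata.model_type}'."
            )
        # object.__new__ takes only the class. Because the returned object is an
        # instance of cls, Python then calls __init__ with the original arguments.
        return super().__new__(subclass)

    def __init__(self, model_metadata: ModelMetadata, *args, **kwargs):
        self.model_metadata = model_metadata


class BinaryEstimator(Estimator, model_type="classification_binary"):
    pass


class MulticlassEstimator(Estimator, model_type="classification_multiclass"):
    pass
```

Calling Estimator(BinaryClassificationMetadata()) then returns a BinaryEstimator instance, which mirrors how nml.CBPE(model_metadata=...) is documented to return a CBPE subclass.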

abstract estimate(data: pandas.core.frame.DataFrame) → nannyml.performance_estimation.confidence_based.results.CBPEPerformanceEstimatorResult

Estimates the performance of the monitored model on a given data set.

Parameters: data (pd.DataFrame) – The data set to estimate model performance for.
Returns: estimates – A result object where each row represents a Chunk, containing Chunk properties and the estimated metrics for that Chunk.
Return type: PerformanceEstimatorResult

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # create a new estimator and fit it on reference data
>>> estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_period='W').fit(ref_df)
>>> estimates = estimator.estimate(ana_df)

abstract fit(reference_data: pandas.core.frame.DataFrame) → nannyml.performance_estimation.base.PerformanceEstimator

Fits the estimator on a set of reference data.

Parameters: reference_data (pd.DataFrame) – A reference data set containing predictions (labels and/or probabilities) and target values.
Returns: estimator – The fitted estimator.
Return type: PerformanceEstimator

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # create a new estimator and fit it on reference data
>>> estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_period='W').fit(ref_df)
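Since calibration defaults to 'isotonic', fitting presumably involves learning an isotonic calibration mapping from the reference predictions and targets. The following is a minimal pool-adjacent-violators (PAVA) sketch of isotonic calibration, assuming binary targets and a single score column. fit_isotonic and calibrate are hypothetical names, and unlike scikit-learn's IsotonicRegression the lookup here is a step function without interpolation.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


# Hypothetical helpers illustrating isotonic calibration; not NannyML's API.
def fit_isotonic(
    scores: Sequence[float], targets: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing mapping from score to P(y=1) with PAVA.

    Returns (upper_bounds, means): block i covers scores up to upper_bounds[i]
    and maps them to the calibrated probability means[i].
    """
    # Pool observations with identical scores so ties share one value.
    grouped: Dict[float, Tuple[float, int]] = {}
    for score, target in zip(scores, targets):
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + target, count + 1)

    # Each block is [sum_of_targets, count, highest_score_in_block].
    blocks: List[List[float]] = []
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, float(count), score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            last_total, last_count, last_upper = blocks.pop()
            blocks[-1][0] += last_total
            blocks[-1][1] += last_count
            blocks[-1][2] = last_upper

    upper_bounds = [block[2] for block in blocks]
    means = [block[0] / block[1] for block in blocks]
    return upper_bounds, means


def calibrate(score: float, upper_bounds: List[float], means: List[float]) -> float:
    """Map a raw score to the mean of the first block whose upper bound >= score."""
    idx = bisect_left(upper_bounds, score)
    # Scores above the highest reference score fall into the last block.
    return means[min(idx, len(means) - 1)]
```

For instance, fitting on scores [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] with targets [0, 1, 0, 0, 1, 1] pools the scores 0.2, 0.3 and 0.4 into a single block with mean 1/3, so calibrate(0.35, ...) returns 1/3.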