nannyml.drift.model_inputs.multivariate.data_reconstruction.calculator module

Drift calculator using Reconstruction Error as a measure of drift.

class nannyml.drift.model_inputs.multivariate.data_reconstruction.calculator.DataReconstructionDriftCalculator(model_metadata, features: Optional[List[str]] = None, n_components: Union[int, float, str] = 0.65, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[nannyml.chunk.Chunker] = None, imputer_categorical: Optional[sklearn.impute._base.SimpleImputer] = None, imputer_continuous: Optional[sklearn.impute._base.SimpleImputer] = None)[source]

Bases: nannyml.drift.base.DriftCalculator

DriftCalculator implementation using Reconstruction Error as a measure of drift.

Creates a new DataReconstructionDriftCalculator instance.
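
The idea behind reconstruction error can be sketched with NumPy alone. The sketch below is not the NannyML implementation: it assumes that features are standardized with reference statistics, that PCA keeps enough components to exceed a target explained-variance ratio, and that the error is the mean Euclidean distance between each row and its reconstruction. The function names fit_reference and reconstruction_error are hypothetical.

```python
import numpy as np


def fit_reference(reference, variance_to_keep=0.65):
    """Learn scaling statistics and principal axes from reference data."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    scaled = (reference - mean) / std
    # Rows of vt are the principal axes, ordered by explained variance.
    _, singular_values, vt = np.linalg.svd(scaled, full_matrices=False)
    ratios = singular_values**2 / np.sum(singular_values**2)
    # Smallest number of components whose cumulative ratio exceeds the target.
    k = int(np.searchsorted(np.cumsum(ratios), variance_to_keep, side="right")) + 1
    k = min(k, vt.shape[0])
    return mean, std, vt[:k]


def reconstruction_error(data, mean, std, components):
    """Mean Euclidean distance between scaled rows and their reconstruction."""
    scaled = (data - mean) / std
    # Project onto the kept axes, then map back to the original feature space.
    reconstructed = scaled @ components.T @ components
    return float(np.mean(np.linalg.norm(scaled - reconstructed, axis=1)))


rng = np.random.default_rng(0)

# Reference data: three features driven by one shared latent factor.
latent = rng.normal(size=(2000, 1))
reference = latent + 0.1 * rng.normal(size=(2000, 3))
mean, std, components = fit_reference(reference)

# Drifted data: the third feature no longer follows the latent factor.
latent_new = rng.normal(size=(500, 1))
drifted = latent_new + 0.1 * rng.normal(size=(500, 3))
drifted[:, 2] = rng.normal(size=500)

reference_error = reconstruction_error(reference, mean, std, components)
drifted_error = reconstruction_error(drifted, mean, std, components)
print(reference_error, drifted_error)
```

In this synthetic setup no single feature changes its own distribution much, but the relationship between features breaks, and the error on the drifted data is several times the error on the reference data.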

Parameters
  • model_metadata (ModelMetadata) – Metadata for the model whose data is to be processed.

  • features (List[str], default=None) – An optional list of feature names to use during drift calculation. When None (the default), all features are used.

  • n_components (Union[int, float, str], default=0.65) – The n_components parameter as passed to the sklearn.decomposition.PCA constructor. See https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

  • chunk_size (int) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunk_period (str) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.

  • chunker (Chunker) – The Chunker used to split the data sets into a list of chunks.

  • imputer_categorical (SimpleImputer) – The SimpleImputer used to impute categorical features in the data. Defaults to the most_frequent strategy.

  • imputer_continuous (SimpleImputer) – The SimpleImputer used to impute continuous features in the data. Defaults to the mean strategy.
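
The difference between chunk_size and chunk_number can be illustrated with a small pure-Python sketch. The helpers split_by_size and split_by_number are hypothetical and are not part of NannyML. How the library treats leftover rows is not stated here, so the handling below (a shorter final chunk, or spreading the extra rows over the first chunks) is an assumption.

```python
from typing import List, Sequence


def split_by_size(rows: Sequence, chunk_size: int) -> List[list]:
    """Fixed number of rows per chunk; the last chunk may be shorter (assumption)."""
    return [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]


def split_by_number(rows: Sequence, chunk_number: int) -> List[list]:
    """Fixed number of chunks; extra rows go to the first chunks (assumption)."""
    base, extra = divmod(len(rows), chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


rows = list(range(10))
by_size = split_by_size(rows, chunk_size=4)
by_number = split_by_number(rows, chunk_number=3)
print([len(c) for c in by_size])    # chunk lengths when fixing the size
print([len(c) for c in by_number])  # chunk lengths when fixing the count
```

With chunk_size the number of chunks grows with the data, while with chunk_number the chunk length does, which is why only one of the chunking options should be supplied.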

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator that will chunk by week
>>> drift_calc = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_period='W')

calculate(data: pandas.core.frame.DataFrame) → nannyml.drift.model_inputs.multivariate.data_reconstruction.results.DataReconstructionDriftCalculatorResult[source]

Calculates the data reconstruction drift for a given data set.

Parameters

data (pd.DataFrame) – The dataset to calculate the reconstruction drift for.

Returns

reconstruction_drift – A result object where each row represents a Chunk, containing Chunk properties and the reconstruction_drift calculated for that Chunk.

Return type

DataReconstructionDriftCalculatorResult

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator and fit it
>>> drift_calc = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_period='W').fit(ref_df)
>>> # Calculate drift on the analysis data
>>> drift = drift_calc.calculate(ana_df)

fit(reference_data: pandas.core.frame.DataFrame)[source]

Fits the drift calculator using a set of reference data.

Parameters

reference_data (pd.DataFrame) – A reference data set containing predictions (labels and/or probabilities) and target values.

Returns

calculator – The fitted calculator.

Return type

DriftCalculator
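
The fit-then-calculate pattern can be illustrated with the default imputation strategies named in the constructor parameters (mean for continuous features, most_frequent for categorical ones). The sketch below uses only the standard library. It assumes that fill values are learned from the reference data and then reused on later data, which this page does not confirm. The helpers fit_fill_values and impute are hypothetical.

```python
from statistics import mean, mode


def fit_fill_values(reference_rows, continuous, categorical):
    """Learn one fill value per feature from the reference rows."""
    fill = {}
    for name in continuous:
        # Mean of the observed (non-missing) values.
        fill[name] = mean(r[name] for r in reference_rows if r[name] is not None)
    for name in categorical:
        # Most frequent observed value.
        fill[name] = mode(r[name] for r in reference_rows if r[name] is not None)
    return fill


def impute(rows, fill):
    """Replace missing values with the fill values learned during fitting."""
    return [
        {key: (fill[key] if value is None and key in fill else value)
         for key, value in row.items()}
        for row in rows
    ]


reference_rows = [
    {"age": 30, "city": "A"},
    {"age": 50, "city": "A"},
    {"age": None, "city": "B"},
]
fill = fit_fill_values(reference_rows, continuous=["age"], categorical=["city"])

analysis_rows = [{"age": None, "city": None}, {"age": 41, "city": "B"}]
imputed = impute(analysis_rows, fill)
print(imputed)
```

Learning the fill values once keeps later data comparable to the reference, since a shift in the later data does not change how its missing values are filled.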

Examples

>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator and fit it
>>> drift_calc = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_period='W').fit(ref_df)