nannyml.drift.model_inputs.univariate.statistical.calculator module
Statistical drift calculation using Kolmogorov-Smirnov and chi2-contingency tests.
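As a rough illustration of the two tests named above, the sketch below computes the two-sample Kolmogorov-Smirnov statistic for a continuous feature and the Pearson chi-squared statistic of a 2 x k contingency table for a categorical feature, using only the Python standard library. The function names `ks_statistic` and `chi2_statistic` are hypothetical and are not part of NannyML. The sketch computes test statistics only (no p-values, no continuity correction) and makes no claim about how the calculator implements these tests internally.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, analysis):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; not part of NannyML.
    """
    ref = sorted(reference)
    ana = sorted(analysis)
    if not ref or not ana:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every observed value.
    for x in ref + ana:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_ana = bisect_right(ana, x) / len(ana)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_ana))
    return largest_gap


def chi2_statistic(reference, analysis):
    """Pearson chi-squared statistic of the 2 x k table of category counts.

    Hypothetical helper for illustration; not part of NannyML.
    """
    if not reference or not analysis:
        raise ValueError("both samples must be non-empty")
    ref_counts = Counter(reference)
    ana_counts = Counter(analysis)
    n_ref, n_ana = len(reference), len(analysis)
    total = n_ref + n_ana
    statistic = 0.0
    for category in set(ref_counts) | set(ana_counts):
        column_total = ref_counts[category] + ana_counts[category]
        for observed, row_total in (
            (ref_counts[category], n_ref),
            (ana_counts[category], n_ana),
        ):
            # Expected count if the category were equally likely in both samples.
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A statistic of 0 means the two samples look identical under that test, and larger values indicate a larger difference. Where p-values are needed, SciPy's `scipy.stats.ks_2samp` and `scipy.stats.chi2_contingency` implement the full tests.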
- class nannyml.drift.model_inputs.univariate.statistical.calculator.UnivariateStatisticalDriftCalculator(model_metadata: nannyml.metadata.base.ModelMetadata, features: Optional[List[str]] = None, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[nannyml.chunk.Chunker] = None)
Bases: nannyml.drift.base.DriftCalculator
A drift calculator that relies on statistics to detect drift.
Constructs a new UnivariateStatisticalDriftCalculator.
- Parameters
model_metadata (ModelMetadata) – Metadata for the model whose data is to be processed.
features (List[str], default=None) – An optional list of feature names to use during drift calculation. Defaults to None, in which case all features are used.
chunk_size (int) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker) – The Chunker used to split the data sets into a list of chunks.
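The three chunking options can be pictured with the minimal sketch below, written against plain Python lists. The helpers `chunk_by_size`, `chunk_by_number` and `chunk_by_week` are hypothetical and are not NannyML functions. How the library treats a trailing incomplete chunk or an uneven split is not stated here, so the remainder handling shown is an assumption, as is the use of ISO calendar weeks to stand in for a weekly chunk_period.

```python
from itertools import groupby


def chunk_by_size(rows, chunk_size):
    """Consecutive chunks of chunk_size rows; the last chunk may be shorter (assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def chunk_by_number(rows, chunk_number):
    """chunk_number consecutive chunks; earlier chunks absorb any remainder (assumption)."""
    if chunk_number <= 0:
        raise ValueError("chunk_number must be positive")
    base, extra = divmod(len(rows), chunk_number)
    chunks, start = [], 0
    for index in range(chunk_number):
        end = start + base + (1 if index < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def chunk_by_week(rows, timestamp_of):
    """One chunk per ISO calendar week, ordered by time.

    timestamp_of is a hypothetical callable returning a date or datetime for a row.
    """
    def week_key(row):
        # (ISO year, ISO week number) identifies a calendar week.
        return tuple(timestamp_of(row).isocalendar())[:2]

    ordered = sorted(rows, key=timestamp_of)
    return [list(group) for _, group in groupby(ordered, key=week_key)]
```

For ten rows, `chunk_by_size(rows, 4)` yields chunks of 4, 4 and 2 rows, while `chunk_by_number(rows, 3)` yields chunks of 4, 3 and 3 rows.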
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df)
>>> # Create a calculator that will chunk by week
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W')
- calculate(data: pandas.core.frame.DataFrame) → nannyml.drift.model_inputs.univariate.statistical.results.UnivariateDriftResult
Calculates the univariate statistical drift for a given data set.
- Parameters
data (pd.DataFrame) – The dataset to calculate the drift for.
- Returns
univariate_drift – A result object where each row represents a Chunk, containing Chunk properties and the drift calculated for that Chunk.
- Return type
UnivariateDriftResult
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator and fit it
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W').fit(ref_df)
>>> # Calculate drift on the analysis data
>>> drift = drift_calc.calculate(ana_df)
- fit(reference_data: pandas.core.frame.DataFrame)
Fits the drift calculator using a set of reference data.
- Parameters
reference_data (pd.DataFrame) – A reference data set containing predictions (labels and/or probabilities) and target values.
- Returns
calculator – The fitted calculator.
- Return type
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator and fit it
>>> drift_calc = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_period='W').fit(ref_df)
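The fit-then-calculate pattern documented for `fit` and `calculate` can be sketched with a toy class. `ToyDriftCalculator` is a hypothetical stand-in operating on a single numeric feature held in a plain list, and it is not NannyML's implementation. It only illustrates why `fit` returning the fitted calculator allows the constructor and `fit` to be chained, and what "one row per chunk" might look like, assuming fixed-size chunks and a Kolmogorov-Smirnov statistic per chunk.

```python
from bisect import bisect_right


class ToyDriftCalculator:
    """Hypothetical single-feature drift calculator, for illustration only."""

    def __init__(self, chunk_size):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.chunk_size = chunk_size
        self._reference = None

    def fit(self, reference_values):
        """Store the reference sample and return self so calls can be chained."""
        if not reference_values:
            raise ValueError("reference data must be non-empty")
        self._reference = sorted(reference_values)
        return self

    def calculate(self, values):
        """Return one row (a dict) per chunk with its KS statistic against the reference."""
        if self._reference is None:
            raise RuntimeError("call fit() before calculate()")
        rows = []
        for start in range(0, len(values), self.chunk_size):
            chunk = sorted(values[start:start + self.chunk_size])
            # Largest gap between the reference and chunk empirical CDFs.
            statistic = max(
                abs(
                    bisect_right(self._reference, x) / len(self._reference)
                    - bisect_right(chunk, x) / len(chunk)
                )
                for x in self._reference + chunk
            )
            rows.append(
                {
                    "start_index": start,
                    "end_index": start + len(chunk) - 1,
                    "statistic": statistic,
                }
            )
        return rows
```

With this toy class, `ToyDriftCalculator(chunk_size=3).fit(reference).calculate(analysis)` mirrors the chained call in the examples above: the first expression builds and fits the calculator, and `calculate` then compares each chunk of the analysis data against the stored reference.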