nannyml.drift.target.target_distribution.calculator module
Module for target distribution monitoring.
- class nannyml.drift.target.target_distribution.calculator.TargetDistributionCalculator(model_metadata: nannyml.metadata.base.ModelMetadata, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[nannyml.chunk.Chunker] = None)
Bases: object
Calculates target distribution for a given dataset.
Constructs a new TargetDistributionCalculator.
- Parameters
model_metadata (ModelMetadata) – Metadata for the model whose data is to be processed.
chunk_size (int) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str) – Splits the data according to the given period, for example 'W' to chunk by week. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker) – The Chunker used to split the data sets into a list of chunks.
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> # Create a calculator that will chunk by week
>>> target_distribution_calc = nml.TargetDistributionCalculator(model_metadata=metadata, chunk_period='W')
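The three chunking options can be pictured with a small standard-library sketch. This is an illustration of the idea only, not nannyml's Chunker implementation: the helper names (chunk_by_size, chunk_by_number, chunk_by_week), the sample rows and the handling of a short final chunk are all assumptions made for the example.

```python
from datetime import date, timedelta
from itertools import groupby


def chunk_by_size(rows, chunk_size):
    """Consecutive chunks of chunk_size rows; the last chunk may be shorter."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def chunk_by_number(rows, chunk_number):
    """Split rows into chunk_number consecutive pieces of near-equal length."""
    base, extra = divmod(len(rows), chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        # The first `extra` chunks each take one additional row.
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def chunk_by_week(rows):
    """Group date-sorted (date, target) rows by ISO (year, week)."""
    def week_key(row):
        return tuple(row[0].isocalendar())[:2]
    return [list(group) for _, group in groupby(rows, key=week_key)]


# Hypothetical data: 21 daily (date, target) rows starting on a Monday.
rows = [(date(2024, 1, 1) + timedelta(days=i), i % 2) for i in range(21)]

by_size = chunk_by_size(rows, 5)      # analogous to chunk_size=5
by_number = chunk_by_number(rows, 4)  # analogous to chunk_number=4
by_week = chunk_by_week(rows)         # analogous to chunk_period='W'

print([len(c) for c in by_size])
print([len(c) for c in by_number])
print([len(c) for c in by_week])
```

Each strategy yields a different partition of the same rows, which is why only one of the three arguments should be supplied.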
- calculate(data: pandas.core.frame.DataFrame)
Calculates the target distribution of a binary classifier.
Requires fitting the calculator on reference data first.
- Parameters
data (pd.DataFrame) – Data for the model, i.e. model inputs, predictions and targets.
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> target_distribution_calc = nml.TargetDistributionCalculator(model_metadata=metadata, chunk_period='W')
>>> target_distribution_calc.fit(ref_df)
>>> # calculate target distribution
>>> target_distribution = target_distribution_calc.calculate(ana_df)
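For a binary classifier, one simple way to summarise the target distribution of a chunk is the fraction of positive labels it contains. The sketch below assumes that summary; the columns calculate actually returns are not described on this page, and the chunk names and target values are made up for illustration.

```python
def positive_rate(targets):
    """Fraction of positive (1) labels in a sequence of 0/1 targets."""
    return sum(targets) / len(targets)


# Hypothetical 0/1 targets, already split into weekly chunks.
chunks = {
    "week_1": [1, 0, 0, 1],
    "week_2": [1, 1, 1, 0],
    "week_3": [0, 0, 0, 1],
}

# One summary value per chunk.
rates = {name: positive_rate(targets) for name, targets in chunks.items()}

for name, rate in rates.items():
    print(name, rate)
```

A rate that moves noticeably from one chunk to the next is the kind of change target distribution monitoring is meant to surface.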
- fit(reference_data: pandas.core.frame.DataFrame) → nannyml.drift.target.target_distribution.calculator.TargetDistributionCalculator
Fits the calculator to reference data.
During fitting the reference target data is validated and stored for later use.
Examples
>>> import nannyml as nml
>>> ref_df, ana_df, _ = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(ref_df, model_type=nml.ModelType.CLASSIFICATION_BINARY)
>>> target_distribution_calc = nml.TargetDistributionCalculator(model_metadata=metadata, chunk_period='W')
>>> # fit the calculator on reference data
>>> target_distribution_calc.fit(ref_df)
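The fit-then-calculate contract can be sketched with a small stand-in class. SketchCalculator is hypothetical and is not the nannyml class: the validation rules, the error types and the value returned by calculate are assumptions chosen to keep the example short. What it mirrors from the documentation above is that fit validates and stores the reference targets, that fit returns the calculator itself, and that calculate needs a prior fit.

```python
class SketchCalculator:
    """Hypothetical stand-in illustrating the fit/calculate pattern."""

    def __init__(self):
        self._reference_targets = None

    def fit(self, reference_targets):
        # Validate the reference targets before storing them (assumed rules).
        if len(reference_targets) == 0:
            raise ValueError("reference data contains no targets")
        if any(t not in (0, 1) for t in reference_targets):
            raise ValueError("targets must be 0 or 1")
        self._reference_targets = list(reference_targets)
        return self  # returning self allows calc = SketchCalculator().fit(...)

    def calculate(self, targets):
        if self._reference_targets is None:
            raise RuntimeError("call fit() on reference data before calculate()")
        reference_rate = sum(self._reference_targets) / len(self._reference_targets)
        # Assumed output: shift in positive rate relative to the reference data.
        return sum(targets) / len(targets) - reference_rate


calc = SketchCalculator().fit([0, 1, 1, 0])
shift = calc.calculate([1, 1, 1, 0])
print(shift)
```

Storing the reference targets at fit time means later calls to calculate can compare analysis data against the same baseline without being handed the reference data again.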