nannyml.stats.avg.calculator module
Simple Statistics Average Calculator.
- class nannyml.stats.avg.calculator.SummaryStatsAvgCalculator(column_names: Union[str, List[str]], timestamp_column_name: Optional[str] = None, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[Chunker] = None, threshold: Threshold = StandardDeviationThreshold{'std_lower_multiplier': 3, 'std_upper_multiplier': 3, 'offset_from': <function nanmean>})
Bases: AbstractCalculator
SummaryStatsAvgCalculator implementation.
Creates a new SummaryStatsAvgCalculator instance.
- Parameters:
column_names (Union[str, List[str]]) – A string or list containing the names of features in the provided data set. The average will be calculated for each entry in this list.
timestamp_column_name (str) – The name of the column containing the timestamp of the model prediction.
chunk_size (int) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker) – The Chunker used to split the data sets into a list of chunks.
threshold (Appropriate Threshold subclass) – Defines the alert threshold strategy. Defaults to StandardDeviationThreshold().
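As a rough illustration of how chunk_size and the default threshold interact, the sketch below computes per-chunk averages and flags chunks that fall outside a band of three standard deviations around the mean of the reference chunk averages. It does not use NannyML: chunk_means and std_thresholds are hypothetical helpers, and dropping a trailing partial chunk and using the population standard deviation are simplifying assumptions, not confirmed library behaviour.

```python
from statistics import fmean, pstdev


def chunk_means(values, chunk_size):
    """Split values into consecutive chunks of chunk_size and average each one.

    Hypothetical helper. Assumption: a trailing partial chunk is dropped.
    """
    n_full = len(values) // chunk_size
    return [
        fmean(values[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_full)
    ]


def std_thresholds(reference_means, lower_multiplier=3, upper_multiplier=3):
    """Return (lower, upper) limits centred on the mean of the reference values.

    Hypothetical helper. Assumption: population standard deviation is used.
    """
    center = fmean(reference_means)
    spread = pstdev(reference_means)
    return center - lower_multiplier * spread, center + upper_multiplier * spread


# Made-up data for illustration only.
reference = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0]
analysis = [11.0, 12.0, 30.0, 32.0]

# Limits are derived from the reference data only.
ref_means = chunk_means(reference, chunk_size=2)
lower, upper = std_thresholds(ref_means)

# Each analysis chunk is compared against those fixed limits.
analysis_means = chunk_means(analysis, chunk_size=2)
alerts = [not (lower <= m <= upper) for m in analysis_means]
print(alerts)
```

In this sketch the second analysis chunk raises an alert because its average lies far above the upper limit derived from the reference chunks.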
Examples
>>> import nannyml as nml
>>> reference, analysis, _ = nml.load_synthetic_car_price_dataset()
>>> column_names = ['car_value', 'debt_to_income_ratio', 'driver_tenure']
>>> calc = nml.SummaryStatsAvgCalculator(
...     column_names=column_names,
...     timestamp_column_name='timestamp',
... ).fit(reference)
>>> res = calc.calculate(analysis)
>>> for column_name in res.column_names:
...     res.filter(period='analysis', column_name=column_name).plot().show()