nannyml.base module
Module containing base classes for drift calculation.
- class nannyml.base.Abstract1DResult(results_data: DataFrame, *args, **kwargs)
Bases: AbstractResult, ABC
Creates a new Abstract1DResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
- property chunk_end_dates: Series
- property chunk_end_indices: Series
- property chunk_indices: Series
- property chunk_keys: Series
- property chunk_periods: Series
- property chunk_start_dates: Series
- property chunk_start_index: Series
- property chunk_start_indices: Series
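The chunk_* properties expose per-chunk metadata as pandas Series. The sketch below is a hypothetical stand-in, not the nannyml implementation. It assumes results_data is a flat DataFrame with one row per chunk and columns named after the DEFAULT_COLUMNS listed for AbstractResult. The class name FlatChunkResult and the sample values are made up. chunk_start_index is left out because its relationship to chunk_start_indices is not documented here.

```python
import pandas as pd


class FlatChunkResult:
    """Hypothetical stand-in, not nannyml's implementation.

    Assumes one row per chunk and flat columns named 'key', 'chunk_index',
    'start_index', 'end_index', 'start_date', 'end_date' and 'period'.
    """

    def __init__(self, results_data: pd.DataFrame):
        # Copy so later changes to the caller's frame do not leak in.
        self.data = results_data.copy()

    @property
    def chunk_keys(self) -> pd.Series:
        return self.data['key']

    @property
    def chunk_indices(self) -> pd.Series:
        return self.data['chunk_index']

    @property
    def chunk_start_indices(self) -> pd.Series:
        return self.data['start_index']

    @property
    def chunk_end_indices(self) -> pd.Series:
        return self.data['end_index']

    @property
    def chunk_start_dates(self) -> pd.Series:
        return self.data['start_date']

    @property
    def chunk_end_dates(self) -> pd.Series:
        return self.data['end_date']

    @property
    def chunk_periods(self) -> pd.Series:
        return self.data['period']


# Made-up results for two chunks.
results = pd.DataFrame({
    'key': ['[0:4]', '[5:9]'],
    'chunk_index': [0, 1],
    'start_index': [0, 5],
    'end_index': [4, 9],
    'start_date': pd.to_datetime(['2024-01-01', '2024-01-06']),
    'end_date': pd.to_datetime(['2024-01-05', '2024-01-10']),
    'period': ['reference', 'analysis'],
})
result = FlatChunkResult(results)
print(result.chunk_keys.tolist())         # ['[0:4]', '[5:9]']
print(result.chunk_end_indices.tolist())  # [4, 9]
```

Returning Series rather than lists keeps the chunk metadata aligned on the same index as the rest of the results.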
- class nannyml.base.Abstract2DResult(results_data: DataFrame, *args, **kwargs)
Bases: AbstractResult, ABC
Creates a new Abstract2DResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
- property chunk_end_dates: Series
- property chunk_end_indices: Series
- property chunk_indices: Series
- property chunk_keys: Series
- property chunk_periods: Series
- property chunk_start_dates: Series
- property chunk_start_index: Series
- property chunk_start_indices: Series
- class nannyml.base.AbstractCalculator(chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[Chunker] = None, timestamp_column_name: Optional[str] = None)
Bases: ABC
Base class for drift calculation.
Creates a new instance of an abstract calculator.
- Parameters:
chunk_size (int) – Splits the data into chunks containing chunk_size observations each. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker) – The Chunker used to split the data set into a list of chunks.
timestamp_column_name (str) – The column name of the column containing timestamp information.
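The three chunking parameters describe different ways of cutting the same data. The following is a minimal sketch of what each option means, not nannyml's Chunker code. The helper split_into_chunks is hypothetical, and it simplifies the documented rule by requiring exactly one option, whereas the real classes may fall back to a default when none is given.

```python
from typing import List, Optional

import pandas as pd


def split_into_chunks(
    data: pd.DataFrame,
    chunk_size: Optional[int] = None,
    chunk_number: Optional[int] = None,
    chunk_period: Optional[str] = None,
    timestamp_column_name: Optional[str] = None,
) -> List[pd.DataFrame]:
    """Hypothetical helper illustrating the three chunking options."""
    given = sum(p is not None for p in (chunk_size, chunk_number, chunk_period))
    if given != 1:
        # Simplification: this sketch requires exactly one option.
        raise ValueError("give exactly one of chunk_size, chunk_number or chunk_period")

    if chunk_size is not None:
        # Consecutive slices of chunk_size rows; the last one may be smaller.
        return [data.iloc[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    if chunk_number is not None:
        # chunk_number slices whose sizes differ by at most one row.
        bounds = [i * len(data) // chunk_number for i in range(chunk_number + 1)]
        return [data.iloc[a:b] for a, b in zip(bounds, bounds[1:])]

    # chunk_period: group rows by the calendar period of the timestamp column.
    if timestamp_column_name is None:
        raise ValueError("chunk_period requires timestamp_column_name")
    periods = data[timestamp_column_name].dt.to_period(chunk_period)
    return [group for _, group in data.groupby(periods, sort=True)]


# Ten daily observations from 2024-01-30 to 2024-02-08.
df = pd.DataFrame({'ts': pd.date_range('2024-01-30', periods=10, freq='D'), 'x': range(10)})
print([len(c) for c in split_into_chunks(df, chunk_size=4)])    # [4, 4, 2]
print([len(c) for c in split_into_chunks(df, chunk_number=3)])  # [3, 3, 4]
print([len(c) for c in split_into_chunks(df, chunk_period='M', timestamp_column_name='ts')])  # [2, 8]
```

Period-based chunks follow the calendar, so their sizes depend on the timestamps rather than on the row count; this is why chunk_period needs timestamp_column_name.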
- class nannyml.base.AbstractEstimator(chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[Chunker] = None, timestamp_column_name: Optional[str] = None)
Bases: ABC
Base class for estimators.
Creates a new instance of an abstract estimator.
- Parameters:
chunk_size (int) – Splits the data into chunks containing chunk_size observations each. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker) – The Chunker used to split the data set into a list of chunks.
timestamp_column_name (str) – The column name of the column containing timestamp information.
- class nannyml.base.AbstractEstimatorResult(results_data: DataFrame)
Bases: ABC
Contains the results of a drift calculation and provides additional functionality such as plotting.
The result of the calculate() method of a DriftCalculator. It is an abstract class containing shared properties and methods across implementations. For each DriftCalculator class there will be an associated DriftResult implementation.
Creates a new DriftResult instance.
- Parameters:
results_data (pd.DataFrame) – The result data of the performed calculation.
- DEFAULT_COLUMNS = ['key', 'chunk_index', 'start_index', 'end_index', 'start_date', 'end_date', 'period']
- property empty: bool
- class nannyml.base.AbstractResult(results_data: DataFrame, *args, **kwargs)
Bases: ABC
Contains the results of a calculation and provides plotting functionality.
The result of the calculate() method of an AbstractCalculator. It is an abstract class containing shared properties and methods across implementations. For each AbstractCalculator class there will be a corresponding AbstractCalculatorResult implementation.
Creates a new AbstractResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
- DEFAULT_COLUMNS = ('key', 'chunk_index', 'start_index', 'end_index', 'start_date', 'end_date', 'period')
- property empty: bool
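The pairing described above, where each calculator's calculate() returns its own result subclass wrapping a DataFrame, can be sketched as follows. Every class here (ResultSketch, CalculatorSketch, RowCountCalculator, RowCountResult) is hypothetical and only mirrors the documented shape: a DEFAULT_COLUMNS tuple, a results_data constructor argument and an empty property. The toy calculator and its '[start:end]' key format are made up for illustration.

```python
from abc import ABC, abstractmethod

import pandas as pd


class ResultSketch(ABC):
    """Hypothetical result base: wraps the DataFrame a calculator returns."""

    DEFAULT_COLUMNS = ('key', 'chunk_index', 'start_index', 'end_index',
                       'start_date', 'end_date', 'period')

    def __init__(self, results_data: pd.DataFrame):
        # Keep a private copy of the calculator output.
        self.data = results_data.copy(deep=True)

    @property
    def empty(self) -> bool:
        # True when the calculation produced no rows.
        return self.data.empty


class CalculatorSketch(ABC):
    """Hypothetical calculator base: each subclass returns its own result type."""

    @abstractmethod
    def calculate(self, data: pd.DataFrame) -> ResultSketch:
        ...


class RowCountResult(ResultSketch):
    """Result type paired with RowCountCalculator."""


class RowCountCalculator(CalculatorSketch):
    """Toy calculator: counts the rows in each fixed-size chunk."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    def calculate(self, data: pd.DataFrame) -> RowCountResult:
        rows = []
        for index, start in enumerate(range(0, len(data), self.chunk_size)):
            end = min(start + self.chunk_size, len(data)) - 1  # inclusive end
            rows.append({'key': f'[{start}:{end}]', 'chunk_index': index,
                         'start_index': start, 'end_index': end,
                         'row_count': end - start + 1})
        return RowCountResult(pd.DataFrame(rows))


result = RowCountCalculator(chunk_size=4).calculate(pd.DataFrame({'x': range(10)}))
print(result.data['row_count'].tolist())  # [4, 4, 2]
print(RowCountCalculator(chunk_size=4).calculate(pd.DataFrame({'x': []})).empty)  # True
```

Keeping shared behaviour such as empty on the base result class means every calculator gets it for free; only the per-chunk metric columns differ between implementations.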
- class nannyml.base.PerColumnResult(results_data: DataFrame, column_names: Union[str, List[str]] = [], *args, **kwargs)
Bases: Abstract1DResult, ABC
Creates a new PerColumnResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
- class nannyml.base.PerMetricPerColumnResult(results_data: DataFrame, metrics: list[MetricLike] = [], column_names: List[str] = [], *args, **kwargs)
Bases: Abstract2DResult, ABC, Generic[MetricLike]
Creates a new PerMetricPerColumnResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
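PerMetricPerColumnResult is generic over MetricLike, so a result can be typed by the kind of metric it holds. The sketch below shows that typing pattern with a two-level column index of (column name, metric). It is hypothetical: MetricSketch, PerMetricPerColumnSketch, its values() method, the column layout and the sample numbers are all assumptions for illustration, not nannyml's internal layout. It also uses None defaults instead of [] so instances never share one mutable list.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

import pandas as pd


@dataclass(frozen=True)
class MetricSketch:
    """Hypothetical metric object; only its column label matters here."""
    column_name: str


MetricLike = TypeVar('MetricLike', bound=MetricSketch)


class PerMetricPerColumnSketch(Generic[MetricLike]):
    """Hypothetical 2D result keyed by (column name, metric)."""

    def __init__(self, results_data: pd.DataFrame,
                 metrics: Optional[List[MetricLike]] = None,
                 column_names: Optional[List[str]] = None):
        self.data = results_data
        # None instead of [] avoids sharing one mutable default list.
        self.metrics = metrics if metrics is not None else []
        self.column_names = column_names if column_names is not None else []

    def values(self, column_name: str, metric: MetricLike) -> pd.Series:
        # Assumes two-level columns: (feature column name, metric column name).
        return self.data[(column_name, metric.column_name)]


# Made-up values for two features and one metric over two chunks.
jsd = MetricSketch('jensen_shannon')
data = pd.DataFrame({('age', 'jensen_shannon'): [0.1, 0.3],
                     ('income', 'jensen_shannon'): [0.2, 0.05]})
result = PerMetricPerColumnSketch[MetricSketch](data, metrics=[jsd],
                                                column_names=['age', 'income'])
print(result.values('income', jsd).tolist())  # [0.2, 0.05]
```

A PerMetricResult or PerColumnResult corresponds to dropping one of the two keys, which is why those classes build on Abstract1DResult instead.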
- class nannyml.base.PerMetricResult(results_data: DataFrame, metrics: list[MetricLike] = [], *args, **kwargs)
Bases: Abstract1DResult, ABC, Generic[MetricLike]
Creates a new PerMetricResult instance.
- Parameters:
results_data (pd.DataFrame) – The data returned by the Calculator.
- nannyml.base.common_nan_removal(data: DataFrame, selected_columns: List[str]) → Tuple[DataFrame, bool]
- nannyml.base.common_nan_removal(data: Sequence[array], selected_columns: List[int]) → Tuple[DataFrame, bool]
Wrapper function that handles both a pandas DataFrame and a sequence of numpy ndarrays.
- Parameters:
data (Union[pd.DataFrame, Sequence[np.array]]) – Pandas DataFrame or sequence of numpy ndarrays containing the data.
selected_columns (Union[List[str], List[int]]) – List containing the column names or indices to check for NaN values.
- Returns:
result – DataFrame with the rows containing NaNs in the selected columns removed. All columns of the original DataFrame or ndarrays are returned.
empty – Boolean indicating whether the resulting data is empty: True if no rows remain, False otherwise.
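The documented behaviour can be approximated with pandas.DataFrame.dropna. The sketch below is a hypothetical re-implementation named nan_removal_sketch, not the library's code. It assumes that each ndarray in a sequence holds one column and that such columns are labelled 0, 1, 2, ..., which matches selected_columns being a list of indices in that case. The original index is kept rather than reset.

```python
from typing import List, Sequence, Tuple, Union

import numpy as np
import pandas as pd


def nan_removal_sketch(
    data: Union[pd.DataFrame, Sequence[np.ndarray]],
    selected_columns: Union[List[str], List[int]],
) -> Tuple[pd.DataFrame, bool]:
    """Hypothetical re-implementation of the documented behaviour."""
    if not isinstance(data, pd.DataFrame):
        # Assumption: each ndarray is one column; columns are labelled 0, 1, 2, ...
        data = pd.DataFrame({i: np.asarray(arr) for i, arr in enumerate(data)})
    # Drop a row if any selected column is NaN; all columns are kept.
    result = data.dropna(subset=selected_columns)
    return result, result.empty


frame = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 2.0, 3.0], 'c': [1, 2, 3]})
cleaned, empty = nan_removal_sketch(frame, ['a', 'b'])
print(cleaned['c'].tolist(), empty)  # [3] False
```

Only the selected columns decide which rows are dropped; a NaN in an unselected column leaves its row in place.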