nannyml.drift.univariate.methods module

class nannyml.drift.univariate.methods.Chi2Statistic

Bases: Method

Calculates the Chi2-contingency statistic.

An alert will be raised for a Chunk if p_value < 0.05.

Creates a new Method instance.

Parameters:

display_name (str) – The name of the method, as displayed in plots. If not given, this name will be derived from the calculation_function.
column_name (str) – The name used to identify the method in the columns of a DataFrame.
upper_threshold_limit (float, default=None) – An optional upper threshold for the drift method.
lower_threshold_limit (float, default=None) – An optional lower threshold for the drift method.
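
To make the statistic concrete, the following is a minimal standard-library sketch of a chi-squared contingency statistic for one categorical column, comparing reference values against a chunk. It is not NannyML's implementation: the function names are hypothetical, no continuity correction is applied, and the p-value helper relies on the closed form that only holds for two degrees of freedom (three categories).

```python
import math
from collections import Counter


def chi2_contingency_statistic(reference, analysis):
    """Return (statistic, degrees_of_freedom) for a 2 x K contingency table."""
    categories = sorted(set(reference) | set(analysis))
    ref_counts, ana_counts = Counter(reference), Counter(analysis)
    n_ref, n_ana = len(reference), len(analysis)
    total = n_ref + n_ana

    statistic = 0.0
    for category in categories:
        column_total = ref_counts[category] + ana_counts[category]
        for observed, row_total in (
            (ref_counts[category], n_ref),
            (ana_counts[category], n_ana),
        ):
            # Expected count under the hypothesis that both rows share one distribution.
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(categories) - 1


def p_value_for_two_degrees_of_freedom(statistic):
    """Chi-squared survival function; this closed form is valid only for df == 2."""
    return math.exp(-statistic / 2)


reference = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
chunk = ["a"] * 20 + ["b"] * 30 + ["c"] * 50

statistic, dof = chi2_contingency_statistic(reference, chunk)
p_value = p_value_for_two_degrees_of_freedom(statistic)
alert = p_value < 0.05  # same alerting rule as documented above
```

In practice a library routine such as SciPy's chi2_contingency handles arbitrary degrees of freedom; the closed form here only keeps the sketch dependency-free.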

class nannyml.drift.univariate.methods.FeatureType(value)

Bases: str, Enum

An enumeration indicating if a Method is applicable to continuous data, categorical data or both.

CATEGORICAL = 'categorical'

CONTINUOUS = 'continuous'
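
Because the enumeration also derives from str, its members compare equal to their plain string values and can be looked up by value. The sketch below re-declares an equivalent enum locally (so it runs without NannyML installed) to illustrate that behaviour; it is a stand-in, not the library class itself.

```python
from enum import Enum


class FeatureType(str, Enum):
    """Local stand-in mirroring the documented members."""

    CATEGORICAL = 'categorical'
    CONTINUOUS = 'continuous'


# Lookup by value returns the existing member.
looked_up = FeatureType('categorical')

# The str mixin makes members comparable to plain strings.
matches_string = FeatureType.CONTINUOUS == 'continuous'

# Members are hashable, so they work as dictionary keys (as in a registry).
supported = {FeatureType.CATEGORICAL: 'chi2', FeatureType.CONTINUOUS: 'kolmogorov_smirnov'}
```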

class nannyml.drift.univariate.methods.JensenShannonDistance

Bases: Method

Calculates Jensen-Shannon distance.

An alert will be raised if distance > 0.1.

Creates a new Method instance.

Parameters:

display_name (str) – The name of the method, as displayed in plots. If not given, this name will be derived from the calculation_function.
column_name (str) – The name used to identify the method in the columns of a DataFrame.
upper_threshold_limit (float, default=None) – An optional upper threshold for the drift method.
lower_threshold_limit (float, default=None) – An optional lower threshold for the drift method.
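
As an illustration of the quantity being thresholded, here is a minimal standard-library sketch of the Jensen-Shannon distance between two discrete distributions (for example, per-category or per-bin frequencies). The function name, the base-2 logarithm, and the way inputs are normalised are assumptions made for this sketch; how NannyML bins continuous data is not covered here.

```python
import math


def jensen_shannon_distance(p, q, base=2.0):
    """Jensen-Shannon distance between two equal-length frequency vectors."""
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    # Mixture distribution: the midpoint of p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl_divergence(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    divergence = (kl_divergence(p, m) + kl_divergence(q, m)) / 2
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))


reference_frequencies = [50, 30, 20]
chunk_frequencies = [20, 30, 50]

distance = jensen_shannon_distance(reference_frequencies, chunk_frequencies)
alert = distance > 0.1  # same alerting rule as documented above
```

With base 2 the distance is bounded between 0 (identical distributions) and 1 (no overlap), which is what makes a fixed threshold such as 0.1 meaningful.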

class nannyml.drift.univariate.methods.KolmogorovSmirnovStatistic

Bases: Method

Calculates the Kolmogorov-Smirnov D-statistic.

An alert will be raised for a Chunk if p_value < 0.05.

Creates a new Method instance.

Parameters:

display_name (str) – The name of the method, as displayed in plots. If not given, this name will be derived from the calculation_function.
column_name (str) – The name used to identify the method in the columns of a DataFrame.
upper_threshold_limit (float, default=None) – An optional upper threshold for the drift method.
lower_threshold_limit (float, default=None) – An optional lower threshold for the drift method.
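
The two-sample D-statistic is the largest absolute gap between the empirical cumulative distribution functions of the reference data and the chunk. The sketch below computes it with the standard library only; the function name is hypothetical, and the p-value used for alerting is deliberately left out (a library routine such as SciPy's ks_2samp would provide it).

```python
from bisect import bisect_right


def ks_d_statistic(reference, analysis):
    """Largest absolute difference between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    ana_sorted = sorted(analysis)

    d_statistic = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for value in ref_sorted + ana_sorted:
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        ana_cdf = bisect_right(ana_sorted, value) / len(ana_sorted)
        d_statistic = max(d_statistic, abs(ref_cdf - ana_cdf))
    return d_statistic


overlapping = ks_d_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_d_statistic([1, 2], [3, 4])
identical = ks_d_statistic([1, 2, 3], [1, 2, 3])
```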

class nannyml.drift.univariate.methods.Method(display_name: str, column_name: str, upper_threshold: Optional[float] = None, lower_threshold: Optional[float] = None, upper_threshold_limit: Optional[float] = None, lower_threshold_limit: Optional[float] = None)

Bases: ABC

A method to express the amount of drift between two distributions.

Creates a new Method instance.

Parameters:

display_name (str) – The name of the method, as displayed in plots. If not given, this name will be derived from the calculation_function.
column_name (str) – The name used to identify the method in the columns of a DataFrame.
upper_threshold_limit (float, default=None) – An optional upper threshold for the drift method.
lower_threshold_limit (float, default=None) – An optional lower threshold for the drift method.

__eq__(other): Establishes equality by comparing all properties.

alert(data: Series)

Evaluates if an alert has occurred for this method on the current chunk data.

Parameters: data (pd.Series) – The data to evaluate for an alert.

calculate(data: Series)

Calculates drift within data with respect to the reference data.

Parameters: data (pd.Series) – The data to compare to the reference data.

fit(reference_data: Series) → Method

Fits a Method on reference data.

Parameters: reference_data (pd.Series) – The reference data used for fitting a Method. Must have target data available.
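
The fit, calculate and alert methods above suggest a three-step lifecycle: fit once on reference data, then calculate and check alerts per chunk. The following sketch mimics that lifecycle with a toy abstract base class. All names (MethodSketch, MeanShiftSketch, the _fit and _calculate hooks) are hypothetical, plain lists stand in for pandas Series, and the threshold-based alert rule is an assumption; the real Method may be structured differently.

```python
from abc import ABC, abstractmethod
from statistics import mean
from typing import Optional, Sequence


class MethodSketch(ABC):
    """Toy stand-in for a drift method with a fit/calculate/alert lifecycle."""

    def __init__(self, display_name: str, column_name: str,
                 upper_threshold: Optional[float] = None,
                 lower_threshold: Optional[float] = None):
        self.display_name = display_name
        self.column_name = column_name
        self.upper_threshold = upper_threshold
        self.lower_threshold = lower_threshold

    def fit(self, reference_data: Sequence[float]) -> "MethodSketch":
        self._fit(reference_data)
        return self  # returning self allows chaining, as in the documented signature

    def calculate(self, data: Sequence[float]) -> float:
        return self._calculate(data)

    def alert(self, data: Sequence[float]) -> bool:
        # Assumed rule: alert when the value leaves the configured thresholds.
        value = self.calculate(data)
        too_high = self.upper_threshold is not None and value > self.upper_threshold
        too_low = self.lower_threshold is not None and value < self.lower_threshold
        return too_high or too_low

    @abstractmethod
    def _fit(self, reference_data: Sequence[float]) -> None: ...

    @abstractmethod
    def _calculate(self, data: Sequence[float]) -> float: ...


class MeanShiftSketch(MethodSketch):
    """Hypothetical method: absolute difference between chunk and reference means."""

    def __init__(self, upper_threshold: float):
        super().__init__("Mean shift", "mean_shift", upper_threshold=upper_threshold)
        self._reference_mean = 0.0

    def _fit(self, reference_data: Sequence[float]) -> None:
        self._reference_mean = mean(reference_data)

    def _calculate(self, data: Sequence[float]) -> float:
        return abs(mean(data) - self._reference_mean)


method = MeanShiftSketch(upper_threshold=1.0).fit([1.0, 2.0, 3.0])
stable = method.alert([2.0, 3.0, 4.0])
drifted = method.alert([5.0, 6.0, 7.0])
```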

class nannyml.drift.univariate.methods.MethodFactory

Bases: object

A factory class that produces Method instances given a ‘key’ string and a ‘feature_type’ it supports.

classmethod create(key: str, feature_type: FeatureType, **kwargs) → Method

Returns a Method instance for a given key and FeatureType.

The value for the key is passed explicitly by the end user (provided within the UnivariateDriftCalculator initializer). The value for the FeatureType is provided implicitly by deducing it from the reference data upon fitting the UnivariateDriftCalculator.

Any additional keyword arguments are passed along to the initializer of the Method.

classmethod register(key: str, feature_type: FeatureType) → Callable

A decorator used to register a specific Method implementation to the factory.

Registering a Method requires a key string and a FeatureType.

The key sets the string value to select a Method by, e.g. chi2 to select the Chi2-contingency implementation when creating a UnivariateDriftCalculator.

Some Methods are only applicable to one FeatureType, e.g. Kolmogorov-Smirnov can only be used with continuous data and Chi2-contingency only with categorical data. Others support multiple types, such as the Jensen-Shannon distance. These can be registered multiple times, once for each FeatureType they support. The value for key can be identical; the factory uses both the FeatureType and the key to determine which class to instantiate.

Examples

>>> @MethodFactory.register(key='jensen_shannon', feature_type=FeatureType.CONTINUOUS)
... @MethodFactory.register(key='jensen_shannon', feature_type=FeatureType.CATEGORICAL)
... class JensenShannonDistance(Method):
...   pass
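
The registration and lookup behaviour described above can be sketched as a small decorator-based registry. This is a simplified stand-in that runs without NannyML: FeatureType and the classes below are local re-declarations, MethodFactorySketch is a hypothetical name, and raising ValueError for unknown combinations is an assumption rather than the library's actual error type.

```python
from enum import Enum
from typing import Callable, Dict


class FeatureType(str, Enum):
    CATEGORICAL = 'categorical'
    CONTINUOUS = 'continuous'


class MethodFactorySketch:
    # key -> feature type -> registered class
    registry: Dict[str, Dict[FeatureType, type]] = {}

    @classmethod
    def register(cls, key: str, feature_type: FeatureType) -> Callable:
        def decorator(method_class: type) -> type:
            cls.registry.setdefault(key, {})[feature_type] = method_class
            return method_class  # hand the class back so decorators can be stacked

        return decorator

    @classmethod
    def create(cls, key: str, feature_type: FeatureType, **kwargs):
        if key not in cls.registry:
            raise ValueError(f"unknown method key '{key}'")
        if feature_type not in cls.registry[key]:
            raise ValueError(f"method '{key}' does not support {feature_type.value} data")
        # Extra keyword arguments are forwarded to the initializer.
        return cls.registry[key][feature_type](**kwargs)


@MethodFactorySketch.register(key='jensen_shannon', feature_type=FeatureType.CONTINUOUS)
@MethodFactorySketch.register(key='jensen_shannon', feature_type=FeatureType.CATEGORICAL)
class JensenShannonSketch:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


@MethodFactorySketch.register(key='chi2', feature_type=FeatureType.CATEGORICAL)
class Chi2Sketch:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


method = MethodFactorySketch.create('jensen_shannon', FeatureType.CONTINUOUS, bins=10)

try:
    MethodFactorySketch.create('chi2', FeatureType.CONTINUOUS)
    unsupported_rejected = False
except ValueError:
    unsupported_rejected = True
```

Stacking the decorator registers the same class under both feature types, which is why the key alone is not enough to pick an implementation.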

registry: Dict[str, Dict[FeatureType, Method]] = {'chi2': {FeatureType.CATEGORICAL: <class 'nannyml.drift.univariate.methods.Chi2Statistic'>}, 'jensen_shannon': {FeatureType.CATEGORICAL: <class 'nannyml.drift.univariate.methods.JensenShannonDistance'>, FeatureType.CONTINUOUS: <class 'nannyml.drift.univariate.methods.JensenShannonDistance'>}, 'kolmogorov_smirnov': {FeatureType.CONTINUOUS: <class 'nannyml.drift.univariate.methods.KolmogorovSmirnovStatistic'>}}