nannyml.drift.ranking module

Module containing ways to rank drifting features.

class nannyml.drift.ranking.AlertCountRanking

Bases: nannyml.drift.ranking.Ranking

Ranks drifting features by the number of ‘alerts’ they’ve caused.

ALERT_COLUMN_SUFFIX = '_alert'

rank(drift_calculation_result: nannyml.drift.base.DriftResult, model_metadata: nannyml.metadata.base.ModelMetadata, only_drifting: bool = False) → pandas.core.frame.DataFrame

Compares the number of alerts for each feature and uses that for ranking.

Parameters

drift_calculation_result (DriftResult) – The drift calculation results. Alert columns must be present; they are identified by the ALERT_COLUMN_SUFFIX suffix, currently '_alert'.
model_metadata (ModelMetadata) – Metadata describing the monitored model, used to determine which columns are features and to exclude predictions from the ranking results.
only_drifting (bool) – If True, omits features without alerts from the ranking results. Defaults to False.

Returns

feature_ranking – A DataFrame containing the feature names and their ranks. Rank 1 is the highest rank, 2 the second-highest, and so on.

Return type

pd.DataFrame

Examples

>>> import nannyml as nml
>>> reference_df, analysis_df, target_df = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(reference_df)
>>> metadata.target_column_name = 'work_home_actual'
>>> calc = nml.UnivariateStatisticalDriftCalculator(metadata, chunk_size=5000)
>>> calc.fit(reference_df)
>>> drift = calc.calculate(analysis_df)
>>>
>>> from nannyml.drift.ranking import Ranker
>>> ranked = Ranker.by('alert_count').rank(drift, metadata)
>>> ranked
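
To make the counting step concrete, the following is a minimal pure-Python sketch of ranking by alert count. It is not the library's implementation: the function `rank_by_alert_count`, the dict-of-lists input, the `<feature>_alert` column naming, the example feature names and the alphabetical tie-breaking are all assumptions made for illustration. Only the '_alert' suffix comes from the description above.

```python
ALERT_COLUMN_SUFFIX = "_alert"


def rank_by_alert_count(columns, feature_names, only_drifting=False):
    """Rank features by how many chunks raised an alert.

    columns: dict mapping a column name to a list of per-chunk values
        (a hypothetical stand-in for the drift result data).
    feature_names: names of the model's features; other columns, such as
        predictions, are ignored.
    """
    counts = {}
    for name, values in columns.items():
        if not name.endswith(ALERT_COLUMN_SUFFIX):
            continue  # not an alert column
        # Assumption: alert columns are named '<feature>_alert'.
        feature = name[: -len(ALERT_COLUMN_SUFFIX)]
        if feature in feature_names:
            counts[feature] = sum(bool(v) for v in values)

    if only_drifting:
        counts = {f: n for f, n in counts.items() if n > 0}

    # Most alerts first; ties broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"feature": feature, "number_of_alerts": n, "rank": position}
        for position, (feature, n) in enumerate(ordered, start=1)
    ]


columns = {
    "salary_range_alert": [False, True, True],
    "salary_range_p_value": [0.4, 0.01, 0.02],
    "tenure_alert": [False, False, True],
    "gas_price_alert": [False, False, False],
    "y_pred_alert": [True, True, True],
}
features = ["salary_range", "tenure", "gas_price"]
ranking = rank_by_alert_count(columns, features)
drifting_only = rank_by_alert_count(columns, features, only_drifting=True)
```

In this sketch `y_pred_alert` is skipped because `y_pred` is not listed as a feature, and `gas_price` is dropped from `drifting_only` because it never raised an alert.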

class nannyml.drift.ranking.Ranker

Bases: object

Factory class to easily access Ranking implementations.

classmethod by(key: Optional[str], **kwargs)

Returns a Ranking subclass instance given a key value.

If the provided key equals None, then a new instance of the default Ranking (AlertCountRanking) will be returned.

If a non-existent key is provided, an InvalidArgumentsException is raised.

Parameters: key (Optional[str]) – The key used to retrieve a Ranking. If None, the default Ranking (AlertCountRanking) is used.
Returns: ranking – A new instance of a specific Ranking subclass.
Return type: Ranking

Examples

>>> from nannyml.drift.ranking import Ranker
>>> ranking = Ranker.by('alert_count')

classmethod register_ranking(key: str, ranking: nannyml.drift.ranking.Ranking)

Registers a new Ranking in the index.

This index associates a certain key with a Ranking instance.

Parameters

key (str) – The key used to retrieve a Ranking. When providing a key that is already in the index, the value will be overwritten.
ranking (Ranking) – An instance of a Ranking subclass.

Examples

>>> from nannyml.drift.ranking import AlertCountRanking, Ranker
>>> Ranker.register_ranking('alert_count', AlertCountRanking())
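
The lookup and registration behaviour described for Ranker can be sketched as a small keyed registry. This is an illustration only, not the library's code: `RankerSketch`, `AlertCountSketch` and `InvalidArgumentsError` are hypothetical stand-ins, and creating the new instance from the registered object's class is an assumption about how "a new instance" is obtained.

```python
class InvalidArgumentsError(Exception):
    """Hypothetical stand-in for InvalidArgumentsException."""


class AlertCountSketch:
    """Placeholder ranking; a real one would implement rank()."""


class RankerSketch:
    """Illustrative factory that maps keys to ranking instances."""

    _registry = {}
    _default_key = "alert_count"

    @classmethod
    def register_ranking(cls, key, ranking):
        # Registering an existing key replaces the previous entry.
        cls._registry[key] = ranking

    @classmethod
    def by(cls, key=None, **kwargs):
        if key is None:
            key = cls._default_key  # fall back to the default ranking
        if key not in cls._registry:
            raise InvalidArgumentsError(f"unknown ranking key: {key!r}")
        # Assumption: build a fresh instance from the registered object's class.
        return type(cls._registry[key])(**kwargs)


RankerSketch.register_ranking("alert_count", AlertCountSketch())
default = RankerSketch.by(None)
named = RankerSketch.by("alert_count")
```

Here `default` and `named` are two distinct `AlertCountSketch` objects, and asking for an unregistered key raises the sketch's exception.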

class nannyml.drift.ranking.Ranking

Bases: abc.ABC

Used to rank drifting features according to impact.

rank(drift_calculation_result: nannyml.drift.base.DriftResult, model_metadata: nannyml.metadata.base.ModelMetadata, only_drifting: bool = False) → pandas.core.frame.DataFrame

Ranks the features within a drift calculation according to impact.

Parameters

drift_calculation_result (DriftResult) – The drift calculation results.
model_metadata (ModelMetadata) – Metadata describing the monitored model.
only_drifting (bool) – Omits non-drifting features from the ranking if True.

Returns

feature_ranking – A DataFrame containing at least a feature name and a rank per row.

Return type

pd.DataFrame
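
As a rough illustration of the contract above, the sketch below defines a stand-in abstract base class and one concrete ranking. Everything in it is hypothetical: `RankingSketch` and `EarliestAlertRanking` are not part of the library, the inputs are plain Python containers rather than a DriftResult and ModelMetadata, and ranking by the earliest alerting chunk is just one possible notion of impact.

```python
from abc import ABC, abstractmethod


class RankingSketch(ABC):
    """Hypothetical stand-in for the abstract Ranking base class."""

    @abstractmethod
    def rank(self, drift_calculation_result, model_metadata, only_drifting=False):
        """Return one row per feature with at least a name and a rank."""


class EarliestAlertRanking(RankingSketch):
    """Ranks features by the first chunk in which they raised an alert."""

    def rank(self, drift_calculation_result, model_metadata, only_drifting=False):
        first_alert = {}
        for feature in model_metadata:
            alerts = drift_calculation_result.get(feature, [])
            # Index of the first alerting chunk, or None if it never alerted.
            first_alert[feature] = next(
                (i for i, alert in enumerate(alerts) if alert), None
            )
        if only_drifting:
            first_alert = {f: i for f, i in first_alert.items() if i is not None}
        # Features that never alert sort last; ties are broken by name.
        ordered = sorted(
            first_alert,
            key=lambda f: (first_alert[f] is None, first_alert[f] or 0, f),
        )
        return [{"feature": f, "rank": r} for r, f in enumerate(ordered, start=1)]


# Assumed inputs: per-feature alert flags and a list of feature names.
alerts = {
    "tenure": [False, True],
    "salary_range": [True, True],
    "gas_price": [False, False],
}
features = ["tenure", "salary_range", "gas_price"]
ranked = EarliestAlertRanking().rank(alerts, features)
drifting_only = EarliestAlertRanking().rank(alerts, features, only_drifting=True)
```

Because `rank` is abstract, `RankingSketch` itself cannot be instantiated; only subclasses that implement it can.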