nannyml.drift.ranker module

Module containing ways to rank drifting features.

class nannyml.drift.ranker.AlertCountRanker[source]

Bases: object

Ranks features by the number of drift ‘alerts’ they’ve caused.

rank(drift_calculation_result: Result, only_drifting: bool = False) DataFrame[source]

Compares the number of alerts for each feature and ranks them accordingly.

Parameters:
  • drift_calculation_result (nannyml.drift.univariate.Result) – The result of a univariate drift calculation.

  • only_drifting (bool, default=False) – If True, features without any alerts are omitted from the ranking results.

Returns:

ranking – A DataFrame containing the feature names and their ranks. Rank 1 is the highest rank, rank 2 the second-highest, and so on.

Return type:

pd.DataFrame

Examples

>>> import nannyml as nml
>>> from IPython.display import display
>>> from nannyml.drift.ranker import AlertCountRanker
>>>
>>> # Load the dataset once and unpack its reference, analysis and target parts.
>>> reference_df, analysis_df, target_df = nml.load_synthetic_binary_classification_dataset()
>>>
>>> display(reference_df.head())
>>>
>>> # Treat every column that is not metadata, a prediction or the target as a feature.
>>> excluded = ['timestamp', 'y_pred_proba', 'period', 'y_pred', 'repaid', 'identifier']
>>> column_names = [col for col in reference_df.columns if col not in excluded]
>>>
>>> calc = nml.UnivariateStatisticalDriftCalculator(column_names=column_names,
>>>                                                 timestamp_column_name='timestamp')
>>>
>>> calc.fit(reference_df)
>>>
>>> results = calc.calculate(analysis_df.merge(target_df, on='identifier'))
>>>
>>> # The drift result is passed to rank(), matching the signature documented above.
>>> ranker = AlertCountRanker()
>>> ranked_features = ranker.rank(results, only_drifting=False)
>>> display(ranked_features)
                  column_name  number_of_alerts  rank
1        distance_from_office                 5     1
2                salary_range                 5     2
3  public_transportation_cost                 5     3
4            wfh_prev_workday                 5     4
5                      tenure                 2     5
6         gas_price_per_litre                 0     6
7                     workday                 0     7
8            work_home_actual                 0     8
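
The idea behind alert-count ranking can be illustrated without NannyML itself. The following is a minimal sketch in plain pandas, not the library's implementation. The function name rank_by_alert_count and the input layout (one row per chunk, one boolean alert column per feature) are assumptions made for this illustration. Ties keep the original column order, which is consistent with the distinct ranks shown for equal alert counts in the output above.

```python
import pandas as pd


def rank_by_alert_count(alerts: pd.DataFrame, only_drifting: bool = False) -> pd.DataFrame:
    """Rank features by how many alerts each one raised.

    `alerts` is a hypothetical input for this sketch: one row per chunk and
    one boolean column per feature (True means an alert in that chunk).
    """
    # Summing booleans column-wise counts the alerts per feature.
    counts = alerts.sum(axis=0).astype(int)
    ranking = (
        counts.rename("number_of_alerts")
        .rename_axis("column_name")
        .reset_index()
    )
    if only_drifting:
        # Drop features that never raised an alert.
        ranking = ranking[ranking["number_of_alerts"] > 0]
    # A stable sort keeps the original column order among tied features.
    ranking = ranking.sort_values(
        "number_of_alerts", ascending=False, kind="stable"
    ).reset_index(drop=True)
    ranking["rank"] = ranking.index + 1  # rank 1 is the highest
    return ranking


# Made-up alert flags for three features over three chunks.
example_alerts = pd.DataFrame(
    {
        "tenure": [True, False, True],
        "workday": [False, False, False],
        "salary_range": [True, True, True],
    }
)
print(rank_by_alert_count(example_alerts))
```

Calling the function with only_drifting=True on the same data would leave out workday, since it has no alerts.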

class nannyml.drift.ranker.CorrelationRanker[source]

Bases: object

Ranks features according to drift correlation with performance impact.

Features are ranked by the correlation between their selected drift results and the absolute change in performance, measured relative to the mean reference performance on the selected metric.

Creates a new CorrelationRanker instance.

fit(reference_performance_calculation_result: Result | Result | Result | None = None) CorrelationRanker[source]

rank(drift_calculation_result: Result, performance_calculation_result: Result | Result | Result | None = None, only_drifting: bool = False)[source]
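
The correlation-based ranking described for this class can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's implementation: the function name rank_by_performance_correlation, the input layout (per-chunk drift values with one column per feature, plus a per-chunk performance series), the use of Pearson correlation, the output column name, and the ordering by absolute correlation are all choices made for this sketch.

```python
import pandas as pd


def rank_by_performance_correlation(
    drift: pd.DataFrame, performance: pd.Series, reference_mean: float
) -> pd.DataFrame:
    """Rank features by how strongly their drift tracks performance change.

    Hypothetical inputs for this sketch:
    - `drift`: one row per chunk, one column of drift values per feature.
    - `performance`: the selected metric per chunk, same index as `drift`.
    - `reference_mean`: mean of that metric over the reference period.
    """
    # Absolute performance change from the mean reference performance.
    abs_change = (performance - reference_mean).abs()
    # Pearson correlation of each feature's drift values with that change.
    correlations = drift.apply(lambda values: values.corr(abs_change))
    ranking = (
        correlations.rename("pearsonr_correlation")
        .rename_axis("column_name")
        .reset_index()
    )
    # Assumption: order by absolute correlation, strongest first.
    order = (
        ranking["pearsonr_correlation"]
        .abs()
        .sort_values(ascending=False, kind="stable")
        .index
    )
    ranking = ranking.loc[order].reset_index(drop=True)
    ranking["rank"] = ranking.index + 1  # rank 1 is the highest
    return ranking


# Made-up drift values and performance for four chunks.
example_drift = pd.DataFrame(
    {"a": [0.1, 0.2, 0.3, 0.4], "b": [0.4, 0.1, 0.3, 0.2]}
)
example_performance = pd.Series([0.90, 0.88, 0.86, 0.84])
print(rank_by_performance_correlation(example_drift, example_performance, 0.90))
```

In this made-up data, the drift of feature a grows in step with the performance drop, so it ranks above feature b. In the class itself, the reference mean would come from the result passed to fit rather than being supplied by hand.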