nannyml.drift.ranker module
Module containing ways to rank drifting features.
- class nannyml.drift.ranker.AlertCountRanker
Bases: object
Ranks features by the number of drift ‘alerts’ they’ve caused.
- rank(drift_calculation_result: Result, only_drifting: bool = False) → DataFrame
Compares the number of alerts for each feature and ranks them accordingly.
- Parameters:
drift_calculation_result (nannyml.drift.univariate.Result) – The result of a univariate drift calculation.
only_drifting (bool, default=False) – Omits features without alerts from the ranking results.
- Returns:
ranking – A DataFrame containing the feature names and their ranks (the highest rank starts at 1, second-highest rank is 2, etc.)
- Return type:
pd.DataFrame
Examples
>>> import nannyml as nml
>>> from IPython.display import display
>>> from nannyml.drift.ranker import AlertCountRanker
>>>
>>> reference_df = nml.load_synthetic_binary_classification_dataset()[0]
>>> analysis_df = nml.load_synthetic_binary_classification_dataset()[1]
>>> target_df = nml.load_synthetic_binary_classification_dataset()[2]
>>>
>>> display(reference_df.head())
>>>
>>> # Use every column except the timestamp, predictions, target and identifier
>>> column_names = [
...     col for col in reference_df.columns
...     if col not in ['timestamp', 'y_pred_proba', 'period',
...                    'y_pred', 'repaid', 'identifier']]
>>>
>>> calc = nml.UnivariateStatisticalDriftCalculator(column_names=column_names,
...                                                 timestamp_column_name='timestamp')
>>>
>>> calc.fit(reference_df)
>>>
>>> results = calc.calculate(analysis_df.merge(target_df, on='identifier'))
>>>
>>> # The drift result is passed to rank(), matching the signature above
>>> ranker = AlertCountRanker()
>>> ranked_features = ranker.rank(results, only_drifting=False)
>>> display(ranked_features)
                  column_name  number_of_alerts  rank
1        distance_from_office                 5     1
2                salary_range                 5     2
3  public_transportation_cost                 5     3
4            wfh_prev_workday                 5     4
5                      tenure                 2     5
6         gas_price_per_litre                 0     6
7                     workday                 0     7
8            work_home_actual                 0     8
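The ranking idea itself can be illustrated without NannyML. The following is a minimal sketch in plain pandas, not the library's implementation: the helper `rank_by_alert_count` and the boolean `alerts` table (one row per chunk, one column per feature) are hypothetical names introduced only for illustration.

```python
import pandas as pd


def rank_by_alert_count(alerts: pd.DataFrame, only_drifting: bool = False) -> pd.DataFrame:
    """Rank features by how many alerts each one raised (hypothetical helper).

    `alerts` is assumed to hold one boolean column per feature and one row per chunk.
    """
    # Summing booleans column-wise counts the alerts per feature.
    counts = alerts.sum(axis=0).astype(int)
    ranking = (
        counts.rename("number_of_alerts")
        .rename_axis("column_name")
        .reset_index()
        .sort_values("number_of_alerts", ascending=False, kind="mergesort")
        .reset_index(drop=True)
    )
    if only_drifting:
        # Drop features that never raised an alert.
        ranking = ranking[ranking["number_of_alerts"] > 0].reset_index(drop=True)
    # Rank 1 goes to the feature with the most alerts.
    ranking["rank"] = ranking.index + 1
    return ranking


# Made-up alert flags for three features over five chunks.
alerts = pd.DataFrame({
    "tenure": [True, False, True, False, False],
    "salary_range": [True, True, True, True, True],
    "workday": [False, False, False, False, False],
})
ranked = rank_by_alert_count(alerts)
drifting_only = rank_by_alert_count(alerts, only_drifting=True)
```

With `only_drifting=True` the `workday` row is omitted, mirroring the behaviour described for the `only_drifting` parameter.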
- class nannyml.drift.ranker.CorrelationRanker
Bases: object
Ranks features according to drift correlation with performance impact.
Features are ranked by the correlation between their selected drift results and the absolute change in performance, measured against the mean reference performance on the selected metric.
Creates a new CorrelationRanker instance.
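The description above can be made concrete with a small sketch in plain pandas. This is not the CorrelationRanker API: the helper `rank_by_correlation`, its parameters, the output column names, and the choice to order by the absolute Pearson correlation are all assumptions made for illustration.

```python
import pandas as pd


def rank_by_correlation(
    drift: pd.DataFrame, performance: pd.Series, reference_mean: float
) -> pd.DataFrame:
    """Rank features by how strongly their drift tracks performance change.

    Hypothetical helper: `drift` holds one drift value per chunk (rows) and
    feature (columns); `performance` holds the metric value per chunk.
    """
    # Absolute change of each chunk's performance from the mean reference performance.
    abs_change = (performance - reference_mean).abs()
    # Pearson correlation of every feature's drift values with that change.
    corr = drift.corrwith(abs_change)
    # Assumption: the strongest correlation (in absolute value) ranks first.
    order = corr.abs().sort_values(ascending=False).index
    ranking = pd.DataFrame(
        {"column_name": list(order), "correlation": corr.loc[order].to_numpy()}
    )
    ranking["rank"] = range(1, len(ranking) + 1)
    return ranking


# Made-up drift values for two features over four chunks.
drift = pd.DataFrame({
    "a": [0.1, 0.2, 0.3, 0.4],   # rises steadily as performance degrades
    "b": [0.3, 0.1, 0.4, 0.2],   # unrelated to the performance change
})
performance = pd.Series([0.90, 0.85, 0.80, 0.75])
ranked = rank_by_correlation(drift, performance, reference_mean=0.92)
```

Here feature `a` ranks first because its drift values move in step with the growing gap between chunk performance and the reference mean.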