nannyml.drift.ranking module
Module containing ways to rank drifting features.
- class nannyml.drift.ranking.AlertCountRanking
Bases: nannyml.drift.ranking.Ranking
Ranks drifting features by the number of ‘alerts’ they’ve caused.
- ALERT_COLUMN_SUFFIX = '_alert'
- rank(drift_calculation_result: nannyml.drift.base.DriftResult, model_metadata: nannyml.metadata.base.ModelMetadata, only_drifting: bool = False) → pandas.core.frame.DataFrame
Counts the alerts raised for each feature and ranks the features by that count.
- Parameters
drift_calculation_result (DriftResult) – The drift calculation results. Requires alert columns to be present. These are recognized and parsed using the ALERT_COLUMN_SUFFIX pattern, currently equal to '_alert'.
model_metadata (ModelMetadata) – Metadata describing the monitored model, used to determine which columns are features and to exclude predictions from the ranking results.
only_drifting (bool) – Omits features without alerts from the ranking results.
- Returns
feature_ranking – A DataFrame containing the feature names and their ranks: the feature with the most alerts has rank 1, the next one has rank 2, and so on.
- Return type
pd.DataFrame
Examples
>>> import nannyml as nml
>>> from nannyml.drift.ranking import Ranker
>>> reference_df, analysis_df, target_df = nml.load_synthetic_binary_classification_dataset()
>>> metadata = nml.extract_metadata(reference_df)
>>> metadata.target_column_name = 'work_home_actual'
>>> calc = nml.UnivariateStatisticalDriftCalculator(metadata, chunk_size=5000)
>>> calc.fit(reference_df)
>>> drift = calc.calculate(analysis_df)
>>> ranked = Ranker.by('alert_count').rank(drift, metadata)
>>> ranked
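The counting step can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not nannyml's implementation: instead of a DriftResult and a ModelMetadata it takes a DataFrame of per-chunk boolean alert columns plus an explicit list of feature names, and the function name rank_by_alert_count and the example column names are hypothetical.

```python
import pandas as pd

ALERT_COLUMN_SUFFIX = '_alert'


def rank_by_alert_count(drift_data: pd.DataFrame, feature_names: list,
                        only_drifting: bool = False) -> pd.DataFrame:
    """Hypothetical helper: rank features by how many chunks raised an alert."""
    # Only look at alert columns of known features, so prediction columns
    # (e.g. 'y_pred_proba_alert') never end up in the ranking.
    alert_columns = [name + ALERT_COLUMN_SUFFIX for name in feature_names
                     if name + ALERT_COLUMN_SUFFIX in drift_data.columns]
    # Each alert column holds one boolean per chunk; summing counts the alerts.
    counts = drift_data[alert_columns].sum()
    ranking = pd.DataFrame({
        'feature': [column[:-len(ALERT_COLUMN_SUFFIX)] for column in counts.index],
        'number_of_alerts': counts.to_numpy(),
    })
    # Most alerts first; the stable sort keeps feature_names order for ties.
    ranking = ranking.sort_values('number_of_alerts', ascending=False,
                                  kind='mergesort', ignore_index=True)
    ranking['rank'] = ranking.index + 1  # rank 1 = most alerts
    if only_drifting:
        # Drop features that never raised an alert; ranks are kept as computed.
        ranking = ranking[ranking['number_of_alerts'] > 0].reset_index(drop=True)
    return ranking


# Made-up results for three chunks; 'y_pred_proba' is a model output, not a feature.
example = pd.DataFrame({
    'feature_a_alert': [False, True, True],
    'feature_b_alert': [False, False, False],
    'feature_c_alert': [True, True, True],
    'y_pred_proba_alert': [True, True, True],
})
print(rank_by_alert_count(example, ['feature_a', 'feature_b', 'feature_c']))
```

Here feature_c (3 alerts) gets rank 1 and feature_a (2 alerts) rank 2; with only_drifting=True, feature_b (0 alerts) is dropped from the result.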
- class nannyml.drift.ranking.Ranker
Bases: object
Factory class to easily access Ranking implementations.
- classmethod by(key: Optional[str], **kwargs)
Returns a Ranking subclass instance given a key value.
If the provided key is None, a new instance of the default Ranking (AlertCountRanking) is returned. If a non-existent key is provided, an InvalidArgumentsException is raised.
- Parameters
key (Optional[str]) – The key used to retrieve a Ranking. If None, the default AlertCountRanking is used.
- Returns
ranking – A new instance of a specific Ranking subclass.
- Return type
Ranking
Examples
>>> ranking = Ranker.by('alert_count')
- classmethod register_ranking(key: str, ranking: nannyml.drift.ranking.Ranking)
Registers a new ranking in the index.
This index associates a certain key with a Ranking instance.
- Parameters
key (str) – The key used to retrieve a Ranking. When providing a key that is already in the index, the existing value is overwritten.
ranking (Ranking) – An instance of a Ranking subclass.
Examples
>>> Ranker.register_ranking('alert_count', AlertCountRanking())
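The key-to-ranking index behind by and register_ranking is a small registry pattern. The sketch below is a self-contained toy, not nannyml's code: Ranking, AlertCountRanking and InvalidArgumentsException are local stand-ins, and two details are assumptions, namely that the index stores the registered instance's class (so each by call can build a fresh object, matching "a new instance") and that **kwargs are forwarded to that class's constructor.

```python
from typing import Dict, Optional, Type


class InvalidArgumentsException(Exception):
    """Local stand-in for the exception named in the documentation."""


class Ranking:
    """Local stand-in base class (the documented one is abstract)."""


class AlertCountRanking(Ranking):
    """Local stand-in for the default ranking."""


class Ranker:
    # Maps a key such as 'alert_count' to the Ranking subclass to instantiate.
    _registry: Dict[str, Type[Ranking]] = {}

    @classmethod
    def register_ranking(cls, key: str, ranking: Ranking) -> None:
        # Assumption: store the instance's class so every lookup builds a fresh object.
        # Re-registering an existing key overwrites the previous entry.
        cls._registry[key] = type(ranking)

    @classmethod
    def by(cls, key: Optional[str] = None, **kwargs) -> Ranking:
        if key is None:
            return AlertCountRanking()  # default ranking
        if key not in cls._registry:
            raise InvalidArgumentsException(
                f"unknown ranking key '{key}', expected one of {sorted(cls._registry)}"
            )
        # Assumption: extra keyword arguments go to the ranking's constructor.
        return cls._registry[key](**kwargs)


Ranker.register_ranking('alert_count', AlertCountRanking())
ranking = Ranker.by('alert_count')
print(type(ranking).__name__)
```

Storing classes rather than one shared instance means two callers never mutate the same ranking object; if rankings are stateless, storing and returning the instance itself would be a simpler alternative.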
- class nannyml.drift.ranking.Ranking
Bases: abc.ABC
Abstract base class for rankings that order drifting features according to their impact.
- rank(drift_calculation_result: nannyml.drift.base.DriftResult, model_metadata: nannyml.metadata.base.ModelMetadata, only_drifting: bool = False) → pandas.core.frame.DataFrame
Ranks the features within a drift calculation according to impact.
- Parameters
drift_calculation_result (DriftResult) – The drift calculation results.
model_metadata (ModelMetadata) – Metadata describing the monitored model.
only_drifting (bool) – Omits non-drifting features from the ranking if True.
- Returns
feature_ranking – A DataFrame containing at least a feature name and a rank per row.
- Return type
pd.DataFrame
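Adding a ranking strategy means subclassing the abstract base and implementing rank. The sketch below mirrors that contract with a local toy ABC instead of importing nannyml, and simplifies the arguments to a DataFrame plus a list of feature names. MeanStatisticRanking, the '_statistic' column suffix, the reading of "impact" as a higher mean drift statistic, and the example data are all hypothetical; the only part taken from the documentation is that rank returns at least a feature name and a rank per row.

```python
from abc import ABC, abstractmethod
from typing import List

import pandas as pd


class Ranking(ABC):
    """Toy mirror of the abstract interface: subclasses must implement rank()."""

    @abstractmethod
    def rank(self, drift_data: pd.DataFrame, feature_names: List[str],
             only_drifting: bool = False) -> pd.DataFrame:
        """Return one row per feature with at least 'feature' and 'rank' columns."""


class MeanStatisticRanking(Ranking):
    """Hypothetical strategy: a higher mean drift statistic counts as more impact."""

    STATISTIC_SUFFIX = '_statistic'  # hypothetical column naming scheme
    ALERT_SUFFIX = '_alert'

    def rank(self, drift_data: pd.DataFrame, feature_names: List[str],
             only_drifting: bool = False) -> pd.DataFrame:
        # Average each feature's statistic over all chunks.
        ranking = pd.DataFrame({
            'feature': feature_names,
            'mean_statistic': [drift_data[name + self.STATISTIC_SUFFIX].mean()
                               for name in feature_names],
        })
        ranking = ranking.sort_values('mean_statistic', ascending=False,
                                      kind='mergesort', ignore_index=True)
        ranking['rank'] = ranking.index + 1  # rank 1 = highest mean statistic
        if only_drifting:
            # Treat a feature as drifting if any chunk raised an alert for it.
            drifting = [name for name in feature_names
                        if drift_data[name + self.ALERT_SUFFIX].any()]
            ranking = ranking[ranking['feature'].isin(drifting)].reset_index(drop=True)
        return ranking


example = pd.DataFrame({
    'feature_a_statistic': [0.10, 0.30],
    'feature_a_alert': [False, True],
    'feature_b_statistic': [0.02, 0.04],
    'feature_b_alert': [False, False],
})
print(MeanStatisticRanking().rank(example, ['feature_a', 'feature_b']))
```

Instantiating Ranking itself raises a TypeError because rank is abstract, which is what forces every strategy to supply its own implementation.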