nannyml.calibration module

Calibrating model scores into probabilities.

class nannyml.calibration.Calibrator[source]

Bases: ABC

Abstract base class for calibrating y_pred_proba scores into probabilities.

calibrate(y_pred_proba: ndarray)[source]

Perform calibration of prediction scores.

Parameters:

y_pred_proba (numpy.ndarray) – Vector of continuous scores/probabilities to calibrate. Shape (n,).

fit(y_pred_proba: ndarray, y_true: ndarray)[source]

Fits the calibrator using a reference data set.

Parameters:
  • y_pred_proba (numpy.ndarray) – Vector of continuous reference scores/probabilities. Has to be the same shape as y_true.

  • y_true (numpy.ndarray) – Vector with reference binary targets - 0 or 1. Shape (n,).
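The fit/calibrate contract can be illustrated with a small stand-in. This sketch does not import nannyml: SketchCalibrator and ShiftCalibrator are hypothetical names, and plain Python lists stand in for the numpy arrays the documented interface expects. It only shows the intended call order: fit on reference data first, then calibrate new scores.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class SketchCalibrator(ABC):
    """Hypothetical stand-in mirroring the documented fit/calibrate interface."""

    @abstractmethod
    def fit(self, y_pred_proba: Sequence[float], y_true: Sequence[int]):
        """Learn a mapping from scores to probabilities on reference data."""

    @abstractmethod
    def calibrate(self, y_pred_proba: Sequence[float]) -> List[float]:
        """Apply the learned mapping to new scores."""


class ShiftCalibrator(SketchCalibrator):
    """Toy calibrator: shifts scores so their mean matches the observed positive rate."""

    def __init__(self) -> None:
        self.shift = 0.0

    def fit(self, y_pred_proba, y_true):
        if len(y_pred_proba) != len(y_true):
            raise ValueError("y_pred_proba and y_true must have the same length")
        mean_score = sum(y_pred_proba) / len(y_pred_proba)
        positive_rate = sum(y_true) / len(y_true)
        self.shift = positive_rate - mean_score
        return self

    def calibrate(self, y_pred_proba):
        # Clip so the outputs remain valid probabilities.
        return [min(1.0, max(0.0, p + self.shift)) for p in y_pred_proba]


# Made-up reference data, for illustration only.
reference_scores = [0.25, 0.5, 0.75, 1.0]
reference_targets = [0, 0, 1, 0]
calibrator = ShiftCalibrator().fit(reference_scores, reference_targets)
calibrated = calibrator.calibrate([0.5, 0.75])
```

Returning the calibrator from fit is a choice made in this sketch so that the calls can be chained.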

class nannyml.calibration.CalibratorFactory[source]

Bases: object

Factory class to aid in construction of Calibrators.

classmethod create(key: Optional[str], **kwargs)[source]

Creates a new Calibrator given a key value and optional keyword args.

If the provided key equals None, then a new instance of the default Calibrator (IsotonicCalibrator) will be returned.

If the provided key is not registered, an InvalidArgumentsException is raised.

Parameters:
  • key (Optional[str]) – The key used to retrieve a Calibrator. If None, an instance of the default Calibrator (IsotonicCalibrator) is returned.

  • kwargs (dict) – Optional keyword arguments that will be passed along to the function associated with the key. It can then use these arguments during the creation of a new Calibrator instance.

Returns:

calibrator – A new instance of a specific Calibrator subclass.

Return type:

Calibrator

Examples

>>> calibrator = CalibratorFactory.create('isotonic', kwargs={'foo': 'bar'})

classmethod register_calibrator(key: str, create_calibrator: Callable)[source]

Registers a new calibrator to the index.

This index associates a certain key with a function that can be used to construct a new Calibrator instance.

Parameters:
  • key (str) – The key used to retrieve a Calibrator. When providing a key that is already in the index, the value will be overwritten.

  • create_calibrator (Callable) – A function that, given a **kwargs argument, creates a new instance of a Calibrator subclass.
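The key-to-constructor index described above follows a common registry pattern, sketched below without importing nannyml. SketchFactory and SketchCalibrator are hypothetical names. How keyword arguments are forwarded to the registered function is an assumption of this sketch, and KeyError stands in for the library's InvalidArgumentsException.

```python
from typing import Callable, Dict, Optional


class SketchCalibrator:
    """Hypothetical placeholder for a Calibrator subclass."""

    def __init__(self, **kwargs):
        self.options = kwargs


class SketchFactory:
    """Hypothetical registry mapping keys to calibrator constructors."""

    _registry: Dict[str, Callable[..., SketchCalibrator]] = {}

    @classmethod
    def register_calibrator(cls, key: str, create_calibrator: Callable[..., SketchCalibrator]) -> None:
        # Registering an existing key overwrites the previous entry.
        cls._registry[key] = create_calibrator

    @classmethod
    def create(cls, key: Optional[str], **kwargs) -> SketchCalibrator:
        if key is None:
            # Fall back to a default instance (the documented default is IsotonicCalibrator).
            return SketchCalibrator()
        if key not in cls._registry:
            # Stand-in for InvalidArgumentsException.
            raise KeyError(f"unknown calibrator key: {key!r}")
        # Assumption: keyword arguments are forwarded unchanged to the constructor function.
        return cls._registry[key](**kwargs)


SketchFactory.register_calibrator("sketch", lambda **kwargs: SketchCalibrator(**kwargs))
calibrator = SketchFactory.create("sketch", foo="bar")
default_calibrator = SketchFactory.create(None)
```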

Examples

>>> CalibratorFactory.register_calibrator('isotonic', lambda kwargs: IsotonicCalibrator())

class nannyml.calibration.IsotonicCalibrator[source]

Bases: Calibrator

Calibrates using an IsotonicRegression model.

Creates a new IsotonicCalibrator.

calibrate(y_pred_proba: ndarray)[source]

Perform calibration of prediction scores.

Parameters:

y_pred_proba (numpy.ndarray) – Vector of continuous scores/probabilities to calibrate. Shape (n,).

fit(y_pred_proba: ndarray, y_true: ndarray)[source]

Fits the calibrator using a reference data set.

Parameters:
  • y_pred_proba (numpy.ndarray) – Vector of continuous reference scores/probabilities. Has to be the same shape as y_true.

  • y_true (numpy.ndarray) – Vector with reference binary targets - 0 or 1. Shape (n,).
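To give a feel for what isotonic calibration does, here is a minimal pure-Python sketch of the pool-adjacent-violators idea behind isotonic regression. It is not the library's implementation: fit_isotonic and apply_isotonic are hypothetical helpers, the sketch assumes distinct scores, and it uses a step-function lookup where a production implementation may interpolate between points.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], targets: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool adjacent violators; returns each block's largest score and mean target."""
    # Each block holds [sum of targets, sample count, largest score in the block].
    blocks: List[List[float]] = []
    for score, target in sorted(zip(scores, targets)):
        blocks.append([float(target), 1.0, score])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [block[2] for block in blocks]
    means = [block[0] / block[1] for block in blocks]
    return uppers, means


def apply_isotonic(scores: Sequence[float], uppers: List[float], means: List[float]) -> List[float]:
    """Map each score to the mean of the first block whose upper bound covers it."""
    calibrated = []
    for score in scores:
        # Scores above the last block reuse the last block's mean.
        index = min(bisect_left(uppers, score), len(means) - 1)
        calibrated.append(means[index])
    return calibrated


# Made-up reference data, for illustration only.
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
reference_targets = [0, 1, 0, 0, 1, 1]
uppers, means = fit_isotonic(reference_scores, reference_targets)
calibrated = apply_isotonic([0.05, 0.25, 0.4, 0.55, 0.9], uppers, means)
```

The fitted block means are non-decreasing by construction, so a higher raw score never maps to a lower calibrated probability.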

class nannyml.calibration.NoopCalibrator[source]

Bases: Calibrator

A Calibrator subclass that simply returns the inputs unaltered.

calibrate(y_pred_proba: ndarray)[source]

Calibrate nothing and just return the original y_pred_proba inputs.

fit(y_pred_proba: ndarray, y_true: ndarray)[source]

Fit nothing and just return the calibrator.

nannyml.calibration.needs_calibration(y_true: ndarray, y_pred_proba: ndarray, calibrator: Calibrator, bin_count: int = 10, split_count: int = 10) bool[source]

Returns whether a series of prediction scores benefits from additional calibration or not.

Performs probability calibration in a cross-validation loop. For each fold, the difference in Expected Calibration Error (ECE) between the uncalibrated and the calibrated probabilities is calculated. If the difference is below zero in any fold (i.e. the ECE of the calibrated probabilities is larger than that of the uncalibrated ones), False is returned. Otherwise, True is returned.
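For readers unfamiliar with ECE, the following pure-Python sketch computes it as the sample-weighted average, over bins, of the gap between the observed positive rate and the mean predicted score. Equal-width bins on [0, 1] are an assumption of this sketch; the binning used by the library may differ, and expected_calibration_error is a hypothetical helper name.

```python
from typing import Sequence


def expected_calibration_error(y_true: Sequence[int], y_pred_proba: Sequence[float], bin_count: int = 10) -> float:
    """ECE with equal-width bins on [0, 1] (the binning scheme is an assumption)."""
    sums_true = [0.0] * bin_count
    sums_pred = [0.0] * bin_count
    counts = [0] * bin_count
    for target, proba in zip(y_true, y_pred_proba):
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(proba * bin_count), bin_count - 1)
        sums_true[index] += target
        sums_pred[index] += proba
        counts[index] += 1
    total = sum(counts)
    ece = 0.0
    for true_sum, pred_sum, count in zip(sums_true, sums_pred, counts):
        if count:
            # Weight each bin's |observed rate - mean score| by its share of the samples.
            ece += (count / total) * abs(true_sum / count - pred_sum / count)
    return ece


# Made-up data: scores of 0.25 for negatives and 0.75 for positives.
y_true = [0, 0, 1, 1]
scores = [0.25, 0.25, 0.75, 0.75]
ece = expected_calibration_error(y_true, scores, bin_count=2)
perfect = expected_calibration_error([0, 1], [0.0, 1.0], bin_count=2)
```

In this toy data each bin is off by 0.25, so the ECE is 0.25, while scores that match the outcomes exactly give an ECE of zero.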

Parameters:
  • calibrator (Calibrator) – The Calibrator to use during testing.

  • y_true (numpy.ndarray) – Vector with reference binary targets, 0 or 1. Shape (n,).

  • y_pred_proba (numpy.ndarray) – Vector of continuous reference scores/probabilities. Has to be the same shape as y_true.

  • bin_count (int) – Desired number of bins to calculate ECE on.

  • split_count (int) – Desired number of splits to make, i.e. number of times to evaluate calibration.

Returns:

needs_calibration – True when the scores benefit from calibration, False otherwise.

Return type:

bool

Examples

>>> import numpy as np
>>> from nannyml.calibration import IsotonicCalibrator, needs_calibration
>>> np.random.seed(1)
>>> y_true = np.random.binomial(1, 0.5, 10)
>>> y_pred_proba = np.linspace(0, 1, 10)
>>> calibrator = IsotonicCalibrator()
>>> needs_calibration(y_true, y_pred_proba, calibrator, bin_count=2, split_count=3)
True
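The per-fold decision rule described for needs_calibration can be sketched in isolation. The helper below is hypothetical and takes already computed ECE values per fold; it only illustrates that a single fold in which calibration increases the ECE is enough to return False.

```python
from typing import Sequence


def calibration_helps(ece_uncalibrated: Sequence[float], ece_calibrated: Sequence[float]) -> bool:
    """Apply the per-fold rule: any fold where calibration raises the ECE vetoes it."""
    if len(ece_uncalibrated) != len(ece_calibrated):
        raise ValueError("expected one pair of ECE values per fold")
    differences = [before - after for before, after in zip(ece_uncalibrated, ece_calibrated)]
    return all(difference >= 0 for difference in differences)


# Hypothetical per-fold ECE values, made up for illustration only.
improves_everywhere = calibration_helps([0.25, 0.5, 0.125], [0.125, 0.25, 0.125])
worse_in_one_fold = calibration_helps([0.25, 0.5, 0.125], [0.125, 0.75, 0.0625])
```

Here the first call yields True because no fold gets worse, and the second yields False because the ECE rises from 0.5 to 0.75 in the second fold.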