nannyml.performance_calculation.calculator module

Calculates realized performance metrics when target data is available.

The performance calculator manages a list of Metric instances, constructed using the MetricFactory. The calculator is then responsible for delegating the fit and calculate method calls to each of the managed Metric instances and building a Result object.
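
The delegation described above can be pictured with a minimal, self-contained sketch. The classes and functions below (`ToyMetric`, `create_metric`, `ToyPerformanceCalculator`) are hypothetical stand-ins written for illustration. They are not nannyml's `Metric`, `MetricFactory` or `Result` API, and plain lists and a dict take the place of DataFrames and the `Result` object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical stand-ins for illustration only; these are not nannyml's
# Metric, MetricFactory or Result classes.
MetricFunc = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return true_positives / positives if positives else float('nan')


@dataclass
class ToyMetric:
    name: str
    func: MetricFunc
    reference_value: Optional[float] = None

    def fit(self, y_true: Sequence[int], y_pred: Sequence[int]) -> None:
        # Store the value on reference data (a real metric might derive thresholds here).
        self.reference_value = self.func(y_true, y_pred)

    def calculate(self, y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        return self.func(y_true, y_pred)


# Hypothetical "factory": turns a metric name into a metric instance.
_REGISTRY: Dict[str, MetricFunc] = {'accuracy': accuracy, 'recall': recall}


def create_metric(name: str) -> ToyMetric:
    return ToyMetric(name=name, func=_REGISTRY[name])


class ToyPerformanceCalculator:
    def __init__(self, metrics: List[str]):
        self.metrics = [create_metric(name) for name in metrics]

    def fit(self, y_true, y_pred) -> 'ToyPerformanceCalculator':
        for metric in self.metrics:  # delegate fit to every managed metric
            metric.fit(y_true, y_pred)
        return self

    def calculate(self, y_true, y_pred) -> Dict[str, float]:
        # Delegate to every metric and gather the values into one result (a dict here).
        return {metric.name: metric.calculate(y_true, y_pred) for metric in self.metrics}


calc = ToyPerformanceCalculator(['accuracy', 'recall']).fit([0, 1, 1, 0], [0, 1, 0, 0])
print(calc.calculate([1, 1, 0, 0], [1, 1, 1, 0]))  # {'accuracy': 0.75, 'recall': 1.0}
```

In this sketch the calculator only loops over its metrics, so supporting another metric amounts to registering one more entry with the factory.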

For more information, check out the tutorials.

Examples

>>> import nannyml as nml
>>> from IPython.display import display
>>> reference_df, analysis_df, analysis_targets_df = nml.load_synthetic_car_loan_dataset()
>>> analysis_df = analysis_df.merge(analysis_targets_df, left_index=True, right_index=True)
>>> display(reference_df.head(3))
>>> calc = nml.PerformanceCalculator(
...     y_pred_proba='y_pred_proba',
...     y_pred='y_pred',
...     y_true='repaid',
...     timestamp_column_name='timestamp',
...     problem_type='classification_binary',
...     metrics=['roc_auc', 'f1', 'precision', 'recall', 'specificity', 'accuracy', 'average_precision'],
...     chunk_size=5000)
>>> calc.fit(reference_df)
>>> results = calc.calculate(analysis_df)
>>> display(results.filter(period='analysis').to_df())
>>> display(results.filter(period='reference').to_df())
>>> figure = results.plot()
>>> figure.show()

class nannyml.performance_calculation.calculator.PerformanceCalculator(metrics: Union[str, List[str]], y_true: str, problem_type: Union[str, ProblemType], y_pred: Optional[str] = None, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, timestamp_column_name: Optional[str] = None, thresholds: Optional[Dict[str, Threshold]] = None, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[Chunker] = None, normalize_confusion_matrix: Optional[str] = None, business_value_matrix: Optional[Union[List, ndarray]] = None, normalize_business_value: Optional[str] = None)

Bases: AbstractCalculator

Calculates realized performance metrics when target data is available.

Creates a new performance calculator.

Parameters:

metrics (Union[str, List[str]]) – A metric or list of metrics to calculate.
y_true (str) – The name of the column containing target values.
y_pred (Optional[str], default=None) – The name of the column containing your model predictions. This parameter is optional for binary classification cases. When it is not given, only the ROC AUC and Average Precision metrics are supported.
problem_type (Union[str, ProblemType]) –
The type of problem the monitored model solves, which determines how the metrics are calculated. Allowed values are:
- ‘regression’
- ‘classification_binary’
- ‘classification_multiclass’
y_pred_proba (ModelOutputsType, default=None) – Name(s) of the column(s) containing your model output. Pass a single string when there is only a single model output column, e.g. in binary classification cases. Pass a dictionary when working with multiple output columns, e.g. in multiclass classification cases. The dictionary maps a class/label string to the column name containing model outputs for that class/label.
timestamp_column_name (str, default=None) – The name of the column containing the timestamp of the model prediction.

thresholds (dict) –

A dictionary that links a metric name to an instance of a Threshold subclass, allowing a custom threshold to be set for each metric. This dictionary is optional. When a dictionary is given, its values override the default values for the corresponding metrics. If no dictionary is given, the defaults are applied.

The default values are:

{
    'roc_auc': StandardDeviationThreshold(),
    'f1': StandardDeviationThreshold(),
    'precision': StandardDeviationThreshold(),
    'average_precision': StandardDeviationThreshold(),
    'recall': StandardDeviationThreshold(),
    'specificity': StandardDeviationThreshold(),
    'accuracy': StandardDeviationThreshold(),
    'confusion_matrix': StandardDeviationThreshold(),
    'business_value': StandardDeviationThreshold(),
    'mae': StandardDeviationThreshold(),
    'mape': StandardDeviationThreshold(),
    'mse': StandardDeviationThreshold(),
    'msle': StandardDeviationThreshold(),
    'rmse': StandardDeviationThreshold(),
    'rmsle': StandardDeviationThreshold(),
}

chunk_size (int, default=None) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int, default=None) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str, default=None) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker, default=None) – The Chunker used to split the data sets into a list of chunks.
normalize_confusion_matrix (str, default=None) – Determines how the confusion matrix will be normalized. Allowed values are None, ‘all’, ‘true’ and ‘predicted’. If None, the confusion matrix will not be normalized and the counts for each cell of the matrix will be returned. If ‘all’, the confusion matrix will be normalized by the total number of observations. If ‘true’, the confusion matrix will be normalized by the total number of observations for each true class. If ‘predicted’, the confusion matrix will be normalized by the total number of observations for each predicted class.
business_value_matrix (Optional[Union[List, np.ndarray]], default=None) – An n×n matrix, where n is the number of classes, that specifies the value of each cell in the confusion matrix. Each element represents the business value of its respective confusion matrix element: the element in the i-th row and j-th column is the value of an observation whose true class is the i-th class and whose predicted class is the j-th class. It can be provided as a list of lists or a NumPy array.
normalize_business_value (str, default=None) – Determines how the business value will be normalized. Allowed values are None and ‘per_prediction’. If None, the business value will not be normalized and the value returned will be the total value per chunk. If ‘per_prediction’, the value will be normalized by the number of predictions in the chunk.
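
The override behaviour of the thresholds parameter, and what a standard-deviation threshold could look like, can be sketched in plain Python. This does not use nannyml's threshold classes. It assumes that a standard-deviation threshold places its bounds at the mean of the reference chunk values plus or minus a multiplier (3 here, an assumption) times their population standard deviation, and that a custom dictionary is merged over the defaults metric by metric. `ToyStdThreshold`, `ToyConstantThreshold`, `DEFAULT_THRESHOLDS` and `resolve_thresholds` are hypothetical names.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Bounds = Tuple[Optional[float], Optional[float]]


class ToyStdThreshold:
    """Hypothetical: bounds at mean +/- multiplier * std of reference chunk values."""

    def __init__(self, multiplier: float = 3.0):  # 3 is an assumed default
        self.multiplier = multiplier

    def bounds(self, reference_values: List[float]) -> Bounds:
        mean = statistics.fmean(reference_values)
        std = statistics.pstdev(reference_values)  # population std (assumption)
        return mean - self.multiplier * std, mean + self.multiplier * std


class ToyConstantThreshold:
    """Hypothetical: fixed bounds that ignore the reference data; None means no bound."""

    def __init__(self, lower: Optional[float] = None, upper: Optional[float] = None):
        self.lower, self.upper = lower, upper

    def bounds(self, reference_values: List[float]) -> Bounds:
        return self.lower, self.upper


DEFAULT_THRESHOLDS = {'roc_auc': ToyStdThreshold(), 'f1': ToyStdThreshold()}


def resolve_thresholds(custom: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    # Copy the defaults, then let custom entries replace them per metric name.
    merged = dict(DEFAULT_THRESHOLDS)
    merged.update(custom or {})
    return merged


thresholds = resolve_thresholds({'f1': ToyConstantThreshold(lower=0.8)})
print(thresholds['f1'].bounds([0.90, 0.92, 0.88, 0.90]))       # (0.8, None)
print(thresholds['roc_auc'].bounds([0.95, 0.97, 0.96, 0.96]))  # mean 0.96 +/- 3 * std
```

Copying the defaults before updating keeps the shared default dictionary unchanged between calls in this sketch.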

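The three chunking parameters (chunk_size, chunk_number, chunk_period) describe three ways of splitting the same data. A pure-Python sketch of each, plus the "only one of them" check, follows. The function names are hypothetical; this sketch keeps a smaller final chunk, which the library may handle differently, and the period-based split is shown for calendar months only, without modelling how the library interprets period strings.

```python
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar('T')


def check_chunking_args(chunk_size: Optional[int] = None,
                        chunk_number: Optional[int] = None,
                        chunk_period: Optional[str] = None) -> None:
    # Only one of the three splitting options may be given.
    given = [arg for arg in (chunk_size, chunk_number, chunk_period) if arg is not None]
    if len(given) > 1:
        raise ValueError('give only one of chunk_size, chunk_number or chunk_period')


def chunk_by_size(rows: Sequence[T], chunk_size: int) -> List[List[T]]:
    # Consecutive chunks of chunk_size rows; this sketch keeps a smaller final chunk.
    return [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]


def chunk_by_number(rows: Sequence[T], chunk_number: int) -> List[List[T]]:
    # chunk_number consecutive chunks whose sizes differ by at most one row.
    size, remainder = divmod(len(rows), chunk_number)
    chunks, start = [], 0
    for i in range(chunk_number):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(list(rows[start:end]))
        start = end
    return chunks


def chunk_by_month(rows: Sequence[T], timestamps: Sequence[date]) -> List[List[T]]:
    # Group rows by the calendar month of their timestamp, in chronological order.
    groups: Dict[Tuple[int, int], List[T]] = {}
    for row, ts in zip(rows, timestamps):
        groups.setdefault((ts.year, ts.month), []).append(row)
    return [groups[key] for key in sorted(groups)]


rows = list(range(10))
print(chunk_by_size(rows, 4))    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(chunk_by_number(rows, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```
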

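The normalize_confusion_matrix, business_value_matrix and normalize_business_value parameters can be illustrated on a small binary confusion matrix. The sketch below is plain Python, not the library's implementation. It follows the convention stated in the parameter descriptions (rows are true classes, columns are predicted classes) and assumes the binary classes are ordered [negative, positive]. The function names and the per-cell values are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def normalize_confusion_matrix(cm: Sequence[Sequence[int]], normalize: Optional[str] = None) -> Matrix:
    # Convention: rows are true classes, columns are predicted classes.
    if normalize is None:
        return [list(row) for row in cm]  # raw counts
    if normalize == 'all':
        total = sum(sum(row) for row in cm)
        return [[v / total for v in row] for row in cm]
    if normalize == 'true':
        return [[v / sum(row) for v in row] for row in cm]  # each row sums to 1
    if normalize == 'predicted':
        col_sums = [sum(col) for col in zip(*cm)]
        return [[v / col_sums[j] for j, v in enumerate(row)] for row in cm]  # each column sums to 1
    raise ValueError(f'unsupported normalization: {normalize!r}')


def business_value(cm, value_matrix, normalize: Optional[str] = None) -> float:
    # Weight every confusion matrix cell by its business value and add them up.
    total = sum(c * v
                for cm_row, value_row in zip(cm, value_matrix)
                for c, v in zip(cm_row, value_row))
    if normalize is None:
        return total
    if normalize == 'per_prediction':
        return total / sum(sum(row) for row in cm)
    raise ValueError(f'unsupported normalization: {normalize!r}')


# Binary example; rows/columns are assumed to be ordered [negative, positive].
cm = [[50, 10],   # true negatives, false positives
      [5, 35]]    # false negatives, true positives
values = [[0, -10],   # hypothetical value of each cell
          [-50, 20]]
print(normalize_confusion_matrix(cm, 'all'))         # [[0.5, 0.1], [0.05, 0.35]]
print(business_value(cm, values))                    # 350
print(business_value(cm, values, 'per_prediction'))  # 3.5
```

Here the total business value is 50·0 + 10·(−10) + 5·(−50) + 35·20 = 350, and dividing by the 100 predictions gives 3.5 per prediction.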
nannyml.performance_calculation.calculator.raise_if_metrics_require_y_pred(metrics: List[str], y_pred: Optional[str])

Raise an exception if metrics require y_pred and y_pred is not set.

All metrics except the following require ‘y_pred’, because these two are calculated from ‘y_pred_proba’ alone:
- roc_auc
- average_precision
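
The rule this function enforces can be sketched in plain Python. `check_y_pred_requirement` and `METRICS_WITHOUT_Y_PRED` are hypothetical names, and raising `ValueError` is an assumption, since the exception type is not stated here. The set of exempt metrics follows the y_pred parameter description of PerformanceCalculator: without y_pred, only ROC AUC and Average Precision are supported.

```python
from typing import List, Optional

# Metrics that, per the y_pred parameter description, work without y_pred.
METRICS_WITHOUT_Y_PRED = {'roc_auc', 'average_precision'}


def check_y_pred_requirement(metrics: List[str], y_pred: Optional[str]) -> None:
    """Hypothetical re-implementation of the check; raises ValueError if needed."""
    if y_pred is not None:
        return  # y_pred is available, so every metric can be calculated
    needs_y_pred = [m for m in metrics if m not in METRICS_WITHOUT_Y_PRED]
    if needs_y_pred:
        raise ValueError(f"metrics {needs_y_pred} require 'y_pred' to be set")


check_y_pred_requirement(['roc_auc', 'average_precision'], y_pred=None)  # passes
check_y_pred_requirement(['f1', 'roc_auc'], y_pred='y_pred')             # passes
```
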