nannyml.performance_calculation.metrics.multiclass_classification module

Module containing metric utilities and implementations.

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationAP(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Dict[str, str], **kwargs)[source]

Bases: Metric

Average Precision metric.

Creates a new AP instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.
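As an illustration of the dictionary form, the sketch below maps three example class labels to example column names and reads the per-class scores out of one row. The labels, the column names, and the `scores_for_row` helper are assumptions made for this example and are not part of NannyML.

```python
from typing import Dict

# Hypothetical class labels mapped to hypothetical score columns.
y_pred_proba: Dict[str, str] = {
    "prepaid_card": "y_pred_proba_prepaid_card",
    "highstreet_card": "y_pred_proba_highstreet_card",
    "upmarket_card": "y_pred_proba_upmarket_card",
}


def scores_for_row(row: Dict[str, float], mapping: Dict[str, str]) -> Dict[str, float]:
    """Return {class label: score} for one row, following the mapping."""
    missing = [col for col in mapping.values() if col not in row]
    if missing:
        raise KeyError(f"columns missing from row: {missing}")
    return {label: row[col] for label, col in mapping.items()}


# One illustrative row of model outputs.
row = {
    "y_pred_proba_prepaid_card": 0.2,
    "y_pred_proba_highstreet_card": 0.7,
    "y_pred_proba_upmarket_card": 0.1,
}
scores = scores_for_row(row, y_pred_proba)
predicted = max(scores, key=scores.get)  # label with the highest score
```

The keys are the class labels as they appear in the target column, and the values are the column names holding the model output for each class.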

__str__()[source]

Get string representation of metric.

y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationAUROC(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Dict[str, str], **kwargs)[source]

Bases: Metric

Area Under the Receiver Operating Characteristic curve (AUROC) metric.

Creates a new AUROC instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.
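A common way to extend AUROC to more than two classes is to compute a one-vs-rest AUROC per class from that class's score column and average the results. The sketch below assumes that macro one-vs-rest scheme; the source does not state which averaging is used, and the function names and toy data are made up for illustration.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    if not positives or not negatives:
        raise ValueError("both positives and negatives must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auroc(y_true: Sequence[str], proba: Dict[str, Sequence[float]]) -> float:
    """Average the one-vs-rest AUROC over all classes (macro average)."""
    per_class: List[float] = []
    for label, scores in proba.items():
        binary = [1 if t == label else 0 for t in y_true]
        per_class.append(binary_auroc(binary, scores))
    return sum(per_class) / len(per_class)


# Toy data: per-class scores keyed by class label.
y_true = ["a", "b", "c", "a", "b", "c"]
proba = {
    "a": [0.7, 0.2, 0.1, 0.4, 0.3, 0.2],
    "b": [0.2, 0.6, 0.2, 0.5, 0.4, 0.3],
    "c": [0.1, 0.2, 0.7, 0.1, 0.3, 0.5],
}
auroc = macro_ovr_auroc(y_true, proba)
```

The pairwise loop is quadratic and only meant to make the definition visible; a rank-based computation is preferable on real data.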

__str__()[source]

Get string representation of metric.

y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationAccuracy(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, **kwargs)[source]

Bases: Metric

Accuracy metric.

Creates a new Accuracy instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationBusinessValue(y_true: str, y_pred: str, threshold: Threshold, business_value_matrix: Union[List, ndarray], normalize_business_value: Optional[str] = None, y_pred_proba: Optional[Dict[str, str]] = None, **kwargs)[source]

Bases: Metric

Business Value metric.

Creates a new Business Value instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • business_value_matrix (Union[List, np.ndarray]) – An n×n matrix that specifies the value of each cell in the confusion matrix. Each element of the business value matrix represents the business value of its respective confusion matrix element: the element in the i-th row and j-th column is the value of an observation whose target is the i-th class and whose prediction is the j-th class. It can be provided as a list of lists or a NumPy array.

  • normalize_business_value (Optional[str], default=None) – Determines how the business value will be normalized. Allowed values are None and ‘per_prediction’.

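  • y_pred_proba (Optional[Dict[str, str]], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column. For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.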
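To make the role of the business value matrix concrete, the sketch below counts the confusion matrix cells (rows are targets, columns are predictions) and weights each cell by the matching matrix element. It is an illustration under stated assumptions, not NannyML's implementation: the class order is passed explicitly, 'per_prediction' is assumed to divide the total by the number of predictions, and all names and values are made up.

```python
from typing import List, Optional, Sequence


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> List[List[int]]:
    """Count observations per (target, prediction) cell; rows are targets."""
    index = {label: i for i, label in enumerate(classes)}
    n = len(classes)
    counts = [[0] * n for _ in range(n)]
    for target, prediction in zip(y_true, y_pred):
        counts[index[target]][index[prediction]] += 1
    return counts


def business_value(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    classes: Sequence[str],
    value_matrix: Sequence[Sequence[float]],
    normalize: Optional[str] = None,
) -> float:
    """Sum count * value over all cells, optionally per prediction."""
    n = len(classes)
    if len(value_matrix) != n or any(len(r) != n for r in value_matrix):
        raise ValueError("value_matrix must be n x n for n classes")
    counts = confusion_counts(y_true, y_pred, classes)
    total = sum(counts[i][j] * value_matrix[i][j] for i in range(n) for j in range(n))
    if normalize == "per_prediction":
        return total / len(y_true)  # assumed meaning of 'per_prediction'
    if normalize is not None:
        raise ValueError("normalize must be None or 'per_prediction'")
    return float(total)


classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "c", "c", "b"]
y_pred = ["a", "b", "b", "c", "a", "b"]
# value_matrix[i][j]: value when the target is classes[i] and the prediction is classes[j].
value_matrix = [[10, -5, -5], [-2, 8, -2], [-20, -20, 50]]

total_value = business_value(y_true, y_pred, classes, value_matrix)
value_per_prediction = business_value(
    y_true, y_pred, classes, value_matrix, normalize="per_prediction"
)
```

In this toy data the diagonal entries reward correct predictions and the off-diagonal entries penalize specific confusions, so a costly mistake such as predicting "a" for a true "c" pulls the total down more than other errors.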

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationConfusionMatrix(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, normalize_confusion_matrix: Optional[str] = None, **kwargs)[source]

Bases: Metric

Multiclass Confusion Matrix metric.

Creates a new confusion matrix instance.

__str__()[source]

Get string representation of metric.

fit(reference_data: DataFrame, chunker: Chunker)[source]

Fits a Metric on reference data.

Parameters:
  • reference_data (pd.DataFrame) – The reference data used for fitting. Must have target data available.

  • chunker (Chunker) – The Chunker used to split the reference data into chunks. This value is provided by the calling PerformanceCalculator.

get_chunk_record(chunk_data: DataFrame) → Dict[str, Union[float, bool]][source]

Create results for provided chunk data.

sampling_error(data: DataFrame)[source]

Calculates the sampling error with respect to the reference data for a given chunk of data.

Parameters:

data (pd.DataFrame) – The data to calculate the sampling error on, with respect to the reference data.

Returns:

sampling_error – The expected sampling error.

Return type:

float
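One general way to think about a sampling error is as a standard error: the spread of a per-observation quantity, estimated on reference data, divided by the square root of the chunk size. The sketch below applies that idea to a simple proportion, such as the share of observations falling in one confusion matrix cell. It is an assumption-laden illustration of the concept and is not taken from NannyML's implementation; the function name and data are hypothetical.

```python
import math
from typing import Sequence


def standard_error_of_proportion(reference_flags: Sequence[bool], chunk_size: int) -> float:
    """Standard error of a proportion for a chunk, using the reference spread."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not reference_flags:
        raise ValueError("reference_flags must not be empty")
    p = sum(reference_flags) / len(reference_flags)  # proportion in reference
    reference_std = math.sqrt(p * (1.0 - p))         # std of a 0/1 indicator
    return reference_std / math.sqrt(chunk_size)


# Hypothetical reference data: 80 of 100 observations fall in the cell.
reference_flags = [True] * 80 + [False] * 20
error_small_chunk = standard_error_of_proportion(reference_flags, chunk_size=100)
error_large_chunk = standard_error_of_proportion(reference_flags, chunk_size=400)
```

Under this view, larger chunks give smaller sampling errors: quadrupling the chunk size halves the estimate.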

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationF1(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, **kwargs)[source]

Bases: Metric

F1 score metric.

Creates a new F1 instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.
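For multiclass data, F1 has to be aggregated over classes in some way. The sketch below assumes macro averaging (an unweighted mean of per-class F1 scores); the source does not state the averaging scheme, so treat this as an illustration of one common choice rather than a description of this class. The function name and data are made up.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores: List[float] = []
    for label in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0.0 here.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
score = macro_f1(y_true, y_pred)
```

Macro averaging gives every class the same weight regardless of how often it occurs, so rare classes influence the score as much as frequent ones.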

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationPrecision(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, **kwargs)[source]

Bases: Metric

Precision metric.

Creates a new Precision instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationRecall(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, **kwargs)[source]

Bases: Metric

Recall metric, also known as ‘sensitivity’.

Creates a new Recall instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]

class nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationSpecificity(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, **kwargs)[source]

Bases: Metric

Specificity metric.

Creates a new Specificity instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Union[str, Dict[str, str]]) –

    Name(s) of the column(s) containing your model output.

    • For binary classification, pass a single string referring to the model output column.

    • For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.
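Specificity is defined for a binary problem as TN / (TN + FP), so a multiclass version needs a one-vs-rest view of each class. The sketch below assumes per-class one-vs-rest specificity averaged with equal weights; the source does not say how this class aggregates over classes, and the function name and data are hypothetical.

```python
from typing import List, Sequence


def macro_specificity(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of one-vs-rest specificity, TN / (TN + FP), per class."""
    classes = sorted(set(y_true) | set(y_pred))
    values: List[float] = []
    for label in classes:
        pairs = list(zip(y_true, y_pred))
        # Negatives for this class are all observations whose target is another class.
        tn = sum(1 for t, p in pairs if t != label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        values.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(values) / len(values)


y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
specificity = macro_specificity(y_true, y_pred)
```

With many classes, each one-vs-rest problem has far more negatives than positives, so per-class specificity tends to sit close to 1 and small differences can matter.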

__str__()[source]

Get string representation of metric.

y_pred: str
y_pred_proba: Dict[str, str]