nannyml.performance_calculation.metrics.binary_classification module

Module containing implementations of binary classification metrics and utilities.

class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAP(y_true: str, threshold: Threshold, y_pred: Optional[str] = None, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Average Precision metric.

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html

Creates a new AP instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (Optional[str], default=None) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
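To make concrete what this metric measures, the following is a minimal pure-Python sketch of the average precision definition given on the scikit-learn page linked above: the sum over thresholds of (R_n - R_{n-1}) * P_n. The function name `average_precision` and the sample data are illustrative assumptions, not part of NannyML, and the library's own implementation may differ.

```python
def average_precision(y_true, y_score):
    """Average precision as the sum of (R_n - R_{n-1}) * P_n over thresholds."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive targets")

    # Walk through the rows from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Rows with the same score share one threshold, so consume them together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Illustrative data only: targets and model scores for four rows.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
ap = average_precision(y_true, y_score)
print(round(ap, 4))
```

Because the metric is computed from scores rather than hard predictions, this class needs y_pred_proba, while y_pred is optional.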

__str__()[source]

Get string representation of metric.

y_pred_proba: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAUROC(y_true: str, threshold: Threshold, y_pred: Optional[str] = None, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Area Under the Receiver Operating Characteristic curve (AUROC) metric.

Creates a new AUROC instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (Optional[str], default=None) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
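As an illustration of what AUROC measures, the sketch below computes it in pure Python as the probability that a randomly chosen positive row receives a higher score than a randomly chosen negative row, with ties counted as one half. The function name `auroc` and the sample data are illustrative assumptions; this is not NannyML's implementation, and the pairwise loop is only practical for small inputs.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative target")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Illustrative data only: targets and model scores for four rows.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = auroc(y_true, y_score)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.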

__str__()[source]

Get string representation of metric.

y_pred_proba: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAccuracy(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Accuracy metric.

Creates a new Accuracy instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.

__str__()[source]

Get string representation of metric.

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationBusinessValue(y_true: str, y_pred: str, threshold: Threshold, business_value_matrix: Union[List, ndarray], normalize_business_value: Optional[str] = None, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Business Value metric.

Creates a new Business Value instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • business_value_matrix (Union[List, np.ndarray]) – A 2x2 matrix that specifies the value of each cell in the confusion matrix. The matrix must be given in the format [[value_of_TN, value_of_FP], [value_of_FN, value_of_TP]]. Required when calculating the ‘business_value’ metric.

  • normalize_business_value (Optional[str], default=None) – Determines how the business value will be normalized. Allowed values are None and ‘per_prediction’.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
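The layout of business_value_matrix is easiest to see in a small calculation. The sketch below assumes that the business value is the sum of the confusion matrix counts weighted by the matching matrix entries, and that ‘per_prediction’ divides that sum by the number of predictions. The function name `business_value`, the counts and the monetary values are all hypothetical and are not taken from NannyML.

```python
def business_value(tn, fp, fn, tp, business_value_matrix, normalize=None):
    """Weight confusion matrix counts by [[value_of_TN, value_of_FP], [value_of_FN, value_of_TP]]."""
    (v_tn, v_fp), (v_fn, v_tp) = business_value_matrix
    total = tn * v_tn + fp * v_fp + fn * v_fn + tp * v_tp
    if normalize is None:
        return total
    if normalize == "per_prediction":
        # Assumption: normalization divides by the total number of predictions.
        return total / (tn + fp + fn + tp)
    raise ValueError("normalize must be None or 'per_prediction'")


# Hypothetical values: a true positive earns 100, a false positive costs 10,
# a false negative costs 50 and a true negative is neutral.
matrix = [[0, -10], [-50, 100]]
total_value = business_value(tn=50, fp=10, fn=5, tp=35, business_value_matrix=matrix)
value_per_prediction = business_value(
    tn=50, fp=10, fn=5, tp=35, business_value_matrix=matrix, normalize="per_prediction"
)
print(total_value, value_per_prediction)
```

Normalizing per prediction makes chunks of different sizes comparable, which is useful when chunks do not all contain the same number of rows.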

__str__()[source]

Get string representation of metric.

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationConfusionMatrix(y_true: str, y_pred: str, threshold: Threshold, normalize_confusion_matrix: Optional[str] = None, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Confusion Matrix metric.

Creates a new Confusion Matrix instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • normalize_confusion_matrix (Optional[str], default=None) – Determines how the confusion matrix will be normalized. Allowed values are None, ‘all’, ‘true’ and ‘predicted’.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
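The effect of the normalize_confusion_matrix options can be sketched as follows. This example assumes the options behave like the `normalize` argument of scikit-learn's `confusion_matrix`: ‘all’ divides every cell by the total count, ‘true’ divides by the number of rows with the same true label, and ‘predicted’ divides by the number of rows with the same predicted label. The function name and the dictionary keys are illustrative and are not NannyML's output column names.

```python
def normalize_confusion_matrix(tn, fp, fn, tp, normalize=None):
    """Return the four confusion matrix cells, optionally normalized."""
    counts = {"tn": tn, "fp": fp, "fn": fn, "tp": tp}
    if normalize is None:
        return counts
    if normalize == "all":
        # Every cell is a fraction of all predictions.
        denominators = dict.fromkeys(counts, tn + fp + fn + tp)
    elif normalize == "true":
        # Cells are fractions of the rows sharing the same true label.
        denominators = {"tn": tn + fp, "fp": tn + fp, "fn": fn + tp, "tp": fn + tp}
    elif normalize == "predicted":
        # Cells are fractions of the rows sharing the same predicted label.
        denominators = {"tn": tn + fn, "fn": tn + fn, "fp": fp + tp, "tp": fp + tp}
    else:
        raise ValueError("normalize must be None, 'all', 'true' or 'predicted'")
    # Note: a zero denominator (an empty class) is not handled in this sketch.
    return {cell: counts[cell] / denominators[cell] for cell in counts}


# Illustrative counts for one chunk of 100 rows.
by_all = normalize_confusion_matrix(50, 10, 5, 35, normalize="all")
by_true = normalize_confusion_matrix(50, 10, 5, 35, normalize="true")
by_pred = normalize_confusion_matrix(50, 10, 5, 35, normalize="predicted")
print(by_all, by_true, by_pred)
```

With ‘true’ normalization the true positive cell equals recall and the true negative cell equals specificity.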

__str__()[source]

Get string representation of metric.

fit(reference_data: DataFrame, chunker: Chunker)[source]

Fits a Metric on reference data.

Parameters:
  • reference_data (pd.DataFrame) – The reference data used for fitting. Must have target data available.

  • chunker (Chunker) – The Chunker used to split the reference data into chunks. This value is provided by the calling PerformanceCalculator.

get_chunk_record(chunk_data: DataFrame) Dict[source]

Returns a dictionary containing the confusion matrix values for a given chunk.

Parameters:

chunk_data (pd.DataFrame) – A pandas dataframe containing the data for a given chunk.

Returns:

chunk_record – A dictionary of confusion matrix metric and value pairs.

Return type:

Dict

get_false_neg_info(chunk_data: DataFrame) Dict[source]

Returns a dictionary containing information about the false negatives for a given chunk.

Parameters:

chunk_data (pd.DataFrame) – A pandas dataframe containing the data for a given chunk.

Returns:

false_neg_info – A dictionary of information about the false negatives and the corresponding values.

Return type:

Dict

get_false_pos_info(chunk_data: DataFrame) Dict[source]

Returns a dictionary containing information about the false positives for a given chunk.

Parameters:

chunk_data (pd.DataFrame) – A pandas dataframe containing the data for a given chunk.

Returns:

false_pos_info – A dictionary of information about the false positives and the corresponding values.

Return type:

Dict

get_true_neg_info(chunk_data: DataFrame) Dict[source]

Returns a dictionary containing information about the true negatives for a given chunk.

Parameters:

chunk_data (pd.DataFrame) – A pandas dataframe containing the data for a given chunk.

Returns:

true_neg_info – A dictionary of information about the true negatives and the corresponding values.

Return type:

Dict

get_true_pos_info(chunk_data: DataFrame) Dict[source]

Returns a dictionary containing information about the true positives for a given chunk.

Parameters:

chunk_data (pd.DataFrame) – A pandas dataframe containing the data for a given chunk.

Returns:

true_pos_info – A dictionary of information about the true positives and the corresponding values.

Return type:

Dict

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationF1(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

F1 score metric.

Creates a new F1 instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
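For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes all three from confusion matrix counts using the standard definitions; the function name `precision_recall_f1` and the counts are illustrative and are not part of NannyML.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions of precision, recall and F1 from counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only; zero denominators are not handled in this sketch.
precision, recall, f1 = precision_recall_f1(tp=35, fp=10, fn=5)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

The same value follows from the equivalent form 2 * TP / (2 * TP + FP + FN), which avoids computing precision and recall separately.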

__str__()[source]

Get string representation of metric.

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationPrecision(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Precision metric.

Creates a new Precision instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.

__str__()[source]

Get string representation of metric.

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationRecall(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Recall metric, also known as ‘sensitivity’.

Creates a new Recall instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.

__str__()[source]

Get string representation of metric.

y_pred: str
class nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationSpecificity(y_true: str, y_pred: str, threshold: Threshold, y_pred_proba: Optional[str] = None, **kwargs)[source]

Bases: Metric

Specificity metric.

Creates a new Specificity instance.

Parameters:
  • y_true (str) – The name of the column containing target values.

  • y_pred (str) – The name of the column containing your model predictions.

  • threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.

  • y_pred_proba (Optional[str], default=None) – Name(s) of the column(s) containing your model output. For binary classification, pass a single string referring to the model output column.
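Specificity is the true negative rate, TN / (TN + FP): the share of actual negatives that the model predicts as negative. The sketch below computes it directly from target and prediction values; the function name `specificity` and the sample data are illustrative, not part of NannyML.

```python
def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative targets")
    return tn / (tn + fp)


# Illustrative data only: four actual negatives, one of them predicted positive.
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0]
spec = specificity(y_true, y_pred)
print(spec)
```

Only rows with a negative target enter the calculation, so the metric is unaffected by how well the positive class is predicted.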

__str__()[source]

Get string representation of metric.

y_pred: str