nannyml.sampling_error.multiclass_classification module

Module containing functions to estimate sampling error for multiclass classification metrics.

nannyml.sampling_error.multiclass_classification.accuracy_sampling_error(sampling_error_components: Tuple, data) → float

Calculate the accuracy sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.accuracy_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])

Calculate sampling error components for accuracy using reference data.

The y_true_reference and y_pred_reference lists hold the binarized target values and the binarized predictions, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.

Returns:

sampling_error_components

Return type:

Tuple
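
The split between a components function (run once on reference data) and a sampling error function (run per chunk) can be illustrated with a minimal sketch. It assumes, without confirmation from this page, that the accuracy component is the standard deviation of a per-row "prediction is correct" indicator and that the chunk error is that value divided by the square root of the chunk size (the standard error of a mean). The helpers `binarize`, `accuracy_components` and `accuracy_error` are hypothetical, take a chunk size instead of chunk data, and are not NannyML's implementation.

```python
from typing import List, Sequence, Tuple

import numpy as np
import pandas as pd


def binarize(labels: pd.Series, classes: Sequence[str]) -> List[pd.Series]:
    """Return one 0/1 Series per class, in the same order as `classes`."""
    return [(labels == c).astype(int) for c in classes]


def accuracy_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> Tuple[float]:
    """Std of the per-row 'prediction is correct' indicator on reference data."""
    true_arr = np.asarray(y_true)  # shape (n_classes, n_rows)
    pred_arr = np.asarray(y_pred)
    # A row is correct when its one-hot target and one-hot prediction agree everywhere.
    correct = (true_arr == pred_arr).all(axis=0).astype(float)
    # For a 0/1 indicator with mean p, the population std is sqrt(p * (1 - p)).
    return (float(np.std(correct)),)


def accuracy_error(components: Tuple[float], chunk_size: int) -> float:
    """Standard error of the mean over a chunk of `chunk_size` rows."""
    return components[0] / np.sqrt(chunk_size)


classes = ["a", "b", "c"]
y_true_ref = pd.Series(["a", "b", "c", "a", "b", "c", "a", "b"])
y_pred_ref = pd.Series(["a", "b", "c", "a", "c", "c", "b", "b"])

components = accuracy_components(binarize(y_true_ref, classes), binarize(y_pred_ref, classes))
error_for_100_rows = accuracy_error(components, chunk_size=100)
```

Computing the component once and scaling it by the chunk size keeps the per-chunk step cheap, which is the point of separating the two functions.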

nannyml.sampling_error.multiclass_classification.auroc_sampling_error(sampling_error_components, data) → float

Calculate the AUROC sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.auroc_sampling_error_components(y_true_reference: List[Series], y_pred_proba_reference: List[Series])

Calculate sampling error components for AUROC using reference data.

The y_true_reference and y_pred_proba_reference lists hold the binarized target values and the model probabilities, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_proba_reference (List[pd.Series]) – Prediction probability values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
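
As a hedged illustration of how per-class AUROC errors could be estimated from reference data and scaled to a chunk, the sketch below swaps in the Hanley and McNeil (1982) standard error formula, which this page does not say NannyML uses. It also assumes the chunk keeps the reference class balance and that the K one-vs-rest errors of a macro average are independent. All function names are hypothetical.

```python
from typing import List

import numpy as np


def auroc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mann-Whitney form of AUROC: P(positive score > negative score), ties count 1/2."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def hanley_mcneil_se(auc: float, n_pos: float, n_neg: float) -> float:
    """Hanley & McNeil (1982) standard error of an AUROC estimate."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc**2) + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return float(np.sqrt(max(var, 0.0)))


def macro_auroc_error(y_true: List[np.ndarray], y_proba: List[np.ndarray], chunk_size: int) -> float:
    """Hypothetical macro-AUROC sampling error for a chunk of `chunk_size` rows."""
    per_class = []
    for t, p in zip(y_true, y_proba):
        t, p = np.asarray(t), np.asarray(p, dtype=float)
        # Expected positives/negatives in a chunk, assuming the reference class balance.
        n_pos = chunk_size * t.mean()
        n_neg = chunk_size - n_pos
        per_class.append(hanley_mcneil_se(auroc(t, p), n_pos, n_neg))
    # Macro AUROC is a mean of K one-vs-rest scores; treat them as independent.
    return float(np.sqrt(np.sum(np.square(per_class))) / len(per_class))
```

The pairwise `auroc` helper is quadratic in the number of rows, which is acceptable for a sketch but not for large reference sets.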

nannyml.sampling_error.multiclass_classification.average_precision_sampling_error(sampling_error_components, data) → float

Calculate the average precision (AP) sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (chunk) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.average_precision_sampling_error_components(y_true_reference: List[ndarray], y_pred_proba_reference: List[Series])

Calculate sampling error components for AP using reference data.

The y_true_reference and y_pred_proba_reference lists hold the binarized target values (one array per class) and the model probabilities (one Series per class). The entries in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[np.ndarray]) – Target values for the reference dataset.

  • y_pred_proba_reference (List[pd.Series]) – Prediction probability values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
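
Average precision has no simple closed-form standard error, so this hypothetical sketch uses a technique the module may not use: a bootstrap. It draws many chunks of a given size from the reference rows, computes macro AP for each with the step-wise definition AP = Σ (R_n − R_{n−1}) P_n (the definition scikit-learn's `average_precision_score` uses), and reports the standard deviation. The input shapes mirror this page: a list of 0/1 arrays for targets and a list of Series for probabilities, both ordered by class label. Function names and defaults are assumptions.

```python
from typing import List

import numpy as np
import pandas as pd


def average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Step-wise AP = sum_n (R_n - R_{n-1}) * P_n over distinct score thresholds."""
    order = np.argsort(-y_score, kind="stable")
    scores, hits = y_score[order], y_true[order]
    tp = np.cumsum(hits)
    if tp[-1] == 0:
        return float("nan")  # AP is undefined without positives
    # Index of the last row in each group of tied scores = one threshold each.
    last = np.r_[np.nonzero(np.diff(scores))[0], len(scores) - 1]
    precision = tp[last] / (last + 1)
    recall = tp[last] / tp[-1]
    return float(np.sum(np.diff(np.r_[0.0, recall]) * precision))


def bootstrap_macro_ap_error(y_true: List[np.ndarray], y_proba: List[pd.Series],
                             chunk_size: int, n_boot: int = 500, seed: int = 0) -> float:
    """Std of macro AP over bootstrap chunks of `chunk_size` reference rows."""
    rng = np.random.default_rng(seed)
    true_arr = np.asarray(y_true)  # (n_classes, n_rows)
    proba_arr = np.asarray([s.to_numpy() for s in y_proba], dtype=float)
    n_rows = true_arr.shape[1]
    macro_scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_rows, size=chunk_size)  # same rows for every class
        per_class = [average_precision(t[idx], p[idx]) for t, p in zip(true_arr, proba_arr)]
        macro_scores.append(np.nanmean(per_class))  # skip classes absent from the draw
    return float(np.nanstd(macro_scores))
```

Resampling the same row indices for every class keeps the per-class scores of one draw consistent with each other, so their dependence is reflected in the macro result.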

nannyml.sampling_error.multiclass_classification.business_value_sampling_error(sampling_error_components: Tuple, data) → float

Calculate the business value sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (chunk) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.business_value_sampling_error_components(y_true_reference: Series, y_pred_reference: Series, business_value_matrix: ndarray, classes: List[str], normalize_business_value: Optional[str]) → Tuple[float, Optional[str]]

Calculate sampling error components for business value using reference data.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

  • business_value_matrix (np.ndarray) – An n×n matrix of values for the business problem, where n is the number of classes.

  • classes (List[str]) – An alphanumerically sorted list of the unique classes in the multiclass problem.

  • normalize_business_value (Optional[str], default=None) – Determines how the business value will be normalized. Allowed values are None and ‘per_prediction’.

Returns:

components

Return type:

tuple
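
The business value components can be sketched under two assumptions this page does not state: that business_value_matrix[i, j] is the value of a row whose true class is classes[i] and whose predicted class is classes[j], and that 'per_prediction' divides the total value by the number of rows. Under those assumptions each row contributes one matrix entry, so the unnormalized value is a sum and the normalized value is a mean, and both standard errors follow from the per-row standard deviation. The function names are hypothetical, and the components function returns only a standard deviation rather than the documented tuple.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def business_value_components(y_true: pd.Series, y_pred: pd.Series,
                              value_matrix: np.ndarray, classes: List[str]) -> float:
    """Std of the per-row business value on reference data."""
    # Assumed layout: value_matrix[i, j] = value of (true classes[i], predicted classes[j]).
    index = {c: k for k, c in enumerate(classes)}
    rows = y_true.map(index).to_numpy()
    cols = y_pred.map(index).to_numpy()
    per_row_value = value_matrix[rows, cols]
    return float(np.std(per_row_value))


def business_value_error(std_value: float, chunk_size: int, normalize: Optional[str] = None) -> float:
    """Standard error of the chunk business value."""
    if normalize is None:
        # Total value is a sum over chunk_size rows: SE = std * sqrt(n).
        return std_value * np.sqrt(chunk_size)
    if normalize == "per_prediction":
        # Value per prediction is a mean over chunk_size rows: SE = std / sqrt(n).
        return std_value / np.sqrt(chunk_size)
    raise ValueError(f"unsupported normalization: {normalize!r}")
```

Note that the unnormalized error grows with the chunk size while the per-prediction error shrinks, because one is a sum and the other a mean.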

nannyml.sampling_error.multiclass_classification.f1_sampling_error(sampling_error_components: List[Tuple], data) → float

Calculate the F1 sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.f1_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])

Calculate sampling error components for F1 using reference data.

The y_true_reference and y_pred_reference lists hold the binarized target values and the binarized predictions, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
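
F1 is a ratio, but it can be rewritten over the rows that are true positives, false positives or false negatives for a class: with J = TP / (TP + FP + FN), F1 = 2J / (1 + J). J is a simple proportion, so this hypothetical sketch takes its binomial standard error and propagates it to F1 with the delta method. This is one possible approach, not necessarily the module's; it also assumes a chunk keeps the reference fraction of relevant rows and that class-level errors are independent when macro-averaging.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def f1_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> List[Tuple[float, float]]:
    """Per class: (J = TP / (TP + FP + FN), fraction of rows that are TP, FP or FN)."""
    components = []
    for t, p in zip(y_true, y_pred):
        t, p = np.asarray(t, dtype=bool), np.asarray(p, dtype=bool)
        relevant = t | p  # TP, FP or FN rows for this class
        jaccard = float((t & p)[relevant].mean())
        components.append((jaccard, float(relevant.mean())))
    return components


def f1_error(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Delta-method macro-F1 sampling error for a chunk of `chunk_size` rows."""
    per_class = []
    for jaccard, fraction in components:
        m = chunk_size * fraction  # expected relevant rows in the chunk
        se_jaccard = np.sqrt(jaccard * (1 - jaccard) / m)
        # F1 = 2J / (1 + J), so dF1/dJ = 2 / (1 + J) ** 2.
        per_class.append(2 / (1 + jaccard) ** 2 * se_jaccard)
    # Macro F1 is a mean of K class scores; treat them as independent.
    return float(np.sqrt(np.sum(np.square(per_class))) / len(per_class))
```

A class with no relevant reference rows would produce an undefined J; the sketch does not guard against that case.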

nannyml.sampling_error.multiclass_classification.multiclass_confusion_matrix_sampling_error(sampling_error_components: Tuple, data)

Calculate the confusion matrix sampling error for a chunk of data.

nannyml.sampling_error.multiclass_classification.multiclass_confusion_matrix_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series], normalize_confusion_matrix: Optional[str])

Calculate sampling error components for the confusion matrix using reference data.
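
The two confusion matrix functions come without a parameter description here. As a hedged sketch, each cell can be treated as a binomial proportion estimated on reference data and scaled to the chunk size. The normalization values None, 'all', 'true' and 'pred' are assumed by analogy with scikit-learn's `confusion_matrix`; this page does not list the allowed values of normalize_confusion_matrix. Function names are hypothetical, and a class with no reference rows would cause a division by zero under 'true' or 'pred'.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def cm_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> np.ndarray:
    """Reference proportions p[i, j] of rows with true class i and predicted class j."""
    true_arr = np.asarray(y_true, dtype=float)  # (n_classes, n_rows), one-hot per row
    pred_arr = np.asarray(y_pred, dtype=float)
    return true_arr @ pred_arr.T / true_arr.shape[1]


def cm_error(p: np.ndarray, chunk_size: int, normalize: Optional[str] = None) -> np.ndarray:
    """Binomial standard error of every cell for a chunk of `chunk_size` rows."""
    if normalize is None:  # raw counts: sd of Binomial(n, p)
        return np.sqrt(chunk_size * p * (1 - p))
    if normalize == "all":  # proportion of all rows
        return np.sqrt(p * (1 - p) / chunk_size)
    if normalize == "true":  # proportion within each true-class row
        row = p.sum(axis=1, keepdims=True)
        q = p / row
        return np.sqrt(q * (1 - q) / (chunk_size * row))
    if normalize == "pred":  # proportion within each predicted-class column
        col = p.sum(axis=0, keepdims=True)
        q = p / col
        return np.sqrt(q * (1 - q) / (chunk_size * col))
    raise ValueError(f"unsupported normalization: {normalize!r}")
```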

nannyml.sampling_error.multiclass_classification.precision_sampling_error(sampling_error_components: List[Tuple], data) → float

Calculate the precision sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.precision_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])

Calculate sampling error components for precision using reference data.

The y_true_reference and y_pred_reference lists hold the binarized target values and the binarized predictions, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
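
Per-class precision is a proportion over the rows predicted as that class, which suggests the (std, fraction) tuples sketched below. This is an assumption about what the List[Tuple] components might hold, not a description of NannyML internals. The names are hypothetical, a chunk is assumed to keep the reference prediction rates, class errors are combined as if independent, and classes never predicted in the reference data are not handled.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def precision_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> List[Tuple[float, float]]:
    """Per class: (std of 'prediction is right' among rows predicted as the class,
    fraction of reference rows predicted as the class)."""
    components = []
    for t, p in zip(y_true, y_pred):
        t, p = np.asarray(t, dtype=bool), np.asarray(p, dtype=bool)
        hit = t[p].astype(float)  # 1 = true positive, 0 = false positive
        components.append((float(np.std(hit)), float(p.mean())))
    return components


def precision_error(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Macro-precision sampling error for a chunk of `chunk_size` rows."""
    # Each class precision is a mean over about chunk_size * fraction predicted rows.
    per_class = [std / np.sqrt(chunk_size * fraction) for std, fraction in components]
    # Combine the K class errors of a macro average, assuming independence.
    return float(np.sqrt(np.sum(np.square(per_class))) / len(per_class))
```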

nannyml.sampling_error.multiclass_classification.recall_sampling_error(sampling_error_components: List[Tuple], data) → float

Calculate the recall sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.recall_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])

Calculate sampling error components for recall using reference data.

The y_true_reference and y_pred_reference lists hold the binarized target values and the binarized predictions, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
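
Recall for a class is a proportion over the rows whose true class is that class. This hypothetical sketch stores the reference recall r and class prevalence per class, then applies the binomial standard error sqrt(r(1 − r) / (n · prevalence)) for a chunk of n rows. It assumes a chunk keeps the reference class balance and that class errors are independent in the macro average; it is not the module's implementation.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def recall_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> List[Tuple[float, float]]:
    """Per class: (reference recall, reference prevalence of the class)."""
    components = []
    for t, p in zip(y_true, y_pred):
        t, p = np.asarray(t, dtype=bool), np.asarray(p, dtype=bool)
        components.append((float(p[t].mean()), float(t.mean())))
    return components


def recall_error(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Macro-recall sampling error for a chunk of `chunk_size` rows."""
    # Binomial SE over the roughly chunk_size * prevalence rows of each class.
    per_class = [np.sqrt(r * (1 - r) / (chunk_size * prev)) for r, prev in components]
    return float(np.sqrt(np.sum(np.square(per_class))) / len(per_class))
```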

nannyml.sampling_error.multiclass_classification.specificity_sampling_error(sampling_error_components: List[Tuple], data) → float

Calculate the specificity sampling error for a chunk of data.

Parameters:
  • sampling_error_components – a set of parameters that were derived from reference data.

  • data – the (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.multiclass_classification.specificity_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])

Calculate sampling error components for specificity using reference data.

The y_true_reference and y_pred_reference lists hold the binarized target values and the binarized predictions, one Series per class. The Series in both lists should be ordered to match the list of class labels present.

Parameters:
  • y_true_reference (List[pd.Series]) – Target values for the reference dataset.

  • y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.

Returns:

sampling_error_components

Return type:

List[Tuple]
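
Specificity for a class k is TN / (TN + FP), a proportion over the rows whose true class is not k. This hypothetical sketch stores that proportion and the share of such rows per class, then applies a binomial standard error for a chunk and combines classes assuming independence. It illustrates the idea under those assumptions and is not the module's implementation.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def specificity_components(y_true: List[pd.Series], y_pred: List[pd.Series]) -> List[Tuple[float, float]]:
    """Per class: (reference specificity TN / (TN + FP), fraction of rows NOT of the class)."""
    components = []
    for t, p in zip(y_true, y_pred):
        t, p = np.asarray(t, dtype=bool), np.asarray(p, dtype=bool)
        negatives = ~t
        components.append((float((~p[negatives]).mean()), float(negatives.mean())))
    return components


def specificity_error(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Macro-specificity sampling error for a chunk of `chunk_size` rows."""
    # Binomial SE over the roughly chunk_size * fraction negative rows of each class.
    per_class = [np.sqrt(s * (1 - s) / (chunk_size * frac)) for s, frac in components]
    return float(np.sqrt(np.sum(np.square(per_class))) / len(per_class))
```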