nannyml.sampling_error.multiclass_classification module
Module containing functions to estimate sampling error for multiclass classification metrics.
- nannyml.sampling_error.multiclass_classification.accuracy_sampling_error(sampling_error_components: Tuple, data) → float
Calculate the accuracy sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.accuracy_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])
Calculate sampling error components for accuracy using reference data.
The y_true_reference and y_pred_reference lists represent the binarized target values and model predictions. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
Tuple
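A minimal sketch of how the two accuracy functions could fit together, assuming the sampling error is the standard error of the mean of a per-row correctness indicator: its standard deviation on reference data divided by the square root of the chunk size. The names accuracy_components_sketch and accuracy_error_sketch are hypothetical, the chunk is represented only by its row count, and this is not necessarily how nannyml computes it.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def accuracy_components_sketch(
    y_true_reference: List[pd.Series], y_pred_reference: List[pd.Series]
) -> Tuple[float]:
    """Hypothetical: std of the per-row correctness indicator on reference data."""
    # Stack the one-vs-rest columns into (n_rows, n_classes) arrays.
    y_true = np.column_stack([s.to_numpy() for s in y_true_reference])
    y_pred = np.column_stack([s.to_numpy() for s in y_pred_reference])
    # A row is correct when its predicted one-hot vector equals the true one.
    correct = np.all(y_true == y_pred, axis=1).astype(float)
    return (float(np.std(correct)),)


def accuracy_error_sketch(components: Tuple[float], chunk_size: int) -> float:
    """Hypothetical: standard error of the mean, sigma / sqrt(n)."""
    return components[0] / np.sqrt(chunk_size)


# Toy reference data: four rows, three classes, one misclassified row.
classes = ["a", "b", "c"]
y_true = pd.Series(["a", "b", "c", "a"])
y_pred = pd.Series(["a", "b", "a", "a"])
y_true_ref = [(y_true == c).astype(int) for c in classes]
y_pred_ref = [(y_pred == c).astype(int) for c in classes]

components = accuracy_components_sketch(y_true_ref, y_pred_ref)
print(accuracy_error_sketch(components, chunk_size=100))
```

Splitting the work this way means the reference data is scanned once, and each chunk only needs its size.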
- nannyml.sampling_error.multiclass_classification.auroc_sampling_error(sampling_error_components, data) → float
Calculate the AUROC sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.auroc_sampling_error_components(y_true_reference: List[Series], y_pred_proba_reference: List[Series])
Calculate sampling error components for AUROC using reference data.
The y_true_reference and y_pred_proba_reference lists represent the binarized target values and model probabilities. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_proba_reference (List[pd.Series]) – Prediction probability values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
- nannyml.sampling_error.multiclass_classification.average_precision_sampling_error(sampling_error_components, data) → float
Calculate the average precision (AP) sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (chunk) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.average_precision_sampling_error_components(y_true_reference: List[ndarray], y_pred_proba_reference: List[Series])
Calculate sampling error components for average precision (AP) using reference data.
The y_true_reference and y_pred_proba_reference lists represent the binarized target values and model probabilities. The order of the elements in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[np.ndarray]) – Target values for the reference dataset.
y_pred_proba_reference (List[pd.Series]) – Prediction probability values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
- nannyml.sampling_error.multiclass_classification.business_value_sampling_error(sampling_error_components: Tuple, data) → float
Calculate the business value sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (chunk) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.business_value_sampling_error_components(y_true_reference: Series, y_pred_reference: Series, business_value_matrix: ndarray, classes: List[str], normalize_business_value: Optional[str]) → Tuple[float, Optional[str]]
Calculate sampling error components for business value using reference data.
- Parameters:
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
business_value_matrix (np.ndarray) – An n×n matrix of business values, where n is the number of classes.
classes (List[str]) – An alphanumerically sorted list of the unique classes in the multiclass problem.
normalize_business_value (Optional[str], default=None) – Determines how the business value will be normalized. Allowed values are None and ‘per_prediction’.
- Returns:
components
- Return type:
tuple
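Below is a minimal sketch consistent with the documented (float, Optional[str]) return type. It assumes the float is the standard deviation of the per-row business value on reference data, and that the matrix rows index true classes while its columns index predicted classes. Under those assumptions, the error is σ/√n for a per-prediction mean and σ·√n for an unnormalized total. Both function names are hypothetical.

```python
from typing import List, Optional, Tuple

import numpy as np
import pandas as pd


def business_value_components_sketch(
    y_true_reference: pd.Series,
    y_pred_reference: pd.Series,
    business_value_matrix: np.ndarray,
    classes: List[str],
    normalize_business_value: Optional[str],
) -> Tuple[float, Optional[str]]:
    """Hypothetical: std of the per-row business value on reference data."""
    index = {label: i for i, label in enumerate(classes)}
    # Assumption: matrix rows are true classes, columns are predicted classes.
    rows = y_true_reference.map(index).to_numpy()
    cols = y_pred_reference.map(index).to_numpy()
    per_row_value = business_value_matrix[rows, cols]
    return float(np.std(per_row_value)), normalize_business_value


def business_value_error_sketch(
    components: Tuple[float, Optional[str]], chunk_size: int
) -> float:
    """Hypothetical: SE of a mean (per_prediction) or of a sum (no normalization)."""
    std, normalize = components
    if normalize == "per_prediction":
        return std / np.sqrt(chunk_size)  # SE of a mean of chunk_size values
    return std * np.sqrt(chunk_size)  # SE of a sum of chunk_size values


bv_matrix = np.array([[1.0, -1.0], [-2.0, 3.0]])
y_true = pd.Series(["a", "a", "b", "b"])
y_pred = pd.Series(["a", "b", "b", "a"])

components = business_value_components_sketch(y_true, y_pred, bv_matrix, ["a", "b"], None)
print(business_value_error_sketch(components, chunk_size=100))
```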
- nannyml.sampling_error.multiclass_classification.f1_sampling_error(sampling_error_components: List[Tuple], data) → float
Calculate the F1 sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.f1_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])
Calculate sampling error components for F1 using reference data.
The y_true_reference and y_pred_reference lists represent the binarized target values and model predictions. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
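For one class, let S be the rows that are a true positive, false positive or false negative. Then F1 = 2TP/(2TP+FP+FN) = 2m/(1+m), with m = TP/|S|. The sketch below uses this identity to propagate the binomial standard error of m through the delta method (dF1/dm = 2/(1+m)²). It assumes the chunk has about n·|S|/N rows in S and that per-class estimates are independent when macro-averaging. This is a swapped-in approximation with hypothetical names, not nannyml's documented method.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def f1_components_sketch(
    y_true_reference: List[pd.Series], y_pred_reference: List[pd.Series]
) -> List[Tuple[float, float]]:
    """Hypothetical: per class, m = TP / |S| and the fraction of rows in S."""
    components = []
    for t, p in zip(y_true_reference, y_pred_reference):
        t, p = t.to_numpy() == 1, p.to_numpy() == 1
        in_subset = t | p  # rows that are TP, FP or FN for this class
        m = (t & p).sum() / in_subset.sum()
        components.append((float(m), float(in_subset.mean())))
    return components


def f1_error_sketch(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Hypothetical: delta-method SE of F1 = 2m / (1 + m), macro-averaged."""
    variances = []
    for m, frac in components:
        se_m = np.sqrt(m * (1 - m) / (chunk_size * frac))  # binomial SE of m
        se_f1 = 2 / (1 + m) ** 2 * se_m  # dF1/dm = 2 / (1 + m)^2
        variances.append(se_f1**2)
    return float(np.sqrt(np.sum(variances)) / len(variances))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_true_ref = [pd.Series((y_true == k).astype(int)) for k in range(2)]
y_pred_ref = [pd.Series((y_pred == k).astype(int)) for k in range(2)]

components = f1_components_sketch(y_true_ref, y_pred_ref)
print(components)
print(f1_error_sketch(components, chunk_size=100))
```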
- nannyml.sampling_error.multiclass_classification.multiclass_confusion_matrix_sampling_error(sampling_error_components: Tuple, data)
Calculate the confusion matrix (CM) sampling error for a chunk of data.
- nannyml.sampling_error.multiclass_classification.multiclass_confusion_matrix_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series], normalize_confusion_matrix: Optional[str])
Calculate sampling error components for the confusion matrix (CM) using reference data.
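These two CM functions carry no further documentation here, so the following is only a sketch of one plausible approach. It estimates each cell probability P(true = i, pred = j) on reference data and returns a per-cell binomial standard error for a chunk of n rows. It assumes that normalize_confusion_matrix takes None, 'all', 'true' or 'pred', as scikit-learn's confusion_matrix does, and that every class occurs in the reference data. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

import numpy as np
import pandas as pd


def cm_components_sketch(
    y_true_reference: List[pd.Series],
    y_pred_reference: List[pd.Series],
    normalize_confusion_matrix: Optional[str],
) -> Tuple[np.ndarray, Optional[str]]:
    """Hypothetical: cell probabilities p[i, j] = P(true = i, pred = j)."""
    y_true = np.column_stack([s.to_numpy() for s in y_true_reference]).astype(float)
    y_pred = np.column_stack([s.to_numpy() for s in y_pred_reference]).astype(float)
    cell_probs = y_true.T @ y_pred / len(y_true)  # one-hot cross product, averaged
    return cell_probs, normalize_confusion_matrix


def cm_error_sketch(
    components: Tuple[np.ndarray, Optional[str]], chunk_size: int
) -> np.ndarray:
    """Hypothetical: per-cell binomial SE matching the chosen normalization."""
    p, normalize = components
    if normalize is None:  # raw counts
        return np.sqrt(chunk_size * p * (1 - p))
    if normalize == "all":  # fraction of all rows
        return np.sqrt(p * (1 - p) / chunk_size)
    if normalize == "true":  # fraction within each true class (row)
        share = p.sum(axis=1, keepdims=True)
    elif normalize == "pred":  # fraction within each predicted class (column)
        share = p.sum(axis=0, keepdims=True)
    else:
        raise ValueError(f"unsupported normalization: {normalize!r}")
    q = p / share  # conditional proportion; assumes every class occurs in reference
    return np.sqrt(q * (1 - q) / (chunk_size * share))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_true_ref = [pd.Series((y_true == k).astype(int)) for k in range(2)]
y_pred_ref = [pd.Series((y_pred == k).astype(int)) for k in range(2)]

components = cm_components_sketch(y_true_ref, y_pred_ref, "true")
print(components[0])
print(cm_error_sketch(components, chunk_size=100))
```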
- nannyml.sampling_error.multiclass_classification.precision_sampling_error(sampling_error_components: List[Tuple], data) → float
Calculate the precision sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.precision_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])
Calculate sampling error components for precision using reference data.
The y_true_reference and y_pred_reference lists represent the binarized target values and model predictions. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
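Per-class precision is a proportion: among rows predicted as class k, the share whose true class is k. The sketch below assumes each component pair is (standard deviation of that hit indicator, fraction of reference rows predicted as k). It then treats a chunk of n rows as containing about n·fraction such rows and combines classes as an independent macro average. The function names are hypothetical, and every class is assumed to be predicted at least once in reference data.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def precision_components_sketch(
    y_true_reference: List[pd.Series], y_pred_reference: List[pd.Series]
) -> List[Tuple[float, float]]:
    """Hypothetical: per class, (std of hit indicator among predicted-k rows, share predicted k)."""
    components = []
    for t, p in zip(y_true_reference, y_pred_reference):
        t, p = t.to_numpy() == 1, p.to_numpy() == 1
        hit = t[p].astype(float)  # 1 = true positive, 0 = false positive
        components.append((float(np.std(hit)), float(p.mean())))
    return components


def precision_error_sketch(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Hypothetical: SE of a proportion over about chunk_size * frac rows, macro-averaged."""
    ses = [std / np.sqrt(chunk_size * frac) for std, frac in components]
    return float(np.sqrt(np.sum(np.square(ses))) / len(ses))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_true_ref = [pd.Series((y_true == k).astype(int)) for k in range(2)]
y_pred_ref = [pd.Series((y_pred == k).astype(int)) for k in range(2)]

components = precision_components_sketch(y_true_ref, y_pred_ref)
print(precision_error_sketch(components, chunk_size=100))
```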
- nannyml.sampling_error.multiclass_classification.recall_sampling_error(sampling_error_components: List[Tuple], data) → float
Calculate the recall sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.recall_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])
Calculate sampling error components for recall using reference data.
The y_true_reference and y_pred_reference lists represent the binarized target values and model predictions. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
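Recall for class k is the share of rows whose true class is k that were also predicted as k. The sketch below assumes each component pair is (standard deviation of that indicator over true-k rows, fraction of reference rows that are true k). It scales the error to a chunk of n rows as σ/√(n·fraction) and macro-averages under an independence assumption. The function names are hypothetical, and every class is assumed to occur in reference data.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def recall_components_sketch(
    y_true_reference: List[pd.Series], y_pred_reference: List[pd.Series]
) -> List[Tuple[float, float]]:
    """Hypothetical: per class, (std of 'predicted k' among true-k rows, share of true-k rows)."""
    components = []
    for t, p in zip(y_true_reference, y_pred_reference):
        t, p = t.to_numpy() == 1, p.to_numpy() == 1
        found = p[t].astype(float)  # 1 = true positive, 0 = false negative
        components.append((float(np.std(found)), float(t.mean())))
    return components


def recall_error_sketch(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Hypothetical: per-class SE of a proportion, combined as an independent macro average."""
    ses = [std / np.sqrt(chunk_size * frac) for std, frac in components]
    return float(np.sqrt(np.sum(np.square(ses))) / len(ses))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_true_ref = [pd.Series((y_true == k).astype(int)) for k in range(2)]
y_pred_ref = [pd.Series((y_pred == k).astype(int)) for k in range(2)]

components = recall_components_sketch(y_true_ref, y_pred_ref)
print(recall_error_sketch(components, chunk_size=100))
```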
- nannyml.sampling_error.multiclass_classification.specificity_sampling_error(sampling_error_components: List[Tuple], data) → float
Calculate the specificity sampling error for a chunk of data.
- Parameters:
sampling_error_components – a set of parameters that were derived from reference data.
data – the (analysis) data you want to calculate or estimate a metric for.
- Returns:
sampling_error
- Return type:
float
- nannyml.sampling_error.multiclass_classification.specificity_sampling_error_components(y_true_reference: List[Series], y_pred_reference: List[Series])
Calculate sampling error components for specificity using reference data.
The y_true_reference and y_pred_reference lists represent the binarized target values and model predictions. The order of the Series in both lists should match the order of the class labels present.
- Parameters:
y_true_reference (List[pd.Series]) – Target values for the reference dataset.
y_pred_reference (List[pd.Series]) – Prediction values for the reference dataset.
- Returns:
sampling_error_components
- Return type:
List[Tuple]
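Specificity for class k is the true negative rate: among rows whose true class is not k, the share not predicted as k. This is a minimal sketch under stated assumptions. Each component pair is assumed to be (standard deviation of that indicator over the true negatives of k, fraction of such rows in reference data). A chunk of n rows is assumed to hold about n·fraction of them, and classes are combined as an independent macro average. The function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np
import pandas as pd


def specificity_components_sketch(
    y_true_reference: List[pd.Series], y_pred_reference: List[pd.Series]
) -> List[Tuple[float, float]]:
    """Hypothetical: per class, (std of 'not predicted k' among true-not-k rows, their share)."""
    components = []
    for t, p in zip(y_true_reference, y_pred_reference):
        negatives = t.to_numpy() != 1  # rows whose true class is not k
        rejected = (p.to_numpy() != 1)[negatives].astype(float)  # 1 = TN, 0 = FP
        components.append((float(np.std(rejected)), float(negatives.mean())))
    return components


def specificity_error_sketch(components: List[Tuple[float, float]], chunk_size: int) -> float:
    """Hypothetical: per-class SE of a proportion, combined as an independent macro average."""
    ses = [std / np.sqrt(chunk_size * frac) for std, frac in components]
    return float(np.sqrt(np.sum(np.square(ses))) / len(ses))


y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
y_true_ref = [pd.Series((y_true == k).astype(int)) for k in range(2)]
y_pred_ref = [pd.Series((y_pred == k).astype(int)) for k in range(2)]

components = specificity_components_sketch(y_true_ref, y_pred_ref)
print(specificity_error_sketch(components, chunk_size=100))
```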