nannyml.sampling_error.binary_classification module
Module containing functions to estimate sampling error for binary classification metrics.
The implementation of the sampling error estimation is split into two functions.
The first function is called during fitting and calculates the sampling error components based on the reference data.
Most of the time these components are the standard deviation of the distribution of differences between y_true and y_pred, and the fraction of positive labels in y_true.
The second function is called during calculation or estimation. It takes the predetermined error components and combines them with the size of the (analysis) data to give an estimate of the sampling error.
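As a rough illustration of how the two steps could fit together, assume the components are combined with a standard-error-of-the-mean formula (an assumption; the exact combination rule is not spelled out here). For a chunk of n rows, the estimate would then be:

```latex
\mathrm{SE}(n) \approx \frac{\sigma_{\mathrm{ref}}}{\sqrt{n \cdot f_{\mathrm{ref}}}}
```

Here σ_ref is the standard deviation computed on the reference data and f_ref is the fraction of reference rows that actually contribute to the metric, so n · f_ref approximates how many rows of the chunk the metric averages over. For a metric whose components hold only a standard deviation, such as accuracy, f_ref = 1 and the expression reduces to σ_ref / √n.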
- nannyml.sampling_error.binary_classification.accuracy_sampling_error(sampling_error_components: Tuple, data) → float
Calculate the accuracy sampling error for a chunk of data.
- Parameters
sampling_error_components (Tuple) – The sampling error components derived from the reference data, as returned by accuracy_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.accuracy_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple
Calculate the accuracy sampling error components from the reference data.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
- Returns
(std,)
- Return type
Tuple[np.ndarray]
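The two accuracy functions can be pictured with the hypothetical sketch below. It assumes that the single component is the standard deviation of a per-row 0/1 "prediction is correct" indicator on the reference data, and that the chunk-level error is that value divided by √n. The function names and the n_rows parameter are illustrative only and are not part of nannyml.

```python
import numpy as np
import pandas as pd


def accuracy_components_sketch(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Hypothetical: std of the per-row 'prediction is correct' indicator (assumption)."""
    correct = (np.asarray(y_true).astype(int) == np.asarray(y_pred).astype(int)).astype(float)
    return (float(np.std(correct)),)


def accuracy_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: standard error of the mean accuracy over a chunk of n_rows rows."""
    (std,) = components
    return float(std / np.sqrt(n_rows))


# Reference data in which 8 of the 10 predictions are correct.
y_true_ref = pd.Series([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred_ref = pd.Series([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])

components = accuracy_components_sketch(y_true_ref, y_pred_ref)  # std = sqrt(0.8 * 0.2) = 0.4
error = accuracy_error_sketch(components, n_rows=100)            # 0.4 / sqrt(100) = 0.04
print(components, error)
```

Because the indicator's standard deviation is √(p(1 − p)) for a reference accuracy p, this sketch reproduces the familiar binomial standard error √(p(1 − p) / n).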
- nannyml.sampling_error.binary_classification.auroc_sampling_error(sampling_error_components, data)
Calculate the AUROC sampling error for a chunk of data.
- Parameters
sampling_error_components – The sampling error components derived from the reference data, as returned by auroc_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.auroc_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_proba_reference: pandas.core.series.Series) → Tuple
Calculate the AUROC sampling error components from the reference data. The calculation is based on the Variance Sum Law and on expressing AUROC as a Mann-Whitney U statistic.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_proba_reference (pd.Series) – Predicted probabilities (scores) for the reference dataset.
- Returns
(std, fraction)
- Return type
Tuple[np.ndarray, float]
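To make the Mann-Whitney U view concrete, here is a hypothetical sketch. Each positive row gets a placement value: the share of negative rows whose score it exceeds, with ties counting half. The mean of these placement values equals AUROC (the U statistic divided by n_pos · n_neg). The sketch assumes that the components are the standard deviation of the placement values and the fraction of positive rows; it only accounts for variation on the positive side, so it should not be read as the exact Variance Sum Law calculation described above. All function names are illustrative.

```python
import numpy as np
import pandas as pd


def positive_placements(y_true: pd.Series, y_pred_proba: pd.Series) -> np.ndarray:
    """Hypothetical helper: share of negatives each positive outranks (ties count 0.5)."""
    labels = np.asarray(y_true).astype(int)
    scores = np.asarray(y_pred_proba, dtype=float)
    positives = scores[labels == 1]
    negatives = np.sort(scores[labels == 0])

    below = np.searchsorted(negatives, positives, side="left")           # strictly lower scores
    below_or_tied = np.searchsorted(negatives, positives, side="right")  # lower or equal scores
    return (below + 0.5 * (below_or_tied - below)) / len(negatives)


def auroc_components_sketch(y_true: pd.Series, y_pred_proba: pd.Series) -> tuple:
    """Hypothetical: (std of placement values, fraction of positive rows)."""
    placements = positive_placements(y_true, y_pred_proba)
    return float(np.std(placements)), len(placements) / len(y_true)


def auroc_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: standard error of the mean placement for a chunk of n_rows rows."""
    std, fraction = components
    return float(std / np.sqrt(n_rows * fraction))


y_true_ref = pd.Series([0, 0, 1, 1])
y_proba_ref = pd.Series([0.1, 0.4, 0.35, 0.8])

auroc = positive_placements(y_true_ref, y_proba_ref).mean()    # placements [0.5, 1.0] -> 0.75
components = auroc_components_sketch(y_true_ref, y_proba_ref)  # (0.25, 0.5)
error = auroc_error_sketch(components, n_rows=100)
print(auroc, components, error)
```

For these labels and scores the placement mean is 0.75, the same value sklearn.metrics.roc_auc_score reports, which is a quick way to sanity-check the placement logic.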
- nannyml.sampling_error.binary_classification.f1_sampling_error(sampling_error_components, data)
Calculate the F1 sampling error for a chunk of data.
- Parameters
sampling_error_components – The sampling error components derived from the reference data, as returned by f1_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.f1_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple
Calculate the F1 sampling error components from the reference data using a modified standard error of the mean formula.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
- Returns
(std, fraction)
- Return type
Tuple[np.ndarray, float]
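One possible reading of "modified standard error of the mean" for F1 is sketched below; it is an assumption, not nannyml's code. F1 = 2TP / (2TP + FP + FN) ignores true negatives, so only TP, FP and FN rows receive a per-row value: 1 for a TP, 0 otherwise. Those values are rescaled so that their mean equals F1, and the fraction is the share of reference rows that are TP, FP or FN. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def f1_components_sketch(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Hypothetical: (std of rescaled per-row F1 values, fraction of relevant rows)."""
    t = np.asarray(y_true).astype(int)
    p = np.asarray(y_pred).astype(int)
    tp = int(np.sum((t == 1) & (p == 1)))
    fp = int(np.sum((t == 0) & (p == 1)))
    fn = int(np.sum((t == 1) & (p == 0)))

    # One value per relevant row; true negatives do not influence F1.
    values = np.concatenate([np.ones(tp), np.zeros(fp + fn)])
    # Rescale so that mean(values) == tp / (tp + 0.5 * (fp + fn)), which is F1.
    values *= (tp + fp + fn) / (tp + 0.5 * (fp + fn))
    fraction = (tp + fp + fn) / len(t)
    return float(np.std(values)), fraction


def f1_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: divide by the square root of the expected number of relevant rows."""
    std, fraction = components
    return float(std / np.sqrt(n_rows * fraction))


# Reference data with TP = 3, FP = 1, FN = 1, TN = 3, so F1 = 6 / 8 = 0.75.
y_true_ref = pd.Series([1, 1, 1, 0, 0, 0, 1, 0])
y_pred_ref = pd.Series([1, 1, 0, 1, 0, 0, 1, 0])

components = f1_components_sketch(y_true_ref, y_pred_ref)  # fraction = 5 / 8
error = f1_error_sketch(components, n_rows=100)
print(components, error)
```

The rescaling is what makes a plain standard error of the mean usable here: without it, the mean of the 0/1 values would be TP / (TP + FP + FN) rather than F1.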
- nannyml.sampling_error.binary_classification.precision_sampling_error(sampling_error_components, data)
Calculate the precision sampling error for a chunk of data.
- Parameters
sampling_error_components – The sampling error components derived from the reference data, as returned by precision_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.precision_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple
Calculate the precision sampling error components from the reference data using a modified standard error of the mean formula.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
- Returns
(std, fraction)
- Return type
Tuple[np.ndarray, float]
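For precision, the "modified" part can plausibly be read as restricting the mean to the rows precision depends on. The hypothetical sketch below assumes that the components are the standard deviation of a 0/1 "true positive" indicator over the predicted-positive reference rows and the share of reference rows predicted positive; a chunk of n rows then contributes only about n times that fraction rows to the estimate. The names are illustrative.

```python
import numpy as np
import pandas as pd


def precision_components_sketch(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Hypothetical: precision is the mean 'is a true positive' indicator over predicted positives."""
    t = np.asarray(y_true).astype(int)
    p = np.asarray(y_pred).astype(int)
    predicted_positive = p == 1
    hits = (t[predicted_positive] == 1).astype(float)  # 1 for TP, 0 for FP
    return float(np.std(hits)), float(predicted_positive.mean())


def precision_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: only about n_rows * fraction rows of a chunk are predicted positive."""
    std, fraction = components
    return float(std / np.sqrt(n_rows * fraction))


# Five predicted positives, three of them correct: precision = 0.6.
y_true_ref = pd.Series([1, 0, 1, 1, 0, 1, 0, 0])
y_pred_ref = pd.Series([1, 1, 1, 1, 0, 0, 0, 1])

components = precision_components_sketch(y_true_ref, y_pred_ref)  # (sqrt(0.24), 0.625)
error = precision_error_sketch(components, n_rows=80)             # sqrt(0.24) / sqrt(50)
print(components, error)
```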
- nannyml.sampling_error.binary_classification.recall_sampling_error(sampling_error_components, data)
Calculate the recall sampling error for a chunk of data.
- Parameters
sampling_error_components – The sampling error components derived from the reference data, as returned by recall_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.recall_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple
Calculate the recall sampling error components from the reference data using a modified standard error of the mean formula.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
- Returns
(std, fraction)
- Return type
Tuple[np.ndarray, float]
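Recall only depends on rows whose true label is positive, so a natural fraction component here is the share of positive labels in y_true. The hypothetical sketch below assumes the components are the standard deviation of a 0/1 "was predicted positive" indicator over the actual positives, plus that fraction. The names are illustrative, not nannyml's API.

```python
import numpy as np
import pandas as pd


def recall_components_sketch(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Hypothetical: recall is the mean 'was predicted positive' indicator over actual positives."""
    t = np.asarray(y_true).astype(int)
    p = np.asarray(y_pred).astype(int)
    actual_positive = t == 1
    hits = (p[actual_positive] == 1).astype(float)  # 1 for TP, 0 for FN
    return float(np.std(hits)), float(actual_positive.mean())


def recall_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: standard error over the expected number of actual positives in a chunk."""
    std, fraction = components
    return float(std / np.sqrt(n_rows * fraction))


# Five actual positives, three of them found: recall = 0.6.
y_true_ref = pd.Series([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred_ref = pd.Series([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])

components = recall_components_sketch(y_true_ref, y_pred_ref)  # (sqrt(0.24), 0.5)
error = recall_error_sketch(components, n_rows=200)            # sqrt(0.24) / 10
print(components, error)
```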
- nannyml.sampling_error.binary_classification.specificity_sampling_error(sampling_error_components, data)
Calculate the specificity sampling error for a chunk of data.
- Parameters
sampling_error_components – The sampling error components derived from the reference data, as returned by specificity_sampling_error_components().
data – The chunk of (analysis) data for which the metric is calculated or estimated.
- Returns
sampling_error
- Return type
float
- nannyml.sampling_error.binary_classification.specificity_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple
Calculate the specificity sampling error components from the reference data using a modified standard error of the mean formula.
- Parameters
y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.
- Returns
(std, fraction)
- Return type
Tuple[np.ndarray, float]
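Specificity mirrors recall on the negative class: it is the share of actual negatives predicted negative. The hypothetical sketch below assumes the components are the standard deviation of a 0/1 "true negative" indicator over the actual-negative reference rows and the share of reference rows that are actual negatives. The names are illustrative.

```python
import numpy as np
import pandas as pd


def specificity_components_sketch(y_true: pd.Series, y_pred: pd.Series) -> tuple:
    """Hypothetical: specificity is the mean 'was predicted negative' indicator over actual negatives."""
    t = np.asarray(y_true).astype(int)
    p = np.asarray(y_pred).astype(int)
    actual_negative = t == 0
    hits = (p[actual_negative] == 0).astype(float)  # 1 for TN, 0 for FP
    return float(np.std(hits)), float(actual_negative.mean())


def specificity_error_sketch(components: tuple, n_rows: int) -> float:
    """Hypothetical: standard error over the expected number of actual negatives in a chunk."""
    std, fraction = components
    return float(std / np.sqrt(n_rows * fraction))


# Five actual negatives, four of them predicted negative: specificity = 0.8.
y_true_ref = pd.Series([0, 0, 0, 0, 0, 1, 1, 1])
y_pred_ref = pd.Series([0, 0, 0, 0, 1, 1, 0, 1])

components = specificity_components_sketch(y_true_ref, y_pred_ref)  # (0.4, 0.625)
error = specificity_error_sketch(components, n_rows=80)             # 0.4 / sqrt(50)
print(components, error)
```

If the reference data contained no actual negatives, np.std of the empty selection would return nan (with a RuntimeWarning), so a production version would need to guard that case.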