nannyml.sampling_error.binary_classification module

Module containing functions to estimate sampling error for binary classification metrics.

The implementation of the sampling error estimation is split into two functions.

The first function is called during fitting and calculates the sampling error components based on the reference data. In most cases these are the standard deviation of the distribution of differences between y_true and y_pred and the fraction of positive labels in y_true.

The second function will be called during calculation or estimation. It takes the predetermined error components and combines them with the size of the (analysis) data to give an estimate for the sampling error.
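
The two-step split can be sketched for accuracy as follows. This is a minimal illustration, not the module's source: the helper names `fit_components` and `estimate_sampling_error` are hypothetical, plain Python lists stand in for `pd.Series`, and combining the components as a standard error of the mean (standard deviation divided by the square root of the chunk size) is an assumption.

```python
import math
import statistics


def fit_components(y_true_reference, y_pred_reference):
    """Step 1 (fitting): derive components from reference data only."""
    # 1 where the prediction is correct, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true_reference, y_pred_reference)]
    # Population standard deviation of the per-row correctness indicator.
    return (statistics.pstdev(correct),)


def estimate_sampling_error(components, data):
    """Step 2 (calculation or estimation): scale by the chunk size."""
    (std,) = components
    # Assumed combination rule: standard error of the mean.
    return std / math.sqrt(len(data))


y_true_ref = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_ref = [1, 0, 0, 1, 0, 1, 1, 0]
components = fit_components(y_true_ref, y_pred_ref)

chunk = list(range(100))  # only the number of rows in the chunk is used
error = estimate_sampling_error(components, chunk)
```

The components are computed once from reference data and can then be reused for every analysis chunk, so only the chunk size changes between calls.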

nannyml.sampling_error.binary_classification.accuracy_sampling_error(sampling_error_components: Tuple, data) → float

Calculate the accuracy sampling error for a chunk of data.

Parameters

sampling_error_components (Tuple) – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float
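
To illustrate how the chunk size enters the estimate, the sketch below assumes that the single component is a standard deviation and that it is combined with the number of rows as `std / sqrt(n)`. The function name `sketch_accuracy_sampling_error` and the component value are made up for illustration.

```python
import math


def sketch_accuracy_sampling_error(sampling_error_components, data):
    # Assumed combination rule: standard error of the mean.
    (std,) = sampling_error_components
    return std / math.sqrt(len(data))


components = (0.4,)  # hypothetical value obtained during fitting
small_chunk = [None] * 500
large_chunk = [None] * 2000

small_error = sketch_accuracy_sampling_error(components, small_chunk)
large_error = sketch_accuracy_sampling_error(components, large_chunk)

# Under this rule, quadrupling the chunk size halves the sampling error.
ratio = small_error / large_error
```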

nannyml.sampling_error.binary_classification.accuracy_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for accuracy from reference data.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns

(std,)

Return type

Tuple[np.ndarray]
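
One plausible way to obtain the `std` component, shown here as a sketch rather than the module's code, is to take the standard deviation of a per-row correctness indicator. For a 0/1 indicator this equals `sqrt(a * (1 - a))`, where `a` is the reference accuracy. The helper name is hypothetical and lists stand in for `pd.Series`.

```python
import math
import statistics


def sketch_accuracy_components(y_true_reference, y_pred_reference):
    # 1 where the prediction matches the label, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true_reference, y_pred_reference)]
    return (statistics.pstdev(correct),)


y_true_ref = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
y_pred_ref = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1]

(std,) = sketch_accuracy_components(y_true_ref, y_pred_ref)

# Cross-check against the closed form for a Bernoulli variable.
accuracy = sum(t == p for t, p in zip(y_true_ref, y_pred_ref)) / len(y_true_ref)
closed_form = math.sqrt(accuracy * (1 - accuracy))
```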

nannyml.sampling_error.binary_classification.auroc_sampling_error(sampling_error_components, data)

Calculate the AUROC sampling error for a chunk of data.

Parameters

sampling_error_components – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float

nannyml.sampling_error.binary_classification.auroc_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_proba_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for AUROC. The calculation is based on the Variance Sum Law and on expressing AUROC as a Mann-Whitney U statistic.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_proba_reference (pd.Series) – Predicted probabilities for the reference dataset.

Returns

(std, fraction)

Return type

Tuple[np.ndarray, float]
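
The link between AUROC and the Mann-Whitney U statistic can be illustrated as follows: AUROC equals U divided by the number of positive-negative pairs, which is also the mean, over positive rows, of the share of negative rows scored lower. The sketch takes the standard deviation of those per-positive values as `std` and the share of positive labels as `fraction`. That choice of components, the helper name, and the use of lists instead of `pd.Series` are assumptions for illustration, not the module's implementation.

```python
import statistics


def sketch_auroc_components(y_true_reference, y_pred_proba_reference):
    pairs = list(zip(y_true_reference, y_pred_proba_reference))
    pos_scores = [s for t, s in pairs if t == 1]
    neg_scores = [s for t, s in pairs if t == 0]

    def placement(pos_score):
        # Share of negatives ranked below this positive (ties count half).
        wins = sum(
            1.0 if pos_score > n else 0.5 if pos_score == n else 0.0
            for n in neg_scores
        )
        return wins / len(neg_scores)

    placements = [placement(p) for p in pos_scores]
    auroc = statistics.fmean(placements)  # equals U / (n_pos * n_neg)
    fraction = len(pos_scores) / len(pairs)
    # Returns the assumed components plus the AUROC itself for inspection.
    return (statistics.pstdev(placements), fraction), auroc


y_true_ref = [0, 0, 1, 1, 0, 1]
y_score_ref = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
(std, fraction), auroc = sketch_auroc_components(y_true_ref, y_score_ref)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the mean of the per-positive values reproduces the pairwise AUROC of 8/9.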

nannyml.sampling_error.binary_classification.f1_sampling_error(sampling_error_components, data)

Calculate the F1 sampling error for a chunk of data.

Parameters

sampling_error_components – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float

nannyml.sampling_error.binary_classification.f1_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for F1 using a modified standard error of the mean formula.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns

(std, fraction)

Return type

Tuple[np.ndarray, float]

nannyml.sampling_error.binary_classification.precision_sampling_error(sampling_error_components, data)

Calculate the precision sampling error for a chunk of data.

Parameters

sampling_error_components – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float

nannyml.sampling_error.binary_classification.precision_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for precision using a modified standard error of the mean formula.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns

(std, fraction)

Return type

Tuple[np.ndarray, float]
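
As a sketch of what a "modified standard error of the mean" could look like for precision: only rows predicted positive contribute to precision, so the standard deviation is taken over those rows and the effective sample size becomes the chunk size times the share of such rows. The helper names, the list inputs, and the exact combination rule are assumptions, not the module's code.

```python
import math
import statistics


def sketch_precision_components(y_true_reference, y_pred_reference):
    # Among rows predicted positive: the label itself (1 = hit, 0 = miss).
    hits = [t for t, p in zip(y_true_reference, y_pred_reference) if p == 1]
    fraction = len(hits) / len(y_pred_reference)
    return statistics.pstdev(hits), fraction


def sketch_precision_sampling_error(components, data):
    std, fraction = components
    # About len(data) * fraction rows are expected to be predicted positive.
    return std / math.sqrt(len(data) * fraction)


y_true_ref = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_ref = [1, 1, 1, 0, 0, 1, 1, 0]
components = sketch_precision_components(y_true_ref, y_pred_ref)
error = sketch_precision_sampling_error(components, [None] * 400)
```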

nannyml.sampling_error.binary_classification.recall_sampling_error(sampling_error_components, data)

Calculate the recall sampling error for a chunk of data.

Parameters

sampling_error_components – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float

nannyml.sampling_error.binary_classification.recall_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for recall using a modified standard error of the mean formula.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns

(std, fraction)

Return type

Tuple[np.ndarray, float]
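
For recall, a comparable sketch restricts the data to rows whose label is positive, since only those enter the metric. It also compares the result with a plain standard error of the mean that ignores `fraction`. The helper names, the list inputs, and the way the components are combined are assumptions made for illustration.

```python
import math
import statistics


def sketch_recall_components(y_true_reference, y_pred_reference):
    # Among rows labelled positive: the prediction (1 = found, 0 = missed).
    found = [p for t, p in zip(y_true_reference, y_pred_reference) if t == 1]
    fraction = len(found) / len(y_true_reference)
    return statistics.pstdev(found), fraction


def sketch_recall_sampling_error(components, data):
    std, fraction = components
    # Effective sample size: expected number of positive rows in the chunk.
    return std / math.sqrt(len(data) * fraction)


y_true_ref = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_ref = [1, 1, 1, 0, 0, 1, 1, 0]
components = sketch_recall_components(y_true_ref, y_pred_ref)

chunk = [None] * 400
error = sketch_recall_sampling_error(components, chunk)
# Dividing by the full chunk size instead would understate the error.
naive_error = components[0] / math.sqrt(len(chunk))
```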

nannyml.sampling_error.binary_classification.specificity_sampling_error(sampling_error_components, data)

Calculate the specificity sampling error for a chunk of data.

Parameters

sampling_error_components – A set of parameters that were derived from reference data.
data – The (analysis) data you want to calculate or estimate a metric for.

Returns

sampling_error

Return type

float

nannyml.sampling_error.binary_classification.specificity_sampling_error_components(y_true_reference: pandas.core.series.Series, y_pred_reference: pandas.core.series.Series) → Tuple

Estimate the sampling error components for specificity using a modified standard error of the mean formula.

Parameters

y_true_reference (pd.Series) – Target values for the reference dataset.
y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns

(std, fraction)

Return type

Tuple[np.ndarray, float]
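
Specificity only involves rows whose label is negative, so a sketch under the same assumptions takes the standard deviation over those rows and uses the share of negative labels as `fraction`. The helper names are hypothetical, lists stand in for `pd.Series`, and the combination rule is an assumption rather than the module's implementation.

```python
import math
import statistics


def sketch_specificity_components(y_true_reference, y_pred_reference):
    # Among rows labelled negative: 1 if the prediction is also negative.
    rejected = [
        int(p == 0)
        for t, p in zip(y_true_reference, y_pred_reference)
        if t == 0
    ]
    fraction = len(rejected) / len(y_true_reference)
    return statistics.pstdev(rejected), fraction


def sketch_specificity_sampling_error(components, data):
    std, fraction = components
    # Effective sample size: expected number of negative rows in the chunk.
    return std / math.sqrt(len(data) * fraction)


y_true_ref = [1, 0, 1, 1, 0, 1, 1, 0, 1, 0]
y_pred_ref = [1, 0, 1, 0, 1, 1, 1, 0, 1, 0]
components = sketch_specificity_components(y_true_ref, y_pred_ref)
error = sketch_specificity_sampling_error(components, [None] * 1000)
```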