nannyml.sampling_error.binary_classification module

Module containing functions to estimate sampling error for binary classification metrics.

The implementation of the sampling error estimation is split into two functions.

The first function is called during fitting and calculates the sampling error components based on the reference data. In most cases these are the standard deviation of the distribution of differences between y_true and y_pred, and the fraction of positive labels in y_true.

The second function will be called during calculation or estimation. It takes the predetermined error components and combines them with the size of the (analysis) data to give an estimate for the sampling error.
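The two-step split described above can be illustrated with a minimal sketch for an accuracy-like metric. It is not the module's actual code: the helper names `fit_components` and `estimate_sampling_error` are hypothetical, plain lists stand in for pd.Series, and it assumes the component is the standard deviation of per-row correctness, combined with the chunk size through the standard error of the mean.

```python
import math
import statistics


def fit_components(y_true_reference, y_pred_reference):
    """Fitting step (hypothetical helper): derive components from reference data."""
    # 1 where the prediction matches the target, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true_reference, y_pred_reference)]
    # Population standard deviation of the per-row correctness indicator.
    return (statistics.pstdev(correct),)


def estimate_sampling_error(sampling_error_components, data):
    """Calculation step (hypothetical helper): combine components with chunk size."""
    (reference_std,) = sampling_error_components
    # Assumed formula: standard error of the mean, std / sqrt(n).
    return reference_std / math.sqrt(len(data))


# Toy reference data, invented for illustration only.
y_true_reference = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_reference = [1, 0, 0, 1, 0, 1, 1, 0]
components = fit_components(y_true_reference, y_pred_reference)

# Stand-in for a 400-row analysis chunk; only its length is used here.
chunk = list(range(400))
sampling_error = estimate_sampling_error(components, chunk)
```

Because the components are computed once on reference data, the second step only needs the length of each chunk, so it stays cheap when repeated over many chunks.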

nannyml.sampling_error.binary_classification.accuracy_sampling_error(sampling_error_components: Tuple, data) → float

Calculate the accuracy sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.accuracy_sampling_error_components(y_true_reference: Series, y_pred_reference: Series) → Tuple

Estimate sampling error for accuracy.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns:

(std,)

Return type:

Tuple[np.ndarray]

nannyml.sampling_error.binary_classification.auroc_sampling_error(sampling_error_components, data)

Calculate the AUROC sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.auroc_sampling_error_components(y_true_reference: Series, y_pred_proba_reference: Series) → Tuple

Estimate sampling error for AUROC. The calculation is based on the Variance Sum Law and on expressing AUROC as a Mann-Whitney U statistic.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_proba_reference (pd.Series) – Prediction values for the reference dataset.

Returns:

(std, fraction)

Return type:

Tuple[np.ndarray, float]
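The link between AUROC and the Mann-Whitney U statistic can be shown with a small sketch: AUROC equals U divided by the number of positive/negative pairs. The helper name `auroc_from_mann_whitney` is hypothetical and the code only demonstrates that identity on toy data; it does not reproduce how this module derives the standard deviation component.

```python
def auroc_from_mann_whitney(y_true, y_score):
    """Compute AUROC as a normalised Mann-Whitney U statistic (illustrative helper)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # U counts the positive/negative pairs in which the positive row has the
    # higher score; tied pairs count as one half.
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    # Dividing by the number of pairs turns U into a probability, the AUROC.
    return u / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auroc = auroc_from_mann_whitney(y_true, y_score)  # 3 of 4 pairs are ordered correctly
```

Viewing AUROC as an average over pairs is what makes variance-based reasoning applicable to it, although the pairs are not independent of each other.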

nannyml.sampling_error.binary_classification.f1_sampling_error(sampling_error_components, data)

Calculate the F1 sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.f1_sampling_error_components(y_true_reference: Series, y_pred_reference: Series) → Tuple

Estimate sampling error for F1 using a modified standard error of the mean formula.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns:

(std, fraction)

Return type:

Tuple[np.ndarray, float]

nannyml.sampling_error.binary_classification.precision_sampling_error(sampling_error_components, data)

Calculate the precision sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.precision_sampling_error_components(y_true_reference: Series, y_pred_reference: Series) → Tuple

Estimate sampling error for precision using a modified standard error of the mean formula.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns:

(std, fraction)

Return type:

Tuple[np.ndarray, float]
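One plausible reading of the (std, fraction) pair for precision is sketched below. It rests on assumptions that the text above does not confirm: std is taken over the rows that were predicted positive, fraction is the share of such rows in the reference data, and the "modified" formula replaces the chunk size n with the expected number of relevant rows, n * fraction. The function names are hypothetical and plain lists stand in for pd.Series.

```python
import math
import statistics


def precision_components_sketch(y_true_reference, y_pred_reference):
    """Assumed components: std over predicted positives and their share."""
    # Only rows predicted positive contribute to precision; each one is a
    # hit (1) if the target is also positive, otherwise a miss (0).
    hits = [
        int(t == 1)
        for t, p in zip(y_true_reference, y_pred_reference)
        if p == 1
    ]
    std = statistics.pstdev(hits)
    fraction = len(hits) / len(y_pred_reference)
    return std, fraction


def precision_sampling_error_sketch(sampling_error_components, data):
    """Assumed formula: std / sqrt(n * fraction)."""
    std, fraction = sampling_error_components
    # n * fraction approximates how many rows of the chunk enter the metric.
    return std / math.sqrt(len(data) * fraction)


# Toy reference data, invented for illustration only.
y_true_reference = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_reference = [1, 0, 0, 1, 0, 1, 1, 0]
std, fraction = precision_components_sketch(y_true_reference, y_pred_reference)

# Stand-in for a 400-row analysis chunk; only its length is used here.
chunk = list(range(400))
error = precision_sampling_error_sketch((std, fraction), chunk)
```

Under these assumptions, a metric computed on a small share of the rows gets a larger sampling error than accuracy would on a chunk of the same size, because fewer rows effectively enter the estimate.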

nannyml.sampling_error.binary_classification.recall_sampling_error(sampling_error_components, data)

Calculate the recall sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.recall_sampling_error_components(y_true_reference: Series, y_pred_reference: Series) → Tuple

Estimate sampling error for recall using a modified standard error of the mean formula.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns:

(std, fraction)

Return type:

Tuple[np.ndarray, float]

nannyml.sampling_error.binary_classification.specificity_sampling_error(sampling_error_components, data)

Calculate the specificity sampling error for a chunk of data.

Parameters:
  • sampling_error_components – A set of parameters that were derived from reference data.

  • data – The (analysis) data you want to calculate or estimate a metric for.

Returns:

sampling_error

Return type:

float

nannyml.sampling_error.binary_classification.specificity_sampling_error_components(y_true_reference: Series, y_pred_reference: Series) → Tuple

Estimate sampling error for specificity using a modified standard error of the mean formula.

Parameters:
  • y_true_reference (pd.Series) – Target values for the reference dataset.

  • y_pred_reference (pd.Series) – Predictions for the reference dataset.

Returns:

(std, fraction)

Return type:

Tuple[np.ndarray, float]