nannyml.performance_calculation.metrics.base module
- class nannyml.performance_calculation.metrics.base.Metric(name: str, y_true: str, y_pred: str, components: List[Tuple[str, str]], threshold: Threshold, y_pred_proba: Optional[Union[str, Dict[str, str]]] = None, upper_threshold_limit: Optional[float] = None, lower_threshold_limit: Optional[float] = None, **kwargs)[source]
Bases: ABC
A performance metric used to calculate realized model performance.
Creates a new Metric instance.
- Parameters:
name (str) – The name used to indicate the metric in columns of a DataFrame.
y_true (str) – The name of the column containing target values.
y_pred (str) – The name of the column containing your model predictions.
components (List[Tuple[str, str]]) – A list of (display_name, column_name) tuples. The display_name is used for display purposes, while the column_name is used for column names in the output.
threshold (Threshold) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
y_pred_proba (Optional[Union[str, Dict[str, str]]], default=None) – Name(s) of the column(s) containing your model output.
    - For binary classification, pass a single string referring to the model output column.
    - For multiclass classification, pass a dictionary that maps a class string to the column name containing model outputs for that class.
upper_threshold_limit (Optional[float], default=None) – An optional upper threshold for the performance metric.
lower_threshold_limit (Optional[float], default=None) – An optional lower threshold for the performance metric.
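To make the shapes of `components` and `y_pred_proba` concrete, here is a minimal sketch in plain Python. It does not call nannyml; every column name, class label, and display name in it is a hypothetical value chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

# Binary classification: a single column holds the model output.
# "y_pred_proba" is a hypothetical column name.
binary_y_pred_proba: Union[str, Dict[str, str]] = "y_pred_proba"

# Multiclass classification: map each class label to the column that
# holds the model output for that class (labels and columns are hypothetical).
multiclass_y_pred_proba: Union[str, Dict[str, str]] = {
    "prepaid_card": "y_pred_proba_prepaid_card",
    "highstreet_card": "y_pred_proba_highstreet_card",
    "upmarket_card": "y_pred_proba_upmarket_card",
}

# components: a list of (display_name, column_name) tuples.
# The pair below is illustrative, not taken from the library.
components: List[Tuple[str, str]] = [("ROC AUC", "roc_auc")]

# Splitting the tuples gives the names used for display and for output columns.
display_names = [display for display, _ in components]
column_names = [column for _, column in components]
```

The split at the end only shows how the two halves of each tuple relate to display names and output column names; it is not a description of how the properties on `Metric` are implemented.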
- alert(value: float) bool [source]
Returns True if a calculated metric value is below a lower threshold or above an upper threshold.
- Parameters:
value (float) – Value of a calculated metric.
- Returns:
True if the value is below the lower threshold or above the upper threshold, otherwise False.
- Return type:
bool
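The rule described for `alert` can be sketched as a standalone function. This is an illustration of the stated behaviour, not the library's code: the name `is_alert` is hypothetical, and treating a missing (`None`) bound as "no bound on that side" is an assumption of this sketch.

```python
from typing import Optional


def is_alert(value: float, lower: Optional[float], upper: Optional[float]) -> bool:
    """Return True if `value` is below `lower` or above `upper`.

    Assumption: a bound of None means that side is not checked.
    """
    below = lower is not None and value < lower
    above = upper is not None and value > upper
    return below or above
```

For example, with a lower threshold of 0.6 and an upper threshold of 0.9, a value of 0.5 triggers an alert while 0.75 does not. Values exactly equal to a threshold do not alert in this sketch, since the description says "below" and "above".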
- calculate(data: DataFrame)[source]
Calculates performance metrics on data.
- Parameters:
data (pd.DataFrame) – The data to calculate performance metrics on. Requires presence of either the predicted labels or prediction scores/probabilities (depending on the metric to be calculated), as well as the target data.
- property column_name: str
- property column_names: List[str]
- property display_name: str
- property display_names: List[str]
- fit(reference_data: DataFrame, chunker: Chunker)[source]
Fits a Metric on reference data.
- Parameters:
reference_data (pd.DataFrame) – The reference data used for fitting. Must have target data available.
chunker (Chunker) – The Chunker used to split the reference data into chunks. This value is provided by the calling PerformanceCalculator.
- get_chunk_record(chunk_data: DataFrame) Dict [source]
Returns a dictionary containing the performance metrics for a given chunk.
- sampling_error(data: DataFrame)[source]
Calculates the sampling error with respect to the reference data for a given chunk of data.
- Parameters:
data (pd.DataFrame) – The data to calculate the sampling error on, with respect to the reference data.
- Returns:
sampling_error – The expected sampling error.
- Return type:
float
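The text above does not say how the sampling error is computed. One common approach for a mean-like metric such as accuracy is the standard error: the standard deviation of the per-row values on the reference data, divided by the square root of the chunk size. The sketch below shows that generic calculation under this assumption; the function name `standard_error` is hypothetical and the snippet is not the library's implementation.

```python
import math
import statistics
from typing import Sequence


def standard_error(reference_values: Sequence[float], chunk_size: int) -> float:
    """Estimate the sampling error of a mean over a chunk of `chunk_size` rows.

    `reference_values` holds one value per reference row, for example
    1.0 for a correct prediction and 0.0 for an incorrect one.
    Assumption: the metric is a plain mean of these per-row values.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Population standard deviation of the per-row values on reference data.
    reference_std = statistics.pstdev(reference_values)
    # The standard error shrinks with the square root of the chunk size.
    return reference_std / math.sqrt(chunk_size)
```

Under this assumption, larger chunks give smaller sampling errors, so the same observed change in a metric is more meaningful on a large chunk than on a small one.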
- class nannyml.performance_calculation.metrics.base.MetricFactory[source]
Bases: object
A factory class that produces Metric instances based on a given magic string or a metric specification.
- classmethod create(key: str, use_case: ProblemType, **kwargs) Metric [source]
Returns a Metric instance for a given key.
- registry: Dict[str, Dict[ProblemType, Type[Metric]]] = {
'accuracy': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAccuracy'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationAccuracy'>},
'average_precision': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAP'>},
'business_value': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationBusinessValue'>},
'confusion_matrix': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationConfusionMatrix'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationConfusionMatrix'>},
'f1': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationF1'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationF1'>},
'mae': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.MAE'>},
'mape': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.MAPE'>},
'mse': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.MSE'>},
'msle': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.MSLE'>},
'precision': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationPrecision'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationPrecision'>},
'recall': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationRecall'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationRecall'>},
'rmse': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.RMSE'>},
'rmsle': {ProblemType.REGRESSION: <class 'nannyml.performance_calculation.metrics.regression.RMSLE'>},
'roc_auc': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationAUROC'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationAUROC'>},
'specificity': {ProblemType.CLASSIFICATION_BINARY: <class 'nannyml.performance_calculation.metrics.binary_classification.BinaryClassificationSpecificity'>, ProblemType.CLASSIFICATION_MULTICLASS: <class 'nannyml.performance_calculation.metrics.multiclass_classification.MulticlassClassificationSpecificity'>}
}
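The registry has the shape `Dict[str, Dict[ProblemType, Type[Metric]]]`: a metric key first, then a problem type, then a class. A factory lookup over that shape can be sketched as below. Everything here is a self-contained stand-in: the `ProblemType`, `Metric`, `BinaryAccuracy`, and `MAE` definitions are hypothetical simplifications, and the error handling is an assumption rather than the behaviour of `MetricFactory.create`.

```python
from enum import Enum
from typing import Any, Dict, Type


class ProblemType(Enum):
    """Stand-in enum with two of the problem types named in the registry."""
    CLASSIFICATION_BINARY = "classification_binary"
    REGRESSION = "regression"


class Metric:
    """Stand-in base class; it only stores the keyword arguments it receives."""
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class BinaryAccuracy(Metric):
    """Hypothetical metric class for binary classification."""


class MAE(Metric):
    """Hypothetical metric class for regression."""


# Two-level lookup table: metric key -> problem type -> metric class.
registry: Dict[str, Dict[ProblemType, Type[Metric]]] = {
    "accuracy": {ProblemType.CLASSIFICATION_BINARY: BinaryAccuracy},
    "mae": {ProblemType.REGRESSION: MAE},
}


def create(key: str, use_case: ProblemType, **kwargs: Any) -> Metric:
    """Look up the class for (key, use_case) and instantiate it with kwargs."""
    if key not in registry:
        raise KeyError(f"unknown metric key: {key!r}")
    by_problem_type = registry[key]
    if use_case not in by_problem_type:
        raise KeyError(f"metric {key!r} is not registered for {use_case}")
    return by_problem_type[use_case](**kwargs)


metric = create("accuracy", ProblemType.CLASSIFICATION_BINARY, y_true="target")
```

Keying on both the metric name and the problem type lets the same string, such as 'accuracy', resolve to different classes for binary and multiclass problems, which matches the structure of the registry listed above.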