nannyml.performance_estimation.direct_loss_estimation.metrics module
A module containing the implementations of metrics estimated by DLE.

The DLE estimator converts a list of metric names into Metric instances using the MetricFactory. The estimator then loops over these Metric instances to fit them on reference data and run the estimation on analysis data.
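The flow described above can be sketched as follows. This is an illustrative toy, not the actual nannyml internals; all names here (ToyMetric, TOY_REGISTRY, toy_create) are hypothetical stand-ins for Metric, MetricFactory.registry, and MetricFactory.create.

```python
# Toy sketch of the name-to-Metric flow: keys are looked up in a registry,
# instances are created, fitted on reference data, then used for estimation.
class ToyMetric:
    def fit(self, reference_data):
        self.fitted = True  # a real Metric trains its internal regressor here

    def estimate(self, analysis_data):
        return 0.0  # placeholder; a real Metric returns an estimated loss

class ToyMAE(ToyMetric): pass
class ToyMSE(ToyMetric): pass

TOY_REGISTRY = {"mae": ToyMAE, "mse": ToyMSE}

def toy_create(key: str) -> ToyMetric:
    if key not in TOY_REGISTRY:
        raise ValueError(f"unknown metric '{key}'")
    return TOY_REGISTRY[key]()

# Convert metric names into instances, fit each, then estimate.
metrics = [toy_create(name) for name in ["mae", "mse"]]
for m in metrics:
    m.fit(reference_data=None)
```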
- class nannyml.performance_estimation.direct_loss_estimation.metrics.MAE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Mean Absolute Error (MAE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) – A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Returns:
mae (float)
Mean Absolute Error
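As a sketch of what realized_performance computes for this metric, using scikit-learn rather than nannyml's internals; the column names 'y_true' and 'y_pred' are illustrative, not fixed by the API:

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Mean absolute error between the target column and the prediction column
# of a chunk of data.
def realized_mae(chunk: pd.DataFrame, y_true: str, y_pred: str) -> float:
    return float(mean_absolute_error(chunk[y_true], chunk[y_pred]))

chunk = pd.DataFrame({"y_true": [3.0, 5.0, 2.0], "y_pred": [2.5, 5.0, 4.0]})
realized_mae(chunk, "y_true", "y_pred")  # (0.5 + 0.0 + 2.0) / 3
```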
- class nannyml.performance_estimation.direct_loss_estimation.metrics.MAPE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Mean Absolute Percentage Error (MAPE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Returns:
mape (float)
Mean Absolute Percentage Error
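A sketch of what realized_performance computes for this metric, using scikit-learn rather than nannyml's internals. Note that scikit-learn's implementation returns the error as a fraction, not a percentage:

```python
from sklearn.metrics import mean_absolute_percentage_error

# Mean absolute percentage error between targets and predictions;
# input handling here is illustrative.
def realized_mape(y_true, y_pred) -> float:
    return float(mean_absolute_percentage_error(y_true, y_pred))

realized_mape([2.0, 4.0, 5.0], [2.2, 3.6, 5.0])  # (0.1 + 0.1 + 0.0) / 3
```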
- class nannyml.performance_estimation.direct_loss_estimation.metrics.MSE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Mean Squared Error (MSE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Returns:
mse (float)
Mean Squared Error
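A sketch of what realized_performance computes for this metric, using scikit-learn rather than nannyml's internals:

```python
from sklearn.metrics import mean_squared_error

# Mean of the squared differences between targets and predictions;
# input handling here is illustrative.
def realized_mse(y_true, y_pred) -> float:
    return float(mean_squared_error(y_true, y_pred))

realized_mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```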
- class nannyml.performance_estimation.direct_loss_estimation.metrics.MSLE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Mean Squared Log Error (MSLE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Raises:
_raise_exception_for_negative_values – When y_true or y_pred contains negative values.
- Returns:
msle (float)
Mean Squared Log Error
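The documented negative-value check can be sketched as follows. This is a re-implementation for illustration using scikit-learn, not nannyml's actual code:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Mean squared log error; negative values make the log undefined, so the
# documented behaviour is to raise before computing the metric.
def realized_msle(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("y_true and y_pred must be non-negative for MSLE")
    return float(mean_squared_log_error(y_true, y_pred))
```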
- class nannyml.performance_estimation.direct_loss_estimation.metrics.Metric(display_name: str, column_name: str, feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any], threshold: Threshold, upper_value_limit: Optional[float] = None, lower_value_limit: Optional[float] = 0.0)[source]
Bases:
ABC
A performance metric used to estimate regression performance.
Creates a new Metric instance.
- Parameters:
display_name (str) – The name of the metric. Used to display in plots. If not given this name will be derived from the calculation_function.
column_name (str) – The name used to indicate the metric in columns of a DataFrame.
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
upper_value_limit (Optional[float], default=None,) – An optional value that serves as a limit for the upper threshold value. Any calculated upper threshold values that end up above this limit will be replaced by this limit value. The limit is often a theoretical constraint enforced by a specific drift detection method or performance metric.
lower_value_limit (Optional[float], default=0.0,) – An optional value that serves as a limit for the lower threshold value. Any calculated lower threshold values that end up below this limit will be replaced by this limit value. The limit is often a theoretical constraint enforced by a specific drift detection method or performance metric.
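If only one or two tuning settings need to change, the default configuration shown above can be copied and edited. A hedged sketch; the key names follow the documented defaults, while the modified value is illustrative:

```python
# Copy of the default FLAML tuning configuration documented above, with an
# illustrative longer time budget; all other entries keep their defaults.
custom_tuning_config = {
    "time_budget": 60,        # seconds to spend tuning (default: 15)
    "metric": "mse",
    "estimator_list": ["lgbm"],
    "eval_method": "cv",
    "hpo_method": "cfo",
    "n_splits": 5,
    "task": "regression",
    "seed": 1,
    "verbose": 0,
}
```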
- alert(value: float) bool [source]
Returns True if an estimated metric value is below a lower threshold or above an upper threshold.
- Parameters:
value (float) – Value of an estimated metric.
- Returns:
True if the value is below the lower threshold or above the upper threshold, False otherwise.
- Return type:
bool
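The alert logic described above can be sketched as a standalone function; the threshold values used here are illustrative:

```python
from typing import Optional

# An estimated value raises an alert when it falls below the lower
# threshold or above the upper one; a missing bound never triggers.
def alert(value: float, lower: Optional[float], upper: Optional[float]) -> bool:
    below = lower is not None and value < lower
    above = upper is not None and value > upper
    return below or above

alert(0.9, lower=0.2, upper=0.8)  # True: above the upper threshold
```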
- estimate(data: DataFrame)[source]
Calculates performance metrics on data.
- Parameters:
data (pd.DataFrame) – The data to estimate performance metrics for. Requires presence of either the predicted labels or prediction scores/probabilities (depending on the metric to be calculated).
- fit(reference_data: DataFrame)[source]
Fits a Metric on reference data.
- Parameters:
reference_data (pd.DataFrame) – The reference data used for fitting. Must have target data available.
- abstract realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- sampling_error(data: DataFrame)[source]
Calculates the sampling error with respect to the reference data for a given chunk of data.
- Parameters:
data (pd.DataFrame) – The data to calculate the sampling error on, with respect to the reference data.
- Returns:
sampling_error – The expected sampling error.
- Return type:
float
- class nannyml.performance_estimation.direct_loss_estimation.metrics.MetricFactory[source]
Bases:
object
A factory class that produces Metric instances based on a given magic string or a metric specification.
- classmethod create(key: str, problem_type: ProblemType, **kwargs) Metric [source]
Returns a Metric instance for a given key.
- Parameters:
key (str) – The key identifying the metric to create, e.g. 'mae' or 'rmse'.
problem_type (ProblemType) – Determines which method to use. Use ‘regression’ for regression tasks.
- registry: Dict[str, Dict[ProblemType, Type[Metric]]] = {'mae': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.MAE'>}, 'mape': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.MAPE'>}, 'mse': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.MSE'>}, 'msle': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.MSLE'>}, 'rmse': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.RMSE'>}, 'rmsle': {ProblemType.REGRESSION: <class 'nannyml.performance_estimation.direct_loss_estimation.metrics.RMSLE'>}}
- class nannyml.performance_estimation.direct_loss_estimation.metrics.RMSE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Root Mean Squared Error (RMSE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Returns:
rmse (float)
Root Mean Squared Error
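A sketch of what realized_performance computes for this metric: RMSE is the square root of MSE, which keeps the metric in the same units as the target. This uses scikit-learn and NumPy rather than nannyml's internals:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Root of the mean squared error between targets and predictions;
# input handling here is illustrative.
def realized_rmse(y_true, y_pred) -> float:
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

realized_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt((0 + 0 + 4) / 3)
```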
- class nannyml.performance_estimation.direct_loss_estimation.metrics.RMSLE(feature_column_names: List[str], y_true: str, y_pred: str, chunker: Chunker, threshold: Threshold, tune_hyperparameters: bool, hyperparameter_tuning_config: Dict[str, Any], hyperparameters: Dict[str, Any])[source]
Bases:
Metric
Creates a new Root Mean Squared Log Error (RMSLE) Metric instance.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_true (str,) – The name of the column containing target values (that are provided in reference data during fitting).
y_pred (str,) – The name of the column containing your model predictions.
chunker (Chunker,) – The Chunker used to split the data sets into lists of chunks.
tune_hyperparameters (bool,) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any],) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
hyperparameters (Dict[str, Any],) –
A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to False. Check out the available hyperparameter options in the LightGBM documentation.
threshold (Threshold,) – The Threshold instance that determines how the lower and upper threshold values will be calculated.
- realized_performance(data: DataFrame) float [source]
Calculates the realized performance of a model on a given chunk of data. The data needs to contain both predictions and real targets.
- Parameters:
data (pd.DataFrame) – The data to calculate the realized performance on.
- Raises:
_raise_exception_for_negative_values – When y_true or y_pred contains negative values.
- Returns:
rmsle (float)
Root Mean Squared Log Error
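A sketch of what realized_performance computes for this metric: RMSLE is the square root of MSLE, with the same documented restriction that negative values are not allowed. A re-implementation for illustration, not nannyml's actual code:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Root of the mean squared log error; negative values make the log
# undefined, so the documented behaviour is to raise first.
def realized_rmsle(y_true, y_pred) -> float:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("y_true and y_pred must be non-negative for RMSLE")
    return float(np.sqrt(mean_squared_log_error(y_true, y_pred)))
```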