nannyml.performance_estimation.direct_loss_estimation.dle module
Class implementing the Direct Loss Estimation algorithm to estimate performance for regression models.
- class nannyml.performance_estimation.direct_loss_estimation.dle.DLE(feature_column_names: List[str], y_pred: str, y_true: str, timestamp_column_name: Optional[str] = None, chunk_size: Optional[int] = None, chunk_number: Optional[int] = None, chunk_period: Optional[str] = None, chunker: Optional[Chunker] = None, metrics: Optional[Union[str, List[str]]] = None, hyperparameters: Optional[Dict[str, Any]] = None, tune_hyperparameters: bool = False, hyperparameter_tuning_config: Optional[Dict[str, Any]] = None, thresholds: Optional[Dict[str, Threshold]] = None)[source]
Bases:
AbstractEstimator
Class implementing the Direct Loss Estimation method.
The Direct Loss Estimator (DLE) estimates the loss resulting from the difference between the prediction and the target, before the targets become known. The loss is defined by the regression performance metric specified; for all supported metrics the loss function is positive.
It uses an internal LGBMRegressor model per metric to predict the value of the error function (the function returning the error for a given prediction) of the monitored model.
The errors observed on the reference data become the targets for those internal models.
A set of hyperparameters for these internal nanny models can be provided using the hyperparameters parameter. You can also opt to run hyperparameter tuning using FLAML to determine hyperparameters for you. Tuning hyperparameters takes some time and does not guarantee better results, hence it is not done by default.
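The underlying idea can be illustrated with a minimal sketch. This is an illustration only, not NannyML's internal implementation: the synthetic data, column names, and the use of absolute error (the MAE loss) are made up for the example.
>>> # Sketch of the DLE idea: fit a nanny model on the monitored model's
>>> # per-observation errors computed on reference data (where targets are known),
>>> # then use it to predict errors where targets are not yet available.
>>> import numpy as np
>>> import pandas as pd
>>> from lightgbm import LGBMRegressor
>>> rng = np.random.default_rng(1)
>>> reference = pd.DataFrame({'feature_1': rng.random(1000), 'feature_2': rng.random(1000)})
>>> reference['y_true'] = 3 * reference['feature_1'] + rng.normal(0, 0.1, 1000)
>>> reference['y_pred'] = 3 * reference['feature_1']  # the monitored model's predictions
>>> # the per-observation loss (here: absolute error) becomes the nanny model's target
>>> loss = (reference['y_true'] - reference['y_pred']).abs()
>>> nanny_model = LGBMRegressor()
>>> nanny_model.fit(reference[['feature_1', 'feature_2']], loss)
>>> # on analysis data the predicted losses are aggregated per chunk,
>>> # e.g. their mean gives an estimate of MAE for that chunk
>>> estimated_mae = nanny_model.predict(reference[['feature_1', 'feature_2']]).mean()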
The estimator manages a list of Metric instances, constructed using the MetricFactory. The estimator is then responsible for delegating the fit and estimate method calls to each of the managed Metric instances and building a Result object.
For more information, check out the tutorial and the deep dive.
Creates a new Direct Loss Estimator.
- Parameters:
feature_column_names (List[str]) – A list of column names indicating which columns contain feature values.
y_pred (str) – A column name indicating which column contains the model predictions.
y_true (str) – A column name indicating which column contains the target values.
timestamp_column_name (str) – A column name indicating which column contains the timestamp of the prediction.
chunk_size (int, default=None) – Splits the data into chunks containing chunk_size observations. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_number (int, default=None) – Splits the data into chunk_number pieces. Only one of chunk_size, chunk_number or chunk_period should be given.
chunk_period (str, default=None) – Splits the data according to the given period. Only one of chunk_size, chunk_number or chunk_period should be given.
chunker (Chunker, default=None) – The Chunker used to split the data sets into a list of chunks.
metrics (Optional[Union[str, List[str]]], default = ['mae', 'mape', 'mse', 'rmse', 'msle', 'rmsle']) – A list of metrics to calculate. When not provided it will default to include all currently supported metrics.
hyperparameters (Dict[str, Any], default = None) – A dictionary used to provide your own custom hyperparameters when tune_hyperparameters has been set to True. Check out the available hyperparameter options in the LightGBM documentation. A sketch of providing custom hyperparameters is given at the end of the Examples section below.
tune_hyperparameters (bool, default = False) – A boolean controlling whether hypertuning should be performed on the internal regressor models whilst fitting on reference data. Tuning hyperparameters takes some time and does not guarantee better results, hence it defaults to False.
hyperparameter_tuning_config (Dict[str, Any], default = None) –
A dictionary that allows you to provide a custom hyperparameter tuning configuration when tune_hyperparameters has been set to True. The following dictionary is the default tuning configuration. It can be used as a template to modify:
{ "time_budget": 15, "metric": "mse", "estimator_list": ['lgbm'], "eval_method": "cv", "hpo_method": "cfo", "n_splits": 5, "task": 'regression', "seed": 1, "verbose": 0, }
For an overview of possible parameters for the tuning process check out the FLAML documentation.
thresholds (dict) –
A dictionary allowing users to set a custom threshold per metric. It links a Threshold subclass to a metric name. This dictionary is optional. When a dictionary is given, its values override the default values; if no dictionary is given, the defaults are applied. A sketch of overriding a threshold is given at the end of the Examples section below.
The default values are:
{
    'mae': StandardDeviationThreshold(),
    'mape': StandardDeviationThreshold(),
    'mse': StandardDeviationThreshold(),
    'msle': StandardDeviationThreshold(),
    'rmse': StandardDeviationThreshold(),
    'rmsle': StandardDeviationThreshold(),
}
- Returns:
estimator – A new DLE instance to be fitted on reference data.
- Return type:
DLE
Examples
Without hyperparameter tuning:
>>> import nannyml as nml
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_price_dataset()
>>> estimator = nml.DLE(
...     feature_column_names=['car_age', 'km_driven', 'price_new', 'accident_count',
...                           'door_count', 'fuel', 'transmission'],
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     metrics=['rmse', 'rmsle'],
...     chunk_size=6000,
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> metric_fig = results.plot()
>>> metric_fig.show()
With hyperparameter tuning, using a custom hyperparameter tuning configuration:
>>> import nannyml as nml
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_price_dataset()
>>> estimator = nml.DLE(
...     feature_column_names=['car_age', 'km_driven', 'price_new', 'accident_count',
...                           'door_count', 'fuel', 'transmission'],
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     metrics=['rmse', 'rmsle'],
...     chunk_size=6000,
...     tune_hyperparameters=True,
...     hyperparameter_tuning_config={
...         "time_budget": 60,  # run longer
...         "metric": "mse",
...         "estimator_list": ['lgbm'],
...         "eval_method": "cv",
...         "hpo_method": "cfo",
...         "n_splits": 5,
...         "task": 'regression',
...         "seed": 1,
...         "verbose": 0,
...     },
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)
>>> metric_fig = results.plot()
>>> metric_fig.show()
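With custom hyperparameters for the internal LGBMRegressor models (a sketch; the parameter values below are illustrative only, consult the LightGBM documentation for the available options):
>>> import nannyml as nml
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_price_dataset()
>>> estimator = nml.DLE(
...     feature_column_names=['car_age', 'km_driven', 'price_new', 'accident_count',
...                           'door_count', 'fuel', 'transmission'],
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     metrics=['rmse', 'rmsle'],
...     chunk_size=6000,
...     hyperparameters={'n_estimators': 200, 'learning_rate': 0.05, 'num_leaves': 31},  # illustrative values
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)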
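Overriding the default threshold for a single metric (a sketch; the import path and constructor arguments for StandardDeviationThreshold are assumptions about the nannyml.thresholds API, and the multiplier values are arbitrary):
>>> import nannyml as nml
>>> from nannyml.thresholds import StandardDeviationThreshold  # assumed import path
>>> reference_df, analysis_df, _ = nml.load_synthetic_car_price_dataset()
>>> estimator = nml.DLE(
...     feature_column_names=['car_age', 'km_driven', 'price_new', 'accident_count',
...                           'door_count', 'fuel', 'transmission'],
...     y_pred='y_pred',
...     y_true='y_true',
...     timestamp_column_name='timestamp',
...     metrics=['rmse', 'rmsle'],
...     chunk_size=6000,
...     # tighter threshold band for 'rmse' only; other metrics keep the defaults
...     thresholds={'rmse': StandardDeviationThreshold(std_lower_multiplier=2, std_upper_multiplier=2)},
... )
>>> estimator.fit(reference_df)
>>> results = estimator.estimate(analysis_df)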