nannyml.metadata.regression module

class nannyml.metadata.regression.RegressionMetadata(prediction_column_name: Optional[str] = None, *args, **kwargs)

Bases: nannyml.metadata.base.ModelMetadata

Creates a new ModelMetadata instance.

Parameters
  • model_type (ModelType) – The kind of problem your model is trying to solve. Used to determine which metadata properties should be known by NannyML.

  • model_name (string, default=None) – A human-readable name for the model.

  • features (List[Feature]) – The list of Features for the model. Optional, defaults to None.

  • target_column_name (string) – The name of the column that contains the ground truth / target / actual. Optional, defaults to target.

  • partition_column_name (string) – The name of the column that contains the partition the observation belongs to. Allowed partition values are ‘reference’ and ‘analysis’. Optional, defaults to partition.

  • timestamp_column_name (string) – The name of the column that contains the timestamp indicating when the observation occurred. Optional, defaults to date.

  • prediction_column_name (string, default=None) – The name of the column that contains the model's predictions.

Returns

metadata

Return type

ModelMetadata

Examples

>>> from nannyml.metadata import ModelMetadata, Feature, FeatureType
>>> metadata = ModelMetadata(model_type='classification_binary', target_column_name='work_home_actual')
>>> metadata.features = [
...     Feature(column_name='dist_from_office', label='office_distance',
...             description='Distance from home to the office', feature_type=FeatureType.CONTINUOUS),
...     Feature(column_name='salary_range', label='salary_range', feature_type=FeatureType.CATEGORICAL)]
>>> metadata.to_dict()
{'timestamp_column_name': 'date',
 'partition_column_name': 'partition',
 'target_column_name': 'work_home_actual',
 'prediction_column_name': None,
 'predicted_probability_column_name': None,
 'features': [{'label': 'office_distance',
   'column_name': 'dist_from_office',
   'type': 'continuous',
   'description': 'Distance from home to the office'},
  {'label': 'salary_range',
   'column_name': 'salary_range',
   'type': 'categorical',
   'description': None}]}

enrich(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Creates copies of all metadata columns with fixed names.

Parameters

data (DataFrame) – The data to enrich

Returns

enriched_data – A DataFrame that has all metadata present in NannyML-specific columns.

Return type

DataFrame
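
As a rough, library-independent illustration of what "copies of all metadata columns with fixed names" means, the sketch below works on a plain dict of column lists rather than a DataFrame. The nml_meta_* names and the enrich_columns helper are hypothetical and are not taken from NannyML.

```python
from typing import Dict, List

# Hypothetical fixed column names; NannyML's real names may differ.
FIXED_NAMES = {
    "timestamp": "nml_meta_timestamp",
    "partition": "nml_meta_partition",
    "target": "nml_meta_target",
    "prediction": "nml_meta_prediction",
}


def enrich_columns(
    data: Dict[str, List], column_names: Dict[str, str]
) -> Dict[str, List]:
    """Return a copy of `data` with each metadata column duplicated under a fixed name.

    `column_names` maps a metadata role (e.g. "target") to the user's column name.
    """
    enriched = {name: list(values) for name, values in data.items()}
    for role, user_column in column_names.items():
        if user_column in data:  # skip roles whose column is absent
            enriched[FIXED_NAMES[role]] = list(data[user_column])
    return enriched


data = {
    "date": ["2022-01-01", "2022-01-02"],
    "partition": ["reference", "analysis"],
    "price_actual": [10.0, 12.5],
    "price_predicted": [9.5, 13.0],
}
roles = {
    "timestamp": "date",
    "partition": "partition",
    "target": "price_actual",
    "prediction": "price_predicted",
}
enriched = enrich_columns(data, roles)
```

The original columns are left untouched, so downstream code can rely on the fixed names regardless of how the user's columns are called.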

extract(data: pandas.core.frame.DataFrame, model_name: Optional[str] = None, exclude_columns: Optional[List[str]] = None)

Tries to extract model metadata from a given data set.

Manually constructing model metadata can be cumbersome, especially if you have hundreds of features. NannyML includes this helper function that tries to do the boring stuff for you using some simple rules.

By default, all columns in the given dataset are considered to be either model features or metadata. Use the exclude_columns parameter to prevent columns from being interpreted as metadata or features.

Parameters
  • data (DataFrame) – The dataset containing model inputs and outputs, enriched with the required metadata.

  • model_name (str) – A human-readable name for the model.

  • exclude_columns (List[str], default=None) – A list of column names that are to be skipped during metadata extraction, preventing them from being interpreted as either model metadata or model features.

Returns

metadata – A fully initialized ModelMetadata subclass instance.

Return type

ModelMetadata
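
The role of exclude_columns can be pictured with a small standalone sketch. The name-based rules and the guess_roles helper below are hypothetical stand-ins for the "simple rules" mentioned above; they are not NannyML's actual heuristics.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical name-based rules; the rules NannyML applies may differ.
ROLE_CANDIDATES = {
    "target": {"target", "y_true", "actual"},
    "prediction": {"prediction", "y_pred"},
    "partition": {"partition"},
    "timestamp": {"date", "timestamp"},
}


def guess_roles(
    columns: List[str], exclude_columns: Optional[List[str]] = None
) -> Tuple[Dict[str, str], List[str]]:
    """Split column names into guessed metadata roles and remaining features."""
    excluded = set(exclude_columns or [])
    roles: Dict[str, str] = {}
    features: List[str] = []
    for column in columns:
        if column in excluded:
            continue  # excluded columns are neither metadata nor features
        role = next(
            (r for r, names in ROLE_CANDIDATES.items() if column in names), None
        )
        if role is not None and role not in roles:
            roles[role] = column
        else:
            features.append(column)
    return roles, features


columns = ["identifier", "date", "partition", "distance", "y_pred", "y_true"]
roles, features = guess_roles(columns, exclude_columns=["identifier"])
```

Without the exclusion, the identifier column would be treated as a model feature in this sketch, which is the situation exclude_columns is meant to prevent.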

Notes

This method is rarely called directly. It is usually invoked through the nannyml.metadata.extraction.extract_metadata() function, which delegates to this method.

The abstract base method provides functionality common to all subclasses, which always invoke it through a super().extract() call.
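
The delegation described in these notes follows a common pattern: the base class implements the shared steps, and each subclass extends them through super(). The classes below are simplified, hypothetical stand-ins and do not reproduce NannyML's implementation.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class BaseMetadata(ABC):
    """Hypothetical stand-in for a metadata base class."""

    def __init__(self) -> None:
        self.target_column_name: Optional[str] = None

    @abstractmethod
    def extract(self, columns: List[str]) -> "BaseMetadata":
        # Shared logic lives here even though the method is abstract.
        if "target" in columns:
            self.target_column_name = "target"
        return self


class SketchRegressionMetadata(BaseMetadata):
    """Hypothetical subclass that adds a prediction column."""

    def __init__(self) -> None:
        super().__init__()
        self.prediction_column_name: Optional[str] = None

    def extract(self, columns: List[str]) -> "SketchRegressionMetadata":
        super().extract(columns)  # run the common steps first
        if "y_pred" in columns:
            self.prediction_column_name = "y_pred"
        return self


metadata = SketchRegressionMetadata().extract(["target", "y_pred", "feature_1"])
```

Marking the base method abstract forces every subclass to define extract(), while the body of the base method still holds the logic they all share.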

is_complete() → Tuple[bool, List[str]]

Indicates whether the ModelMetadata is complete or still has missing values.

Returns

  • complete (bool) – True when all required fields are present, False otherwise.

  • missing (List[str]) – A list of all missing properties. Empty when metadata is complete.

Examples

>>> from nannyml.metadata import ModelMetadata, Feature, FeatureType
>>> metadata = ModelMetadata(model_type='classification_binary', model_name='work_from_home', target_column_name='work_home_actual')
>>> metadata.features = [
...     Feature('cat1', 'cat1', FeatureType.CATEGORICAL), Feature('cat2', 'cat2', FeatureType.CATEGORICAL),
...     Feature('cont1', 'cont1', FeatureType.CONTINUOUS), Feature('cont2', 'cont2', FeatureType.UNKNOWN)]
>>> # missing either predicted labels or predicted probabilities, 'cont2' has an unknown feature type
>>> metadata.is_complete()
(False, ['predicted_probability_column_name', 'prediction_column_name'])
>>> metadata.predicted_probability_column_name = 'y_pred_proba'  # fix the missing value
>>> metadata.feature(feature='cont2').feature_type = FeatureType.CONTINUOUS
>>> metadata.is_complete()
(True, [])

property metadata_columns

Returns all metadata columns that are added to the data by the enrich method.

property prediction_column_name

to_df() → pandas.core.frame.DataFrame

Converts a ModelMetadata instance into a read-only DataFrame.

Examples

>>> from nannyml.metadata import ModelMetadata, Feature, FeatureType
>>> metadata = ModelMetadata(model_type='classification_binary', target_column_name='work_home_actual')
>>> metadata.features = [
...     Feature(column_name='dist_from_office', label='office_distance',
...             description='Distance from home to the office', feature_type=FeatureType.CONTINUOUS),
...     Feature(column_name='salary_range', label='salary_range', feature_type=FeatureType.CATEGORICAL)]
>>> metadata.to_df()

to_dict() → Dict[str, Any]

Converts a ModelMetadata instance into a dictionary.
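
When the result holds only strings, lists, and None values, as in the to_dict() output shown in the class-level example, it can be serialized with the standard library. The sketch below uses a hand-written dictionary in that shape, not a live ModelMetadata instance, so the exact keys are an assumption carried over from that example.

```python
import json

# Hand-written dictionary in the shape shown in the class-level example.
metadata_dict = {
    "timestamp_column_name": "date",
    "partition_column_name": "partition",
    "target_column_name": "work_home_actual",
    "prediction_column_name": None,
    "predicted_probability_column_name": None,
    "features": [
        {"label": "office_distance", "column_name": "dist_from_office",
         "type": "continuous", "description": "Distance from home to the office"},
        {"label": "salary_range", "column_name": "salary_range",
         "type": "categorical", "description": None},
    ],
}

# None values become JSON null and survive a round trip.
serialized = json.dumps(metadata_dict, indent=2)
restored = json.loads(serialized)

# Top-level properties that have not been given a value yet.
unset = sorted(key for key, value in restored.items() if value is None)
```

Listing the unset top-level keys this way gives a quick view of which column names still need to be filled in.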