nannyml.metadata.binary_classification module

Module containing the metadata for binary classification models.

class nannyml.metadata.binary_classification.BinaryClassificationMetadata(prediction_column_name: Optional[str] = None, predicted_probability_column_name: Optional[str] = None, *args, **kwargs)

Bases: nannyml.metadata.base.ModelMetadata

Contains the metadata for binary classification models.

An extension of nannyml.metadata.base.ModelMetadata that contains properties relating to binary classification models:

- prediction_column_name: the name of the column containing the predicted label
- predicted_probability_column_name: the name of the column containing the predicted score or probability

Creates a new instance of BinaryClassificationMetadata.

Parameters

prediction_column_name (str) – The name of the column that contains the model's predicted labels. Optional, defaults to None.
predicted_probability_column_name (str) – The name of the column containing the predicted score or probability. Optional, defaults to None.

Warning

Whilst at least one of these two properties must be given for the metadata to be deemed complete, most performance-related calculators and estimators will require both predicted labels and predicted probabilities to be given.
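The rule in this warning can be sketched in plain Python. The helper below is a hypothetical stand-in, not NannyML code: the name `prediction_columns_status` and its return shape are assumptions made for illustration. It reports whether at least one column name is set (enough for completeness) and whether both are set (what most performance-related calculators and estimators need).

```python
from typing import List, Optional, Tuple


def prediction_columns_status(
    prediction_column_name: Optional[str] = None,
    predicted_probability_column_name: Optional[str] = None,
) -> Tuple[bool, bool, List[str]]:
    """Hypothetical helper: summarize which prediction columns are configured.

    Returns (at_least_one, both, unset), where `unset` lists the property
    names that have no column name assigned.
    """
    names = {
        'prediction_column_name': prediction_column_name,
        'predicted_probability_column_name': predicted_probability_column_name,
    }
    # Property names without a value, sorted for a stable order.
    unset = sorted(name for name, value in names.items() if value is None)
    at_least_one = len(unset) < len(names)  # enough for completeness
    both = not unset  # needed by most performance calculators and estimators
    return at_least_one, both, unset


status = prediction_columns_status(prediction_column_name='y_pred')
```

Here `status` signals that the metadata would count as complete, but that a predicted probability column is still unset.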

enrich(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Creates copies of all metadata columns with fixed names.

Parameters: data (DataFrame) – The data to enrich
Returns: enriched_data – A DataFrame that has all metadata present in NannyML-specific columns.
Return type: DataFrame
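As a rough illustration of what "copies with fixed names" means, the sketch below works on a plain dict of column lists instead of a DataFrame so that it runs without dependencies. The function `add_fixed_name_copies` and the `meta_*` column names are hypothetical; the names NannyML actually assigns are not listed in this section.

```python
from typing import Dict, List


def add_fixed_name_copies(
    data: Dict[str, List], fixed_names: Dict[str, str]
) -> Dict[str, List]:
    """Return new column data with each source column duplicated under a fixed name."""
    # Copy every column so the caller's data is left untouched.
    enriched = {name: list(values) for name, values in data.items()}
    for source_name, fixed_name in fixed_names.items():
        if source_name not in data:
            raise KeyError(f"column '{source_name}' not found in data")
        enriched[fixed_name] = list(data[source_name])
    return enriched


data = {'y_pred': [0, 1, 1], 'y_pred_proba': [0.2, 0.9, 0.7]}
enriched = add_fixed_name_copies(
    data,
    # Hypothetical fixed names, chosen only for this sketch.
    {'y_pred': 'meta_prediction', 'y_pred_proba': 'meta_predicted_probability'},
)
```

The original columns are kept alongside the copies, so downstream code can rely on the fixed names regardless of how the user named their columns.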

extract(data: pandas.core.frame.DataFrame, model_name: Optional[str] = None, exclude_columns: Optional[List[str]] = None)

Tries to extract model metadata from a given data set.

Manually constructing model metadata can be cumbersome, especially if you have hundreds of features. NannyML includes this helper function that tries to do the boring stuff for you using some simple rules.

By default, all columns in the given dataset are considered to be either model features or metadata. Use the exclude_columns parameter to prevent columns from being interpreted as metadata or features.

Parameters

data (DataFrame) – The dataset containing model inputs and outputs, enriched with the required metadata.
model_name (str) – A human-readable name for the model. Optional, defaults to None.
exclude_columns (List[str], default=None) – A list of column names that are to be skipped during metadata extraction, preventing them from being interpreted as either model metadata or model features.

Returns

metadata – A fully initialized BinaryClassificationMetadata instance.

Return type

BinaryClassificationMetadata

Notes

This method is rarely called directly. It is usually invoked through the nannyml.metadata.extraction.extract_metadata() function, which delegates to this method.
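To give a feel for rule-based extraction, here is a minimal sketch that looks at column names only. The candidate names (`y_pred`, `y_pred_proba`, and so on) and the function `guess_columns` are assumptions for illustration; the rules NannyML actually applies are not spelled out in this section.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical naming conventions, chosen only for this sketch.
PREDICTION_CANDIDATES = ('y_pred', 'prediction')
PROBABILITY_CANDIDATES = ('y_pred_proba', 'predicted_probability')


def guess_columns(
    columns: Iterable[str], exclude_columns: Optional[List[str]] = None
) -> Dict[str, object]:
    """Guess prediction columns by name; treat the other columns as features."""
    excluded = set(exclude_columns or [])
    # Excluded columns are never considered as metadata or features.
    remaining = [c for c in columns if c not in excluded]

    def first_match(candidates):
        return next((c for c in remaining if c in candidates), None)

    prediction = first_match(PREDICTION_CANDIDATES)
    probability = first_match(PROBABILITY_CANDIDATES)
    features = [c for c in remaining if c not in (prediction, probability)]
    return {
        'prediction_column_name': prediction,
        'predicted_probability_column_name': probability,
        'features': features,
    }


guess = guess_columns(
    ['id', 'age', 'salary', 'y_pred', 'y_pred_proba'],
    exclude_columns=['id'],
)
```

In this sketch, excluding `id` keeps it out of the guessed feature list, which mirrors the stated purpose of exclude_columns.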

is_complete() → Tuple[bool, List[str]]

Indicates whether the metadata is complete or still has missing values.

Returns

complete (bool) – True when all required fields are present, False otherwise
missing (List[str]) – A list of all missing properties. Empty when metadata is complete.

Examples

>>> from nannyml.metadata import ModelMetadata, Feature, FeatureType
>>> metadata = ModelMetadata('work_from_home', target_column_name='work_home_actual')
>>> metadata.features = [
...     Feature('cat1', 'cat1', FeatureType.CATEGORICAL), Feature('cat2', 'cat2', FeatureType.CATEGORICAL),
...     Feature('cont1', 'cont1', FeatureType.CONTINUOUS), Feature('cont2', 'cont2', FeatureType.UNKNOWN)]
>>> # neither predicted labels nor predicted probabilities are set, and 'cont2' has an unknown feature type
>>> metadata.is_complete()
(False, ['predicted_probability_column_name', 'prediction_column_name'])
>>> metadata.predicted_probability_column_name = 'y_pred_proba'  # fix the missing value
>>> metadata.feature(feature='cont2').feature_type = FeatureType.CONTINUOUS
>>> metadata.is_complete()
(True, [])

property metadata_columns: Returns all metadata columns that are added to the data by the enrich method.

property predicted_probability_column_name: The name of the column containing the predicted score or probability.

property prediction_column_name: The name of the column containing the predicted label.

to_df() → pandas.core.frame.DataFrame: Represents a BinaryClassificationMetadata instance as a DataFrame.

to_dict() → Dict[str, Any]: Represents a BinaryClassificationMetadata instance as a dictionary.