nannyml.metadata.multiclass_classification module
Module containing the metadata for multiclass classification models.
- class nannyml.metadata.multiclass_classification.MulticlassClassificationMetadata(prediction_column_name: Optional[str] = None, predicted_probabilities_column_names: Optional[Dict[Any, str]] = None, *args, **kwargs)
Bases: nannyml.metadata.base.ModelMetadata
Contains the metadata for multiclass classification models.
An extension of nannyml.metadata.base.ModelMetadata that contains properties relating to multiclass classification models.
The main difference from the BinaryClassificationMetadata class is that the predicted probabilities are represented by multiple columns, one for each result class. These columns are either provided explicitly as a class-to-column-name mapping (a dictionary mapping each class to the name of the column containing the predicted probabilities for that class) or extracted automatically by NannyML.
Creates a new instance of MulticlassClassificationMetadata.
- Parameters
prediction_column_name (str, default=None) – The name of the column that contains the model's predictions.
predicted_probabilities_column_names (Dict[Any, str], default=None) – A dictionary mapping a model result class to the name of the column in the data that contains the predicted probabilities for that class.
- class_labels() → List
Returns a sorted list of class labels based on the class probability mapping.
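As a rough illustration of how the class probability mapping relates to the list of class labels, the sketch below uses a plain dictionary instead of the NannyML class. The class labels and column names are hypothetical, and the assumption that the labels are simply the sorted keys of the mapping is ours; it is not taken from the NannyML source.

```python
# Hypothetical class-to-column-name mapping; labels and column names are made up.
predicted_probabilities_column_names = {
    'B': 'y_pred_proba_B',
    'A': 'y_pred_proba_A',
    'C': 'y_pred_proba_C',
}


def class_labels_sketch(mapping):
    """Return the class labels of a class-to-column-name mapping, sorted.

    Assumption: the labels are the keys of the mapping.
    """
    return sorted(mapping.keys())


labels = class_labels_sketch(predicted_probabilities_column_names)
print(labels)
```

Sorting the labels gives a stable order regardless of the order in which the mapping was written.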
- enrich(data: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Creates copies of all metadata columns with fixed names.
- Parameters
data (DataFrame) – The data to enrich
- Returns
enriched_data – A DataFrame that has all metadata present in NannyML-specific columns.
- Return type
DataFrame
- extract(data: pandas.core.frame.DataFrame, model_name: Optional[str] = None, exclude_columns: Optional[List[str]] = None)
Tries to extract model metadata from a given data set.
Manually constructing model metadata can be cumbersome, especially if you have hundreds of features. NannyML includes this helper function that tries to do the boring stuff for you using some simple rules.
By default, all columns in the given dataset are considered to be either model features or metadata. Use the exclude_columns parameter to prevent columns from being interpreted as metadata or features.
- Parameters
data (DataFrame) – The dataset containing model inputs and outputs, enriched with the required metadata.
model_name (str) – A human-readable name for the model.
exclude_columns (List[str], default=None) – A list of column names that are to be skipped during metadata extraction, preventing them from being interpreted as either model metadata or model features.
- Returns
metadata – A fully initialized MulticlassClassificationMetadata instance.
- Return type
MulticlassClassificationMetadata
Notes
This method is usually not called directly. Instead, call the nannyml.metadata.extraction.extract_metadata() function, which delegates to this method.
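The effect of exclude_columns can be sketched without NannyML: every column is a candidate for interpretation as a feature or as metadata unless it is explicitly excluded. The helper name candidate_columns and the column names below are hypothetical, and the sketch does not reproduce the rules NannyML actually applies to the remaining columns.

```python
def candidate_columns(columns, exclude_columns=None):
    """Return the columns that remain eligible for metadata extraction.

    Hypothetical helper: it only drops the excluded names and keeps
    the original column order.
    """
    excluded = set(exclude_columns or [])
    return [column for column in columns if column not in excluded]


# Made-up column names for a dataset with a technical 'id' column.
columns = ['id', 'cat1', 'cont1', 'y_pred', 'y_pred_proba_A', 'y_pred_proba_B', 'work_home_actual']

remaining = candidate_columns(columns, exclude_columns=['id'])
print(remaining)
```

Excluding identifier-like columns this way keeps them from being mistaken for model features.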
- is_complete() → Tuple[bool, List[str]]
Flags if the ModelMetadata is considered complete or still missing values.
- Returns
complete (bool) – True when all required fields are present, False otherwise
missing (List[str]) – A list of all missing properties. Empty when metadata is complete.
Examples
>>> from nannyml.metadata import MulticlassClassificationMetadata, Feature, FeatureType
>>> metadata = MulticlassClassificationMetadata(target_column_name='work_home_actual')
>>> metadata.features = [
...     Feature('cat1', 'cat1', FeatureType.CATEGORICAL), Feature('cat2', 'cat2', FeatureType.CATEGORICAL),
...     Feature('cont1', 'cont1', FeatureType.CONTINUOUS), Feature('cont2', 'cont2', FeatureType.UNKNOWN)]
>>> # missing either predicted labels or predicted probabilities, 'cont2' has an unknown feature type
>>> metadata.is_complete()
(False, ['predicted_probabilities_column_names', 'prediction_column_name'])
>>> metadata.predicted_probabilities_column_names = {'A': 'y_pred_proba_A', 'B': 'y_pred_proba_B'}
>>> metadata.feature(feature='cont2').feature_type = FeatureType.CONTINUOUS
>>> metadata.is_complete()
(True, [])
- property metadata_columns
Returns all metadata columns that are added to the data by the enrich method.
- predicted_class_probability_metadata_columns() → Dict[Any, str]
Returns the names of the class probability columns added to the data by the enrich method.
- property predicted_probabilities_column_names
- property prediction_column_name
- to_df() → pandas.core.frame.DataFrame
Represents a MulticlassClassificationMetadata instance as a DataFrame.
Examples
>>> from nannyml.metadata import ModelMetadata, Feature, FeatureType
>>> metadata = ModelMetadata(model_type='classification_multiclass', target_column_name='work_home_actual')
>>> metadata.features = [Feature(column_name='dist_from_office', label='office_distance', description='Distance from home to the office', feature_type=FeatureType.CONTINUOUS),
...     Feature(column_name='salary_range', label='salary_range', feature_type=FeatureType.CATEGORICAL)]
>>> metadata.to_df()
- to_dict() → Dict[str, Any]
Represents a MulticlassClassificationMetadata instance as a dictionary.
- validate_predicted_class_labels_in_class_probability_mapping(data: pandas.core.frame.DataFrame) → Tuple[bool, List]
Checks if all predicted class labels have a corresponding predicted class probability column.
- Parameters
data (pd.DataFrame) – A pd.DataFrame that contains both the prediction column and the predicted class probability columns.
- Returns
ok (bool) – Boolean indicating validity. True when no class probability columns are missing, False otherwise.
missing (List) – A list of predicted classes for which a corresponding probability column is missing.
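The check described above can be sketched in plain Python, using a list of predicted labels in place of the prediction column of a DataFrame. The function name validate_labels_sketch, the labels and the column names are hypothetical. Returning the missing classes in sorted order is an assumption made here for readability, not documented NannyML behavior.

```python
def validate_labels_sketch(predicted_labels, mapping):
    """Check that every predicted label has a probability column in the mapping.

    Returns (ok, missing), mirroring the documented return shape.
    Assumption: missing labels are reported in sorted order.
    """
    missing = sorted(set(predicted_labels) - set(mapping))
    return len(missing) == 0, missing


# Hypothetical mapping that lacks a probability column for class 'C'.
mapping = {'A': 'y_pred_proba_A', 'B': 'y_pred_proba_B'}

ok, missing = validate_labels_sketch(['A', 'B', 'C', 'A'], mapping)
print(ok, missing)
```

A result of this shape lets the caller report exactly which classes still need a probability column before any calculation is attempted.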