nannyml.preprocessing module

Preprocessing pipeline for incoming data.

nannyml.preprocessing.preprocess(data: DataFrame, metadata: ModelMetadata, reference: bool = False) → DataFrame

Analyse and prepare incoming data for further use downstream.

Parameters
  • data (pd.DataFrame) – A DataFrame containing model inputs, scores, targets and other metadata.

  • metadata (ModelMetadata) – An optional ModelMetadata instance, which may have been constructed manually or may contain non-default values.

  • reference (bool) – Whether to run the additional checks that apply to reference data. Defaults to False.

Returns

prepped_data – A copy of the uploaded data with copies of the metadata columns added. Will be None when the extracted or provided metadata is incomplete.

Return type

Optional[DataFrame]
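The Returns description implies a simple contract: the input DataFrame is never modified, the result is a copy with the metadata columns duplicated, and None signals incomplete metadata. The snippet below is a minimal sketch of that contract in plain pandas, not NannyML's implementation. SimpleMetadata, preprocess_sketch and the "nml_meta_" column prefix are hypothetical names chosen for illustration, and the additional reference-data checks are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

import pandas as pd


@dataclass
class SimpleMetadata:
    """Hypothetical stand-in for ModelMetadata: maps a role name to a column name."""
    columns: Dict[str, Optional[str]] = field(default_factory=dict)

    def is_complete(self) -> bool:
        # Treat metadata as complete only when every role has a column assigned.
        return all(col is not None for col in self.columns.values())


def preprocess_sketch(data: pd.DataFrame, metadata: SimpleMetadata) -> Optional[pd.DataFrame]:
    """Return a copy of `data` with copies of the metadata columns added, or None."""
    # Incomplete metadata (unassigned role or column absent from data) yields None.
    if not metadata.is_complete() or any(c not in data.columns for c in metadata.columns.values()):
        return None
    prepped = data.copy()  # work on a copy so the caller's DataFrame is untouched
    for role, column in metadata.columns.items():
        # Duplicate each metadata column under a hypothetical prefixed name.
        prepped[f"nml_meta_{role}"] = prepped[column]
    return prepped


# Example usage with a toy frame.
df = pd.DataFrame({"f1": [1, 2], "score": [0.1, 0.9], "y": [0, 1]})
meta = SimpleMetadata({"prediction": "score", "target": "y"})
out = preprocess_sketch(df, meta)
```

Returning a copy rather than mutating in place mirrors the documented behaviour ("A copy of the uploaded data"), and returning None instead of raising matches the Optional[DataFrame] return type.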