nannyml.datasets.datasets module
Utility module offering curated datasets for quick experimentation.
- nannyml.datasets.datasets.load_csv_file_to_df(local_file: str) → DataFrame
Loads a data file from within the NannyML package.
- Parameters
local_file (str, required) – Name of the data file to load.
- Returns
df – A DataFrame containing the requested data.
- Return type
pd.DataFrame
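The sketch below illustrates the general idea of such a loader: resolve a file name against a data directory and read it into a DataFrame. It is not NannyML's implementation. The helper name `load_packaged_csv`, the `data_dir` argument, and the temporary directory standing in for the package's data folder are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_packaged_csv(data_dir: Path, local_file: str) -> pd.DataFrame:
    """Hypothetical helper: read `local_file` from `data_dir` into a DataFrame."""
    path = data_dir / local_file
    if not path.is_file():
        raise FileNotFoundError(f"no data file named {local_file!r} in {data_dir}")
    return pd.read_csv(path)


# Stand-in for a package data directory, populated with a tiny CSV file.
_tmp = tempfile.TemporaryDirectory()
data_dir = Path(_tmp.name)
(data_dir / "example.csv").write_text("feature,target\n0.1,0\n0.9,1\n")

df = load_packaged_csv(data_dir, "example.csv")
print(df)
```

With the real function, only the file name is passed, since the data location is fixed inside the package.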
- nannyml.datasets.datasets.load_modified_california_housing_dataset()
Loads the modified California housing dataset provided for testing the NannyML package.
This dataset has been altered to represent a binary classification problem over time. More information about the original dataset can be found at: California Housing Dataset
- Returns
reference (pd.DataFrame) – A DataFrame containing the reference period of the modified California housing dataset.
analysis (pd.DataFrame) – A DataFrame containing the analysis period of the modified California housing dataset.
analysis_tgt (pd.DataFrame) – A DataFrame containing the target values for the analysis period of the modified California housing dataset.
Examples
>>> from nannyml.datasets import load_modified_california_housing_dataset
>>> reference_df, analysis_df, analysis_targets_df = load_modified_california_housing_dataset()
- nannyml.datasets.datasets.load_synthetic_binary_classification_dataset()
Loads the synthetic binary classification dataset provided for testing the NannyML package.
- Returns
reference (pd.DataFrame) – A DataFrame containing the reference period of the synthetic binary classification dataset.
analysis (pd.DataFrame) – A DataFrame containing the analysis period of the synthetic binary classification dataset.
analysis_tgt (pd.DataFrame) – A DataFrame containing the target values for the analysis period of the synthetic binary classification dataset.
Examples
>>> from nannyml.datasets import load_synthetic_binary_classification_dataset
>>> reference_df, analysis_df, analysis_targets_df = load_synthetic_binary_classification_dataset()
- nannyml.datasets.datasets.load_synthetic_car_loan_dataset()
Loads the synthetic car loan binary classification dataset provided for testing the NannyML package.
- Returns
reference (pd.DataFrame) – A DataFrame containing the reference period of the synthetic car loan dataset.
analysis (pd.DataFrame) – A DataFrame containing the analysis period of the synthetic car loan dataset.
analysis_tgt (pd.DataFrame) – A DataFrame containing the target values for the analysis period of the synthetic car loan dataset.
Examples
>>> from nannyml.datasets import load_synthetic_car_loan_dataset
>>> reference_df, analysis_df, analysis_targets_df = load_synthetic_car_loan_dataset()
- nannyml.datasets.datasets.load_synthetic_multiclass_classification_dataset()
Loads the synthetic multiclass classification dataset provided for testing the NannyML package.
- Returns
reference (pd.DataFrame) – A DataFrame containing the reference period of the synthetic multiclass classification dataset.
analysis (pd.DataFrame) – A DataFrame containing the analysis period of the synthetic multiclass classification dataset.
analysis_tgt (pd.DataFrame) – A DataFrame containing the target values for the analysis period of the synthetic multiclass classification dataset.
Examples
>>> from nannyml.datasets import load_synthetic_multiclass_classification_dataset
>>> reference_df, analysis_df, analysis_targets_df = load_synthetic_multiclass_classification_dataset()
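The dataset loaders above each return three frames: reference, analysis, and analysis targets. The sketch below shows one way the analysis targets might be joined back onto the analysis frame once they become available. It uses small stand-in frames rather than the real datasets, and the column names `identifier`, `feature`, `y_pred` and `target` are assumptions; check the columns of the frames you actually load before merging.

```python
import pandas as pd

# Stand-in frames with assumed column names; the real loaders return
# larger DataFrames whose columns may differ.
reference_df = pd.DataFrame(
    {"identifier": [0, 1], "feature": [0.2, 0.7], "y_pred": [0, 1], "target": [0, 1]}
)
analysis_df = pd.DataFrame(
    {"identifier": [2, 3, 4], "feature": [0.4, 0.1, 0.8], "y_pred": [1, 0, 1]}
)
analysis_targets_df = pd.DataFrame({"identifier": [2, 3, 4], "target": [1, 0, 0]})

# Attach the late-arriving targets to the analysis period on the shared key.
analysis_full_df = analysis_df.merge(analysis_targets_df, on="identifier", how="left")

# With targets attached, realized performance can be computed directly.
accuracy = (analysis_full_df["y_pred"] == analysis_full_df["target"]).mean()
print(analysis_full_df)
print(f"analysis accuracy: {accuracy:.3f}")
```

A left join keeps every analysis row even when its target has not arrived yet, which leaves missing values in `target` for those rows.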