Car Loan Synthetic Binary Classification Dataset
NannyML provides a synthetic dataset describing a binary classification problem, to make it easier to test and document its features.
To find out what requirements NannyML has for datasets, check out Data Requirements.
Problem Description
The dataset describes a machine learning model that predicts whether a customer will repay a loan taken out to buy a car.
Dataset Description
A sample of the dataset can be seen below.
>>> import nannyml as nml
>>> reference, analysis, analysis_targets = nml.load_synthetic_car_loan_dataset()
>>> display(reference.head(3))
| | car_value | salary_range | debt_to_income_ratio | loan_length | repaid_loan_on_prev_car | size_of_downpayment | driver_tenure | y_pred_proba | y_pred | repaid | timestamp |
---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39811 | 40K - 60K € | 0.63295 | 19 | False | 40% | 0.212653 | 0.99 | 1 | 1 | 2018-01-01 00:00:00.000 |
| 1 | 12679 | 40K - 60K € | 0.718627 | 7 | True | 10% | 4.92755 | 0.07 | 0 | 0 | 2018-01-01 00:08:43.152 |
| 2 | 19847 | 40K - 60K € | 0.721724 | 17 | False | 0% | 0.520817 | 1 | 1 | 1 | 2018-01-01 00:17:26.304 |
The model uses 7 features:
car_value: A numerical feature representing the price of the car.
salary_range: A categorical feature with 4 categories that identify the range the customer's yearly income falls within.
debt_to_income_ratio: A numerical feature representing the customer's ratio of debt to income.
loan_length: A numerical feature representing the number of months over which the customer wants to repay the loan.
repaid_loan_on_prev_car: A categorical feature with 2 categories, stating whether or not the customer repaid a previous loan.
size_of_downpayment: A categorical feature with 10 categories, representing the size of the downpayment as a percentage of the car value, in increments of 10%.
driver_tenure: A numerical feature describing how many years the customer has been driving.
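The feature list above can be written down as a small lookup, which may be handy when checking that a loaded DataFrame has the expected columns. The following is a minimal sketch in plain Python. The names `NUMERICAL_FEATURES`, `CATEGORICAL_FEATURES`, `FEATURES` and `missing_features` are hypothetical helpers introduced here for illustration and are not part of NannyML. The tenure feature is spelled `driver_tenure`, as it appears in the sample of the dataset.

```python
# Feature names grouped by type, following the descriptions in the list above.
# These constants are illustrative helpers, not part of the NannyML API.
NUMERICAL_FEATURES = [
    "car_value",
    "debt_to_income_ratio",
    "loan_length",
    "driver_tenure",
]
CATEGORICAL_FEATURES = [
    "salary_range",
    "repaid_loan_on_prev_car",
    "size_of_downpayment",
]
FEATURES = NUMERICAL_FEATURES + CATEGORICAL_FEATURES


def missing_features(columns):
    """Return the expected feature names that are absent from `columns`."""
    present = set(columns)
    return [name for name in FEATURES if name not in present]
```

Assuming the loaded data matches the description, calling `missing_features(reference.columns)` should return an empty list.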
There are 3 columns that reference the output of the model:
y_pred_proba: The model's predicted probability of the customer repaying the loan.
y_pred: The model prediction in binary form.
repaid: The target column describing whether the customer actually repaid the loan.
There is also an auxiliary column that is helpful but not used by the monitored model:
timestamp: A datetime column recording when the prediction was made.
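To illustrate how the output columns and the timestamp relate to each other, the sketch below copies the three sample rows shown earlier into plain Python dictionaries, compares `y_pred` with `repaid`, and parses the `timestamp` strings. The helpers `accuracy` and `prediction_gaps` are hypothetical names used only for this example. With only three rows, the result says nothing about the model's actual performance.

```python
from datetime import datetime

# The three sample rows from the table above, restricted to the
# model output columns and the timestamp.
sample = [
    {"y_pred_proba": 0.99, "y_pred": 1, "repaid": 1, "timestamp": "2018-01-01 00:00:00.000"},
    {"y_pred_proba": 0.07, "y_pred": 0, "repaid": 0, "timestamp": "2018-01-01 00:08:43.152"},
    {"y_pred_proba": 1.0, "y_pred": 1, "repaid": 1, "timestamp": "2018-01-01 00:17:26.304"},
]


def accuracy(rows):
    """Fraction of rows where the binary prediction equals the target."""
    correct = sum(1 for row in rows if row["y_pred"] == row["repaid"])
    return correct / len(rows)


def prediction_gaps(rows):
    """Time differences between consecutive predictions, as timedeltas."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    times = [datetime.strptime(row["timestamp"], fmt) for row in rows]
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(accuracy(sample))
print(prediction_gaps(sample))
```

On real data the same comparison would normally be done on the full DataFrame rather than on hand-copied rows.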