Predict Classification Outputs


Once you've followed the steps in Predict Classification, you're ready to explore its outputs.

The dictionary output_dict contains two items: each key is the name of an output, and each value is a pandas data frame. Let's look at each of them.
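As a quick orientation, the sketch below shows one way to list what the dictionary holds. It builds a stand-in output_dict so the snippet runs on its own. The key names are assumed to match the output names used on this page, and the Customer column and its values are placeholders, not real output.

```python
import pandas as pd

# Stand-in for the dictionary returned by Predict Classification.
# Assumption: keys match the output names on this page; the data is placeholder.
output_dict = {
    "df_predicted.csv": pd.DataFrame({"Customer": ["A", "B"]}),
    "df_db.csv": pd.DataFrame({"Customer": ["A", "B"]}),
}

# Collect the shape of every output so you can see what you received.
summary = {name: df.shape for name, df in output_dict.items()}
for name, (n_rows, n_cols) in summary.items():
    print(f"{name}: {n_rows} rows x {n_cols} columns")

# Pull out each data frame by its key for further inspection.
df_predicted = output_dict["df_predicted.csv"]
df_db = output_dict["df_db.csv"]
```

With your real output_dict, only the loop and the two lookups are needed.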

Predictions + Local Explainability

df_predicted.csv: Data frame containing predictions for the data used as input.

How to read the data above:

  • Customer UP24795 has 17.2% probability of churning (class True) and 82.8% probability of not churning (class False). Therefore, the prediction of churn is False.

  • Customer GT32586 has 85.4% probability of churning (class True) and 14.6% probability of not churning (class False). Therefore, the prediction of churn is True.
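The reading above can be reproduced with a few lines of pandas. This is a minimal sketch that uses the two customers from the bullets. The column names proba_True and proba_False are hypothetical, and it assumes the predicted class is simply the more probable one.

```python
import pandas as pd

# Probabilities taken from the two examples above.
# Assumption: column names are hypothetical; check your own df_predicted.csv.
df = pd.DataFrame(
    {
        "Customer": ["UP24795", "GT32586"],
        "proba_True": [0.172, 0.854],
        "proba_False": [0.828, 0.146],
    }
)

# The prediction is True when churning is more likely than not churning.
df["predicted_churn"] = df["proba_True"] > df["proba_False"]
print(df)
```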

df_db.csv: Data frame containing drivers and barriers per prediction.

This helps you understand the impact that each variable has on the predictions above. Let us take customer UP24795 again as an example. We'll look at class True to explain the 17.2% probability of churning. This probability is the sum of the base value, the drivers, and the barriers. The base value is the probability that any customer in the dataset has of churning. But each customer is different: each has their own combination of Income, Monthly Premium Auto, Number of Policies, etc., so we add their drivers and barriers to the base value to get the final probability.
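The additive relationship described above can be checked with simple arithmetic. The numbers below are hypothetical and do not belong to any customer on this page. The sketch also assumes contributions are signed, with drivers positive and barriers negative. Your output may show barriers as positive magnitudes instead, in which case they would be subtracted.

```python
# Hypothetical values for illustration only (not taken from the dataset).
base_value = 0.20  # probability of the class for any customer in the dataset

# Assumption: drivers push the probability up, barriers push it down.
drivers = {"Feature A": 0.10, "Feature B": 0.05}
barriers = {"Feature C": -0.08, "Feature D": -0.02}

# Final probability = base value + sum of drivers + sum of barriers.
probability = base_value + sum(drivers.values()) + sum(barriers.values())
print(f"Final probability: {probability:.1%}")
```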

For customer BU79786, we see that Income (15.4%) and Monthly Premium Auto (6.8%) are the top two drivers that lead this specific customer not to churn, while their Number of Policies (3.4%) and Vehicle Class (1%) are the top two barriers.

Both the drivers and the barriers are ordered from greatest to least impact in the columns list_driver_names and list_barrier_names. The columns list_driver_values and list_barrier_values show how much impact each one has, as a percentage.
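Because names and values share the same ordering, they can be paired up position by position. The sketch below rebuilds a one-row data frame from the BU79786 figures quoted above. It assumes each cell holds a Python list and that values are stored as percentages. The helper top_factors is hypothetical, not part of the product.

```python
import pandas as pd

# One-row stand-in for df_db.csv, using the figures quoted for BU79786.
# Assumption: cells are Python lists and values are percentages.
df_db = pd.DataFrame(
    {
        "Customer": ["BU79786"],
        "list_driver_names": [["Income", "Monthly Premium Auto"]],
        "list_driver_values": [[15.4, 6.8]],
        "list_barrier_names": [["Number of Policies", "Vehicle Class"]],
        "list_barrier_values": [[3.4, 1.0]],
    }
)


def top_factors(row, kind, n=2):
    """Return the first n (name, value) pairs for kind 'driver' or 'barrier'."""
    names = row[f"list_{kind}_names"]
    values = row[f"list_{kind}_values"]
    # The lists are already ordered by impact, so slicing keeps the top n.
    return list(zip(names, values))[:n]


row = df_db.iloc[0]
print("Top drivers:", top_factors(row, "driver"))
print("Top barriers:", top_factors(row, "barrier"))
```

If the lists arrive as strings, for example after saving to CSV and reading back, they would need to be parsed first, for instance with ast.literal_eval.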

You can use the Generate Insights AI function to create text insights for df_db.csv.
