Generate Insights Outputs

v.1.0.0

Depending on the requested task, you will have access to the corresponding output file.

1. generic_insight

insights: String containing a series of bullet points with descriptive or statistical information about the input table. For example:

- Customer Lifetime Value ranges considerably with a maximum of over \$83,000, pointing to high variability in customer value for the company.
- Most customers are located in California, as it is the top state with 3,150 references within the dataset.
- There's a prevalence of the basic coverage option chosen by customers, suggested by its highest frequency.
- Income's median is approximately \$34,000 which indicates that the average customer earns at this level; however, incomes range up to nearly \$100,000.
- Monthly Premium Auto has a median value of \$83, signifying that the typical customer's monthly insurance premium is at this level.
- The dataset shows that the majority of customers are employed which could imply stability in the customer base.
- Data about marital status reveal that the counts are not uniform, suggesting varying insurance needs or preferences by marital status.
- A significant portion of customers have not filed any complaints (the most common number of complaints is zero), hinting at customer satisfaction or non-use of complaint services.
- Only a few customers have more than one policy, as indicated by a median of two policies per customer.
- Total Claim Amount distribution is skewed with a mean higher than the median value, reflecting the presence of much larger claims skewing the average.
- Churn column exists indicating whether a customer has left or not with a binary value but a detailed proportion analysis wasn't provided.
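Because insights is returned as a single string, it can be convenient to split it into individual statements before displaying or storing them. The following is a minimal sketch, assuming each bullet starts on its own line with "- " as in the example above; the function name split_insights and the abbreviated sample string are illustrative and not part of the product.

```python
def split_insights(insights: str) -> list[str]:
    """Split a bullet-point insights string into individual statements.

    Assumption: every bullet begins on a new line with "- ".
    Lines that do not start with "- " are treated as wrapped
    continuations of the previous bullet.
    """
    bullets = []
    for line in insights.splitlines():
        line = line.strip()
        if line.startswith("- "):
            bullets.append(line[2:].strip())
        elif line and bullets:
            bullets[-1] += " " + line
    return bullets


# Abbreviated sample string, for illustration only.
example = (
    "- Most customers are located in California.\n"
    "- Monthly Premium Auto has a median value of $83.\n"
)
items = split_insights(example)
```

If the actual output uses a different bullet marker, the prefix check would need to be adjusted accordingly.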

2. partial_dependence

df_pdp_insights: Dataframe with a format analogous to df_pdp.csv in Train Classification Outputs, where an insights column has been added explaining how the average probability of conversion to the given class behaves as each input feature varies.

Note that the first row of df_pdp_insights explains the following block of data from df_pdp:

The second row of df_pdp_insights explains the following block of data from df_pdp:

And so on. Each row from df_pdp_insights corresponds to a block of data from df_pdp, for a specific column_target, class and name_feature.
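The row-to-block correspondence can be sketched as a grouping on the three key columns. This is only an illustration using plain Python dictionaries: the key columns column_target, class and name_feature come from the description above, while the value_feature and probability column names, the helper attach_insights, and all sample values are assumptions made up for the example.

```python
from collections import defaultdict

# Key columns named in the documentation.
KEY = ("column_target", "class", "name_feature")


def attach_insights(pdp_rows, insight_rows):
    """Pair each insights row with its block of df_pdp rows."""
    # Group the df_pdp rows into blocks that share the same key.
    blocks = defaultdict(list)
    for row in pdp_rows:
        blocks[tuple(row[k] for k in KEY)].append(row)

    paired = []
    for ins in insight_rows:
        key = tuple(ins[k] for k in KEY)
        paired.append(
            {"key": key, "insights": ins["insights"], "block": blocks.get(key, [])}
        )
    return paired


# Made-up rows; "value_feature" and "probability" are hypothetical column names.
pdp_rows = [
    {"column_target": "Churn", "class": "Yes", "name_feature": "Income",
     "value_feature": 20000, "probability": 0.30},
    {"column_target": "Churn", "class": "Yes", "name_feature": "Income",
     "value_feature": 60000, "probability": 0.18},
    {"column_target": "Churn", "class": "Yes", "name_feature": "Coverage",
     "value_feature": "Basic", "probability": 0.25},
]
insight_rows = [
    {"column_target": "Churn", "class": "Yes", "name_feature": "Income",
     "insights": "(illustrative insight text for Income)"},
    {"column_target": "Churn", "class": "Yes", "name_feature": "Coverage",
     "insights": "(illustrative insight text for Coverage)"},
]

paired = attach_insights(pdp_rows, insight_rows)
```

When the files are loaded as dataframes, the same pairing could be expressed as a group-by on the three key columns.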

3. drivers_barriers

df_db_insights: Dataframe with a format similar to df_db.csv, where the level of contribution of the input features to the target column taking a certain value is described in text format.

Each row of df_db_insights corresponds to a row in df_db.csv and explains its top drivers and barriers in text format.
