# Generate Insights

## Overview

The generate\_insights tool lets the user add explanatory insights, generated with the OpenAI API, to datasets of various kinds. The input can be generic data provided by the user, such as a table or a bar chart, or an output file generated by one of our tools. The tasks currently implemented are as follows:

* **generic\_insights:** For any table introduced by the user, a series of bullet points with insights about the data is returned.
* **partial\_dependence:** Given the data frame containing the partial dependence evaluations, `df_pdp.csv`, generated in the [Train Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/train-classification) function, this tool provides a textual explanation of each potential one-dimensional partial dependence graph available.
* **drivers\_barriers:** This tool starts from the table of drivers and barriers, **df\_db.csv**, generated by the [Train Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/train-classification) or [Predict Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/predict-classification) functions. To each row it adds a textual description explaining which inputs contribute the most, both positively and negatively, to the target taking a specific value. Executions are currently limited to 15 rows at a time.

Version 1.0.0 of the tool requires user access to the OpenAI model `gpt-4-1106-preview` to ensure proper functionality.

## Step 0: Get ready

Make sure you have followed these steps first: [Setup and Requirements](https://docs.shimoku.com/dev/artificial-intelligence/broken-reference)

## Step 1: Initialize the Client and set up your workspace

Import necessary libraries and initialize the Shimoku client with your credentials. Define a workspace and a menu path for organizing your AI models.

```python
import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")

s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)

s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))

menu_path_name = "insights"
s.set_menu_path(name=menu_path_name)
s.disable_caching()
```

Note: you must have your `SHIMOKU_TOKEN`, `UNIVERSE_ID`, `WORKSPACE_ID`, `OPENAI_API_KEY` and `OPENAI_ORG_ID` saved as environment variables.
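Before calling the API it can help to fail fast when a credential is missing. This is a minimal sketch (the helper `missing_env_vars` is our own, not part of the Shimoku SDK):

```python
import os

# Variables this guide expects to find in the environment.
REQUIRED_VARS = [
    "SHIMOKU_TOKEN", "UNIVERSE_ID", "WORKSPACE_ID",
    "OPENAI_API_KEY", "OPENAI_ORG_ID",
]

def missing_env_vars(required=REQUIRED_VARS, env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

missing = missing_env_vars()
if missing:
    print(f"Warning: missing environment variables: {', '.join(missing)}")
```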

#### For steps 2 to 5, choose the tab below according to the task you want to perform.

{% tabs %}
{% tab title="generic\_insights" %}

## Step 2: Prepare and Upload Your Data

Upload any table for which you wish to request relevant insights. No particular format is imposed.

{% file src="https://3782181538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUlHTfmIZY46Z1EDfyGMz%2Fuploads%2FmVGXwjvziRRS15lBZEgs%2Finput_data.csv?alt=media&token=67ec6c4b-2124-4edc-a790-1a01b70f9853" %}

```python
input_file = pd.read_csv('./input_data.csv')

s.ai.create_input_files(
        input_files={'input_data': input_file.to_csv(index=False).encode()},
        force_overwrite=True
)
```

## Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the generic\_insights task.

```python
run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='generic_insights',
    data='input_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)
```

**ai\_function:&#x20;**<mark style="color:red;">**str**</mark> Label for this functionality, which will have the value 'generate\_insights'.

**openai\_api\_key:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI unique API key.

**openai\_org\_id:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI organization id.

**task:&#x20;**<mark style="color:red;">**str**</mark> 'generic\_insights' requests insights about a table in any type of format.

**data:** <mark style="color:red;">**str**</mark> Name chosen in create\_input\_files to refer to the table.

## Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

```python
attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
```
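The polling loop above can be factored into a reusable helper, since the same pattern appears for every task. This is a sketch; `fetch` stands for any zero-argument callable such as `lambda: s.ai.get_output_file_objects(run_id=run_id)`:

```python
import time

def wait_for_output(fetch, attempts=20, wait=60):
    """Call `fetch` until it returns a truthy result or attempts run out.

    Returns the fetched result, or None if every attempt failed.
    """
    for attempt in range(attempts):
        try:
            results = fetch()
            if results:
                return results
        except Exception:
            pass  # Output not ready yet; keep polling
        if attempt < attempts - 1:
            time.sleep(wait)
    return None
```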

## Step 5: Accessing the GPT insights&#x20;

Once execution is complete, insights are available.

```python
insights = results['insights.txt'][0].decode()
```
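If the returned text is a newline-separated list of bullet points (an assumption about the output format; inspect your own `insights.txt` to confirm), it can be split into individual insights for display:

```python
def split_insights(text):
    """Split the insights text into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Made-up sample standing in for the decoded insights.txt content.
sample = "- Sales grew in Q2.\n\n- Region A leads revenue.\n"
bullets = split_insights(sample)
```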

{% endtab %}

{% tab title="partial\_dependence" %}

## Step 2: Prepare and Upload Your Data

Upload the resulting partial dependence file, **df\_pdp.csv**, as it was returned by our [Train Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/train-classification) function.

{% file src="https://3782181538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUlHTfmIZY46Z1EDfyGMz%2Fuploads%2FIz9LH0jp34DUW061mUZJ%2Fdf_pdp.csv?alt=media&token=16c556a3-b99a-44ad-920f-899c4e8dca9b" %}

{% code fullWidth="false" %}

```python
df_pdp = pd.read_csv('./df_pdp.csv')

# The number of pd plots with insights is currently limited to 10 per execution
cols_to_groupby = ['column_target', 'class', 'name_feature']
first_10_pdp = (df_pdp[cols_to_groupby].drop_duplicates().head(10))
df_pdp_10 = pd.merge(df_pdp, first_10_pdp, on=cols_to_groupby)

s.ai.create_input_files(
    input_files={'pd_data': df_pdp_10.to_csv(index=False).encode()},
    force_overwrite=True
)
```

{% endcode %}
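To see what the drop-duplicates-then-merge pattern above keeps, here is the same logic on a synthetic frame with 12 feature combinations of 2 rows each (column names follow the snippet; the data is made up):

```python
import pandas as pd

# Synthetic stand-in for df_pdp: 12 (target, class, feature) combinations,
# each with 2 evaluation rows.
df_pdp = pd.DataFrame({
    'column_target': ['y'] * 24,
    'class': ['1'] * 24,
    'name_feature': [f'feat_{i}' for i in range(12) for _ in range(2)],
    'value': range(24),
})

cols_to_groupby = ['column_target', 'class', 'name_feature']
first_10_pdp = df_pdp[cols_to_groupby].drop_duplicates().head(10)
df_pdp_10 = pd.merge(df_pdp, first_10_pdp, on=cols_to_groupby)

# Only the first 10 feature combinations survive, with all of their rows.
n_combos = df_pdp_10[cols_to_groupby].drop_duplicates().shape[0]
```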

## Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the partial dependence task.

```python
run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='partial_dependence',
    data='pd_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)
```

**ai\_function:&#x20;**<mark style="color:red;">**str**</mark> Label for this functionality, which will have the value 'generate\_insights'.

**openai\_api\_key:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI unique API key.

**openai\_org\_id:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI organization id.

**task:&#x20;**<mark style="color:red;">**str**</mark> 'partial\_dependence' requests a textual explanation of each one-dimensional partial dependence plot in the data.

**data:** <mark style="color:red;">**str**</mark> Name chosen in create\_input\_files referring to **df\_pdp.csv**.

## Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

```python
attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
```

## Step 5: Accessing the GPT insights&#x20;

Once execution is complete, insights are available.

```python
df_pdp_insights = pd.read_csv(StringIO(results['df_insights.csv'][0].decode('utf-8')))
```

{% endtab %}

{% tab title="drivers\_barriers" %}

## Step 2: Prepare and Upload Your Data

Upload the drivers and barriers file, **df\_db.csv**, as generated by the [Train Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/train-classification) or [Predict Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/predict-classification) functions. You will also need to upload the dataset used to train your model. Here are the files used in the [Train Classification](https://docs.shimoku.com/dev/artificial-intelligence/classification/train-classification) example.

{% file src="https://3782181538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUlHTfmIZY46Z1EDfyGMz%2Fuploads%2FWFHL93L5hJrMjxfZQ2Zx%2Fdf_db.csv?alt=media&token=cc996545-cc8b-4d58-9792-e89b23225b0c" %}

{% file src="https://3782181538-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUlHTfmIZY46Z1EDfyGMz%2Fuploads%2FueAU7EaubUWybLy6qWqQ%2Fsample_training_dataset.csv?alt=media&token=e4c66c01-b802-48b9-86c7-ebbe6eb9fd30" %}

The **df\_db.csv** file must keep its original format, except that you will need to break it into chunks of at most 15 rows, the current limit per execution.

```python
input_file = pd.read_csv('./sample_training_dataset.csv')
df_db = pd.read_csv('./df_db.csv')

df_db_sample = df_db.head(15)

s.ai.create_input_files(
    input_files={'training_insurance': input_file.to_csv(index=False).encode(),
                 'db_data': df_db_sample.to_csv(index=False).encode()},
    force_overwrite=True
)
```
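When **df\_db.csv** has more than 15 rows, splitting it into chunks can be sketched as below. The helper `chunk_frame` and the `db_data_{i}` naming scheme are our own illustration, not part of the SDK:

```python
import pandas as pd

def chunk_frame(df, size=15):
    """Split a DataFrame into consecutive chunks of at most `size` rows."""
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]

# Example: a df_db stand-in with 40 rows yields chunks of 15, 15 and 10.
df_db = pd.DataFrame({'row': range(40)})
chunks = chunk_frame(df_db)

# Each chunk could then be uploaded under its own name and executed
# separately, e.g.:
# for i, chunk in enumerate(chunks):
#     s.ai.create_input_files(
#         input_files={f'db_data_{i}': chunk.to_csv(index=False).encode()},
#         force_overwrite=True,
#     )
```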

## Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the drivers and barriers task.

```python
run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='drivers_barriers',
    data='db_data',
    context_data='training_insurance',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)
```

**ai\_function:&#x20;**<mark style="color:red;">**str**</mark> Label for this functionality, which will have the value 'generate\_insights'.

**openai\_api\_key:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI unique API key.

**openai\_org\_id:&#x20;**<mark style="color:red;">**str**</mark> Your OpenAI organization id.

**task:&#x20;**<mark style="color:red;">**str**</mark> 'drivers\_barriers' requests, for each row, a textual description of its drivers and barriers.

**data:** <mark style="color:red;">**str**</mark> Name chosen in create\_input\_files referring to **df\_db.csv**.

**context\_data:** <mark style="color:red;">**str**</mark> Name chosen in create\_input\_files referring to the data used to train the classification model. Required to provide insights.&#x20;

## Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

```python
attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
```

## Step 5: Accessing the GPT insights&#x20;

Once execution is complete, insights are available.

```python
df_db_insights = pd.read_csv(StringIO(results['df_insights.csv'][0].decode('utf-8')))
```

{% endtab %}
{% endtabs %}
