Generate Insights

v.1.0.0

Overview

The generate_insights tool adds explanatory insights, generated with the OpenAI API, to different kinds of data. The input can be generic data provided by the user, such as a table or a bar chart, or an output file generated by one of our tools. The following tasks are currently implemented:

  • generic_insights: For any table provided by the user, returns a series of bullet points with insights about the data.

  • partial_dependence: Given the data frame containing the partial dependence evaluations, df_pdp.csv, generated in the Train Classification function, this tool provides a textual explanation of each potential one-dimensional partial dependence graph available.

  • drivers_barriers: This tool starts from the table of drivers and barriers, df_db.csv, generated in the Train Classification or Predict Classification functions. To every row, it adds a textual description explaining which inputs contribute the most, both positively and negatively, to the target taking a specific value. Executions are currently limited to 15 rows at a time.
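Because drivers_barriers executions are limited to 15 rows at a time, a larger table has to be processed in several runs. The following is a minimal sketch of splitting a data frame into batches of at most 15 rows. The helper split_into_batches and the stand-in data frame are hypothetical and not part of the Shimoku SDK; in practice df_db would be read from df_db.csv.

```python
import pandas as pd

MAX_ROWS = 15  # row limit per drivers_barriers execution, as stated above


def split_into_batches(df, batch_size=MAX_ROWS):
    """Return consecutive slices of df with at most batch_size rows each."""
    return [df.iloc[i:i + batch_size] for i in range(0, len(df), batch_size)]


# Stand-in for the real drivers and barriers table (hypothetical data).
df_db = pd.DataFrame({"row_id": range(40)})

batches = split_into_batches(df_db)
print([len(batch) for batch in batches])
```

Each batch can then be uploaded and executed as a separate run.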

Version 1.0.0 of the tool requires that your OpenAI account has access to the gpt-4-1106-preview model.

Step 0: Get ready

Make sure you have followed these steps first: Setup and Requirements

Step 1: Initialize the Client and set up your workspace

Import necessary libraries and initialize the Shimoku client with your credentials. Define a workspace and a menu path for organizing your AI models.

import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")

s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)

s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))

menu_path_name = "insights"
s.set_menu_path(name=menu_path_name)
s.disable_caching()

Note: you must have your SHIMOKU_TOKEN, UNIVERSE_ID, WORKSPACE_ID, OPENAI_API_KEY and OPENAI_ORG_ID saved as environment variables.
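A quick check before initializing the client can make a missing variable easier to spot. This is a minimal sketch using only the standard library; the helper name missing_env_vars is hypothetical.

```python
import os

REQUIRED_VARS = [
    "SHIMOKU_TOKEN",
    "UNIVERSE_ID",
    "WORKSPACE_ID",
    "OPENAI_API_KEY",
    "OPENAI_ORG_ID",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```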

Steps 2 to 5 depend on the task you want to perform. The steps below cover the generic_insights task.

Step 2: Prepare and Upload Your Data

Upload any type of table on which you wish to request relevant information. No additional format is imposed.

input_file = pd.read_csv('./input_data.csv')

s.ai.create_input_files(
    input_files={'input_data': input_file.to_csv(index=False).encode()},
    force_overwrite=True
)
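The value passed in input_files above is the table serialized as CSV bytes. As an illustration, the sketch below builds a small table in memory and produces the same kind of payload; the table contents are made up for the example.

```python
import pandas as pd

# Hypothetical example table; any tabular data works the same way.
sales = pd.DataFrame({"region": ["North", "South"], "revenue": [1200, 950]})

# Serialize to CSV text without the index column, then encode to bytes.
payload = sales.to_csv(index=False).encode()

print(type(payload).__name__)
print(payload.decode())
```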

Step 3: Execute the Generate Insights Function

Call the insight generator function and adjust the arguments for the generic_insights task.

run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='generic_insights',
    data='input_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)

ai_function: str Name of the AI function to run; must be 'generate_insights'.

openai_api_key: str Your OpenAI API key.

openai_org_id: str Your OpenAI organization ID.

task: str Task to perform; 'generic_insights' generates insights about a table in any format.

data: str Name given to the table in create_input_files.

Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
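The same retry pattern can be wrapped in a reusable function. This is a sketch only: poll_until is a hypothetical helper, not part of the Shimoku SDK, and unlike the loop above it raises TimeoutError instead of printing a message when the attempts run out. The demo uses a stand-in fetch function so it runs without a client.

```python
import time


def poll_until(fetch, attempts=20, wait=60):
    """Call fetch() until it returns a truthy value or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            result = fetch()
            if result:
                return result
        except Exception as exc:
            last_error = exc  # remember the error and retry
        time.sleep(wait)
    raise TimeoutError(f"No output after {attempts} attempts") from last_error


# Demo with a stand-in fetch function that succeeds on the third call.
responses = iter([None, {}, {"insights.txt": [b"example"]}])
results = poll_until(lambda: next(responses), attempts=5, wait=0)
print(sorted(results))
```

With the real client, the fetch argument would be lambda: s.ai.get_output_file_objects(run_id=run_id).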

Step 5: Access the GPT insights

Once the execution is complete, the insights are available in the insights.txt output file.

insights = results['insights.txt'][0].decode()
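Since generic_insights returns a series of bullet points, it can be convenient to split the decoded text into individual items. The sketch below assumes one insight per line, optionally prefixed with a bullet marker; the exact layout of insights.txt is not specified here, so both the helper split_insights and the sample text are hypothetical.

```python
def split_insights(text):
    """Split text into non-empty lines, dropping leading bullet markers."""
    items = []
    for line in text.splitlines():
        cleaned = line.strip().lstrip("-*• ").strip()
        if cleaned:
            items.append(cleaned)
    return items


# Made-up sample standing in for results['insights.txt'][0].
sample = "- Revenue is highest in the North.\n\n- The South lags behind.\n".encode()

for item in split_insights(sample.decode()):
    print(item)
```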
