Generate Insights
v.1.0.0
Overview
The generate_insights tool allows the user to add explanatory insights, generated with the OpenAI API, to datasets of various natures. These can be generic data provided by the user, such as a table or a bar chart, or an output file generated by one of our tools. The currently implemented tasks are as follows:
generic_insights: For any table provided by the user, returns a series of bullet points with insights about the data.
partial_dependence: Given the data frame containing the partial dependence evaluations, df_pdp.csv, generated in the Train Classification function, this task provides a textual explanation of each available one-dimensional partial dependence graph.
drivers_barriers: This task starts from the table of drivers and barriers, df_db.csv, generated in the Train Classification or Predict Classification functions. To every row, it adds a textual description explaining which inputs contribute the most, both positively and negatively, to the target taking a specific value. Executions are currently limited to 15 rows at a time.
Version 1.0.0 of the tool requires user access to the OpenAI model gpt-4-1106-preview to ensure proper functionality.
Step 0: Get ready
Make sure you have followed these steps first: Setup and Requirements
Step 1: Initialize the Client and set up your workspace
Import necessary libraries and initialize the Shimoku client with your credentials. Define a workspace and a menu path for organizing your AI models.
import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")
s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)
s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))
menu_path_name = "insights"
s.set_menu_path(name=menu_path_name)
s.disable_caching()
Note: you must have your SHIMOKU_TOKEN, UNIVERSE_ID, WORKSPACE_ID, OPENAI_API_KEY and OPENAI_ORG_ID saved as environment variables.
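Before initializing the client, it can help to verify that all five variables are actually set. The helper below is a hypothetical convenience, not part of the Shimoku SDK; only the variable names come from this guide.

```python
import os

# Hypothetical helper (not part of the Shimoku SDK): list the required
# environment variables that are unset or empty, so the script can fail
# fast with a clear message instead of a confusing API error later.
REQUIRED_VARS = [
    "SHIMOKU_TOKEN",
    "UNIVERSE_ID",
    "WORKSPACE_ID",
    "OPENAI_API_KEY",
    "OPENAI_ORG_ID",
]

def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables missing from the environment mapping."""
    return [name for name in required if not env.get(name)]
```

For example, `missing_env_vars()` returns an empty list when everything is configured, and the names of the missing variables otherwise.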
For steps 2 to 5, choose the tab below according to the task you want to perform.
Step 2: Prepare and Upload Your Data
Upload any table about which you wish to request relevant insights; no particular format is imposed.
input_file = pd.read_csv('./input_data.csv')
s.ai.create_input_files(
    input_files={'input_data': input_file.to_csv(index=False).encode()},
    force_overwrite=True
)
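If you do not have a CSV at hand, any small table works. The sketch below builds an illustrative data frame (column names and values are invented for the example) and encodes it exactly as `create_input_files` expects: CSV bytes without the index.

```python
import pandas as pd

# Illustrative table; any columns and dtypes are acceptable for generic_insights.
df = pd.DataFrame({
    "region": ["North", "South", "East"],
    "sales": [1200, 950, 1430],
})

# Encode the same way as in the upload step: CSV bytes without the index column.
encoded = df.to_csv(index=False).encode()
```

The resulting `encoded` value can be passed directly as the dictionary value in `input_files`.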
Step 3: Execute the Generate Insight Function
Call the insight generator function and adjust the arguments for the generic_insights task.
run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='generic_insights',
    data='input_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)
ai_function: str Label for this functionality; always takes the value 'generate_insights'.
openai_api_key: str Your unique OpenAI API key.
openai_org_id: str Your OpenAI organization id.
task: str 'generic_insights' requests the generation of insights about a table in any format.
data: str Name chosen in create_input_files to refer to the table.
Step 4: Monitor the Process
Wait for the insights to be generated and the outputs to be uploaded.
attempts = 20
wait = 60
for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
Step 5: Accessing the GPT insights
Once execution is complete, insights are available.
insights = results['insights.txt'][0].decode()
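The decoded insights arrive as a single text blob of bullet points. A hypothetical post-processing step (the example text below is invented, not real tool output) splits it into individual items for display:

```python
# Example blob in the bullet-point shape described in the Overview; the
# actual content of insights.txt will depend on your data.
insights = "- Sales peak in the East.\n- Margins are thinnest in the North."

# Split the blob into clean bullet strings, dropping blank lines and the
# leading "- " markers.
bullets = [line.lstrip("- ").strip() for line in insights.splitlines() if line.strip()]
```

Each entry of `bullets` can then be rendered individually, e.g. in a dashboard component.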