Generate Insights

v.1.0.0

Overview

The generate_insights tool allows the user to add explanatory insights, generated with the OpenAI API, to datasets of various natures. These can be generic data provided by the user, such as a table or a bar chart, or an output file generated by one of our tools. The currently implemented tasks are as follows:

  • generic_insights: For any table provided by the user, a series of bullet points with insights about the data is returned.

  • partial_dependence: Given the data frame containing the partial dependence evaluations, df_pdp.csv, generated by the Train Classification function, this task provides a textual explanation of each available one-dimensional partial dependence plot.

  • drivers_barriers: This task starts from the table of drivers and barriers, df_db.csv, generated by the Train Classification or Predict Classification functions. To every row it adds a textual description explaining which inputs contribute the most, both positively and negatively, to the target taking a specific value. Executions are currently limited to 15 rows at a time.

Version 1.0.0 of the tool requires user access to the OpenAI model gpt-4-1106-preview to ensure proper functionality.
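
If you are unsure whether your account has access to that model, you can verify it before launching an execution. A minimal sketch, assuming the official openai Python package (v1 or later) is installed:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    organization=os.getenv("OPENAI_ORG_ID"),
)

# list the model ids visible to this account and organization
available = {model.id for model in client.models.list()}
if "gpt-4-1106-preview" not in available:
    raise RuntimeError("This account cannot access gpt-4-1106-preview")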

Step 0: Get ready

Make sure you have followed these steps first: Setup and Requirements

Step 1: Initialize the Client and set up your workspace

Import necessary libraries and initialize the Shimoku client with your credentials. Define a workspace and a menu path for organizing your AI models.

import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ORG_ID = os.getenv("OPENAI_ORG_ID")

s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)

s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))

menu_path_name = "insights"
s.set_menu_path(name=menu_path_name)
s.disable_caching()

Note: you must have your SHIMOKU_TOKEN, UNIVERSE_ID, WORKSPACE_ID, OPENAI_API_KEY and OPENAI_ORG_ID saved as environment variables.
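
A quick way to fail fast if any of these is missing, using only the standard library:

# verify that all required environment variables are set
required = ["SHIMOKU_TOKEN", "UNIVERSE_ID", "WORKSPACE_ID",
            "OPENAI_API_KEY", "OPENAI_ORG_ID"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise EnvironmentError(f"Missing environment variables: {', '.join(missing)}")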

For steps 2 to 5, follow the section below that corresponds to the task you want to perform.

Task: generic_insights

Step 2: Prepare and Upload Your Data

Upload any table about which you wish to request relevant information. No particular format is imposed.

input_file = pd.read_csv('./input_data.csv')

s.ai.create_input_files(
    input_files={'input_data': input_file.to_csv(index=False).encode()},
    force_overwrite=True
)

Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the generic_insights task.

run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='generic_insights',
    data='input_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)

ai_function: str Label for this functionality, which will have the value 'generate_insights'.

openai_api_key: str Your OpenAI unique API key.

openai_org_id: str Your OpenAI organization id.

task: str 'generic_insights' requests insights to be generated about a table in any format.

data: str Name chosen in create_input_files to refer to the table.

Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")

Step 5: Accessing the GPT insights

Once execution is complete, insights are available.

insights = results['insights.txt'][0].decode()
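
The returned text contains the bullet-point insights, ready to print or display. For instance, you could render them in the menu path set earlier; a minimal sketch, assuming the s.plt.html method described in the HTML elements section:

print(insights)

# optional: render the insights in the 'insights' menu path as raw HTML
s.plt.html(order=0, html=insights.replace('\n', '<br>'))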

Task: partial_dependence

Step 2: Prepare and Upload Your Data

Upload the resulting partial dependence file, df_pdp.csv, as it was returned by the Train Classification function.

df_pdp = pd.read_csv('./df_pdp.csv')

# The number of pd plots with insights is currently limited to 10 per execution
cols_to_groupby = ['column_target', 'class', 'name_feature']
first_10_pdp = (df_pdp[cols_to_groupby].drop_duplicates().head(10))
df_pdp_10 = pd.merge(df_pdp, first_10_pdp, on=cols_to_groupby)

s.ai.create_input_files(
    input_files={'pd_data': df_pdp_10.to_csv(index=False).encode()},
    force_overwrite=True
)

Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the partial dependence task.

run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='partial_dependence',
    data='pd_data',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)

ai_function: str Label for this functionality, which will have the value 'generate_insights'.

openai_api_key: str Your OpenAI unique API key.

openai_org_id: str Your OpenAI organization id.

task: str 'partial_dependence' requests textual explanations of the partial dependence plots.

data: str Name chosen in create_input_files referring to df_pdp.csv.

Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")

Step 5: Accessing the GPT insights

Once execution is complete, insights are available.

df_pdp_insights = pd.read_csv(StringIO(results['df_insights.csv'][0].decode('utf-8')))
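
Each row of the returned table pairs a partial dependence plot with its generated explanation. The exact column layout comes from the tool, so it is worth inspecting before further use:

# inspect the schema and a few rows of the generated explanations
print(df_pdp_insights.columns.tolist())
print(df_pdp_insights.head())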

Task: drivers_barriers

Step 2: Prepare and Upload Your Data

Upload the drivers and barriers file, df_db.csv, as it was generated by the Train Classification or Predict Classification functions. You will also need to upload the dataset used to train your model; in this example, sample_training_dataset.csv from Train Classification. The df_db.csv file must keep its original format, except that you will need to break it into chunks of up to 15 rows, which is the current per-execution limit.

input_file = pd.read_csv('./sample_training_dataset.csv')
df_db = pd.read_csv('./df_db.csv')

df_db_sample = df_db.head(15)

s.ai.create_input_files(
    input_files={'training_insurance': input_file.to_csv(index=False).encode(), 
                 'db_data': df_db_sample.to_csv(index=False).encode()},
    force_overwrite=True
)

Step 3: Execute the Generate Insight Function

Call the insight generator function and adjust the arguments for the drivers and barriers task.

run_id = s.ai.generic_execute(
    ai_function='generate_insights',
    task='drivers_barriers',
    data='db_data',
    context_data='training_insurance',
    openai_api_key=OPENAI_API_KEY,
    openai_org_id=OPENAI_ORG_ID,
)

ai_function: str Label for this functionality, which will have the value 'generate_insights'.

openai_api_key: str Your OpenAI unique API key.

openai_org_id: str Your OpenAI organization id.

task: str 'drivers_barriers' requests textual descriptions for the rows of the drivers and barriers table.

data: str Name chosen in create_input_files referring to df_db.csv.

context_data: str Name chosen in create_input_files referring to the data used to train the classification model. Required to provide insights.

Step 4: Monitor the Process

Wait for the insights to be generated and the outputs to be uploaded.

attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")

Step 5: Accessing the GPT insights

Once execution is complete, insights are available to the user.

df_db_insights = pd.read_csv(StringIO(results['df_insights.csv'][0].decode('utf-8')))
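
Because each execution handles at most 15 rows, covering the full df_db.csv means repeating steps 2 to 4 once per chunk. A sketch of that loop, reusing the exact calls from above and the wait_for_output helper sketched in Step 4 (it assumes overwriting the same input file between runs is acceptable, as force_overwrite=True suggests):

all_insights = []
for start in range(0, len(df_db), 15):
    chunk = df_db.iloc[start:start + 15]
    s.ai.create_input_files(
        input_files={'db_data': chunk.to_csv(index=False).encode()},
        force_overwrite=True
    )
    run_id = s.ai.generic_execute(
        ai_function='generate_insights',
        task='drivers_barriers',
        data='db_data',
        context_data='training_insurance',
        openai_api_key=OPENAI_API_KEY,
        openai_org_id=OPENAI_ORG_ID,
    )
    results = wait_for_output(s, run_id)
    all_insights.append(
        pd.read_csv(StringIO(results['df_insights.csv'][0].decode('utf-8')))
    )

# combine the per-chunk explanations into a single table
df_db_insights_full = pd.concat(all_insights, ignore_index=True)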
