Train Classification

v.1.0.0


Last updated 1 year ago


Overview

The train_classification function in Shimoku's SDK enables users to train machine learning models for various classification tasks. This guide will walk you through setting up your environment, preparing your data, and using the train_classification function to train a model.

Step 0: Get ready

Make sure you have followed these steps first: Setup and Requirements

Step 1: Initialize the Client and set up your workspace

Import necessary libraries and initialize the Shimoku client with your credentials.

Define a menu path (any name you want) for organizing your AI models, and disable caching for real-time data processing.

import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client

s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)

s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))

menu_path_name = "insurance_model"
s.set_menu_path(name=menu_path_name)
s.disable_caching()

Note: you must have your SHIMOKU_TOKEN, UNIVERSE_ID and WORKSPACE_ID saved as environment variables.
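If one of these variables is missing, it can help to fail fast with a clear message before initializing the client. A minimal sketch (the helper name is ours, not part of the SDK):

```python
import os

# Required credentials; names taken from the snippet above.
REQUIRED_VARS = ("SHIMOKU_TOKEN", "UNIVERSE_ID", "WORKSPACE_ID")

def check_env_vars(required=REQUIRED_VARS):
    """Return the required environment variables that are not set."""
    return [name for name in required if not os.getenv(name)]

missing = check_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```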

Step 2: Prepare and Upload Your Data

Load your training data and create an input file for the AI function.

Here's a sample dataset for you to train your first model:

input_file = pd.read_csv('./sample_training_dataset.csv').to_csv(index=False).encode()

s.ai.create_input_files(
    input_files={'training_insurance': input_file},
    force_overwrite=True
)

Note that the input file is passed as a dictionary in which the key is the name you are assigning to your file (in this example, 'training_insurance') and the value is a CSV bytes object.
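The same bytes payload can also be built from an in-memory DataFrame instead of a CSV file on disk. A sketch (the column names here are illustrative, not a required schema):

```python
import pandas as pd

# Illustrative training data; your real dataset will have its own columns.
df = pd.DataFrame({
    "Customer": ["C1", "C2", "C3"],
    "Age": [34, 45, 29],
    "Churn": [0, 1, 0],
})

# Serialize to CSV text, then encode to the bytes object the API expects.
input_file = df.to_csv(index=False).encode()

input_files = {"training_insurance": input_file}
```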

Step 3: Execute the Training Function

Train your classification model by specifying model details and training parameters.

run_id = s.ai.generic_execute(
    ai_function="train_classification",
    model_name="churn_insurance",
    table="training_insurance",
    columns_target=["Churn"],
    strategy="predictor",
    id_columns=["Customer"]
)

Let's understand the parameters:

ai_function: str It must be 'train_classification' when you are performing this task.

model_name: str Any name you want; you will refer to it later when using the Predict Classification function.

table: str The name of an input file you previously created (see Step 2).

columns_target: List[str] A list containing the name of the target column(s), that is, the column(s) in your dataset that hold the values the model will learn to predict.

strategy: str Either 'predictor' or 'recommender'. Choose 'predictor' for tasks such as churn prediction or lead scoring; choose 'recommender' if you want to recommend products, for example.

id_columns: List[str] A list containing all the columns in your dataset that are IDs.
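Because table must reference an uploaded CSV, a quick pre-flight check that columns_target and id_columns actually exist in the file can save a failed run. A sketch (the helper is ours, not part of the SDK; the sample CSV is illustrative):

```python
from io import StringIO
import pandas as pd

def validate_columns(csv_bytes, columns_target, id_columns):
    """Return any requested columns missing from the CSV header."""
    header = pd.read_csv(StringIO(csv_bytes.decode()), nrows=0).columns
    wanted = set(columns_target) | set(id_columns)
    return sorted(wanted - set(header))

# Example with a tiny illustrative CSV:
csv_bytes = b"Customer,Age,Churn\nC1,34,0\n"
missing = validate_columns(csv_bytes, columns_target=["Churn"], id_columns=["Customer"])
```

An empty list means all requested columns are present and the training call can proceed.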

Step 4: Monitor the Training Process

Wait for the model to be trained and the outputs to be uploaded.

attempts = 20
wait = 60

for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Ignore errors and continue
    time.sleep(wait)  # Wait before retrying
else:
    print("Failed to obtain the output after the maximum number of attempts.")
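The polling loop above can be wrapped in a reusable helper, optionally with exponential backoff. A sketch (poll_fn stands in for the call to s.ai.get_output_file_objects; the helper is ours, not part of the SDK):

```python
import time

def wait_for_results(poll_fn, attempts=20, wait=60, backoff=1.0):
    """Call poll_fn until it returns a truthy result or attempts run out.

    Exceptions from poll_fn are treated as "not ready yet". Returns the
    result, or None if every attempt failed.
    """
    delay = wait
    for attempt in range(attempts):
        try:
            result = poll_fn()
            if result:
                return result
        except Exception:
            pass  # Not ready yet; keep polling.
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= backoff  # backoff > 1 lengthens the wait each retry.
    return None
```

With the real client this could be invoked as wait_for_results(lambda: s.ai.get_output_file_objects(run_id=run_id)).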

Step 5: Accessing the Model Outputs

Once training is complete you can access the output files, which include predictions for the training dataset, explainability files and model scoring.

output_dict = dict()
for file_name, bytes_obj in results.items():
    output_dict[file_name] = pd.read_csv(StringIO(bytes_obj[0].decode('utf-8')))

The dictionary output_dict will have 5 items, in which the keys are the names of the outputs and the values are pandas DataFrames. The following outputs will be available:

  • df_predicted.csv: DataFrame containing predictions for the data used to train the model.

  • df_importance.csv: DataFrame containing the importance of each feature.

  • df_db.csv: DataFrame containing drivers and barriers per prediction.

  • df_pdp.csv: DataFrame containing partial dependence evaluations per feature.

  • scoring_naive.csv: DataFrame containing model performance metrics.

Finally, if you want to save these outputs in your local machine, you can execute the following:

for file_name, dataframe in output_dict.items(): 
    dataframe.to_csv(file_name, index=False)
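If you run trainings repeatedly, writing each run's files into a dedicated folder keeps outputs from overwriting each other. A sketch (the folder name and the sample output_dict are illustrative):

```python
from pathlib import Path
import pandas as pd

# Illustrative stand-in for the output_dict built in Step 5.
output_dict = {
    "df_predicted.csv": pd.DataFrame({"Customer": ["C1"], "prediction": [0]}),
}

output_dir = Path("model_outputs")  # hypothetical folder name
output_dir.mkdir(exist_ok=True)

for file_name, dataframe in output_dict.items():
    dataframe.to_csv(output_dir / file_name, index=False)
```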


Also, your model is now ready to make predictions on new data through the Predict Classification function.

You can use the Generate Insights AI function to create text insights for df_db.csv and df_pdp.csv.

Have a look at Train Classification Outputs to better understand the outputs.