Predict Classification

v.1.0.0

Overview

Once you have trained a classification model, the predict_classification function lets you use it to make predictions on new data. This guide walks you through the setup and the steps required to run a prediction with your trained model.

Step 0: Get ready

To make predictions, you must first have trained a classification model. If you haven't, follow the steps in Train Classification.

Step 1: Initialize the Client and set up your workspace

If you haven't already, import the necessary modules and initialize your client with the appropriate credentials.

Set the same menu path you used when you trained your classification model, and disable caching for real-time data processing.

import os
import time
from io import StringIO
import pandas as pd
from shimoku import Client

s = Client(
    access_token=os.getenv("SHIMOKU_TOKEN"),
    universe_id=os.getenv("UNIVERSE_ID"),
)

s.set_workspace(uuid=os.getenv("WORKSPACE_ID"))

menu_path_name = "insurance_model"
s.set_menu_path(name=menu_path_name)
s.disable_caching()

Note: you must have your SHIMOKU_TOKEN, UNIVERSE_ID and WORKSPACE_ID saved as environment variables.
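Because os.getenv returns None for a variable that is not set, a missing credential would otherwise be passed to the Client silently and fail later with a less obvious error. The sketch below checks the three variables named in the note before you create the client. The helper name missing_env_vars is hypothetical and not part of the Shimoku SDK.

```python
import os
from typing import List, Mapping, Optional, Sequence

# The three variables the client setup above reads with os.getenv.
REQUIRED_ENV_VARS = ("SHIMOKU_TOKEN", "UNIVERSE_ID", "WORKSPACE_ID")


def missing_env_vars(
    names: Sequence[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, calling `missing = missing_env_vars()` before `Client(...)` and raising `SystemExit` when the list is non-empty stops the script with a message that names exactly which variables are missing.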

Step 2: Prepare Your Prediction Data

Load the data you wish to predict on and create an input file for the prediction process.

Here's a sample dataset you can use with the model you created in Train Classification:

input_file = pd.read_csv('./sample_predict_dataset.csv').to_csv(index=False).encode()

s.ai.create_input_files(
    input_files={'predict_insurance': input_file},
    force_overwrite=True
)
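The input file passed to create_input_files is simply the CSV text of a DataFrame encoded as bytes. The sketch below reproduces that conversion on a small in-memory table and parses it back to show what the bytes contain. The columns age and region are hypothetical placeholders; your real prediction data must have the columns the model was trained on. Note the `index=False` argument: without it, pandas writes the row index as an extra unnamed first column, which the model would likely treat as an unexpected feature.

```python
from io import StringIO

import pandas as pd

# Hypothetical rows standing in for sample_predict_dataset.csv.
df = pd.DataFrame({"age": [34, 52], "region": ["north", "south"]})

# Same conversion as above: DataFrame -> CSV text without the index -> UTF-8 bytes.
input_file = df.to_csv(index=False).encode()

# Decoding the bytes and parsing them again gives back the original table,
# confirming that no index column was written.
round_trip = pd.read_csv(StringIO(input_file.decode("utf-8")))
print(input_file.decode("utf-8"))
```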

Step 3: Execute the Prediction Function

Run predict_classification with your trained model on the new data. The call returns a run ID, which you will use in the next step to retrieve the outputs.

run_id = s.ai.generic_execute(
    ai_function='predict_classification',
    model_name='churn_insurance',
    table="predict_insurance"
)
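The table argument must match the key used in create_input_files ('predict_insurance' in both places here), and model_name is presumably the name you gave the model in Train Classification. One way to keep these from drifting apart is to define them once and reuse them for both calls, as in this sketch. The function upload_and_predict and the two constants are hypothetical helpers; the wrapped calls and their arguments are exactly those shown in Steps 2 and 3.

```python
# Defined once so the upload key and the table argument cannot diverge.
PREDICT_TABLE = "predict_insurance"  # key used in create_input_files
MODEL_NAME = "churn_insurance"       # model trained in Train Classification


def upload_and_predict(client, csv_bytes: bytes,
                       model_name: str = MODEL_NAME,
                       table: str = PREDICT_TABLE):
    """Upload the input file, start predict_classification on it, and return the run ID."""
    client.ai.create_input_files(
        input_files={table: csv_bytes},
        force_overwrite=True,
    )
    return client.ai.generic_execute(
        ai_function="predict_classification",
        model_name=model_name,
        table=table,
    )
```

With the client from Step 1, `run_id = upload_and_predict(s, input_file)` performs Steps 2 and 3 in one call.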

Step 4: Monitor and Retrieve Predictions

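Wait for the prediction process to complete and its outputs to become available. With the values below, the loop checks every 60 seconds for up to 20 attempts, that is, for about 20 minutes in total.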

attempts = 20  # Maximum number of checks
wait = 60      # Seconds to wait between checks

results = None  # Stays None if no output arrives
for _ in range(attempts):
    try:
        results = s.ai.get_output_file_objects(run_id=run_id)
        if results:
            print("Successfully obtained the output.")
            break  # Exit the loop if results are obtained
    except Exception:
        pass  # Outputs not ready yet (or a transient error); retry
    time.sleep(wait)  # Wait before retrying
else:
    # Runs only if the loop finished without a break
    print("Failed to obtain the output after the maximum number of attempts.")
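If you poll in several places, the same logic can be packaged as a reusable function. This is a minimal sketch, not part of the Shimoku SDK: poll_for_output is a hypothetical name, it takes any zero-argument fetch function, skips the pointless sleep after the final attempt, reports why each failed attempt failed, and returns None when nothing arrives. The sleep parameter exists only so the function can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Optional


def poll_for_output(
    fetch: Callable[[], Any],
    attempts: int = 20,
    wait: float = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call `fetch` until it returns a truthy value or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            if result:
                return result
        except Exception as exc:  # keep retrying, but report the reason
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
        if attempt < attempts:  # no need to wait after the last attempt
            sleep(wait)
    return None
```

With the client from Step 1, the call would be `results = poll_for_output(lambda: s.ai.get_output_file_objects(run_id=run_id))`, followed by a check that `results` is not None before moving on to Step 5.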

Step 5: Access the Prediction Results

Once the execution is complete, load each output file with your predictions into a pandas DataFrame.

output_dict = dict()
for file_name, bytes_obj in results.items():
    output_dict[file_name] = pd.read_csv(StringIO(bytes_obj[0].decode('utf-8')))
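The loop above implies that each value in results is a sequence whose first element holds the raw bytes of a CSV file. The sketch below performs the same conversion as a dict comprehension and runs it on a hypothetical payload of that shape, so you can see what goes in and what comes out. The columns id and prediction are invented for illustration; the real outputs will have different columns.

```python
from io import StringIO

import pandas as pd


def decode_outputs(results: dict) -> dict:
    """Turn {file_name: [csv_bytes, ...]} into {file_name: DataFrame}.

    Assumes, as the loop above does, that the first element of each value
    is the file's UTF-8 encoded CSV content.
    """
    return {
        file_name: pd.read_csv(StringIO(objs[0].decode("utf-8")))
        for file_name, objs in results.items()
    }


# Hypothetical payload shaped like the one the loop above expects.
sample = {"df_predicted.csv": [b"id,prediction\n1,0\n2,1\n"]}
frames = decode_outputs(sample)
print(frames["df_predicted.csv"])
```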

The dictionary output_dict will have two items: each key is the name of an output file and each value is a pandas DataFrame. The following outputs will be available:

  • df_predicted.csv: DataFrame containing the predictions for the input data.

  • df_db.csv: DataFrame containing the drivers and barriers for each prediction.
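Before working with the results, you may want to confirm that both outputs listed above are present. Otherwise, a missing file would only surface later as a KeyError when you access output_dict by name. A minimal check follows; missing_outputs is a hypothetical helper, and the expected names are the two files listed above.

```python
from typing import List, Mapping

# Output files listed above.
EXPECTED_OUTPUTS = ("df_predicted.csv", "df_db.csv")


def missing_outputs(output_dict: Mapping[str, object]) -> List[str]:
    """Return the expected output files that are absent from output_dict."""
    return [name for name in EXPECTED_OUTPUTS if name not in output_dict]
```

For example, `if missing_outputs(output_dict): print("Missing:", missing_outputs(output_dict))` flags an incomplete run right away.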

Have a look here to better understand the outputs.

Finally, if you want to save these outputs on your local machine, run the following:

for file_name, dataframe in output_dict.items(): 
    dataframe.to_csv(file_name, index=False)
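The loop above writes the files into the current working directory. If you prefer to keep them together, the sketch below writes them into a folder instead, creating it if necessary. The function save_outputs and the default folder name predictions are arbitrary choices for illustration, not something the platform requires.

```python
from pathlib import Path
from typing import Dict, List

import pandas as pd


def save_outputs(output_dict: Dict[str, pd.DataFrame],
                 out_dir: str = "predictions") -> List[Path]:
    """Write each DataFrame to out_dir/<file_name> and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    written = []
    for file_name, dataframe in output_dict.items():
        path = target / file_name
        dataframe.to_csv(path, index=False)  # same format as the loop above
        written.append(path)
    return written
```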
