IO

File IO

Shimoku users can store and retrieve files as part of our service. Every file is assigned to a workspace and a menu path.
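
For instance, the storage context is selected on the client before any file operation; a minimal sketch (assuming the client has already been initialized as s, and that the workspace name shown is only a placeholder):

s.set_workspace(name='My workspace')  # assumption: workspace selected by name; replace with your own
s.set_menu_path('data-storage')       # files posted from now on belong to this menu path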

The provided methods are (using s as the Shimoku client):

# To delete an existing file
s.io.delete_file(file_name: str)

# To post a file represented in bytes
s.io.post_object(file_name: str, obj: bytes, overwrite: bool = True)

# To get a file; it will be returned as bytes
s.io.get_object(file_name: str) -> bytes

# To post a dataframe as a file; it will be converted to CSV and encoded in 'utf-8'
s.io.post_dataframe(file_name: str, df: pd.DataFrame, overwrite: bool = True)

# To get a file that will be interpreted as a dataframe, decoded using 'utf-8'
s.io.get_dataframe(file_name: str) -> pd.DataFrame

# To post a dataframe into multiple batched files
s.io.post_batched_dataframe(file_name: str, df: pd.DataFrame, batch_size: int = 10000, overwrite: bool = True)

# To get a dataframe that was stored as multiple batched files, joining them back together
s.io.get_batched_dataframe(file_name: str) -> pd.DataFrame

# To delete all the files that make up a batched dataframe
s.io.delete_batched_dataframe(file_name: str)

# To post an AI model as a file; it will be serialized using pickle
s.io.post_ai_model(model_name: str, model: Callable)

# To get a file that will be deserialized using pickle, normally representing an AI model
s.io.get_ai_model(model_name: str) -> Any

# To get all the files from a menu path
s.menu_paths.get_menu_path_files(
    uuid: Optional[str] = None, name: Optional[str] = None
) -> List[Dict]

# To delete all the files from a menu path
s.menu_paths.delete_all_menu_path_files(
    uuid: Optional[str] = None, name: Optional[str] = None
)
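
For example, to inspect and then clean up everything stored under a menu path (a sketch using the menu path name 'test' from the examples below):

files = s.menu_paths.get_menu_path_files(name='test')
for file_metadata in files:
    print(file_metadata)  # each entry is a dict with the file's metadata

s.menu_paths.delete_all_menu_path_files(name='test')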

Example #1: You can store raw binary or string objects and retrieve them (they can be ML models or any other binary object):

file_name = 'helloworld'
s.set_menu_path('test')
object_data = b''  # any bytes object can be stored here

s.io.post_object(file_name, object_data)
retrieved_object: bytes = s.io.get_object(file_name=file_name)
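
Once the object is no longer needed, it can be removed with delete_file (a short sketch reusing the same file_name):

s.io.delete_file(file_name=file_name)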

Example #2: You can also store pandas dataframes (of any size) and retrieve them easily:

import pandas as pd

file_name: str = 'df-test'
d = {'a': [1, 2, 3], 'b': [1, 4, 9]}
s.io.post_dataframe(file_name, df=pd.DataFrame(d))
df: pd.DataFrame = s.io.get_dataframe(file_name=file_name)

Example #3: If the dataframe is very big, you can use the batched version, which will split it into batches and store them in different files, appending the suffix '_batch_{n}' to each file name:

df = pd.read_csv('bigdata.csv').reset_index(drop=True)
s.io.post_batched_dataframe(file_name='test-big-df', df=df)
df: pd.DataFrame = s.io.get_batched_dataframe(file_name='test-big-df')
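
When the batched files are no longer needed, all of them can be removed at once (a sketch reusing the same file name):

s.io.delete_batched_dataframe(file_name='test-big-df')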

IO methods for Machine Learning models:

from sklearn import svm
from sklearn import datasets

clf = svm.SVC()
X, y = datasets.load_iris(return_X_y=True)
clf.fit(X, y)

s.io.post_ai_model(model_name='model-test', model=clf)
model = s.io.get_ai_model(model_name='model-test')
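
The retrieved object is the deserialized estimator, so it can be used directly; for instance (a sketch reusing the iris data loaded above):

predictions = model.predict(X[:5])  # predict with the restored classifier
print(predictions)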
