
Create step with custom Docker config

Warning

This part is completely optional for using the platform. Custom dockerfiles are suggested only for very specific use cases that require advanced configuration of the environment (more than just installing pip or system dependencies).

Docker is a platform that allows you to create, deploy, and run applications in containers. Containers are a lightweight alternative to virtual machines that provide a way to package and isolate your application's dependencies and configuration into a single unit. With Docker, you can easily move your application between environments, from development to production, without worrying about dependencies or compatibility issues.

In the context of MLOps, Docker is particularly useful because it allows you to create custom containers for running your machine learning code. This is important because machine learning models often have complex dependencies that can be difficult to manage. By using Docker, you can create a container with all the necessary dependencies pre-installed, ensuring that your code runs consistently across different configurations.

In our MLOps platform, you can provide a custom dockerfile to configure your step's container (which will contain the step execution). This gives you complete control over the container, allowing you to install any necessary libraries or dependencies and ensure that your code runs consistently and reproducibly. By using Docker in this way, you can create more complex steps that require specific configuration, and ensure that your machine learning models always run in the correct container configuration for that step.

Summary

  1. Create dockerfile
  2. Adapt step code
  3. Create step
Function name | Method | Return type | Description
Input | Input(name, data_type="string", description="", is_required=False, default_value=None) | Input SDK Object | Create an Input object to pass to the create_step() function as a step input.
Output | Output(name, data_type="string", description="") | Output SDK Object | Create an Output object to pass to the create_step() function as a step output.
create_step | create_step(step_name, description=None, container_config=None, inputs=None, outputs=None) | list of dict[str, str] | Create pipeline steps from source code located in a remote repository.
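
For reference, here is a minimal, hedged sketch of how these SDK objects might be constructed. The import path and the data_type values are hypothetical and must be adapted to your SDK installation:

# Minimal sketch of the Input/Output objects described above.
# The import path and the data_type values below are hypothetical.
from your_platform_sdk import Input, Output

number_input = Input(
    name="number",
    data_type="number",            # hypothetical data type
    description="A numeric parameter passed to the step",
    is_required=True,
)
file_input = Input(name="file", data_type="file")              # hypothetical data type
result_output = Output(name="result_var", data_type="json")    # hypothetical data type

These objects are then passed to create_step() through its inputs and outputs parameters, as shown in the "Create a step" section below.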

Create dockerfile

Explanation of dockerfiles in the platform

To configure the step environment, we use a dockerfile in which we specify the steps that lead to the execution of the code. The steps in the dockerfile are:

  • Use the base image of Python
  • Define the working folder (the /app folder is mandatory here)
  • Install the system dependencies and pip if needed
  • Copy the repo files into the docker container
  • Run the script

Note that you can of course add steps to this minimal process to configure the container according to your needs.

Example minimal dockerfile

FROM python:3.8-slim

# The workdir must always be configured at /app
ENV ROOT_DIR /app
WORKDIR ${ROOT_DIR}

# Copy the context (repo root here) inside the docker container
COPY . .

# Launch your Python script
ENTRYPOINT [ "python", "path/pythonCode.py" ]

Example dockerfile with system dependencies and pip dependencies

FROM python:3.9-slim

# The workdir must always be configured at /app
ENV ROOT_DIR /app
WORKDIR ${ROOT_DIR}

# Install system dependencies
RUN apt-get update && apt-get install -y <your-system-dependencies>

# Install Python dependencies
RUN pip install <your-pip-dependencies>

# Copy the context (repo root here) inside the docker container
COPY . .

# Launch your Python script
ENTRYPOINT [ "python", "src/otherCode.py" ]

You can find more information about dockerfiles here: https://docs.docker.com/engine/reference/builder/

Adapt step code

Explanation of step code changes

In the code of a step started from a dockerfile, you need to:

  • Call the function you want to run directly in the script (see the examples below).
  • If the step has inputs, you will receive them through the "sys.argv" object; you don't need to declare anything in the dockerfile (see the second example below).
  • If the step has outputs, write the output files in the "/app" folder with the prefix "output-".

Example without inputs and outputs

import json
import sys
import os


def my_function():
    print("Hello world!")



if __name__ == "__main__":

    my_function()

Example with inputs and outputs

The script is called with two input arguments that are serialized JSON:

  • The first contains the input values for the non-file inputs (the name of the input as key and its value as value).
  • The second contains the path to the file inputs (the name of the input as key and the object containing the path as value).
import json
import sys
import os


def my_function(number, file):

    with open(file["path"]) as f:
        contents = f.readlines()


    # Your code here


    return ["Return step array", "output elem", "other elem"]



if __name__ == "__main__":

    # Pre-processing: Inputs
    parameters_inputs = json.loads(sys.argv[1])
    files_inputs = json.loads(sys.argv[2])

    # Call function
    result = my_function(**parameters_inputs, **files_inputs)

    # Post-processing: Outputs
    output_dir = '/app'
    with open(os.path.join(output_dir, 'output-result_var'), 'w+') as f:
        json.dump(result, f)

    with open(os.path.join(output_dir, 'output-result_file'), 'w') as f:
        f.write('\n'.join(f'* {item}' for item in result))
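
To make the argument handling above concrete, here is a hedged sketch with purely illustrative values and file paths, showing what the two JSON arguments could look like and how the pre-processing block resolves them:

import json

# Illustrative only: the platform could invoke the script roughly like
#   python src/otherCode.py '{"number": 3}' '{"file": {"path": "/app/input-file.txt"}}'
# so the pre-processing in __main__ would resolve to:
parameters_inputs = json.loads('{"number": 3}')                         # non-file inputs
files_inputs = json.loads('{"file": {"path": "/app/input-file.txt"}}')  # file inputs

# my_function(**parameters_inputs, **files_inputs) is then equivalent to:
# my_function(number=3, file={"path": "/app/input-file.txt"})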

Note

If you want to send or return a file through an endpoint, only one file can be sent or returned per call. Other inputs and outputs must be mapped using something other than files.

Create a step

Before creating a step, you can add inputs and outputs just as for a classic step (see the step creation documentation for more information). To create a step with your dockerfile, call the create_step() function with just the name of the step and the path to the dockerfile inside your repository. You don't have to specify the exact path to the code or the function name, because they are taken from the dockerfile.

Once the step is created, you can create a pipeline and deployment as usual.

sdk.create_step(
    name="*your-custom-step-name*",
    [description="*text with limit*"],
    [container_config={
        [repository_url="*your-git-url*"],
        [repository_branch="*your-git-branch* or *your-git-tag*"],
        [repository_deploy_key="*your-private-key*"],
        [included_folders=["*your-folder-path*", ...]],
        dockerfile_path="*your-dockerfile-path*",
    }],
    [inputs=[Input(...)]],
    [outputs=[Output(...)]],
)
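
As a concrete illustration, here is a hedged sketch of such a call for the example script above. The repository URL, branch, dockerfile path, step name and data types are placeholders to adapt to your own project, and sdk is assumed to be an already instantiated SDK client:

steps = sdk.create_step(
    name="my-custom-docker-step",                       # placeholder name
    description="Step built from a custom dockerfile",
    container_config={
        "repository_url": "git@github.com:your-org/your-repo.git",  # placeholder
        "repository_branch": "main",                                 # placeholder
        "dockerfile_path": "docker/Dockerfile",                      # path inside the repo
    },
    inputs=[
        Input(name="number", data_type="number"),   # hypothetical data types
        Input(name="file", data_type="file"),
    ],
    outputs=[
        Output(name="result_var", data_type="json"),
        Output(name="result_file", data_type="file"),
    ],
)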

Parameters

  • name (str) – Name of the step. By default, it's the function name. The name must be unique within an environment and must not contain special characters ( - _ & / ? …)

  • description (str, optional) – Description of the step. It is not used by the code; it is only informational for the user.

  • container_config (dict, optional) – Python dict where each key can override, for this step, the default parameter values defined at the project level.

  • repository_url (str, optional) – Remote repository URL.

  • repository_branch (str, optional) – Branch name for Git repository. Defaults to None.
  • repository_deploy_key (str, optional) – Private SSH key related to the repository.
  • included_folders (list[str]) – List of folders and files in the repository required for the step execution. Defaults to the project-level configuration; can be set to null.
  • dockerfile_path (str, optional) – Path to a dockerfile used to build a custom container configuration for the step (see the sections above for more detail).

  • inputs (list<Input>) – List of step inputs.

  • outputs (list<Output>) – List of step outputs.

Returns

list of dict[str, str] – List of steps represented as dicts, each containing either only the key “name” if the creation succeeded, or the keys “name” and “error” if the creation failed.
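
For instance, here is a small, hedged sketch of how the returned list could be checked for failures; steps is assumed to hold the return value of the create_step() call shown earlier, and the field names follow the description above:

# `steps` is assumed to be the list returned by sdk.create_step(...) above
for step in steps:
    if "error" in step:
        # Creation failed for this step
        print(f"Step '{step['name']}' failed: {step['error']}")
    else:
        # Creation succeeded
        print(f"Step '{step['name']}' was created")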