Create a step
A step is an atomic component defined by its input and output parameters and by the processing it applies. Steps are the building blocks of Pipelines. In practice, a step is a Python function with inputs and outputs; steps are assembled with each other to create a complete ML pipeline. The code used by the step (Python is the only supported language for the moment) is stored in a Git repository.
An input of a step is an object you can use inside the code.
An output of a step is defined from the results of the step function.
You can connect the inputs and outputs of one step to another step to compose a complete ML pipeline as a directed acyclic graph (DAG).
Each step runs as a dedicated container executed on Kubernetes.
Steps are stored in a specific environment, and only people with access to this environment can read and write them.

Summary
Function name | Method | Return type | Description
---|---|---|---
Input | Input(name, data_type="string", description="", is_required=False, default_value=None) | Input SDK object | Create an Input object to pass to the create_step() function as a step input.
Output | Output(name, data_type="string", description="") | Output SDK object | Create an Output object to pass to the create_step() function as a step output.
create_step | create_step(step_name, function_path, function_name, repository_branch=None, description=None, container_config=None, inputs=None, outputs=None) | list of dict[str, str] | Create pipeline steps from source code located on a remote repository.
Prepare your code in a Git repository
Prerequisites: Before creating your first step, make sure you have already done the following:
Setup Project & Environment
Link a Git repository to the project
❓ Why do you need to put your code in a Git repository? This simplifies access to the source code for the Craft AI platform: the platform can fetch your code directly from the repository, so you do not need to send it each time you change it; you just have to push it to your GitHub / GitLab repository.
Currently, you can create a step via the Python SDK only, not through the graphical interface. However, once it is created, you can see the step in the platform UI.
If you have not done so already, put the code of the step in a GitHub / GitLab repository linked to the platform. The file containing the entry function of your step can be anywhere in your Git repository.
Example file tree in the repository:
| requirements.txt
| src
|   | my_entry_function_step.py
|   ...
Example my_entry_function_step.py:
import numpy as np
# ... and other imports

def entryStep(dataX_input, dataY_input):
    # Some machine learning code producing result_output
    return result_output
Define step inputs and outputs
A step may need to receive some information or return some results (just like a function). To do that, we use Input and Output objects. These objects define the properties of the inputs and outputs expected by the step, and they must be passed as parameters when the step is created. Each input is defined as an Input object and each output as an Output object, through classes available in the SDK.
Input object definition
from craft_ai_sdk.io import Input

Input(
    name="*your_input_name*",
    data_type="*your_io_data_type*",
    description="",
    is_required=True,
    default_value="*default_value*",
)
Parameters
- name – Name used to identify the input later.
- data_type – One of the following possible types:
  - file: reference to binary data, equivalent to a file's content. If the input/output is not available, an empty stream.
  - json: JSON-serializable Python object. The following sub-types are provided for more precise type checking, but they are all handled as JSON: string, number, array (of JSON), boolean. If the input/output is not available, the value is None in Python.
- default_value (optional) – If the parameter is empty, this value is used by default. If a deployment receives an empty parameter and a default value was already set for that input at deployment level, the deployment's default value is kept (see the example below).
- is_required (optional, True by default) – Raises an error if the input is empty.
- description (optional) – Describes what is expected for this input. It is not interpreted by the platform; it acts as documentation.
Return
No return
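Example: an optional input with a default value (the input name and values below are purely illustrative):
from craft_ai_sdk.io import Input

# Optional input: if no value is provided at execution time, "fr" is used.
language_input = Input(
    name="language",
    data_type="string",
    description="Language code used by the step, e.g. 'fr' or 'en'",
    is_required=False,
    default_value="fr",
)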
Output object definition
from craft_ai_sdk.io import Output
Output(
    name="*your_output_name*",
    data_type="*your_io_data_type*",
    description="",
)
Parameters
- name – Name used to identify the output later.
- data_type – One of the following possible types:
  - file: reference to binary data, equivalent to a file's content. If the input/output is not available, an empty stream.
  - json: JSON-serializable Python object. The following sub-types are provided for more precise type checking, but they are all handled as JSON: string, number, array (of JSON), boolean. If the input/output is not available, the value is None in Python.
- description (optional) – Describes what is expected for this output. It is not interpreted by the platform; it acts as documentation.
Return
No return
Note
You can use craft_ai_sdk.INPUT_OUTPUT_TYPES to get all possible types for Input and Output objects.
List of all possible types:
- ARRAY = "array"
- BOOLEAN = "boolean"
- FILE = "file"
- JSON = "json"
- NUMBER = "number"
- STRING = "string"
Example:
from craft_ai_sdk.io import Input, INPUT_OUTPUT_TYPES

Input(
    name="inputName",
    data_type=INPUT_OUTPUT_TYPES.JSON,
)
Example for input and output:
Input(
    name="inputName",
    data_type="string",
    description="A parameter for step input",
    is_required=True,
    default_value="default_content_here",
)
Output(
    name="outputName",
    data_type="string",
    description="A parameter for step output",
)
Create a step
Function definition
Create pipeline steps from source code located on a remote repository.
sdk.create_step(
    function_path="src/my_reusable_function.py",
    function_name="my_function",
    inputs=[Input(...)],
    outputs=[Output(...)],
    [name="*your-custom-step-name*"],  # by default it is the function name
    [description="*text with limit*"],
    [repository_branch="*your-git-branch* or *your-git-tag*"],
    [container_config={
        ["language": "python:3.8-slim"],
        ["repository_url": "*your-git-url*"],
        ["repository_deploy_key": "*your-private-key*"],
        ["requirements_path": "*your-path-to-requirements*"],
        ["included_folders": ["*your-list-of-paths-to-sources*"]],
        ["system_dependencies": ["package_1", "package_2"]],
        ["dockerfile_path": "*your-dockerfile-path*"],
    }],
)
Parameters
- function_path (str) – Path to the file that contains the entry function of the step.
- function_name (str) – Name of the step's entry function.
- inputs (list<Input>) – List of step inputs.
- outputs (list<Output>) – List of step outputs.
- name (str) – Step name. By default, it is the function name. The name must be unique inside an environment and must not contain special characters (- _ & / ? …).
- description (str, optional) – Description of the step. It is not used by the code; it is only for users.
- repository_branch (str, optional) – Branch name of the Git repository. Defaults to None.
- container_config (dict, optional) – Python dict where each key can override, for this step, the default parameter values defined at project level (see the sketch after this list):
  - language (str, optional) – Programming language used inside the step. Defaults to "python:3.8-slim". Versions "python:3.9-slim" and "python:3.10-slim" are also available.
  - repository_url (str, optional) – Remote repository URL.
  - repository_deploy_key (str, optional) – Private SSH key associated with the repository.
  - requirements_path (str, optional) – Path to the requirements file for Python dependencies.
  - included_folders (list, optional) – List of folders that need to be accessible from the step code.
  - system_dependencies (list, optional) – List of APT Linux packages to install.
  - dockerfile_path (str, optional) – Path to a Dockerfile for a custom step configuration (see the section below for more details).
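For example, here is a sketch of a step creation that overrides some project-level settings through container_config, reusing the entry function from the example above (the paths, package name, and values are illustrative assumptions):
sdk.create_step(
    function_path="src/my_entry_function_step.py",
    function_name="entryStep",
    inputs=[
        Input(name="dataX_input", data_type="file"),
        Input(name="dataY_input", data_type="file"),
    ],
    outputs=[Output(name="result_output", data_type="json")],
    container_config={
        # Use a newer Python image than the project default
        "language": "python:3.10-slim",
        # APT package required by the step (illustrative)
        "system_dependencies": ["libgomp1"],
        # Requirements file dedicated to this step
        "requirements_path": "requirements.txt",
    },
)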
Note
The repository_branch parameter, as well as the container_config elements (except dockerfile_path), can take one of the STEP_PARAMETER object's values in addition to their own.
STEP_PARAMETER allows you to specify, at the step level, whether to use the values defined at project level (the default behavior) or to define a null value:
- STEP_PARAMETER.FALLBACK_PROJECT: use the value defined in the project parameters (default behavior if the field is not defined).
- STEP_PARAMETER.NULL: set the field to a null value and do not use the value defined in the project.
Example with a step whose code does not need a requirements.txt and does not use the one defined in the project settings:
from craft_ai_sdk import STEP_PARAMETER

# Code to initialize the SDK here ...

sdk.create_step(
    function_path="src/helloWorld.py",
    function_name="helloWorld",
    step_name="stepName",
    container_config={
        "requirements_path": STEP_PARAMETER.NULL,
    },
)
Warning
The size of the code embedded from your repository must not exceed 1 MB.
You can select which parts of your repository to import using the included_folders parameter.
If the data you want to import is larger than 1 MB, you can use the data store to store it and then import it into your step.
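For instance, here is a minimal sketch that restricts the embedded code to a single folder (the folder name is illustrative):
sdk.create_step(
    function_path="src/my_entry_function_step.py",
    function_name="entryStep",
    container_config={
        # Only the "src" folder is embedded in the step,
        # which helps keep the imported code under the 1 MB limit.
        "included_folders": ["src"],
    },
)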
Returns
List of steps represented as dicts (each with the key "name" if the creation succeeded, or the keys "name" and "error" if the creation failed). The return type is a list of dict[str, str].
ℹ️ Once the step is created, we need to create the pipeline that wraps it. Creating a pipeline is mandatory in order to use the step later. This technical choice was made in anticipation of future multistep functionality, which is why a pipeline is required to contain the steps.
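As a minimal sketch, assuming the SDK exposes a create_pipeline function taking the pipeline name and the step name (refer to the pipeline documentation for the exact signature):
# Hypothetical call, shown only to illustrate the step/pipeline relationship.
sdk.create_pipeline(
    pipeline_name="my-pipeline",
    step_name="my-step-name",
)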
Example: Create step from scratch
Function usage
from craft_ai_sdk import Input, Output

input1 = Input(
    name="input1",
    data_type="string",
    description="A parameter named input1, its type is a string",
    is_required=True,
)
input2 = Input(
    name="input2",
    data_type="file",
    description="A parameter named input2, its type is a file",
)
input3 = Input(
    name="input3",
    data_type="number",
)
prediction_output = Output(
    name="prediction",
    data_type="file",
)
step = sdk.create_step(
    function_path="src/my_reusable_function.py",
    function_name="my_function",
    inputs=[input1, input2, input3],
    outputs=[prediction_output],
    description="Apply the model to the sea",
    ## ...
)
Returned object
{
    "name": "apply_model",
    "inputs_list": [
        {
            "name": "input1",
            "data_type": "string",
            "is_required": True,
            "description": ...,
        },
        ...
    ],
    "outputs_list": [
        ...
    ],
    ...
}
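Since create_step returns a list of such dicts (the object above is one element of that list), you can check whether a creation failed, for example:
# "step" is the list returned by sdk.create_step(...) above;
# each element has a "name" key, plus an "error" key if the creation failed.
for created_step in step:
    if "error" in created_step:
        print(f"Step {created_step['name']} failed: {created_step['error']}")
    else:
        print(f"Step {created_step['name']} created")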
Note
If you need to create a step with a more specific configuration, you can do so with a custom Dockerfile; more details about this here.