3.3 Part 3: Deploy with input and output

3.3.1 Introduction

In Part 2, we built and deployed our first ML pipeline to train a model and compute predictions on a test subset of the iris dataset.

What if we want to use our pipeline to perform predictions on new data? Currently, we cannot pass new data to our model. ⇒ We need to add an Input to our pipeline.

Moreover, in Part 2, our deployment only prints predictions in the logs. What if we want to provide these predictions to a final user? ⇒ We need to add an Output to our pipeline.

This part will show you how to do this with the Craft AI platform:

  • We will first update the code of the TrainPredictIris() function so that it can receive data and return predictions.

  • Then, we will see how to create a step, a pipeline and an endpoint that can handle input data and return the corresponding predictions as an output.

By the end of this part, we will have built an application that allows any user to get the predictions of the iris species on new data with a simple endpoint call.


3.3.2 Updating the application


First, we have to update our code so that it computes predictions on any (correctly prepared) data given as input, instead of on a test set. Our file src/part-3-irisModelIO.py now looks as follows:

import numpy as np
import pandas as pd

from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

def TrainPredictIris(input_data: dict):
    iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)

    indices = np.random.permutation(len(iris_X))

    iris_X_train = iris_X.loc[indices[0:90], :]
    iris_y_train = iris_y.loc[indices[0:90]]
    ## iris_X_test = iris_X.loc[indices[90:], :]
    ## iris_y_test = iris_y.loc[indices[90:]]

    knn = KNeighborsClassifier()
    knn.fit(iris_X_train, iris_y_train)

    input_dataframe = pd.DataFrame.from_dict(input_data, orient="index")
    result = knn.predict(input_dataframe)


    final_result = result.tolist()
    return {"predictions": final_result}

Let’s explain changes compared to the code of Part 2:

  • We add the argument input_data. Here, we choose it to be a dictionary like the one below:

        {
            1: {
                'sepal length (cm)': 6.7,
                'sepal width (cm)': 3.3,
                'petal length (cm)': 5.7,
                'petal width (cm)': 2.1
            },
            2: {
                'sepal length (cm)': 4.5,
                'sepal width (cm)': 2.3,
                'petal length (cm)': 1.3,
                'petal width (cm)': 0.3
            }
        }

    It contains the data on which we want to compute predictions.

  • We remove the test set from our code (we’ll be able to pass it through the input_data argument if we want to) and just keep the train set.

  • At the end, we convert our input_data dictionary into a Pandas dataframe, and we compute predictions with our trained model.

  • As you can see, the function now returns a Python dict with one field called “predictions” that contains the predicted values. The platform only accepts step functions with a single return value of type dict. Each item of this dict becomes an output of the step, and the key associated with each item is the name of this output on the platform.

  • Moreover, you can see that we converted our result from a numpy ndarray to a list. That is because the values of the inputs and outputs are restricted to native Python types such as int, float, bool, string, list and dict with elements of those types; more precisely, anything that is JSON-serializable. Later, the platform might handle more complex input and output types such as numpy arrays or even pandas dataframes.
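The JSON-serializability rule from the last bullet can be checked locally before pushing the code. The following is a minimal sketch using only the standard library; the helper name check_step_outputs is hypothetical and is not part of the Craft AI SDK. A set is used here as a stand-in for a non-serializable value (a numpy ndarray is rejected by json.dumps in the same way).

```python
import json


def check_step_outputs(outputs):
    """Return the names of outputs that cannot be encoded as JSON.

    Hypothetical local helper (not part of the Craft AI SDK): it only
    mimics the "JSON-serializable" rule described above.
    """
    if not isinstance(outputs, dict):
        raise TypeError("a step function must return a dict")
    not_serializable = []
    for name, value in outputs.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            not_serializable.append(name)
    return not_serializable


# A plain list of ints passes; a set (like a numpy ndarray) does not.
good = check_step_outputs({"predictions": [0, 2, 1]})
bad = check_step_outputs({"predictions": {0, 1, 2}})
print(good, bad)
```

Running such a check on the value returned by TrainPredictIris() is one way to catch a forgotten tolist() call early.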


Since we updated our code, we must add and commit our changes with Git and push them to GitHub so that the platform can take them into account!

3.3.3 Step creation with Input and Output

Now, let’s create our step on the platform. Here, since we have an input and an output, our step is the combination of three elements: an input, an output and the Python function above. We will first declare the input and the output. Then, we will use the function sdk.create_step() as in Part 2 to create the whole step.

Declare Input and Output of our new step

To manage inputs and outputs of a step, the platform requires you to declare them using the Input and Output classes from the SDK.

For our Iris application, the inputs and outputs declaration would look like this:

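from craft_ai_sdk.io import Input, Output

## Create input (name and data_type as described below)
prediction_input = Input(name="input_data", data_type="json")

## Create output (name and data_type as described below)
prediction_output = Output(name="predictions", data_type="array")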

Both objects have two main attributes:

  • The name of the Input or Output

    • For the input, it corresponds to the name of an argument of your step’s function. In our case, name="input_data", as in the first line of the function:

      def TrainPredictIris(input_data: dict):

    • For the output, it must be a key in the dictionary returned by your step’s function. In our case, name="predictions", as in the last line of the function:

      return {"predictions": final_result}

  • The data_type describing the type of data it can accept. It can be one of: string, number, boolean, json, array.

    • For the input we want a dictionary as we specified, which corresponds to data_type="json".

    • For the output, we return a list which corresponds to data_type="array".
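To make the correspondence between Python values and data_type names concrete, here is a small sketch. The helper infer_data_type is hypothetical, and the mapping it encodes is an assumption drawn from the list of accepted types above (dict to json and list to array are stated in the text; the other three are the natural reading of the names), not an SDK function.

```python
def infer_data_type(value):
    """Guess a data_type name for a Python value.

    Hypothetical helper: the mapping below is an assumption drawn from
    the list of accepted types above, not an SDK function.
    """
    # bool must be tested before int/float because bool subclasses int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "json"
    raise TypeError(f"unsupported type: {type(value).__name__}")


input_example = {"1": {"sepal length (cm)": 6.7}}
output_example = [2, 0]
print(infer_data_type(input_example))   # json
print(infer_data_type(output_example))  # array
```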

Now, we have everything we need to create, as before, the step and the pipeline corresponding to our new TrainPredictIris() function.

Create step

Now as in Part 2, it is time to create our step on the platform using the sdk.create_step() function, but this time we specify our input and output:

        description="This function creates a classifier model for iris and makes prediction on test data set",
        "included_folders": ["src"],
        "requirements_path": "requirements.txt",

This is exactly like in Part 2, except for two parameters:

  • inputs containing the list of Input objects we declared above (here, prediction_input).

  • outputs containing the list of Output objects we declared above (here, prediction_output).
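Since a declared input name must match a function argument and a declared output name must match a key of the returned dict, it can help to verify this locally before creating the step. The sketch below is one possible way to do so with the standard library; check_io_names and fake_step are hypothetical names introduced for illustration, and fake_step merely stands in for TrainPredictIris().

```python
import inspect


def check_io_names(function, input_names, output_names, sample_kwargs):
    """Check declared input/output names against a step function.

    Hypothetical local check, not an SDK feature: inputs must be argument
    names of the function, outputs must be keys of the returned dict.
    """
    parameters = inspect.signature(function).parameters
    missing_inputs = [n for n in input_names if n not in parameters]
    returned = function(**sample_kwargs)
    missing_outputs = [n for n in output_names if n not in returned]
    return missing_inputs, missing_outputs


def fake_step(input_data: dict):
    # Stand-in for TrainPredictIris: returns one dummy prediction per row.
    return {"predictions": [0 for _ in input_data]}


result = check_io_names(
    fake_step,
    input_names=["input_data"],
    output_names=["predictions"],
    sample_kwargs={"input_data": {"1": {}, "2": {}}},
)
print(result)  # ([], [])
```

Two empty lists mean that every declared name was found; any name listed would point to a mismatch between the declaration and the function.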

When step creation is finished, you obtain an output describing your step (including its inputs and outputs) as below:

>> Step "irisclassifier-with-io" created
    - input_data (json)
    - predictions (json)
>> Steps creation succeeded
>> {'name': 'irisclassifier-with-io',
 'inputs': [{'name': 'input_data', 'data_type': 'json'}],
 'outputs': [{'name': 'predictions', 'data_type': 'json'}]}

Now that our step is created on the platform, we can embed it in a pipeline and deploy it.

3.3.4 Create and deploy your pipeline

Create pipeline

Let’s create our pipeline here with sdk.create_pipeline() as in Part 2:


You quickly obtain this output, which describes the pipeline, its step and its inputs and outputs:

>> Pipeline creation succeeded
>> {'pipeline_name': 'part-3-irisio-pipeline',
 'created_at': '2023-02-02T17:12:33.032Z',
 'steps': ['part-3-irisio-step'],
 'open_inputs': [{'input_name': 'input_data',
   'step_name': 'irisclassifier-with-io',
   'data_type': 'json'}],
 'open_outputs': [{'output_name': 'predictions',
   'step_name': 'irisclassifier-with-io',
   'data_type': 'json'}]}

🎉 You’ve created your first step & pipeline with an input and an output!

Let’s deploy this pipeline.

Create endpoint

To do this, we need to create an endpoint, similarly to what we did in Part 2 with sdk.create_deployment().

The big difference here is that the pipeline triggered by the endpoint expects data as input and sends data as output. Up until now, the endpoint was only a way for an external user of our app to trigger the execution of the associated pipeline; now, the user will also use it to send input data to the pipeline and to retrieve the results.

By default, the endpoint will expect all the inputs of the pipeline to be transmitted via the endpoint (we will see in the next part some more advanced options). However, you have to specify explicitly the outputs you want to be returned by the endpoint to avoid data leakage to the end user.

You might also want to deliver the outputs under a different name than the one you specified in the output of your step. In our case, we want to return our only output, predictions (the predictions of our iris model), to the user of the app, and serve it to the client under the name iris_species, which is more understandable for the end user.
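Conceptually, this selection and renaming behaves like the plain-Python sketch below. It is only an illustration of the idea: apply_output_mapping is a hypothetical function, the mapping is an ordinary dict rather than an SDK object, and debug_info is an invented second output used to show that unmapped outputs are not exposed.

```python
def apply_output_mapping(step_outputs, mapping):
    """Keep only the mapped outputs and rename them for the client.

    Conceptual illustration only: `mapping` is a plain dict from step
    output name to exposed name, not an OutputDestination object.
    """
    return {
        exposed_name: step_outputs[step_name]
        for step_name, exposed_name in mapping.items()
    }


# "debug_info" is a made-up extra output, absent from the mapping.
step_outputs = {"predictions": [0, 2, 1], "debug_info": "not exposed"}
exposed = apply_output_mapping(step_outputs, {"predictions": "iris_species"})
print(exposed)  # {'iris_species': [0, 2, 1]}
```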


On the Craft AI platform, this is done by declaring OutputDestination objects like so:

from craft_ai_sdk.io import OutputDestination

output_mapping = OutputDestination(

Now, we can create our endpoint as follows. Note that sdk.create_deployment() needs a new argument outputs_mapping, which is a list of the OutputDestination objects we need (there should be one mapping for each step output we want to expose):

endpoint = sdk.create_deployment(

🎉 Bingo! You have created an endpoint using one input and one output. Let’s check that it actually accepts input data and returns predictions by calling it.

3.3.5 Call the endpoint with new input data

Prepare input data

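Now, our endpoint needs data as input (formatted as described above ⬆️). Let’s prepare it by picking some rows of the iris dataset. Note that the permutation below is drawn independently of the one used inside the step, so these rows are not guaranteed to be disjoint from the rows used for training: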

## prepare input data for which we want predictions
import numpy as np
import pandas as pd
from sklearn import datasets

indices = np.random.permutation(150)
iris_X, iris_y = datasets.load_iris(return_X_y=True, as_frame=True)
iris_X_test = iris_X.loc[indices[90:120],:]

## convert our test dataframe into a dictionary as required
test_dict_data = iris_X_test.to_dict(orient="index")

Let’s check the data we created:


We get the following output:

>> 124: {'sepal length (cm)': 6.7,
'sepal width (cm)': 3.3,
'petal length (cm)': 5.7,
'petal width (cm)': 2.1
41: {'sepal length (cm)': 4.5

Finally, we need to encapsulate this dictionary in another dictionary whose key is "input_data" (the name of the input of our step, i.e. the name of the argument of our step’s function):

test_data = {
    "input_data": test_dict_data
}
More generally, when your step has several inputs, the dictionary sent to the endpoint should have as many keys as the step function has arguments.

Call our endpoint

Finally, we can test our endpoint with the data we’ve just prepared by calling it almost as in Part 2 and passing our dictionary test_data in the json argument of requests.post():

import requests

endpoint_URL = sdk.base_environment_url + "/endpoints/" + endpoint["name"]
headers = {"Authorization": "EndpointToken " + endpoint["endpoint_token"]}

request = requests.post(endpoint_URL, headers=headers, json=test_data)

Let’s check the HTTP status code of our request:

>> 200

Finally, our output can be obtained like this:


This gives the output we want (with the predictions!):

>> {'iris_species': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]}
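The integers in this response are class indices. As a possible follow-up on the client side, they can be translated into species names using the class order of scikit-learn's iris dataset (setosa, versicolor, virginica). This is a minimal sketch: to_species_names is a hypothetical helper, and it assumes the response body has already been decoded into a Python dict.

```python
# Class order of scikit-learn's iris dataset (datasets.load_iris().target_names).
IRIS_TARGET_NAMES = ["setosa", "versicolor", "virginica"]


def to_species_names(response_body):
    """Translate the integer labels in an endpoint response into names.

    `response_body` is assumed to be the already-decoded JSON dict,
    with the "iris_species" key chosen in the output mapping.
    """
    return [IRIS_TARGET_NAMES[label] for label in response_body["iris_species"]]


sample_body = {"iris_species": [0, 1, 2, 2]}
print(to_species_names(sample_body))
# ['setosa', 'versicolor', 'virginica', 'virginica']
```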

Moreover, you can check the logs of this execution as follows:

pipeline_executions = sdk.list_pipeline_executions(pipeline_name="pipe-irisclassifier-with-io")

logs = sdk.get_pipeline_execution_logs(pipeline_name="pipe-irisclassifier-with-io",

print('\n'.join(log["message"] for log in logs))

You can also find these logs on the UI, by clicking on the Executions tab of your environment, selecting your pipeline and choosing the last execution.

🎉 Congratulations! You have deployed an endpoint to which we can pass new data and get predictions.

Next step: Part 4: Deploy with the Data Store