Skip to content

CraftAI Deployment specification for Craft AI Advanced Assistants

This document defines the deployment requirements for creating advanced assistants in Craft AI chat interface. Deployments used by Assistants must adhere to this specification to ensure they can be properly integrated and function as expected within the Craft AI chat interface.

Overview – Integration Requirements

To connect a CraftAI deployment as an Advanced Assistant, it must:

  1. be exposed as an endpoint – Deployment must use execution_rule="endpoint"
  2. map inputs and outputs correctly – Use the exact endpoint names defined in this specification
  3. follow the input/output contract – Accept and return data in the formats specified below
  4. not have a name starting with craftgpt- – Pipelines and deployments with the prefix craftgpt- are reserved: do not create, update, delete, nor rely on such resources

The Craft AI chat interface will handle:

  • formatting and saving user messages and chat session history,
  • uploading file attachments to the object storage and providing their keys,
  • rendering responses with proper sources and citations,
  • triggering the record usage history internal deployment

TL;DR – Input and output format

To get a quick look at the required input and output configuration format, check the InferenceInputBody and InferenceOutputBody schemas in the Detailed Schema Reference section below, or by looking at shared/pydantic_classes/schemas.py in the codebase.

How the Craft AI chat interface interacts with your deployment

The chat application handles all the complexity of triggering your deployment. Here's the complete flow:

sequenceDiagram
    participant User
    participant CraftAI as Craft AI Chat Interface
    participant ObjectStorage as Object Storage
    participant Deployment as your deployment
    participant UsageHistory as craftgpt-record-usage-history<br/>deployment

    User->>CraftAI: Sends message + optional files

    alt User attached files
        CraftAI->>ObjectStorage: Upload files
        ObjectStorage-->>CraftAI: Return s3_keys
    end

    CraftAI->>CraftAI: Build input payload
    CraftAI->>Deployment: sdk.trigger_endpoint("your-deployment-name", input_body_file)

    Deployment-->>CraftAI: Execution complete with outputs

    alt Success (status=Succeeded AND error=null)
        CraftAI->>User: Display answer + sources
        CraftAI->>UsageHistory: sdk.trigger_endpoint("craftgpt-record-usage-history", metrics)
    else Error (status≠Succeeded OR error≠null)
        CraftAI->>User: Display error message
    end

Step-by-step breakdown:

  1. User sends a message and optionally attaches files.
  2. Craft AI chat interface reconstructs the chat session mainline history and optionally uploads any attached files to the object storage.
  3. Craft AI chat interface triggers your deployment using sdk.trigger_endpoint, passing the chat session data as a JSON file in the input_body_file input. It then polls for completion.
  4. Once execution is complete, Craft AI chat interface extracts the outputs.
  5. If error is null and status is "Succeeded", it displays and saved the answer and sources in the chat UI. It also triggers the craftgpt-record-usage-history internal deployment with the inference metrics for analytics.
  6. Otherwise, it displays a generic error message to the user.

What you need to implement

Your pipeline only needs to

  1. Read the input_body_file file in JSON format.

  2. Process the messages and generate a response.

  3. Return the outputs in the correct format.

The Craft AI chat interface will handle all the rest (file uploads, input formatting, output rendering, usage history recording, etc).

Required configuration for creating an assistant-compatible deployment

There is no constraint on the internal logic of your pipeline – you can use any LLM, tools, or custom code you want. However, the deployment configuration must follow the following rules to be compatible with the chat application.

Deployment Type

Your deployment must be configured as an endpoint to be accessible from the chat application.

sdk.create_deployment(
    ...
    execution_rule="endpoint",
)

Deployment Mode

There is no requirement on the mode of the deployment, but we recommend using low_latency to ensure the best responsiveness in the chat app.

Pipeline and deployment names

Pipelines and deployments with the prefix craftgpt- are reserved: do not create, update, delete, nor rely on such resources

Input provided by the Craft AI chat interface

When a user interacts with the chat application, it automatically constructs an input JSON file containing the chat session history and other relevant information. This file is passed to your deployment in a file input named input_body_file. Your pipeline must read and parse this file to access the input data.

Deployment inputs mapping requirements

The deployment must accept a file input named input_body_file. The chat application will provide the chat session data as a JSON file. The JSON format which will be provided in this file by the Craft AI chat interface, is described in the Input Format section below.

Endpoint input name Pipleine input data type Description
input_body_file FILE JSON file with chat session history and request parameters. The format of this JSON file is described in the Input Format section below. Your pipeline code must read and parse this file to access the input data.

Note

The endpoint may accept additional inputs, but the Craft AI chat interface will only provide this specific input. Your pipeline should be designed to work with this input file and ignore any other inputs.

Input mapping configuration example

sdk.create_deployment(
    ...
    inputs_mapping=[
        InputSource(
            "pipeline_input_name": "input_body_file", # name exposed by your pipeline code
            "endpoint_input_name": "input_body_file"  # name required
        ),
        InputSource(
            "pipeline_input_name": "temperature",
            "constant_value": 0.7,

    ],
)

Content and format of the input_body_file JSON file

The chat application automatically constructs the input JSON file with the following structure. You do not need to create this - it's provided to your deployment when triggered.

Field Type Required Description
messages array Yes chat session history including user messages and file attachments
Message Structure

Each message in the messages array contains:

  • role: One of "user" or "assistant"
  • content: Array of content blocks (text or file references)

Content blocks can be:

  • Text block: {"type": "text", "text": "message content"}
  • S3 file block: {"type": "s3", "s3_key": "path/to/file.pdf"} referencing a key stored in the instance object storage, when the user has uploaded a file in the chat.

For reference, see the InferenceInputBody schema in the Detailed Schema Reference section below.

User file handling

When creating an assistant in the Craft AI chat interface, you can disable or enable file uploads. If you enable file uploads, users will be able to attach files in the chat interface, and your deployment will receive the object storage file blocks in the input JSON. In this case, Your pipeline must be able to:

- Download files from the instance object storage, using the provided `s3_key`
- Process common file types (PDF, TXT, DOCX, images, etc.)
- Handle cases where files may be inaccessible or corrupted

Examples

Content of input_body_file examples

Chat session: Screenshot

Resulting content in input_body_file:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ]
    }
  ]
}

Chat session: Screenshot

When users upload files in the chat (PDFs, images, documents), the Craft AI chat interface uploads them to the object storage and provides the key:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Analyze this document and summarize the key points"
        },
        {
          "type": "s3",
          "s3_key": "craftgpt-ui-files/public/21d97f90-792d-4f6e-ad92-3b2f928069b7"
        }
      ]
    }
  ]
}

Chat session: Screenshot The messages array includes the full conversation history, allowing your pipeline to generate context-aware responses:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the capital of France?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "The capital of France is Paris."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What about its population?"
        }
      ]
    }
  ]
}

Output returned to the Craft AI chat interface

Deployment outputs mapping requirements

The deployment must provide these outputs for proper display in the chat UI:

Endpoint output name Pipeline output data type Required Purpose
answer STRING Yes Main response displayed to the user. Displayed as the assistant message in the chat session.
sources JSON No List of sources used to generate the answer, with their extracts and metadata. Moreover, sources marked as cited, are displayed as clickable tags in the chat UI.
consumed_tokens JSON No Consumed tokens by each model (llm, embeddings, ...) for debugging and analytics. Not displayed in the chat UI but used to store usage metrics when the record_usage_history internal deployment is triggered by the Craft AI chat interface after each response.
specific_info JSON No Specific information about the inference, such as duration and other metrics, for debugging and analytics. Not displayed in the chat UI.
error JSON No Error information (must be None or not provided on success). It is only used for internal logging and is not displayed in the chat UI. If the error field is not null, the chat app will display the error message instead of the answer.

Outputs mapping configuration example

sdk.create_deployment(
    ...
    outputs_mapping=[
        OutputDestination(
            "pipeline_output_name": "answer", # name exposed by your pipeline code
            "endpoint_output_name": "answer"  # name required
        ),
        OutputDestination(
            "pipeline_output_name": "sources",
            "endpoint_output_name": "sources"
        ),
        OutputDestination(
            "pipeline_output_name": "consumed_tokens",
            "endpoint_output_name": "consumed_tokens"
        ),
        OutputDestination(
            "pipeline_output_name": "specific_info",
            "endpoint_output_name": "specific_info"
        ),
    ]
)

Output Requirements

  • answer is mandatory - the chat app expects this field.
  • sources is optional but can be used to enhance the user experience with citations and source transparency.
  • consumed_tokens is optional but highly recommended for internal monitoring.
  • specific_info is optional.
  • If error is non-null, the chat app will display a generic error message to the user.
  • The endpoint may provide additional outputs, but the chat app will only use these specific outputs for rendering the response. This can be useful to provide more context for internal debugging.

Outputs format details

Answer

The main response text that will be displayed to the user in the chat interface.

🎨 Shown as the assistant's message in the chat session thread.

Sources

Along with the main answer, your deployment can return a list of sources that were used to generate the response. Each source can be a web page, a document, or any other reference material. The chat app will render these sources as clickable footnotes in the chat UI, allowing users to understand what information was used to generate the answer.

Each source must include a source_type field to indicate whether it's a "web_source" or a "rag_source" (document source). Depending on the type, the required metadata fields differ:

  • Web source: Includes link, title, extracts, source_type, citation_number and metadata
  • Document source: Includes key, filename, extracts, source_type, citation_number and metadata, where the key reference the location of the file in the environment Data Store.

Data Store folders naming

Data Store folders with the prefix craftgpt- are reserved especially craftgpt-config. Do not use it.

The extracts field is a list of relevant chunks from the source, each with its own content and relevancy_score. The citation_number is used to link sources to specific parts of the answer when citing them directly in the text (cf. cite sources in the answer section below).

Source uniqueness assumption

For proper display, the sources should be unique based on their link for web sources, and key for document sources. In other words, if multiple extracts are associated to a single source, you should group them under the same source entry, and provide multiple extracts in the extracts list of this source. This will ensure that sources are displayed correctly in the chat UI.

sources = [
    {
        "title": "France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France",
        "source_type": "web_source",
        "citation_number": 1,
        "extracts": [
            {
                "content": "Paris is the capital and most populous city of France.",
                "relevancy_score": 0.56,
            },
            {
                "content": "The city has been a major center of finance, diplomacy, commerce, fashion, science, and the arts since the 17th century.",
                "relevancy_score": 0.85,
            },
        ],
        "metadata": {"type": "fiches", "statut": "publié", "tags": ["tag1", "tag2"]},
    },
]

sources = [
    {
        "title": "France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France",
        "source_type": "web_source",
        "citation_number": 1,
        "extracts": [
            {
                "content": "Paris is the capital and most populous city of France.",
                "relevancy_score": 0.56,
            },
        ],
    },
    {
        "title": "France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France",
        "source_type": "web_source",
        "citation_number": 1,
        "extracts": [
            {
                "content": "The city has been a major center of finance, diplomacy, commerce, fashion, science, and the arts since the 17th century.",
                "relevancy_score": 0.85,
            },
        ],
    },
]
In this counter-example, the same source (same link) is duplicated with different extracts. This may lead to incorrect rendering in the chat UI, as the app expects unique sources based on their link or key.

You may include additional metadata fields in the sources for internal debugging, but they will be ignored by the Craft AI chat interface.

🎨 Sources are rendered as clickable footnotes and a side panel in the chat UI. Web sources link to the original webpage, while document sources may allow users to view or download the referenced file. Note that the sources will be sorted according to the maximum relevancy_score of their extracts.

See the Source schema in the Detailed Schema Reference section below for more details on the expected format of sources.

Inference metrics

Your deployment should also return metrics about the inference process, such as latency and token counts. These metrics are not displayed to the user but are used internally by the Craft AI chat interface to record usage history. It is recommended to provide the following metrics:

consumed_tokens = [
    {
        "model":"GPT5-2",
        "type":"text-generation",
        "input-tokens":10000,
        "output-tokens":27000
    },{
        "model":"Voyage",
        "type":"embedding",
        "input-tokens":40000,
        "output-tokens":12000
    },
]
specific_info = {
    "inference_duration_in_s": 1.254
}

For reference, see the InferenceOutputBody schema in the Detailed Schema Reference section below.

Cite sources in the answer

Sources can be directly cited in the answer text, in order to link specific parts of the answer to the sources that support them. To cite a source

  • Label each source in the sources list with a unique citation_number (e.g., 1, 2, 3, etc.).
  • In the answer text, include citations by referencing the corresponding citation_number in square brackets (e.g., [1], [2], or [1, 3] etc.) where you want to indicate that a particular part of the answer is supported by a specific source.

The Craft AI chat interface will automatically renders these citations as clickable links, direclty inside the assistant response.

Examples

Example outputs for different response types

answer = "The capital of France is Paris."
sources = None
error = None
consumed_tokens = {
    "model":"GPT5-2",
    "type":"text-generation",
    "input-tokens":10000,
    "output-tokens":27000
}
specific_info = {
    "inference_duration_in_s": 1.254
}

Displayed as: Screenshot

answer = "The capital of France is Paris."
sources = [
    {
        "title": "France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France",
        "source_type": "web_source",
        "extracts": [
            {
                "content": "Paris is the capital and most populous city of France.",
                "relevancy_score": 0.95,
            },
            {
                "content": "The city has been a major center of finance, diplomacy, commerce, fashion, science, and the arts since the 17th century.",
                "relevancy_score": 0.85,
            },
        ],
    },
    {
        "filename": "Histoire de Paris",
        "type": "pdf",
        "key": "documents/paris_au_xxe_siecle.pdf",
        "source_type": "rag_source",
        "extracts": [
            {
                "content": "Paris is the capital city of France, located in the north-central part of the country.",
                "relevancy_score": 0.9,
            }
        ],
        "metadata": {"type": "pdf", "statut": "published", "tags": ["tag1", "tag2"]},
    },
]

Displayed as: Screenshot When expanding the "steps": Screenshot When clicking on the sources tag, it opens a side panel with the sources details: Screenshot

Note that the display is sources-oriented and not extracts-oriented. Indeed sources are sorted by the maximum relevancy score of their extracts, and extracts are grouped under their respective sources, sorted by their relevancy score.

answer = "The capital of France is Paris [1]. It has been the capital since 987 AD [1, 3]."
sources = [
    {
        "title": "France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France",
        "source_type": "web_source",
        "citation_number": 1,
        "extracts": [
            {
                "content": "Paris is the capital and most populous city of France.",
                "relevancy_score": 0.95,
            },
            {
                "content": "The city has been a major center of finance, diplomacy, commerce, fashion, science, and the arts since the 17th century.",
                "relevancy_score": 0.85,
            },
        ],
    },
    {
        "title": "Relationships between the United States and France - Wikipedia",
        "link": "https://en.wikipedia.org/wiki/France%E2%80%93United_States_relations",
        "source_type": "web_source",
        "citation_number": 2,
        "extracts": [
            {
                "content": "The United States and France have a long history of diplomatic, cultural, and economic relations.",
                "relevancy_score": 0.5,
            }
        ],
    },
    {
        "filename": "Histoire de Paris",
        "type": "pdf",
        "key": "documents/paris_au_xxe_siecle.pdf",
        "source_type": "rag_source",
        "citation_number": 3,
        "extracts": [
            {
                "content": "Paris is the capital city of France, located in the north-central part of the country.",
                "relevancy_score": 0.9,
            }
        ],
    },
]

Displayed as:

Screenshot

Source side panel:

Screenshot

Their is a clear separation between cited sources (1 and 3) which are directly referenced in the answer text, and non-cited sources (2) which are not directly referenced but still relevant to the answer and displayed in the sources panel.

Integration Checklist

Before associating your deployment to a Craft AI assistant, make sure you have completed the following checklist:

  • Deployment uses execution_rule="endpoint"
  • Input mapping includes a file input mapped to input_body_file
  • Output mapping includes a string output mapped to answer
  • Tested with simple text messages
  • Tested with file attachments (if applicable)
  • Sources include proper citation_number and source_type (if using sources)

Reserved prefixes:

  • Deployment does not have the prefix craftgpt-
  • No Data Store folder with the prefix craftgpt- have been added
  • No environment variable with the prefix CRAFTGPT_ have been added

Custom Healthcheck deployment

The public API route <instance_name>.craft.ai/api/health-craftgpt-custom can be used to check the health of your application. The route returns a 200 OK response if all deployments on the studio are healthy. It returns a 500 Internal Server Error otherwise.

In order to add specific health-checks, deploy a custom healthcheck deployment.

This deployment has to follow this schema:

  • Input: No mapping to the endpoint
  • Output: {"status": string, "details": json}, status being the status (i.e. OK, any other status will be considered a failing status) and details being a json containing additionnal information.

After deploying the custom healthcheck, contact the Craft AI so that the deployment is triggered by the public API route.

Detailed Schema Reference

Input format

InferenceInputBody pydantic-model

Bases: BaseModel

Input payload for CraftAI LLM inference endpoint.

This schema defines the structure of the JSON payload provided in the input_body_file when the deployment is triggered by the CraftGPT chat app.

Fields:

Source code in shared/pydantic_classes/schemas.py
class InferenceInputBody(BaseModel):
    """
    Input payload for CraftAI LLM inference endpoint.

    This schema defines the structure of the JSON payload
    provided in the `input_body_file` when the deployment is triggered by the CraftGPT chat app.
    """

    messages: list[CraftAIMessage] = Field(
        description=(
            "Conversation history as a list of messages, alternating between user and "
            "assistant roles. Cf. CraftAIMessage schema for message structure."
        )
    )
messages pydantic-field

Conversation history as a list of messages, alternating between user and assistant roles. Cf. CraftAIMessage schema for message structure.


CraftAIMessage pydantic-model

Bases: BaseModel

Each message in the messages array of the input payload contains a role and an array of content blocks.

Fields:

Source code in shared/pydantic_classes/schemas.py
class CraftAIMessage(BaseModel):
    """
    Each message in the `messages` array of the input payload contains a role and an array
    of content blocks.
    """

    role: Literal["user", "assistant", "system"] = Field(
        description="The role of the message sender"
    )
    content: list[TextBlock | S3Block] = Field(
        description="List of content blocks (text or file references)"
    )
content pydantic-field

List of content blocks (text or file references)

role pydantic-field

The role of the message sender


TextBlock pydantic-model

Bases: BaseModel

A text content block in a message.

Fields:

Source code in shared/pydantic_classes/schemas.py
class TextBlock(BaseModel):
    """A text content block in a message."""

    type: Literal["text"] = Field(description="The block type identifier")
    text: str = Field(description="The actual text content")
text pydantic-field

The actual text content

type pydantic-field

The block type identifier


S3Block pydantic-model

Bases: BaseModel

An S3 file reference block in a message.

Fields:

Source code in shared/pydantic_classes/schemas.py
class S3Block(BaseModel):
    """An S3 file reference block in a message."""

    type: Literal["s3"] = Field(description="The block type identifier")
    s3_key: str = Field(description="The S3 object key where the file is stored")
s3_key pydantic-field

The S3 object key where the file is stored

type pydantic-field

The block type identifier


Output Payload Schema

InferenceOutputBody pydantic-model

Bases: BaseModel

Output payload from CraftAI LLM inference endpoint.

This schema defines the structure of the response returned by the CraftAI deployment after processing an inference request. Each field is an output of the endpoint.

Fields:

Source code in shared/pydantic_classes/schemas.py
class InferenceOutputBody(BaseModel):
    """
    Output payload from CraftAI LLM inference endpoint.

    This schema defines the structure of the response returned by the CraftAI
    deployment after processing an inference request.
    Each field is an output of the endpoint.
    """

    answer: str = Field(description="The generated response text from the LLM. Ouput type: STRING")
    sources: list[WebSource | DocumentSource] = Field(
        default_factory=list,
        description=(
            "List of sources cited in the answer (web pages or documents). Output type: JSON"
        ),
    )
    error: dict[str, Any] | None = Field(
        default=None,
        description=("Error field - should be None for successful responses. Output type: JSON"),
    )
    consumed_tokens: list[ConsumedTokensModel] | None = Field(
        default=None, description="Optional token consumption details. Output type: JSON"
    )
    specific_info: dict[str, Any] | None = Field(
        default=None,
        description=(
            "Optional field for any additional information specific to the inference. "
            "Output type: JSON"
        ),
    )
answer pydantic-field

The generated response text from the LLM. Ouput type: STRING

consumed_tokens = None pydantic-field

Optional token consumption details. Output type: JSON

error = None pydantic-field

Error field - should be None for successful responses. Output type: JSON

sources pydantic-field

List of sources cited in the answer (web pages or documents). Output type: JSON

specific_info = None pydantic-field

Optional field for any additional information specific to the inference. Output type: JSON


WebSource pydantic-model

Bases: BaseModel

A web source citation with extracts.

Fields:

Source code in shared/pydantic_classes/schemas.py
class WebSource(BaseModel):
    """A web source citation with extracts."""

    title: str = Field(description="Title of the web page")
    link: str = Field(description="URL of the web page")
    source_type: Literal["web_source"] = Field(description="Type identifier for web sources")
    extracts: list[SourceExtract] = Field(description="Relevant extracts from the web page")
    citation_number: int | None = Field(
        default=None, description="Citation number for referencing in the answer"
    )
    metadata: dict[str, str | list[str]] = Field(
        default_factory=dict, description="Optional metadata about the web source"
    )
citation_number = None pydantic-field

Citation number for referencing in the answer

extracts pydantic-field

Relevant extracts from the web page

URL of the web page

metadata pydantic-field

Optional metadata about the web source

source_type pydantic-field

Type identifier for web sources

title pydantic-field

Title of the web page


DocumentSource pydantic-model

Bases: BaseModel

A document source citation with extracts.

Fields:

Source code in shared/pydantic_classes/schemas.py
class DocumentSource(BaseModel):
    """A document source citation with extracts."""

    filename: str = Field(description="Name of the document file")
    key: str = Field(description="S3 key or identifier for the document")
    source_type: Literal["rag_source"] = Field(
        description="Type identifier for RAG document sources"
    )
    extracts: list[SourceExtract] = Field(description="Relevant extracts from the document")
    citation_number: int | None = Field(
        default=None, description="Citation number for referencing in the answer"
    )
    metadata: dict[str, str | list[str]] = Field(
        default_factory=dict, description="Optional metadata about the document source"
    )
citation_number = None pydantic-field

Citation number for referencing in the answer

extracts pydantic-field

Relevant extracts from the document

filename pydantic-field

Name of the document file

key pydantic-field

S3 key or identifier for the document

metadata pydantic-field

Optional metadata about the document source

source_type pydantic-field

Type identifier for RAG document sources


SourceExtract pydantic-model

Bases: BaseModel

An extract from a source document with relevancy score.

Fields:

Source code in shared/pydantic_classes/schemas.py
class SourceExtract(BaseModel):
    """An extract from a source document with relevancy score."""

    content: str = Field(description="The extracted text content")
    relevancy_score: float = Field(
        description="Score indicating how relevant this extract is to the query"
    )
content pydantic-field

The extracted text content

relevancy_score pydantic-field

Score indicating how relevant this extract is to the query