
Image to Text Node - Automate OCR and Interpretation

The Image to Text node in the tool builder processes an uploaded image and generates text responses based on the user’s prompt. It can describe the image, answer image-related questions, or extract text from it. The node uses external LLMs, such as OpenAI and Anthropic models, for image processing and text generation.

A sample use case involves an insurance company assessing vehicle damage to estimate compensation and verify customer claims. The Image to Text node processes the uploaded image of the damaged vehicle, analyzes the extent of the damage, and helps determine repair costs. The File Upload API generates the file source (URL) at the tool endpoint, which is required as input for the node. Any publicly accessible URL (for example, one pointing to a public repository) can also be used as the File Source.

Important Considerations

  • The user can upload only one file at a time for processing.
  • Except for image input handling, the OCR node functions like the existing AI node.
  • Image uploads and related settings are handled by the File Upload API.
  • Image input preprocessing is supported in the following formats:
    • Binary (base64-encoded) for Anthropic models.
    • Both binary (base64-encoded) and image URLs for OpenAI models.
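As an illustrative sketch of the preprocessing described above (the file path and payload field names here are assumptions, not part of the product), a base64-encoded image input can be prepared like this:

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its base64-encoded contents,
    the binary input form accepted by both Anthropic and OpenAI models."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Anthropic models require base64-encoded binary input; OpenAI models
# additionally accept a plain, publicly accessible image URL:
openai_url_input = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/damaged-car.jpg"},
}
```

Because Anthropic models do not accept raw URLs, a flow targeting both providers would fetch and base64-encode the image, using the URL form only for OpenAI models.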

Steps to Add and Configure the Node

To add and configure the node, follow the steps below:

Note

Before proceeding, you must add an external LLM to your account using either Easy Integration or Custom API integration.

  1. On the Tools tab, click the name of the tool to which you want to add the node. The Tool Flow page is displayed.

  2. Click Go to flow to edit the in-development version of the flow.

  3. In the flow builder, click the + icon for Image to Text under AI in the Assets panel. Alternatively, drag the node from the panel onto the canvas. You can also click AI in the pop-up menu and click Image to text.

  4. Click the added node to open its properties dialog box. The General Settings for the node are displayed.

  5. Enter or select the following General Settings:

  • Node Name: Enter an appropriate name for the node. For example, “InsuranceEvaluation.”
  • Model: Select a model from the list of configured models.

Note

Only the OpenAI (gpt-4o and gpt-4o-mini) and Anthropic (Claude Sonnet Vision) models are currently supported.

  • File URL: Provide the URL of the public repository where your image file exists, or the URL returned by the File Upload API at the tool endpoint.

Note

  • Only PNG, JPEG, and JPG file formats are supported.
  • The file source URL must be valid for the node to function properly.

  • System Prompt: System prompts guide the model’s behavior and response style. Enter a system prompt to define its role for your use case. For example: "You are a vehicle insurance assistant that analyzes uploaded vehicle images to assess damage and estimate repair costs in USD."
  • Prompt: User prompts define specific questions or requests for the model. Provide clear instructions for the model to follow, using context variables for dynamic inputs in the syntax: {{context.variable_name}}. Example: "Check the image provided for the damaged parts in the car and select what parts are affected from the list below - {{context.parts_list}}."
  • Response JSON schema: Define a JSON schema for structured responses. This step is optional and depends on the selected model.
    You can define a JSON schema to structure the model's response if the chosen model supports the response format. By default, if no schema is provided, the model will respond with plain text. Supported JSON schema types include: String, Boolean, Number, Integer, Object, Array, Enum, and anyOf. Ensure the schema follows the standard outlined here: Defining JSON schema. If the schema is invalid or mismatched, errors will be logged, and you must resolve them before proceeding.
    For more information about how the model parses the response and separates keys from the content body, see: Structured Response Parsing and Context Sharing in Workflows.
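For illustration only, a response schema matching the insurance example might look like the following; the field names and enum values are hypothetical, and the supported types (String, Boolean, Number, Integer, Object, Array, Enum, anyOf) map directly onto standard JSON Schema keywords:

```python
import json

# Hypothetical structured response for the vehicle-damage use case:
# a list of affected parts plus an estimated repair cost.
response_schema = {
    "type": "object",
    "properties": {
        "damaged_parts": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["bumper", "hood", "door", "windshield"],
            },
        },
        "estimated_cost_usd": {"type": "number"},
        "claim_supported": {"type": "boolean"},
    },
    "required": ["damaged_parts", "estimated_cost_usd"],
}

print(json.dumps(response_schema, indent=2))
```

With a schema like this, the model's reply can be parsed as JSON and its keys read individually by downstream nodes instead of scanning plain text; if the schema is invalid or mismatched, the node logs errors as described above.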
  1. Click the Connections icon and select the Go to Node for success and failure conditions.
    • On Success > Go to Node: After the current node is successfully executed, go to a selected node in the flow to execute next, such as an AI node, Function node, Condition node, API node, or End node.
    • On Failure > Go to Node: If the execution of the current node fails, go to the End node to display any custom error message from the Image to Text node.
  2. Finally, test the flow and fix any issues found. Click the Run Flow button at the top-right corner of the flow builder and follow the onscreen instructions.

Standard Error

If no model is selected, the prompt details are not provided, or both, the following error message is displayed: “Proper data needs to be provided in the LLM node”.