Responses API Reference

The Responses API provides an OpenAI-compatible interface for agentic workflows with built-in support for multi-turn conversations, tool execution, and MCP (Model Context Protocol) integration.

Overview

Purpose vs Chat Completions API

The Responses API differs from the Chat Completions API in several key ways:

Feature	Chat Completions	Responses API
Conversation State	Stateless	Server-managed state
Tool Execution	Client-side	Server-side with MCP support
Multi-turn	Manual	Automatic with `previous_response_id` or `conversation`
Persistence	None	Built-in response/conversation storage
Agentic Workflows	Manual orchestration	Built-in tool loop execution

Agentic Workflow Concepts

The Responses API enables agentic workflows where the model can:

Reason about tasks using optional reasoning parameters
Plan tool usage with automatic tool selection
Execute tools via MCP servers or function calling
Iterate through multiple tool calls in a single request
Persist conversation history for multi-session workflows
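
The iteration step can be pictured as a loop that alternates between model steps and tool executions until the model produces a final message or a tool-call budget runs out. The sketch below is a toy, client-side simulation of that idea and not the server's implementation: `fake_model`, `TOOLS`, and `run_tool_loop` are hypothetical names, and the budget only mirrors the `max_tool_calls` request field.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[[dict], str]] = {
    "get_weather": lambda args: f"Sunny in {args['location']}",
}


def fake_model(history: list[dict]) -> dict:
    """Stand-in for a model: request one tool call, then answer."""
    tool_results = [h for h in history if h["type"] == "tool_result"]
    if not tool_results:
        return {"type": "tool_call", "name": "get_weather",
                "arguments": {"location": "Paris"}}
    return {"type": "message", "text": f"Answer: {tool_results[-1]['output']}"}


def run_tool_loop(user_input: str, max_tool_calls: int = 5) -> list[dict]:
    """Alternate model steps and tool executions until a message appears."""
    history = [{"type": "input", "text": user_input}]
    calls = 0
    while True:
        step = fake_model(history)
        history.append(step)
        if step["type"] == "message":
            return history
        if calls >= max_tool_calls:
            # Budget exhausted: stop without executing further tools.
            return history
        output = TOOLS[step["name"]](step["arguments"])
        history.append({"type": "tool_result", "output": output})
        calls += 1


if __name__ == "__main__":
    for item in run_tool_loop("What is the weather in Paris?"):
        print(item)
```

With the Responses API this loop runs on the server, so the client sends one request and receives the tool calls and the final message together in `output`.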

Base URL

http://localhost:30000/v1

Create Response

Create a new response with optional tool execution and conversation management.

POST /v1/responses

Request Body

Field	Type	Required	Description
`model`	string	Yes	Model identifier
`input`	string or array	Yes	Input text or array of input items
`instructions`	string	No	System instructions for the model
`max_output_tokens`	integer	No	Maximum tokens to generate
`max_tool_calls`	integer	No	Maximum number of tool calls per request
`temperature`	number	No	Sampling temperature (0-2), default: 1.0
`top_p`	number	No	Nucleus sampling parameter (0-1)
`stream`	boolean	No	Enable streaming responses
`store`	boolean	No	Store response for later retrieval, default: true
`tools`	array	No	Available tools (function, mcp, web_search_preview, code_interpreter)
`tool_choice`	string/object	No	Tool selection behavior: `auto`, `none`, `required`, or specific tool
`parallel_tool_calls`	boolean	No	Allow parallel tool execution, default: true
`previous_response_id`	string	No	Continue from a previous response
`conversation`	string	No	Conversation ID (mutually exclusive with `previous_response_id`)
`reasoning`	object	No	Reasoning configuration
`text`	object	No	Text format for structured outputs
`metadata`	object	No	Custom metadata (max 16 properties)
`user`	string	No	End-user identifier
`background`	boolean	No	Run the request in the background (cannot be combined with `stream`)
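
Several constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a pre-flight check; `validate_request` is a hypothetical helper, the messages are illustrative rather than the server's own error text, and treating the documented ranges as inclusive is an assumption.

```python
def validate_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field in ("model", "input"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    if "previous_response_id" in body and "conversation" in body:
        problems.append(
            "previous_response_id and conversation are mutually exclusive")
    if body.get("background") and body.get("stream"):
        problems.append("background cannot be combined with stream")
    if len(body.get("metadata") or {}) > 16:
        problems.append("metadata allows at most 16 properties")
    # Assumption: the documented ranges are inclusive at both ends.
    for name, low, high in (("temperature", 0, 2), ("top_p", 0, 1)):
        value = body.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems


if __name__ == "__main__":
    print(validate_request({"model": "m", "input": "hi", "temperature": 3}))
```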

Input Formats

Simple text input:

{
  "input": "What is the capital of France?"
}

Structured input items:

{
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ]
}
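
Client code that accepts either form may find it convenient to normalize to the structured form first. This sketch uses a hypothetical `normalize_input` helper and assumes that a bare string is equivalent to a single user message, which is the usual OpenAI-compatible behavior but is not stated in this reference.

```python
from typing import Union


def normalize_input(value: Union[str, list]) -> list:
    """Return the input as a list of structured input items."""
    if isinstance(value, str):
        # Assumption: a bare string is treated as one user message.
        return [{
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": value}],
        }]
    # Already structured: return a shallow copy so callers can append safely.
    return list(value)


if __name__ == "__main__":
    print(normalize_input("Hello!"))
```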

Tool Configuration

Function tools:

{
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        },
        "required": ["location"]
      }
    }
  ]
}

MCP tools:

{
  "tools": [
    {
      "type": "mcp",
      "server_url": "http://localhost:8080/mcp",
      "server_label": "my-mcp-server",
      "server_description": "My MCP server for data access",
      "require_approval": "never",
      "allowed_tools": ["query_database", "search_files"]
    }
  ]
}
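
When tool definitions are built in code rather than written by hand, small constructor functions keep the two shapes above consistent. The helpers below (`function_tool` and `mcp_tool`) are hypothetical names for this sketch; they only assemble the dictionaries shown above and do not check what the server accepts.

```python
from typing import Optional


def function_tool(name: str, description: str, properties: dict,
                  required: tuple = ()) -> dict:
    """Build a function tool entry with a JSON Schema object for parameters."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(required),
        },
    }


def mcp_tool(server_url: str, server_label: str,
             allowed_tools: Optional[list] = None, **extra) -> dict:
    """Build an MCP tool entry; optional fields are included only when given."""
    tool = {"type": "mcp", "server_url": server_url,
            "server_label": server_label}
    if allowed_tools is not None:
        tool["allowed_tools"] = list(allowed_tools)
    # Extra keyword arguments (e.g. require_approval) are passed through as-is.
    tool.update(extra)
    return tool


if __name__ == "__main__":
    print(function_tool("get_weather", "Get weather for a location",
                        {"location": {"type": "string"}}, ("location",)))
    print(mcp_tool("http://localhost:8080/mcp", "my-mcp-server",
                   allowed_tools=["query_database", "search_files"],
                   require_approval="never"))
```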

Reasoning Configuration

{
  "reasoning": {
    "effort": "medium",
    "summary": "auto"
  }
}

Effort levels: `minimal`, `low`, `medium`, `high`

Text Format (Structured Outputs)

{
  "text": {
    "format": {
      "type": "json_schema",
      "name": "user_info",
      "schema": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "integer"}
        }
      },
      "strict": true
    }
  }
}
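
On the client side, a structured output still has to be decoded. The sketch below assumes the JSON document arrives as the text of an `output_text` part, as in OpenAI-compatible responses; `first_output_text`, `matches_type`, and `parse_user_info` are hypothetical helpers, and the type check covers only the two types used in the `user_info` schema above rather than full JSON Schema validation.

```python
import json

USER_INFO_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}


def first_output_text(response: dict) -> str:
    """Return the text of the first output_text part in a response dict."""
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                return part["text"]
    raise ValueError("response contains no output_text part")


def matches_type(value, json_type: str) -> bool:
    """Check a value against the two JSON Schema types this sketch supports."""
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unsupported type in this sketch: {json_type}")


def parse_user_info(response: dict) -> dict:
    """Decode the JSON text and verify the declared property types."""
    data = json.loads(first_output_text(response))
    for key, spec in USER_INFO_SCHEMA["properties"].items():
        if key in data and not matches_type(data[key], spec["type"]):
            raise TypeError(f"{key} is not of type {spec['type']}")
    return data


if __name__ == "__main__":
    sample = {"output": [{"type": "message", "content": [
        {"type": "output_text", "text": '{"name": "Ada", "age": 36}'}]}]}
    print(parse_user_info(sample))
```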

Example Request

curl http://localhost:30000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "input": "Search for the latest news about AI",
    "instructions": "Be concise and factual",
    "max_output_tokens": 500,
    "temperature": 0.7,
    "tools": [
      {
        "type": "mcp",
        "server_url": "http://localhost:8080/mcp",
        "server_label": "search"
      }
    ],
    "tool_choice": "auto"
  }'

Response

{
  "id": "resp_abc123def456",
  "object": "response",
  "created_at": 1705312345,
  "status": "completed",
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "output": [
    {
      "type": "mcp_list_tools",
      "id": "mcp_list_001",
      "server_label": "search",
      "tools": [
        {
          "name": "web_search",
          "description": "Search the web",
          "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}
        }
      ]
    },
    {
      "type": "mcp_call",
      "id": "mcp_call_001",
      "status": "completed",
      "name": "web_search",
      "arguments": "{\"query\": \"latest AI news\"}",
      "output": "{\"results\": [...]}",
      "server_label": "search"
    },
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Based on my search, here are the latest AI developments..."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 50,
    "output_tokens": 150,
    "total_tokens": 200
  },
  "tools": [
    {
      "type": "mcp",
      "server_label": "search",
      "server_url": "http://localhost:8080/mcp"
    }
  ],
  "tool_choice": "auto",
  "parallel_tool_calls": true,
  "store": true
}
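
When the raw JSON is consumed without an SDK, the `output` array has to be walked by item type, and `arguments` on an `mcp_call` item is a JSON-encoded string rather than an object. The following sketch shows one way to pull out the tool calls and the final text; `summarize_output` is a hypothetical helper that handles only the item types shown in the example above.

```python
import json


def summarize_output(response: dict) -> dict:
    """Collect MCP tool calls and concatenated output text from a response."""
    tool_calls, text_parts = [], []
    for item in response.get("output", []):
        if item["type"] == "mcp_call":
            tool_calls.append({
                "server": item["server_label"],
                "name": item["name"],
                # `arguments` is a JSON-encoded string, so decode it.
                "arguments": json.loads(item["arguments"]),
            })
        elif item["type"] == "message":
            text_parts.extend(
                part["text"] for part in item["content"]
                if part["type"] == "output_text"
            )
        # Other item types (e.g. mcp_list_tools) are ignored in this sketch.
    return {"tool_calls": tool_calls, "text": "".join(text_parts)}


if __name__ == "__main__":
    sample = {"output": [
        {"type": "mcp_list_tools", "server_label": "search", "tools": []},
        {"type": "mcp_call", "name": "web_search", "server_label": "search",
         "arguments": "{\"query\": \"latest AI news\"}"},
        {"type": "message", "role": "assistant", "content": [
            {"type": "output_text", "text": "Based on my search..."}]},
    ]}
    print(summarize_output(sample))
```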

Streaming Response

With `"stream": true`, the response is delivered as Server-Sent Events (SSE):

curl http://localhost:30000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "input": "Hello!",
    "stream": true
  }'

Event sequence:

event: response.created
data: {"type": "response.created", "sequence_number": 0, "response": {...}}

event: response.in_progress
data: {"type": "response.in_progress", "sequence_number": 1, "response": {...}}

event: response.output_item.added
data: {"type": "response.output_item.added", "sequence_number": 2, "output_index": 0, "item": {...}}

event: response.content_part.added
data: {"type": "response.content_part.added", "sequence_number": 3, "output_index": 0, "content_index": 0, "part": {...}}

event: response.output_text.delta
data: {"type": "response.output_text.delta", "sequence_number": 4, "output_index": 0, "content_index": 0, "delta": "Hello"}

event: response.output_text.done
data: {"type": "response.output_text.done", "sequence_number": 5, "output_index": 0, "content_index": 0, "text": "Hello! How can I help you?"}

event: response.output_item.done
data: {"type": "response.output_item.done", "sequence_number": 6, "output_index": 0, "item": {...}}

event: response.completed
data: {"type": "response.completed", "sequence_number": 7, "response": {...}}

data: [DONE]

MCP-specific streaming events:

event: response.mcp_list_tools.in_progress
data: {"type": "response.mcp_list_tools.in_progress", "output_index": 0, "item_id": "mcp_list_001"}

event: response.mcp_list_tools.completed
data: {"type": "response.mcp_list_tools.completed", "output_index": 0, "item_id": "mcp_list_001"}

event: response.mcp_call.in_progress
data: {"type": "response.mcp_call.in_progress", "output_index": 1, "item_id": "mcp_call_001"}

event: response.mcp_call_arguments.delta
data: {"type": "response.mcp_call_arguments.delta", "output_index": 1, "item_id": "mcp_call_001", "delta": "{\"query\": \"..."}

event: response.mcp_call_arguments.done
data: {"type": "response.mcp_call_arguments.done", "output_index": 1, "item_id": "mcp_call_001", "arguments": "{\"query\": \"...\"}"}

event: response.output_item.done
data: {"type": "response.output_item.done", "output_index": 1, "item": {"type": "mcp_call", "output": "...", ...}}
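
For clients that read the stream without an SDK, the event lines above can be decoded with a few lines of code. This is a minimal sketch, assuming each `data:` payload fits on a single line and that the stream ends with the `[DONE]` sentinel shown earlier; `iter_sse_events` and `collect_text` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded `data:` payloads, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` lines and blank separators
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate all output text deltas from a stream of SSE lines."""
    return "".join(
        event["delta"]
        for event in iter_sse_events(lines)
        if event["type"] == "response.output_text.delta"
    )


if __name__ == "__main__":
    demo = [
        "event: response.output_text.delta",
        'data: {"type": "response.output_text.delta", "delta": "Hello"}',
        "",
        "data: [DONE]",
    ]
    print(collect_text(demo))
```

Since the `type` field is repeated inside each payload, the sketch dispatches on it and ignores the `event:` lines.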

Get Response

Retrieve a previously stored response by ID.

GET /v1/responses/{response_id}

Path Parameters

Parameter	Type	Description
`response_id`	string	The response ID (e.g., `resp_abc123`)

Query Parameters

Parameter	Type	Description
`include`	array	Additional fields to include

Example Request

curl http://localhost:30000/v1/responses/resp_abc123def456

Response

Returns the full response object as shown in the Create Response section.

Cancel Response

POST /v1/responses/{response_id}/cancel

Attempts to cancel an in-progress response. Behavior depends on the connection mode:

gRPC workers: Background mode is not supported. This endpoint always returns a 400 Bad Request error with code `cancellation_not_supported`.
HTTP workers: The request is proxied to the backend worker. Whether cancellation succeeds depends on backend support.

Path Parameters

Parameter	Type	Description
`response_id`	string	The response ID to cancel

Example Request

curl -X POST http://localhost:30000/v1/responses/resp_abc123def456/cancel

Response

HTTP workers: Returns the response object from the backend.

gRPC workers: Returns a 400 Bad Request error:

{
  "error": {
    "message": "Background mode is not supported. Synchronous and streaming responses cannot be cancelled.",
    "type": "Bad Request",
    "code": "cancellation_not_supported"
  }
}
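
Because the outcome differs by worker type, a client may want to tell "cancellation is not supported here" apart from other failures. The sketch below classifies a cancel result from its HTTP status and JSON body; `cancel_outcome` and its three labels are hypothetical, and a 2xx result only means that the request was proxied, not that the response was actually cancelled.

```python
def cancel_outcome(status_code: int, body: dict) -> str:
    """Classify the result of POST /v1/responses/{response_id}/cancel."""
    error = body.get("error") or {}
    if status_code == 400 and error.get("code") == "cancellation_not_supported":
        # gRPC workers: cancellation is never available.
        return "unsupported"
    if 200 <= status_code < 300:
        # HTTP workers: inspect the returned response object for its status.
        return "proxied"
    return "failed"


if __name__ == "__main__":
    grpc_body = {"error": {"type": "Bad Request",
                           "code": "cancellation_not_supported"}}
    print(cancel_outcome(400, grpc_body))
```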

Delete Response

Delete a stored response.

DELETE /v1/responses/{response_id}

Path Parameters

Parameter	Type	Description
`response_id`	string	The response ID to delete

Example Request

curl -X DELETE http://localhost:30000/v1/responses/resp_abc123def456

Response

{
  "id": "resp_abc123def456",
  "object": "response.deleted",
  "deleted": true
}

List Response Input Items

List the input items that were sent with a response.

GET /v1/responses/{response_id}/input_items

Path Parameters

Parameter	Type	Description
`response_id`	string	The response ID

Example Request

curl http://localhost:30000/v1/responses/resp_abc123def456/input_items

Response

{
  "object": "list",
  "data": [
    {
      "id": "msg_input_001",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello!"}]
    }
  ],
  "first_id": "msg_input_001",
  "last_id": "msg_input_001",
  "has_more": false
}

Conversation Management

Conversations provide persistent storage for multi-turn interactions, enabling chat history to be maintained across multiple requests.

Create Conversation

POST /v1/conversations

Request Body

Field	Type	Required	Description
`metadata`	object	No	Custom metadata (max 16 properties)

Example Request

curl http://localhost:30000/v1/conversations \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "project": "customer-support",
      "user_id": "user_123"
    }
  }'

Response

{
  "id": "conv_abc123def456",
  "object": "conversation",
  "created_at": 1705312345,
  "metadata": {
    "project": "customer-support",
    "user_id": "user_123"
  }
}

Get Conversation

GET /v1/conversations/{conversation_id}

Example Request

curl http://localhost:30000/v1/conversations/conv_abc123def456

Response

{
  "id": "conv_abc123def456",
  "object": "conversation",
  "created_at": 1705312345,
  "metadata": {
    "project": "customer-support"
  }
}

Update Conversation

Update conversation metadata using merge semantics: keys in the request are added or overwritten, and a key set to null is deleted.

POST /v1/conversations/{conversation_id}

Request Body

Field	Type	Description
`metadata`	object	Metadata to merge (null values delete keys)

Example Request

curl http://localhost:30000/v1/conversations/conv_abc123def456 \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "status": "resolved",
      "project": null
    }
  }'

Response

Returns the updated conversation object.
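
The merge rule can be stated precisely in a few lines. This is a client-side model of the semantics described in this section, not the server's code; `merge_metadata` is a hypothetical name.

```python
def merge_metadata(current: dict, patch: dict) -> dict:
    """Apply a metadata patch: None (JSON null) deletes, other values set."""
    merged = dict(current)
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # deleting a missing key is a no-op
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    before = {"project": "customer-support", "user_id": "user_123"}
    print(merge_metadata(before, {"status": "resolved", "project": None}))
```

Applied to the conversation created earlier, the patch from the example request would leave `user_id` untouched, add `status`, and remove `project`.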

Delete Conversation

DELETE /v1/conversations/{conversation_id}

Example Request

curl -X DELETE http://localhost:30000/v1/conversations/conv_abc123def456

Response

{
  "id": "conv_abc123def456",
  "object": "conversation.deleted",
  "deleted": true
}

List Conversation Items

GET /v1/conversations/{conversation_id}/items

Query Parameters

Parameter	Type	Default	Description
`limit`	integer	100	Maximum items to return
`order`	string	`desc`	Sort order: `asc` or `desc`
`after`	string	-	Cursor for pagination

Example Request

curl "http://localhost:30000/v1/conversations/conv_abc123/items?limit=20&order=asc"

Response

{
  "object": "list",
  "data": [
    {
      "id": "item_001",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "Hello"}],
      "status": "completed",
      "created_at": 1705312345
    },
    {
      "id": "item_002",
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "Hi there!"}],
      "status": "completed",
      "created_at": 1705312346
    }
  ],
  "first_id": "item_001",
  "last_id": "item_002",
  "has_more": false
}
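
The `after`, `last_id`, and `has_more` fields together support cursor pagination. The sketch below walks all pages through a caller-supplied fetch function; it assumes that passing the previous page's `last_id` as `after` returns the next page, which is the usual OpenAI-style convention but is not spelled out here. `iter_items` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for an HTTP call.

```python
from typing import Callable, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following the `after` cursor."""
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not page["data"]:
            return
        after = page["last_id"]


def fake_fetch(after: Optional[str], page_size: int = 2) -> dict:
    """Stand-in for GET /v1/conversations/{id}/items over three fixed items."""
    ids = ["item_001", "item_002", "item_003"]
    start = 0 if after is None else ids.index(after) + 1
    chunk = ids[start:start + page_size]
    return {
        "object": "list",
        "data": [{"id": item_id} for item_id in chunk],
        "first_id": chunk[0] if chunk else None,
        "last_id": chunk[-1] if chunk else None,
        "has_more": start + page_size < len(ids),
    }


if __name__ == "__main__":
    print([item["id"] for item in iter_items(fake_fetch)])
```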

Create Conversation Items

Add items to a conversation. Maximum 20 items per request.

POST /v1/conversations/{conversation_id}/items

Request Body

Field	Type	Required	Description
`items`	array	Yes	Array of items to add (max 20)

Supported Item Types

message - User or assistant messages
reasoning - Model reasoning content
mcp_list_tools - MCP tool listing
mcp_call - MCP tool invocation
item_reference - Reference to an existing item
function_call - Function tool call
function_call_output - Function call result

Example Request

curl http://localhost:30000/v1/conversations/conv_abc123/items \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What is 2+2?"}]
      },
      {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "2+2 equals 4."}]
      }
    ]
  }'

Response

{
  "object": "list",
  "data": [
    {
      "id": "item_003",
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "What is 2+2?"}],
      "status": "completed"
    },
    {
      "id": "item_004",
      "type": "message",
      "role": "assistant",
      "content": [{"type": "output_text", "text": "2+2 equals 4."}],
      "status": "completed"
    }
  ],
  "first_id": "item_003",
  "last_id": "item_004",
  "has_more": false
}
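
Since one request accepts at most 20 items, importing a longer history means splitting it into several requests. A minimal sketch of that batching follows; `batch_items` is a hypothetical helper that only builds the request bodies and leaves sending them to the caller.

```python
from typing import Iterator

MAX_ITEMS_PER_REQUEST = 20  # limit stated for this endpoint


def batch_items(items: list, size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[dict]:
    """Yield request bodies containing at most `size` items each, in order."""
    for start in range(0, len(items), size):
        yield {"items": items[start:start + size]}


if __name__ == "__main__":
    history = [
        {"type": "message", "role": "user",
         "content": [{"type": "input_text", "text": f"Message {i}"}]}
        for i in range(45)
    ]
    print([len(body["items"]) for body in batch_items(history)])
```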

Get Conversation Item

GET /v1/conversations/{conversation_id}/items/{item_id}

Example Request

curl http://localhost:30000/v1/conversations/conv_abc123/items/item_001

Response

Returns the item object.

Delete Conversation Item

Remove an item from a conversation. This is a soft delete: the item may still exist if other conversations reference it.

DELETE /v1/conversations/{conversation_id}/items/{item_id}

Example Request

curl -X DELETE http://localhost:30000/v1/conversations/conv_abc123/items/item_001

Response

Returns the updated conversation object.

Examples

Simple Agentic Workflow

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Create a response with MCP tools
response = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Search for the weather in San Francisco and summarize it",
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "weather-service"
        }
    ],
    tool_choice="auto"
)

# The response includes tool calls and final answer
for output in response.output:
    if output.type == "mcp_call":
        print(f"Tool called: {output.name}")
        print(f"Result: {output.output}")
    elif output.type == "message":
        for content in output.content:
            if content.type == "output_text":
                print(f"Answer: {content.text}")

Multi-turn Conversation with Tools

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Create a conversation
conversation = client.conversations.create(
    metadata={"session": "support-123"}
)

# First turn
response1 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="I need help with my order #12345",
    conversation=conversation.id,
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "order-service"
        }
    ]
)
print(f"First response: {response1.id}")

# Second turn - continues the conversation
response2 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Can you also check if there are any discounts available?",
    conversation=conversation.id,
    tools=[
        {
            "type": "mcp",
            "server_url": "http://localhost:8080/mcp",
            "server_label": "order-service"
        }
    ]
)
print(f"Second response: {response2.id}")

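# List conversation history in chronological order (the default order is desc).
# Only message items carry a role and content; tool items such as mcp_call do not.
items = client.conversations.items.list(conversation.id, order="asc")
for item in items.data:
    if item.type == "message":
        print(f"{item.role}: {item.content}")
    else:
        print(f"[{item.type}]")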

Streaming Response Handling

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Stream a response
with client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Explain quantum computing",
    stream=True
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
        elif event.type == "response.mcp_call.in_progress":
            print(f"\n[Calling tool: {event.item_id}]")
        elif event.type == "response.completed":
            print(f"\n\nTokens used: {event.response.usage.total_tokens}")

Using Previous Response ID

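from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",
    api_key="your-api-key"
)

# Alternative to conversations - chain responses directly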
response1 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="What are the main programming paradigms?",
    store=True
)

# Continue from previous response
response2 = client.responses.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    input="Can you elaborate on functional programming?",
    previous_response_id=response1.id,
    store=True
)

Error Responses

Error Format

{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "param": "field_name",
    "code": "error_code"
  }
}

Common Errors

HTTP Status	Type	Description
400	`invalid_request_error`	Malformed request or validation failure
401	`authentication_error`	Invalid or missing API key
404	`not_found_error`	Response, conversation, or item not found
429	`rate_limit_error`	Rate limit exceeded
500	`internal_error`	Server error
503	`service_unavailable`	No healthy workers available

Validation Errors

{
  "error": {
    "message": "Invalid 'conversation': 'invalid-id'. Expected an ID that begins with 'conv_'.",
    "type": "invalid_request_error",
    "param": "conversation",
    "code": "invalid_conversation_id"
  }
}

{
  "error": {
    "message": "Mutually exclusive parameters. Ensure you are only providing one of: 'previous_response_id' or 'conversation'.",
    "type": "invalid_request_error",
    "code": "mutually_exclusive_parameters"
  }
}
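
A client can handle all of these errors through the shared envelope. The sketch below reads the envelope defensively, since `param` is absent in the second validation example, and adds a simple retry policy. The policy itself is an assumption and is not specified by this reference: it treats 429, 500, and 503 as retryable and uses capped exponential backoff. `parse_error`, `should_retry`, and `backoff_delays` are hypothetical names.

```python
# Assumption: which statuses are worth retrying is a client-side choice.
RETRYABLE_STATUSES = {429, 500, 503}


def parse_error(body: dict) -> dict:
    """Extract the error envelope fields, tolerating missing keys."""
    error = body.get("error") or {}
    return {
        "type": error.get("type"),
        "code": error.get("code"),
        "param": error.get("param"),  # not present on every error
        "message": error.get("message", ""),
    }


def should_retry(status_code: int) -> bool:
    """Return True for statuses this sketch treats as transient."""
    return status_code in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Delays in seconds for each retry: base * 2**i, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


if __name__ == "__main__":
    body = {"error": {"message": "Mutually exclusive parameters.",
                      "type": "invalid_request_error",
                      "code": "mutually_exclusive_parameters"}}
    print(parse_error(body), should_retry(400), backoff_delays(4))
```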

SGLang Extensions

As SGLang extensions, the Responses API accepts the following additional sampling parameters:

Field	Type	Default	Description
`top_k`	integer	-1	Top-k sampling (-1 = disabled)
`min_p`	number	0.0	Min-p sampling threshold
`repetition_penalty`	number	1.0	Repetition penalty (1.0 = disabled)
`frequency_penalty`	number	-	OpenAI-compatible frequency penalty
`presence_penalty`	number	-	OpenAI-compatible presence penalty
`stop`	string/array	-	Stop sequences

Example:

{
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "input": "Write a story",
  "top_k": 50,
  "min_p": 0.05,
  "repetition_penalty": 1.1
}
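
The typed `responses.create` method in the `openai` Python client may not accept these fields as named arguments, but the client's `extra_body` option adds extra properties to the JSON request body. The sketch below prepares keyword arguments in that form. `with_sglang_sampling` is a hypothetical helper, and it is an assumption that the server reads these fields from the top level of the body, as the JSON example above suggests.

```python
SGLANG_SAMPLING_FIELDS = {
    "top_k", "min_p", "repetition_penalty",
    "frequency_penalty", "presence_penalty", "stop",
}


def with_sglang_sampling(request_kwargs: dict, **sampling) -> dict:
    """Return a copy of the kwargs with sampling fields placed in extra_body."""
    unknown = set(sampling) - SGLANG_SAMPLING_FIELDS
    if unknown:
        raise ValueError(f"unknown sampling fields: {sorted(unknown)}")
    merged = dict(request_kwargs)
    # Preserve any extra_body entries the caller already supplied.
    merged["extra_body"] = {**request_kwargs.get("extra_body", {}), **sampling}
    return merged


if __name__ == "__main__":
    kwargs = with_sglang_sampling(
        {"model": "meta-llama/Llama-3.1-8B-Instruct", "input": "Write a story"},
        top_k=50, min_p=0.05, repetition_penalty=1.1,
    )
    print(kwargs)
    # With a configured client: client.responses.create(**kwargs)
```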