Skip to content

Introduce V2 Chat Agent#4732

Merged
ylwu-amzn merged 12 commits intoopensearch-project:mainfrom
pyek-bot:agent-v2-revamp
Mar 24, 2026
Merged

Introduce V2 Chat Agent#4732
ylwu-amzn merged 12 commits intoopensearch-project:mainfrom
pyek-bot:agent-v2-revamp

Conversation

@pyek-bot
Copy link
Collaborator

@pyek-bot pyek-bot commented Mar 23, 2026

Description

This PR introduces a V2 Chat Agent - a major enhancement to the OpenSearch ML agent system that provides a unified interface, multi-modal support, and simplified agent registration.

Overview

Agent V2 addresses key limitations in the current agent framework by introducing a clean, message-centric architecture that:

  • Simplifies registration - Single API call with auto-generated connectors and models
  • Enables multi-modal conversations - Supports images, videos, and documents alongside text
  • Standardizes input/output - Three input types (TEXT, CONTENT_BLOCKS, MESSAGES) with consistent Strands-style responses
  • Supports multiple providers - Works with Bedrock, OpenAI, Gemini, and future providers
  • Maintains full backward compatibility - Zero changes to existing V1 agents
  • Simple implementation - V2 agents only work with structured messages, agentic memory and function calling. Thereby, getting rid of any legacy code that was part of V1 agents.

Problem Statement

The current agent framework has accumulated technical debt through evolution:

  1. V1: Prompt engineering approach with hardcoded question parameter
  2. V1.1: Native function calling added alongside V1, creating mixed code paths
  3. Pain Points:
    • Multi-step registration (create connector → register model → deploy model → create agent)
    • conversation_index memory loses multi-modal content (images/videos discarded)
    • Complex conditional branching making maintenance difficult
    • Difficult model switching requiring connector recreation
    • Non-standardized input/output formats

Solution: Agent V2 Framework

Key Feature: Unified Agent Interface

Before (V1) - Multi-step manual setup:

// 1. Create connector POST /_plugins/_ml/connectors/_create {...} // 2. Register model POST /_plugins/_ml/models/_register {...} // 3. Deploy model POST /_plugins/_ml/models/{model_id}/_deploy // 4. Create agent POST /_plugins/_ml/agents/_register {...}

After (V2) - Single-step with auto-generation:

POST /_plugins/_ml/agents/_register { "name": "Claude Multi-modal Agent", "type": "conversational_v2", "model": { "model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0", "model_provider": "bedrock/converse", "credential": {"access_key": "...", "secret_key": "..."} }, "memory": {"type": "agentic_memory"}, "tools": [...] }

Multi-Modal Support

V2 agents support three input types:

1. TEXT Input - Simple string

{ "parameters": {"session_id": "session_001"}, "input": "What is machine learning?" }

2. CONTENT_BLOCKS Input - Multi-modal array

{ "input": [ {"type": "text", "text": "What's in this image?"}, {"type": "image", "source": {"type": "base64", "format": "png", "data": "..."}} ] }

3. MESSAGES Input - Full conversation array

{ "input": [ {"role": "user", "content": [{"text": "Hi"}]}, {"role": "assistant", "content": [{"text": "Hello!"}]}, {"role": "user", "content": [{"text": "Tell me about AI"}]} ] }

Agentic Memory - Structured Storage

V2 uses agentic_memory which stores complete message objects including multi-modal content in structured_data_blob using Strands-compatible format:

{ "structured_data_blob": { "message": { "role": "user", "content": [ {"text": "What's in this image?"}, {"image": {"format": "png", "source": {...}}} ] } }, "metadata": {"role": "user", "format": "strands", "type": "message"} }

Key Benefits:

  • Multi-modal content preserved (images, videos, documents)
  • Session-based conversation tracking via session_id
  • Support for continued conversations with memory_id

Standardized Output Format

V2 returns responses consistent with modern LLM APIs:

{ "stop_reason": "end_turn", "message": { "role": "assistant", "content": [{"text": "The response..."}] }, "memory_id": "abc123", "metrics": { "total_usage": { "inputTokens": 1234, "outputTokens": 567, "totalTokens": 1801 } } }

Note

  1. V1 Agents when registered with unified interface format will only support text input.
  2. V2 Agents can only be registered with agentic_memory

Limitations

  1. No streaming support
  2. No support for hooks and context management

These limitations will be addressed in future PRs.

Testing

Created a comprehensive agent test suite to validate backward compatibility and functionality: https://github.com/pyek-bot/agent-v2-test-suite

Technical Architecture

Message-Centric Design

V2 eliminates the question parameter limitation and works exclusively with Message objects:

Input (TEXT/CONTENT_BLOCKS/MESSAGES) ↓ Convert to Message format ↓ Save user Message to agentic memory ↓ Get conversation history (List<Message>) ↓ Execute agent logic with Messages (ReAct loop) ↓ Save assistant Message to memory ↓ Return standardized response 

New Agent Types

public enum MLAgentType { // V1 Agents (unchanged) CONVERSATIONAL, FLOW, PLAN_EXECUTE_AND_REFLECT, // V2 Agents (new - opt-in) CONVERSATIONAL_V2, // This PR FLOW_V2, // Future PLAN_EXECUTE_AND_REFLECT_V2 // Future }

AbstractV2AgentRunner - Code Reuse Foundation

New abstract base class provides common V2 functionality:

  • Validation (V2 requirements: agentic memory, model field)
  • Memory operations (read/write messages with history limiting)
  • Tool execution (sequential with error handling)
  • Message handling (formatting for LLM, extracting responses)
  • Output building (standardized AgentV2Output)
  • Token tracking (usage accumulation)

Benefits:

  • Future V2 agents require minimal agent-specific code
  • Consistent behavior across all V2 agents
  • Single source of truth for V2 framework operations

MLChatAgentRunnerV2 - Clean Implementation

Extends AbstractV2AgentRunner with chat-specific logic:

  • ReAct loop execution
  • Tool call detection and handling
  • Chat-specific stop conditions

What Happens During Registration

V2 Registration Flow

  1. User submits registration with type: conversational_v2
  2. Validation checks model field exists and memory.type is agentic_memory
  3. Auto-create connector from model.model_provider with provider-specific format
  4. Auto-register model using connector and deploy automatically
  5. Create agent linked to deployed model with tools and memory
  6. Return agent_id - Ready to execute immediately!

Auto-Generated Resources

For model_provider: "bedrock/converse", system creates connector with:

  • AWS SigV4 authentication
  • Bedrock Converse API endpoint
  • Correct request_body format for messages and tools
  • User's credentials from model.credential

Clear Validation

V2 prevents invalid configurations with helpful error messages:

  • V2 + conversation_index → Fails at registration: "V2 agents require agentic_memory"
  • V1 + conversation_index + multi-modal → Fails at execution: "Use TEXT input or upgrade to V2 with agentic_memory"

Optional Migration Path

To use V2 features:

  1. Change agent type: conversationalconversational_v2
  2. Use simplified model registration with model block
  3. Change memory type: conversation_indexagentic_memory

No forced migration - V1 agents continue indefinitely.

Summary

This PR delivers a complete Agent V2 Framework that:

  • Simplifies registration - Single API call with auto-generated resources
  • Enables multi-modal - Images, videos, documents preserved in memory
  • Standardizes I/O - Three input types with consistent Strands-style output
  • Supports all providers - Bedrock, OpenAI, Gemini via abstraction
  • Clean architecture - AbstractV2AgentRunner for code reuse
  • Fully backward compatible - Zero changes to existing V1 agents

This establishes a solid, maintainable foundation for future V2 agent types while keeping the existing V1 system fully operational.

Related Issues

Resolves #4552 - RFC: Unified Agent Interface with Multi-Modal Support

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link

github-actions bot commented Mar 23, 2026

PR Reviewer Guide 🔍

(Review updated until commit 5307e5a)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add AgentV2Output data model and MLOutputType enum entry

Relevant files:

  • common/src/main/java/org/opensearch/ml/common/output/execute/agent/AgentV2Output.java
  • common/src/main/java/org/opensearch/ml/common/output/MLOutputType.java
  • common/src/test/java/org/opensearch/ml/common/output/execute/agent/AgentV2OutputTest.java

Sub-PR theme: Introduce AbstractV2AgentRunner and MLChatAgentRunnerV2 runner implementations

Relevant files:

  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunner.java
  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLChatAgentRunnerV2.java
  • ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunnerTest.java
  • ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/MLChatAgentRunnerV2Test.java

Sub-PR theme: Wire V2 agent execution path into MLAgentExecutor

Relevant files:

  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java
  • ml-algorithms/src/test/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutorTest.java

⚡ Recommended focus areas for review

Possible Issue

In performInitialMemoryOperations, the variable formattedHistory is used before it is assigned. The code calls modelProvider.mapMessages(history, agentType) and assigns the result to historyParams, then reads historyParams.get("body") into formattedHistory. However, the if (formattedHistory != null && !formattedHistory.isEmpty()) check references formattedHistory which is now a local variable assigned just above — but the old code had formattedHistory as a pre-existing variable. If historyParams.get("body") returns null (e.g., the provider uses a different key), formattedHistory will be null and the history won't be added to params, silently dropping conversation history.

Map<String, String> historyParams = modelProvider.mapMessages(history, agentType); String formattedHistory = historyParams.get("body"); if (formattedHistory != null && !formattedHistory.isEmpty()) { params.put(NEW_CHAT_HISTORY, formattedHistory + ", "); } // Copy NO_ESCAPE_PARAMS to ensure _chat_history is not double-escaped if (historyParams.containsKey(ToolUtils.NO_ESCAPE_PARAMS)) { String existingNoEscape = params.get(ToolUtils.NO_ESCAPE_PARAMS); String historyNoEscape = historyParams.get(ToolUtils.NO_ESCAPE_PARAMS); if (existingNoEscape != null && !existingNoEscape.isEmpty()) { // Merge with existing NO_ESCAPE_PARAMS (avoid duplicates) String merged = existingNoEscape + "," + historyNoEscape; params.put(ToolUtils.NO_ESCAPE_PARAMS, merged); } else { params.put(ToolUtils.NO_ESCAPE_PARAMS, historyNoEscape); } }
NO_ESCAPE_PARAMS Merge Bug

When merging NO_ESCAPE_PARAMS values, the code concatenates with a comma (existingNoEscape + "," + historyNoEscape) without deduplication logic that actually works — the comment says "avoid duplicates" but the string concatenation doesn't deduplicate. If _chat_history is already in existingNoEscape, it will appear twice in the merged string, potentially causing double-processing issues downstream.

// Copy NO_ESCAPE_PARAMS to ensure _chat_history is not double-escaped if (historyParams.containsKey(ToolUtils.NO_ESCAPE_PARAMS)) { String existingNoEscape = params.get(ToolUtils.NO_ESCAPE_PARAMS); String historyNoEscape = historyParams.get(ToolUtils.NO_ESCAPE_PARAMS); if (existingNoEscape != null && !existingNoEscape.isEmpty()) { // Merge with existing NO_ESCAPE_PARAMS (avoid duplicates) String merged = existingNoEscape + "," + historyNoEscape; params.put(ToolUtils.NO_ESCAPE_PARAMS, merged); } else { params.put(ToolUtils.NO_ESCAPE_PARAMS, historyNoEscape); } }
Misleading Test

In test_PerformInitialMemoryOperations_NullHistory, the assertion assertTrue(e.getMessage() != null || e.getMessage() == null) is a tautology that always passes regardless of the exception message. This test provides no real coverage of the null history edge case and may mask a real bug where null history causes an NPE that should be handled gracefully.

try { mlAgentExecutor .performInitialMemoryOperations(memory, inputMessages, params, agent, listener, () -> continuationCalled.set(true), null); } catch (NullPointerException e) { // Expected if not handled - this tests the edge case assertTrue(e.getMessage() != null || e.getMessage() == null); }
Input Saved Before Validation

In executeV2Agent, memory.saveStructuredMessages(inputMessages, ...) is called before the agent runner validates the agent (e.g., validateV2Agent). If agent validation fails inside mlAgentRunner.runV2(...), the input messages have already been persisted to memory, corrupting the conversation history with an unanswered user turn.

memory.saveStructuredMessages(inputMessages, ActionListener.wrap(v -> { log.debug("Saved {} input messages to memory. agentId={}", inputMessages.size(), agentId); // Build full conversation (history + new input) List<Message> fullConversation = new ArrayList<>(history); fullConversation.addAll(inputMessages); log.debug("Full conversation has {} messages. agentId={}", fullConversation.size(), agentId); if (isAsync) { // Async execution: index task, run in background Map<String, Object> agentResponse = new HashMap<>(); agentResponse.put(MEMORY_ID, memory.getId()); mlTask.setResponse(agentResponse); mlTask.setAsync(true); indexMLTask(mlTask, ActionListener.wrap(indexResponse -> { String taskId = indexResponse.getId(); mlTask.setTaskId(taskId); params.put(TASK_ID_FIELD, taskId); MLTaskOutput outputBuilder = MLTaskOutput .builder() .taskId(taskId) .status(MLTaskState.RUNNING.toString()) .response(agentResponse) .build(); listener.onResponse(outputBuilder); ActionListener<Object> taskUpdater = createV2AsyncTaskUpdater(mlTask, mlAgent.getType(), agentId, tenantId); try { mlAgentRunner.runV2(mlAgent, params, taskUpdater, channel, memory, fullConversation); } catch (Exception e) { log.error("Agent execution failed. agentId={}, tenantId={}", agentId, tenantId, e); taskUpdater.onFailure(e); } }, listener::onFailure)); } else { // Sync execution ActionListener<Object> agentActionListener = createV2AgentActionListener( listener, mlAgent.getType(), agentId, tenantId ); try { mlAgentRunner.runV2(mlAgent, params, agentActionListener, channel, memory, fullConversation); } catch (Exception e) { log.error("Agent execution failed. agentId={}, tenantId={}", agentId, tenantId, e); agentActionListener.onFailure(e); }
@github-actions
Copy link

github-actions bot commented Mar 23, 2026

PR Code Suggestions ✨

Latest suggestions up to 5307e5a

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null history from memory

If allHistory is null (e.g., memory returns null), this code will throw a
NullPointerException. The performInitialMemoryOperations method already handles null
with a cast, but executeV2Agent does not guard against a null historyObj. Add a null
check before using allHistory.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1091-1096]

+List<Message> allHistory = historyObj != null ? (List<Message>) historyObj : new ArrayList<>(); List<Message> history = messageHistoryLimit > 0 && allHistory.size() > messageHistoryLimit ? allHistory.subList(allHistory.size() - messageHistoryLimit, allHistory.size()) : allHistory;
Suggestion importance[1-10]: 7

__

Why: The executeV2Agent method casts historyObj to List<Message> without a null check, which could cause a NullPointerException if memory returns null. The improved_code correctly adds a null guard before the cast and the subList operation.

Medium
Guard against null input dataset in V2 execution path

If inputDataSet is null when executeV2Agent is called, the method will throw a
NullPointerException at inputDataSet.getParameters().get(AGENT_ID_FIELD). Add a null
check for inputDataSet before routing to the V2 path, or handle it inside
executeV2Agent.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [920-923]

 if (agentType.isV2() && inputMessages != null && memory != null) { + if (inputDataSet == null) { + listener.onFailure(new IllegalArgumentException("V2 agent execution requires input dataset with parameters")); + return; + } executeV2Agent(inputDataSet, tenantId, mlTask, isAsync, mlAgent, listener, memory, channel, hookRegistry, inputMessages); return; }
Suggestion importance[1-10]: 6

__

Why: If inputDataSet is null when routing to executeV2Agent, the method will throw a NullPointerException at inputDataSet.getParameters(). Adding an explicit null check with a meaningful error message improves robustness and debuggability.

Low
General
Fix null metrics serialization inconsistency

The writeTo writes false for both null metrics and empty metrics, but the
constructor initializes metrics to new HashMap<>() when the boolean is false. This
means a null metrics map passed at construction time will be deserialized as an
empty map, which is acceptable, but an empty metrics map will also be deserialized
as an empty map — consistent. However, there is a subtle inconsistency: if metrics
is null at serialization time (possible via setMetrics(null)),
out.writeBoolean(false) is written, and deserialization produces an empty map
instead of null, silently changing the value. This could cause
assertEquals(original.getMetrics(), deserialized.getMetrics()) to fail if null was
set. Normalize null to empty map consistently.

common/src/main/java/org/opensearch/ml/common/output/execute/agent/AgentV2Output.java [73-97]

 public AgentV2Output(StreamInput in) throws IOException { super(OUTPUT_TYPE); this.stopReason = in.readOptionalString(); - // Deserialize Message from JSON string String messageJson = in.readOptionalString(); this.message = messageJson != null ? gson.fromJson(messageJson, Message.class) : null; this.memoryId = in.readOptionalString(); this.metrics = in.readBoolean() ? in.readMap() : new HashMap<>(); } @Override public void writeTo(StreamOutput out) throws IOException { super.writeTo(out); out.writeOptionalString(stopReason); - // Serialize Message as JSON string String messageJson = message != null ? gson.toJson(message) : null; out.writeOptionalString(messageJson); out.writeOptionalString(memoryId); - if (metrics != null && !metrics.isEmpty()) { + Map<String, Object> metricsToWrite = metrics != null ? metrics : new HashMap<>(); + if (!metricsToWrite.isEmpty()) { out.writeBoolean(true); - out.writeMap(metrics); + out.writeMap(metricsToWrite); } else { out.writeBoolean(false); } }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that null metrics set via setMetrics(null) would be serialized as false and deserialized as an empty map, causing inconsistency. The improved code normalizes null to empty map before serialization, which is a valid defensive fix. The impact is moderate since the @Builder.Default initializes metrics to empty map, but setMetrics(null) could still trigger this issue.

Low
Handle unknown memory types with descriptive error message

The conversation_index check above already handles that specific case, but
MLMemoryType.from(memoryType) may throw an IllegalArgumentException for unknown
memory types (e.g., a typo or unsupported type). This exception is caught by the
outer catch (Exception e) block and returns a generic "Failed to validate agent
memory configuration" message, which is less informative than a direct validation
error. The unknown memory type case should be handled explicitly before calling
MLMemoryType.from().

plugin/src/main/java/org/opensearch/ml/action/agent/MLAgentRegistrationValidator.java [110-118]

 // Validate memory type is agentic -MLMemoryType parsedMemoryType = MLMemoryType.from(memoryType); +MLMemoryType parsedMemoryType; +try { + parsedMemoryType = MLMemoryType.from(memoryType); +} catch (IllegalArgumentException ex) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found unknown memory type: %s", + memoryType + ); +} if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { - return String - .format( - "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. " + "Found: %s", - memoryType - ); + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found: %s", + memoryType + ); }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that MLMemoryType.from(memoryType) can throw for unknown types, which would be caught by the outer catch block producing a generic error message. The improved code provides a more descriptive error for unknown memory types, improving user experience and debuggability.

Low
Prevent duplicate entries when merging parameters

When merging NO_ESCAPE_PARAMS, duplicate entries are not prevented despite the
comment. If existingNoEscape already contains values from historyNoEscape, they will
be duplicated, potentially causing incorrect behavior. Use a Set-based deduplication
approach.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [872]

-String merged = existingNoEscape + "," + historyNoEscape; +Set<String> mergedSet = new LinkedHashSet<>(Arrays.asList(existingNoEscape.split(","))); +mergedSet.addAll(Arrays.asList(historyNoEscape.split(","))); +String merged = String.join(",", mergedSet);
Suggestion importance[1-10]: 4

__

Why: The comment says "avoid duplicates" but the simple string concatenation doesn't actually deduplicate. The Set-based approach is more correct, though this is a minor edge case that may rarely occur in practice.

Low
Provide consistent error handling for invalid LLM interface

FunctionCallingFactory.create() may throw an IllegalArgumentException for invalid
interface names before returning null, but the null check after it suggests it can
also return null. The thrown exception message may not be user-friendly. Wrap the
factory call in a try-catch to provide a consistent error message.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunner.java [504-520]

-protected FunctionCalling getFunctionCalling(MLAgent mlAgent, Map<String, String> params) { - String llmInterface = params.get(MLChatAgentRunner.LLM_INTERFACE); - if (llmInterface == null && mlAgent.getParameters() != null) { - llmInterface = mlAgent.getParameters().get(MLChatAgentRunner.LLM_INTERFACE); - } +FunctionCalling functionCalling; +try { + functionCalling = FunctionCallingFactory.create(llmInterface); +} catch (IllegalArgumentException e) { + throw new IllegalStateException( + String.format("Invalid _llm_interface '%s' for V2 agent. %s", llmInterface, e.getMessage()), e); +} +if (functionCalling == null) { + throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); +} - if (llmInterface == null) { - throw new IllegalStateException( - "V2 agents require function calling. LLM interface not configured. " - + "Ensure _llm_interface parameter is set in agent configuration." - ); - } - - FunctionCalling functionCalling = FunctionCallingFactory.create(llmInterface); - if (functionCalling == null) { - throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); - } -
Suggestion importance[1-10]: 4

__

Why: The test testGetFunctionCalling_FromParams already expects IllegalArgumentException from FunctionCallingFactory.create(), and the suggestion wraps it in a consistent IllegalStateException. This improves error message consistency but is a minor improvement.

Low
Remove unreachable else branch masking validation errors

The validateAgentForRegistration method calls listener.onResponse(true) on success,
but the else branch here would silently swallow a false response with a generic
error. More critically, if the validator calls onResponse(null) or
onResponse(false), the agent would fail with a non-descriptive message. The original
code had specific error messages; the fallback should at minimum preserve that.
However, looking at the validator implementation, it only calls onResponse(true) or
onFailure(...), so the else branch is dead code that could mask future bugs —
consider removing it or adding an assertion.

plugin/src/main/java/org/opensearch/ml/action/agents/TransportRegisterAgentAction.java [152-158]

 agentRegistrationValidator.validateAgentForRegistration(agent, ActionListener.wrap(isValid -> { - if (Boolean.TRUE.equals(isValid)) { - listener.onResponse(agent); - } else { - listener.onFailure(new IllegalArgumentException("Agent validation failed")); - } + // validateAgentForRegistration only calls onResponse(true) on success, + // or onFailure with a descriptive exception on failure. + listener.onResponse(agent); }, listener::onFailure));
Suggestion importance[1-10]: 4

__

Why: The suggestion is valid - the else branch is dead code since validateAgentForRegistration only calls onResponse(true) on success. However, removing it is a minor cleanup with low risk impact, and the improved code is functionally equivalent for all real cases.

Low
Avoid duplicate LLM response extraction calls

This block extracts the assistant message JSON for persistence after tool calls are
detected, but the same extractMessageFromResponse call is already performed earlier
via extractAssistantMessage (which calls modelProvider.extractMessageFromResponse
internally). This results in a duplicate call to extractMessageFromResponse on the
same response. The result from the first call should be reused or the assistant
message JSON should be captured once and stored, to avoid redundant processing and
potential inconsistency.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLChatAgentRunnerV2.java [240-249]

+// Capture message JSON for persistence at the same time as extractAssistantMessage +// by storing the result of extractMessageFromResponse when it was first called. +// Reuse the already-extracted messageJson instead of re-extracting: try { - var tensor = output.getMlModelOutputs().getFirst().getMlModelTensors().getFirst(); - Map<String, ?> dataAsMap = tensor.getDataAsMap(); + Map<String, ?> dataAsMap = output.getMlModelOutputs().getFirst().getMlModelTensors().getFirst().getDataAsMap(); String messageJson = modelProvider.extractMessageFromResponse(dataAsMap); if (messageJson != null) { toolInteractionJsonList.add(messageJson); } } catch (Exception e) { log.warn("Failed to extract message JSON for persistence. agentId={}", agentId, e); }
Suggestion importance[1-10]: 2

__

Why: The suggestion claims there's a duplicate call to extractMessageFromResponse, but the improved_code is essentially identical to the existing_code - it doesn't actually refactor to reuse a previously captured result. The suggestion identifies a real concern but the proposed fix doesn't address it meaningfully.

Low

Previous suggestions

Suggestions up to commit 0953e13
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null history list

If allHistory is null (e.g., when memory returns null), calling allHistory.size()
will throw a NullPointerException. Add a null check before using allHistory.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1094-1096]

-List<Message> history = messageHistoryLimit > 0 && allHistory.size() > messageHistoryLimit - ? allHistory.subList(allHistory.size() - messageHistoryLimit, allHistory.size()) - : allHistory; +List<Message> allHistoryNonNull = allHistory != null ? allHistory : new ArrayList<>(); +List<Message> history = messageHistoryLimit > 0 && allHistoryNonNull.size() > messageHistoryLimit + ? allHistoryNonNull.subList(allHistoryNonNull.size() - messageHistoryLimit, allHistoryNonNull.size()) + : allHistoryNonNull;
Suggestion importance[1-10]: 7

__

Why: The allHistory variable is cast from historyObj which could be null (as demonstrated in the test test_PerformInitialMemoryOperations_NullHistory). Calling .size() on a null list would throw a NullPointerException. This is a valid defensive fix.

Medium
Add null check for input dataset parameters

inputDataSet or inputDataSet.getParameters() could be null, which would cause a
NullPointerException. Add null checks before accessing these values in
executeV2Agent.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1076-1077]

+if (inputDataSet == null || inputDataSet.getParameters() == null) { + listener.onFailure(new IllegalArgumentException("Input dataset or parameters cannot be null for V2 agent execution")); + return; +} String agentId = inputDataSet.getParameters().get(AGENT_ID_FIELD); Map<String, String> params = new HashMap<>(inputDataSet.getParameters());
Suggestion importance[1-10]: 6

__

Why: inputDataSet could potentially be null if called incorrectly, and accessing getParameters() without a null check is a potential NPE. This is a valid defensive check, though the calling code in executeAgent likely ensures inputDataSet is non-null before routing to executeV2Agent.

Low
Guard against empty list access causing silent data loss

The assistant message JSON for tool-call interactions is extracted here from the raw
tensor data, but the same extraction is already performed earlier via
extractAssistantMessage which calls modelProvider.extractMessageFromResponse. This
results in duplicate extraction logic. More critically, if getMlModelOutputs() or
getMlModelTensors() returns an empty list, getFirst() will throw a
NoSuchElementException which is silently swallowed — but the assistant message with
tool calls won't be persisted, causing incomplete conversation history.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLChatAgentRunnerV2.java [240-249]

 try { - var tensor = output.getMlModelOutputs().getFirst().getMlModelTensors().getFirst(); - Map<String, ?> dataAsMap = tensor.getDataAsMap(); - String messageJson = modelProvider.extractMessageFromResponse(dataAsMap); - if (messageJson != null) { - toolInteractionJsonList.add(messageJson); + if (output.getMlModelOutputs() != null && !output.getMlModelOutputs().isEmpty()) { + var tensors = output.getMlModelOutputs().get(0); + if (tensors.getMlModelTensors() != null && !tensors.getMlModelTensors().isEmpty()) { + Map<String, ?> dataAsMap = tensors.getMlModelTensors().get(0).getDataAsMap(); + String messageJson = modelProvider.extractMessageFromResponse(dataAsMap); + if (messageJson != null) { + toolInteractionJsonList.add(messageJson); + } + } } } catch (Exception e) { log.warn("Failed to extract message JSON for persistence. agentId={}", agentId, e); }
Suggestion importance[1-10]: 5

__

Why: Using getFirst() on potentially empty lists can throw NoSuchElementException which would be silently caught, causing incomplete tool interaction persistence. The improved code adds proper null/empty checks before accessing list elements, which is a valid defensive improvement.

Low
Fix stream deserialization misalignment for output type enum

The writeTo method writes the enum type via super.writeTo(out) (which writes the
MLOutputType enum), but the deserialization constructor AgentV2Output(StreamInput
in) calls super(OUTPUT_TYPE) without reading the enum from the stream. This means
the stream position is misaligned during deserialization — the enum written by
super.writeTo is not consumed, causing all subsequent reads to be off by one field.
The enum should be read from the stream in the constructor, or the parent class
should handle this symmetrically.

common/src/main/java/org/opensearch/ml/common/output/execute/agent/AgentV2Output.java [73-81]

 public AgentV2Output(StreamInput in) throws IOException { - super(OUTPUT_TYPE); + super(in.readEnum(MLOutputType.class)); this.stopReason = in.readOptionalString(); // Deserialize Message from JSON string String messageJson = in.readOptionalString(); this.message = messageJson != null ? gson.fromJson(messageJson, Message.class) : null; this.memoryId = in.readOptionalString(); this.metrics = in.readBoolean() ? in.readMap() : new HashMap<>(); }
Suggestion importance[1-10]: 3

__

Why: The test code in AgentV2OutputTest explicitly reads the enum with input.readEnum(MLOutputType.class) before constructing AgentV2Output, suggesting the parent class writeTo writes the enum and the test handles it externally. The suggestion to read the enum inside the constructor may conflict with how the parent class and test infrastructure handle this, making it potentially incorrect without more context on the MLOutput base class behavior.

Low
General
Handle factory exceptions for invalid LLM interface

FunctionCallingFactory.create() may throw an IllegalArgumentException for invalid
interface names rather than returning null, but the code only checks for null. The
null check after create() may be dead code, and the actual exception from the
factory would propagate uncaught with a less informative message. Wrap the factory
call in a try-catch to provide a consistent error message.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunner.java [504-530]

-protected FunctionCalling getFunctionCalling(MLAgent mlAgent, Map<String, String> params) { - String llmInterface = params.get(MLChatAgentRunner.LLM_INTERFACE); - if (llmInterface == null && mlAgent.getParameters() != null) { - llmInterface = mlAgent.getParameters().get(MLChatAgentRunner.LLM_INTERFACE); - } - - if (llmInterface == null) { - throw new IllegalStateException( - "V2 agents require function calling. LLM interface not configured. " - + "Ensure _llm_interface parameter is set in agent configuration." - ); - } - - FunctionCalling functionCalling = FunctionCallingFactory.create(llmInterface); - if (functionCalling == null) { - throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); - } - ... +FunctionCalling functionCalling; +try { + functionCalling = FunctionCallingFactory.create(llmInterface); +} catch (IllegalArgumentException e) { + throw new IllegalStateException( + String.format("Invalid LLM interface '%s' for V2 agent. %s", llmInterface, e.getMessage()), e); +} +if (functionCalling == null) { + throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); }
Suggestion importance[1-10]: 5

__

Why: The suggestion is valid - if FunctionCallingFactory.create() throws IllegalArgumentException for invalid interfaces (as evidenced by the test expecting this), wrapping it provides better error messages. However, the test code already shows this exception propagates acceptably, making this a moderate improvement.

Low
Deduplicate merged NO_ESCAPE_PARAMS values

Merging NO_ESCAPE_PARAMS by simple string concatenation with a comma can introduce
duplicate entries if historyNoEscape values already exist in existingNoEscape. This
could cause unexpected behavior in downstream processing. Consider deduplicating the
merged values.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [872]

-String merged = existingNoEscape + "," + historyNoEscape; +Set<String> mergedSet = new LinkedHashSet<>(Arrays.asList(existingNoEscape.split(","))); +mergedSet.addAll(Arrays.asList(historyNoEscape.split(","))); +String merged = String.join(",", mergedSet);
Suggestion importance[1-10]: 4

__

Why: The suggestion to deduplicate NO_ESCAPE_PARAMS values is a valid improvement to prevent duplicate entries, but it's a minor edge case that may not cause critical issues in practice. The improved code correctly uses a LinkedHashSet to deduplicate while preserving order.

Low
Handle unknown memory types with a clear validation error

The conversation_index check is redundant because the subsequent
MLMemoryType.from(memoryType) call followed by the agentic memory check already
covers it. More importantly, if MLMemoryType.from(memoryType) throws an exception
for an unknown memory type (e.g., a typo), it will be caught by the outer catch
block and return a generic error message instead of a clear validation error. The
unknown memory type case should be handled explicitly before calling
MLMemoryType.from.

plugin/src/main/java/org/opensearch/ml/action/agent/MLAgentRegistrationValidator.java [99-118]

 // V2 agents cannot use conversation_index String memoryType = agent.getMemory().getType(); -if (MLMemoryType.CONVERSATION_INDEX.name().equalsIgnoreCase(memoryType)) { - return String - .format(...); +MLMemoryType parsedMemoryType; +try { + parsedMemoryType = MLMemoryType.from(memoryType); +} catch (Exception e) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found unknown memory type: %s", + memoryType + ); +} +if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found: %s", + memoryType + ); } -// Validate memory type is agentic -MLMemoryType parsedMemoryType = MLMemoryType.from(memoryType); -if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { - return String - .format( - "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. " + "Found: %s", - memoryType - ); -} -
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that an unknown memory type passed to MLMemoryType.from() would be caught by the outer catch block and return a generic error. Handling it explicitly provides clearer error messages to users, though the existing code does still return an error (just less specific).

Low
Preserve specific validation error details on failure

The validateAgentForRegistration method calls listener.onResponse(true) on success,
but the new code silently swallows the case where isValid is false with a generic
error message, losing the specific validation error. Looking at
validateAgentForRegistration, it only calls onResponse(true) on success and
onFailure with a descriptive exception on failure — so the else branch with a
generic message can never be reached with a meaningful error, but if it were, the
original error detail would be lost. The else branch should propagate a more
informative error or be removed if onResponse(false) is never called.

plugin/src/main/java/org/opensearch/ml/action/agents/TransportRegisterAgentAction.java [152-158]

 agentRegistrationValidator.validateAgentForRegistration(agent, ActionListener.wrap(isValid -> { if (Boolean.TRUE.equals(isValid)) { listener.onResponse(agent); } else { - listener.onFailure(new IllegalArgumentException("Agent validation failed")); + listener.onFailure(new IllegalArgumentException("Agent validation failed: validator returned false without a specific error")); } }, listener::onFailure));
Suggestion importance[1-10]: 2

__

Why: The validateAgentForRegistration method only calls onResponse(true) on success and onFailure with descriptive exceptions on failure, so the else branch is unreachable in practice. The improved code only changes the error message text, making this a very low-impact suggestion.

Low
Suggestions up to commit c6211c0
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null history list

If allHistory is null (e.g., memory returns null), this code will throw a
NullPointerException. The allHistory variable should be null-checked before calling
.size() on it. Add a null guard to default to an empty list.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1094-1096]

-List<Message> history = messageHistoryLimit > 0 && allHistory.size() > messageHistoryLimit - ? allHistory.subList(allHistory.size() - messageHistoryLimit, allHistory.size()) - : allHistory; +List<Message> safeHistory = allHistory != null ? allHistory : new ArrayList<>(); +List<Message> history = messageHistoryLimit > 0 && safeHistory.size() > messageHistoryLimit + ? safeHistory.subList(safeHistory.size() - messageHistoryLimit, safeHistory.size()) + : safeHistory;
Suggestion importance[1-10]: 7

__

Why: The allHistory variable could be null if memory.getStructuredMessages() returns null, causing a NullPointerException. The test test_PerformInitialMemoryOperations_NullHistory in the test file also highlights this edge case. Adding a null guard is a valid defensive improvement.

Medium
Guard against null inputDataSet for V2 agents

If inputDataSet is null when executeV2Agent is called, the method will throw a
NullPointerException at inputDataSet.getParameters().get(AGENT_ID_FIELD). A null
check for inputDataSet should be added to the V2 routing condition to fail fast with
a meaningful error.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [920-923]

 if (agentType.isV2() && inputMessages != null && memory != null) { + if (inputDataSet == null) { + listener.onFailure(new IllegalArgumentException("V2 agent execution requires a non-null input dataset")); + return; + } executeV2Agent(inputDataSet, tenantId, mlTask, isAsync, mlAgent, listener, memory, channel, hookRegistry, inputMessages); return; }
Suggestion importance[1-10]: 6

__

Why: If inputDataSet is null when routing to executeV2Agent, the method will throw a NullPointerException at inputDataSet.getParameters(). Adding a null check with a meaningful error message is a valid defensive improvement that prevents cryptic NPEs.

Low
Avoid swallowing validation errors with generic message

The validateAgentForRegistration method calls listener.onResponse(true) on success,
but the else branch here silently swallows the actual validation error with a
generic message. Since onFailure is already forwarded via listener::onFailure, the
else branch (where isValid is false or null) should never be reached in normal flow
— but if it is, the original error context is lost. Consider removing the else
branch or logging a warning, since all failure cases should go through onFailure.

plugin/src/main/java/org/opensearch/ml/action/agents/TransportRegisterAgentAction.java [152-158]

 agentRegistrationValidator.validateAgentForRegistration(agent, ActionListener.wrap(isValid -> { - if (Boolean.TRUE.equals(isValid)) { - listener.onResponse(agent); - } else { - listener.onFailure(new IllegalArgumentException("Agent validation failed")); - } + listener.onResponse(agent); }, listener::onFailure));
Suggestion importance[1-10]: 6

__

Why: The else branch replaces the actual error with a generic "Agent validation failed" message, losing context. Since validateAgentForRegistration uses onFailure for all error cases, the else branch is effectively dead code and the simplified version is cleaner and more correct.

Low
Ensure metrics map serialization handles non-primitive values

The deserialization reads in.readMap() only when the boolean is true, but
out.writeMap(metrics) writes a generic map. The StreamOutput.writeMap and
StreamInput.readMap methods require value types to be serializable via the stream
protocol. If metrics contains non-primitive values (e.g., nested maps or custom
objects), this will fail at runtime. Additionally, when metrics is null (not just
empty), the boolean is written as false, but on deserialization an empty HashMap is
returned — this is inconsistent with the @Builder.Default behavior and could mask
null metrics passed explicitly.

common/src/main/java/org/opensearch/ml/common/output/execute/agent/AgentV2Output.java [91-96]

 if (metrics != null && !metrics.isEmpty()) { out.writeBoolean(true); - out.writeMap(metrics); + out.writeMap(metrics, StreamOutput::writeString, (o, v) -> o.writeGenericValue(v)); } else { out.writeBoolean(false); }
Suggestion importance[1-10]: 5

__

Why: The writeMap call without explicit value serializers may fail for non-primitive metric values at runtime. However, the improved_code still uses StreamOutput::writeString for keys which may not match the actual deserialization, making the suggestion partially inaccurate.

Low
General
Deduplicate merged NO_ESCAPE_PARAMS values

When merging NO_ESCAPE_PARAMS, duplicate entries are not prevented. If the same
parameter name appears in both existingNoEscape and historyNoEscape, it will be
duplicated in the merged string, which could cause unexpected behavior downstream.
Consider using a Set to deduplicate before merging.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [872-873]

-String merged = existingNoEscape + "," + historyNoEscape; -params.put(ToolUtils.NO_ESCAPE_PARAMS, merged); +Set<String> mergedSet = new LinkedHashSet<>(Arrays.asList(existingNoEscape.split(","))); +mergedSet.addAll(Arrays.asList(historyNoEscape.split(","))); +params.put(ToolUtils.NO_ESCAPE_PARAMS, String.join(",", mergedSet));
Suggestion importance[1-10]: 4

__

Why: Deduplicating NO_ESCAPE_PARAMS is a reasonable improvement to avoid downstream issues with duplicate parameter names, but this is a minor edge case that may not cause critical failures in practice.

Low
Consistent error handling for invalid LLM interface

FunctionCallingFactory.create() is expected to return null for unknown interfaces,
but it may also throw an exception. The null check after create() is correct, but
the error message says "No function calling implementation found" while the test
expects "Invalid _llm_interface". Ensure the factory and this method are consistent
in their error handling contract to avoid confusing error messages.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunner.java [504-520]

-protected FunctionCalling getFunctionCalling(MLAgent mlAgent, Map<String, String> params) { - String llmInterface = params.get(MLChatAgentRunner.LLM_INTERFACE); - if (llmInterface == null && mlAgent.getParameters() != null) { - llmInterface = mlAgent.getParameters().get(MLChatAgentRunner.LLM_INTERFACE); - } +FunctionCalling functionCalling; +try { + functionCalling = FunctionCallingFactory.create(llmInterface); +} catch (IllegalArgumentException e) { + throw new IllegalArgumentException("Invalid _llm_interface: " + llmInterface + ". " + e.getMessage(), e); +} +if (functionCalling == null) { + throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); +} - if (llmInterface == null) { - throw new IllegalStateException( - "V2 agents require function calling. LLM interface not configured. " - + "Ensure _llm_interface parameter is set in agent configuration." - ); - } - - FunctionCalling functionCalling = FunctionCallingFactory.create(llmInterface); - if (functionCalling == null) { - throw new IllegalStateException(String.format("No function calling implementation found for interface: %s", llmInterface)); - } -
Suggestion importance[1-10]: 4

__

Why: The suggestion addresses a potential inconsistency between the factory's error handling and the runner's error messages. The test acknowledges this inconsistency with a comment about "Invalid _llm_interface", making this a valid but minor improvement to error message clarity.

Low
Consolidate duplicate memory type validation checks

The conversation_index check above already handles that case, but
MLMemoryType.from(memoryType) may throw an IllegalArgumentException for unknown
memory types (e.g., a typo or future type). This exception is caught by the outer
catch (Exception e) block and returns a generic "Failed to validate agent memory
configuration" error, losing the specific context. The redundant check also means
conversation_index is validated twice. Consider consolidating the logic to parse the
type once and check it, letting the parse exception propagate a meaningful error.

plugin/src/main/java/org/opensearch/ml/action/agent/MLAgentRegistrationValidator.java [110-118]

-// Validate memory type is agentic -MLMemoryType parsedMemoryType = MLMemoryType.from(memoryType); -if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { - return String - .format( - "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. " + "Found: %s", +// Validate memory type is a known agentic type for V2 +try { + MLMemoryType parsedMemoryType = MLMemoryType.from(memoryType); + if (parsedMemoryType == MLMemoryType.CONVERSATION_INDEX) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) are not compatible with conversation_index memory. " + + "Found memory type: %s. Please use 'agentic_memory' or 'remote_agentic_memory' instead.", memoryType ); + } + if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found: %s", + memoryType + ); + } +} catch (IllegalArgumentException e) { + return String.format("V2 agents (CONVERSATIONAL_V2) have an unknown memory type: %s", memoryType); }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that conversation_index is checked twice and that unknown types produce a generic error. Consolidating the logic improves clarity and provides better error messages for unknown memory types.

Low
Avoid redundant provider calls for message extraction

This block extracts the message JSON for persistence after tool calls are detected,
but the same extraction is done again later via extractAssistantMessage which also
calls modelProvider.extractMessageFromResponse. The assistant message with tool
calls is added to toolInteractionJsonList here, but assistantMessage was already
added to messages above. This means the assistant message with tool calls is tracked
for persistence but the duplicate extraction could lead to inconsistency if the two
calls return different results due to mutable state in mocks or provider
implementations.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLChatAgentRunnerV2.java [240-249]

+// Track assistant message JSON with tool calls for persistence (like V1) +// Use the already-extracted message from extractAssistantMessage to avoid duplicate provider calls try { - var tensor = output.getMlModelOutputs().getFirst().getMlModelTensors().getFirst(); - Map<String, ?> dataAsMap = tensor.getDataAsMap(); - String messageJson = modelProvider.extractMessageFromResponse(dataAsMap); + String messageJson = modelProvider.extractMessageFromResponse( + output.getMlModelOutputs().getFirst().getMlModelTensors().getFirst().getDataAsMap() + ); if (messageJson != null) { toolInteractionJsonList.add(messageJson); } } catch (Exception e) { log.warn("Failed to extract message JSON for persistence. agentId={}", agentId, e); }
Suggestion importance[1-10]: 2

__

Why: The improved_code is essentially identical to the existing_code — it still calls modelProvider.extractMessageFromResponse with the same arguments. The suggestion doesn't actually address the stated concern about avoiding duplicate calls.

Low
Suggestions up to commit 17cb259
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix method signature mismatch between tests and implementation

The public formatToolResults method in the test file takes (List<Map<String,
Object>> toolResults, FunctionCalling functionCalling, ModelProvider modelProvider)
as parameters and calls functionCalling.supply(toolResults) internally, but the
implementation in AbstractV2AgentRunner only accepts (List llmMessages,
ModelProvider modelProvider). This signature mismatch means the test is testing a
different method signature than what is implemented, suggesting the actual
production method may be missing the toolResults/functionCalling overload, or the
test is calling a non-existent method.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/AbstractV2AgentRunner.java [423-426]

 protected List<Message> formatToolResults( - List<org.opensearch.ml.engine.function_calling.LLMMessage> llmMessages, + List<Map<String, Object>> toolResults, + FunctionCalling functionCalling, ModelProvider modelProvider ) { + List<org.opensearch.ml.engine.function_calling.LLMMessage> llmMessages = functionCalling.supply(toolResults); + List<Message> messages = new ArrayList<>(); + for (var llmMsg : llmMessages) { + String messageJson = llmMsg.getResponse(); + if (messageJson != null && !messageJson.isEmpty()) { + try { + Message message = modelProvider.parseToUnifiedMessage(messageJson); + if (message != null) { + messages.add(message); + } else { + log.warn("ModelProvider returned null when parsing tool result JSON: {}", messageJson); + } + } catch (Exception e) { + log.error("Failed to parse tool result JSON: {}", messageJson, e); + } + } else { + log.warn("Tool result message has empty response"); + } + } + return messages; +}
Suggestion importance[1-10]: 8

__

Why: The test file calls runner.formatToolResults(toolResults, functionCalling, modelProvider) with three parameters including FunctionCalling, but the implementation only accepts (List<LLMMessage>, ModelProvider). This is a real signature mismatch that would cause compilation failures in the test, indicating the production method is missing the overload that accepts raw tool results.

Medium
Guard against null history list

If allHistory is null (e.g., when memory returns null), this code will throw a
NullPointerException. The existing performInitialMemoryOperations method casts the
result and handles null, but executeV2Agent does not guard against a null
allHistory. Add a null check before using allHistory.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1094-1096]

-List<Message> history = messageHistoryLimit > 0 && allHistory.size() > messageHistoryLimit - ? allHistory.subList(allHistory.size() - messageHistoryLimit, allHistory.size()) - : allHistory; +List<Message> allHistoryNonNull = allHistory != null ? allHistory : new ArrayList<>(); +List<Message> history = messageHistoryLimit > 0 && allHistoryNonNull.size() > messageHistoryLimit + ? allHistoryNonNull.subList(allHistoryNonNull.size() - messageHistoryLimit, allHistoryNonNull.size()) + : allHistoryNonNull;
Suggestion importance[1-10]: 7

__

Why: The allHistory variable could be null if memory returns null, which would cause a NullPointerException when calling allHistory.size(). The test test_PerformInitialMemoryOperations_NullHistory explicitly tests this edge case, confirming it's a real concern.

Medium
Remove dead code hiding real validation errors

The validateAgentForRegistration method calls listener.onResponse(true) on success,
but the new code silently swallows the case where isValid is false with a generic
error message. Looking at the validator implementation, it never calls
onResponse(false) — it either calls onResponse(true) or onFailure(exception). The
else branch with a generic "Agent validation failed" message is therefore dead code
that hides the real error. Remove the else branch and let onFailure propagate the
actual exception.

plugin/src/main/java/org/opensearch/ml/action/agents/TransportRegisterAgentAction.java [152-158]

 agentRegistrationValidator.validateAgentForRegistration(agent, ActionListener.wrap(isValid -> { - if (Boolean.TRUE.equals(isValid)) { - listener.onResponse(agent); - } else { - listener.onFailure(new IllegalArgumentException("Agent validation failed")); - } + listener.onResponse(agent); }, listener::onFailure));
Suggestion importance[1-10]: 7

__

Why: The validateAgentForRegistration implementation only calls onResponse(true) or onFailure(exception), making the else branch dead code that would hide the actual error with a generic message. Simplifying to just listener.onResponse(agent) in the success path is a valid improvement.

Medium
General
Normalize numeric metric types after stream deserialization

The writeTo method writes false and skips the map when metrics is empty, but the
deserialization constructor calls in.readMap() only when the boolean is true. This
means an empty metrics map is correctly handled. However, in.readMap() returns a
Map<String, Object> where values are deserialized generics — numeric values stored
as Long may be deserialized as Integer depending on the stream implementation,
causing type mismatches in tests like assertEquals(500L,
deserialized.getMetrics().get("total_tokens")). Consider using a typed read or
document this limitation explicitly.

common/src/main/java/org/opensearch/ml/common/output/execute/agent/AgentV2Output.java [73-81]

 public AgentV2Output(StreamInput in) throws IOException { super(OUTPUT_TYPE); this.stopReason = in.readOptionalString(); - // Deserialize Message from JSON string String messageJson = in.readOptionalString(); this.message = messageJson != null ? gson.fromJson(messageJson, Message.class) : null; this.memoryId = in.readOptionalString(); - this.metrics = in.readBoolean() ? in.readMap() : new HashMap<>(); + if (in.readBoolean()) { + Map<String, Object> rawMetrics = in.readMap(); + // Normalize numeric values to Long for consistent type handling + this.metrics = new HashMap<>(); + rawMetrics.forEach((k, v) -> this.metrics.put(k, v instanceof Number ? ((Number) v).longValue() : v)); + } else { + this.metrics = new HashMap<>(); + } }
Suggestion importance[1-10]: 6

__

Why: This is a real potential issue where Long values written to a stream may be deserialized as Integer, causing type mismatch failures in tests like assertEquals(500L, deserialized.getMetrics().get("total_tokens")). The normalization approach in improved_code is a reasonable fix, though it may be overly broad by converting all numbers to Long.

Low
Avoid shared mutable state in async listener closures

The agentResponse and updatedTask maps are shared mutable state captured in the
lambda closures for both onResponse and onFailure. If the listener is called from
multiple threads or called more than once, this could lead to race conditions or
corrupted state. These maps should be created inside each callback branch instead.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [1373-1378]

 private ActionListener<Object> createV2AsyncTaskUpdater(MLTask mlTask, String agentType, String agentId, String tenantId) { long startTime = System.currentTimeMillis(); String taskId = mlTask.getTaskId(); - Map<String, Object> agentResponse = new HashMap<>(); - Map<String, Object> updatedTask = new HashMap<>(); + return ActionListener.wrap(output -> { + Map<String, Object> agentResponse = new HashMap<>(); + Map<String, Object> updatedTask = new HashMap<>(); + // ... rest of onResponse logic + }, ex -> { + Map<String, Object> agentResponse = new HashMap<>(); + Map<String, Object> updatedTask = new HashMap<>(); + // ... rest of onFailure logic + }); +} +
Suggestion importance[1-10]: 5

__

Why: Sharing agentResponse and updatedTask maps across both onResponse and onFailure closures is a potential thread-safety issue. However, in practice ActionListener callbacks are typically called exactly once, so the risk is moderate rather than critical.

Low
Remove redundant check and fix silent exception swallowing

The conversation_index check is redundant because the subsequent
MLMemoryType.from(memoryType) call followed by the allowlist check already covers
it. More importantly, if MLMemoryType.from(memoryType) throws an exception for an
unknown type (e.g., a typo), the outer catch block silently swallows it and returns
a generic error, masking the real problem. Remove the redundant first check and let
the allowlist check handle all invalid types, or ensure the exception from
MLMemoryType.from is propagated with a clear message.

plugin/src/main/java/org/opensearch/ml/action/agent/MLAgentRegistrationValidator.java [99-118]

-// V2 agents cannot use conversation_index +// Validate memory type is agentic String memoryType = agent.getMemory().getType(); -if (MLMemoryType.CONVERSATION_INDEX.name().equalsIgnoreCase(memoryType)) { - return String - .format(...); +MLMemoryType parsedMemoryType; +try { + parsedMemoryType = MLMemoryType.from(memoryType); +} catch (Exception e) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found unknown type: %s", + memoryType + ); +} +if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { + return String.format( + "V2 agents (CONVERSATIONAL_V2) only support agentic_memory or remote_agentic_memory. Found: %s", + memoryType + ); } -// Validate memory type is agentic -MLMemoryType parsedMemoryType = MLMemoryType.from(memoryType); -if (parsedMemoryType != MLMemoryType.AGENTIC_MEMORY && parsedMemoryType != MLMemoryType.REMOTE_AGENTIC_MEMORY) { - return String - .format(...); -} -
Suggestion importance[1-10]: 5

__

Why: The redundant conversation_index check before the allowlist check is a valid concern, and the suggestion to handle MLMemoryType.from exceptions explicitly rather than relying on the outer catch is a reasonable improvement for clarity and correctness. However, the existing_code snippet doesn't match the actual file shown in the diff (it references MLAgentRegistrationValidator.java), reducing confidence slightly.

Low
Deduplicate merged NO_ESCAPE_PARAMS entries

The comment says "avoid duplicates" but the implementation simply concatenates the
strings with a comma, which does not actually deduplicate entries. If
historyNoEscape is already present in existingNoEscape, it will be added again,
potentially causing double-processing of parameters.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java [870-877]

 if (existingNoEscape != null && !existingNoEscape.isEmpty()) { // Merge with existing NO_ESCAPE_PARAMS (avoid duplicates) - String merged = existingNoEscape + "," + historyNoEscape; - params.put(ToolUtils.NO_ESCAPE_PARAMS, merged); + Set<String> existingSet = new LinkedHashSet<>(Arrays.asList(existingNoEscape.split(","))); + existingSet.addAll(Arrays.asList(historyNoEscape.split(","))); + params.put(ToolUtils.NO_ESCAPE_PARAMS, String.join(",", existingSet)); } else { params.put(ToolUtils.NO_ESCAPE_PARAMS, historyNoEscape); }
Suggestion importance[1-10]: 4

__

Why: The comment says "avoid duplicates" but the code just concatenates strings, which doesn't actually deduplicate. While this is a real inconsistency, the practical impact depends on how NO_ESCAPE_PARAMS is consumed downstream, making it a moderate concern.

Low
Avoid duplicate message extraction for persistence

**The assistant message JSON fo...

@github-actions
Copy link

Persistent review updated to latest commit c44a58e

@pyek-bot pyek-bot changed the title Introduce V2 Agent Introduce V2 Chat Agent Mar 23, 2026
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 23, 2026 08:42 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 23, 2026 08:42 — with GitHub Actions Failure
@github-actions
Copy link

Persistent review updated to latest commit fefa92a

@github-actions
Copy link

Persistent review updated to latest commit 990f73d

@github-actions
Copy link

Persistent review updated to latest commit 11334dd

@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 23, 2026 09:16 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 23, 2026 09:16 — with GitHub Actions Failure
@pyek-bot pyek-bot marked this pull request as ready for review March 23, 2026 09:22
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 24, 2026 05:32 — with GitHub Actions Failure
@github-actions
Copy link

Persistent review updated to latest commit 897a4ba

@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env March 24, 2026 06:44 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 47.57930% with 314 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.97%. Comparing base (6ba0e6d) to head (897a4ba).

Files with missing lines Patch % Lines
...ch/ml/engine/algorithms/agent/MLAgentExecutor.java 14.12% 147 Missing and 5 partials ⚠️
...engine/algorithms/agent/AbstractV2AgentRunner.java 65.62% 51 Missing and 15 partials ⚠️
.../ml/common/output/execute/agent/AgentV2Output.java 0.00% 49 Missing ⚠️
...l/engine/algorithms/agent/MLChatAgentRunnerV2.java 65.42% 31 Missing and 6 partials ⚠️
.../ml/action/agent/MLAgentRegistrationValidator.java 81.48% 4 Missing and 1 partial ⚠️
...ml/action/agents/TransportRegisterAgentAction.java 60.00% 1 Missing and 1 partial ⚠️
...ain/java/org/opensearch/ml/common/MLAgentType.java 66.66% 1 Missing ⚠️
...n/java/org/opensearch/ml/common/agent/MLAgent.java 0.00% 1 Missing ⚠️
...arch/ml/engine/algorithms/agent/MLAgentRunner.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@ Coverage Diff @@ ## main #4732 +/- ## ============================================ - Coverage 77.27% 76.97% -0.31%  - Complexity 11621 11704 +83  ============================================ Files 948 951 +3 Lines 52168 52725 +557 Branches 6329 6422 +93 ============================================ + Hits 40315 40586 +271  - Misses 9175 9432 +257  - Partials 2678 2707 +29 
Flag Coverage Δ
ml-commons 76.97% <47.57%> (-0.31%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
@pyek-bot pyek-bot temporarily deployed to ml-commons-cicd-env March 24, 2026 07:07 — with GitHub Actions Inactive
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 24, 2026 08:14 — with GitHub Actions Failure
@pyek-bot pyek-bot had a problem deploying to ml-commons-cicd-env March 24, 2026 08:14 — with GitHub Actions Failure
@github-actions
Copy link

Persistent review updated to latest commit 8cb1586

@github-actions
Copy link

Persistent review updated to latest commit 17cb259

1 similar comment
@github-actions
Copy link

Persistent review updated to latest commit 17cb259

@github-actions
Copy link

Persistent review updated to latest commit c6211c0

@pyek-bot
Copy link
Collaborator Author

pyek-bot commented Mar 24, 2026

Failing due to unrelated tests:

Tests with failures: - org.opensearch.ml.rest.RestConnectorToolIT.testConnectorToolInFlowAgent {"status":503,"error":{"type":"OpenSearchStatusException","reason":"System Error","details":"Error from remote service: {\"message\":\"Too many connections, please wait before trying again.\"}"}} 

Built locally with success!

Screenshot 2026-03-24 at 8 53 02 AM
Copy link
Collaborator

@rithin-pullela-aws rithin-pullela-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR Pavan, this is a huge improvement to Agent framework.
Added small nits, since they are not huge concerns, approving the PR LGTM

Map<String, ?> candidateMap = (Map<String, ?>) candidatesList.get(0);
Object contentObj = candidateMap.get("content");
if (contentObj != null) {
return org.opensearch.ml.common.utils.StringUtils.toJson(contentObj);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead import org.opensearch.ml.common.utils.

// Create mutable list for ReAct iterations (will append tool results)
List<Message> messages = new ArrayList<>(conversationHistory);
AtomicInteger iteration = new AtomicInteger(0);
TokenUsage[] accumulatedTokenUsage = new TokenUsage[1];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, can be enhanced later: Looks like we are using this to accumulate tokens. Can we use AgentTokenTracker which abstracts away the token tracking and also provides granular data about token consumption?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will follow up

Comment on lines +920 to +923
if (agentType.isV2() && inputMessages != null && memory != null) {
executeV2Agent(inputDataSet, tenantId, mlTask, isAsync, mlAgent, listener, memory, channel, hookRegistry, inputMessages);
return;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This skips the MCP flag check, can we move this part below like 930?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will submit a PR most this

@github-actions
Copy link

Persistent review updated to latest commit 0953e13

pyek-bot added 12 commits March 24, 2026 15:25
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
…ng & context management, remove dead code Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Pavan Yekbote <pybot@amazon.com>
@github-actions
Copy link

Persistent review updated to latest commit 5307e5a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

6 participants