fix: correct array_append return type and mark as Compatible #3795

Merged
comphead merged 4 commits into apache:main from andygrove:fix-array-append-compat
Mar 27, 2026

Conversation

@andygrove
Member

@andygrove andygrove commented Mar 25, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

CometArrayAppend had two bugs:

  1. Runtime assertion failure for literal arrays. DataFusion's array_append always returns a list with nullable: true on the element field. But when the input array has non-null elements (e.g. array(1, 2, 3)), Spark's ArrayAppend.dataType returns ArrayType(IntegerType, containsNull = false). The serde code was passing expr.dataType as the "promised" return type to DataFusion, which caused a runtime assertion failure:

    Assertion failed: result_data_type == *expected_type: Function 'array_append' returned value of type 'List(Field { data_type: Int32, nullable: true })' while the following type was promised at planning time and expected: 'List(Field { data_type: Int32 })' 
  2. Incorrect Incompatible classification. CometArrayAppend was marked Incompatible(None) with no explanation, which disabled it by default. The CaseWhen(IsNotNull(arr), array_append(arr, elem), null) wrapper already handles the only genuine incompatibility (DataFusion's array_append does not preserve null top-level array rows on its own), so the expression matches Spark's behavior fully.

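The type-check failure in bug 1 can be sketched in plain Rust. This is a minimal model of Arrow's planning-time equality check, not Comet's actual code: the `Field` and `DataType` stand-ins here are simplified versions of the real types in the arrow crate.

```rust
// Simplified stand-ins for Arrow's Field and DataType (illustrative only).
#[derive(Debug, PartialEq)]
struct Field {
    data_type: &'static str,
    nullable: bool,
}

#[derive(Debug, PartialEq)]
enum DataType {
    List(Field),
}

fn main() {
    // What DataFusion's array_append actually returns: the element field
    // is always nullable.
    let actual = DataType::List(Field { data_type: "Int32", nullable: true });

    // What Spark's ArrayAppend.dataType promised for array(1, 2, 3):
    // containsNull = false, i.e. a non-nullable element field.
    let promised = DataType::List(Field { data_type: "Int32", nullable: false });

    // Strict equality fails: this is the mismatch behind the assertion.
    assert_ne!(actual, promised);

    // The fix: promise containsNull = true, so the types match exactly.
    let fixed = DataType::List(Field { data_type: "Int32", nullable: true });
    assert_eq!(actual, fixed);
    println!("type check passes with nullable promise");
}
```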
What changes are included in this PR?

  • arrays.scala: Use ArrayType(elementType, containsNull = true) as the promised return type for array_append, matching what DataFusion actually returns. Change getSupportLevel from Incompatible to Compatible.
  • array_append.sql: Remove spark.comet.expression.ArrayAppend.allowIncompatible=true, which is no longer needed now that the expression is Compatible. Add a comment explaining why ArrayInsert.allowIncompatible=true is still needed on Spark 4.0, where array_append is a RuntimeReplaceable that rewrites to array_insert(-1).
  • expressions.md: Mark ArrayAppend as Spark-compatible.

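The null-preservation semantics that the CaseWhen(IsNotNull(arr), array_append(arr, elem), null) wrapper provides can be sketched with `Option` standing in for a nullable array column. These helper functions are hypothetical, for illustration only:

```rust
// Bare append, analogous to DataFusion's array_append applied to a
// non-null array value: it does not itself handle a null top-level row.
fn array_append(arr: &[i32], elem: i32) -> Vec<i32> {
    let mut out = arr.to_vec();
    out.push(elem);
    out
}

// Models CaseWhen(IsNotNull(arr), array_append(arr, elem), null):
// a null array row stays null, matching Spark's semantics.
fn wrapped_array_append(arr: Option<&[i32]>, elem: i32) -> Option<Vec<i32>> {
    arr.map(|a| array_append(a, elem))
}

fn main() {
    assert_eq!(
        wrapped_array_append(Some(&[1, 2, 3]), 4),
        Some(vec![1, 2, 3, 4])
    );
    // A NULL input array yields NULL, not an appended array.
    assert_eq!(wrapped_array_append(None, 4), None);
    println!("null rows preserved");
}
```

Since the wrapper covers the only genuine divergence from Spark, the remaining behavior is a match and the Compatible classification follows.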
How are these changes tested?

The existing SQL test file expressions/array/array_append.sql covers column inputs, literal inputs, NULL arrays, and NULL elements, across both dictionary and non-dictionary Parquet. The literal-only query that previously triggered the assertion failure now passes.

@andygrove andygrove marked this pull request as draft March 25, 2026 21:01
…usion DataFusion's array_append always returns a list with nullable elements (nullable: true on the inner field), but Spark's ArrayAppend.dataType can have containsNull = false when the input array has non-null elements (e.g. array(1, 2, 3)). This caused a runtime assertion failure when the promised type did not match the actual DataFusion output type. Fixes the literal-only query: SELECT array_append(array(1, 2, 3), 4), ...
@andygrove andygrove marked this pull request as ready for review March 25, 2026 23:56
Contributor

@comphead comphead left a comment


Thanks @andygrove

@comphead comphead merged commit 77bd8e0 into apache:main Mar 27, 2026
150 of 151 checks passed
