feat: Support Spark expression hours#3804

Open
0lai0 wants to merge 2 commits into apache:main from 0lai0:support_spark_hours

0lai0 commented Mar 27, 2026

Which issue does this PR close?

Closes #3125

Rationale for this change

Comet previously did not support the Spark hours expression (a V2 partition transform).
Queries using the hours function for partitioning would fall back to Spark's JVM execution instead of running natively on DataFusion. By adding native support for this expression, we allow more Spark workloads (especially those partitioned by hourly intervals) to benefit from Comet's native acceleration.

What changes are included in this PR?

This change adds end-to-end native support for the hours partition transform. Since Hours is a PartitionTransformExpression (and not a TimeZoneAwareExpression), the timezone is injected from the session configuration during the planning phase.
The native implementation uses Arrow's unary and try_unary kernels for vectorized computation, and handles pre-epoch (negative) timestamps correctly by using Euclidean floor division (div_euclid) rather than truncating division. TimestampType and TimestampNTZType are handled separately: TimestampType values have the session timezone offset applied, while TimestampNTZType values are computed directly from the wall-clock value.
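To illustrate why floor division matters for pre-epoch values, here is a minimal sketch of the arithmetic only. It is not the code in hours.rs: the function names are hypothetical, timestamps are assumed to be microseconds since the Unix epoch, and the timezone is simplified to a fixed offset in seconds (a real session timezone needs a per-timestamp offset lookup because of DST).

```rust
/// Microseconds in one hour.
const MICROS_PER_HOUR: i64 = 3_600_000_000;

/// Hours since the Unix epoch for a timestamp in microseconds.
/// `div_euclid` rounds toward negative infinity for a positive divisor,
/// so pre-epoch values land in the hour that contains them.
/// (Hypothetical helper, named for illustration only.)
fn hours_since_epoch(micros: i64) -> i64 {
    micros.div_euclid(MICROS_PER_HOUR)
}

/// Same computation after shifting by a fixed UTC offset in seconds.
/// Returns `None` if the shift overflows `i64`, which is the kind of
/// fallible case a `try_unary`-style kernel would surface as an error.
/// (Hypothetical helper; a fixed offset is a simplifying assumption.)
fn hours_with_offset(micros: i64, offset_secs: i64) -> Option<i64> {
    let offset_micros = offset_secs.checked_mul(1_000_000)?;
    let local = micros.checked_add(offset_micros)?;
    Some(local.div_euclid(MICROS_PER_HOUR))
}
```

With truncating division, a timestamp one microsecond before the epoch would give `-1 / 3_600_000_000 == 0` and be placed in the same hour as the epoch itself; `hours_since_epoch(-1)` gives `-1` instead.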

  • expr.proto: Added HoursTransform message definition to pass the child expression and session timezone.
  • datetime.scala: Added CometHours serde handler to intercept the Spark Hours expression and read the timezone from SQLConf.
  • QueryPlanSerde.scala: Registered the CometHours handler in the temporal expressions map.
  • hours.rs: Added SparkHoursTransform UDF using efficient Arrow kernels.
  • temporal.rs & expression_registry.rs: Registered the native Builder for the new expression.

How are these changes tested?

Added tests in both Rust and Scala:

  1. Rust unit tests: added unit tests in hours.rs covering:
    • Positive and negative (pre-epoch) timestamps
    • Epoch boundary (zero)
    • Timezone offset handling
    • Null propagation
    • Isolation of TimestampNTZType (ensuring it ignores timezone offsets)
    Run with: cargo test -p datafusion-comet-spark-expr -- datetime_funcs::hours
  2. Scala integration tests: verified end-to-end execution in CometTemporalExpressionSuite.
    Run with: ./mvnw test -pl spark -Dsuites='org.apache.comet.CometTemporalExpressionSuite'