
The goal is to compute a rolling average over 5-minute windows using Polars, where:

  • the window ends at each timestamp t
  • the left edge of the window is not strict — if there are no values exactly at t - 5min, it should include an earlier point
  • input data contains columns: timestamp and value

In Pandas, I used to handle this kind of logic easily by overriding BaseIndexer along with some custom Numba-accelerated logic to compute rolling window boundaries manually. I'd love to have similar functionality in Polars, but I don't know how to achieve it yet.
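For reference, the pandas approach described above can be sketched with a custom BaseIndexer (the class name and the timestamps/window_ns attributes are illustrative, and this sketch uses plain NumPy rather than Numba):

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

class NonStrictLeftIndexer(BaseIndexer):
    """Window [t - window, t] per row; if no point sits exactly on
    t - window, step the left edge back one extra row."""

    def get_window_bounds(self, num_values, min_periods, center, closed, step=None):
        ts = self.timestamps  # sorted int64 nanoseconds (stored via **kwargs)
        start = np.empty(num_values, dtype=np.int64)
        end = np.arange(1, num_values + 1, dtype=np.int64)  # window ends at row i
        for i in range(num_values):
            boundary = ts[i] - self.window_ns
            left = int(np.searchsorted(ts, boundary, side="left"))
            if left == len(ts) or ts[left] != boundary:
                left = max(left - 1, 0)  # non-strict edge: pull in one earlier point
            start[i] = left
        return start, end

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-04-10 10:00:01", "2024-04-10 10:01:30", "2024-04-10 10:03:10",
        "2024-04-10 10:05:00", "2024-04-10 10:06:00", "2024-04-10 10:08:10",
    ]),
    "value": [1, 2, 3, 4, 5, 6],
})
indexer = NonStrictLeftIndexer(
    timestamps=df["timestamp"].to_numpy().astype("int64"),
    window_ns=pd.Timedelta(minutes=5).value,
)
df["rolling_value"] = df["value"].rolling(indexer, min_periods=1).mean()
```

On the example data this produces the means 1.0, 1.5, 2.0, 2.5, 3.0, 4.5 shown in the expected output below.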

Here is an example:

```python
import polars as pl

df = pl.DataFrame({
    "timestamp": [
        "2024-04-10 10:00:01",
        "2024-04-10 10:01:30",
        "2024-04-10 10:03:10",
        "2024-04-10 10:05:00",
        "2024-04-10 10:06:00",
        "2024-04-10 10:08:10",
    ],
    "value": [1, 2, 3, 4, 5, 6],
}).with_columns(
    pl.col("timestamp").str.to_datetime()
)
```

┌─────────────────────┬───────┐
│ timestamp           ┆ value │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2024-04-10 10:00:01 ┆ 1     │
│ 2024-04-10 10:01:30 ┆ 2     │
│ 2024-04-10 10:03:10 ┆ 3     │
│ 2024-04-10 10:05:00 ┆ 4     │
│ 2024-04-10 10:06:00 ┆ 5     │
│ 2024-04-10 10:08:10 ┆ 6     │
└─────────────────────┴───────┘

After applying a rolling window, I want to get the following:

```python
(
    df.rolling("timestamp", period="5m+????", closed="both")  # ??????
    .agg(
        pl.col("value"),
        pl.mean("value").alias("rolling_value"),
    )
)
```

┌─────────────────────┬─────────────────┬───────────────┐
│ timestamp           ┆ value           ┆ rolling_value │
│ ---                 ┆ ---             ┆ ---           │
│ datetime[μs]        ┆ list[i64]       ┆ f64           │
╞═════════════════════╪═════════════════╪═══════════════╡
│ 2024-04-10 10:00:01 ┆ [1]             ┆ 1.0           │
│ 2024-04-10 10:01:30 ┆ [1, 2]          ┆ 1.5           │
│ 2024-04-10 10:03:10 ┆ [1, 2, 3]       ┆ 2.0           │
│ 2024-04-10 10:05:00 ┆ [1, 2, 3, 4]    ┆ 2.5           │
│ 2024-04-10 10:06:00 ┆ [1, 2, 3, 4, 5] ┆ 3.0           │  <- includes the first value
│ 2024-04-10 10:08:10 ┆ [3, 4, 5, 6]    ┆ 4.5           │
└─────────────────────┴─────────────────┴───────────────┘

2 Answers


Do a rolling aggregation with a left-exclusive window, then a backwards join_asof to add back the extra value you want from outside the window:

```python
(
    df.rolling("timestamp", period="5m", closed="right")
    .agg("value")
    .join_asof(
        df,
        left_on="timestamp",
        right_on=pl.col.timestamp.dt.offset_by("5m"),
    )
    .select(
        pl.col.timestamp,
        pl.when(pl.col.value_right.is_null())
        .then(pl.col.value)
        .otherwise(pl.concat_list(pl.col.value_right, pl.col.value)),
    )
    .with_columns(rolling_value=pl.col.value.list.mean())
)
```

2 Comments

Thank you! But I have 20 million rows of data, and it crashes. A regular .rolling(...).agg(pl.mean(...)) works in less than a second. I understand this solution has too much overhead because of the lists, and the mean is not calculated incrementally. Anyway, it seems the only real solution is a way in Polars to define custom rolling windows, maybe like .rolling("timestamp", period="5m+1e") or .rolling(window_size=pl.col("backward_offset"), ...)
@MikeChurch You can adapt the solution to calculate a sum + length and use those to compute the mean rather than use .list.mean().

You can use a cross join:

```python
import polars as pl
from datetime import timedelta

df = pl.DataFrame({
    "timestamp": [
        "2024-04-10 10:00:01",
        "2024-04-10 10:01:30",
        "2024-04-10 10:03:10",
        "2024-04-10 10:05:00",
        "2024-04-10 10:06:00",
        "2024-04-10 10:08:10",
    ],
    "value": [1, 2, 3, 4, 5, 6],
}).with_columns(
    pl.col("timestamp").str.strptime(pl.Datetime, "%Y-%m-%d %H:%M:%S")
).sort("timestamp")

df_cross = df.join(df, how="cross")

windowed = df_cross.filter(
    (pl.col("timestamp_right") <= pl.col("timestamp"))
    & (pl.col("timestamp_right") >= pl.col("timestamp") - timedelta(minutes=5))
)

result = windowed.group_by("timestamp").agg([
    pl.col("value_right").alias("value_window"),
    pl.col("value_right").mean().alias("rolling_value"),
])

print(result)
```

Output:

shape: (6, 3)
┌─────────────────────┬──────────────┬───────────────┐
│ timestamp           ┆ value_window ┆ rolling_value │
│ ---                 ┆ ---          ┆ ---           │
│ datetime[μs]        ┆ list[i64]    ┆ f64           │
╞═════════════════════╪══════════════╪═══════════════╡
│ 2024-04-10 10:01:30 ┆ [1, 2]       ┆ 1.5           │
│ 2024-04-10 10:03:10 ┆ [1, 2, 3]    ┆ 2.0           │
│ 2024-04-10 10:08:10 ┆ [3, 4, … 6]  ┆ 4.5           │
│ 2024-04-10 10:00:01 ┆ [1]          ┆ 1.0           │
│ 2024-04-10 10:06:00 ┆ [2, 3, … 5]  ┆ 3.5           │
│ 2024-04-10 10:05:00 ┆ [1, 2, … 4]  ┆ 2.5           │
└─────────────────────┴──────────────┴───────────────┘

1 Comment

Unfortunately, such a solution has disastrous performance for large amounts of data.
