
Rolling Time Window Average in Polars with Left-Edge Expansion

The goal is to compute a rolling average over 5-minute windows using Polars, where:

  • the window ends at each timestamp t
  • the left edge of the window is not strict: if no row falls exactly at t - 5min, the window should extend left to include the nearest earlier point
  • input data contains columns: timestamp and value

In pandas, I used to handle this kind of logic easily by subclassing BaseIndexer and computing the rolling window boundaries manually with some custom Numba-accelerated code. I'd love to have similar functionality in Polars, but I don't know how to achieve it yet.

Here is an example:

import polars as pl

df = pl.DataFrame({
    "timestamp": [
        "2024-04-10 10:00:01",
        "2024-04-10 10:01:30",
        "2024-04-10 10:03:10",
        "2024-04-10 10:05:00",
        "2024-04-10 10:06:00",
        "2024-04-10 10:08:10",
    ],
    "value": [1, 2, 3, 4, 5, 6],
}).with_columns(
    # Parse the strings into datetime[μs]
    pl.col("timestamp").str.to_datetime()
)
┌─────────────────────┬───────┐
│ timestamp           ┆ value │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2024-04-10 10:00:01 ┆ 1     │
│ 2024-04-10 10:01:30 ┆ 2     │
│ 2024-04-10 10:03:10 ┆ 3     │
│ 2024-04-10 10:05:00 ┆ 4     │
│ 2024-04-10 10:06:00 ┆ 5     │
│ 2024-04-10 10:08:10 ┆ 6     │
└─────────────────────┴───────┘

After applying a rolling window, I want to get the following:

(
    df.rolling("timestamp", period="5m+????", closed="both")  # ??????
    .agg(
        pl.col("value"),
        pl.mean("value").alias("rolling_value"),
    )
)
┌─────────────────────┬─────────────────┬───────────────┐
│ timestamp           ┆ value           ┆ rolling_value │
│ ---                 ┆ ---             ┆ ---           │
│ datetime[μs]        ┆ list[i64]       ┆ f64           │
╞═════════════════════╪═════════════════╪═══════════════╡
│ 2024-04-10 10:00:01 ┆ [1]             ┆ 1.0           │
│ 2024-04-10 10:01:30 ┆ [1, 2]          ┆ 1.5           │
│ 2024-04-10 10:03:10 ┆ [1, 2, 3]       ┆ 2.0           │
│ 2024-04-10 10:05:00 ┆ [1, 2, 3, 4]    ┆ 2.5           │
│ 2024-04-10 10:06:00 ┆ [1, 2, 3, 4, 5] ┆ 3.0           │ <- include the first value
│ 2024-04-10 10:08:10 ┆ [3, 4, 5, 6]    ┆ 4.5           │
└─────────────────────┴─────────────────┴───────────────┘
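
To pin down the rule I have in mind, here is a minimal plain-Python sketch of the semantics. It is not a Polars solution. The helper name `expanded_window_means` is made up for illustration, and it assumes the timestamps are already sorted in ascending order and that "an earlier point" means exactly one row before the nominal left edge.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def expanded_window_means(timestamps, values, period=timedelta(minutes=5)):
    """Reference semantics only (hypothetical helper, not a Polars API).

    Assumes `timestamps` is sorted ascending and aligned with `values`.
    """
    windows, means = [], []
    for i, t in enumerate(timestamps):
        left = t - period
        # First row at or after the nominal left edge (never past row i).
        start = bisect_left(timestamps, left, 0, i + 1)
        # No row sits exactly on the left edge: pull in one earlier row, if any.
        if timestamps[start] != left and start > 0:
            start -= 1
        window = values[start : i + 1]
        windows.append(window)
        means.append(sum(window) / len(window))
    return windows, means


raw = [
    "2024-04-10 10:00:01",
    "2024-04-10 10:01:30",
    "2024-04-10 10:03:10",
    "2024-04-10 10:05:00",
    "2024-04-10 10:06:00",
    "2024-04-10 10:08:10",
]
timestamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]
values = [1, 2, 3, 4, 5, 6]

windows, means = expanded_window_means(timestamps, values)
for t, w, m in zip(timestamps, windows, means):
    print(t, w, m)
```

On the sample data this gives the same windows and means as the expected table: the 10:06:00 row picks up the first value because nothing sits exactly at 10:01:00, while the 10:08:10 row does not expand because 10:03:10 is an exact match.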