
Fix correctness issue in predict_linear with step invariant #527

Open
harry671003 wants to merge 3 commits into thanos-io:main from harry671003:predict_linear_failure


@harry671003 harry671003 commented Mar 17, 2025

Issue

Our continuous correctness tests found an issue with predict_linear when its matrix selector argument is step invariant.

Example: predict_linear({__name__="http_requests_total", pod!~"nginx-1"}[5m] @ start(), -0.37690610678629094)

This PR addresses the problem by allowing the matrixScanner to operate in a step-invariant way, similar to the Prometheus engine.
See: https://github.com/prometheus/prometheus/blob/2a5ed8b8a55fecaa79236ef4adb9f0b82b34587c/promql/engine.go#L1788
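
To make the failure mode concrete, here is a minimal Go sketch. It is not the Prometheus or promql-engine code; `sample` and `predictLinear` are illustrative names. It assumes, as the Prometheus implementation appears to do, that the regression intercept is taken at the evaluation timestamp. With `@ start()` the window of samples is identical at every step, yet the prediction still depends on the step's evaluation time, so the result cannot be computed once and reused across steps.

```go
package main

import "fmt"

// sample is one (timestamp, value) point. Timestamps are in seconds.
// Hypothetical type, for illustration only.
type sample struct {
	t float64
	v float64
}

// predictLinear fits a line to the samples by ordinary least squares,
// measuring time relative to evalTs, and extrapolates `duration` seconds
// past evalTs. Assumption: the intercept is taken at the evaluation time.
func predictLinear(samples []sample, evalTs, duration float64) float64 {
	var n, sumX, sumY, sumXY, sumX2 float64
	for _, s := range samples {
		x := s.t - evalTs // time relative to the evaluation timestamp
		n++
		sumX += x
		sumY += s.v
		sumXY += x * s.v
		sumX2 += x * x
	}
	covXY := sumXY - sumX*sumY/n
	varX := sumX2 - sumX*sumX/n
	slope := covXY / varX
	intercept := sumY/n - slope*sumX/n
	return slope*duration + intercept
}

func main() {
	// A window pinned by `@ start()`: the same samples at every step.
	window := []sample{{0, 0}, {100, 10}, {200, 20}, {300, 30}}
	for _, evalTs := range []float64{300, 600} {
		fmt.Printf("evalTs=%v prediction=%v\n", evalTs, predictLinear(window, evalTs, 60))
	}
}
```

The two steps see the same window but produce different predictions, which is why caching the first step's output for the whole range gives wrong results.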

Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
@harry671003 harry671003 changed the title Fix failure in predict_linear Fix correctness issue in predict_linear with step invariant Mar 18, 2025
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
@harry671003 harry671003 force-pushed the predict_linear_failure branch from 2ec4376 to a43f047 Compare March 19, 2025 00:15
@harry671003 harry671003 marked this pull request as ready for review March 19, 2025 00:34
query: `predict_linear({__name__="http_requests_total",pod!~"nginx-1"}[5m] @ start(), -0.37690610678629094)`,
end: time.Unix(600, 0),
start: time.Unix(300, 0),
},
Contributor

Is this an issue for predict_linear only, or can it impact any function that takes a step-invariant matrix selector?

Contributor Author

It only impacts functions that are at-modifier unsafe and take a matrix selector as an argument.

https://github.com/prometheus/prometheus/blob/308c8c48c15c74a929c430447df3d3c3a3d4001f/promql/functions.go#L1914

So far this only applies to predict_linear.
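
As a rough sketch of that rule in Go: the map below is an illustrative copy of the function names Prometheus lists as at-modifier unsafe, paired with the kind of argument each takes. Both the copy and the `argKind` type are assumptions made for this example and may not match the linked commit exactly. The `affected` helper is hypothetical.

```go
package main

import "fmt"

// argKind is a hypothetical classification of a function's main argument.
type argKind int

const (
	noArg argKind = iota
	instantVector
	rangeVector
)

// atModifierUnsafe is an illustrative, possibly outdated copy of the
// function names Prometheus treats as at-modifier unsafe.
var atModifierUnsafe = map[string]argKind{
	"days_in_month":  instantVector,
	"day_of_month":   instantVector,
	"day_of_week":    instantVector,
	"day_of_year":    instantVector,
	"hour":           instantVector,
	"minute":         instantVector,
	"month":          instantVector,
	"year":           instantVector,
	"time":           noArg,
	"timestamp":      instantVector,
	"predict_linear": rangeVector,
}

// affected reports whether a function is both at-modifier unsafe and
// takes a matrix (range vector) selector as its argument.
func affected(fn string) bool {
	kind, unsafe := atModifierUnsafe[fn]
	return unsafe && kind == rangeVector
}

func main() {
	for _, fn := range []string{"predict_linear", "timestamp", "rate"} {
		fmt.Printf("%s affected: %v\n", fn, affected(fn))
	}
}
```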

@MichaHoffmann
Contributor

Can we just wrap the matrix function into step invariant operator? https://github.com/thanos-io/promql-engine/blob/main/execution/step_invariant/step_invariant.go#L47

@MichaHoffmann
Contributor

We could just push step invariance up in a preprocessor, like here: https://github.com/thanos-io/promql-engine/blob/main/logicalplan/plan.go#L335

@harry671003
Contributor Author

harry671003 commented Mar 19, 2025

> We could just push step invariance up in a preprocessor like here

> Can we just wrap the matrix function into step invariant operator?

The PromQL parser parses the query into:
[Image: Screenshot 2025-03-19 at 10 51 22 AM]

After calling promql.PreprocessExpr() in plan.go this becomes:
[Image: Screenshot 2025-03-19 at 10 52 11 AM]

In Prometheus, predict_linear is marked as at-modifier unsafe, so the call itself cannot be wrapped in a step-invariant expression:
https://github.com/prometheus/prometheus/blob/308c8c48c15c74a929c430447df3d3c3a3d4001f/promql/functions.go#L1914

@MichaHoffmann
Contributor

I see. I wonder if we should maybe just fall back for now, for correctness' sake. This seems to be a fairly niche use case that we must weigh against the added complexity.

@harry671003
Contributor Author

> I see, I wonder if we maybe should just fall back for now for correctness sake - this seems to be fairly niche usecase that we must weigh against added complexity

I'm okay with doing the fallback. There is one concern: we'll have to exclude the functions.test file from the acceptance tests.
If that is okay, I can create a PR.
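
A fallback along these lines could be sketched as a walk over the expression tree that rejects the problematic shape. Everything here is an assumption for illustration: `node` is a toy type rather than the promql-engine logical plan, and `errFallback` is a hypothetical sentinel, since this thread does not show how the engine actually signals a fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a toy expression node, NOT a promql-engine or Prometheus type.
type node struct {
	fn       string // function name; empty for selectors
	isMatrix bool   // true for a matrix (range) selector
	hasAt    bool   // true when the selector carries an @ modifier
	args     []*node
}

// errFallback is a hypothetical sentinel meaning "let another engine handle this".
var errFallback = errors.New("predict_linear over an @-pinned matrix selector is not supported")

// check walks the tree and returns errFallback if predict_linear is
// called with a matrix selector that has an @ modifier.
func check(n *node) error {
	if n == nil {
		return nil
	}
	if n.fn == "predict_linear" {
		for _, a := range n.args {
			if a != nil && a.isMatrix && a.hasAt {
				return errFallback
			}
		}
	}
	for _, a := range n.args {
		if err := check(a); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	pinned := &node{fn: "predict_linear", args: []*node{{isMatrix: true, hasAt: true}}}
	plain := &node{fn: "predict_linear", args: []*node{{isMatrix: true}}}
	fmt.Println(check(pinned)) // prints the fallback error
	fmt.Println(check(plain))  // prints <nil>
}
```

The trade-off matches the discussion above: the check is small and keeps results correct, at the cost of not executing these queries natively.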

3 participants