- Notifications
You must be signed in to change notification settings - Fork 15.3k
[LV] Count cost of middle block if TC <= VF. #168949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
fhahn merged 4 commits into llvm:main from fhahn:lv-consider-middle-block-cost-for-low-tripcount-loops Nov 24, 2025
Merged
Changes from all commits
Commits
Show all changes
4 commits Select commit Hold shift + click to select a range
243c8d7 [LV] Count cost of middle block if TC <= VF.
fhahn d5aa2ae Merge remote-tracking branch 'origin/main' into lv-consider-middle-bl…
fhahn 5a9560d !fixup update comments, thanks
fhahn 423b872 Merge branch 'main' into lv-consider-middle-block-cost-for-low-tripco…
fhahn File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -9257,6 +9257,7 @@ static InstructionCost calculateEarlyExitCost(VPCostContext &CostCtx, | |
| /// 2. In the case of loops with uncountable early exits, we may have to do | ||
| /// extra work when exiting the loop early, such as calculating the final | ||
| /// exit values of variables used outside the loop. | ||
| /// 3. The middle block, if expected TC <= VF.Width. | ||
| static bool isOutsideLoopWorkProfitable(GeneratedRTChecks &Checks, | ||
| VectorizationFactor &VF, Loop *L, | ||
| PredicatedScalarEvolution &PSE, | ||
| | @@ -9271,6 +9272,14 @@ static bool isOutsideLoopWorkProfitable(GeneratedRTChecks &Checks, | |
| // one exists. | ||
| TotalCost += calculateEarlyExitCost(CostCtx, Plan, VF.Width); | ||
| | ||
| // If the expected trip count is less than the VF, the vector loop will only | ||
| // execute a single iteration. Then the middle block is executed the same | ||
| // number of times as the vector region. | ||
| // TODO: Extend logic to always account for the cost of the middle block. | ||
| auto ExpectedTC = getSmallBestKnownTC(PSE, L); | ||
| if (ExpectedTC && ElementCount::isKnownLE(*ExpectedTC, VF.Width)) | ||
| TotalCost += Plan.getMiddleBlock()->cost(VF.Width, CostCtx); | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you update the comment above the function Also the comment further down: needs updating too I think. Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both should be updated, thanks! | ||
| | ||
| // When interleaving only scalar and vector cost will be equal, which in turn | ||
| // would lead to a divide by 0. Fall back to hard threshold. | ||
| if (VF.Width.isScalar()) { | ||
| | @@ -9301,9 +9310,11 @@ static bool isOutsideLoopWorkProfitable(GeneratedRTChecks &Checks, | |
| // The total cost of the vector loop is | ||
| // RtC + VecC * (TC / VF) + EpiC | ||
| // where | ||
| // * RtC is the cost of the generated runtime checks plus the cost of | ||
| // performing any additional work in the vector.early.exit block for loops | ||
| // with uncountable early exits. | ||
| // * RtC is the sum of the costs cost of | ||
| // - the generated runtime checks | ||
| // - performing any additional work in the vector.early.exit block for | ||
| // loops with uncountable early exits. | ||
| // - the middle block, if ExpectedTC <= VF.Width. | ||
| // * VecC is the cost of a single vector iteration. | ||
| // * TC is the actual trip count of the loop | ||
| // * VF is the vectorization factor | ||
| | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code now seems inconsistent with the sum of
calculateEarlyExitCostandChecks.getCost(), which is always adding on the cost of work in the early exits regardless of trip count. Also, the code below already seems to have proper logic to handle this more complelely, i.e.I don't know why we shouldn't just always add on the cost to be consistent with the existing code that expects it? It feels really odd to treat the middle block differently to the runtime check block and early exit block code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is intentional in the initial patch (from the description:
Note that isOutsideLoopWorkProfitable already scales the cost of blocks outside the vector region, but the patch restricts accounting for the middle block to cases where VF <= ExpectedTC, to initially catch some worst cases and avoid regressions.)The TODO is to enable it unconditionally, but that potentially has much higher impact, which is why I'd prefer to do this as a separate commit, to make it slightly easier to narrow down regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK fair enough. If there is going to be follow-on work to do this unconditionally I'm happy with that!