Skip to content

Conversation

@usx95
Copy link
Contributor

@usx95 usx95 commented Nov 1, 2025

Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps.

  • Added a FactID type using the utils::ID template to uniquely identify facts
  • Modified Fact class to store and manage IDs
  • Changed FactManager to use vector-based storage indexed by block ID instead of a map
  • Updated DataflowAnalysis to use vector-based storage for states instead of maps
  • Modified lookups to use ID-based indexing for better performance

Improves compile time hit on long-tail targets like tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o​ from 21% to 3.2%

Copy link
Contributor Author

usx95 commented Nov 1, 2025

@usx95 usx95 changed the title Avoid using DenseMap for CFGBlock and program points [LifetimeSafety] Optimize fact storage with IDs and vector-based lookup Nov 1, 2025
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 4 times, most recently from f48ebcf to 7e6051b Compare November 1, 2025 04:21
@usx95 usx95 marked this pull request as ready for review November 1, 2025 04:24
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) labels Nov 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang-temporal-safety

Author: Utkarsh Saxena (usx95)

Changes

Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps.

  • Added a FactID type using the utils::ID template to uniquely identify facts
  • Modified Fact class to store and manage IDs
  • Changed FactManager to use vector-based storage indexed by block ID instead of a map
  • Updated DataflowAnalysis to use vector-based storage for states instead of maps
  • Modified lookups to use ID-based indexing for better performance

Improves compile time hit on long-tail targets like tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o​ from 21% to 3.2%


Full diff: https://github.com/llvm/llvm-project/pull/165963.diff

4 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h (+23-8)
  • (modified) clang/lib/Analysis/LifetimeSafety/Dataflow.h (+9-5)
  • (modified) clang/lib/Analysis/LifetimeSafety/Facts.cpp (+5-8)
  • (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+1)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h index 063cb5c2d42ab..b9cad5340c940 100644 --- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h +++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h @@ -16,6 +16,7 @@ #include "clang/Analysis/Analyses/LifetimeSafety/Loans.h" #include "clang/Analysis/Analyses/LifetimeSafety/Origins.h" +#include "clang/Analysis/Analyses/LifetimeSafety/Utils.h" #include "clang/Analysis/AnalysisDeclContext.h" #include "clang/Analysis/CFG.h" #include "llvm/ADT/SmallVector.h" @@ -23,6 +24,9 @@ #include <cstdint> namespace clang::lifetimes::internal { + +using FactID = utils::ID<struct FactTag>; + /// An abstract base class for a single, atomic lifetime-relevant event. class Fact { @@ -48,6 +52,7 @@ class Fact { private: Kind K; + FactID ID; protected: Fact(Kind K) : K(K) {} @@ -56,6 +61,9 @@ class Fact { virtual ~Fact() = default; Kind getKind() const { return K; } + void setID(FactID ID) { this->ID = ID; } + FactID getID() const { return ID; } + template <typename T> const T *getAs() const { if (T::classof(this)) return static_cast<const T *>(this); @@ -183,22 +191,26 @@ class TestPointFact : public Fact { class FactManager { public: + void init(const CFG &Cfg) { + assert(BlockToFacts.empty() && "FactManager already initialized"); + BlockToFacts.resize(Cfg.getNumBlockIDs()); + } + llvm::ArrayRef<const Fact *> getFacts(const CFGBlock *B) const { - auto It = BlockToFactsMap.find(B); - if (It != BlockToFactsMap.end()) - return It->second; - return {}; + return BlockToFacts[B->getBlockID()]; } void addBlockFacts(const CFGBlock *B, llvm::ArrayRef<Fact *> NewFacts) { if (!NewFacts.empty()) - BlockToFactsMap[B].assign(NewFacts.begin(), NewFacts.end()); + BlockToFacts[B->getBlockID()].assign(NewFacts.begin(), NewFacts.end()); } template <typename FactType, typename... Args> FactType *createFact(Args &&...args) { void *Mem = FactAllocator.Allocate<FactType>(); - return new (Mem) FactType(std::forward<Args>(args)...); + FactType *Res = new (Mem) FactType(std::forward<Args>(args)...); + Res->setID(NextFactID++); + return Res; } void dump(const CFG &Cfg, AnalysisDeclContext &AC) const; @@ -214,16 +226,19 @@ class FactManager { /// \note This is intended for testing only. llvm::StringMap<ProgramPoint> getTestPoints() const; + unsigned getNumFacts() const { return NextFactID.Value; } + LoanManager &getLoanMgr() { return LoanMgr; } const LoanManager &getLoanMgr() const { return LoanMgr; } OriginManager &getOriginMgr() { return OriginMgr; } const OriginManager &getOriginMgr() const { return OriginMgr; } private: + FactID NextFactID{0}; LoanManager LoanMgr; OriginManager OriginMgr; - llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<const Fact *>> - BlockToFactsMap; + /// Facts for each CFG block, indexed by block ID. + llvm::SmallVector<llvm::SmallVector<const Fact *>> BlockToFacts; llvm::BumpPtrAllocator FactAllocator; }; } // namespace clang::lifetimes::internal diff --git a/clang/lib/Analysis/LifetimeSafety/Dataflow.h b/clang/lib/Analysis/LifetimeSafety/Dataflow.h index 2f7bcb6e5dc81..de821bb17eb6b 100644 --- a/clang/lib/Analysis/LifetimeSafety/Dataflow.h +++ b/clang/lib/Analysis/LifetimeSafety/Dataflow.h @@ -67,10 +67,10 @@ class DataflowAnalysis { llvm::DenseMap<const CFGBlock *, Lattice> InStates; /// The dataflow state after a basic block is processed. llvm::DenseMap<const CFGBlock *, Lattice> OutStates; - /// The dataflow state at a Program Point. + /// Dataflow state at each program point, indexed by Fact ID. /// In a forward analysis, this is the state after the Fact at that point has /// been applied, while in a backward analysis, it is the state before. - llvm::DenseMap<ProgramPoint, Lattice> PerPointStates; + llvm::SmallVector<Lattice> PointToState; static constexpr bool isForward() { return Dir == Direction::Forward; } @@ -86,6 +86,8 @@ class DataflowAnalysis { Derived &D = static_cast<Derived &>(*this); llvm::TimeTraceScope Time(D.getAnalysisName()); + PointToState.resize(FactMgr.getNumFacts()); + using Worklist = std::conditional_t<Dir == Direction::Forward, ForwardDataflowWorklist, BackwardDataflowWorklist>; @@ -116,7 +118,9 @@ class DataflowAnalysis { } protected: - Lattice getState(ProgramPoint P) const { return PerPointStates.lookup(P); } + Lattice getState(ProgramPoint P) const { + return PointToState[P->getID().Value]; + } std::optional<Lattice> getInState(const CFGBlock *B) const { auto It = InStates.find(B); @@ -144,12 +148,12 @@ class DataflowAnalysis { if constexpr (isForward()) { for (const Fact *F : Facts) { State = transferFact(State, F); - PerPointStates[F] = State; + PointToState[F->getID().Value] = State; } } else { for (const Fact *F : llvm::reverse(Facts)) { // In backward analysis, capture the state before applying the fact. - PerPointStates[F] = State; + PointToState[F->getID().Value] = State; State = transferFact(State, F); } } diff --git a/clang/lib/Analysis/LifetimeSafety/Facts.cpp b/clang/lib/Analysis/LifetimeSafety/Facts.cpp index ecde390cd6ca3..4a4172fe55bf3 100644 --- a/clang/lib/Analysis/LifetimeSafety/Facts.cpp +++ b/clang/lib/Analysis/LifetimeSafety/Facts.cpp @@ -64,8 +64,8 @@ void TestPointFact::dump(llvm::raw_ostream &OS, const LoanManager &, llvm::StringMap<ProgramPoint> FactManager::getTestPoints() const { llvm::StringMap<ProgramPoint> AnnotationToPointMap; - for (const CFGBlock *Block : BlockToFactsMap.keys()) { - for (const Fact *F : getFacts(Block)) { + for (const auto &BlockFacts : BlockToFacts) { + for (const Fact *F : BlockFacts) { if (const auto *TPF = F->getAs<TestPointFact>()) { StringRef PointName = TPF->getAnnotation(); assert(AnnotationToPointMap.find(PointName) == @@ -88,12 +88,9 @@ void FactManager::dump(const CFG &Cfg, AnalysisDeclContext &AC) const { // Print blocks in the order as they appear in code for a stable ordering. for (const CFGBlock *B : *AC.getAnalysis<PostOrderCFGView>()) { llvm::dbgs() << " Block B" << B->getBlockID() << ":\n"; - auto It = BlockToFactsMap.find(B); - if (It != BlockToFactsMap.end()) { - for (const Fact *F : It->second) { - llvm::dbgs() << " "; - F->dump(llvm::dbgs(), LoanMgr, OriginMgr); - } + for (const Fact *F : getFacts(B)) { + llvm::dbgs() << " "; + F->dump(llvm::dbgs(), LoanMgr, OriginMgr); } llvm::dbgs() << " End of Block\n"; } diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp index 00c7ed90503e7..a51ba4280f284 100644 --- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp +++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp @@ -41,6 +41,7 @@ void LifetimeSafetyAnalysis::run() { const CFG &Cfg = *AC.getCFG(); DEBUG_WITH_TYPE("PrintCFG", Cfg.dump(AC.getASTContext().getLangOpts(), /*ShowColors=*/true)); + FactMgr.init(Cfg); FactsGenerator FactGen(FactMgr, AC); FactGen.run(); 
@usx95 usx95 requested review from Xazax-hun and ymand November 1, 2025 04:26
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 7e6051b to 66119a7 Compare November 4, 2025 15:53
@usx95 usx95 force-pushed the users/usx95/10-30-persistent-origin-optimisation branch from 33c0c70 to 0bbc422 Compare November 4, 2025 15:53
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 66119a7 to e177f8d Compare November 7, 2025 03:02
@usx95 usx95 force-pushed the users/usx95/10-30-persistent-origin-optimisation branch from 0bbc422 to 667242a Compare November 7, 2025 03:02
Base automatically changed from users/usx95/10-30-persistent-origin-optimisation to main November 7, 2025 03:02
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 3 times, most recently from 9c32808 to 5029ce8 Compare November 7, 2025 03:41
Copy link
Contributor Author

usx95 commented Nov 7, 2025

Merge activity

  • Nov 7, 3:41 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 3:57 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:00 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:05 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:27 AM UTC: @usx95 merged this pull request with Graphite.
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 2 times, most recently from 2dd60af to 2f4af6f Compare November 7, 2025 03:59
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 2f4af6f to 0dafeb9 Compare November 7, 2025 04:04
@usx95 usx95 merged commit c9b4169 into main Nov 7, 2025
7 of 9 checks passed
@usx95 usx95 deleted the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch November 7, 2025 04:27
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 7, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-darwin running on doug-worker-3 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/15335

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure) ******************** TEST 'Clang :: Headers/stdckdint.c' FAILED ******************** Exit Code: 127 Command Output (stdout): -- # RUN: at line 2 /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/22/include -nostdsysteminc -triple=x86_64 -emit-llvm -verify -std=c99 /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c -o - | /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c # executed command: /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/22/include -nostdsysteminc -triple=x86_64 -emit-llvm -verify -std=c99 /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c -o - # .---command stderr------------ # | Could not create process (/Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang) due to [Errno 24] Too many open files # `----------------------------- # error: command failed with exit status: 127 -- ******************** 
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
…up (llvm#165963) Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps. - Added a `FactID` type using the `utils::ID` template to uniquely identify facts - Modified `Fact` class to store and manage IDs - Changed `FactManager` to use vector-based storage indexed by block ID instead of a map - Updated `DataflowAnalysis` to use vector-based storage for states instead of maps - Modified lookups to use ID-based indexing for better performance Improves compile time hit on long-tail targets like `tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o`​ from [21%](http://llvm-compile-time-tracker.com/compare_clang.php?from=6e25a04027ca786b7919657c7df330a33985ceea&to=20b42efa277c8b1915db757863e1fc26531cfd53&stat=instructions%3Au&sortBy=absolute-difference) to [3.2%](http://llvm-compile-time-tracker.com/compare_clang.php?from=6e25a04027ca786b7919657c7df330a33985ceea&to=d2d1cd1109c3a85344457bfff6f092ae7b96b211&stat=instructions%3Au&sortBy=absolute-difference)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) clang Clang issues not falling into any other category

6 participants