Skip to content

Conversation

@hahnjo
Copy link
Member

@hahnjo hahnjo commented Mar 26, 2025

  • Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
  • Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
  • Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
  • Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
    arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Mar 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 26, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jonas Hahnfeld (hahnjo)

Changes
  • Hash inner template arguments: The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
  • Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
  • Load only needed partial specializations: Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
  • Remove bail-out logic in TemplateArgumentHasher: While it is correct to assign a single fixed hash to all template
    arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.

Full diff: https://github.com/llvm/llvm-project/pull/133057.diff

3 Files Affected:

  • (modified) clang/lib/AST/DeclTemplate.cpp (-6)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+2-8)
  • (modified) clang/lib/Serialization/TemplateArgumentHasher.cpp (+18-33)
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp index c0f5be51db5f3..8560c3928aa84 100644 --- a/clang/lib/AST/DeclTemplate.cpp +++ b/clang/lib/AST/DeclTemplate.cpp @@ -367,12 +367,6 @@ bool RedeclarableTemplateDecl::loadLazySpecializationsImpl( if (!ExternalSource) return false; - // If TPL is not null, it implies that we're loading specializations for - // partial templates. We need to load all specializations in such cases. - if (TPL) - return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(), - /*OnlyPartial=*/false); - return ExternalSource->LoadExternalSpecializations(this->getCanonicalDecl(), Args); } diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp index 0cd2cedb48dd9..eb0496c97eb3b 100644 --- a/clang/lib/Serialization/ASTReader.cpp +++ b/clang/lib/Serialization/ASTReader.cpp @@ -7891,14 +7891,8 @@ void ASTReader::CompleteRedeclChain(const Decl *D) { } } - if (Template) { - // For partitial specialization, load all the specializations for safety. - if (isa<ClassTemplatePartialSpecializationDecl, - VarTemplatePartialSpecializationDecl>(D)) - Template->loadLazySpecializationsImpl(); - else - Template->loadLazySpecializationsImpl(Args); - } + if (Template) + Template->loadLazySpecializationsImpl(Args); } CXXCtorInitializer ** diff --git a/clang/lib/Serialization/TemplateArgumentHasher.cpp b/clang/lib/Serialization/TemplateArgumentHasher.cpp index 3c7177b83ba52..5fb363c4ab148 100644 --- a/clang/lib/Serialization/TemplateArgumentHasher.cpp +++ b/clang/lib/Serialization/TemplateArgumentHasher.cpp @@ -21,17 +21,6 @@ using namespace clang; namespace { class TemplateArgumentHasher { - // If we bail out during the process of calculating hash values for - // template arguments for any reason. We're allowed to do it since - // TemplateArgumentHasher are only required to give the same hash value - // for the same template arguments, but not required to give different - // hash value for different template arguments. - // - // So in the worst case, it is still a valid implementation to give all - // inputs the same BailedOutValue as output. - bool BailedOut = false; - static constexpr unsigned BailedOutValue = 0x12345678; - llvm::FoldingSetNodeID ID; public: @@ -41,14 +30,7 @@ class TemplateArgumentHasher { void AddInteger(unsigned V) { ID.AddInteger(V); } - unsigned getValue() { - if (BailedOut) - return BailedOutValue; - - return ID.computeStableHash(); - } - - void setBailedOut() { BailedOut = true; } + unsigned getValue() { return ID.computeStableHash(); } void AddType(const Type *T); void AddQualType(QualType T); @@ -92,8 +74,7 @@ void TemplateArgumentHasher::AddTemplateArgument(TemplateArgument TA) { case TemplateArgument::Expression: // If we meet expression in template argument, it implies // that the template is still dependent. It is meaningless - // to get a stable hash for the template. Bail out simply. - BailedOut = true; + // to get a stable hash for the template. break; case TemplateArgument::Pack: AddInteger(TA.pack_size()); @@ -110,10 +91,9 @@ void TemplateArgumentHasher::AddStructuralValue(const APValue &Value) { // 'APValue::Profile' uses pointer values to make hash for LValue and // MemberPointer, but they differ from one compiler invocation to another. - // It may be difficult to handle such cases. Bail out simply. + // It may be difficult to handle such cases. if (Kind == APValue::LValue || Kind == APValue::MemberPointer) { - BailedOut = true; return; } @@ -135,14 +115,11 @@ void TemplateArgumentHasher::AddTemplateName(TemplateName Name) { case TemplateName::DependentTemplate: case TemplateName::SubstTemplateTemplateParm: case TemplateName::SubstTemplateTemplateParmPack: - BailedOut = true; break; case TemplateName::UsingTemplate: { UsingShadowDecl *USD = Name.getAsUsingShadowDecl(); if (USD) AddDecl(USD->getTargetDecl()); - else - BailedOut = true; break; } case TemplateName::DeducedTemplate: @@ -167,7 +144,6 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) { case DeclarationName::ObjCZeroArgSelector: case DeclarationName::ObjCOneArgSelector: case DeclarationName::ObjCMultiArgSelector: - BailedOut = true; break; case DeclarationName::CXXConstructorName: case DeclarationName::CXXDestructorName: @@ -194,16 +170,29 @@ void TemplateArgumentHasher::AddDeclarationName(DeclarationName Name) { void TemplateArgumentHasher::AddDecl(const Decl *D) { const NamedDecl *ND = dyn_cast<NamedDecl>(D); if (!ND) { - BailedOut = true; return; } AddDeclarationName(ND->getDeclName()); + + // If this was a specialization we should take into account its template + // arguments. This helps to reduce collisions coming when visiting template + // specialization types (eg. when processing type template arguments). + ArrayRef<TemplateArgument> Args; + if (auto *CTSD = dyn_cast<ClassTemplateSpecializationDecl>(D)) + Args = CTSD->getTemplateArgs().asArray(); + else if (auto *VTSD = dyn_cast<VarTemplateSpecializationDecl>(D)) + Args = VTSD->getTemplateArgs().asArray(); + else if (auto *FD = dyn_cast<FunctionDecl>(D)) + if (FD->getTemplateSpecializationArgs()) + Args = FD->getTemplateSpecializationArgs()->asArray(); + + for (auto &TA : Args) + AddTemplateArgument(TA); } void TemplateArgumentHasher::AddQualType(QualType T) { if (T.isNull()) { - BailedOut = true; return; } SplitQualType split = T.split(); @@ -213,7 +202,6 @@ void TemplateArgumentHasher::AddQualType(QualType T) { // Process a Type pointer. Add* methods call back into TemplateArgumentHasher // while Visit* methods process the relevant parts of the Type. -// Any unhandled type will make the hash computation bail out. class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> { typedef TypeVisitor<TypeVisitorHelper> Inherited; llvm::FoldingSetNodeID &ID; @@ -245,9 +233,6 @@ class TypeVisitorHelper : public TypeVisitor<TypeVisitorHelper> { void Visit(const Type *T) { Inherited::Visit(T); } - // Unhandled types. Bail out simply. - void VisitType(const Type *T) { Hash.setBailedOut(); } - void VisitAdjustedType(const AdjustedType *T) { AddQualType(T->getOriginalType()); } 
@ChuanqiXu9
Copy link
Member

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

@hahnjo
Copy link
Member Author

hahnjo commented Mar 26, 2025

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

@vgvassilev
Copy link
Contributor

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

@hahnjo
Copy link
Member Author

hahnjo commented Mar 26, 2025

Performance measurements with LLVM

I tested these patches for building LLVM itself with modules (LLVM_ENABLE_MODULES=ON). To work around #130795, I apply #131354 before building Clang. In terms of overall performance for the entire build, I'm not able to measure a difference in memory consumption because that is dominated by the linker. The run time performance is very noisy, so it's hard to make accurate statements but it looks unaffected as well.

I did some measurements for individual files, chosen by searching for large object files and excluding generated files. For each version, I first build LLVM completely to populate the module.cache and then delete and rebuild only one object file. Run time performance is not hugely affected, it seems to get slightly faster with this PR.

Maximum resident set size (kbytes) from /usr/bin/time -v:

object file before* main this PR
lib/Analysis/CMakeFiles/LLVMAnalysis.dir/ScalarEvolution.cpp.o 543100 515184 445784
lib/Passes/CMakeFiles/LLVMPasses.dir/PassBuilder.cpp.o 923036 884160 805960
lib/Transforms/IPO/CMakeFiles/LLVMipo.dir/AttributorAttributes.cpp.o 639184 600076 522512
lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/SLPVectorizer.cpp.o 876580 857404 776572

before*: reverting fb2c9d9, c5e4afe, 30ea0f0, 20e9049 on current main

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025
hahnjo added a commit to devajithvs/root that referenced this pull request Mar 26, 2025
@ilya-biryukov
Copy link
Contributor

@ilya-biryukov, would you mind giving this PR a test on your infrastructure and if it works maybe share some performance results?

Sure, let me try kicking it off. Note that our infrastructure is much better at detecting the compilations timing out than providing proper benchmarking at scale (there are a few targeted benchmarks too, though).
That means we're good and detecting big regressions, but won't be able to provide very reliable performance measurements.

I'll try to give you what we have, though.

@ChuanqiXu9
Copy link
Member

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

@ChuanqiXu9
Copy link
Member

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025
@hahnjo
Copy link
Member Author

hahnjo commented Mar 28, 2025

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm export module partial; export template <typename S, typename T, typename U> struct Partial { static constexpr int Value() { return 0; } }; export template <typename T, typename U> struct Partial<int, T, U> { static constexpr int Value() { return 1; } }; //--- partial.cpp import partial; static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

@vgvassilev
Copy link
Contributor

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

@ChuanqiXu9
Copy link
Member

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

I'm ok with pushing the commits one-by-one after the PR is reviewed, just let me know.

Complete only needed partial specializations: It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.

This comes from the logic: if we have a partial template specialization A<int, T, U> and we need a full specialization for A<int, double, double>, we hope the partial specialization to be loaded

Sure, but in my understanding, that's not needed on the ASTReader side but is taken care of by Sema (?). For the following example:

//--- partial.cppm export module partial; export template <typename S, typename T, typename U> struct Partial { static constexpr int Value() { return 0; } }; export template <typename T, typename U> struct Partial<int, T, U> { static constexpr int Value() { return 1; } }; //--- partial.cpp import partial; static_assert(Partial<int, double, double>::Value() == 1);

(I assume that's what you have in mind?) I see two calls to ASTReader::CompleteRedeclChain (with this PR applied): The first asks for the full instantiation Partial<int, double, double> and regardless of what we load, the answer to the query is that it's not defined yet. The second asks for the partial specialization Partial<int, T, U> and then instantiation proceeds to do the right thing.

If it works, I feel good with it.

@ChuanqiXu9
Copy link
Member

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

@vgvassilev
Copy link
Contributor

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.

Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

@ChuanqiXu9
Copy link
Member

While I may not able to look into them in detail recently, it may be helpful to split this into seperate patches to review and to land.

I initially considered this, but @vgvassilev said in root-project/root#17722 (comment) he prefers a single PR, also for external testing.

Maybe you can test it with this and land it with different patches. So that we can revert one of them if either of them are problematic but other parts are fine.

This is a relatively small patch focused on reducing the round trips to modules deserialization. I see this as an atomic change that if it goes in partially would defeat its purpose. What's the goal of a partial optimization?

I think partial optimizations are optimization too. If these codes are not dependent on each other, it should be better to split them.
Given the scale of the patch, it may not be serious problem actually. I still think it is better to land them separately, but if you want to save some typings. I don't feel too bad.

Honestly I am more concerned about the tests that @ilya-biryukov is running. As long as they are happy I do not particularly care about commit style. Although it'd be weird to land 40 line patch in many commits :)

I don't feel odd. I remember it is (or was) LLVM's policy that smaller patches are preferred : )

hahnjo added a commit to devajithvs/root that referenced this pull request Mar 28, 2025
@ilya-biryukov
Copy link
Contributor

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

We did hit one compiler error that does not reproduce without modules, however:
error: use of overloaded operator '=' is ambiguous

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

@vgvassilev
Copy link
Contributor

The small-scale benchmarks we had show 10% improvement in CPU and 23% improvement in memory usage for some compilations!

That's very good news. I think we can further reduce these times. IIRC, we still deserialize declarations that we do not need. One of the places to look is the logic that kicks in when at module loading time:

llvm::Error ASTReader::ReadASTBlock(ModuleFile &F,

We did hit one compiler error that does not reproduce without modules, however: error: use of overloaded operator '=' is ambiguous

Ouch. If that's the only issue on your infrastructure that's probably not so bad.

We're in the process of getting a small reproducer (please bear with us, it takes some time) that we can share. @emaxx-google is working on it.

@emaxx-google
Copy link
Contributor

emaxx-google commented Apr 1, 2025

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

@hahnjo
Copy link
Member Author

hahnjo commented Apr 1, 2025

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)

UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 { public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

@emaxx-google
Copy link
Contributor

emaxx-google commented Apr 1, 2025

Here's the (almost) minimized reproducer for this error: use of overloaded operator '=' is ambiguous error: https://pastebin.com/Ux7TiQhw . (The minimization tool isn't perfect, we know, but we opted to share this result sooner rather than later.)
UPD: To run the reproducer, first "unpack" the archive into separate files using LLVM's split-file (e.g., split-file repro.txt repro/), then run the makefile: CLANG=path/to/clang make -k -C repro.

Thanks for the efforts! I only had a very quick look and it seems the paste is not complete. For example, head1.h has

class Class1 { public:

and many other definitions look incomplete as well. Can you check if there was maybe a mistake?

That's how it looks like - the minimizer tool (based on C-Reduce/C-Vise) basically works by randomly removing chunks of code, which does often end up with code that looks corrupted. The tool could at least do a better job by merging/inlining unnecessary headers, macros, etc., but at least the output, as shared, should be sufficient to trigger the error in question (error: use of overloaded operator '=' is ambiguous). Let me know whether this works for you.

@hahnjo
Copy link
Member Author

hahnjo commented Apr 1, 2025

I had a closer look, but I get plenty of compile errors already on main - including

./head15.h:20:7: error: use of overloaded operator '=' is ambiguous (with operand types 'std::vector<absl::string_view>' and 'strings_internal::Splitter<typename strings_internal::SelectDelimiter<char>::type, AllowEmpty, std::string>' (aka 'Splitter<char, strings_internal::AllowEmpty, basic_string>')) 

I haven't even applied the change in this PR - what am I missing?

@hahnjo
Copy link
Member Author

hahnjo commented Sep 4, 2025

Hm, then I need more information how to reproduce before I can meaningfully look into it...

@hahnjo
Copy link
Member Author

hahnjo commented Sep 15, 2025

@ilya-biryukov @emaxx-google ping, any updates on this? I would be curious to learn more details about the "rather special build mode" and, if possible, if #154158 on its own is good to go

@emaxx-google
Copy link
Contributor

@ilya-biryukov @emaxx-google ping, any updates on this? I would be curious to learn more details about the "rather special build mode" and, if possible, if #154158 on its own is good to go

I asked for help from the corresponding team a couple of weeks ago, but we haven't got anything that we can share yet, sorry.

hahnjo added a commit that referenced this pull request Oct 6, 2025
With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in #133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 6, 2025
With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in llvm/llvm-project#133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 6, 2025
With lazy template loading, it is possible to find non-canonical FunctionDecls, depending on when redecl chains are completed. This is a problem for templated conversion operators that would allow to call either the copy assignment or the move assignment operator. This ambiguity is resolved by isBetterReferenceBindingKind (called from CompareStandardConversionSequences) ranking rvalue refs over lvalue refs. Unfortunately, this fix is hard to test in isolation without the changes in llvm#133057 that make lazy template loading more likely to complete redecl chains at "inconvenient" times. The added reproducer passes before and after this commit, but would have failed with the proposed changes of the linked PR. Kudos to Maksim Ivanov for providing an initial version of the reproducer that I further simplified.
The code is applied from ODRHash::AddDecl with the reasoning given in the comment, to reduce collisions. This was particularly visible with STL types templated on std::pair where its template arguments were not taken into account.
It is unclear (to me) why this needs to be done "for safety", but this change significantly improves the effectiveness of lazy loading.
Similar as the last commit, it is unclear why we need to load all specializations, including non-partial ones, when we have a TPL.
While it is correct to assign a single fixed hash to all template arguments, it can reduce the effectiveness of lazy loading and is not actually needed: we are allowed to ignore parts that cannot be handled because they will be analogously ignored by all hashings.
@hahnjo hahnjo force-pushed the clang-modules-lazy branch from 5b5e242 to c44e2b2 Compare October 18, 2025 11:48
@hahnjo
Copy link
Member Author

hahnjo commented Oct 18, 2025

I rebased the changes after #154158. Would it be possible to re-test / give information about the observed failures? I understand (public) reproducers take time, but it would be great to get at least something to understand how to trigger...

@alexfh
Copy link
Contributor

alexfh commented Oct 27, 2025

@hahnjo I'll take care of testing this internally at Google.

@alexfh
Copy link
Contributor

alexfh commented Oct 29, 2025

So far I found one assertion failure:

assert.h assertion failed at llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!" *** Check failure stack trace: *** @ 0x55e7b65cf564 absl::log_internal::LogMessage::SendToLog() @ 0x55e7b65cf525 absl::log_internal::LogMessage::Flush() @ 0x55e7b67aed24 __assert_fail @ 0x55e7b28c7646 clang::ASTContext::getASTRecordLayout() @ 0x55e7b2295e48 clang::ASTContext::getTypeInfoImpl() @ 0x55e7b2297297 clang::ASTContext::getTypeInfo() @ 0x55e7b22979bc clang::ASTContext::getTypeAlignInChars() @ 0x55e7b07188d3 clang::CodeGen::CodeGenFunction::CreateIRTemp() @ 0x55e7b09843bc clang::CodeGen::CodeGenFunction::StartFunction() @ 0x55e7b0986716 clang::CodeGen::CodeGenFunction::GenerateCode() @ 0x55e7b09b20ec clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition() @ 0x55e7b09a9662 clang::CodeGen::CodeGenModule::EmitGlobalDefinition() @ 0x55e7b099a226 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b099a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b099a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b099a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b099a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b099a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e7b0996c6e clang::CodeGen::CodeGenModule::Release() @ 0x55e7b0adc8ae (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit() @ 0x55e7b063685a clang::BackendConsumer::HandleTranslationUnit() @ 0x55e7b14daca8 clang::ParseAST() @ 0x55e7b120f68a clang::FrontendAction::Execute() @ 0x55e7b118333d clang::CompilerInstance::ExecuteAction() @ 0x55e7b0635e2e clang::ExecuteCompilerInvocation() @ 0x55e7b0628f5e cc1_main() @ 0x55e7b0625f99 ExecuteCC1Tool() @ 0x55e7b062874c llvm::function_ref<>::callback_fn<>() @ 0x55e7b1344afe llvm::function_ref<>::callback_fn<>() @ 0x55e7b64224bc llvm::CrashRecoveryContext::RunSafely() @ 0x55e7b1343fe4 clang::driver::CC1Command::Execute() @ 0x55e7b13017b3 clang::driver::Compilation::ExecuteCommand() @ 0x55e7b1301a4f clang::driver::Compilation::ExecuteJobs() @ 0x55e7b131bd20 clang::driver::Driver::ExecuteCompilation() @ 0x55e7b06255a7 clang_main() @ 0x55e7b06238f4 main @ 0x7f012f05a352 __libc_start_main @ 0x55e7b062382a _start 

Trying to produce a test case.

@alexfh
Copy link
Contributor

alexfh commented Oct 29, 2025

There's also a number of compilations that became significantly slower (or hit an infinite loop in Clang).

@alexfh
Copy link
Contributor

alexfh commented Oct 30, 2025

So far I found one assertion failure:

assert.h assertion failed at llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!" ... Trying to produce a test case. 

The reduction is running, but it's going to take a lot of time, I'm afraid (as it usually is with module-related bugs). The initial size of the inputs is around 450MB.

@alexfh
Copy link
Contributor

alexfh commented Nov 5, 2025

So far I found one assertion failure:

assert.h assertion failed at llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!" ... Trying to produce a test case. 

The reduction is running, but it's going to take a lot of time, I'm afraid (as it usually is with module-related bugs). The initial size of the inputs is around 450MB.

The reduction has been crawling for 5 days now. The input is down to 275MB, but the progress has slowed down a lot.

@hahnjo
Copy link
Member Author

hahnjo commented Nov 5, 2025

The reduction has been crawling for 5 days now. The input is down to 275MB, but the progress has slowed down a lot.

Ouch... I don't know if it would already be possible to share it (offline with me only if needed) or does it still contain Google-internal code? Then I could at least have a look with a debugger and see if I can "guess" what is happening

@alexfh
Copy link
Contributor

alexfh commented Nov 10, 2025

The reduction has been crawling for 5 days now. The input is down to 275MB, but the progress has slowed down a lot.

Ouch... I don't know if it would already be possible to share it (offline with me only if needed) or does it still contain Google-internal code? Then I could at least have a look with a debugger and see if I can "guess" what is happening

It's all internal code, so sharing it in any way wouldn't be possible. But if you have any ideas of how to stuff Clang with debug logs and/or assertions to make the issue easier to diagnose, I could run Clang and send you logs.

Meanwhile, CVise made some progress (but not much). The input is down to 250MB and ~12.5K files.

@hahnjo
Copy link
Member Author

hahnjo commented Nov 10, 2025

The reduction has been crawling for 5 days now. The input is down to 275MB, but the progress has slowed down a lot.

Ouch... I don't know if it would already be possible to share it (offline with me only if needed) or does it still contain Google-internal code? Then I could at least have a look with a debugger and see if I can "guess" what is happening

It's all internal code, so sharing it in any way wouldn't be possible. But if you have any ideas of how to stuff Clang with debug logs and/or assertions to make the issue easier to diagnose, I could run Clang and send you logs.

Not really without further understanding what is going wrong under which circumstances... Some pointers (not sure how much time you would be able to spend on your side debugging the problem):

  1. As a first matter of action, ASTContext::getASTRecordLayout completes the type. This should trigger template loading which is more "lazy" after these fixes.
    if (D->hasExternalLexicalStorage() && !D->getDefinition())
    getExternalSource()->CompleteType(const_cast<RecordDecl*>(D));
    // Complete the redecl chain (if necessary).
    (void)D->getMostRecentDecl();
    D = D->getDefinition();
    assert(D && "Cannot get layout of forward declarations!");
    assert(!D->isInvalidDecl() && "Cannot get layout of invalid decl!");
    assert(D->isCompleteDefinition() && "Cannot layout type before complete!");
  2. One thing that could happen is that deserialization triggers merging of declarations and another one has the actual definition attached (redecl chains are a bit black magic for me).
  3. Another possibility is that the forward declaration is local and not external, thus D->hasExternalLexicalStorage() returns false and we skip the call to CompleteType. We might try changing that condition to getExternalSource() && !D->getDefinition()...
  4. Finally it would be interesting to know if what type of template class (?) we are dealing with: implicit specialization, explicit, involving partial specializations? In the past, the "triggering" commit was 3a272ba if you can maybe see if reverting that one avoids the problems (or in general which of the four changes does)?

Meanwhile, CVise made some progress (but not much). The input is down to 250MB and ~12.5K files.

Ok, if possible maybe you can keep it running in parallel, we will still need a reproducer as a regression test later on.

@alexfh
Copy link
Contributor

alexfh commented Nov 19, 2025

It's all internal code, so sharing it in any way wouldn't be possible. But if you have any ideas of how to stuff Clang with debug logs and/or assertions to make the issue easier to diagnose, I could run Clang and send you logs.

Not really without further understanding what is going wrong under which circumstances... Some pointers (not sure how much time you would be able to spend on your side debugging the problem):

  1. As a first matter of action, ASTContext::getASTRecordLayout completes the type. This should trigger template loading which is more "lazy" after these fixes.
    if (D->hasExternalLexicalStorage() && !D->getDefinition())
    getExternalSource()->CompleteType(const_cast<RecordDecl*>(D));
    // Complete the redecl chain (if necessary).
    (void)D->getMostRecentDecl();
    D = D->getDefinition();
    assert(D && "Cannot get layout of forward declarations!");
    assert(!D->isInvalidDecl() && "Cannot get layout of invalid decl!");
    assert(D->isCompleteDefinition() && "Cannot layout type before complete!");
  2. One thing that could happen is that deserialization triggers merging of declarations and another one has the actual definition attached (redecl chains are a bit black magic for me).
  3. Another possibility is that the forward declaration is local and not external, thus D->hasExternalLexicalStorage() returns false and we skip the call to CompleteType. We might try changing that condition to getExternalSource() && !D->getDefinition()...
  4. Finally it would be interesting to know if what type of template class (?) we are dealing with: implicit specialization, explicit, involving partial specializations? In the past, the "triggering" commit was 3a272ba if you can maybe see if reverting that one avoids the problems (or in general which of the four changes does)?

I could allocate some time to debugging this, and here's what I got.

The crash stack is:

... frame #7: 0x00005555748dba8a clang-dbg`__assert_fail(assertion="D && \"Cannot get layout of forward declarations!\"", file="llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp", line=3377, function="const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const") at logging.cc:61:3 frame #8: 0x000055556a8dd209 clang-dbg`clang::ASTContext::getASTRecordLayout(this=0x000030f1bfc20000, D=0x0000000000000000) const at RecordLayoutBuilder.cpp:3377:3 * frame #9: 0x000055556976f180 clang-dbg`clang::ASTContext::getTypeInfoImpl(this=0x000030f1bfc20000, T=0x000030f1ba458cb0) const at ASTContext.cpp:2409:37 frame #10: 0x0000555569770a6c clang-dbg`clang::ASTContext::getTypeInfo(this=0x000030f1bfc20000, T=0x000030f1ba458cb0) const at ASTContext.cpp:1969:17 frame #11: 0x00005555644077f8 clang-dbg`clang::ASTContext::getTypeInfo(this=0x000030f1bfc20000, T=QualType @ 0x00007ffffffe6dd8) const at ASTContext.h:2645:51 frame #12: 0x0000555564b475c8 clang-dbg`clang::ASTContext::getTypeAlign(this=0x000030f1bfc20000, T=QualType @ 0x00007ffffffe6e38) const at ASTContext.h:2682:52 frame #13: 0x00005555697717f8 clang-dbg`clang::ASTContext::getTypeAlignInChars(this=0x000030f1bfc20000, T=QualType @ 0x00007ffffffe6e70) const at ASTContext.cpp:2574:30 frame #14: 0x0000555564483e7c clang-dbg`clang::CodeGen::CodeGenFunction::CreateIRTemp(this=0x00007ffffffeb608, Ty=QualType @ 0x00007ffffffe6ef8, Name=0x00007ffffffe8740) at CGExpr.cpp:183:34 frame #15: 0x0000555564e95a9e clang-dbg`clang::CodeGen::CodeGenFunction::StartFunction(this=0x00007ffffffeb608, GD=GlobalDecl @ 0x00007ffffffea560, RetTy=QualType @ 0x00007ffffffea558, Fn=0x000030f1b47866c8, FnInfo=0x000030f1b477d980, Args=0x00007ffffffeb3c0, Loc=(ID = 2115544462), StartLoc=(ID = 2115544504)) at CodeGenFunction.cpp:1248:19 frame #16: 0x0000555564e98bf0 clang-dbg`clang::CodeGen::CodeGenFunction::GenerateCode(this=0x00007ffffffeb608, GD=GlobalDecl @ 0x00007ffffffeb470, Fn=0x000030f1b47866c8, FnInfo=0x000030f1b477d980) at CodeGenFunction.cpp:1558:3 frame #17: 0x0000555564eee07c clang-dbg`clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(this=0x000030f1bfd04000, GD=GlobalDecl @ 0x00007ffffffecfa0, GV=0x000030f1b47866c8) at CodeGenModule.cpp:6406:26 frame #18: 0x0000555564eddd64 clang-dbg`clang::CodeGen::CodeGenModule::EmitGlobalDefinition(this=0x000030f1bfd04000, GD=GlobalDecl @ 0x00007ffffffed150, GV=0x000030f1b47866c8) at CodeGenModule.cpp:4479:9 frame #19: 0x0000555564ecb704 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3562:5 frame #20: 0x0000555564ecb774 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3568:7 frame #21: 0x0000555564ecb774 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3568:7 frame #22: 0x0000555564ecb774 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3568:7 frame #23: 0x0000555564ecb774 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3568:7 frame #24: 0x0000555564ecb774 clang-dbg`clang::CodeGen::CodeGenModule::EmitDeferred(this=0x000030f1bfd04000) at CodeGenModule.cpp:3568:7 frame #25: 0x0000555564ec46ed clang-dbg`clang::CodeGen::CodeGenModule::Release(this=0x000030f1bfd04000) at CodeGenModule.cpp:965:3 frame #26: 0x00005555652f3165 clang-dbg`(anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(this=0x000030f1bfe07110, Ctx=0x000030f1bfc20000) at ModuleBuilder.cpp:293:18 frame #27: 0x0000555564350b65 clang-dbg`clang::BackendConsumer::HandleTranslationUnit(this=0x000030f1bfe08540, C=0x000030f1bfc20000) at CodeGenAction.cpp:240:10 frame #28: 0x0000555566fd16e1 clang-dbg`clang::ParseAST(S=0x000030f1bb597000, PrintStats=false, SkipFunctionBodies=false) at ParseAST.cpp:183:13 frame #29: 0x00005555669479c5 clang-dbg`clang::ASTFrontendAction::ExecuteAction(this=0x000030f1bfe2a600) at FrontendAction.cpp:1431:3 frame #30: 0x0000555564356f5d clang-dbg`clang::CodeGenAction::ExecuteAction(this=0x000030f1bfe2a600) at CodeGenAction.cpp:1109:30 frame #31: 0x0000555566946f0b clang-dbg`clang::FrontendAction::Execute(this=0x000030f1bfe2a600) at FrontendAction.cpp:1311:3 frame #32: 0x00005555667ef49d clang-dbg`clang::CompilerInstance::ExecuteAction(this=0x000030f1bfe08380, Act=0x000030f1bfe2a600) at CompilerInstance.cpp:1008:33 frame #33: 0x00005555643456ab clang-dbg`clang::ExecuteCompilerInvocation(Clang=0x000030f1bfe08380) at ExecuteCompilerInvocation.cpp:310:25 ... 

Looking closer at frame #9: 0x000055556976f180 clang-dbgclang::ASTContext::getTypeInfoImpl(this=0x000030f1bfc20000, T=0x000030f1ba458cb0) const at ASTContext.cpp:2409:37`:

 2386 case Type::Record: 2387 case Type::Enum: { 2388 const auto *TT = cast<TagType>(T); 2389 const TagDecl *TD = TT->getDecl()->getDefinitionOrSelf(); 2390 2391 if (TD->isInvalidDecl()) { 2392 Width = 8; 2393 Align = 8; (lldb) 2394 break; 2395 } 2396 2397 if (isa<EnumType>(TT)) { 2398 const EnumDecl *ED = cast<EnumDecl>(TD); 2399 TypeInfo Info = 2400 getTypeInfo(ED->getIntegerType()->getUnqualifiedDesugaredType()); 2401 if (unsigned AttrAlign = ED->getMaxAlignment()) { 2402 Info.Align = AttrAlign; 2403 Info.AlignRequirement = AlignRequirementKind::RequiredByEnum; 2404 } (lldb) 2405 return Info; 2406 } 2407 2408 const auto *RD = cast<RecordDecl>(TD); 2409 const ASTRecordLayout &Layout = getASTRecordLayout(RD); 

The crash is in the function call above ^^

 2410 Width = toBits(Layout.getSize()); 2411 Align = toBits(Layout.getAlignment()); 2412 AlignRequirement = RD->hasAttr<AlignedAttr>() 2413 ? AlignRequirementKind::RequiredByRecord 2414 : AlignRequirementKind::None; 2415 break; 

The CXXRecordDecl that triggers the crash is:

(lldb) p RD->dump() CXXRecordDecl 0x30f1ba455568 parent 0x30f1babf29d8 prev 0x30f1ba449eb0 <third_party/absl/container/internal/raw_hash_set.h:890:1, col:7> col:7 imported in //third_party/protobuf/util:differencer hidden class const_iterator instantiated_from 0x30f1ba97cea0 (lldb) p RD->hasExternalLexicalStorage() (bool) false (lldb) p RD->getDefinition() (clang::RecordDecl *) nullptr 

Just in case a full dump of *RD:

(lldb) settings set target.max-children-depth 100 (lldb) p *RD (const clang::RecordDecl) { clang::TagDecl = { clang::TypeDecl = { clang::NamedDecl = { clang::Decl = { NextInContextAndBits = { Value = (Data = "\U00000002") } DeclCtx = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::DeclContext *, clang::Decl::MultipleDC *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, clang: :Decl::MultipleDC *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, clang::Decl::MultipleDC *> > >, 0, clang::DeclContext *, clang::Decl::MultipleDC *> = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::DeclContext *, clang::Decl::MultipleDC *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, clan g::Decl::MultipleDC *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, clang::Decl::MultipleDC *> > >, 1, clang::Decl::MultipleDC *> = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::DeclContext *, clang::Decl::MultipleDC *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, cl ang::Decl::MultipleDC *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::DeclContext *, clang::Decl::MultipleDC *> > >, 2> = { Val = { Value = (Data = "\U00000014\xb6I\xba\xf10") } } } } } Loc = (ID = 2102307394) DeclKind = CXXRecord InvalidDecl = false HasAttrs = false Implicit = false Used = false Referenced = false TopLevelDeclInObjCContainer = false Access = AS_public FromASTFile = true IdentifierNamespace = IDNS_Tag | IDNS_Type = 0 CacheValidAndLinkage = Invalid } Name = (Ptr = 53814798516824) } TypeForDecl = 0x000030f1ba49ecb0 LocStart = (ID = 2102307388) } clang::DeclContext = { LookupPtr = nullptr = { DeclContextBits = { DeclKind = CXXRecord ExternalLexicalStorage = false ExternalVisibleStorage = false NeedToReconcileExternalVisibleStorage = false HasLazyLocalLexicalLookups = false HasLazyExternalLexicalLookups = false UseQualifiedLookup = false } NamespaceDeclBits = (59, IsInline = true, IsNested = true) TagDeclBits = { = 59 TagDeclKind = Class IsCompleteDefinition = false IsBeingDefined = false IsEmbeddedInDeclarator = false IsFreeStanding = false IsCompleteDefinitionRequired = false IsThisDeclarationADemotedDefinition = false } EnumDeclBits = { = 24635 NumPositiveBits = 0 NumNegativeBits = 0 IsScoped = false IsScopedUsingClassTag = false IsFixed = false HasODRHash = false } RecordDeclBits = { = 24635 HasFlexibleArrayMember = false AnonymousStructOrUnion = false HasObjectMember = false HasVolatileMember = false LoadedFieldsFromExternalStorage = false NonTrivialToPrimitiveDefaultInitialize = false NonTrivialToPrimitiveCopy = false NonTrivialToPrimitiveDestroy = false HasNonTrivialToPrimitiveDefaultInitializeCUnion = false HasNonTrivialToPrimitiveDestructCUnion = false HasNonTrivialToPrimitiveCopyCUnion = false HasUninitializedExplicitInitFields = false ParamDestroyedInCallee = false ArgPassingRestrictions = CanPassInRegs IsRandomized = false ODRHash = 0 } OMPDeclareReductionDeclBits = (59, InitializerKind = Direct | Copy) FunctionDeclBits = { = 59 SClass = SC_PrivateExtern IsInline = false IsInlineSpecified = false IsVirtualAsWritten = false IsPureVirtual = false HasInheritedPrototype = false HasWrittenPrototype = false IsDeleted = false IsTrivial = false IsTrivialForCall = false IsDefaulted = false IsExplicitlyDefaulted = false HasDefaultedOrDeletedInfo = false IsIneligibleOrNotSelected = false HasImplicitReturnZero = false IsLateTemplateParsed = false IsInstantiatedFromMemberTemplate = false ConstexprKind = Unspecified BodyContainsImmediateEscalatingExpression = false InstantiationIsPending = false UsesSEHTry = false HasSkippedBody = false WillHaveBody = false ... } CXXConstructorDeclBits = { = 24635 NumCtorInitializers = 0 IsInheritingConstructor = false HasTrailingExplicitSpecifier = false IsSimpleExplicit = false } ObjCMethodDeclBits = { = 59 Family = OMF_init IsInstance = false IsVariadic = false IsPropertyAccessor = false IsSynthesizedAccessorStub = false IsDefined = false IsRedeclaration = false HasRedeclaration = false DeclImplementation = None objcDeclQualifier = OBJC_TQ_None RelatedResultType = false SelLocsKind = SelLoc_NonStandard IsOverriding = false HasSkippedBody = false } ObjCContainerDeclBits = { AtStart = (ID = 0) } LinkageSpecDeclBits = (59, Language = C | CXX, HasBraces = false) BlockDeclBits = { = 59 IsVariadic = true CapturesCXXThis = true BlockMissingReturnType = false IsConversionFromLambda = false DoesNotEscape = false CanAvoidCopyToHeap = false } } FirstDecl = nullptr LastDecl = nullptr } clang::Redeclarable<clang::TagDecl> = { RedeclLink = { Link = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> >, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> >, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> > > >, 0, llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> > = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang:: Decl *)> >, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> >, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> > > >, 1, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> > = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang ::Decl *)> >, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> >, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<llvm::PointerUnion<clang::Decl *, const void *>, clang::LazyGenerationalUpdatePtr<const clang::Decl *, clang::Decl *, void (clang::ExternalASTSource::*)(const clang::Decl *)> > > >, 2> = { Val = { Value = (Data = "\xb0\xfeH\xba\xf10") } } } } } } First = 0x000030f1ba48feb0 } BraceRange = { B = (ID = 0) E = (ID = 0) } TypedefNameDeclOrQualifier = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::TypedefNameDecl *, clang::QualifierInfo *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang:: QualifierInfo *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang::QualifierInfo *> > >, 0, clang::TypedefNameDecl *, clang::QualifierInfo *> = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::TypedefNameDecl *, clang::QualifierInfo *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang ::QualifierInfo *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang::QualifierInfo *> > >, 1, clang::QualifierInfo *> = { llvm::pointer_union_detail::PointerUnionMembers<llvm::PointerUnion<clang::TypedefNameDecl *, clang::QualifierInfo *>, llvm::PointerIntPair<void *, 1, int, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang::QualifierInfo *>, llvm::PointerIntPairInfo<void *, 1, llvm::pointer_union_detail::PointerUnionUIntTraits<clang::TypedefNameDecl *, clang::QualifierInfo *> > >, 2> = { Val = { Value = (Data = "") } } } } } } } 

It looks like it happens as you suspected ("D->hasExternalLexicalStorage() returns false and we skip the call to CompleteType"). I tried changing that condition to getExternalSource() && !D->getDefinition(), but it doesn't help - I'm still getting the same assertion:

assert.h assertion failed at third_party/llvm/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!" 
(lldb) 3360 3361 /// getASTRecordLayout - Get or compute information about the layout of the 3362 /// specified record (struct/union/class), which indicates its size and field 3363 /// position information. 3364 const ASTRecordLayout & 3365 ASTContext::getASTRecordLayout(const RecordDecl *D) const { 3366 // These asserts test different things. A record has a definition 3367 // as soon as we begin to parse the definition. That definition is 3368 // not a complete definition (which is what isDefinition() tests) 3369 // until we *finish* parsing the definition. (lldb) 3370 3371 if (getExternalSource() && !D->getDefinition()) 3372 getExternalSource()->CompleteType(const_cast<RecordDecl*>(D)); 3373 // Complete the redecl chain (if necessary). 3374 (void)D->getMostRecentDecl(); 3375 3376 D = D->getDefinition(); 3377 assert(D && "Cannot get layout of forward declarations!"); 

One more thing I noticed while trying to debug what's happening in the getExternalSource()->CompleteType(const_cast<RecordDecl*>(D)); is that the crash happens after a different number of calls to getASTRecordLayout(). It varies in a wide range (I saw numbers from ~800 to ~15000 hits of a breakpoint on line 3371 when the crash occurred). This makes me think about a non-determinism here.

Please let me know, if you have any more ideas I could try.

Meanwhile, CVise made some progress (but not much). The input is down to 250MB and ~12.5K files.

Ok, if possible maybe you can keep it running in parallel, we will still need a reproducer as a regression test later on.

The test case is down to ~130MB and 9K files, but still nothing I can share directly.

@alexfh
Copy link
Contributor

alexfh commented Nov 24, 2025

Ok, if possible maybe you can keep it running in parallel, we will still need a reproducer as a regression test later on.

The test case is down to ~130MB and 9K files, but still nothing I can share directly.

I've done some manual build graph cleanup and got the makefile in the test case from a few thousand down to a handful of rules. The reduction started running a bit faster since then: the size of inputs is now ~65MB spread across ~4.5K files. Looking at how it goes, I may have a shareable test case by the end of the week.

@hahnjo
Copy link
Member Author

hahnjo commented Nov 27, 2025

I could allocate some time to debugging this, and here's what I got.

Thanks for trying all that!

It looks like it happens as you suspected ("D->hasExternalLexicalStorage() returns false and we skip the call to CompleteType").

Ok, at least the understanding of the problem is progressing.

I tried changing that condition to getExternalSource() && !D->getDefinition(), but it doesn't help - I'm still getting the same assertion:

Too bad, that would have been too easy I guess... Now I wonder if we maybe need to change the order of operations, ie first complete the redecl chain, then notice that we can get the definition externally, and then actually go and do that. Conceptually something like

 // Complete the redecl chain (if necessary). D = D->getMostRecentDecl(); if (D->hasExternalLexicalStorage() && !D->getDefinition()) getExternalSource()->CompleteType(const_cast<RecordDecl*>(D));

Not sure if that changes anything (at least it doesn't fail Clang tests when applied to current master on its own), but as I said earlier "redecl chains are a bit black magic for me". But if we are getting close to getting a sharable reproducer, I will be able to have a look myself.

Please let me know, if you have any more ideas I could try.

Only if you have more time to spare, the second part of point 4. above, ie which of the commits is the culprit: Is it a single one and everything is fine if you just exclude / revert that one?

@alexfh
Copy link
Contributor

alexfh commented Nov 28, 2025

I've done some manual build graph cleanup and got the makefile in the test case from a few thousand down to a handful of rules. The reduction started running a bit faster since then: the size of inputs is now ~65MB spread across ~4.5K files. Looking at how it goes, I may have a shareable test case by the end of the week.

10MB and 2.7K files

@alexfh
Copy link
Contributor

alexfh commented Nov 29, 2025

After manually cleaning up the ~10KB that CVise could squeeze out of the initial 400+MB, I got this:

//--- a.cppmap module "a" { header "a.h" } //--- b.cppmap module "b" { header "b.h" } //--- d.cppmap module "d" { header "d.h" } //--- stl.cppmap module "stl" { header "stl.h" } //--- a.h #include "b.h" namespace { void a(absl::set<char> c) { absl::map<int> a; absl::set<int> b; c.end(); c.contains(); } } // namespace //--- b.h #include "c.h" void b() { absl::set<char> x; } //--- c.h #include "stl.h" inline namespace internal { template <typename> class set_base { public: struct iterator { void u() const; }; iterator end() const { return {}; } void contains() const { end().u(); } pair<iterator> e(); template <typename H> friend H h(); }; template <typename> class map_base { template <typename H> friend H h(); }; } // namespace internal namespace absl { template <typename T> class set : public set_base<T> {}; template <typename T> class map : public map_base<T> {}; } // namespace absl //--- d.h #include "c.h" void d() { absl::set<char> x; } //--- stl.h #ifndef _STL_H_ #define _STL_H_ template <class> struct pair; #endif //--- main.cc #include "c.h" void f(absl::set<char> o) { o.contains(); } 
$ $CLANG -fmodule-name=stl -fmodule-map-file=stl.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -std=c++20 -c stl.cppmap -o stl.pcm $ $CLANG -fmodule-name=d -fmodule-map-file=d.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -std=c++20 -c d.cppmap -o d.pcm $ $CLANG -fmodule-name=b -fmodule-map-file=b.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -std=c++20 -c b.cppmap -o b.pcm $ $CLANG -fmodule-name=a -fmodule-map-file=a.cppmap -Xclang=-fno-cxx-modules -xc++ -Xclang=-emit-module -fmodules -fmodule-file=stl.pcm -fmodule-file=b.pcm -fmodule-file=d.pcm -std=c++20 -c a.cppmap -o a.pcm $ $CLANG -Xclang=-fno-cxx-modules -fmodules -fmodule-file=a.pcm -std=c++20 -c main.cc -o main.o assert.h assertion failed at clang/lib/AST/RecordLayoutBuilder.cpp:3377 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!" *** Check failure stack trace: *** @ 0x55e64e1aed24 __assert_fail @ 0x55e64a2c7646 clang::ASTContext::getASTRecordLayout() @ 0x55e649c95e48 clang::ASTContext::getTypeInfoImpl() @ 0x55e649c97297 clang::ASTContext::getTypeInfo() @ 0x55e649c979bc clang::ASTContext::getTypeAlignInChars() @ 0x55e6481188d3 clang::CodeGen::CodeGenFunction::CreateIRTemp() @ 0x55e6483843bc clang::CodeGen::CodeGenFunction::StartFunction() @ 0x55e648386716 clang::CodeGen::CodeGenFunction::GenerateCode() @ 0x55e6483b20ec clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition() @ 0x55e6483a9662 clang::CodeGen::CodeGenModule::EmitGlobalDefinition() @ 0x55e64839a226 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e64839a242 clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x55e648396c6e clang::CodeGen::CodeGenModule::Release() @ 0x55e6484dc8ae (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit() ... 
@alexfh
Copy link
Contributor

alexfh commented Nov 29, 2025

There's also a number of compilations that became significantly slower (or hit an infinite loop in Clang).

I will try to find time to get to this part too.

@hahnjo
Copy link
Member Author

hahnjo commented Dec 1, 2025

After manually cleaning up the ~10KB that CVise could squeeze out of the initial 400+MB, I got this:

Thanks for your efforts, very useful! That reduced example actually crashes on main already, I opened #170084 (after some more simplifications). If your full internal code is currently compiling without problems, it is likely that this PR "just" exposes the issue which git-bisect says was introduced by 91cdd35 on August 9.

@hahnjo
Copy link
Member Author

hahnjo commented Dec 1, 2025

Ok, this wasn't too bad actually: #170090 fixes the reproducer, so fingers crossed it also addresses the issue in the full-blown internal code 🤞 @alexfh if you have more cycles available, could you test the two PRs together?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category

8 participants