[clang] Allow trivial pp-directives before C++ module directive #153641
Signed-off-by: yronglin <yronglin777@gmail.com>
| @llvm/pr-subscribers-clang Author: None (yronglin) Changes: Consider the following code, a line marker `# 1 __FILE__ 1 3` followed on the next line by `export module a;`. According to the wording in P1857R3: and the wording in [cpp.pre]
State change would mean most semantically observable preprocessor state, particularly anything that is order dependent. Global flags like being a system header/module shouldn't matter. We should exempt a bunch of directives, even though doing so violates the current standard wording. In this patch, we introduce a `TrivialDirectiveTracer` to trace whether the preprocessing state has changed. This list of preprocessing directives is just a draft; feel free to comment. Fixes #145274 Patch is 36.26 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/153641.diff 11 Files Affected:
diff --git a/clang/include/clang/Lex/Lexer.h b/clang/include/clang/Lex/Lexer.h index 06971ff87ab96..423f2ffe2f852 100644 --- a/clang/include/clang/Lex/Lexer.h +++ b/clang/include/clang/Lex/Lexer.h @@ -143,9 +143,6 @@ class Lexer : public PreprocessorLexer { /// True if this is the first time we're lexing the input file. bool IsFirstTimeLexingFile; - /// True if current lexing token is the first pp-token. - bool IsFirstPPToken; - // NewLinePtr - A pointer to new line character '\n' being lexed. For '\r\n', // it also points to '\n.' const char *NewLinePtr; diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h index 71b0f8eab3bfa..d51faad255224 100644 --- a/clang/include/clang/Lex/Preprocessor.h +++ b/clang/include/clang/Lex/Preprocessor.h @@ -82,6 +82,7 @@ class PreprocessorLexer; class PreprocessorOptions; class ScratchBuffer; class TargetInfo; +class TrivialDirectiveTracer; namespace Builtin { class Context; @@ -353,6 +354,11 @@ class Preprocessor { /// First pp-token source location in current translation unit. SourceLocation FirstPPTokenLoc; + /// A preprocessor directive tracer to trace whether the preprocessing + /// state changed. These changes would mean most semantically observable + /// preprocessor state, particularly anything that is order dependent. + TrivialDirectiveTracer *DirTracer = nullptr; + /// A position within a C++20 import-seq. class StdCXXImportSeq { public: @@ -609,6 +615,8 @@ class Preprocessor { return State == NamedModuleImplementation && !getName().contains(':'); } + bool isNotAModuleDecl() const { return State == NotAModuleDecl; } + StringRef getName() const { assert(isNamedModule() && "Can't get name from a non named module"); return Name; @@ -3091,6 +3099,9 @@ class Preprocessor { bool setDeserializedSafeBufferOptOutMap( const SmallVectorImpl<SourceLocation> &SrcLocSeqs); + /// Whether allow C++ module directive. 
+ bool hasSeenNoTrivialPPDirective() const; + private: /// Helper functions to forward lexing to the actual lexer. They all share the /// same signature. diff --git a/clang/include/clang/Lex/Token.h b/clang/include/clang/Lex/Token.h index fc43e72593b94..c493571e00038 100644 --- a/clang/include/clang/Lex/Token.h +++ b/clang/include/clang/Lex/Token.h @@ -86,12 +86,10 @@ class Token { // macro stringizing or charizing operator. CommaAfterElided = 0x200, // The comma following this token was elided (MS). IsEditorPlaceholder = 0x400, // This identifier is a placeholder. - - IsReinjected = 0x800, // A phase 4 token that was produced before and - // re-added, e.g. via EnterTokenStream. Annotation - // tokens are *not* reinjected. - FirstPPToken = 0x1000, // This token is the first pp token in the - // translation unit. + IsReinjected = 0x800, // A phase 4 token that was produced before and + // re-added, e.g. via EnterTokenStream. Annotation + // tokens are *not* reinjected. + SeenNoTrivialPPDirective = 0x1000, }; tok::TokenKind getKind() const { return Kind; } @@ -321,8 +319,9 @@ class Token { /// lexer uses identifier tokens to represent placeholders. bool isEditorPlaceholder() const { return getFlag(IsEditorPlaceholder); } - /// Returns true if this token is the first pp-token. - bool isFirstPPToken() const { return getFlag(FirstPPToken); } + bool hasSeenNoTrivialPPDirective() const { + return getFlag(SeenNoTrivialPPDirective); + } }; /// Information about the conditional stack (\#if directives) diff --git a/clang/include/clang/Lex/TrivialDirectiveTracer.h b/clang/include/clang/Lex/TrivialDirectiveTracer.h new file mode 100644 index 0000000000000..9d4e0fdc96daf --- /dev/null +++ b/clang/include/clang/Lex/TrivialDirectiveTracer.h @@ -0,0 +1,388 @@ +//===--- TrivialDirectiveTracer.h -------------------------------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
+// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This file defines the TrivialDirectiveTracer interface. +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_CLANG_LEX_TRIVIAL_DIRECTIVE_TRACER_H +#define LLVM_CLANG_LEX_TRIVIAL_DIRECTIVE_TRACER_H + +#include "clang/Lex/PPCallbacks.h" + +namespace clang { +class Preprocessor; + +class TrivialDirectiveTracer : public PPCallbacks { + Preprocessor &PP; + bool InMainFile = true; + bool SeenNoTrivialPPDirective = false; + + void setSeenNoTrivialPPDirective(bool Val); + +public: + TrivialDirectiveTracer(Preprocessor &P) : PP(P) {} + + bool hasSeenNoTrivialPPDirective() const; + + /// Callback invoked whenever a source file is entered or exited. + /// + /// \param Loc Indicates the new location. + /// \param PrevFID the file that was exited if \p Reason is ExitFile or the + /// the file before the new one entered for \p Reason EnterFile. + void FileChanged(SourceLocation Loc, FileChangeReason Reason, + SrcMgr::CharacteristicKind FileType, + FileID PrevFID = FileID()) override; + + /// Callback invoked whenever the \p Lexer moves to a different file for + /// lexing. Unlike \p FileChanged line number directives and other related + /// pragmas do not trigger callbacks to \p LexedFileChanged. + /// + /// \param FID The \p FileID that the \p Lexer moved to. + /// + /// \param Reason Whether the \p Lexer entered a new file or exited one. + /// + /// \param FileType The \p CharacteristicKind of the file the \p Lexer moved + /// to. + /// + /// \param PrevFID The \p FileID the \p Lexer was using before the change. + /// + /// \param Loc The location where the \p Lexer entered a new file from or the + /// location that the \p Lexer moved into after exiting a file. 
+ void LexedFileChanged(FileID FID, LexedFileChangeReason Reason, + SrcMgr::CharacteristicKind FileType, FileID PrevFID, + SourceLocation Loc) override; + + /// Callback invoked whenever an embed directive has been processed, + /// regardless of whether the embed will actually find a file. + /// + /// \param HashLoc The location of the '#' that starts the embed directive. + /// + /// \param FileName The name of the file being included, as written in the + /// source code. + /// + /// \param IsAngled Whether the file name was enclosed in angle brackets; + /// otherwise, it was enclosed in quotes. + /// + /// \param File The actual file that may be included by this embed directive. + /// + /// \param Params The parameters used by the directive. + void EmbedDirective(SourceLocation HashLoc, StringRef FileName, bool IsAngled, + OptionalFileEntryRef File, + const LexEmbedParametersResult &Params) override { + setSeenNoTrivialPPDirective(true); + } + + /// Callback invoked whenever an inclusion directive of + /// any kind (\c \#include, \c \#import, etc.) has been processed, regardless + /// of whether the inclusion will actually result in an inclusion. + /// + /// \param HashLoc The location of the '#' that starts the inclusion + /// directive. + /// + /// \param IncludeTok The token that indicates the kind of inclusion + /// directive, e.g., 'include' or 'import'. + /// + /// \param FileName The name of the file being included, as written in the + /// source code. + /// + /// \param IsAngled Whether the file name was enclosed in angle brackets; + /// otherwise, it was enclosed in quotes. + /// + /// \param FilenameRange The character range of the quotes or angle brackets + /// for the written file name. + /// + /// \param File The actual file that may be included by this inclusion + /// directive. + /// + /// \param SearchPath Contains the search path which was used to find the file + /// in the file system. 
If the file was found via an absolute include path, + /// SearchPath will be empty. For framework includes, the SearchPath and + /// RelativePath will be split up. For example, if an include of "Some/Some.h" + /// is found via the framework path + /// "path/to/Frameworks/Some.framework/Headers/Some.h", SearchPath will be + /// "path/to/Frameworks/Some.framework/Headers" and RelativePath will be + /// "Some.h". + /// + /// \param RelativePath The path relative to SearchPath, at which the include + /// file was found. This is equal to FileName except for framework includes. + /// + /// \param SuggestedModule The module suggested for this header, if any. + /// + /// \param ModuleImported Whether this include was translated into import of + /// \p SuggestedModule. + /// + /// \param FileType The characteristic kind, indicates whether a file or + /// directory holds normal user code, system code, or system code which is + /// implicitly 'extern "C"' in C++ mode. + /// + void InclusionDirective(SourceLocation HashLoc, const Token &IncludeTok, + StringRef FileName, bool IsAngled, + CharSourceRange FilenameRange, + OptionalFileEntryRef File, StringRef SearchPath, + StringRef RelativePath, const Module *SuggestedModule, + bool ModuleImported, + SrcMgr::CharacteristicKind FileType) override { + setSeenNoTrivialPPDirective(true); + } + + /// Callback invoked whenever there was an explicit module-import + /// syntax. + /// + /// \param ImportLoc The location of import directive token. + /// + /// \param Path The identifiers (and their locations) of the module + /// "path", e.g., "std.vector" would be split into "std" and "vector". + /// + /// \param Imported The imported module; can be null if importing failed. + /// + void moduleImport(SourceLocation ImportLoc, ModuleIdPath Path, + const Module *Imported) override { + setSeenNoTrivialPPDirective(true); + } + + /// Callback invoked when the end of the main file is reached. + /// + /// No subsequent callbacks will be made. 
+ void EndOfMainFile() override { setSeenNoTrivialPPDirective(true); } + + /// Callback invoked when a \#ident or \#sccs directive is read. + /// \param Loc The location of the directive. + /// \param str The text of the directive. + /// + void Ident(SourceLocation Loc, StringRef str) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when start reading any pragma directive. + void PragmaDirective(SourceLocation Loc, + PragmaIntroducerKind Introducer) override {} + + /// Callback invoked when a \#pragma comment directive is read. + void PragmaComment(SourceLocation Loc, const IdentifierInfo *Kind, + StringRef Str) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma mark comment is read. + void PragmaMark(SourceLocation Loc, StringRef Trivia) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma detect_mismatch directive is + /// read. + void PragmaDetectMismatch(SourceLocation Loc, StringRef Name, + StringRef Value) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma clang __debug directive is read. + /// \param Loc The location of the debug directive. + /// \param DebugType The identifier following __debug. + void PragmaDebug(SourceLocation Loc, StringRef DebugType) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma message directive is read. + /// \param Loc The location of the message directive. + /// \param Namespace The namespace of the message directive. + /// \param Kind The type of the message directive. + /// \param Str The text of the message directive. + void PragmaMessage(SourceLocation Loc, StringRef Namespace, + PragmaMessageKind Kind, StringRef Str) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma gcc diagnostic push directive + /// is read. 
+ void PragmaDiagnosticPush(SourceLocation Loc, StringRef Namespace) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma gcc diagnostic pop directive + /// is read. + void PragmaDiagnosticPop(SourceLocation Loc, StringRef Namespace) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma gcc diagnostic directive is read. + void PragmaDiagnostic(SourceLocation Loc, StringRef Namespace, + diag::Severity mapping, StringRef Str) override { + setSeenNoTrivialPPDirective(false); + } + + /// Called when an OpenCL extension is either disabled or + /// enabled with a pragma. + void PragmaOpenCLExtension(SourceLocation NameLoc, const IdentifierInfo *Name, + SourceLocation StateLoc, unsigned State) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma warning directive is read. + void PragmaWarning(SourceLocation Loc, PragmaWarningSpecifier WarningSpec, + ArrayRef<int> Ids) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma warning(push) directive is read. + void PragmaWarningPush(SourceLocation Loc, int Level) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma warning(pop) directive is read. + void PragmaWarningPop(SourceLocation Loc) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma execution_character_set(push) directive + /// is read. + void PragmaExecCharsetPush(SourceLocation Loc, StringRef Str) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma execution_character_set(pop) directive + /// is read. + void PragmaExecCharsetPop(SourceLocation Loc) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma clang assume_nonnull begin directive + /// is read. 
+ void PragmaAssumeNonNullBegin(SourceLocation Loc) override { + setSeenNoTrivialPPDirective(false); + } + + /// Callback invoked when a \#pragma clang assume_nonnull end directive + /// is read. + void PragmaAssumeNonNullEnd(SourceLocation Loc) override { + setSeenNoTrivialPPDirective(false); + } + + /// Called by Preprocessor::HandleMacroExpandedIdentifier when a + /// macro invocation is found. + void MacroExpands(const Token &MacroNameTok, const MacroDefinition &MD, + SourceRange Range, const MacroArgs *Args) override; + + /// Hook called whenever a macro definition is seen. + void MacroDefined(const Token &MacroNameTok, + const MacroDirective *MD) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever a macro \#undef is seen. + /// \param MacroNameTok The active Token + /// \param MD A MacroDefinition for the named macro. + /// \param Undef New MacroDirective if the macro was defined, null otherwise. + /// + /// MD is released immediately following this callback. + void MacroUndefined(const Token &MacroNameTok, const MacroDefinition &MD, + const MacroDirective *Undef) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever the 'defined' operator is seen. + /// \param MD The MacroDirective if the name was a macro, null otherwise. + void Defined(const Token &MacroNameTok, const MacroDefinition &MD, + SourceRange Range) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#if is seen. + /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param ConditionValue The evaluated value of the condition. + /// + // FIXME: better to pass in a list (or tree!) of Tokens. + void If(SourceLocation Loc, SourceRange ConditionRange, + ConditionValueKind ConditionValue) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#elif is seen. 
+ /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param ConditionValue The evaluated value of the condition. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elif(SourceLocation Loc, SourceRange ConditionRange, + ConditionValueKind ConditionValue, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#ifdef is seen. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Ifdef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#elifdef branch is taken. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Elifdef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + /// Hook called whenever an \#elifdef is skipped. + /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elifdef(SourceLocation Loc, SourceRange ConditionRange, + SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#ifndef is seen. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefiniton if the name was a macro, null otherwise. 
+ void Ifndef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#elifndef branch is taken. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Elifndef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + /// Hook called whenever an \#elifndef is skipped. + /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elifndef(SourceLocation Loc, SourceRange ConditionRange, + SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#else is seen. + /// \param Loc the source location of the directive. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + void Else(SourceLocation Loc, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#endif is seen. + /// \param Loc the source location of the directive. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + void Endif(SourceLocation Loc, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } +}; + +} // namespace clang + +#endif // LLVM_CLANG_LEX_TRIVIAL_DIRECTIVE_TRACER... [truncated] |
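The idea behind the tracer in the diff above can be sketched outside of Clang. The following is a minimal, hypothetical model, not the patch's implementation: `DirectiveKind`, `DirectiveTracker`, and `moduleDirectiveAllowedAfter` are names invented for illustration. The classification follows the callbacks visible in the diff (the `#ident` and pragma callbacks pass `false`; the macro, inclusion, and conditional callbacks pass `true`). Treating a line marker as trivial is inferred from the PR's motivating example, and the "sticky" flag is an assumption, since the body of `setSeenNoTrivialPPDirective` is not shown in the truncated diff.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical directive kinds for this sketch; these are not Clang enumerators.
enum class DirectiveKind {
  LineMarker,       // e.g. `# 1 __FILE__ 1 3`
  Ident,            // #ident / #sccs
  PragmaMark,       // #pragma mark
  PragmaDiagnostic, // #pragma GCC diagnostic ...
  Define,           // #define
  Undef,            // #undef
  Include,          // #include / #import / #embed
  Conditional       // #if, #ifdef, #else, #endif, ...
};

// Returns true for directives that, in this model, do not change
// semantically observable preprocessor state.
inline bool isTrivial(DirectiveKind K) {
  switch (K) {
  case DirectiveKind::LineMarker: // assumption, based on the PR's example
  case DirectiveKind::Ident:
  case DirectiveKind::PragmaMark:
  case DirectiveKind::PragmaDiagnostic:
    return true;
  case DirectiveKind::Define:
  case DirectiveKind::Undef:
  case DirectiveKind::Include:
  case DirectiveKind::Conditional:
    return false;
  }
  return false;
}

// Records whether any non-trivial directive has been seen so far.
// Assumption: once set, the flag never resets.
class DirectiveTracker {
  bool SeenNonTrivial = false;

public:
  void note(DirectiveKind K) {
    if (!isTrivial(K))
      SeenNonTrivial = true;
  }

  // A module directive is accepted only if every directive before it
  // was trivial.
  bool moduleDirectiveAllowed() const { return !SeenNonTrivial; }
};

// Convenience helper: feed a sequence of leading directives to a fresh
// tracker and report whether `export module a;` could still follow.
inline bool moduleDirectiveAllowedAfter(
    std::initializer_list<DirectiveKind> Prefix) {
  DirectiveTracker T;
  for (DirectiveKind K : Prefix)
    T.note(K);
  return T.moduleDirectiveAllowed();
}
```

Under these assumptions, `moduleDirectiveAllowedAfter({DirectiveKind::LineMarker})` accepts the module directive, while a leading `#define` rejects it even if only trivial directives follow.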
+ /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param ConditionValue The evaluated value of the condition. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elif(SourceLocation Loc, SourceRange ConditionRange, + ConditionValueKind ConditionValue, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#ifdef is seen. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Ifdef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#elifdef branch is taken. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Elifdef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + /// Hook called whenever an \#elifdef is skipped. + /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elifdef(SourceLocation Loc, SourceRange ConditionRange, + SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#ifndef is seen. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefiniton if the name was a macro, null otherwise. 
+ void Ifndef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#elifndef branch is taken. + /// \param Loc the source location of the directive. + /// \param MacroNameTok Information on the token being tested. + /// \param MD The MacroDefinition if the name was a macro, null otherwise. + void Elifndef(SourceLocation Loc, const Token &MacroNameTok, + const MacroDefinition &MD) override { + setSeenNoTrivialPPDirective(true); + } + /// Hook called whenever an \#elifndef is skipped. + /// \param Loc the source location of the directive. + /// \param ConditionRange The SourceRange of the expression being tested. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + // FIXME: better to pass in a list (or tree!) of Tokens. + void Elifndef(SourceLocation Loc, SourceRange ConditionRange, + SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#else is seen. + /// \param Loc the source location of the directive. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + void Else(SourceLocation Loc, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } + + /// Hook called whenever an \#endif is seen. + /// \param Loc the source location of the directive. + /// \param IfLoc the source location of the \#if/\#ifdef/\#ifndef directive. + void Endif(SourceLocation Loc, SourceLocation IfLoc) override { + setSeenNoTrivialPPDirective(true); + } +}; + +} // namespace clang + +#endif // LLVM_CLANG_LEX_TRIVIAL_DIRECTIVE_TRACER... [truncated] |
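The callbacks in this header all follow one pattern: hooks for directives considered trivial pass `false`, while hooks for directives that change observable preprocessor state pass `true`. The following is a minimal, self-contained sketch of that pattern and not the patch's code. It does not use clang's headers; `MiniPPCallbacks`, `MiniDirectiveTracer` and `mayStartModule` are hypothetical names, and the "sticky" behaviour of the setter is an assumption, since the setter's definition is not part of the excerpt above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a preprocessor callback interface:
// one hook per directive kind, all defaulting to no-ops.
struct MiniPPCallbacks {
  virtual ~MiniPPCallbacks() = default;
  virtual void LineMarker() {}
  virtual void Ident(const std::string &) {}
  virtual void MacroDefined(const std::string &) {}
  virtual void If(bool) {}
};

// Hypothetical tracer: remembers whether any non-trivial directive was seen.
class MiniDirectiveTracer : public MiniPPCallbacks {
  bool SeenNoTrivialPPDirective = false;

  // Assumption: the flag is sticky, so passing false never clears it.
  void setSeenNoTrivialPPDirective(bool Val) {
    SeenNoTrivialPPDirective = SeenNoTrivialPPDirective || Val;
  }

public:
  bool hasSeenNoTrivialPPDirective() const { return SeenNoTrivialPPDirective; }

  // Trivial directives: report false, leaving the flag as it was.
  void LineMarker() override { setSeenNoTrivialPPDirective(false); }
  void Ident(const std::string &) override {
    setSeenNoTrivialPPDirective(false);
  }

  // State-changing directives: report true, which latches the flag.
  void MacroDefined(const std::string &) override {
    setSeenNoTrivialPPDirective(true);
  }
  void If(bool) override { setSeenNoTrivialPPDirective(true); }
};

// In this sketch, a module directive is acceptable only while the flag is
// still clear.
inline bool mayStartModule(const MiniDirectiveTracer &Tracer) {
  return !Tracer.hasSeenNoTrivialPPDirective();
}
```

With this shape, a line marker or `#ident` before the module directive leaves `mayStartModule` true, while a single `#define` or `#if` turns it false for the rest of the file.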
| CC @boris-kolpackov I can't find you in the GitHub pull request reviewer list, so feel free to comment! |
| I'm not sure "no trivial directive" is a good name, but I'm terrible at naming things, so I'd be happy to change it if you have any suggestions. |
…ndle no-trivial pp-directives Signed-off-by: yronglin <yronglin777@gmail.com>
erichkeane left a comment
I'm OK with this, but modules stuff should be reviewed by @Bigcheese
Signed-off-by: yronglin <yronglin777@gmail.com>
Signed-off-by: yronglin <yronglin777@gmail.com>
| @erichkeane Thanks for your review! |
| ✅ With the latest revision this PR passed the C/C++ code formatter. |
… unit tests Signed-off-by: yronglin <yronglin777@gmail.com>
| Thanks for the review! |
| Thanks for fixing this. Will the fix end up in version 21? |
I think we need to backport this. |
Yes, please. Without this fix modules will be unusable in |
…#153641)

Consider the following code:

```cpp
# 1 __FILE__ 1 3
export module a;
```

According to the wording in [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):

```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```

and the wording in [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file):

```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

`#` is the first pp-token in the translation unit, so clang rejected this code, but directives like this one really should be exempt from the rule. The goal is to disallow preprocessor conditionals and most state changes before the module directive, and these directives are neither. "State change" here means most semantically observable preprocessor state, particularly anything that is order dependent. Global flags, such as being a system header/module, shouldn't matter.

We should exempt a bunch of directives, even though doing so violates the current standard wording.

This patch introduces a `TrivialDirectiveTracer` to trace the **state changes** described above and proposes to exempt the following kinds of directive: `#line`, GNU line marker, `#ident`, `#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC error`, `#pragma GCC diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma warning`, `#pragma execution_character_set`, `#pragma clang assume_nonnull` and builtin macro expansion.

Fixes llvm#145274

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
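To make the exemption list in this commit message concrete, here is a rough text-level sketch that classifies a single source line as one of the listed directives. It is an illustration only: clang works on tokens through preprocessor callbacks, not on raw lines, and `isTrivialDirectiveLine` and `moduleDirectiveAllowedAfter` are hypothetical helpers. Anything not named in the list (including `#pragma once` and the null directive) is treated as non-trivial here, which is an assumption of the sketch; builtin macro expansion and the global module fragment are not modelled.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Drop everything from the first '(' on, so "message(\"hi\")" -> "message".
inline std::string beforeParen(const std::string &S) {
  return S.substr(0, S.find('('));
}

// Hypothetical helper: true if Line spells one of the exempted directives.
inline bool isTrivialDirectiveLine(const std::string &Line) {
  std::istringstream In(Line);
  std::string Tok;
  if (!(In >> Tok) || Tok[0] != '#')
    return false;
  // Accept both "#line" and "# line": drop '#', read the next word if needed.
  Tok.erase(0, 1);
  if (Tok.empty() && !(In >> Tok))
    return false; // lone '#': not in the list, treated as non-trivial here
  if (std::isdigit(static_cast<unsigned char>(Tok[0])))
    return true; // GNU line marker, e.g. "# 1 __FILE__ 1 3"
  if (Tok == "line" || Tok == "ident")
    return true;
  if (Tok != "pragma")
    return false;

  std::string First, Second;
  In >> First >> Second; // either may stay empty
  First = beforeParen(First);
  Second = beforeParen(Second);
  static const std::set<std::string> OneWord = {
      "comment", "mark",    "detect_mismatch",
      "message", "warning", "execution_character_set"};
  static const std::set<std::string> TwoWord = {
      "clang __debug", "clang assume_nonnull", "GCC warning",
      "GCC error",     "GCC diagnostic",       "OPENCL EXTENSION"};
  return OneWord.count(First) > 0 || TwoWord.count(First + " " + Second) > 0;
}

// Hypothetical helper: would a module directive still be accepted after
// these leading lines? Blank lines are ignored; anything else must be trivial.
inline bool
moduleDirectiveAllowedAfter(const std::vector<std::string> &Leading) {
  for (const std::string &Line : Leading) {
    if (Line.find_first_not_of(" \t") == std::string::npos)
      continue;
    if (!isTrivialDirectiveLine(Line))
      return false;
  }
  return true;
}
```

Under these assumptions, the line marker from the example above is accepted before `export module a;`, whereas a preceding `#define` or `#if` is not.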
| /cherry-pick e6e874c |
Error: Command failed due to missing milestone. |
| /cherry-pick e6e874c |
| /pull-request #154077 |
…#153641) (cherry picked from commit e6e874c)
Since the following two patches have landed, mark P1857R3 as partially implemented.

- [[C++][Modules] A module directive may only appear as the first preprocessing tokens in a file](#144233).
- [[clang] Allow trivial pp-directives before C++ module directive](#153641).

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
Co-authored-by: h-vetinari <h.vetinari@gmx.com>
Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>