Thread Safety Analysis: Basic capability alias-analysis #142955
Conversation
@aaronpuchert - RFC regarding basic capability alias analysis. For the bare minimum this would work, and likely covers 90% of the cases I worry about. I believe later enhancements could be built on top. There might be something I'm missing though. Kindly take a look.

Force-pushed from c2dcde6 to 608c4f6.

Cleaned it up some more.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-analysis

Author: Marco Elver (melver)

Changes

Add a simple form of alias analysis for capabilities by substituting local pointer variables with their initializers if they are explicitly const or never reassigned.

For example, the analysis will no longer generate false positives when a capability is locked through a local pointer alias, as shown in the example below.

This implementation would satisfy the basic needs of the intended Linux kernel application [1].

Current limitations:

- No alias tracking for non-const pointers that are reassigned.
- No alias tracking through more complex control flow.

Link: https://lore.kernel.org/all/CANpmjNPquO=W1JAh1FNQb8pMQjgeZAKCPQUAd7qUg=5pjJ6x=Q@mail.gmail.com/ [1]

Full diff: https://github.com/llvm/llvm-project/pull/142955.diff

5 Files Affected:

- (modified) clang/include/clang/Analysis/Analyses/ThreadSafetyCommon.h
- (modified) clang/lib/Analysis/ThreadSafety.cpp
- (modified) clang/lib/Analysis/ThreadSafetyCommon.cpp
- (modified) clang/test/Sema/warn-thread-safety-analysis.c
- (modified) clang/test/SemaCXX/warn-thread-safety-analysis.cpp
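One of the newly added test cases illustrates the pattern: before this change, the write to f->data below was flagged, because the analysis did not understand that ptr->mu and f->mu name the same mutex.

struct Foo {
  Mutex mu;
  int data GUARDED_BY(mu);
};

void testBasicPointerAlias(Foo *f) {
  Foo *ptr = f;
  ptr->mu.Lock();    // lock through alias
  f->data = 42;      // access through original - previously a false positive
  ptr->mu.Unlock();  // unlock through alias
}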
diff --git a/clang/include/clang/Analysis/Analyses/ThreadSafetyCommon.h b/clang/include/clang/Analysis/Analyses/ThreadSafetyCommon.h
index 6c97905a2d7f9..cd2c278c2278f 100644
--- a/clang/include/clang/Analysis/Analyses/ThreadSafetyCommon.h
+++ b/clang/include/clang/Analysis/Analyses/ThreadSafetyCommon.h
@@ -395,7 +395,7 @@ class SExprBuilder {
   CapabilityExpr translateAttrExpr(const Expr *AttrExp, CallingContext *Ctx);
 
   // Translate a variable reference.
-  til::LiteralPtr *createVariable(const VarDecl *VD);
+  til::SExpr *createVariable(const VarDecl *VD, CallingContext *Ctx = nullptr);
 
   // Translate a clang statement or expression to a TIL expression.
   // Also performs substitution of variables; Ctx provides the context.
@@ -501,6 +501,9 @@ class SExprBuilder {
   void mergeEntryMapBackEdge();
   void mergePhiNodesBackEdge(const CFGBlock *Blk);
 
+  // Returns true if a variable is assumed to be reassigned.
+  bool isVariableReassigned(const VarDecl *VD);
+
 private:
   // Set to true when parsing capability expressions, which get translated
   // inaccurately in order to hack around smart pointers etc.
@@ -531,6 +534,9 @@ class SExprBuilder {
   std::vector<til::Phi *> IncompleteArgs;
   til::BasicBlock *CurrentBB = nullptr;
   BlockInfo *CurrentBlockInfo = nullptr;
+
+  // Map caching if a local variable is reassigned.
+  llvm::DenseMap<const VarDecl *, bool> LocalVariableReassigned;
 };
 
 #ifndef NDEBUG
diff --git a/clang/lib/Analysis/ThreadSafety.cpp b/clang/lib/Analysis/ThreadSafety.cpp
index 80e7c8eff671a..a0ab241125d1c 100644
--- a/clang/lib/Analysis/ThreadSafety.cpp
+++ b/clang/lib/Analysis/ThreadSafety.cpp
@@ -1141,11 +1141,10 @@ class ThreadSafetyAnalyzer {
   void warnIfMutexNotHeld(const FactSet &FSet, const NamedDecl *D,
                           const Expr *Exp, AccessKind AK, Expr *MutexExp,
-                          ProtectedOperationKind POK, til::LiteralPtr *Self,
+                          ProtectedOperationKind POK, til::SExpr *Self,
                           SourceLocation Loc);
   void warnIfMutexHeld(const FactSet &FSet, const NamedDecl *D, const Expr *Exp,
-                       Expr *MutexExp, til::LiteralPtr *Self,
-                       SourceLocation Loc);
+                       Expr *MutexExp, til::SExpr *Self, SourceLocation Loc);
 
   void checkAccess(const FactSet &FSet, const Expr *Exp, AccessKind AK,
                    ProtectedOperationKind POK);
@@ -1599,7 +1598,7 @@ class BuildLockset : public ConstStmtVisitor<BuildLockset> {
   }
 
   void handleCall(const Expr *Exp, const NamedDecl *D,
-                  til::LiteralPtr *Self = nullptr,
+                  til::SExpr *Self = nullptr,
                   SourceLocation Loc = SourceLocation());
   void examineArguments(const FunctionDecl *FD,
                         CallExpr::const_arg_iterator ArgBegin,
@@ -1629,7 +1628,7 @@ class BuildLockset : public ConstStmtVisitor<BuildLockset> {
 /// of at least the passed in AccessKind.
 void ThreadSafetyAnalyzer::warnIfMutexNotHeld(
     const FactSet &FSet, const NamedDecl *D, const Expr *Exp, AccessKind AK,
-    Expr *MutexExp, ProtectedOperationKind POK, til::LiteralPtr *Self,
+    Expr *MutexExp, ProtectedOperationKind POK, til::SExpr *Self,
     SourceLocation Loc) {
   LockKind LK = getLockKindFromAccessKind(AK);
   CapabilityExpr Cp = SxBuilder.translateAttrExpr(MutexExp, D, Exp, Self);
@@ -1688,8 +1687,7 @@ void ThreadSafetyAnalyzer::warnIfMutexNotHeld(
 /// Warn if the LSet contains the given lock.
 void ThreadSafetyAnalyzer::warnIfMutexHeld(const FactSet &FSet,
                                            const NamedDecl *D, const Expr *Exp,
-                                           Expr *MutexExp,
-                                           til::LiteralPtr *Self,
+                                           Expr *MutexExp, til::SExpr *Self,
                                            SourceLocation Loc) {
   CapabilityExpr Cp = SxBuilder.translateAttrExpr(MutexExp, D, Exp, Self);
   if (Cp.isInvalid()) {
@@ -1857,7 +1855,7 @@ void ThreadSafetyAnalyzer::checkPtAccess(const FactSet &FSet, const Expr *Exp,
 /// of an implicitly called cleanup function.
 /// \param Loc If \p Exp = nullptr, the location.
 void BuildLockset::handleCall(const Expr *Exp, const NamedDecl *D,
-                              til::LiteralPtr *Self, SourceLocation Loc) {
+                              til::SExpr *Self, SourceLocation Loc) {
   CapExprSet ExclusiveLocksToAdd, SharedLocksToAdd;
   CapExprSet ExclusiveLocksToRemove, SharedLocksToRemove, GenericLocksToRemove;
   CapExprSet ScopedReqsAndExcludes;
@@ -1869,7 +1867,7 @@ void BuildLockset::handleCall(const Expr *Exp, const NamedDecl *D,
     const auto *TagT = Exp->getType()->getAs<TagType>();
     if (D->hasAttrs() && TagT && Exp->isPRValue()) {
       til::LiteralPtr *Placeholder =
-          Analyzer->SxBuilder.createVariable(nullptr);
+          cast<til::LiteralPtr>(Analyzer->SxBuilder.createVariable(nullptr));
       [[maybe_unused]] auto inserted =
           Analyzer->ConstructedObjects.insert({Exp, Placeholder});
       assert(inserted.second && "Are we visiting the same expression again?");
diff --git a/clang/lib/Analysis/ThreadSafetyCommon.cpp b/clang/lib/Analysis/ThreadSafetyCommon.cpp
index ddbd0a9ca904b..d95dfa087fdf0 100644
--- a/clang/lib/Analysis/ThreadSafetyCommon.cpp
+++ b/clang/lib/Analysis/ThreadSafetyCommon.cpp
@@ -19,6 +19,7 @@
 #include "clang/AST/Expr.h"
 #include "clang/AST/ExprCXX.h"
 #include "clang/AST/OperationKinds.h"
+#include "clang/AST/RecursiveASTVisitor.h"
 #include "clang/AST/Stmt.h"
 #include "clang/AST/Type.h"
 #include "clang/Analysis/Analyses/ThreadSafetyTIL.h"
@@ -241,7 +242,22 @@ CapabilityExpr SExprBuilder::translateAttrExpr(const Expr *AttrExp,
   return CapabilityExpr(E, AttrExp->getType(), Neg);
 }
 
-til::LiteralPtr *SExprBuilder::createVariable(const VarDecl *VD) {
+til::SExpr *SExprBuilder::createVariable(const VarDecl *VD,
+                                         CallingContext *Ctx) {
+  if (VD) {
+    // Substitute local pointer variables with their initializers if they are
+    // explicitly const or never reassigned.
+    QualType Ty = VD->getType();
+    if (Ty->isPointerType() && VD->hasInit() && !isVariableReassigned(VD)) {
+      const Expr *Init = VD->getInit()->IgnoreParenImpCasts();
+      // Check for self-initialization to prevent infinite recursion.
+      if (const auto *InitDRE = dyn_cast<DeclRefExpr>(Init)) {
+        if (InitDRE->getDecl()->getCanonicalDecl() == VD->getCanonicalDecl())
+          return new (Arena) til::LiteralPtr(VD);
+      }
+      return translate(Init, Ctx);
+    }
+  }
   return new (Arena) til::LiteralPtr(VD);
 }
 
@@ -353,6 +369,9 @@ til::SExpr *SExprBuilder::translateDeclRefExpr(const DeclRefExpr *DRE,
             : cast<ObjCMethodDecl>(D)->getCanonicalDecl()->getParamDecl(I);
   }
 
+  if (const auto *VarD = dyn_cast<VarDecl>(VD))
+    return createVariable(VarD, Ctx);
+
   // For non-local variables, treat it as a reference to a named object.
   return new (Arena) til::LiteralPtr(VD);
 }
@@ -1012,6 +1031,63 @@ void SExprBuilder::exitCFG(const CFGBlock *Last) {
   IncompleteArgs.clear();
 }
 
+bool SExprBuilder::isVariableReassigned(const VarDecl *VD) {
+  // Note: The search is performed lazily per-variable and result is cached. An
+  // alternative would have been to eagerly create a set of all reassigned
+  // variables, but that would consume significantly more memory. The number of
+  // variables needing the reassignment check in a single function is expected
+  // to be small. Also see createVariable().
+  struct ReassignmentFinder : RecursiveASTVisitor<ReassignmentFinder> {
+    explicit ReassignmentFinder(const VarDecl *VD) : QueryVD(VD) {
+      assert(QueryVD->getCanonicalDecl() == QueryVD);
+    }
+
+    bool VisitBinaryOperator(BinaryOperator *BO) {
+      if (!BO->isAssignmentOp())
+        return true;
+      auto *DRE = dyn_cast<DeclRefExpr>(BO->getLHS()->IgnoreParenImpCasts());
+      if (!DRE)
+        return true;
+      auto *AssignedVD = dyn_cast<VarDecl>(DRE->getDecl());
+      if (!AssignedVD)
+        return true;
+      // Don't count the initialization itself as a reassignment.
+      if (AssignedVD->hasInit() &&
+          AssignedVD->getInit()->getBeginLoc() == BO->getBeginLoc())
+        return true;
+      // If query variable appears as LHS of assignment.
+      if (QueryVD == AssignedVD->getCanonicalDecl()) {
+        FoundReassignment = true;
+        return false; // stop
+      }
+      return true;
+    }
+
+    const VarDecl *QueryVD;
+    bool FoundReassignment = false;
+  };
+
+  if (VD->getType().isConstQualified())
+    return false; // Assume UB-freedom.
+  if (!VD->isLocalVarDecl())
+    return true; // Not a local variable (assume reassigned).
+  auto *FD = dyn_cast<FunctionDecl>(VD->getDeclContext());
+  if (!FD)
+    return true; // Assume reassigned.
+
+  // Try to look up in cache; use the canonical declaration to ensure
+  // consistent lookup in the cache.
+  VD = VD->getCanonicalDecl();
+  auto It = LocalVariableReassigned.find(VD);
+  if (It != LocalVariableReassigned.end())
+    return It->second;
+
+  ReassignmentFinder Visitor(VD);
+  // const_cast ok: FunctionDecl not modified.
+  Visitor.TraverseDecl(const_cast<FunctionDecl *>(FD));
+  return LocalVariableReassigned[VD] = Visitor.FoundReassignment;
+}
+
 #ifndef NDEBUG
 namespace {
diff --git a/clang/test/Sema/warn-thread-safety-analysis.c b/clang/test/Sema/warn-thread-safety-analysis.c
index b43f97af9162e..549cb1231baa6 100644
--- a/clang/test/Sema/warn-thread-safety-analysis.c
+++ b/clang/test/Sema/warn-thread-safety-analysis.c
@@ -184,9 +184,13 @@ int main(void) {
   /// Cleanup functions
   {
     struct Mutex* const __attribute__((cleanup(unlock_scope))) scope = &mu1;
-    mutex_exclusive_lock(scope);  // Note that we have to lock through scope, because no alias analysis!
+    mutex_exclusive_lock(scope);  // Lock through scope works.
     // Cleanup happens automatically -> no warning.
   }
+  {
+    struct Mutex* const __attribute__((unused, cleanup(unlock_scope))) scope = &mu1;
+    mutex_exclusive_lock(&mu1);  // With basic alias analysis lock through mu1 also works.
+  }
 
   foo_.a_value = 0; // expected-warning {{writing variable 'a_value' requires holding mutex 'mu_' exclusively}}
   *foo_.a_ptr = 1;  // expected-warning {{writing the value pointed to by 'a_ptr' requires holding mutex 'bar.other_mu' exclusively}}
diff --git a/clang/test/SemaCXX/warn-thread-safety-analysis.cpp b/clang/test/SemaCXX/warn-thread-safety-analysis.cpp
index d64ed1e5f260a..da5fd350daf74 100644
--- a/clang/test/SemaCXX/warn-thread-safety-analysis.cpp
+++ b/clang/test/SemaCXX/warn-thread-safety-analysis.cpp
@@ -1556,9 +1556,9 @@ void main() {
   Child *c;
   Base *b = c;
 
-  b->func1();    // expected-warning {{calling function 'func1' requires holding mutex 'b->mu_' exclusively}}
+  b->func1();    // expected-warning {{calling function 'func1' requires holding mutex 'c->mu_' exclusively}}
   b->mu_.Lock();
-  b->func2();    // expected-warning {{cannot call function 'func2' while mutex 'b->mu_' is held}}
+  b->func2();    // expected-warning {{cannot call function 'func2' while mutex 'c->mu_' is held}}
   b->mu_.Unlock();
 
   c->func1();    // expected-warning {{calling function 'func1' requires holding mutex 'c->mu_' exclusively}}
@@ -7224,3 +7224,101 @@ class TestNegativeWithReentrantMutex {
 };
 
 } // namespace Reentrancy
+
+// Tests for tracking aliases of capabilities.
+namespace CapabilityAliases {
+struct Foo {
+  Mutex mu;
+  int data GUARDED_BY(mu);
+};
+
+void testBasicPointerAlias(Foo *f) {
+  Foo *ptr = f;
+  ptr->mu.Lock();    // lock through alias
+  f->data = 42;      // access through original
+  ptr->mu.Unlock();  // unlock through alias
+}
+
+// FIXME: No alias tracking for non-const reassigned pointers.
+void testReassignment() {
+  Foo f1, f2;
+  Foo *ptr = &f1;
+  ptr->mu.Lock();
+  f1.data = 42;  // expected-warning{{writing variable 'data' requires holding mutex 'f1.mu' exclusively}} \
+                 // expected-note{{found near match 'ptr->mu'}}
+  ptr->mu.Unlock();
+
+  ptr = &f2;  // reassign
+  ptr->mu.Lock();
+  f2.data = 42;  // expected-warning{{writing variable 'data' requires holding mutex 'f2.mu' exclusively}} \
+                 // expected-note{{found near match 'ptr->mu'}}
+  f1.data = 42;  // expected-warning{{writing variable 'data' requires holding mutex 'f1.mu'}} \
+                 // expected-note{{found near match 'ptr->mu'}}
+  ptr->mu.Unlock();
+}
+
+// Nested field access through pointer
+struct Container {
+  Foo foo;
+};
+
+void testNestedAccess(Container *c) {
+  Foo *ptr = &c->foo;  // pointer to nested field
+  ptr->mu.Lock();
+  c->foo.data = 42;
+  ptr->mu.Unlock();
+}
+
+void testNestedAcquire(Container *c) EXCLUSIVE_LOCK_FUNCTION(&c->foo.mu) {
+  Foo *buf = &c->foo;
+  buf->mu.Lock();
+}
+
+struct ContainerOfPtr {
+  Foo *foo_ptr;
+};
+
+void testIndirectAccess(ContainerOfPtr *fc) {
+  Foo *ptr = fc->foo_ptr;  // get pointer
+  ptr->mu.Lock();
+  fc->foo_ptr->data = 42;  // access via original
+  ptr->mu.Unlock();
+}
+
+// FIXME: No alias tracking through complex control flow.
+void testSimpleControlFlow(Foo *f1, Foo *f2, bool cond) {
+  Foo *ptr;
+  if (cond) {
+    ptr = f1;
+  } else {
+    ptr = f2;
+  }
+  ptr->mu.Lock();
+  if (cond) {
+    f1->data = 42;  // expected-warning{{writing variable 'data' requires holding mutex 'f1->mu' exclusively}} \
+                    // expected-note{{found near match 'ptr->mu'}}
+  } else {
+    f2->data = 42;  // expected-warning{{writing variable 'data' requires holding mutex 'f2->mu' exclusively}} \
+                    // expected-note{{found near match 'ptr->mu'}}
+  }
+  ptr->mu.Unlock();
+}
+
+void testLockFunction(Foo *f) EXCLUSIVE_LOCK_FUNCTION(&f->mu) {
+  Mutex *mu = &f->mu;
+  mu->Lock();
+}
+
+void testUnlockFunction(Foo *f) UNLOCK_FUNCTION(&f->mu) {
+  Mutex *mu = &f->mu;
+  mu->Unlock();
+}
+
+// Semantically UB, but let's not crash the compiler with this (should be
+// handled by -Wuninitialized).
+void testSelfInit() {
+  Mutex *mu = mu;  // don't do this at home
+  mu->Lock();
+  mu->Unlock();
+}
+} // namespace CapabilityAliases
Gentle ping.
Force-pushed from 0964bd4 to ff8f6e2.

Some more self-review and fixes.
aaronpuchert left a comment
Sorry for the delay, but I still need to wrap my head around this. For now just some very high-level comments.
Force-pushed from 277dc49 to 8185269.

Thanks for the comments - tried to address most of them. PTAL.
Force-pushed from b6fd3cf to b4b995b.

Force-pushed from e5e2f35 to 07bc50c.

Just FYI - I rebased the kernel patches and attempted to apply -Wthread-safety to kernel/sched/, which previously was impossible. With this PR it does work with modest changes (most are annotations, with only a few small code changes): https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git/log/?h=cap-analysis/dev

I also discovered that with this form of alias analysis, we can take care of a function acquiring a capability inside a returned object with a hack like this: ... which was necessary for taking care of kernel/sched.
Force-pushed from 8a99864 to 471d932.

Alright, I think this is as far as I can push it. This PR now contains 3 commits just to keep the story clear (happy to squash them if that's better).

Besides the Linux kernel prototype integration, I took this version and tested it on our internal codebase that has used -Wthread-safety. But because there's a real chance this alias analysis might confuse existing users with new warnings, we should hide it behind -Wthread-safety-beta.

PTAL.
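For illustration only (not part of this PR's diff): enabling the new diagnostics means turning on the existing beta warning group alongside -Wthread-safety, e.g. in a hypothetical lit-style test:

// Hypothetical RUN line; flag spellings as used by existing Clang thread safety tests.
// RUN: %clang_cc1 -fsyntax-only -verify -Wthread-safety -Wthread-safety-beta %s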
Force-pushed from 860025e to c801def.

Squashed the loop-invariant alias fix, so this should address most comments - initially thought I'd preserve the history of how we got here, but this seems cleaner. Also moved the isStaticLocal() fix to the base commit, and the 2nd commit is now only about translateCXXNewExpr(). PTAL.
Force-pushed from 87ff340 to 0bc6707.
delesley left a comment
This looks good to me now, but you'll need the current maintainers to sign off on it. :-)
Force-pushed from 0bc6707 to dc3e720.

Dropped the translateCXXNewExpr() patch for now. Thank you for the review!
Force-pushed from dc3e720 to 309bc52.

Gentle ping - I want to submit this by end of next week.
Force-pushed from 7bcd142 to 11f1c5b.

Force-pushed from 11f1c5b to 241dc5b.

I will merge this later - if this causes problems, please revert; note that it's hidden behind -Wthread-safety-beta.
Add basic alias analysis for capabilities by reusing LocalVariableMap,
which tracks currently valid definitions of variables. Aliases created
through complex control flow are not tracked. This implementation would
satisfy the basic needs of the intended Linux kernel application [1].
For example, the analysis will no longer generate false positives for
cases such as (and many others):
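void testNestedAccess(Container *c) {
  Foo *ptr = &c->foo;
  ptr->mu.Lock();
  c->foo.data = 42;  // OK - no false positive
  ptr->mu.Unlock();
}

void testNestedAcquire(Container *c) EXCLUSIVE_LOCK_FUNCTION(&c->foo.mu) {
  Foo *buf = &c->foo;
  buf->mu.Lock();  // OK - no false positive
}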
Given that the analysis is now able to identify potentially unsafe patterns
it could not identify previously (see the added FIXME test case for an
example), mark alias resolution as a "beta" feature behind the flag
-Wthread-safety-beta.

Fixing LocalVariableMap: It was found that LocalVariableMap was not
properly tracking loop-invariant aliases: the old implementation failed
because the merge logic compared raw VarDefinition IDs. The algorithm
for handling back-edges (in createReferenceContext()) generates new
'reference' definitions for loop-scoped variables. Later ID comparison
caused alias invalidation at back-edge merges (in intersectBackEdge())
and at subsequent forward-merges with non-loop paths (in intersectContexts()).
Fix LocalVariableMap by adding the getCanonicalDefinitionID() helper
that resolves any definition ID down to its non-reference base. As a
result, a variable's definition is preserved across control-flow merges
as long as its underlying canonical definition remains the same.
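A minimal sketch of the idea (assuming the existing LocalVariableMap representation in clang/lib/Analysis/ThreadSafety.cpp, where a VarDefinition either holds a defining expression or is a reference to another definition via its Ref index; the actual patch may differ in details):

// Sketch: resolve a definition ID down to its non-reference base definition.
unsigned getCanonicalDefinitionID(unsigned ID) {
  // Reference definitions (created when handling back-edges) point at an
  // earlier definition via Ref; follow the chain to the underlying base.
  while (ID > 0 && VarDefinitions[ID].isReference())
    ID = VarDefinitions[ID].Ref;
  return ID;
}

With such a helper, merge points like intersectBackEdge() and intersectContexts() can compare canonical IDs rather than raw IDs, so a loop-invariant alias keeps its definition across back-edge and forward merges.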
Link: https://lore.kernel.org/all/CANpmjNPquO=W1JAh1FNQb8pMQjgeZAKCPQUAd7qUg=5pjJ6x=Q@mail.gmail.com/ [1]