38

The problem

On our ASP .net website I keep getting wrong line numbers in stack traces of exceptions. I am talking about our live environment. There seems to be a pattern: The stack trace will always point to the line that contains the method's closing curly brackets.

For Example:

public class Foo { public void Bar() { object someObject = null; someObject.ToString(); /* arbitrarily more lines of code */ } // this line will be the one that the stack trace points to } 

More details

  • To be clear: This does not only happen for some method(s), it happens for every exception that we log. So I would rule out (JIT) optimizations here, that might lead to line numbers being seemingly randomly off. What bothers me is, that the wrong line numbers seem to consistently point to the closing curly brackets of the containing method.

  • Note that before .net 4.6 there actually was a bug in the framework, such that if you compiled targeting x64, exactly this would happen. However, Microsoft confirmed that this has been fixed. Also running a minimal example app for this, reaffirms their claim. But for some reason it still happens on the web servers.

  • It puzzles me even more that it does not happen in our test and development environments. The test and live systems are pretty similarly set up. The only real difference is that live we are running Windows Server 2012, while on the test system we are still using Windows Server 2008.

What I have checked

  • pdb files are valid (tested with chkmatch)
  • Compile time code optimizations are not the issue
  • The aforementioned bug in .net before 4.6 cannot be reproduced with a minimal example
  • .NET versions are exactly the same
  • Build comes from the same deploy script

Any clues towards solving this problem are highly appreciated. If you know anything that we could check, please let me know.

14
  • 8
    Is there the possibility that the live build is different than your development environments so that when comparing code you are getting incorrect line numbers. Commented May 6, 2016 at 12:17
  • 4
    Going the other way around, can you clone your live environment into a dev environment for further testing? If you're not using a virtual server of some sort, there is a Sysinternals tool called Disk2Vhd (but I've not tried it myself) technet.microsoft.com/en-us/sysinternals/ee656415.aspx Commented Jul 13, 2016 at 13:12
  • 1
    Do your methods contain multiple throw methods? This describes a gotcha with the .net framework using structured exception handling in windows. stackoverflow.com/a/2494150/141022 Commented Jul 13, 2016 at 13:17
  • 2
    I think @AlexKeySmith's idea is a smart one. Copy the deployed application from your live environment to your test and see if it behaves the same. If the behavior is still the same (wrong line number) then you know it has something to do with your project / build config because the compiled code differs. If the behavior is now the same (line numbers are correct) then it is environment specific. This approach will narrow down where you have to look. Commented Jul 13, 2016 at 13:30
  • 2
    Side note - this is how our environments behave as well but it is expected because we compile with configuration release to our live servers and our build script forces flag 'Optimize code` so line numbers will never match. On our test servers we compile with configuration debug and the script disabled/removes flag Optimize code so line numbers are always matched in any stack trace. See Common MSBuild Project Properties flag Optimize. Commented Jul 13, 2016 at 13:36

3 Answers 3

29
+25

Compilation optimization as well as runtime optimization can change the actual execution, as well as the call stack information constructed. So you cannot treat the numbers that seriously.

More information can be found in posts such as Scott Hanselman's: Release IS NOT Debug: 64bit Optimizations and C# Method Inlining in Release Build Call Stacks.

It would be too difficult to troubleshoot a specific case without touching your bits. But if you know of WinDbg you might dive deeper, by live debugging the application when the exception occurs. Then you can dump the actual jitted machine code as well as other runtime information, so as to determine how the line number is constructed.

Sign up to request clarification or add additional context in comments.

1 Comment

> So you cannot treat the numbers that seriously. But this is basic debugability. We need the exact line numbers. Tools that do optimizaiton must emit some mapping so debugging and monitoring tools can find the right line number given a crash dump of sorts
16

Thanks everyone for your help! We found out that the SCOM APM agent running on the machines in production caused our application to log wrong line numbers in the stack traces. As soon as we deactivated the agent, the line numbers were correct.

2 Comments

It is a very interesting fact, but I guess this agent injects certain dlls into your process, which polluted the call stack and led to the offset. You might do some live debugging to confirm if you are familiar with tools such as WinDbg.
I would guess that the problem is stack unwinding. Inside SCOM you would be able to see any exception with correct(!) line numbers. So probably some process was injected that unwound the stack, leaving your application with an incomplete stacktrace.
3

In latest .NET Core versions (aka .NET 6.0) if you have "Ready to run" JIT optimization enabled (PublishReadyToRun in publishing settings), this essentially renders line numbers inaccurate too.

Disabling it might help. But this comes as a performance trade obviously.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.