
Recently, I've been seeing a wide variety of memory-related crashes at work (C++ applications on Ubuntu 14.04). Our applications use a proprietary third-party library to interface with real-time data and are heavily multi-threaded. The errors spat out by glibc have been "double free or corruption" and "detected corrupted double-linked list" under various circumstances. The cores recovered from these show crashes occurring both in the third-party code (more frequently) and in our own code (less frequently). The unifying characteristic is that every crash corresponds to a new/malloc or a delete/free.
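For illustration only (this is not our code, and the names are made up), here is a minimal sketch of the sort of threading bug that produces this abort, assuming a buffer whose ownership is never clearly handed off between threads:

    // Hypothetical sketch: two threads both believe they own `buf`,
    // so the same heap block ends up being freed twice.
    #include <cstdlib>
    #include <thread>

    static void consumer(char *buf) {
        // ... pretend to use buf ...
        std::free(buf);   // one of the two frees
    }

    int main() {
        char *buf = static_cast<char *>(std::malloc(64));
        std::thread t(consumer, buf);
        std::free(buf);   // the other free; ownership was never handed off
        t.join();         // glibc's heap checks usually abort whichever free
                          // runs second with "double free or corruption"
        return 0;
    }

Depending on timing, glibc sometimes misses the second free entirely (for instance if the block has already been reused) and the damage only surfaces later in an unrelated allocation, which would be consistent with crashes showing up in both the library's code and ours.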

Unfortunately, there is very little room for a debugger in this environment. The applications are highly speed-sensitive, and because everything in the environment is real time, the input data and operating circumstances can only be approximated when trying to reproduce a crash.

That being said, my question is this: what utilities exist that might illuminate exactly which parts of what process are touching memory they shouldn't be? Since crashes can't be reproduced reliably, my only option procedurally is to capture data and then examine it after a crash has occurred. To that end, I found OProfile and Sysprof, but I'm not sure whether either is an appropriate tool for the task.
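To make "capture data and examine it later" concrete, here is a sketch of the kind of thing I have in mind, assuming glibc's <execinfo.h> is available (the handler name is made up, and the binary would need to be linked with -rdynamic to get symbol names):

    // Sketch only: dump a raw backtrace when the process is about to die,
    // then fall through to the default action so a core is still produced.
    #include <csignal>
    #include <execinfo.h>
    #include <unistd.h>

    static void crash_handler(int sig) {
        void *frames[64];
        int n = backtrace(frames, 64);
        // backtrace_symbols_fd() writes straight to the fd and does not call
        // malloc, which matters here because the heap may already be corrupt.
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        std::signal(sig, SIG_DFL);
        std::raise(sig);
    }

    int main() {
        // Call backtrace() once up front so libgcc is loaded before any crash;
        // otherwise the first call from inside the handler may allocate.
        void *warmup[1];
        backtrace(warmup, 1);

        std::signal(SIGABRT, crash_handler);  // glibc raises SIGABRT on heap corruption
        std::signal(SIGSEGV, crash_handler);

        // ... rest of the application ...
        return 0;
    }

Since the default action is re-raised afterwards, the usual core dump is still produced.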

I apologize for the vagueness of the question and am grateful in advance for any guidance on this issue.

  • "Since crashes can't be reproduced reliably" -> Why not? See here and here. Of course, if you don't have the source for the proprietary parts, having a core dump might not be very helpful, but it at least would be for your own code. It may also be possible to locate the error by doing test runs inside valgrind for a while. Commented Jul 8, 2015 at 15:23
  • Yeah, there are fairly reliable cores and we do have the source for the third party library. So, that works out. The reason the crashes can't be reproduced is because the input data for our applications is generated in real time and entirely externally. We don't store it, so we can't reproduce any given execution exactly. Commented Jul 8, 2015 at 15:37
  • Also: "what process are touching memory they shouldn't be?" -> It's whichever process crashed. It is not some other process stepping on them. The OS will not allow that, which is exactly why you get slapped for things like double free or corruption. The error is in the crashed process, not anywhere else. Commented Jul 8, 2015 at 15:37
  • Indeed. I suppose it's more of a question of whether it's third party code or our code. But since very similarly presenting errors occur in both, I've gone looking for tools to shed light. You mentioned valgrind for this; I'll do my due Googling, but do you have a favored reference for this type of query? Commented Jul 8, 2015 at 15:39
  • Valgrind has a home page and copious documentation. It will catch memory access violations and provide a backtrace. You can also get a back trace from a core dump, although again this is most useful if you have the source and can provide a binary with debugging symbols in. Without that it is much sketchier to pin anything down -- but I think that is a logical restriction which will apply no matter what; the information you need simply doesn't exist in a useful form. Commented Jul 8, 2015 at 15:46
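To make the valgrind suggestion concrete: the reason memcheck is useful for this class of bug is that it reports the invalid access, with a backtrace, at the moment it happens, whereas glibc typically only complains later, once the damaged bookkeeping is used. A hypothetical overrun, for example:

    // Hypothetical sketch of a heap overrun. glibc typically only notices at
    // or after the free (the exact message depends on what got overwritten),
    // but valgrind's memcheck reports an invalid write at the memset itself.
    #include <cstdlib>
    #include <cstring>

    int main() {
        char *buf = static_cast<char *>(std::malloc(16));
        std::memset(buf, 0, 32);   // writes 16 bytes past the end of the block
        std::free(buf);            // glibc's consistency checks may abort here
        return 0;
    }

The backtrace memcheck prints points at the memset, rather than at whichever free later trips over the damage.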
