2

I've run into this problem in my current project, which requires reasoning about code at the binary level.

I think we can determine the starting location of all functions in a program by looking at the operands of CALL instructions. Once we have this list, can we determine which function encloses an address by simply searching backward until we find a start address? I.e., is the start address of the function enclosing an instruction the greatest function address that is less than or equal to the instruction address?
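To make the proposed lookup concrete, here is a minimal Python sketch. It assumes the CALL targets have already been collected into a list; the function name and the addresses are made up for illustration. It only implements the "greatest start address not above the instruction" rule and says nothing about whether that rule holds for a given binary.

```python
from bisect import bisect_right

def enclosing_function_start(call_targets, address):
    """Return the greatest collected start address <= address, or None.

    call_targets: iterable of addresses gathered from CALL operands
    (hypothetical input; collecting them is a separate step).
    """
    starts = sorted(set(call_targets))
    # bisect_right gives the index of the first start strictly above `address`.
    i = bisect_right(starts, address)
    return starts[i - 1] if i else None

# Illustrative addresses only.
starts = [0x401000, 0x401040, 0x4010A0]
print(hex(enclosing_function_start(starts, 0x401058)))  # 0x401040
```

The lookup returns `None` for an address below every known start, and it silently gives a wrong answer whenever a function is never the target of a direct CALL or is not laid out contiguously.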

If the above method is not correct, is there another way to find the starting address of the function enclosing an instruction?

edit: Added clarification of the question.

edit2: My method is probably wrong. Compilers are not guaranteed to place function bodies in contiguous regions of machine code.

2
  • 1
    Assembly language is not even required to use functions. It could just be a big spaghetti mess of gotos. Commented May 29, 2012 at 3:08
  • You're right in the context of assembly language. The context of this is the output of a compiled language. Commented May 29, 2012 at 3:55

2 Answers

3

You need to constrain your problem space more. Even when constrained just to "the output of a compiled language", compilers nowadays are good at blurring the boundaries between functions. Inlining means one function can be enclosed within another. Tail-call optimization transfers control between two functions without a CALL instruction. Profile-guided optimization can create discontiguous functions. Code flow analysis and noreturn hints can result in code falling through to data. Jump tables mean that data can fall through to code without a CALL target. The only reliable way is to have the compiler explicitly tell you the instruction-to-function mapping, say via debug information. You didn't say what platform you're using, so it's hard to give more specific information.
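As a rough illustration of what an explicit instruction-to-function mapping buys you, the sketch below assumes the address ranges have already been extracted from debug information or a symbol table into `(name, start, end)` tuples. That record format, the names, and the addresses are all hypothetical. Because a function may own several ranges, a discontiguous function is simply listed more than once.

```python
from bisect import bisect_right

def build_index(ranges):
    """ranges: iterable of (name, start, end), end exclusive, non-overlapping."""
    ordered = sorted(ranges, key=lambda r: r[1])
    starts = [r[1] for r in ordered]
    return starts, ordered

def function_at(index, address):
    """Return the name of the function whose range contains address, or None."""
    starts, ordered = index
    i = bisect_right(starts, address)
    if i == 0:
        return None
    name, start, end = ordered[i - 1]
    # Unlike the pure "nearest start" rule, also check the end of the range.
    return name if start <= address < end else None

# Hypothetical ranges: "main" has a hot part and a separately placed cold part.
index = build_index([
    ("main",   0x1000, 0x1040),
    ("helper", 0x1040, 0x1060),
    ("main",   0x2000, 0x2010),
])
print(function_at(index, 0x2004))  # main
print(function_at(index, 0x1800))  # None
```

Checking the end of each range is what distinguishes this from the backward search in the question: addresses in padding or data between functions map to no function instead of to the previous one.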


1 Comment

Thanks. Originally, I was hoping to be able to do this for any compiled binary, but it seems like I will have to restrict this to binaries compiled from C and use debug information.
0

No, assembly code can do all sorts of funky things. A call might jump over another function entirely, jump backwards, or jump into another module.

3 Comments

In general you can't determine it. Your actual situation will vary with how the function binary code is constructed; for some compilers, your scheme might work. But in general, you can't trust that what appears to be instructions is in fact instructions. If you can't count on that, you can't possibly trace your way "backwards" to the function start.
Debuggers are able to do this. Should it be possible with debugging information?
Debugging information ought to include each function's starting address, in order to be useful. So you can start at each function's starting address and go forwards in order to construct lists of instructions that might be executed by each function. Then, given the address of a particular instruction, you can figure out which function(s) might execute that instruction.
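A minimal sketch of that forward walk, in Python, under loud assumptions: the program is already decoded into a toy table `{address: (length, kind, target)}` with `kind` one of `"other"`, `"jmp"`, `"jcc"`, `"call"`, `"ret"`. This table, the function names, and the addresses are invented for illustration; a real tool would need a disassembler and would have to handle indirect jumps, which are not followed here.

```python
def reachable(program, entry):
    """Collect instruction addresses reachable from entry without entering callees."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in program:
            continue
        seen.add(addr)
        length, kind, target = program[addr]
        # Follow direct jump targets; CALL targets belong to other functions.
        if kind in ("jmp", "jcc") and target is not None:
            work.append(target)
        # Everything except an unconditional jump or a return falls through.
        if kind not in ("jmp", "ret"):
            work.append(addr + length)
    return seen

def owners(program, entries, address):
    """Return the names of all functions that might execute the instruction at address."""
    return sorted(name for name, entry in entries.items()
                  if address in reachable(program, entry))

# Toy program: f and g share a tail at 0x20.
program = {
    0x10: (2, "other", None), 0x12: (2, "jcc", 0x20), 0x14: (1, "ret", None),
    0x18: (2, "other", None), 0x1A: (2, "jmp", 0x20),
    0x20: (2, "other", None), 0x22: (1, "ret", None),
}
entries = {"f": 0x10, "g": 0x18}  # start addresses, e.g. from debug information
print(owners(program, entries, 0x20))  # ['f', 'g']
```

The result is a list because one instruction can be reachable from several entry points, for example shared tails or tail calls compiled as plain jumps.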
