
My question: suppose I have this function:

```c
#include <stdlib.h>
#include <string.h>

void func1(void)
{
    char *s = "hello";
    char *c;
    int b;

    c = malloc(15);
    strcpy(c, s);   /* c is never freed here, so these 15 bytes leak */
}
```

I think the pointer s is allocated on the stack, but where is the data "hello" stored? Does that go in the data segment of the program? As for c and b, they are uninitialized. Since c is supposed to hold some memory address and doesn't have one yet, how does that work? And b has no contents either, so can it even be stored on the stack? Then, when we allocate memory for c on the heap with malloc, c now holds some memory address. How is this uninitialized variable c given the address of the first byte of that string on the heap?

  • The C standard does not specify how or where objects are stored, only their storage duration. Commented Oct 4, 2015 at 17:38
  • "hello" will be stored in the DS (data segment). Pointers can be on the stack or in the data segment, depending on context. But b will be on the stack. Commented Oct 4, 2015 at 17:41
  • Almost any C compiler will store it in the text segment. There's not much point in allowing a program to change a literal. That it isn't const char* is an ancient C bug that's impossible to fix because it would break almost every existing program :) Just try changing it; dollars to donuts it will say bang! Commented Oct 4, 2015 at 17:43
  • @HansPassant: Actually, it's usually the read-only DS where string literals are put. Apart from that, everything you said still applies. Commented Oct 4, 2015 at 17:59

2 Answers


We need to consider what memory location a variable has and what its contents are. Keep this in mind.

For an int, the variable has a memory address and has a number as its contents.

For a char pointer, the variable has a memory address and its contents are the address of a string; the actual string data lives at another memory location.

To understand this, we need to consider two things:

(1) the memory layout of a program
(2) the memory layout of a function when it's been called

Program layout [typical], from lower memory address to higher memory address:

  • code segment: where instructions go (... machine instructions for func1 ...)
  • data segment: where initialized global variables and constants go (... int myglobal_initialized = 23; ... "hello" ...); string literals such as "hello" usually go in a read-only part of it
  • bss segment: for uninitialized globals (... int myglobal_tbd; ...)
  • heap segment: where malloc data is stored (grows upward, towards higher memory addresses)
  • stack segment: starts at the top memory address and grows downward, towards the end of the heap

Now here's a stack frame for a function. It will be within the stack segment somewhere. Note, this is higher memory address to lower:

  • function arguments [if any]: arg2, arg1, arg0
  • function's return address [where it will go when it returns]
  • function's stack/local variables: char *s, char *c, int b, char buf[20]

Note that I've added a "buf". Suppose we changed func1 to return a string pointer (e.g. "char *func1(arg0,arg1,arg2)") and added "strcpy(buf,s)" or "strcpy(buf,c)". Then buf would be usable inside func1, and func1 could return either c or s, but not buf.

That's because with "s" the data (the string literal) is stored in the data segment and persists after func1 returns. Likewise, c can be returned because the data it points to is in the heap segment and stays allocated until someone calls free().

But buf would not work (e.g. return buf), because its data is stored in func1's stack frame, and that frame is popped off the stack when func1 returns. Using the returned pointer is undefined behavior; in practice the caller may see garbage. In other words, data in the stack frame of a given function is available to it and to any function it calls [and so on ...]. But this stack frame is not available to a caller of that function. That is, the stack frame data only "persists" for the lifetime of the called function.

Here's the fully adjusted sample program:

```c
#include <stdlib.h>
#include <string.h>

int myglobal_initialized = 23;
int myglobal_tbd;

char *func1(int arg0, int arg1, int arg2)
{
    char *s = "hello";
    char *c;
    int b;
    char buf[20];
    char *ret;

    c = malloc(15);
    strcpy(c, s);
    strcpy(buf, s);

    // ret can be c, s, but _not_ buf (c is chosen here, so the caller must free it)
    ret = c;
    return ret;
}
```

Let's split this answer into two points of view on the same topic: the standards only complicate understanding it, but they're the standards anyway :).

Subject common to both parts

```c
#include <stdlib.h>
#include <string.h>

void func1(void)
{
    char *s = "hello";
    char *c;
    int b;

    c = malloc(15);
    strcpy(c, s);
}
```

Part I: From a standardese point of view

According to the standards, there's a useful concept known as automatic storage duration: a variable's space is reserved automatically upon entering a given scope (with uninitialized values, a.k.a. garbage!), it may or may not be set/accessed during that scope, and the space is released for future use when the scope is exited. Note: In C++, this also involves construction and destruction of objects.

So, in your example, you have three automatic variables:

  • char *s, which gets initialized to whatever the address of "hello" happens to be.
  • char *c, which holds garbage until it's initialized by a later assignment.
  • int b, which holds garbage all of its lifetime.

BTW, how this storage is actually implemented (stack, registers, or anything else) is left unspecified by the standards.

Part II: From a real-world point of view

On any decent computer architecture you will find a data structure known as the stack. The stack's purpose is to hold space that can be used and recycled by automatic variables, as well as some space for some stuff needed for recursion/function calling, and can serve as a place to hold temporary values (for optimization purposes) if the compiler decides to.

The stack works in a last-in, first-out (PUSH/POP) fashion, and on most architectures it grows downwards, towards lower addresses. Let me explain it a little better. Imagine an empty stack like this:

[Top of the Stack] [Bottom of the Stack] 

If you, for example, PUSH an int of value 5, you get:

[Top of the Stack] 5 [Bottom of the Stack] 

Then, if you PUSH -2:

[Top of the Stack] 5 -2 [Bottom of the Stack] 

And, if you POP, you retrieve -2, and the stack looks as before -2 was PUSHed.

The bottom of the stack is a barrier that moves upon PUSHing and POPing (many texts call this end the "top" of the stack, since it's where values are pushed and popped; in the diagrams above, "top" is the high-address end). On most architectures, this end is recorded by a processor register known as the stack pointer. Think of it as an unsigned char *: you can decrease it, increase it, do pointer arithmetic on it, etcetera, all with the sole purpose of doing black magic on the stack's contents.

Reserving (space for) automatic variables on the stack is done by decreasing the stack pointer (remember, the stack grows downwards), and releasing them is done by increasing it. Based on this, the earlier theoretical PUSH -2 is shorthand for something like this in pseudo-assembly:

```
SUB %SP, $4     # Subtract sizeof(int) from the stack pointer
MOV $-2, (%SP)  # Copy the value `-2` to the address pointed to by the stack pointer
```

POP whereToPop is merely the inverse:

```
MOV (%SP), whereToPop  # Get the value
ADD %SP, $4            # Free the space
```

Now, compiling func1() may yield the following pseudo-assembly (Note: you are not expected to understand this at its fullest):

```
.rodata                # Read-only data goes here!
.STR0 = "hello"        # The string literal goes here
.text                  # Code goes here!
func1:
    SUB %SP, $12       # sizeof(char*) + sizeof(char*) + sizeof(int), assuming 4 bytes each
    LEA .STR0, (%SP)   # Copy the address (LEA, load effective address) of `.STR0` (the string literal) into the first 4-byte slot (a.k.a. `char *s`)
    PUSH $15           # Pass the argument to `malloc()` (note: arguments are pushed last to first)
    CALL malloc
    ADD %SP, $4        # The caller cleans up the stack/pops arguments
    MOV %RV, 4(%SP)    # Move the return value of `malloc()` (%RV) into the second 4-byte slot (`4(%SP)`, a.k.a. `char *c`)
    PUSH (%SP)         # Second argument to `strcpy()`: the value of `s`
    PUSH 8(%SP)        # First argument to `strcpy()`: the value of `c`, now at 8(%SP) because the previous PUSH moved %SP down by 4
    CALL strcpy
    ADD %SP, $8        # Pop the two arguments of `strcpy()`
    ADD %SP, $12       # Release the local variables
    RET                # Return with no value
```

I hope this has shed some light on things!
