I am new to socket programming & I have realized that most POSIX functions from libc require that you pass a buffer & a length. While I can understand that this helps the functions work out the type (such as when casting a struct sockaddr_in * to a struct sockaddr *), I cannot conclude, from looking at the type definitions in the header, whether the buffer I pass in gets completely overwritten.

In my program, the behaviour I observe seems to suggest that it does; however, I am not sure. I also assume that I won't be able to just look up the source code on my system, as I only have .so files in my /usr/lib folder.

Here's a short code example where I am reusing the buffer across multiple socket read calls under the assumption that the implementations fully zeroize the buffer before writing to it so that I am not reading corrupt data:

    int main(int argc, char **argv) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            die("socket()");
        }

        // Create the first instance of a sockaddr buffer & assign to it
        struct sockaddr_in addr = {};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(1234);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        int rv = connect(fd, (const struct sockaddr *)&addr, sizeof(addr));

        socklen_t len = sizeof(addr);
        unsigned short client_port = 0;
        // First reuse here using getsockname
        if (!getsockname(fd, (struct sockaddr *)&addr, &len)) {
            client_port = ntohs(addr.sin_port);
        }

        if (rv) {
            die("connect");
        } else {
            socklen_t len = sizeof(addr);
            // Second reuse here using getpeername
            int remote = getpeername(fd, (struct sockaddr *)&addr, &len);
            if (remote) {
                printf("error getting remote address\n");
            } else {
                char ip[INET_ADDRSTRLEN];
                inet_ntop(AF_INET, &addr.sin_addr, ip, sizeof(ip));
                printf("connected to server @ <%s:%d>", ip, ntohs(addr.sin_port));
            }
            if (client_port != 0 && remote == 0)
                printf(" | client port <%d>\n", client_port);
        }
        close(fd);
        return 0;
    }

The reason this question came up is that I could easily have used 3 separate variables in the code snippet above, but if I am guaranteed that the functions don't rely on getting a zeroized buffer, I can just reuse one across these functions.

Bonus question: Do libc/POSIX functions generally overwrite the buffer completely if they accept a buffer param & length, as in the above case?

  • "so that I am not reading corrupt data" - Check the return code of the call. If it signals success then the result is in the buffer, otherwise not. It does not matter for this if the buffer gets set to zero inside the call or not and you should not rely on it. Commented 19 hours ago
  • @SteffenUllrich that's what I am currently doing (checking for 0 return code) and relying on the fact that the existing contents of the buffer have no effect on the written result. Commented 18 hours ago
  • I would certainly hope that these system calls don't waste time zeroing out a memory range that they're going to overwrite anyway. And if they did, that seems like an easy target for optimization, low hanging fruit basically. Commented 16 hours ago

1 Answer

Do posix functions like getpeername or getsockname completely rewrite their buffers?

No. The standard says:

The getpeername() function shall retrieve the peer address of the specified socket, store this address in the sockaddr structure pointed to by the address argument, and store the length of this address in the object pointed to by the address_len argument.

Hence, what it does is store something at the address pointed to by address. Nothing more. (Doing more would not only be wasted time, IMHO, but also a breach of contract. To me, "I store N bytes at A" means that the byte at A+N and everything after it is untouched.)

While I can understand this helps the functions to understand the type such as casting …

struct sockaddr contains (at least) the two fields sa_family_t sa_family and char sa_data[] (socket address, variable-length data). In other words, the size of the C struct is impossible to know intrinsically. You need to know either the length of the variable-length sa_data array or the overall size (including that array).

So, I'm technically of the opinion that getpeername and co should never change more than the updated address_len bytes (anything else would be in conflict with "don't have side effects you don't declare").

Practically, you pass a buffer and its size to a function that modifies it. Don't rely on anything not happening within that length in that buffer.

Here's a short code example where I am reusing the buffer across multiple socket read calls under the assumption that the implementations fully zeroize the buffer before writing to it so that I am not reading corrupt data

You never read corrupt data, because you only read min(address_len, sizeof(addr)) bytes, and address_len gets explicitly filled in by getpeername. This problem hence doesn't really occur, unless you make massive mistakes when handling the "output" of that function by ignoring the length it declared it worked on!
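As a rough sketch of that min(address_len, sizeof(addr)) rule: the two helpers below are hypothetical (the names valid_addr_bytes and addr_was_truncated are mine, not part of any API) and only do the arithmetic on the length that getsockname or getpeername wrote back. The Linux man page, for one, documents that a too-small buffer gets a truncated address and a returned length larger than the one supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* sockaddr_storage is meant to hold any address family, so with it
 * truncation should not happen for IPv4 or IPv6 addresses. */
static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6),
              "sockaddr_storage must be able to hold an IPv6 address");

/* Hypothetical helper: how many bytes of the caller's buffer hold address
 * data, given the length the call reported and the real buffer size. */
static size_t valid_addr_bytes(socklen_t reported, size_t buf_size)
{
    size_t n = (size_t)reported;
    return n < buf_size ? n : buf_size;
}

/* Hypothetical helper: nonzero if the address did not fit in the buffer. */
static int addr_was_truncated(socklen_t reported, size_t buf_size)
{
    return (size_t)reported > buf_size;
}
```

Anything in the buffer beyond valid_addr_bytes(len, sizeof(addr)) should be treated as unspecified, whether or not the implementation happened to touch it.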

Checking that the return code is 0 is not sufficient: you also need to check that the address family that was filled in (addr.sin_family in your code, i.e. the sa_family of the generic struct sockaddr) is the one you expect, and that len == sizeof(struct sockaddr_in).
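Those three checks could be bundled roughly like this. It's a minimal sketch, assuming an IPv4 socket as in the question; ipv4_result_ok is a made-up name, not a libc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: true only if the call succeeded AND the result
 * looks like a complete IPv4 address.
 *   rv   - return value of getsockname()/getpeername()
 *   addr - the buffer that was passed to the call
 *   len  - the length the call wrote back */
static bool ipv4_result_ok(int rv, const struct sockaddr_in *addr, socklen_t len)
{
    assert(addr != NULL);

    return rv == 0
        && (size_t)len == sizeof(*addr)
        && addr->sin_family == AF_INET;
}
```

In the question's code this would be used as ipv4_result_ok(remote, &addr, len) right after the getpeername call, before touching addr.sin_port or addr.sin_addr.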

Generally, this isn't 1975. Don't reuse objects unless it really makes sense to. Your compiler knows how long an object is used: if the previous object isn't used anymore, a new object of the desired type will not even cost extra stack space. Furthermore, we're talking about objects that are 32 bytes in length or shorter. I can't think of any case where the additional memory usage would even be close to mattering, considering that the code you use that memory with makes multiple syscalls, which do much (as in: 3 orders of magnitude) worse things to your caches.


Rant: The sockaddr.* type family is a nightmare, and so is the API of getsockname and friends; you can't write provably correct C code that uses these without doing extra copies. Why? Because the API assumes that pointers to structs of different types that start with the same elements can be cast around arbitrarily, which is wrong. C, since 1989 (that's 36 years!), says (in C89, §3.3 EXPRESSIONS) that you cannot just access fields of a sockaddr_in through a pointer that is merely a pointer cast of a sockaddr. It's simply undefined behaviour to do so (and there are good reasons for that, namely the strict aliasing rules: by ensuring that, the compiler can assume that objects of different types don't live in the same memory, which makes a whole lot of errors impossible and a whole lot of optimizations possible (with strict aliasing enabled: 30% faster on arm, 70% faster on x86_64!)).

The only way around this would be to memcpy the sockaddr filled in by getsockname to a sockaddr_in, and then access the fields of that. In fact, that's what you should do here. And it's stupid that it's necessary; a better-designed getsockname would take the desired address family as a separate argument (which, well, would even be more efficient to check against), and would take a void* argument at which it guarantees to construct the reply via memcpy (or equivalent). Or, quite frankly, POSIX would acknowledge that this is an address-family-specific function, and that having the same function for UNIX domain, IP, IPv6, X.25 links, IPX, AS numbers (why ever these are an address family??) and things like BGP transports is not a good design.
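A minimal sketch of that memcpy approach, assuming an IPv4 socket. local_ipv4 is a hypothetical helper name, and using struct sockaddr_storage as the receiving buffer is my choice here rather than something the approach requires.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of POSIX): fetch the local IPv4 address of
 * fd without reading sockaddr_in fields through a cast pointer.
 * Returns 0 on success, -1 on error or if the address is not a complete
 * AF_INET address. */
static int local_ipv4(int fd, struct sockaddr_in *out)
{
    struct sockaddr_storage ss;   /* large enough for any address family */
    socklen_t len = sizeof(ss);

    assert(out != NULL);

    if (getsockname(fd, (struct sockaddr *)&ss, &len) != 0)
        return -1;
    if (ss.ss_family != AF_INET || (size_t)len != sizeof(*out))
        return -1;

    /* Copy into an object of the right type, then use only that object. */
    memcpy(out, &ss, sizeof(*out));
    return 0;
}
```

The caller declares a fresh struct sockaddr_in, calls local_ipv4(fd, &local), and reads local.sin_port only if the call returned 0. The same pattern works with getpeername.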

The good news is that, coding-wise, this is an extra copy only. Functionally, your compiler sees what you're doing here: copying from one struct to a different one. And if, by chance, the structs are "compatible" in all the fields you might access afterwards, it elides that copy. (Which, by the way, is an awesome ability only made possible by the strict aliasing rules mentioned above.) In the case of getsockname and sockaddr.*, the actual copy will be avoided, because that "compatible by chance" has been forced by the developers of operating systems and standard libraries, who manually make sure that the structs are compatible (by padding fields, and by restricting themselves to "strange" types).

  • re. "only made possible by the strict aliasing rules mentioned above", well, it seems you didn't actually mention aliasing rules above there :) I would have guessed that was what you meant with the parenthetical good reasons, but it would be nice to know if you had something else in mind, too. Commented 11 hours ago
  • @ilkkachu ah yes, that fell prey to a "read again before posting" revision Commented 8 hours ago
