
I have a problem with a server socket under Linux. For some reason unknown to me, the server socket vanishes and I get a Bad file descriptor error in the select call that waits for an incoming connection. This problem always occurs when I close an unrelated socket connection in a different thread. This happens on embedded Linux with a 2.6.36 kernel.

Does anyone know why this would happen? Is it normal for a server socket to simply vanish, resulting in a Bad file descriptor error?

Edit: The other socket code implements a VNC server and runs in a completely different thread. The only special thing in that other code is the use of setjmp/longjmp, but that should not be a problem.

The code that creates the server socket is the following:

```c
int server_socket = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
struct sockaddr_in saddr;
memset(&saddr, 0, sizeof(saddr));
saddr.sin_family = AF_INET;
saddr.sin_addr.s_addr = htonl(INADDR_ANY);
saddr.sin_port = htons(1234);
const int optionval = 1;
setsockopt(server_socket, SOL_SOCKET, SO_REUSEADDR, &optionval, sizeof(optionval));
if (bind(server_socket, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) {
    perror("bind");
    return 0;
}
if (listen(server_socket, 1) < 0) {
    perror("listen");
    return 0;
}
```

I wait for an incoming connection using the code below:

```c
static int WaitForConnection(int server_socket, struct timeval *timeout)
{
    fd_set read_fds;
    FD_ZERO(&read_fds);
    int max_sd = server_socket;
    FD_SET(server_socket, &read_fds);

    // In the error case this select() fails with EBADF ("Bad file descriptor"),
    // even though the server socket was never closed with close().
    int res = select(max_sd + 1, &read_fds, NULL, NULL, timeout);
    if (res > 0) {
        struct sockaddr_in caddr;
        socklen_t clen = sizeof(caddr);
        return accept(server_socket, (struct sockaddr *) &caddr, &clen);
    }
    return -1;
}
```

Edit: When the problem occurs, I currently just restart the server, but I don't understand why the server socket descriptor should suddenly become invalid:

```c
int error = 0;
socklen_t len = sizeof(error);
int retval = getsockopt(server_socket, SOL_SOCKET, SO_ERROR, &error, &len);
if (retval < 0) {
    close(server_socket);
    goto server_start;
}
```
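The `getsockopt()` probe above works because any call on a closed descriptor fails with `EBADF`. A minimal sketch of a side-effect-free check, assuming Linux/POSIX `fcntl()`; `fd_is_open()` and `demo_close_invalidates()` are hypothetical helpers, not part of the original code. Note the limit of this check: it tells you whether the *number* is open, not whether it still refers to your listening socket, because a freed number can be handed out again by the next `socket()` or `open()`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: true if fd is currently an open descriptor in this
 * process. F_GETFD only reads the close-on-exec flag, so the check has no
 * side effects; it fails with EBADF exactly when fd is not open. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical self-check: a socket stays valid until close() is called
 * on its number, and is invalid immediately afterwards. */
static bool demo_close_invalidates(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return false;
    bool open_before = fd_is_open(s);
    close(s);
    return open_before && !fd_is_open(s);
}
```

Calling `fd_is_open(server_socket)` right before `select()` and logging the result narrows down *when* the descriptor disappears.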
  • There is nothing wrong with the code you posted; the error must be elsewhere. Do you use the socket after closing it, for example? Commented Jul 31, 2012 at 7:22
  • Where exactly are the threads used? Commented Jul 31, 2012 at 7:32
  • The above code runs in one thread. The other code is in another module which also runs a thread. Closing the connection there kills the server here. I hadn't thought that a server socket could become invalid without me closing it. Commented Jul 31, 2012 at 7:42
  • My bet is that some bug in your code is causing you to close the very same socket you later select on. Commented Jul 31, 2012 at 10:26
  • @trenki It can't. There's a bug somewhere that causes you to close the same file descriptor value that the listening socket has, or there's a bug that overwrites the variable holding the listening socket descriptor. You could run your program under strace, e.g. strace -f -e accept,socket,close,shutdown ./yourserver, and see whether you ever call close() with the same file descriptor value as the listening socket, or whether you suddenly start passing a different file descriptor to accept(). Commented Nov 28, 2014 at 19:24
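The failure mode described in these comments can be reproduced deterministically. Linux hands out the lowest-numbered unused descriptor, so a number freed by one module is typically reused by the next `socket()` call anywhere in the process, including in another thread. The sketch below is a single-threaded simulation under that assumption; `stale_close_kills_listener()`, `conn` and `listener` are hypothetical names standing in for the VNC connection and the server socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical simulation: "conn" plays the other module's client
 * connection, "listener" plays the server socket. Returns true if the
 * stale second close() destroyed the listener. */
static bool stale_close_kills_listener(void)
{
    int conn = socket(AF_INET, SOCK_STREAM, 0);
    if (conn < 0)
        return false;
    close(conn);                    /* legitimate close: number is now free */

    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return false;
    printf("conn was fd %d, listener got fd %d\n", conn, listener);

    close(conn);                    /* stale second close: hits the listener */

    bool gone = fcntl(listener, F_GETFD) == -1 && errno == EBADF;
    if (!gone)
        close(listener);            /* numbers differed: clean up normally */
    return listener == conn && gone;
}
```

In a real multithreaded program the same sequence depends on timing, which is why such a bug can look random; the strace command above shows it as two `close()` calls with the same number and no matching `socket()` in between.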

4 Answers


Sockets (file descriptors) usually suffer from the same management issues as raw pointers in C. Whenever you close a socket, do not forget to assign -1 to the variable that keeps the descriptor value:

```c
close(socket);
socket = -1;
```

just as you would with a C pointer:

```c
free(buffer);
buffer = NULL;
```

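If you forget to do this, you can later close the socket twice, just as you could free() memory twice if it were a pointer. Because descriptor numbers are reused, the second close() may hit an unrelated descriptor that was opened in the meantime, for example another thread's listening socket.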
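A small helper makes the "close, then invalidate" pattern hard to forget. This is a sketch; `close_and_invalidate()` is a hypothetical name, not an existing API.

```c
#include <unistd.h>

/* Hypothetical helper: closes *fd if it holds a descriptor and marks it
 * as -1, so a second call (or a later cleanup path) cannot close an
 * unrelated descriptor that has since been given the same number. */
static int close_and_invalidate(int *fd)
{
    if (*fd < 0)
        return 0;           /* already closed: nothing to do */
    int rc = close(*fd);
    *fd = -1;               /* on Linux the number is released even if close() reports an error */
    return rc;
}
```

It only protects the variable you pass in: any other copy of the same number (in another struct or another thread) still goes stale, and if several threads share the variable, access to it needs a mutex.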

Another issue is something people often forget: file descriptors in a UNIX environment start at 0, so 0 is a valid descriptor. Suppose somewhere in the code you have:

```c
struct FooData {
    int foo;
    int socket;
    /* ... */
};

// Either
struct FooData my_data_1 = {0};
// Or
struct FooData my_data_2;
memset(&my_data_2, 0, sizeof(my_data_2));
```

In both cases, the socket member of my_data_1 and my_data_2 holds 0, which is a valid descriptor value. Later, some piece of code responsible for freeing a FooData structure may blindly close() this descriptor, which may happen to be your server's listening socket (0).
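One way to avoid the zero-initialized descriptor trap, sketched with hypothetical `foo_data_init()` and `foo_data_free()` helpers around the answer's illustrative `FooData`: initialize the descriptor to -1 and have cleanup close only non-negative values.

```c
#include <string.h>
#include <unistd.h>

/* Same shape as the answer's illustrative FooData. */
struct FooData {
    int foo;
    int socket;
};

/* Hypothetical initializer: zero everything, then mark the descriptor as
 * "none" (-1) instead of leaving it at 0, which is a valid descriptor. */
static void foo_data_init(struct FooData *d)
{
    memset(d, 0, sizeof(*d));
    d->socket = -1;
}

/* Hypothetical destructor: only closes a descriptor the struct owns. */
static void foo_data_free(struct FooData *d)
{
    if (d->socket >= 0) {
        close(d->socket);
        d->socket = -1;
    }
}
```

In C99 and later, `struct FooData d = { .socket = -1 };` gives the same initialization, since members not named in the initializer are zeroed.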


1. Close your socket:

```c
close(sockfd);
```

2. Clear the socket's file descriptor from the select set:

```c
FD_CLR(sockfd, &master); // opposite of FD_SET
```
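The two steps can be kept together in one helper so neither is forgotten. A sketch, assuming a `select()` loop that keeps a master `fd_set` as in this answer; `drop_from_select_set()` is a hypothetical name. A closed descriptor whose bit is still set makes `select()` fail with `EBADF`, and if the number has been reused, `select()` silently watches an unrelated descriptor instead.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: removes *fd from the master set, closes it and
 * marks the variable as -1 so it cannot reach select() or close() again. */
static void drop_from_select_set(int *fd, fd_set *master)
{
    if (*fd < 0)
        return;              /* already dropped */
    FD_CLR(*fd, master);     /* clear the bit while the number is still ours */
    close(*fd);
    *fd = -1;
}
```

Clearing the bit before closing is a choice for readability; what matters is that both happen together, before the next `select()` call.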


Your code doesn't distinguish between the failure cases: both select and accept can fail, and a timeout also returns -1. My guess is that you simply hit a timeout and select returns 0.

  • Print the return value (res) and errno in an else branch.
  • Check the return value of accept separately.
  • Ensure that errno is reset to 0 before each of the system calls.
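Following those suggestions, here is a sketch of a variant of the question's `WaitForConnection()` that separates a timeout from a `select()` or `accept()` failure and saves `errno` right after the failing call. The name `WaitForConnectionVerbose` and the -1/-2 return convention are assumptions of this sketch, not the original code.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return convention (assumed for this sketch):
 *   >= 0  descriptor of the accepted client
 *   -1    timeout, no pending connection
 *   -2    select() or accept() failed; errno holds the reason */
static int WaitForConnectionVerbose(int server_socket, struct timeval *timeout)
{
    /* FD_SET() is undefined for negative or too-large descriptors. */
    if (server_socket < 0 || server_socket >= FD_SETSIZE) {
        errno = EBADF;
        return -2;
    }

    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(server_socket, &read_fds);

    int res = select(server_socket + 1, &read_fds, NULL, NULL, timeout);
    if (res == 0)
        return -1;                              /* timeout */
    if (res < 0) {
        int saved = errno;                      /* fprintf may change errno */
        fprintf(stderr, "select(fd=%d): %s\n", server_socket, strerror(saved));
        errno = saved;
        return -2;
    }

    struct sockaddr_in caddr;
    socklen_t clen = sizeof(caddr);
    int client = accept(server_socket, (struct sockaddr *) &caddr, &clen);
    if (client < 0) {
        int saved = errno;
        fprintf(stderr, "accept(fd=%d): %s\n", server_socket, strerror(saved));
        errno = saved;
        return -2;
    }
    return client;
}
```

`select()` can also fail with `EINTR` when a signal arrives; a caller would normally just retry in that case rather than restart the server.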


On Linux, once a connection has been closed, you have to wait some time before making a new connection, because the socket does not release the port number as soon as you close it.

OR

If you reuse the socket, the bad file descriptor error won't occur.

3 Comments

It is not the IP address that is reserved (that would be severe) but the port number.
@JensGustedt Yes, it's the port number.
This is only true if (a) you are creating a client connection, which isn't happening here, (b) you bind to a specific outbound port number, which there is no need to do, and which also isn't happening here, and (c) you are the end that initiates the close. -1 for complete and utter irrelevance.
