0

I am trying to write a little program, which is resistant against buffer overflow and similar vulnerabilities.

Since we cannot trust the user input, I thought it would be a good idea to concatenate all the strings that are inputted into one string and then pass it to a static path, which is a bash script in the same folder and then add all the parameters/flags/arguments (e.g. ./script.sh test1 test2 test3 test4).

My logic on paper is the following:

  1. Check whether the number of argc is 5 exactly (so program name + the 4 arguments), if not - exit immediately
  2. The specific char array for 5 strings (so array elements) with a max length of 4096 bytes length is initialized.
  3. Since argv[0] is equal to the program name, we need to skip it. Hence the loop starts at 1 (first argument) and ends at argc-1. So good so far
  4. We append the argv[1] string the ending .key in-memory.
  5. We memcpy all the strings to make it safe to use
  6. For each iteration we don't forget to add the null terminator at the end of the string.
  7. After we have all the arguments safe to use we concat it in the function parse_output and call the bash script called script.sh and add all the required arguments
  8. We return the output from the script.sh 4argshere to the user

I tried freeing the memory like in the comments but it seems to not work or I have errors somewhere else.

My segfaulting Proof of Concept:

#include <stdlib.h> #include <string.h> #include <stdio.h> #define BUFSIZE 1000 char *concatenate(size_t size, char *array[size], const char *joint); char *concatenate(size_t size, char *array[size], const char *joint){ size_t jlen, lens[size]; size_t i, total_size = (size-1) * (jlen=strlen(joint)) + 1; char *result, *p; for(i=0;i<size;++i){ total_size += (lens[i]=strlen(array[i])); } p = result = malloc(total_size); for(i=0;i<size;++i){ memcpy(p, array[i], lens[i]); p += lens[i]; if(i<size-1){ memcpy(p, joint, jlen); p += jlen; } } *p = '\0'; return result; } int parse_output(char *safeargv[]) { char safeargs = *concatenate(5, safeargv, " "); char cmd[BUFSIZE]; snprintf(cmd, BUFSIZE, "./script.sh %s", safeargs); char buf[BUFSIZE] = {0}; FILE *fp; if ((fp = popen(cmd, "r")) == NULL) { printf("Error opening pipe!\n"); //free(safeargs); return -1; } while (fgets(buf, BUFSIZE, fp) != NULL) { printf("OUTPUT: %s", buf); } if (pclose(fp)) { printf("Command not found or exited with error status\n"); //free(safeargs); return -1; } //free(safeargs); return 0; } int main(int argc, char *argv[]) { if(argc != 5) { exit(1); } char *safeargv[5][4096]; for (int i = 1; i < argc - 1; i++) { if (i == 1) strcat(argv[1], ".key"); for (int x = 0; x < strlen(argv[i]); x++) { char *unsafe_string = argv[i]; size_t max_len = 4096; size_t len = strnlen(unsafe_string, max_len) + 1; char *x = malloc(len * sizeof(char)); if (x != NULL) { strncpy(x, unsafe_string, len); x[len-1] = '\0'; // Ensure null-termination strcpy(safeargv[i], x); free(x); } } } parse_output(safeargv); return 0; } 

The warnings I get while compiling:

chal.c: In function 'main': chal.c:78:32: warning: passing argument 1 of 'strcpy' from incompatible pointer type [-Wincompatible-pointer-types] 78 | strcpy(safeargv[i], x); | ~~~~~~~~^~~ | | | char ** In file included from chal.c:2: /usr/include/string.h:141:39: note: expected 'char * restrict' but argument is of type 'char **' 141 | extern char *strcpy (char *__restrict __dest, const char *__restrict __src) | ~~~~~~~~~~~~~~~~~^~~~~~ chal.c:83:18: warning: passing argument 1 of 'parse_output' from incompatible pointer type [-Wincompatible-pointer-types] 83 | parse_output(safeargv); | ^~~~~~~~ | | | char * (*)[4096] chal.c:32:24: note: expected 'char **' but argument is of type 'char * (*)[4096]' 32 | int parse_output(char *safeargv[]) { | ~~~~~~^~~~~~~~~~ 

It seems only my argc check works, because if I call ./programname abc abc abc abc it segfaults. Also what's the proper way to detect an error, if for case there's a typo in the argument for the script?

What mistake(s) did I make?

My last ltrace lines:

strlen(" ") = 1 strlen(nil <no return ...> 

It looks as if it would try to detect the length of a null character or similar.

6
  • I'm a bit confused. I can't speak for windows, but for linux (i.e. posix), the argv from an exec* call is passed into the kernel. The kernel checks these when creating argv (via copy_from_user for parent and copy_to_user for child) in the target program address space. So, no need to copy out argv from a program main as it is already "safe". As long as you stay within the bounds of the argv array, you're fine. In other words, the argv sent via exec* is not blindly used as is. The kernel probes each element and copies it to a new area (checking along the way). Commented Mar 31, 2024 at 21:00
  • @CraigEstey I am using Linux. So you're saying all the memcpy and etc. is only security theater in this case? A simple while(*argv != NULL) is enough then? Commented Mar 31, 2024 at 21:02
  • Yes, the kernel "sanitizes" all elements of the argv that is passed to main (and better as the strings can be > 4096 (IIRC)?). There is a similar char* array for the environment variables. When doing exec*, the parent can put the argv anywhere in its memory. The kernel when copying into the target, starts at the highest [virtual] address, growing the area downward. Ditto for environ. The initial stack pointer is set just below those. Try printing (via %p) the addresses of argv in both caller/callee (as well as stack pointer). sp < argv < environ for callee (e.g.) Commented Mar 31, 2024 at 21:11
  • For a deep dive, get the linux kernel source. Many ways to get the kernel source, but Linus' git repo can be gotten via: git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Look in the file fs/exec.c. Look at the copy_strings function. For the limits, look in include/uapi/linux/binfmts.h (for MAX_ARG_STRLEN). It is 32 pages (not 1). And, up to 0xFFFFFFFF separate argv entries. The max number of pages devoted to args is 32 Commented Mar 31, 2024 at 21:30
  • 1
    @SirMuffington, certainly char safeargs = *concatenate(5, safeargv, " "); should be char *safeargs = concatenate(5, safeargv, " ");. Perhaps other issues too. Commented Apr 1, 2024 at 0:27

1 Answer 1

2
  1. main(): strcat(argv[1], ".key") is buffer overflow. You write 4 bytes to a string that has no spare space.

  2. (performance) main(): for (int x = 0; x < strlen(argv[i]); x++) move the strlen() before the loop so you don't have to evaluate it per iteration.

  3. main(): char *x = malloc(len * sizeof(char)); is confusing when the enclosing loop uses the variable int x = 0.

  4. (performance) main(): for (int x = 0; x < strlen(argv[i]); x++) does the same thing strlen(argv[i]) times. Eliminate the loop and just do it once.

  5. main(): strcpy(safeargv[i], x) is incorrect as strcpy() expects a char * but safeargv[i] is of type char *[4096]. I suspect you want to fix the type to be for example char *safeargv[4].

  6. main(): With the type fixed strcpy(safeargv[i], x) is still problematic as you start i=1 so safeargv[0] is not initialized.

  7. main(): parse_output(safeargv) is an incompatible type.

  8. parse_output(): char safeargs = *concatenate(5, safeargv, " "); means safeargs hold the 1st character of whatever concatenate() returns. You want a char *. Subsequently passing a char when a char * is expected in the following snprintf(cmd, BUFSIZE, "./script.sh %s", safeargs) will probably trigger a segfault.

  9. parse_output(): snprintf(cmd, BUFSIZE, "./script.sh %s", safeargs); each argument presumably could be up to 4k but BUFSIZE is 1k. You might as well limit each argument to 1k.

  10. parse_output(): Eliminate concatenate() and just use snprintf(). In the program below I check if the command is being truncated. You may or may not care about this.

  11. concatenate() is called with array[5] = { NULL, NULL, NULL, NULL, NULL} and strlen(NULL) cause your original segfault. With the other issues resolved this becomes a non-issue.

  12. (Not fixed) popen() passes the command to shell so if your input is untrusted then you already lost. If you want to use popen() pass the input via a file (for example a temporary file and give the filename as the argument). Or use fork() + execl() instead of popen().

#define _POSIX_C_SOURCE 2 #include <stdio.h> #define BUFSIZE 1000 int main(int argc, char *argv[]) { if(argc != 5) { printf("usage: programname STR STR STR STR\n"); return 1; } char cmd[BUFSIZE]; int n = snprintf(cmd, 0, "./script.sh %s.key %s %s %s", argv[1], argv[1], argv[2], argv[3]); if(n + 1 > BUFSIZE) { printf("cmd was truncated\n"); return 1; } snprintf(cmd, BUFSIZE, "./script.sh %s.key %s %s %s", argv[1], argv[1], argv[2], argv[3]); FILE *fp = popen(cmd, "r"); if (!fp) { printf("Error opening pipe!\n"); return 1; } char buf[BUFSIZE]; while (fgets(buf, BUFSIZE, fp)) printf("OUTPUT: %s", buf); if (pclose(fp)) { printf("Command not found or exited with error status\n"); return 1; } } 

With script.sh:

#!/bin/bash echo "$*" 

here is an example run:

$ ./a.out abc abc abc abc OUTPUT: abc.key abc abc abc 
Sign up to request clarification or add additional context in comments.

8 Comments

Great answer, it was way easier using the correct functions. To 6: but argv[0] is the program name, hence I skip it. to 11: why were the values all NULL, I don't get it. Where does the NULL come from?
How do you sanitize in your tip 12, by doing that? And also don't you have to return 0 when the execution is successful? (it seems to work without). And no need for any malloc or free?
What do you mean with sanitize? What I would call sanitize is escaping all special shell characters and you weren't doing that in your program. You are way better off using execl() (or friends) to by-pass the shell. Just to be clear there are two shells involved here the /bin/sh that popen starts on your behalf and the one the shell script runs /bin/bash in my case. I am talking about the former /bin/sh. No, there is no need to malloc/free as you limit your input to 1000 bytes. On linux you have 8mb of stack so I wouldn't use the heap.
The NULL values are undefined behavior and come about from the incorrect type of safeargv and how it's used. char *safeargv[5][4096] means a 2d matrix of 5 * 4096 char pointers.
It's very relevant and it's complicated subject. See for example baeldung.com/linux/bash-escape-characters
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.