I am looking for a program, or (probably C) code, which I will refer to as `route`, that takes a byte string/stream on stdin and routes it to one of n programs based on the initial few bytes (the prefix). The prefix will not be forwarded/piped to the destination program. I imagine it would have a command-line interface like:

```bash
echo -n 'for-p1.perform-operation-A' | route 'for-p1.' >(program1) 'for-p2.' >(program2)
echo -n 'for-p2.perform-operation-B' | route 'for-p1.' >(program1) 'for-p2.' >(program2)
```

In the example above, program1 would receive 'perform-operation-A' (as if I had executed `echo -n 'perform-operation-A' | program1`), and program2 would receive 'perform-operation-B'. Each prefix argument immediately precedes the (virtual) filename of its destination.
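The matching rule described above can be sketched as a small C++ function. This is a minimal sketch under my own assumptions, not an existing tool: `readPrefix` is a hypothetical name, input is taken from a `std::istream`, and unlike a read-until-EOF loop it stops as soon as no prefix can still match, so it never consumes more bytes than the longest prefix.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, only needed for the usage example
#include <string>
#include <vector>

// Hypothetical helper: reads bytes from `in` until the bytes read so far equal
// one of `prefixes` (returns that prefix's index), or until no prefix can match
// anymore or the input ends (returns -1). The bytes taken from `in` are left in
// `consumed`; on a match they are exactly the prefix, so they are not forwarded.
// If one prefix is a prefix of another, the shorter one wins.
int readPrefix(std::istream& in, const std::vector<std::string>& prefixes,
               std::string& consumed) {
    consumed.clear();
    for (;;) {
        bool stillPossible = false;
        for (std::size_t i = 0; i < prefixes.size(); ++i) {
            if (prefixes[i] == consumed) return static_cast<int>(i);
            if (prefixes[i].size() > consumed.size() &&
                prefixes[i].compare(0, consumed.size(), consumed) == 0)
                stillPossible = true;
        }
        if (!stillPossible) return -1;  // no prefix starts with these bytes
        char c;
        if (!in.get(c)) return -1;      // input ended before a prefix matched
        consumed.push_back(c);          // push_back keeps 0-bytes as well
    }
}
```

For example, with `std::istringstream in("for-p2.perform-operation-B")`, calling `readPrefix(in, {"for-p1.", "for-p2."}, used)` returns 1, `used` is `for-p2.`, and what remains in `in` is `perform-operation-B`, ready to be copied to the second destination.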

Is there any existing solution for doing this or do I have to roll my own?

EDIT: By popular request, here is my 30-minute attempt at a solution, but I would much prefer an existing solution, or at least recommendations for libraries for the individual steps:

```cpp
/*
 * build:
 *   sudo apt install -y build-essential && g++ main.cpp
 *
 * usage example:
 *   (
 *     rm output* || true
 *     echo -n "foo-hello" | ./a.out 2>/dev/null 'foo-' >(cat | tee -a output-foo) 'bar-' >(cat | tee -a output-bar) 1>/dev/null
 *     echo -n "bar-world" | ./a.out 2>/dev/null 'foo-' >(cat | tee -a output-foo) 'bar-' >(cat | tee -a output-bar) 1>/dev/null
 *     echo
 *     echo -n "output-foo: " ; cat output-foo ; echo
 *     echo -n "output-bar: " ; cat output-bar ; echo
 *   )
 */
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

using namespace std;

typedef FILE* File;

#define hasKey(map, key) (map.find((key)) != map.end())

// Copies the rest of fromFile to toFile one byte at a time.
// I assume this is pretty inefficient, looking for a good solution.
void pipeRest(File fromFile, File toFile) {
    size_t bytesRead = 0;
    do {
        char data;
        bytesRead = fread(&data, 1, 1, fromFile);
        if (bytesRead)
            fwrite(&data, 1, 1, toFile);
    } while (bytesRead > 0);
}

int main(int argc, char* argv[]) {
    if (argc <= 1) {
        fprintf(stderr, "usage:\n\n\troute prefix1 >(output-program-1) prefix2 >(output-program-2) ...\n\n"
                        "reads from stdin, recognizes any of n prefixes and pipes the rest to the filename following the prefix\n");
        exit(1);
    }

    // Parse [prefix outputFile] pairs from the command-line args
    map<string, string> prefixesToOutputFileNames;
    map<string, File> prefixesToOutputFiles;
    for (int i = 0; i < argc; i++) {
        fprintf(stderr, "argv[%d] = '%s'\n", i, argv[i]);
        if (i > 0 && i % 2 == 0) {
            auto const prefix = argv[i - 1];
            fprintf(stderr, "prefix = '%s'\n", prefix);
            auto const outputFileName = argv[i];
            fprintf(stderr, "outputFileName = '%s'\n", outputFileName);
            prefixesToOutputFileNames[prefix] = outputFileName;
            auto const outputFile = fopen(outputFileName, "wb");
            if (!outputFile) { perror(outputFileName); exit(1); }
            prefixesToOutputFiles[prefix] = outputFile;
        }
    }

    // Only matters on platforms that distinguish text and binary mode
    freopen(NULL, "rb", stdin);

    // Start reading bytes from stdin, collect the prefix
    string prefix;
    int c;
    while ((c = fgetc(stdin)) != EOF) {
        prefix.push_back(static_cast<char>(c));  // unlike `+= char[2]`, keeps 0-bytes
        fprintf(stderr, "read byte 0x%02x, prefix = '%s'\n", c, prefix.c_str());
        if (hasKey(prefixesToOutputFiles, prefix)) {
            // Prefix found -> pipe the rest to the corresponding output file
            auto const& outputFileName = prefixesToOutputFileNames[prefix];
            fprintf(stderr, "prefix '%s' was recognized, will pipe rest to '%s'\n",
                    prefix.c_str(), outputFileName.c_str());
            pipeRest(stdin, prefixesToOutputFiles[prefix]);
            exit(0);  // flushes and closes all output files
        }
    }

    // Input ended without a match: pass the bytes read so far through to stdout
    fprintf(stderr, "input ends and did not recognize any prefix, it will be written to stdout\n");
    fwrite(prefix.data(), 1, prefix.size(), stdout);
    return 0;
}
```
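Because stdio buffers internally, the byte-by-byte `pipeRest` does not issue one system call per byte, but it still pays one `fread`/`fwrite` call per byte. A minimal sketch of a chunked copy follows; `copyRest` is a hypothetical name and the 64 KiB buffer size is an arbitrary choice, not a measured optimum.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical replacement for pipeRest(): copies everything that remains in
// `from` to `to` in 64 KiB chunks instead of one fread/fwrite call per byte.
// Returns true on success, false if reading, writing or flushing failed.
bool copyRest(std::FILE* from, std::FILE* to) {
    char buffer[1 << 16];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, from)) > 0) {
        if (std::fwrite(buffer, 1, n, to) != n) return false;  // write error
    }
    return !std::ferror(from) && std::fflush(to) == 0;
}
```

In the attempt above it would replace `pipeRest(stdin, ...)`. Note that the rest should be read through the same `FILE*` that was used for the prefix: stdio may already have buffered bytes past the prefix, and a raw `read(2)` on file descriptor 0 would skip them.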
  • "routes it to one of n programs" and "The prefix always precedes the (virtual) filename": so route to filenames or to programs? Routing to files/pipes is simpler. But why not just write it in shell? How do you copy stdin most efficiently to an output file? Please ask one question per post. Have you started writing such a C program? What research did you do? What part of that C program are you having problems with? Commented Mar 11, 2021 at 9:59
  • Does this answer your question? Read a file by bytes in BASH Commented Mar 11, 2021 at 10:05
  • @KamilCuk thanks for your feedback. In bash, using `>(program)` turns a program into a file that can be written to. That will be my primary use case, but `route prefix my-output-file.txt` would also be admissible. I have written such a program, but I am unsure about its efficiency (to redirect the output after detecting the prefix, I read and write byte by byte)... Commented Mar 11, 2021 at 10:06
  • Why do you care about efficiency? Have you followed the rules of optimization? Have you profiled the code? Are you sure that I/O on the stream is the bottleneck? Why not just `tee >(sed -n 's/^for.p1.//p' | program1) | sed -n 's/^for.p2.//p' | program2` and forget about it? "I have written such a program": then please post it. Commented Mar 11, 2021 at 10:07
  • @DavidCullen no, the question does not address "routing" the rest of the file based on the initial content. Commented Mar 11, 2021 at 10:08

1 Answer

In a bash shell, you could simply write:

```bash
tee >(sed -n 's/^for\.p1\.//p' | program1) | sed -n 's/^for\.p2\.//p' | program2
```
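Each `sed -n 's/^PREFIX//p'` in this one-liner works line by line: every line that starts with the prefix is forwarded without it, and every other line is dropped. That is also the source of the newline caveat raised in the comments. A C++ sketch of the same semantics, for comparison with the "strip the prefix once, forward everything else" behaviour asked for; `filterLines` is a hypothetical name, and the assumption that a missing final newline is preserved reflects my understanding of GNU sed rather than a documented guarantee.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream/ostringstream, only for the usage example
#include <string>

// Mimics `LC_ALL=C sed -n 's/^PREFIX//p'`: every input line that starts with
// `prefix` is written without the prefix; all other lines are dropped.
// A newline is written only if the input line actually ended with one.
void filterLines(std::istream& in, std::ostream& out, const std::string& prefix) {
    std::string line;
    while (std::getline(in, line)) {
        bool hadNewline = !in.eof();  // getline stopped at '\n', not at end of input
        if (line.compare(0, prefix.size(), prefix) == 0) {
            out << line.substr(prefix.size());
            if (hadNewline) out << '\n';
        }
    }
}
```

With the input `for.p1.a`, `x`, `for.p1.b` on three lines, the output for prefix `for.p1.` is `a` and `b`: the prefix is matched on every line, not only at the start of the stream.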

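The following script wraps this in reusable functions: `filter` escapes a plain prefix for sed, and `hexfilter` takes the prefix as hex digits: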

```bash
filter() {
    keyword=$1
    shift
    # https://stackoverflow.com/questions/407523/escape-a-string-for-a-sed-replace-pattern
    keyword=$(printf '%s\n' "$keyword" | sed -e 's/[]\/$*.^[]/\\&/g');
    LC_ALL=C sed -n "s/^$keyword//p" | "$@"
}
hexfilter() {
    keyword=$1
    shift
    keyword=$(printf '%s\n' "$keyword" | sed -e 's/../[\\x&]/g');
    LC_ALL=C sed -n "s/^$keyword//p" | "$@"
}
program1() { :; }
program2() { :; }
echo -e '\x00\xca\xfe\x2a world!' |
{ tee >(filter 'for.p1.' program1 >&3) >(hexfilter '00cafe2a' sed 's/^/Hello/' >&3) | filter 'for.p2.' program2 >&3; } 3>&1 | cat
```

Run as shown, it strips the prefix bytes 0x00 0xca 0xfe 0x2a via `hexfilter` and outputs `Hello world!`.
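`hexfilter` accepts the prefix as hex digits, which sidesteps the fact that command-line arguments cannot contain 0-bytes. A C program like the attempt in the question could do the same by decoding a hex argument into raw bytes before using it as a map key; `std::string` stores its length explicitly, so embedded 0-bytes survive. This is a sketch under that assumption; `hexToBytes` and `hexDigit` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if `c` is not a hex digit.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes a hex string such as "00cafe2a" into raw bytes, so a prefix that
// contains 0-bytes can be passed on the command line. Returns false (and
// leaves `out` unspecified) on odd length or a non-hex character.
bool hexToBytes(const std::string& hex, std::string& out) {
    if (hex.size() % 2 != 0) return false;
    out.clear();
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hexDigit(hex[i]);
        int lo = hexDigit(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out.push_back(static_cast<char>(hi * 16 + lo));
    }
    return true;
}
```

Invoked as e.g. `route 00cafe2a >(program1)`, the decoded 4-byte string would then be compared against the bytes read from stdin exactly like a plain-text prefix.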

6 Comments

Will sed interpret the bytes as characters? I would like to be able to use any bytestring as a prefix, if necessary I would need an automatic solution for escaping the bytes accordingly for this sed specification. In particular, a 0-byte should be possible in the prefix, and I know that bash/linux in general does not support 0-bytes in cli-arguments, so maybe a file would have to be used to specify the prefix...
How would this generalize to 3 or more prefixes & programs?
"Will sed interpret the bytes as characters?" That is unspecified by POSIX; GNU sed has some Unicode support. With LC_ALL=C, sed treats bytes as bytes and only cares about newlines, since they end lines. "In particular, a 0-byte should be possible in the prefix": don't trust me, test it. `echo abcd00ef0a | xxd -r -p | LC_ALL=C sed 's/^.*\x00//' | xxd -p` works fine here. "How would this generalize to 3 or more prefixes & programs?" `tee >(prog1) >(prog2) >(prog3) | prog4`, etc. With bash `eval` you can generate it; remember to use `printf "%q"` to escape properly.
But indeed, handling of zero bytes is hard, because arguments themselves end with zero bytes (no matter the language). Store the strings in hex form and convert them to/from raw bytes with xxd.
sed works with lines, and a newline ends a line, so newlines in the input will be a "problem": each line is matched against the prefix separately.
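The suggestion to generate `tee >(prog1) >(prog2) >(prog3) | prog4` for any number of prefixes can be sketched in C++ as a string builder. Assumptions: the generated command is run by a bash in which the answer's `filter` function is defined (e.g. via `eval`), each program is inserted verbatim as a shell command, and prefixes are quoted with the POSIX single-quote technique (`'` becomes `'\''`) instead of bash's `printf "%q"`, which produces a different but equally safe form. `shellQuote` and `buildTeeCommand` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Quotes `s` for a POSIX shell: wraps it in single quotes and replaces each
// embedded ' with '\'' (close quote, escaped quote, reopen quote).
std::string shellQuote(const std::string& s) {
    std::string q = "'";
    for (char c : s) {
        if (c == '\'') q += "'\\''";
        else q += c;
    }
    q += "'";
    return q;
}

// Builds `tee >(filter P1 prog1) ... | filter PN progN` from (prefix, program)
// pairs, following the pattern from the comment above. Programs are inserted
// verbatim; only prefixes are quoted. Returns "" if there are no routes.
std::string buildTeeCommand(
        const std::vector<std::pair<std::string, std::string>>& routes) {
    if (routes.empty()) return "";
    std::string cmd = "tee";
    for (std::size_t i = 0; i + 1 < routes.size(); ++i)
        cmd += " >(filter " + shellQuote(routes[i].first) + " " + routes[i].second + ")";
    const auto& last = routes.back();
    cmd += " | filter " + shellQuote(last.first) + " " + last.second;
    return cmd;
}
```

For three routes this yields `tee >(filter 'p1.' program1) >(filter 'p2.' program2) | filter 'p3.' program3`. Since the prefixes travel as arguments, a prefix containing a 0-byte still cannot be expressed this way; that case needs the hex form.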