Because the input to join must be sorted, the command is often invoked like this:

join <(sort file1) <(sort file2) 

This is not portable as it uses process substitution, which is not specified by POSIX.

join can also use the standard input by specifying - as one of the file arguments. However, this only allows for sorting one of the files through a pipeline:

sort file1 | join - <(sort file2) 

It seems there should be a simple way to sort both files and then join the results using only POSIX-specified features. Perhaps something using redirection to a third file descriptor, or perhaps it will require creating a FIFO. However, I'm having trouble visualizing it.

How can join be used POSIXly on unsorted files?

1 Answer

You can do it with two named pipes (or of course you could use one named pipe and stdin):

mkfifo a b
sort file1 > a &
sort file2 > b &
join a b
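As a fuller illustration, here is a minimal sketch of the same idea as a self-contained script. The sample data, the file names, and the scratch directory name `join-demo.$$` are all made up for the example. It also cleans up the FIFOs on exit, and it tries the one-named-pipe-plus-stdin variant as well.

```shell
#!/bin/sh
# Sketch only: join two unsorted files using POSIX features (mkfifo, sort, join).
# Sample data, file names and the scratch directory name are hypothetical.
set -eu

# mktemp is not specified by POSIX, so build a private scratch directory by hand.
tmpdir="${TMPDIR:-/tmp}/join-demo.$$"
(umask 077 && mkdir "$tmpdir")
trap 'rm -rf "$tmpdir"' EXIT

# Two unsorted inputs that share a key in the first field.
printf '%s\n' '3 cherry' '1 apple' '2 banana' > "$tmpdir/file1"
printf '%s\n' '2 yellow' '3 red' '1 green' > "$tmpdir/file2"

mkfifo "$tmpdir/a" "$tmpdir/b"

# Variant 1: two named pipes. Each sort blocks until join opens its FIFO.
sort "$tmpdir/file1" > "$tmpdir/a" &
sort "$tmpdir/file2" > "$tmpdir/b" &
result=$(join "$tmpdir/a" "$tmpdir/b")
wait

# Variant 2: one named pipe, with the other sorted input arriving on stdin.
sort "$tmpdir/file2" > "$tmpdir/b" &
result_stdin=$(sort "$tmpdir/file1" | join - "$tmpdir/b")
wait

printf '%s\n' "$result"
```

Both variants should produce the same joined lines. The `wait` calls reap the background sorts before the FIFOs are removed.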

Process substitution works essentially by setting up those fifos (using /dev/fd/ instead of named pipes where available). For example, in bash:

$ echo join <(sort file1) <(sort file2)
join /dev/fd/63 /dev/fd/62

Note how bash has substituted each process with a file name in /dev/fd. (Without /dev/fd/, new enough versions of zsh, bash, and ksh93 will use named pipes.) Bash leaves those descriptors open when invoking join, so when join opens those files, it reads from the two sorts. You can see them being passed with some lsof trickery:

$ sh -c 'lsof -a -d 0-999 -p $$; exit' <(sort file1) <(sort file2)
COMMAND  PID    USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
sh      1894 anthony    0u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony    1u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony    2u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony   62r  FIFO   0,10      0t0 237085 pipe
sh      1894 anthony   63r  FIFO   0,10      0t0 237083 pipe

(The exit prevents a common optimization in which the shell doesn't fork when there is only one command to run.)
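To make the mechanism concrete, the file-descriptor passing can be roughly approximated by hand: keep one sort's pipe open on a spare descriptor and hand join its /dev/fd name. This is only a sketch. It assumes the system provides /dev/fd (which POSIX does not require, so it is no more portable than process substitution itself), and the sample data, file names, and the choice of descriptor 3 are made up for the example.

```shell
#!/bin/sh
# Sketch only: a hand-rolled approximation of
#     join <(sort file1) <(sort file2)
# Assumes /dev/fd exists (not required by POSIX). Sample data is hypothetical.
set -eu

tmpdir="${TMPDIR:-/tmp}/join-fd-demo.$$"
(umask 077 && mkdir "$tmpdir")
trap 'rm -rf "$tmpdir"' EXIT

printf '%s\n' '3 cherry' '1 apple' '2 banana' > "$tmpdir/file1"
printf '%s\n' '2 yellow' '3 red' '1 green' > "$tmpdir/file2"

# The outer pipe carries sorted file1 on the group's stdin; 3<&0 duplicates
# that pipe onto descriptor 3. Inside the group, the inner pipe replaces
# join's stdin with sorted file2, while descriptor 3 still refers to the
# outer pipe and is reachable by name as /dev/fd/3.
result=$(
    sort "$tmpdir/file1" | {
        sort "$tmpdir/file2" | join /dev/fd/3 -
    } 3<&0
)

printf '%s\n' "$result"
```

Here join receives a path for one input and `-` for the other. The path plays the same role as the /dev/fd/63 and /dev/fd/62 arguments that bash substituted in the example above.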
