From coreutils' manual about join

-e string 

Replace those output fields that are missing in the input with string. I.e., missing fields specified with the -12jo options.

I don't understand the option at all. What do the following mean?

  • "those output fields that are missing in the input"

  • "missing fields specified with the -12jo options"?

Thanks.

1 Answer

The slightly cryptic string -12jo refers to the four separate options -1, -2, -j and -o. The first three select which field in each file to join on, and the last selects which fields from each file are output. The -j option is an extension in GNU join, and -j n is the same as -1 n -2 n (where n is some integer).
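As a rough illustration of the -j equivalence, the sketch below joins two small files on their third field in both spellings. The file names a.txt and b.txt and their contents are made up for this example. Note that -j 3 is written with a space before the number.

```shell
# Throwaway directory; a.txt and b.txt are hypothetical sample files.
tmpdir=$(mktemp -d)
printf '%s\n' 'x foo 1' 'y bar 2' > "$tmpdir/a.txt"
printf '%s\n' 'p red 1' 'q blue 2' > "$tmpdir/b.txt"

# Join on field 3 of both files, spelled two ways.
with_j=$(join -j 3 "$tmpdir/a.txt" "$tmpdir/b.txt")
with_12=$(join -1 3 -2 3 "$tmpdir/a.txt" "$tmpdir/b.txt")

printf '%s\n' "$with_j"
# 1 x foo p red
# 2 y bar q blue

rm -rf "$tmpdir"
```

Both variables should hold the same text: the join field first, then the remaining fields of each line from the first file, then those from the second.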

The -e option comes into effect when you, with -a, request to get unpaired lines from one or both of the files that you join. An unpaired line will have missing data, as the line from one file did not correspond to a line in the other file. The -e option replaces those fields with the given string. Likewise, if you request, with -o, a field that does not exist on a particular line in a file, you would use -e to replace the empty values with a string.
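The second case (a field requested with -o that a line simply does not have) can be sketched without -a at all. The files short.txt and other.txt below are invented for this illustration; the second line of short.txt has only two fields, so field 1.3 is missing there.

```shell
# Throwaway directory; short.txt and other.txt are hypothetical sample files.
tmpdir=$(mktemp -d)
printf '%s\n' '1 a b' '2 c' > "$tmpdir/short.txt"
printf '%s\n' '1 x' '2 y' > "$tmpdir/other.txt"

# Ask for the join field, field 3 of file 1, and field 2 of file 2.
# Line "2 c" has no third field, so -e supplies NONE in its place.
filled=$(join -o 0,1.3,2.2 -e NONE "$tmpdir/short.txt" "$tmpdir/other.txt")

printf '%s\n' "$filled"
# 1 b x
# 2 NONE y

rm -rf "$tmpdir"
```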

Example: Two files that contain manufacturing costs and sales income for a number of products. Each file has the fields

  1. Product ID
  2. Product name
  3. Some number

    $ cat expenses.txt
    1 teacup 5
    2 spoon 7
    3 bowl 10

    $ cat sales.txt
    1 teacup 30
    2 spoon 24

To get the expenses and sales for all products, while replacing the number (from either the first or second file) that may be missing with the string NONE, I would do

    $ join -a1 -a2 -o0,1.2,1.3,2.3 -e NONE expenses.txt sales.txt
    1 teacup 5 30
    2 spoon 7 24
    3 bowl 10 NONE

Here, I use the -a option twice to request all lines from both files (a "full outer join" in SQL speak). The -o option is used to get specific fields from each file (field 0 is the join field, which is the first field in each file by default), and -e specifies the string NONE to replace missing values with.

As you can see, we get NONE as the "sales value" since the product with ID 3 was not mentioned in that second file.
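For comparison, here is a small sketch of the same join with and without -o, using the two sample files from above recreated in a temporary directory. It relies on the behaviour that, without an output list, join has no notion of which fields are "missing", so an unpaired line is printed as it is and -e has nothing to fill in.

```shell
# Recreate the sample files in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1 teacup 5' '2 spoon 7' '3 bowl 10' > "$tmpdir/expenses.txt"
printf '%s\n' '1 teacup 30' '2 spoon 24' > "$tmpdir/sales.txt"

# Without -o: the unpaired line for ID 3 is printed as-is, no NONE appears.
without_o=$(join -a1 -a2 -e NONE "$tmpdir/expenses.txt" "$tmpdir/sales.txt")

# With -o: field 2.3 is requested but absent for ID 3, so -e fills it in.
with_o=$(join -a1 -a2 -o 0,1.2,1.3,2.3 -e NONE "$tmpdir/expenses.txt" "$tmpdir/sales.txt")

printf '%s\n' "$without_o"
# 1 teacup 5 teacup 30
# 2 spoon 7 spoon 24
# 3 bowl 10

printf '%s\n' "$with_o"
# 1 teacup 5 30
# 2 spoon 7 24
# 3 bowl 10 NONE

rm -rf "$tmpdir"
```

GNU join also accepts -o auto, which infers the field count from the first line of each file and lets -e pad shorter lines; that form is specific to GNU and is not used in the sketch.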

  • Note that it's not only about -a. join -o 1.2 -e NONE <(echo a) <(echo a) would also output NONE. Commented Jul 24, 2018 at 6:59
  • Thanks. Why does join -a 1 -a 2 -t " " -e "NULL" -1 1 -2 1 <(sort file1) <(sort file2) not output NULL, supposing file2 has an unpairable line? Commented Jul 24, 2018 at 13:21
  • file1 and file2 are here unix.stackexchange.com/q/458068/674 Commented Jul 24, 2018 at 13:27
  • @Tim join -a 2 -o0,1.2,2.2 -t " " -e "NULL" <(sort file1) <(sort file2) Commented Jul 24, 2018 at 13:31
  • Thanks. Wondering why the command doesn't output NULL? When does -e work? Does -e only work with -o? Commented Jul 24, 2018 at 13:33
