What does `join -e` mean?

Question

From coreutils' manual about join

-e string 
Replace those output fields that are missing in the input with string. I.e., missing fields specified with the -12jo options.

I don't understand the option at all. What do the following mean

"those output fields that are missing in the input"
"missing fields specified with the -12jo options"?

Thanks.

Kusalananda · Accepted Answer · 2018-07-24 12:55:00Z

The slightly cryptic string -12jo refers to the four separate options -1, -2, -j and -o. The first three select which field in each file to join on, and the last selects which fields from each file should be output. The -j option is an extension in GNU join, and -j n is the same as -1 n -2 n (where n is some integer).
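As a quick check of the -j equivalence, here is a minimal sketch. The file names a.txt and b.txt and their contents are made up for illustration, and it assumes a join that supports -j (GNU join does).

```shell
# Hypothetical two-column files, both sorted on their second field.
tmp=$(mktemp -d)
printf '%s\n' 'x apple' 'y banana' > "$tmp/a.txt"
printf '%s\n' 'red apple' 'yellow banana' > "$tmp/b.txt"

# Join on field 2 of both files, written two ways.
with_j=$(join -j 2 "$tmp/a.txt" "$tmp/b.txt")
with_12=$(join -1 2 -2 2 "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$with_j"

# Report whether the two spellings produced identical output.
if [ "$with_j" = "$with_12" ]; then
    echo "same output"
fi

rm -r "$tmp"
```

Both commands should print the join field first, followed by the remaining field from each file.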

The -e option comes into effect when you, with -a, request to get unpaired lines from one or both of the files that you join. An unpaired line will have missing data, as the line from one file did not correspond to a line in the other file. The -e option replaces those fields with the given string. Likewise, if you request, with -o, a field that does not exist on a particular line in a file, you would use -e to replace the empty values with a string.
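The second case (asking -o for a field that a line does not have) can be sketched without -a at all. The files left.txt and right.txt below are hypothetical; the line for key 2 in left.txt deliberately lacks a third field.

```shell
tmp=$(mktemp -d)
# Hypothetical data: the second line of left.txt has only two fields.
printf '%s\n' '1 teacup 5' '2 spoon' > "$tmp/left.txt"
printf '%s\n' '1 teacup 30' '2 spoon 24' > "$tmp/right.txt"

# Request the join field plus field 3 of each file; -e supplies the
# replacement string for the field that is absent on the short line.
short_line=$(join -o 0,1.3,2.3 -e NONE "$tmp/left.txt" "$tmp/right.txt")
printf '%s\n' "$short_line"

rm -r "$tmp"
```

Both lines are paired here, so the NONE on the second output line comes purely from the missing field 1.3, not from an unpaired line.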

Example: Two files that contain manufacturing costs and sales income for a number of products. Each file has the fields

Product ID
Product name
Some number

    $ cat expenses.txt
    1 teacup 5
    2 spoon 7
    3 bowl 10

    $ cat sales.txt
    1 teacup 30
    2 spoon 24

To get the expenses and sales for all products, while replacing the number (from either the first or second file) that may be missing with the string NONE, I would do

    $ join -a1 -a2 -o0,1.2,1.3,2.3 -e NONE expenses.txt sales.txt
    1 teacup 5 30
    2 spoon 7 24
    3 bowl 10 NONE

Here, I use the -a option twice to request all lines from both files (a "full outer join" in SQL speak). The -o option is used to get specific fields from each file (field 0 is the join field, which is the first field in each file by default), and -e specifies the string NONE to replace missing values with.

As you can see, we get NONE as the "sales value" since the product with ID 3 was not mentioned in that second file.
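To see how much of this depends on -o, here is a sketch that recreates the two files in a temporary directory and runs the join with and without an output format. The temporary directory and variable names are illustrative. The behaviour without -o is what I would expect from GNU join; other implementations may differ.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 teacup 5' '2 spoon 7' '3 bowl 10' > "$tmp/expenses.txt"
printf '%s\n' '1 teacup 30' '2 spoon 24' > "$tmp/sales.txt"

# With -o, every output line has the same four fields, so the field
# missing for product 3 is filled in with the -e string.
with_o=$(join -a1 -a2 -o 0,1.2,1.3,2.3 -e NONE "$tmp/expenses.txt" "$tmp/sales.txt")

# Without -o, GNU join prints the unpaired line as it is, so there is
# no empty output field for -e to replace.
without_o=$(join -a1 -a2 -e NONE "$tmp/expenses.txt" "$tmp/sales.txt")

printf 'with -o:\n%s\n\nwithout -o:\n%s\n' "$with_o" "$without_o"

rm -r "$tmp"
```

With GNU join, only the first variant should contain NONE; in the second, the line for product 3 is simply shorter than the others.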

Note that it's not only about -a. join -o 1.2 -e NONE <(echo a) <(echo a) would also output NONE
– Stéphane Chazelas, Commented Jul 24, 2018 at 6:59
Thanks. Why does join -a 1 -a 2 -t " " -e "NULL" -1 1 -2 1 <(sort file1) <(sort file2) not output NULL, supposing file2 has an unpairable line?
– Tim, Commented Jul 24, 2018 at 13:21
file1 and file2 are here: unix.stackexchange.com/q/458068/674
– Tim, Commented Jul 24, 2018 at 13:27
@Tim join -a 2 -o0,1.2,2.2 -t " " -e "NULL" <(sort file1) <(sort file2)
– Kusalananda ♦, Commented Jul 24, 2018 at 13:31
Thanks. Wondering why the command doesn't output NULL? When does -e work? Does -e only work with -o?
– Tim, Commented Jul 24, 2018 at 13:33
