I have two text files on a Linux machine, input1.txt and input2.txt, each containing several columns (with , as the separator). I have a command that joins these files on the ID stored in the first column of each file. It keeps all IDs of the first file in the output, but only the matching IDs of the second one. I need to extend this command with an option that also keeps the IDs of the second file that have no match in the first one.
Example:
input1.txt
2931,C,-9.750,-2.550,57.910,-0.3,C
2932,C,-5.470,-0.200,51.550,0.9,C
2940,C,-10.860,-3.400,54.000,0.7,C
2941,S,-11.820,-13.550,55.070,2.1,S
2944,H,-3.770,-4.180,60.300,0.7,H
input2.txt
4304,N,-9.700,-7.680,58.330,-2.3,N
2940,S,-10.440,-3.450,54.270,2.2,S
2900,C,-13.655,-13.730,59.405,-1.5,C
2931,C,-9.910,-2.420,57.610,0.2,C
cmd:
join -t, -a1 -o auto <(sort input1.txt) <(sort input2.txt) > output.txt
output.txt
2931,C,-9.750,-2.550,57.910,-0.3,C,2931,C,-9.910,-2.420,57.610,0.2,C
2932,C,-5.470,-0.200,51.550,0.9,C,,,,,,,
2940,C,-10.860,-3.400,54.000,0.7,C,2940,S,-10.440,-3.450,54.270,2.2,S
2941,S,-11.820,-13.550,55.070,2.1,S,,,,,,,
2944,H,-3.770,-4.180,60.300,0.7,H,,,,,,,
I'd like to modify the command to obtain two output files. The first should be similar to what I get now, but it should also include the rows of input2.txt whose IDs were not matched:
output_final.txt
2931,C,-9.750,-2.550,57.910,-0.3,C,2931,C,-9.910,-2.420,57.610,0.2,C
2932,C,-5.470,-0.200,51.550,0.9,C,,,,,,,
2940,C,-10.860,-3.400,54.000,0.7,C,2940,S,-10.440,-3.450,54.270,2.2,S
2941,S,-11.820,-13.550,55.070,2.1,S,,,,,,,
2944,H,-3.770,-4.180,60.300,0.7,H,,,,,,,
,,,,,,,2900,C,-13.655,-13.730,59.405,-1.5,C
,,,,,,,4304,N,-9.700,-7.680,58.330,-2.3,N
The other output file should contain only the non-matching rows of input2.txt:
output2.txt
2900,C,-13.655,-13.730,59.405,-1.5,C
4304,N,-9.700,-7.680,58.330,-2.3,N
Moreover, in input2.txt I would like to replace the element of the last column with the string "P", but only for rows whose ID is greater than or equal to 4000. How can I do that?
For example, only the first row (ID = 4304) qualifies, so its last field "N" should be replaced with "P":
output.txt
4304,N,-9.700,-7.680,58.330,-2.3,P
2940,S,-10.440,-3.450,54.270,2.2,S
2900,C,-13.655,-13.730,59.405,-1.5,C
2931,C,-9.910,-2.420,57.610,0.2,C
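
To make the three requests concrete, here is a minimal sketch of one way they might be approached, not a definitive solution. It assumes both files always have exactly 7 comma-separated fields and that all IDs have the same number of digits (so a plain lexical sort agrees with numeric order). The names input1.sorted, input2.sorted and input2_P.txt are hypothetical and only used for illustration. An explicit -o field list is used instead of -o auto so that the ID of the second file is repeated in column 8, as in the desired output, and the unmatched rows of input2.txt are appended at the end with a second join -v2 call.

```shell
# Run in an empty scratch directory: the next lines recreate the sample files.
cat > input1.txt <<'EOF'
2931,C,-9.750,-2.550,57.910,-0.3,C
2932,C,-5.470,-0.200,51.550,0.9,C
2940,C,-10.860,-3.400,54.000,0.7,C
2941,S,-11.820,-13.550,55.070,2.1,S
2944,H,-3.770,-4.180,60.300,0.7,H
EOF
cat > input2.txt <<'EOF'
4304,N,-9.700,-7.680,58.330,-2.3,N
2940,S,-10.440,-3.450,54.270,2.2,S
2900,C,-13.655,-13.730,59.405,-1.5,C
2931,C,-9.910,-2.420,57.610,0.2,C
EOF

# Use the same byte-wise collation for sort and join.
LC_ALL=C
export LC_ALL

# Sort both files on the ID column once (hypothetical intermediate files).
sort -t, -k1,1 input1.txt > input1.sorted
sort -t, -k1,1 input2.txt > input2.sorted

# All 7 fields of file 1 followed by all 7 fields of file 2 (assumed widths).
fmt=1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.1,2.2,2.3,2.4,2.5,2.6,2.7

# Matched rows plus unmatched rows of input1.txt (what the current command keeps).
join -t, -a1 -o "$fmt" input1.sorted input2.sorted > output_final.txt

# Append the unmatched rows of input2.txt, padded with 7 empty leading fields.
join -t, -v2 -o "$fmt" input1.sorted input2.sorted >> output_final.txt

# Only the rows of input2.txt whose ID does not occur in input1.txt.
join -t, -v2 input1.sorted input2.sorted > output2.txt

# Set the last column to "P" for IDs >= 4000; every other row is printed unchanged.
awk -F, -v OFS=, '$1 >= 4000 { $NF = "P" } 1' input2.txt > input2_P.txt
```

If the order of the rows in output_final.txt does not matter, a single call with both -a1 and -a2 would probably do as well; in that case the rows come out in ID order rather than with the unmatched input2.txt rows at the end.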