BASH: How to read a TAB-separated line with empty columns from a file into an array

Question

A text file needs to be analyzed. It has many lines, each with 20 fields separated by TABs. Each field can contain any type of data (integer, floating point, date/time, text that may also contain blanks, "" etc.), and fields can also be empty. For example, such a line would start like this:

111TABTABWalterTAB11.1234TABThis is a sample TextTAB"Another sample"TABTABTAB555...

How can I read each line of the file into an array arrLine having 20 columns, e.g.

arrLine(0)=111
arrLine(1)=empty
arrLine(2)=Walter
...

I tried this suggestion:

    while IFS=$'\t' read -r -a arrLine; do
        echo "${#arrLine[@]} items: ${arrLine[@]}"
        echo "${arrLine[3]} || ${arrLine[4]} || ${arrLine[5]}"
    done < "test.txt"

but empty columns are not stored in the array.

Thank you!

Added my actual code (from Stack Overflow), but it doesn't consider empty cells. (StOMicha, commented Mar 4, 2021 at 18:18)

Socowi · Accepted Answer · 2021-03-04 21:26:30Z

With IFS="anyWhitespaceCharacter" read there is no way to read an empty field between two other fields; with non-whitespace characters this would work. This inconsistent behavior is dictated by POSIX:

The term "IFS white space" is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of <space> and <tab> characters is considered IFS white space).

IFS white space shall be ignored at the beginning and end of the input.

Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field, as described previously.

Non-zero-length IFS white space shall delimit a field.
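
The rules quoted above can be checked with a small experiment. This is only a sketch: the two sample strings and the array names `tab_fields` and `comma_fields` are made up for illustration.

```bash
#!/usr/bin/env bash
# Two made-up sample lines: three fields each, the middle one empty.
tab_line=$'111\t\tWalter'
comma_line='111,,Walter'

# Tab is IFS white space: the two adjacent tabs act as ONE delimiter.
IFS=$'\t' read -r -a tab_fields <<< "$tab_line"

# Comma is not IFS white space: every single comma delimits a field.
IFS=',' read -r -a comma_fields <<< "$comma_line"

echo "tab:   ${#tab_fields[@]} fields"     # tab:   2 fields
echo "comma: ${#comma_fields[@]} fields"   # comma: 3 fields
```

The empty middle field survives only in the comma case, which is why both approaches below avoid splitting on a whitespace character with read.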

But you still can use mapfile to split each line into fields:

    while IFS= read -r line; do
        mapfile -td $'\t' arrLine < <(printf %s "$line")
        declare -p arrLine # prints the array for debugging
    done < test.txt
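
To try the mapfile approach without a file, it can be run on a single string. A minimal sketch, assuming bash 4.4 or newer (where mapfile accepts -d); the variable `line` holds a shortened version of the sample line from the question.

```bash
#!/usr/bin/env bash
# Shortened version of the sample line from the question (assumed data).
line=$'111\t\tWalter\t11.1234\tThis is a sample Text\t"Another sample"\t\t\t555'

# Split on tabs: -d sets the delimiter, -t strips it from each element.
mapfile -td $'\t' arrLine < <(printf %s "$line")

echo "${#arrLine[@]} fields"
for i in "${!arrLine[@]}"; do
    printf 'arrLine[%d]=%s\n' "$i" "${arrLine[i]}"
done
```

With this input the array has 9 elements, and the empty columns appear as empty elements at indexes 1, 6 and 7.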

Alternatively, swap the whitespace delimiter (in this case tab) for any other non-whitespace character that does not appear in the input. In this case we use the ASCII symbol "unit separator" \037.

    while IFS=$'\037' read -ra arrLine; do
        declare -p arrLine # prints the array for debugging
    done < <(tr \\t \\037 < test.txt)
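
Since this variant relies on \037 not occurring in the input, it may be worth checking that first. A rough sketch: `is_delimiter_free` is a hypothetical helper name, and the temporary file only stands in for test.txt.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the file contains no \037 character.
is_delimiter_free() {
    grep -q $'\037' "$1"
    [ $? -eq 1 ]   # grep: 0 = found, 1 = not found, 2 = error
}

# Temporary file standing in for test.txt (made-up content).
tmp=$(mktemp)
printf '111\t\tWalter\n' > "$tmp"

fieldCount=0
if is_delimiter_free "$tmp"; then
    while IFS=$'\037' read -ra arrLine; do
        fieldCount=${#arrLine[@]}   # remember the size of the last row
        declare -p arrLine
    done < <(tr '\t' '\037' < "$tmp")
else
    echo 'input already contains \037; pick another delimiter' >&2
fi
rm -f "$tmp"
```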

To visualize the content of array arrLine I recommend: declare -p arrLine
Thank you Socowi, looks perfect. @Cyrus: I don't understand your comment, where to add that code?
@StOMicha: To quickly get an overview of what is in array arrLine, you can remove both lines with echo and put there one declare -p arrLine.
@Socowi The alternative solution may not work if the line already contains a unit separator (\037) character (unusual, but not impossible). But there is a remedy for that: Replace the tr \\t \\037 with tr '\t\037' '\037\t' and add this as a first line within the while loop: arrLine=("${arrLine[@]//$'\t'/$'\037'}")
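
Put together, the remedy from the last comment might look like the following sketch. The temporary file stands in for test.txt, its content is made up, and the variables `fieldCount` and `lastField` exist only to inspect the result.

```bash
#!/usr/bin/env bash
# Temporary file standing in for test.txt; the last field contains a literal \037.
tmp=$(mktemp)
printf '111\t\tWalter\tA\037B\n' > "$tmp"

# tr swaps tab and \037, so read can split on the non-whitespace \037.
while IFS=$'\037' read -ra arrLine; do
    # Original \037 characters arrive as tabs here; turn them back.
    arrLine=("${arrLine[@]//$'\t'/$'\037'}")
    fieldCount=${#arrLine[@]}
    lastField=${arrLine[3]}
    declare -p arrLine
done < <(tr '\t\037' '\037\t' < "$tmp")
rm -f "$tmp"
```

For this input the row has 4 fields, the second one is empty, and the last one still contains its original \037.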

dpippen · Accepted Answer · 2021-03-04 18:58:38Z

You can split an input line into an array using IFS, but tab counts as IFS whitespace, so bash treats consecutive tabs as a single delimiter and you lose the empty columns. You can sidestep that by translating the tabs to a different, non-whitespace delimiter that does not occur in the data.

    # IFS=$'\t' would merge consecutive tabs, so split on '|' instead
    while IFS='|' read -r -a arrLine; do
        for i in {0..19}; do
            echo "arrLine [$i]: ${arrLine[$i]}"
        done
    done < <(tr '\t' '|' < input.txt)

Output:

    arrLine [0]: 111
    arrLine [1]:
    arrLine [2]: Walter
    arrLine [3]: 11.1234
    arrLine [4]: This is a sample Text
    arrLine [5]: "Another sample"
    arrLine [6]:
    arrLine [7]:
    arrLine [8]: 555...
    arrLine [9]:
    arrLine [10]:
    etc.....
