8

A text file needs to be analyzed. It has many lines, each with 20 fields separated by TABs. Each field can contain any type of data (integer, floating point, date/time, or text that may contain blanks, "", etc.), and fields can also be empty. For example, such a line would start like this:

111TABTABWalterTAB11.1234TABThis is a sample TextTAB"Another sample"TABTABTAB555... 

How can I read each line of the file into an array arrLine having 20 columns, e.g.

  • arrLine(0)=111
  • arrLine(1)=empty
  • arrLine(2)=Walter
  • ...

I tried this proposal:

    while IFS=$'\t' read -r -a arrLine; do
      echo "${#arrLine[@]} items: ${arrLine[@]}"
      echo "${arrLine[3]} || ${arrLine[4]} || ${arrLine[5]}"
    done < "test.txt"

but empty columns are not filled into the array.

Thank you!

1
  • 1
    Added my actual code (from StackOverflow) but it doesn't consider empty cells Commented Mar 4, 2021 at 18:18

2 Answers

11

With IFS="anyWhitespaceCharacter" read there is no way to read an empty field between two other fields; with a non-whitespace delimiter this would work. This inconsistent behavior is dictated by POSIX:

The term " IFS white space" is used to mean any sequence (zero or more instances) of white-space characters that are in the IFS value (for example, if IFS contains <space>/<comma>/<tab>, any sequence of <space> and <tab> characters is considered IFS white space).

  • IFS white space shall be ignored at the beginning and end of the input.
  • Each occurrence in the input of an IFS character that is not IFS white space, along with any adjacent IFS white space, shall delimit a field, as described previously.
  • Non-zero-length IFS white space shall delimit a field.
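The difference between the two rules can be seen in a minimal sketch. The sample line and the array names tabFields and commaFields are made up for illustration, and the comma is just one arbitrary non-whitespace delimiter:

```shell
#!/usr/bin/env bash
# Sample line: the second field is empty (two consecutive tabs).
line=$'111\t\tWalter'

# Tab is IFS white space: the two adjacent tabs act as a single delimiter.
IFS=$'\t' read -r -a tabFields <<< "$line"

# Comma is not IFS white space: every comma delimits a field.
IFS=',' read -r -a commaFields <<< "${line//$'\t'/,}"

echo "tab:   ${#tabFields[@]} fields"
echo "comma: ${#commaFields[@]} fields"
declare -p tabFields commaFields
```

With tab as the delimiter the empty field disappears (2 fields); with the comma it is kept (3 fields).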

But you still can use mapfile to split each line into fields:

    while IFS= read -r line; do
      mapfile -td $'\t' arrLine < <(printf %s "$line")
      declare -p arrLine # prints the array for debugging
    done < test.txt

Alternatively, swap the whitespace delimiter (in this case tab) for any non-whitespace character that does not appear in the input. Here we use the ASCII "unit separator" character, \037.

    while IFS=$'\037' read -ra arrLine; do
      declare -p arrLine # prints the array for debugging
    done < <(tr \\t \\037 < test.txt)
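One thing that may be worth checking when exactly 20 columns are expected: with a non-whitespace delimiter, read treats a single trailing delimiter as a terminator, so an empty last field is not added to the array. The sketch below shows a possible workaround, appending one extra delimiter. The sample line and the names us, converted, plain and padded are assumptions for illustration only:

```shell
#!/usr/bin/env bash
# Hypothetical sample line with 4 fields; the 2nd and the 4th (last) are empty.
line=$'111\t\tWalter\t'
us=$'\037'                      # unit separator used as the stand-in delimiter
converted=${line//$'\t'/$us}    # same effect as tr \\t \\037 on a single line

# Without padding, the empty last field is dropped.
IFS=$us read -ra plain <<< "$converted"

# With one extra delimiter appended, the empty last field is kept.
IFS=$us read -ra padded <<< "$converted$us"

echo "plain:  ${#plain[@]} fields"
echo "padded: ${#padded[@]} fields"
declare -p plain padded
```

Here plain should end up with 3 elements and padded with 4.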

7 Comments

To visualize the content of array arrLine I recommend: declare -p arrLine
Thank you Socowi, looks perfect. @Cyrus: I don't understand your comment, where to add that code?
@StOMicha: To quickly get an overview of what is in array arrLine, you can remove both lines with echo and put there one declare -p arrLine.
Thanks Cyrus, perfect for debugging!
@Socowi The alternative solution may not work if the line already contains a unit separator (\037) character (unusual, but not impossible). But there is a remedy for that: Replace the tr \\t \\037 with tr '\t\037' '\037\t' and add this as a first line within the while loop: arrLine=("${arrLine[@]//$'\t'/$'\037'}")
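Put together, the remedy from the last comment might look like the sketch below. The input is a made-up sample fed through printf instead of test.txt, and lastRow is a hypothetical helper that keeps the last parsed line around for inspection:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: the third field already contains a unit separator.
input=$'111\t\tWal\037ter\t555\n'

lastRow=()
while IFS=$'\037' read -ra arrLine; do
  # Inside each field, turn the stand-in tabs back into the original \037.
  arrLine=("${arrLine[@]//$'\t'/$'\037'}")
  declare -p arrLine              # prints the array for debugging
  lastRow=("${arrLine[@]}")       # keep a copy; read clears arrLine at EOF
done < <(printf %s "$input" | tr '\t\037' '\037\t')
```

The tr call swaps the two characters, so tabs become delimiters that read does not collapse, and any unit separator already present in the data survives as a tab until it is swapped back inside the loop.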
2

You can split an input line into an array using IFS, but in this case bash treats tab as IFS whitespace and collapses consecutive tabs into a single delimiter, so you lose columns when fields are empty. You can sidestep that by translating the tabs to a different, non-whitespace delimiter.

    #IFS=$'\t'
    IFS='|'
    while read -a arrLine; do
      for i in {0..19}; do
        echo "arrLine [$i]: ${arrLine[$i]}"
      done
    done < <(cat input.txt | tr '\t' '|')

Output:

    arrLine [0]: 111
    arrLine [1]:
    arrLine [2]: Walter
    arrLine [3]: 11.1234
    arrLine [4]: This is a sample Text
    arrLine [5]: "Another sample"
    arrLine [6]:
    arrLine [7]:
    arrLine [8]: 555...
    arrLine [9]:
    arrLine [10]:
    etc.....
