The accepted answer is good: it clearly states the drawbacks of parsing text files in the shell. But people have been cargo-culting its main idea (namely, that shell scripts deal poorly with text-processing tasks) to criticize anything that uses a shell loop.

There is nothing inherently wrong with shell loops, any more than there is anything wrong with loops in shell scripts in general or with command substitutions outside of loops. It is certainly true that in most cases you can replace them with more idiomatic constructs. For example, instead of writing

for i in $(find . -iname "*.txt"); do ... done 

write this:

for i in *.txt; do ... done 
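
If you actually want the recursive behaviour of find, here is a minimal sketch using bash's globstar option; it assumes bash 4 or later, and note that, unlike -iname, the glob is case-sensitive:

shopt -s globstar nullglob       # globstar: ** also matches subdirectories; nullglob: expand to nothing on no match
for f in **/*.txt; do
    printf 'processing %s\n' "$f"    # placeholder loop body
done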

In other scenarios, it is better to rely on more specialized tools such as awk, sed, cut, join, paste, datamash, miller; general-purpose programming languages with good text-processing capabilities (e.g. perl, python, ruby); or parsers for specific file types (XML, HTML, JSON).
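
As a rough illustration of what that buys you, a single awk invocation often replaces an entire read-and-cut loop; the file name and field layout below are made up for the example:

# print per-user byte totals from a hypothetical whitespace-separated log
# whose fields are: user host bytes
awk '{ total[$1] += $3 } END { for (u in total) print u, total[u] }' usage.log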

Having said that, using a shell loop is the right call as long as the following conditions hold:

  1. Performance is not a priority. Does your script need to run fast? If it's a task that runs once every few hours as a cron job, performance is probably not an issue. If it is, run benchmarks to make sure the shell loop is not a bottleneck. Intuition or preconceptions about which tools are "fast" or "slow" are no substitute for accurate benchmarks.
  2. Legibility is maintained. If you're adding so much logic to your shell loop that it becomes hard to follow, you may need to rethink the approach.
  3. Complexity does not increase substantially.
  4. Security is preserved.
  5. Testability doesn't become an issue. Properly testing shell scripts is already difficult. If using external commands makes it harder to know whether you have a bug in your code or are working under incorrect assumptions about return values, that's a problem.
  6. The shell loop has the same semantics as the alternative, or the differences don't matter for what you're doing. For example, the find command above recurses into subdirectories and matches files whose names start with a dot, while the glob does not. (Both can also mishandle file names containing spaces if "$i" isn't quoted inside the loop, and the find version breaks on them regardless, because the output of the command substitution is word-split; see the sketch after this list.)
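
On that last point, here is a minimal sketch of a find-based loop that keeps the recursive, dot-file-matching semantics while handling spaces (and even newlines) in file names. It assumes bash, since read -d '' is not POSIX, and the printf line is just a placeholder for the loop body:

find . -iname "*.txt" -print0 |
while IFS= read -r -d '' f; do
    printf 'processing %s\n' "$f"    # placeholder loop body
done

Because the loop runs in a pipeline, variables set inside it are not visible after done; feeding the loop with process substitution, done < <(find ...), avoids that if you need it.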

As an example showing that satisfying these conditions is not an impossible task, here is the pattern used in the installer of a well-known piece of commercial software:

i=1
MD5=...   # embedded checksum
for s in $sizes
do
    checksum=`echo $VAR | cut -d" " -f $i`
    if <checksum condition>; then
        md5=`echo $MD5 | cut -d" " -f $i`
    ...
done

This loop runs only a handful of times; its purpose is clear; it is concise and doesn't add unnecessary complexity; and since no user-controlled input is involved, security is not a concern. Does it matter that it invokes additional processes in a loop? Not at all.
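
If a benchmark ever did show the repeated cut calls to be a problem, one process-free alternative in plain sh is sketched below; MD5 and i stand in for the variables in the snippet above, and the approach assumes the fields are whitespace-separated:

set -f             # disable globbing while the unquoted expansion is split
set -- $MD5        # split $MD5 on whitespace into the positional parameters
set +f
eval "md5=\${$i}"  # e.g. with i=3 this evaluates md5=${3}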
