7

I have some SLES 12 SP5 machines with grep version 2.16 and on a single machine I'm heavily using scripts which contained the following grep --quiet condition:

# $pid_list contains the result of pstree and $script_pid equals $$ if echo "$pid_list" | grep -qF "($script_pid)"; then continue fi if echo "$pid_list" | grep -qF "($script_pid)"; then echo "Error: grep has a bug!" continue fi 

I doubled it, because by a chance of ~0.1% the first condition failed, while the second identical condition succeeded?!

After changing the condition as follows, it works flawlessly (Full code here):

if echo "$pid_list" | grep -F "($script_pid)" >/dev/null; then continue fi 

Regarding the manual the quiet option should behave as I would expect it. It should even return true if an error happens:

Exit immediately with zero status if any match is found, even if an error was detected

So I'm confused why it fails sometimes. RAM and filesystem of the machine is fine. The grep binary has the correct file hash, too.

I searched for a commit, but the only one I found is from 2001, which should be part of 2.16 as it is from 2014.

Update1

I tried to use a subshell as @kamil suggested, but it still fails (the "Race condition"-error is displayed sometimes):

if (echo "$pid_list"; true) | grep -qF "($script_pid)"; then continue elif (echo "$pid_list"; true) | grep -qF "($script_pid)"; then echo "Error: Race condition!" continue fi 

This instead works:

if echo "$pid_list" | grep -qF "($script_pid)" || [[ $? -eq 141 ]]; then continue fi 
5
  • 5
    would you have a text file containing an example pid_list and a concrete example of $script_pid? I know it sounds like it's just as given, but we're looking at an "unlikely" bug, so being able to have a fully reproducible test case that we can a couple hundred times until you're sure it would have failed on your machine is necessary. Commented Apr 18, 2023 at 8:41
  • 3
    continue has no point outside of a loop, so please show the loop in question. Commented Apr 18, 2023 at 8:44
  • 2
    BTW: You don't need echo and grep for substring matching, the shell has built-in means, e.g. case "$pid_list" in ; *"($script_pid)"*) do_something ;; esac Commented Apr 18, 2023 at 9:03
  • 1
    Show a complete (but minimal) script that exhibits the issue, including the exact values of $pid_list and a $script_pid. Commented Apr 18, 2023 at 9:32
  • @MarcusMüller Link to full code added to the question. Commented Apr 18, 2023 at 11:04

1 Answer 1

13

Hypothesis: in the script you did set -o pipefail (if the shell interpreting the script is Bash; other shells may provide similar functionality). From man 1 bash:

pipefail
If set, the return value of a pipeline is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands in the pipeline exit successfully. This option is disabled by default.

When there is a match, grep -q exits early and returns 0. Then echo may or may not return 141. It depends on if echo manages to write everything to the pipe buffer before grep closes its end of the pipe. Even for the same input things may go one way or another, it's a race condition. If 141 happens then the return value of the pipe will be 141 because of pipefail.

Assuming you really used set -o pipefail, I say the behavior you observed is absolutely not a bug of grep nor the shell. The bug is in your script.

My answer under the linked question provides solutions. The simplest one for you is:

if (/bin/echo "$pid_list"; true) | grep -qF "($script_pid)"; then 

or

if ((echo "$pid_list"); true) | grep -qF "($script_pid)"; then 

Why /bin/echo … or (echo …) instead of echo? Because your echo is most likely a shell builtin and when "it" gets SIGPIPE then it's really the whole subshell that gets SIGPIPE and gets killed. We don't want the subshell that is supposed to execute true to be killed. See the beginning of this answer.

Check the whole script; other conditionals may be buggy in the same way.

9
  • 1
    Yes, you are right. I'm using pipefail and then it should be as you explained. Thank you! Commented Apr 18, 2023 at 10:44
  • 1
    It has been requested for shellcheck: github.com/koalaman/shellcheck/issues/1109 Commented Apr 18, 2023 at 11:21
  • @mgutt Replacing echo ... | grep ... with shell builtins for substring matching would avoid the pipe and the resulting issue related to pipefail. Commented Apr 18, 2023 at 12:12
  • @Bodo Thank you. I already read your above comment. But even if I use the binary operator =~ instead, I still could face this problem with other pipes to head or tail. Commented Apr 18, 2023 at 14:22
  • 3
    if errexit is also enabled, you'll need /bin/echo ... || true instead of /bin/echo ...; true. There is a misguided recommendation to use set -euo pipefail floating around that many people seem to be following these days even though errexit and pipefail introduce more problems than they fix. Here (set +o pipefail; echo ... | grep ...) would probably be better. But best would be not to set errexit and pipefail globally and do proper error handling and set pipefail only when needed. Commented Apr 18, 2023 at 19:18

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.