
I would like to write a function with a loop to construct the command:

    cat example.txt | sed 's/A/1/' | sed 's/B/2/' | sed 's/C/3/' | sed 's/D/4/' ...

while taking in a string of A B C D.

example.txt:

    A B C D

Here is what I have come up with so far but I do not know the syntax to chain piped commands together. I was thinking that I could use echo to construct the string version of the command and then execute it that way but I am wondering if there is a better way to do this.

    elements="A B C D"
    n=1
    for i in $elements ; do
        cat example.txt | sed "s/$i/$n/g"
        n=$(($n+1))
    done

The output makes sense given the commands that I have generated:

    cat example.txt | sed "s/A/1/g"
    cat example.txt | sed "s/B/2/g"
    cat example.txt | sed "s/C/3/g"
    cat example.txt | sed "s/D/4/g"

but as stated above I would like to "chain" pipe them together.

  • BTW, it's very deliberate that my answer isn't using cat. Using cat file results in the component to the right of it in a pipeline getting a FIFO that it can only read once, front-to-back. For some programs this forces them to be very inefficient compared to getting a real file descriptor -- for example, if you give sort a real FD it can split up into threads and have each one process a piece of the file in parallel; if you give tail a real FD it can skip straight to the end no matter how long it is, etc. Commented Mar 28, 2024 at 18:21
  • Also see BashFAQ/119 - What's the difference between "cmd < file" and "cat file | cmd"? What is a UUOC?. Commented Mar 28, 2024 at 21:53
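The point made in the first comment can be sketched with a pair of commands that print the same text but feed sed differently (the file name and contents here are just for illustration):

```shell
# Sketch: both commands produce identical output, but the redirection hands
# sed a real, seekable file descriptor, while the cat version gives it a
# one-shot FIFO it can only read front-to-back.
printf 'A B C D\n' > example.txt
cat example.txt | sed 's/A/1/'   # sed reads from a pipe (UUOC)
sed 's/A/1/' < example.txt       # sed reads the file directly
```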

3 Answers


Since you tagged this sed, I'm assuming a sed-specific answer is acceptable. This doesn't require dynamic pipeline elements at all: You can add more operations to a single sed command by adding to its command line argument list.

    #!/usr/bin/env bash
    #          ^^^^- NOT /bin/sh; arrays are a non-POSIX feature

    elements=( A B C D )   # use a proper array, not a string
    n=1
    args=( )               # likewise, using a real array here too
    for i in "${elements[@]}"; do      # iterate over input array
        args+=( -e "s/$i/$n/g" )       # append to operation array
        n=$(($n+1))
    done
    sed "${args[@]}" <example.txt      # expand command array onto sed command line
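Hand-expanding the array for the A B C D case shows what sed ultimately gets invoked with. One caveat worth noting: the -e expressions run in order on each line, so a replacement that itself matches a later pattern would be rewritten again.

```shell
# What the args array expands to for elements=( A B C D ): a single sed
# process with four -e expressions, each applied in order to every line.
printf 'A B C D\n' > example.txt
sed -e 's/A/1/g' -e 's/B/2/g' -e 's/C/3/g' -e 's/D/4/g' <example.txt
# prints: 1 2 3 4
```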

For an approach that doesn't take advantage of sed behavior and instead generates a pipeline dynamically (without using eval, which makes it much easier but introduces security problems when used without great care), see the answer to Handling long edit lists in XMLStarlet.




Although it's not necessary for what you are trying to do (just pass multiple commands to a single sed process), it is possible to build pipelines of commands dynamically in Bash.

This Shellcheck-clean Bash code demonstrates one way to do it:

    #! /bin/bash -p

    function run_sed_pipeline
    {
        local pipecmd='' i
        for ((i=1; i<=$#; i++)); do
            pipecmd+="${pipecmd:+ | }sed \"s/\${$i}/$i/\""
        done
        printf 'DEBUG: EVAL: %s\n' "$pipecmd" >&2
        eval "$pipecmd"
    }

    run_sed_pipeline A B C D <example.txt

When the code is run it produces output:

    DEBUG: EVAL: sed "s/${1}/1/" | sed "s/${2}/2/" | sed "s/${3}/3/" | sed "s/${4}/4/"
    1 2 3 4
  • The basic idea is to build up a pipeline of commands in a string variable and use eval to run it.
  • There are serious pitfalls associated with eval and it is best avoided. See Why should eval be avoided in Bash, and what should I use instead?. Also see BashFAQ/048 (Eval command and security issues).
  • I think that the code here avoids significant eval pitfalls, but I could be wrong. The main thing that the code does to avoid problems is to refrain from putting the expanded function arguments (A B C D in this example) in the string to be evaled. Instead, the only expansions in the command string are (quoted) ${1}, ${2}, ${3}, and ${4}. This ensures that embedded expansions, or quotes etc., in the function arguments will not cause problems.
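The quoting distinction the last bullet describes can be seen in a minimal sketch of my own (function names hypothetical, not from the answer):

```shell
# Contrast between interpolating an argument into the eval'd string and
# deferring its expansion until eval runs.
unsafe_sub() { eval "sed \"s/$1/X/\""; }   # $1 interpolated now: injectable
safe_sub()   { eval 'sed "s/${1}/X/"'; }   # ${1} expanded by eval: safe

printf 'A B\n' | safe_sub 'A'    # prints: X B
# With an argument like 'A/X/"; some-command; "' the unsafe variant would
# execute some-command as shell code; the safe variant passes the argument
# to sed verbatim, where at worst it is a sed syntax error.
```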

Another way to create a dynamic pipeline of commands is to use a recursive function, as with this Shellcheck-clean code:

    #! /bin/bash -p

    function run_sed_pipeline
    {
        if (( $# < 2 )); then
            cat
        else
            sed "s/$2/$1/" | run_sed_pipeline "$(($1+1))" "${@:3}"
        fi
    }

    run_sed_pipeline 1 A B C D <example.txt

When the code is run it produces output:

    1 2 3 4
  • The first argument to the function is the number to substitute for the first (remaining) argument to be replaced. Each recursive call increments the first argument by one and removes the first of the following (remaining) arguments.
  • This code features an unusual "useless use of cat" (UUoC). The last process in the pipeline is a useless cat. It's easy to avoid, but doing so makes the code a bit more complicated so I left it as it is for this (illustrative) example.
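One way to drop the trailing cat, as a sketch (my variant, not the answer's; it assumes at least one element is always passed, so the two-argument case applies the final substitution directly instead of recursing into a cat base case):

```shell
# UUoC-free variant of the recursive pipeline: the last element's sed is
# the end of the pipeline rather than piping into a useless cat.
run_sed_pipeline() {
    if (( $# == 2 )); then
        sed "s/$2/$1/"                                     # last substitution
    else
        sed "s/$2/$1/" | run_sed_pipeline "$(($1+1))" "${@:3}"
    fi
}

printf 'A B C D\n' | run_sed_pipeline 1 A B C D   # prints: 1 2 3 4
```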

Comments

Hmm -- clever. Generating positional argument expansions is a sound practice here.
@pjh Unfortunately the recursive function method is inefficient since it doesn't exec(3) the pipe subshell. For every recursion you have an unnecessary subshell running, just waiting for its own child subshell pipe to close. It's what kept me from answering it. lastpipe might help, but I'm not sure if it can be safely used recursively. Plus it requires job control to be inactive. While job control is disabled by default and isn't always necessary in a script, it's still something to consider.

This might work for you (GNU sed & parallel):

    parallel echo 's/{1}/{2}/g' ::: $elements :::+ {1..999} | sed -f - file

Use parallel to build up the sed commands into a file which is imported using the -f option on a piped invocation of sed.

N.B. If there are more than 999 elements, then increase the argument. Beware of overlapping patterns!
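The sed script that parallel builds can also be produced by a plain loop, which may help if GNU parallel isn't installed. A sketch, assuming GNU sed's support for reading a script from standard input via -f -:

```shell
# Emit one s command per element, then feed the whole script to a single
# sed invocation over the file.
printf 'A B C D\n' > example.txt    # sample input for illustration
elements="A B C D"
n=1
for i in $elements; do
    printf 's/%s/%d/g\n' "$i" "$n"
    n=$((n+1))
done | sed -f - example.txt
# prints: 1 2 3 4
```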

