Removing duplicates on a variable without sorting

Question

I have a variable that contains the following space-separated entries.

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

How do I remove the duplicates without sorting?

#Something like this.
new_variable="apple lemon papaya avocado grapes mango banana"

I found a script somewhere that removes the duplicates from a variable, but it also sorts the contents.

#Not something like this.
new_variable=$(echo "$variable"|tr " " "\n"|sort|uniq|tr "\n" " ")
echo $new_variable
apple avocado banana grapes lemon mango papaya

SiegeX · Accepted Answer · 2009-12-09 17:30:57Z

new_variable=$( awk 'BEGIN{RS=ORS=" "}!a[$0]++' <<<$variable );

Here's how it works:

RS (Input Record Separator) is set to a single space so that awk treats each fruit in $variable as a record instead of a field. The non-sorting unique magic happens with !a[$0]++. Since awk supports associative arrays, it uses the current record ($0) as the key to the array a[]. If that key has not been seen before, a[$0] evaluates to '0' (awk's default value for unset indices), which is then negated to return TRUE. I then exploit the fact that awk defaults to 'print $0' if an expression returns TRUE and no '{ commands }' are given. The post-increment then bumps a[$0], so this key can no longer return TRUE and repeated values are never printed. ORS (Output Record Separator) is set to a space as well to mimic the input format.

A less terse version of this command which produces the same output would be the following:

awk 'BEGIN{RS=ORS=" "}{ if (a[$0] == 0){ a[$0] += 1; print $0}}'
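To see the idiom in isolation, here is a minimal sketch that applies the same !seen[$0]++ test to newline-separated input, so RS and ORS can stay at their defaults. The array name seen and the sample words are illustrative and not part of the answer above.

```shell
# Print each input line only the first time it appears.
# seen[$0]++ yields the old count (0 on first sight), so !0 is true and awk
# falls back to its default action, print $0.
deduped=$(printf '%s\n' apple lemon apple papaya lemon | awk '!seen[$0]++')

printf '%s\n' "$deduped"
```

The words come out one per line, in first-seen order.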

Gotta love awk =)

EDIT

If you needed to do this in pure Bash 2.1+, I would suggest this:

#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
temp="$variable"
new_variable="${temp%% *}"
while [[ "$temp" != ${new_variable##* } ]]; do
    temp=${temp//${temp%% *} /}
    new_variable="$new_variable ${temp%% *}"
done
echo $new_variable;

Simply testing for membership is better than counting: awk 'BEGIN{RS=ORS=" "} { if (!($0 in a)) { a[$0]; print } }' Or more tersely: awk 'BEGIN{RS=ORS=" "} !($0 in a || a[$0])'
@Mark: Doing a 'time' over a loop of 10,000 iterations shows that yours is just over 3% slower. Not very much but nonetheless, not better. This difference will only become larger as the number of elements grows since your version takes O(n) time while mine is always a constant O(1).
Really nice solution, thanks. Except if duplicates are found at the end, one after the other then it doesn't work. Ex: variable="apple lemon papaya papaya" prints: apple lemon papaya papaya. Whereas if I have: variable="apple lemon papaya papaya mango" then it removes the duplicate papaya and prints: apple lemon papaya mango. Thoughts?
Found the following solution which helped with the problem outlined in my previous comment: stackoverflow.com/questions/46185241/… Thank you for sharing your solution.
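
The trailing-duplicate behaviour described in the comment above is consistent with how a here-string works: <<< appends a newline to its input, so with RS set to a single space the final record is "papaya" followed by a newline, which does not compare equal to the earlier "papaya". A minimal sketch of one possible workaround, assuming it is acceptable to feed awk through printf '%s' (which adds no newline) and to trim the trailing space that ORS leaves behind:

```shell
variable="apple lemon papaya papaya"

# printf '%s' writes the string without a trailing newline, so the last
# record is exactly "papaya" and is recognised as a duplicate.
new_variable=$(printf '%s' "$variable" | awk 'BEGIN{RS=ORS=" "} !a[$0]++')

# ORS=" " leaves one space after the last word; strip it.
new_variable=${new_variable% }

echo "$new_variable"
```

This is only a sketch for single-space-separated input; runs of several spaces would still produce empty records.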

Mark Edgar · Accepted Answer · 2009-12-09 12:34:50Z

This pipeline version works by preserving the original order:

variable=$(echo "$variable" | tr ' ' '\n' | nl | sort -u -k2 | sort -n | cut -f2-)
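
The pipeline follows a decorate, sort, undecorate pattern: nl prefixes each word with its line number, sort -u -k2 drops lines whose word has already appeared, sort -n restores the original order by number, and cut -f2- strips the numbers again. The result is newline-separated. Below is a minimal sketch that rejoins the words with spaces. It assumes a sort whose -u keeps the first line of each run of equal keys in input order (GNU sort documents this), and it adds paste, which is not part of the original answer.

```shell
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"

new_variable=$(
    printf '%s\n' "$variable" |
        tr ' ' '\n' |      # one word per line
        nl |               # decorate: "<number><TAB><word>"
        sort -u -k2 |      # keep one line per distinct word
        sort -n |          # restore the original order by line number
        cut -f2- |         # undecorate: drop the number column
        paste -sd' ' -     # join the lines back with single spaces
)

echo "$new_variable"
```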

MiloDC Over a year ago

This is the only solution here that worked for me. The awk solution still had duplicates. Thanks.

Fritz G. Mehner · Accepted Answer · 2009-12-10 09:10:58Z

Pure Bash:

variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
declare new_value=''
for item in $variable; do
    if [[ ! $new_value =~ $item ]] ; then   # first time?
        new_value="$new_value $item"
    fi
done
new_value=${new_value:1}                    # remove leading blank

Good solution, but note that this locks you into Bash 3.X due to the '=~' operator.
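
One caveat with the =~ test above: it is an unanchored regular-expression match, so a word that is a substring of an earlier word (apple after pineapple) is treated as already seen. A minimal sketch of a whole-word variant follows, assuming the words are separated by single spaces. The sample input is made up for illustration, and the test uses [[ ]] glob matching rather than =~.

```shell
variable="pineapple apple lemon apple"
new_value=''

for item in $variable; do
    # Pad both sides with spaces so that only whole words can match.
    if [[ " $new_value " != *" $item "* ]]; then
        new_value="$new_value $item"
    fi
done

new_value=${new_value# }   # remove the leading blank
echo "$new_value"
```

Here apple is kept even though pineapple came first, and only the second apple is dropped.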

Idelic · Accepted Answer · 2009-12-11 04:13:36Z

In pure, portable sh:

words="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
seen=
for word in $words; do
  case $seen in
    $word\ * | *\ $word | *\ $word\ * | $word)
      # already seen
      ;;
    *)
      seen="$seen $word"
      ;;
  esac
done
echo $seen

ghostdog74 · Accepted Answer · 2009-12-09 13:32:25Z

shell

declare -a arr
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
set -- $variable
count=0
for c in $@
do
    flag=0
    for((i=0;i<=${#arr[@]}-1;i++))
    do
        if [ "${arr[$i]}" == "$c" ] ;then
            flag=1
            break
        fi
    done
    if [ "$flag" -eq 0 ] ; then
        arr[$count]="$c"
        count=$((count+1))
    fi
done
for((i=0;i<=${#arr[@]}-1;i++))
do
    echo "result: ${arr[$i]}"
done

Result when run:

linux# ./myscript.sh
result: apple
result: lemon
result: papaya
result: avocado
result: grapes
result: mango
result: banana

OR if you want to use gawk

awk 'BEGIN{RS=ORS=" "} (!($0 in a) ){a[$0];print}'

Dimitre Radoulov · Accepted Answer · 2009-12-09 19:42:34Z

Z Shell:

% variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
% print ${(zu)variable}
apple lemon papaya avocado grapes mango banana

Jahid · Accepted Answer · 2015-05-18 07:24:08Z

Another awk solution:

#!/bin/bash
variable="apple lemon papaya avocado lemon grapes papaya apple avocado mango banana"
variable=$(printf '%s\n' "$variable" | awk -v RS='[[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
variable="${variable%,*}"
echo "$variable"

Output:

apple lemon papaya avocado grapes mango banana

Chris Koknat · Accepted Answer · 2015-10-29 00:24:18Z

Perl solution:

perl -le 'for (@ARGV){ $h{$_}++ }; for (keys %h){ print $_ }' $variable

@ARGV is the list of input parameters from $variable
Loop through the list, populating the h hash with the loop variable $_
Loop through the keys of the h hash, and print each one

grapes
avocado
apple
lemon
banana
mango
papaya

This variation prints the output sorted first by frequency $h{$a} <=> $h{$b} and then alphabetically $a cmp $b

perl -le 'for (@ARGV){ $h{$_}++ }; for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" }' $variable

1	banana
1	grapes
1	mango
2	apple
2	avocado
2	lemon
2	papaya

This variation produces the same output as the last one.
However, instead of a shell variable, it reads an input file 'fruits', with one fruit per line:

perl -lne '$h{$_}++; END{ for (sort { $h{$a} <=> $h{$b} || $a cmp $b } keys %h){ print "$h{$_}\t$_" } }' fruits
