
I have a list of file names, and I am trying to extract the index between sil. and .asc from each name and put the indexes in a list without repeating any index. The following is part of my list of files.

ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc ellip5.0.apo.3.sil.8.asc ellip5.0.apo.4.sil.3.asc ellip5.0.apo.4.sil.14.asc ellip5.0.apo.4.sil.5.asc ellip5.0.apo.4.sil.6.asc ellip5.0.apo.4.sil.7.asc ellip5.0.apo.4.sil.8.asc ellip5.0.apo.5.sil.3.asc ellip5.0.apo.5.sil.14.asc ellip5.0.apo.5.sil.5.asc ellip5.0.apo.5.sil.6.asc ellip5.0.apo.5.sil.7.asc ellip5.0.apo.5.sil.8.asc ellip5.0.apo.6.sil.3.asc ellip5.0.apo.6.sil.4.asc ellip5.0.apo.6.sil.5.asc ellip5.0.apo.6.sil.16.asc ellip5.0.apo.6.sil.7.asc ellip5.0.apo.6.sil.8.asc ellip5.0.apo.7.sil.13.asc ellip5.0.apo.7.sil.4.asc ellip5.0.apo.7.sil.5.asc 

The following code is my attempt to build the list, but it doesn't work:

args=()
containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}
for MYVAR in "ellip*.asc"
j=0
for i in $(ls ellip*.asc)
do
  INDEX=`echo $i | grep -oE 'sil.[^/]+.asc' | cut -c5- | rev | cut -c5- | rev`
  listcontains INDEX "${args[@]}"
  if [ $? == 1 ];then
    args[j]=$INDEX
    j=$(($j + 1))
    echo $INDEX
  fi
done
echo ${args[@]}

Any suggestions will be appreciated. My expected list would be:

16 7 8 3 14 5 6 4 13

and preferably a sorted list.


3 Answers

2

You can use this script in BASH 4:

# declare an associative array
declare -A arr
for f in ellip*.asc; do
  f="${f/#*sil.}"
  f="${f%.asc}"
  arr["$f"]=1
done
# print sorted index values
printf "%s\n" "${!arr[@]}" | sort -n
3
4
5
6
7
8
13
14
16

In older BASH versions, where associative arrays are not supported, use:

declare -a arr
for f in ellip*.asc; do
  f="${f/#*sil.}"
  f="${f%.asc}"
  arr+=("$f")
done
sort -un <(printf "%s\n" "${arr[@]}")

Output:

3
4
5
6
7
8
13
14
16

6 Comments

can you please explain how f="${f/#*sil.}" and f="${f%.asc}" help to split the string? Because when I try to get the number between apo. and .sil it doesn't return the right values.
f="${f/#*sil.}" removes everything from the start through sil. and f="${f%.asc}" removes the .asc at the end. I created all the files exactly as in your question, and both scripts worked fine in my testing.
I also want to get the number between apo. and .sil, but slightly changing your solution doesn't work. I am also wondering why you set arr["$f"]=1.
arr["$f"]=1 stores each index as a key of the associative array arr. Can you run unset arr; declare -A arr; for f in ellip*.asc; do echo "<$f>"; f="${f/#*sil.}"; f="${f%.asc}"; arr["$f"]=1; done and tell me what you see on your terminal? I see <ellip5.0.apo.3.sil.16.asc>, <ellip5.0.apo.3.sil.7.asc> etc. on separate lines.
It does return what you wrote, but when I run echo ${arr[@]}, it returns the wrong answer.
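
On the two follow-up questions in these comments, here is a minimal sketch, assuming Bash 4+ and file names shaped exactly like those in the question. The indexes are stored as the keys of the associative array, so ${arr[@]} prints only the placeholder values (all 1), while ${!arr[@]} prints the indexes themselves. The names files, sil_seen, apo_seen, sil and apo are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# Sample names copied from the question (a stand-in for the glob ellip*.asc).
files=(ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc
       ellip5.0.apo.4.sil.3.asc ellip5.0.apo.4.sil.7.asc)

declare -A sil_seen apo_seen
for f in "${files[@]}"; do
  sil="${f##*.sil.}"    # drop everything up to and including ".sil."
  sil="${sil%.asc}"     # drop the trailing ".asc"
  apo="${f##*.apo.}"    # drop everything up to and including ".apo."
  apo="${apo%%.sil.*}"  # drop ".sil." and everything after it
  sil_seen["$sil"]=1    # the index is the KEY; the value 1 is a placeholder
  apo_seen["$apo"]=1
done

# Keys (the indexes) come from ${!name[@]}; values come from ${name[@]}.
sil_sorted=$(printf '%s\n' "${!sil_seen[@]}" | sort -n)
apo_sorted=$(printf '%s\n' "${!apo_seen[@]}" | sort -n)

echo "values:" "${sil_seen[@]}"   # only the placeholder 1s
echo "sil:" $sil_sorted           # unquoted on purpose: print on one line
echo "apo:" $apo_sorted
```

With these four sample names the sil indexes come out as 3, 7 and 16 and the apo indexes as 3 and 4.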
2

I would use something like

ls ellip*.asc | cut -f 6 -d . | sort -nu 

The cut program does just what you want here: it selects the 6th field, with fields separated by the delimiter . (a dot).

4 Comments

How could I put the output of your answer in an array?
You'll have to ask someone else -- I don't know about shell arrays!
Also, I think the requirement says to extract the index between sil. and .asc, so it may not be safe to assume the 6th field is the index. Besides, parsing ls output can be error-prone.
@SteveSummit I think it will work like this arr=( $( ls ellip*.asc | cut -f 6 -d . | sort -nu)).
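
One way to fill an array without parsing ls, sketched under the assumption of Bash 4+ (for mapfile) and file names shaped like those in the question. The scratch directory and the sample names exist only to make the snippet runnable on its own; workdir is a made-up variable name.

```shell
#!/usr/bin/env bash
# Illustrative setup: create a few empty sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc \
      ellip5.0.apo.4.sil.3.asc ellip5.0.apo.4.sil.7.asc

# printf prints the expanded glob one name per line; sed keeps only the
# text between ".sil." and ".asc"; sort -nu sorts numerically and drops
# duplicates; mapfile -t reads the resulting lines into the array arr.
mapfile -t arr < <(printf '%s\n' ellip*.asc |
                   sed -E 's/.*\.sil\.([^.]+)\.asc$/\1/' |
                   sort -nu)

echo "${arr[@]}"

# Clean up the scratch directory.
cd - >/dev/null && rm -rf "$workdir"
```

Matching on .sil. and .asc rather than on a field position keeps this working if the number of dots before sil. ever changes.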
0

If you don't mind using some utilities (which you probably don't, as you already have grep, cut and rev in your example), then you can do this in a one-liner:

arr=($(sed 's/ /\n/g' <<< $(echo *.sil.*.asc) |cut -d. -f6 |sort -n |uniq)) 

This first gets your file list (note that you need echo to feed the file list to sed, since pathnames are not expanded after <<<), breaks it into lines, selects the 6th field using . as the delimiter, and then keeps a single copy of each value (note that uniq needs sorted input). The resulting list is then assigned to an array.
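
The remark about uniq can be checked with a small made-up list: uniq only collapses adjacent duplicate lines, so unsorted input keeps its repeats. The variable names below are illustrative.

```shell
#!/usr/bin/env bash
# Made-up index list with a non-adjacent duplicate (16 appears twice).
# tr joins the output lines with spaces so each result prints on one line.
unsorted_result=$(printf '%s\n' 16 7 16 3 | uniq | tr '\n' ' ')
sorted_result=$(printf '%s\n' 16 7 16 3 | sort -n | uniq | tr '\n' ' ')

echo "uniq alone:     $unsorted_result"   # the second 16 survives
echo "sort -n | uniq: $sorted_result"     # duplicates removed
```

sort -nu, as used in another answer, combines the sorting and the de-duplication in one step.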

Also note that in your example you have:

... for i in $(ls ellip*.asc) do ... 

Here you parse the output of ls, which you should generally avoid (see here). In this specific case it would probably be safe, as your filenames have a fixed format.
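
For comparison, the loop from the question could be repaired along these lines. This is only a sketch: it assumes the intended helper is containsElement (the question calls listcontains, which is never defined), passes "$INDEX" rather than the literal word INDEX, and iterates over a sample array (the hypothetical files) standing in for the glob ellip*.asc.

```shell
#!/usr/bin/env bash
# Return 0 if the first argument equals any of the remaining arguments.
containsElement() {
  local e
  for e in "${@:2}"; do
    [[ "$e" == "$1" ]] && return 0
  done
  return 1
}

# Sample names from the question; in real use: for i in ellip*.asc
files=(ellip5.0.apo.3.sil.16.asc ellip5.0.apo.3.sil.7.asc
       ellip5.0.apo.4.sil.3.asc ellip5.0.apo.6.sil.16.asc)

args=()
for i in "${files[@]}"; do
  INDEX="${i##*.sil.}"    # strip everything through ".sil."
  INDEX="${INDEX%.asc}"   # strip the ".asc" suffix
  # Append only indexes that are not in the list yet.
  if ! containsElement "$INDEX" "${args[@]}"; then
    args+=("$INDEX")
  fi
done

echo "${args[@]}"
```

This keeps the indexes in first-seen order; piping them through sort -n would give the sorted list the question asks for.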
