
Sample input file:

    #name complete(cs)   len(cs) simple(ss) len(ss) position(ss)
    NAME1 A0AAA000AAA00A 14      AAAAAAAA   8       4,6
    NAME2 AAAA0AA00000A  13      AAAAAAA    7       7

Let's say I want to know the corresponding positions in the complete string (cs) of some letters of the simplified string (ss), given in the position(ss) column. Note: in the simple string (ss), only letters are allowed. In the complete string, any character is allowed.

In this example, it would return:

Sample output file:

    #name complete(cs)   len(cs) simple(ss) len(ss) pos(ss) pos(cs)
    NAME1 A0AAA000AAA00A 14      AAAAAAAA   8       4,6     5,10
    NAME2 AAAA0AA00000A  13      AAAAAAA    7       7       13

I'm currently building this file using python, but I'm sure there is an easy Unix way out.
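For reference, the mapping itself can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the script actually in use: the function name `cs_position` is hypothetical, and it assumes that simple(ss) is exactly complete(cs) with every non-letter removed, with "letter" meaning whatever `str.isalpha` accepts.

```python
def cs_position(complete, ss_pos):
    """Return the 1-based position in `complete` of the ss_pos-th letter.

    Assumption: the simple string is `complete` with all non-letters dropped,
    so the n-th character of the simple string is the n-th letter of `complete`.
    """
    seen = 0  # number of letters met so far
    for i, ch in enumerate(complete, start=1):
        if ch.isalpha():
            seen += 1
            if seen == ss_pos:
                return i
    raise ValueError(f"complete string has fewer than {ss_pos} letters")


print(cs_position("A0AAA000AAA00A", 4))  # 5
print(cs_position("A0AAA000AAA00A", 6))  # 10
print(cs_position("AAAA0AA00000A", 7))   # 13
```

Counting letters rather than one specific character means the sketch should also cope with strings that mix several different letters.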

  • Can you make it clearer? What is the relationship between pos(ss) and pos(cs)? Commented Jul 30, 2014 at 9:00
  • pos(ss) are positions in the simple(ss) string. I'd like to find their equivalent positions pos(cs) in the complete(cs) string. For example, in the first row, position 4 of simple(ss) is actually position 5 in complete(cs). That's because complete(cs) contains a 0 that causes a shift. Commented Jul 30, 2014 at 9:09

2 Answers

1

A perl solution:

    $ perl -anle '
        print "$_ position(cs)" and next if /^#/;
        printf "%s",$_;
        for $pos_ss (split ",",$F[5]) {
            $char = substr($F[3],$pos_ss-1,1);
            @cs = split //,$F[1];
            @cs_idx = grep {$cs[$_] eq $char} 0..$#cs;
            push @res,++$cs_idx[$pos_ss-1];
        }
        printf "%14s\n", join ",",@res;
        @res=();
    ' file
    #name complete(cs)   len(cs) simple(ss) len(ss) position(ss) position(cs)
    NAME1 A0AAA000AAA00A 14      AAAAAAAA   8       4,6          5,10
    NAME2 AAAA0AA00000A  13      AAAAAAA    7       7            13

How it works

  • The first two lines print the original entry (for the header line, with the new column name appended).
  • for $pos_ss (split ",",$F[5]): split field 6 on commas to get each wanted index in the simple string.
  • $char = substr($F[3],$pos_ss-1,1): get the character at the given index in the simple string.
  • @cs = split //,$F[1]: split the complete string into its characters and save them in an array.
  • @cs_idx = grep {$cs[$_] eq $char} 0..$#cs: collect all indexes of array @cs whose value equals $char.
  • push @res,++$cs_idx[$pos_ss-1]: save the wanted index, converted from 0-based to 1-based, in array @res.
  • The last two lines print the result and empty the @res array for the next input line.
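
For comparison, the same per-line flow can be sketched in Python. Treat it as a rough equivalent under stated assumptions rather than a translation: the names `positions_in_complete` and `annotate` are hypothetical, fields are assumed to be whitespace-separated as in the sample, and the simple string is assumed to be the complete string with non-letters removed. Unlike the one-liner, which looks up occurrences of the character found at the requested position, this sketch indexes every letter, so it does not depend on all letters being identical.

```python
def positions_in_complete(complete, simple_positions):
    """Map 1-based positions in the simple string to 1-based positions in `complete`."""
    # 1-based positions of every letter of `complete`, in order.
    letter_idx = [i for i, ch in enumerate(complete, start=1) if ch.isalpha()]
    return [letter_idx[p - 1] for p in simple_positions]


def annotate(lines):
    """Return the input lines with a position(cs) column appended."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append(line)                    # keep blank lines untouched
        elif line.startswith("#"):
            out.append(line + " position(cs)")  # header: add the new column name
        else:
            fields = line.split()               # like perl -a: split on whitespace
            wanted = [int(p) for p in fields[5].split(",")]  # field 6: position(ss)
            mapped = positions_in_complete(fields[1], wanted)
            out.append(line + " " + ",".join(str(m) for m in mapped))
    return out


if __name__ == "__main__":
    sample = [
        "#name complete(cs) len(cs) simple(ss) len(ss) position(ss)",
        "NAME1 A0AAA000AAA00A 14 AAAAAAAA 8 4,6",
        "NAME2 AAAA0AA00000A 13 AAAAAAA 7 7",
    ]
    print("\n".join(annotate(sample)))
```

To run it on a real file, the list `sample` could be replaced by an open file object, since `annotate` accepts any iterable of lines.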
1

This can be a starting point, using bash operators and hardcoded values. It is fairly self-explanatory:

    #!/bin/bash

    word="A0AAA000AAA00A"
    required=(4 6)
    match="A"

    w=$word
    counter=0
    pos=()

    # get the positions of $match in $word
    while [ -n "$w" ]; do
        # expr index (GNU expr): 1-based offset of the next $match, 0 if none
        n=$(expr index "$w" "$match")
        # no match left: stop here, otherwise $w never shrinks and the loop never ends
        [ "$n" -eq 0 ] && break
        w=${w:$n}
        counter=$(( counter + n ))
        # echo "position $counter. now w=$w"
        pos+=("$counter")
    done

    echo "All positions: ${pos[@]}"

    # print the position of $match in $word on positions given by $required
    for i in "${required[@]}"; do
        echo "position $i: ${pos[i-1]}"
    done

A generic case can be done with some kind of while read; do... done < file, fetching the necessary columns.
