I have a csv where the first few rows look like this

c("4288", "57534"),MIB1
c("2272", "2385"),FHIT
c("5550", "10531", "56239"),PREP
c("25809", "23669"),TTLL1

I want to reduce the number of variables so that everything grouped in parentheses is one variable. Unfortunately, my document has several entries like line 3, where more than one comma separates the values inside the parentheses.

Is there a sed expression capable of manipulating only the commas inside the parentheses?

The expected output would be something like this:

c("4288" "57534"), MIB1
c("2272" "2385"),FHIT
c("5550" "10531" "56239"),PREP
c("25809" "23669"),TTLL1

Cheers.

  • For this to be an actual CSV file, the fields containing commas would be quoted. Commented Apr 22, 2020 at 8:41

2 Answers

Using perl instead of sed to get more advanced regular expressions:

perl -pe 's/(?:\G[^,)]*|\([^,)]*)\K,(?=.*?\))//g' input.csv

Output:

c("4288" "57534"),MIB1
c("2272" "2385"),FHIT
c("5550" "10531" "56239"),PREP
c("25809" "23669"),TTLL1

This removes the commas that appear inside the parentheses.

The same solution I gave in another answer also applies to your question, with a small modification:

sed -E ':loop s/(\([^)]*),([^)]*\))/\1\2/; t loop' infile 

Breaking down:

Note: with -E, an unescaped ( or ) outside a character class [...] is used for grouping; an escaped \( or \), or a ( or ) inside a character class [...], matches a literal parenthesis. A leading ^ inside a character class negates it, so [^)] matches "any single character except )".

then we have:

(\([^)]*): the first group, which back-reference \1 refers to.
,: matches a single comma.
([^)]*\)): the second group, which back-reference \2 refers to.

Consider a sample line like the one below to see how the match works:

c(("4288", "57534", "somtoher")),d("f1", "f2", "f3"),MIB1 

The expression (\([^)]*),([^)]*\)) matches as follows:

  1. Starting at the very first open parenthesis (, the first group \1 takes everything that is not a ) up to, but not including, the last comma before the first close parenthesis ); in the sample line above, \1 matches (("4288", "57534"

  2. The second group \2 takes everything after that last comma up to and including the first close parenthesis ); in the sample line above, it matches "somtoher") together with the space in front of it.

  3. The replacement \1\2 puts both matched groups back but drops the comma between them.

  4. :loop s///; t loop repeats steps 1 to 3 in a sed loop until all commas between ( and ) are removed (loop is the label, and t loop jumps back to it whenever the substitution succeeded).

    After the first pass, the sample line changes to:

    c(("4288", "57534" "somtoher")),d("f1", "f2", "f3"),MIB1 

    After the second pass, it is:

    c(("4288" "57534" "somtoher")),d("f1", "f2", "f3"),MIB1 

    After the third pass, it is:

    c(("4288" "57534" "somtoher")),d("f1", "f2" "f3"),MIB1 

    and so on.
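The loop described above can be tried end to end with a small shell sketch. The function name strip_paren_commas and the variables sample and result are hypothetical, introduced only for this illustration. The sed program is the one from the answer, split into separate -e chunks so that the label stands on its own.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): delete every comma
# that sits between a "(" and the next ")" on each input line.
strip_paren_commas() {
    # Same program as above, passed as three -e chunks:
    #   :loop    defines the label
    #   s/.../   drops the last comma before the first ")" after a "("
    #   t loop   jumps back to the label if the substitution succeeded
    sed -E -e ':loop' -e 's/(\([^)]*),([^)]*\))/\1\2/' -e 't loop' "$@"
}

# Sample line from the walkthrough above.
sample='c(("4288", "57534", "somtoher")),d("f1", "f2", "f3"),MIB1'
result=$(printf '%s\n' "$sample" | strip_paren_commas)
printf '%s\n' "$result"
# prints: c(("4288" "57534" "somtoher")),d("f1" "f2" "f3"),MIB1
```

Because the function forwards its arguments to sed, it can also be called with a file name, for example strip_paren_commas infile.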
