Using **Raku** (formerly known as Perl_6)
```
~$ raku -ne 'BEGIN put get; \
my @a = .split(:skip-empty, / \t /, 3); \
@a[2] = (@a[2] // "").comb(/ GO\: \d+ /).join(","); \
@a.join("\t").trim-trailing.put;' file
```
Here's an answer coded in Raku, a member of the Perl-family of programming languages. Going line by line:
1. The `BEGIN` statement out`put`s the header line (can be omitted if the header line is `\t` tab separated like the body rows).
2. The body rows (lines) can be split on `\t`. It might be possible to split on `\s**4`, i.e. four consecutive whitespace characters, or even on `\h**4`, four consecutive _horizontal_ whitespace characters.
3. The third column (i.e. `@a[2]`) is replaced by its own text after it has been `comb`ed (i.e. positively-selected) for matches to `GO\: \d+`; a missing third column defaults to the empty string via `// ""`. Think of `comb` as the inverse of `split`. These selected `GO` ids are then joined with commas.
4. Finally, the `split` columns are `join`ed back together on `\t` tabs, `trim-trailing` removes the trailing tab left by an empty third column, and the result is out`put`.
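
For readers who prefer a script to a one-liner, the steps above can be sketched as a small subroutine. This is only an illustrative sketch: the name `go-ids-only` is hypothetical, and it assumes the rows really are tab-separated. Unlike the one-liner, it does not pass `:skip-empty` to `split`, so an empty second column would not shift the third column's position.

```raku
# Hypothetical helper (the name is illustrative, not part of the one-liner):
# reduce the third tab-separated column of one row to its bare GO ids.
sub go-ids-only(Str $line --> Str) {
    # Split into at most 3 fields; empty fields are kept (assumption:
    # the input is tab-separated).
    my @cols = $line.split("\t", 3);
    # comb returns every match of the regex, e.g. "GO:0006468".
    @cols[2] = (@cols[2] // "").comb(/ 'GO:' \d+ /).join(",");
    # Remove the trailing tab left behind when the third column is empty.
    @cols.join("\t").trim-trailing;
}

my $row = "MA_10000516g0010\tMA_10000516g0010\tGO:0005515-protein binding";
put go-ids-only($row);
```

To process a whole file, the same subroutine could be called on each line, e.g. `.&go-ids-only.put for "file".IO.lines;` (with the header handled separately if it is not tab-separated).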
Sample Input:
```
ID transcript_id go_description
MA_10000213g0010 MA_10000213g0010
MA_10000405g0010 MA_10000405g0010 GO:0006468-protein phosphorylation;GO:0030246-carbohydrate binding;GO:0005524-ATP binding;GO:0004672-protein kinase activity
MA_1000049g0010 MA_1000049g0010
MA_10000516g0010 MA_10000516g0010 GO:0005515-protein binding
MA_10001015g0010 MA_10001015g0010
MA_10001337g0010 MA_10001337g0010
MA_10001425g0010 MA_10001425g0010
MA_10001478g0010 MA_10001478g0010
MA_10001558g0010 MA_10001558g0010
MA_10001g0010 MA_10001g0010
MA_10002030g0010 MA_10002030g0010 GO:0005737-cytoplasm;GO:0000184-nuclear-transcribed mRNA catabolic process, nonsense-mediated decay;GO:0004386-helicase activity;GO:0008270-zinc ion binding;GO:0003677-DNA binding;GO:0005524-ATP binding
MA_10002157g0010 MA_10002157g0010 GO:0006468-protein phosphorylation;GO:0005524-ATP binding;GO:0004672-protein kinase activity
MA_10002549g0010 MA_10002549g0010
MA_10002583g0010 MA_10002583g0010 GO:0008168-methyltransferase activity
MA_10002614g0010 MA_10002614g0010
MA_10002643g0010 MA_10002643g0010 GO:0055114-oxidation-reduction process
```
Sample Output:
```
ID transcript_id go_description
MA_10000213g0010 MA_10000213g0010
MA_10000405g0010 MA_10000405g0010 GO:0006468,GO:0030246,GO:0005524,GO:0004672
MA_1000049g0010 MA_1000049g0010
MA_10000516g0010 MA_10000516g0010 GO:0005515
MA_10001015g0010 MA_10001015g0010
MA_10001337g0010 MA_10001337g0010
MA_10001425g0010 MA_10001425g0010
MA_10001478g0010 MA_10001478g0010
MA_10001558g0010 MA_10001558g0010
MA_10001g0010 MA_10001g0010
MA_10002030g0010 MA_10002030g0010 GO:0005737,GO:0000184,GO:0004386,GO:0008270,GO:0003677,GO:0005524
MA_10002157g0010 MA_10002157g0010 GO:0006468,GO:0005524,GO:0004672
MA_10002549g0010 MA_10002549g0010
MA_10002583g0010 MA_10002583g0010 GO:0008168
MA_10002614g0010 MA_10002614g0010
MA_10002643g0010 MA_10002643g0010 GO:0055114
```
https://docs.raku.org
https://raku.org