added 16 characters in body

edited Feb 15, 2021 at 19:29

I have a space-separated text file, e.g.

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3,id4 text4c text5c
    text1d text2d id5,id6,id7 text4d,text4di text5d

The file is about 1.5 million lines long.

Some lines have two ids, separated by a comma, e.g. line 3 in the example. This is causing issues when attempting to join the file with another file in which the id could either be id3 or id4.

I would like to find every row whose column 3 contains a comma and split it into separate lines, one per id, e.g. the above file would turn into:

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3 text4c text5c
    text1c text2c id4 text4c text5c
    text1d text2d id5 text4d,text4di text5d
    text1d text2d id6 text4d,text4di text5d
    text1d text2d id7 text4d,text4di text5d

There are rows that contain 3 or more comma-separated ids. Commas can appear in other columns, but they should stay as they are. The order of the output lines does not matter, e.g. whether id3 or id4 comes first in the file.

I'm fairly inexperienced with awk, sed, etc., which I assume are the best tools for this job.

Would anyone be able to point me in the right direction please?
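
To make the requested transformation concrete, here is a minimal sketch in Python rather than awk or sed. It assumes fields are separated by single spaces and that the ids are always in column 3; the names `split_id_column` and `expand_lines` are hypothetical, chosen only for this illustration.

```python
def split_id_column(line, column=3):
    """Return one output line per comma-separated id in `column` (1-based).

    Assumes fields are separated by single spaces. A line with fewer
    fields than `column` is returned unchanged.
    """
    fields = line.split(" ")
    index = column - 1
    if index >= len(fields):
        return [line]
    expanded = []
    for single_id in fields[index].split(","):
        # Rebuild the row with a single id; commas in other columns are untouched.
        row = fields[:index] + [single_id] + fields[index + 1:]
        expanded.append(" ".join(row))
    return expanded


def expand_lines(lines, column=3):
    """Lazily expand an iterable of lines, so a large file is never held in memory."""
    for line in lines:
        yield from split_id_column(line.rstrip("\n"), column)


# Demo on the sample rows from the question.
sample = [
    "text1a text2a id1 text4a text5a",
    "text1b text2b id2 text4b text5b",
    "text1c text2c id3,id4 text4c text5c",
    "text1d text2d id5,id6,id7 text4d,text4di text5d",
]
for out in expand_lines(sample):
    print(out)
```

For a real file, `expand_lines` can be fed an open file object instead of the sample list. The input is then processed one line at a time, which should be fine for about 1.5 million lines.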

Improved formatting, text to code

edited Feb 15, 2021 at 18:41

schrodingerscatcuriosity

I have a space-separated text file, e.g.

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3,id4 text4c text5c
    text1d text2d id5 text4d text5d

The file is about 1.5 million lines long.

Some lines have two ids, separated by a comma, e.g. line 3 in the example. This is causing issues when attempting to join the file with another file in which the id could either be id3 or id4.

I would like to find every row whose column 3 contains a comma and split it into separate lines, one per id, e.g. the above file would turn into:

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3 text4c text5c
    text1c text2c id4 text4c text5c
    text1d text2d id5 text4d text5d

There are rows that contain 3 or more comma-separated ids. Commas can appear in other columns but they should stay as they are.

I'm fairly inexperienced with awk, sed, etc., which I assume are the best tools for this job.

Would anyone be able to point me in the right direction please?

asked Feb 15, 2021 at 18:33

E. Rei

Split a row by a delimiter

I have a space-separated text file, e.g.

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3,id4 text4c text5c
    text1d text2d id5 text4d text5d

The file is about 1.5 million lines long. Some lines have two ids, separated by a comma, e.g. line 3 in the example. This is causing issues when attempting to join the file with another file in which the id could either be id3 or id4. I would like to find every row whose column 3 contains a comma and split it into separate lines, one per id, e.g. the above file would turn into:

    text1a text2a id1 text4a text5a
    text1b text2b id2 text4b text5b
    text1c text2c id3 text4c text5c
    text1c text2c id4 text4c text5c
    text1d text2d id5 text4d text5d

There are rows that contain 3 or more comma-separated ids. Commas can appear in other columns, but they should stay as they are. I'm fairly inexperienced with awk, sed, etc., which I assume are the best tools for this job. Would anyone be able to point me in the right direction please?
