Find a value in a specific position when the only information you have is the position

Question

I have a csv file approximately 16,000 rows long, with two fields. The first field contains a list of numeric values, and the second field contains a list of first and names delimited by semi-colons e.g.

3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs; Farrah Anderson; Judy John
9, Jewel Woodley; Jean Sullivan; Marcia Robin; Kerry Morton; Joelle Armour; Zakiya Pulwarty; Karen Thornhill; Shurm Ahmet; Ed Aslan; Adam Condell; Zeliha Manners; Joan Johnson
5, Haydn Smart; Andre Henry; Tamara Brownbill; Kelly Withers; Eden Anderson; Naomi Casa; Azaria Amritt; Jamile Newton; Nabahe Durand

The team leader is the name in the second field whose position matches the number in the first field, e.g. the team leader in the first row is Mike Freeland (position 3), in the second row is Ed Aslan (position 9), and in the third row is Eden Anderson (position 5). I need to extract all the names of the team leaders.

I'm trying to write a shell script to extract all the names of the team leaders, run it against my csv file, then output it to a new file.

I have been researching how to use 'grep', or 'awk' plus 'FS' (FS to specify the semi-colon as the delimiter instead of whitespace) to find the information, but I don't know how to incorporate the value in the first field as the selection criteria. Every example that I have seen uses these commands to search for a known value or string. In this case, however, I only know the position of the value (first and last name). Am I looking at the right commands?

I have not been able to come up with a script. How do I extract the names of the team leaders?

Your description of the file format is confusing. You say “the first field contains a list of numeric values, and the second field contains a list of first and names delimited by semi-colons”. So I expected to see data like 42;83.6;17, John;Paul;George. If you mean that the first column contains a list of numbers, that’s implied by the fact that you have a CSV file with many rows. Your example data suggest that the first field contains a positive integer. Also, I guess you meant “first and last names”.
– G-Man Says 'Reinstate Monica', Commented Apr 21, 2020 at 6:32
Thank you - yes the first column contains a positive integer in each row. The second column contains a list of first and last names delimited by semi-colons.
– nattac, Commented Apr 22, 2020 at 20:36

steeldriver · Accepted Answer · 2020-04-21 00:17:01Z

$ awk -F, '{split($2,names,";"); print names[$1]}' file.csv
Mike Freeland
Ed Aslan
Eden Anderson
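Since the question asks for a script that writes the leaders to a new file, the one-liner above could be wrapped roughly as follows. This is only a sketch: the file names teams.csv and leaders.txt are placeholders, the three sample rows are recreated from the question so the block runs on its own, and the gsub() call is an addition (not part of the answer) that trims any blanks left around a name after splitting on ";".

```shell
#!/bin/sh
# Sketch only: teams.csv and leaders.txt are placeholder names, not from the question.
# Recreate the three sample rows from the question so the example is self-contained.
cat > teams.csv <<'EOF'
3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs; Farrah Anderson; Judy John
9, Jewel Woodley; Jean Sullivan; Marcia Robin; Kerry Morton; Joelle Armour; Zakiya Pulwarty; Karen Thornhill; Shurm Ahmet; Ed Aslan; Adam Condell; Zeliha Manners; Joan Johnson
5, Haydn Smart; Andre Henry; Tamara Brownbill; Kelly Withers; Eden Anderson; Naomi Casa; Azaria Amritt; Jamile Newton; Nabahe Durand
EOF

# Split field 2 on ";", pick the element whose index is the number in field 1,
# strip leading/trailing blanks from it, and write one leader per line.
awk -F, '{
    split($2, names, ";")
    leader = names[$1]
    gsub(/^[ \t]+|[ \t]+$/, "", leader)
    print leader
}' teams.csv > leaders.txt

cat leaders.txt
```

Redirecting with > leaders.txt overwrites that file on every run; use >> instead if the results should be appended.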

Freddy · Accepted Answer · 2020-04-21 00:36:00Z

$ awk -F'[,;] ' '{print $($1 + 1)}' file
Mike Freeland
Ed Aslan
Eden Anderson

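Change the field separator to '[,;] ', i.e. a comma or semicolon followed by a space character, so that the number and every name become separate fields. Then take the value of the first field $1, add one to it (the number itself occupies field 1, so the names start at field 2) and print the value of that field with $(...).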
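One caveat with indexing by $1 + 1: if a row's number is larger than the number of names on that row, the expression refers to a field that does not exist and awk prints an empty line. A possible guard is sketched below. The file names teams.csv and skipped.log, the deliberately bad third row, and the wording of the warning are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: teams.csv and skipped.log are placeholder names. The third row
# is a made-up bad record (position 7, but only two names) to exercise the guard.
cat > teams.csv <<'EOF'
3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs; Farrah Anderson; Judy John
5, Haydn Smart; Andre Henry; Tamara Brownbill; Kelly Withers; Eden Anderson; Naomi Casa
7, Jack Mackie; Hanna Jones
EOF

# Field 1 is the position and fields 2..NF are the names, so valid positions
# run from 1 to NF-1. Rows outside that range are reported on stderr
# (collected in skipped.log here) instead of producing an empty line.
leaders=$(awk -F'[,;] ' '
    $1 ~ /^[0-9]+$/ && $1 >= 1 && $1 < NF { print $($1 + 1); next }
    { print "line " NR ": position " $1 " is out of range" > "/dev/stderr" }
' teams.csv 2> skipped.log)

printf '%s\n' "$leaders"
```

Keeping the warnings on stderr means the list of leaders can still be redirected to a file without the diagnostics mixed in.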

Kusalananda · Accepted Answer · 2024-04-22 09:53:59Z

Using Miller (mlr) and assuming that there are no empty lines in the input data:

$ mlr --csv -N put -q 'print clean_whitespace(splita($2, ";")[$1])' file
Mike Freeland
Ed Aslan
Eden Anderson

This reads the input as header-less CSV (-N), splits the second field into an array on the ; characters, and extracts the element whose index is given by the value in the first field. Surplus whitespace is removed from the value by clean_whitespace() before it is printed.
