0

I have a csv file approximately 16,000 rows long, with two fields. The first field contains a list of numeric values, and the second field contains a list of first and names delimited by semi-colons e.g.

3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs; Farrah Anderson; Judy John
9, Jewel Woodley; Jean Sullivan; Marcia Robin; Kerry Morton; Joelle Armour; Zakiya Pulwarty; Karen Thornhill; Shurm Ahmet; Ed Aslan; Adam Condell; Zeliha Manners; Joan Johnson
5, Haydn Smart; Andre Henry; Tamara Brownbill; Kelly Withers; Eden Anderson; Naomi Casa; Azaria Amritt; Jamile Newton; Nabahe Durand

The team leader is the name in the second field whose position matches the number in the first field. For example, the team leader in the first row is Mike Freeland (position 3), in the second row Ed Aslan (position 9), and in the third row Eden Anderson (position 5). I need to extract the names of all the team leaders.

I'm trying to write a shell script to extract all the names of the team leaders, run it against my csv file, then output it to a new file.

I have been researching how to use 'grep', or 'awk' plus 'FS' (FS to specify the semi-colon as the delimiter instead of whitespace) to find the information, but I don't know how to incorporate the value in the first field as the selection criteria. Every example that I have seen uses these commands to search for a known value or string. In this case, however, I only know the position of the value (first and last name). Am I looking at the right commands?

I have not been able to come up with a script. How do I extract the names of the team leaders?

4
  • Your description of the file format is confusing.  You say “the first field contains a list of numeric values, and the second field contains a list of first and names delimited by semi-colons”.  So I expected to see data like 42;83.6;17, John;Paul;George.  If you mean that the first column contains a list of numbers, that’s implied by the fact that you have a CSV file with many rows.  Your example data suggest that the first field contains a positive integer.  Also, I guess you meant “first and last names”. Commented Apr 21, 2020 at 6:32
  • Thank you - yes the first column contains a positive integer in each row. The second column contains a list of first and last names delimited by semi-colons. Commented Apr 22, 2020 at 20:36
  • The script suggested by @Freddy worked. Commented Apr 22, 2020 at 20:36
  • The script suggested by @steeldriver worked. Commented Apr 22, 2020 at 20:37

3 Answers

3
$ awk -F, '{split($2,names,";"); print names[$1]}' file.csv
Mike Freeland
Ed Aslan
Eden Anderson
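Because the second field is split on ";" alone, the space that follows each comma or semicolon in the sample data stays attached to the name. If that matters, one possible variant trims it before printing. This is only a sketch: sample.csv and trimmed.txt are hypothetical file names, and the single input row is copied (shortened) from the question.

```shell
# Hypothetical one-row sample, shortened from the first row in the question.
printf '%s\n' '3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs' > sample.csv

awk -F, '{
    split($2, names, ";")                  # names[1], names[2], ... still carry a leading space
    leader = names[$1]                     # pick the name at the position given in field 1
    gsub(/^[ \t]+|[ \t]+$/, "", leader)    # strip leading and trailing blanks
    print leader
}' sample.csv > trimmed.txt

cat trimmed.txt
```

Like the one-liner it is based on, this assumes that no name contains a comma, since the comma is the field separator.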
3
$ awk -F'[,;] ' '{print $($1 + 1)}' file
Mike Freeland
Ed Aslan
Eden Anderson

Change the field separator to '[,;] ', i.e. comma or semicolon followed by a space character. Then get the value of the first field $1, add one to it and print the value of that field $(...).
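Since the question also asks for the result to be written to a new file, the same command can be redirected. The following is a minimal sketch, assuming the separators are always a comma or semicolon followed by exactly one space; teams.csv and leaders.txt are hypothetical file names, and the rows are the three sample rows from the question.

```shell
# Recreate the sample rows from the question (teams.csv is a hypothetical name).
cat > teams.csv <<'EOF'
3, Jack Mackie; Hanna Jones; Mike Freeland; Ollie Downs; Farrah Anderson; Judy John
9, Jewel Woodley; Jean Sullivan; Marcia Robin; Kerry Morton; Joelle Armour; Zakiya Pulwarty; Karen Thornhill; Shurm Ahmet; Ed Aslan; Adam Condell; Zeliha Manners; Joan Johnson
5, Haydn Smart; Andre Henry; Tamara Brownbill; Kelly Withers; Eden Anderson; Naomi Casa; Azaria Amritt; Jamile Newton; Nabahe Durand
EOF

# With ", " and "; " as separators, field 1 is the number and name N is field N+1.
awk -F'[,;] ' '{ print $($1 + 1) }' teams.csv > leaders.txt

cat leaders.txt
```

On the sample rows this writes Mike Freeland, Ed Aslan and Eden Anderson to leaders.txt, one name per line.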

0

Using Miller (mlr) and assuming that there are no empty lines in the input data:

$ mlr --csv -N put -q 'print clean_whitespace(splita($2, ";")[$1])' file
Mike Freeland
Ed Aslan
Eden Anderson

This treats the input as header-less CSV, splits the second field on the ; characters into an array, and extracts the element at the position given by the first field. Excess whitespace is removed from each value before it is printed.
