typo, cleanup
RobertL

Since awk arrays are indexed by strings, you can use one array to hold the running total price for each brand, and another array to hold the count of records seen for that brand.

Because "brand" is field 4, you can index the arrays in awk like this:

    total_price[$4] += $3   # accumulate total price for this brand
    count[$4] += 1          # increment count of records for this brand
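As a small worked illustration of string-indexed accumulation, the sketch below feeds three made-up rows (the names, prices, and brands are hypothetical sample data, not from the question) through the same two statements and then looks up one key directly. Array entries that have never been assigned behave as 0 in a numeric context, so no initialization is needed.

```shell
# Hypothetical sample rows: first_name,last_name,price_paid,brand,year
totals=$(printf '%s\n' \
    'Ann,Lee,100,Acme,2001' \
    'Bob,Ray,300,Acme,2002' \
    'Cy,Fox,50,Zed,2003' |
awk -F, '
{
    total_price[$4] += $3   # unset entries start out as 0
    count[$4] += 1
}
END {
    # look up a single brand by its string key
    print total_price["Acme"], count["Acme"]
}')
echo "$totals"
```

Here the two Acme rows are summed under the same key, while the Zed row lands in its own entry.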

At the end, loop over the array keys, computing each average as you format the output.

Since POSIX awk contains no sort function, pipe the output of the awk command to the standard Unix sort command.
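To illustrate why the pipe to sort helps: the order in which an awk `for (key in array)` loop visits keys is unspecified, so the raw output order can vary between awk implementations. The sketch below uses hypothetical "brand,average" lines in the same `%s,%7.2f` layout; `LC_ALL=C` is an assumption added here only to make the ordering reproducible, and the second command is an optional variation that orders by the average instead of the brand.

```shell
# Hypothetical unsorted "brand,average" lines, as a for-in loop might emit them
unsorted='Zed,  50.00
Acme, 200.00
Moto, 125.50'

# Plain sort: orders lines by brand name
by_brand=$(printf '%s\n' "$unsorted" | LC_ALL=C sort)

# Variation: split on commas and sort numerically on field 2 (the average)
by_average=$(printf '%s\n' "$unsorted" | LC_ALL=C sort -t, -k2,2n)

printf '%s\n' "$by_brand"
printf '%s\n' "$by_average"
```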

Please try this:

    #!/bin/sh
    #first_name,last_name,price_paid,brand,year
    #print for each brand, the average price paid
    awk -F, '
    NR == 1 {
        next                    # skip header
    }
    {
        price_paid[$4] += $3    # accumulate total price for this brand
        count[$4] += 1          # increment count of records for this brand
    }
    END {
        for (brand in price_paid) {
            printf "%s,%7.2f\n", brand, price_paid[brand] / count[brand]
        }
    }
    ' < "${1:?filename required}" | sort
  1. Invoke the awk command, setting the Field Separator to comma (,) and passing everything between the single quote on this line and the next single quote several lines below, as the script:

     awk -F, ' 
  2. Skip Header: If the current record number is 1, then skip all processing on the current line (the first line), and get the next line of input:

     NR == 1 {
         next    # skip header
     }
  3. Accumulate Price Total Per Brand (this is executed on every line after the header):
    The arrays price_paid and count are indexed by the brand string.
    Add the current price paid ($3) to the price_paid total for this brand.
    Increment the count of records for this brand:

     {
         price_paid[$4] += $3    # accumulate total price for this brand
         count[$4] += 1          # increment count of records for this brand
     }
  4. Print the Output Table: After all input is processed, step through the keys (brand) to the price_paid array, and for each brand, print the brand and the average of price_paid for that brand:

     END {
         for (brand in price_paid) {
             printf "%s,%7.2f\n", brand, price_paid[brand] / count[brand]
         }
     }
  5. Terminate the script argument, redirect input from the filename parameter, and pipe the output of the awk command to the sort command:

     ' < "${1:?filename required}" | sort 

The single quote (') terminates the script argument to awk.
< "${1:?filename required}" redirects the standard input of awk from the filename specified by the first command line parameter to the script. If that parameter is missing or empty, then the shell will print an error message containing "filename required" and exit with error status.
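The behavior of the `${1:?filename required}` expansion can be seen in isolation with a rough sketch like the one below. The function name `needs_file` and the file name `data.csv` are hypothetical stand-ins; the function plays the role of the script, and each call runs inside a command substitution (a subshell) so that the failing case does not end the demonstration itself. The exact wording of the error message and the exit status value differ between shells, so only "non-zero" is assumed here.

```shell
# Hypothetical stand-in for the script: $1 is the function's first argument
needs_file() {
    : "${1:?filename required}"   # a non-interactive shell exits here if $1 is unset or empty
    echo "reading $1"
}

# With an argument: the expansion succeeds and execution continues
ok=$(needs_file data.csv)

# Without an argument: the subshell prints the message to stderr and exits
if err=$(needs_file 2>&1); then
    err_status=0
else
    err_status=$?
fi

echo "$ok"
echo "status=$err_status message=$err"
```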

RobertL

Please try this:

Script

    #!/bin/sh
    #first_name,last_name,price_paid,brand,year
    #print for each brand, the average price paid
    awk -F, '
    NR == 1 { next }
    {
        price_paid[$4] += $3
        count[$4] += 1
    }
    END {
        for (brand in price_paid) {
            printf "%s,%7.2f\n", brand, price_paid[brand] / count[brand]
        }
    }
    ' < "${1:?filename required}" | sort

Annotation/Explanation

  1. Invoke the awk command, setting the Field Separator to comma (,) and passing everything between the single quote on this line and the next single quote several lines below, as the script:

     awk -F, ' 
  2. Skip Header: If the current record number is 1, then skip all processing on the current line (the first line), and get the next line of input:

     NR == 1 { next } 
  3. Accumulate Price Total Per Brand (this is executed on every line after the header):
    The arrays price_paid and count are indexed by the brand string.
    Add the current price paid ($3) to the price_paid total for this brand.
    Increment the count of records for this brand:

     {
         price_paid[$4] += $3
         count[$4] += 1
     }
  4. Print the Output Table: After all input is processed, step through the keys (brand) to the price_paid array, and for each brand, print the brand and the average of price_paid for that brand:

     END {
         for (brand in price_paid) {
             printf "%s,%7.2f\n", brand, price_paid[brand] / count[brand]
         }
     }
  5. Terminate the script argument, redirect input from the filename parameter, and pipe the output of the awk command to the sort command:

     ' < "${1:?filename required}" | sort 

The single quote (') terminates the script argument to awk. < "${1:?filename required}" redirects the standard input of awk from the filename specified by the first command line parameter to the script. If that parameter is missing or empty, then the shell will print an error message containing "filename required" and exit with error status.