file 1:
HOGBRM443983 -2522.00 19800826 HOGBRM445985 -2389.00 19801101 HOUSAM1891409 -1153.00 19811228 HOUSAM2004289 -650.00 19860101 HOUSAM2005991 -843.00 19860109 HOCANM388722 -1546.00 19860116 HOUSAM2007297 -1882.00 19860125 HOUSAM2007389 -1074.00 19860128 HOITAM801038516 -691.00 19860128 Columns 2 and 3 include values and birthdate information( year,month,day) of each id from column1, respectively . I want to check how many ids exist within each birth year and what are the average values (from second column) of ids across different years. For example, in file1 there are 2, 1 and 6 ids across years 1980, 1981 and 1986 respectively so the output should be:
output: 1980 2 -2455.5 1981 1 -1153.00 1986 6 -114.33 in which the first column shows the year of birth, the second column shows a number of ids within each year and the third column is the average values of ids across different years.
Considering that the real data is indeed huge, any suggestion would be appreciated.