I have a series of files that look like the example below, and I need to group their lines into "consecutive groups". Each line starts with a number, and the file is read from top to bottom: if the number on the next line is the same as, or one less than, the number on the line above, the two lines belong in the same group. A group can span several lines this way.
The aim is to end up with a single number generated from the file: the count of individual "groups", where the closest number to each group is more than one away. I have shown the desired output below the example file.
78' Corner, Bristol City. Conceded by Wes Hoolahan.
75' Corner, Bristol City. Conceded by Ahmed Hegazi.
60' Corner, Bristol City. Conceded by Ahmed Hegazi.
51' Corner, Bristol City. Conceded by Sam Johnstone.
20' Corner, West Bromwich Albion. Conceded by Niki Mäenpää.
19' Corner, West Bromwich Albion. Conceded by Adam Webster.
13' Corner, Bristol City. Conceded by Ahmed Hegazi.
7' Corner, Bristol City. Conceded by Sam Johnstone.
2' Corner, Bristol City. Conceded by Sam Johnstone.

The overall aim is to get a total count of matches whose numbers are more than 1 apart. This file has 9 lines, which I can get from a simple wc -l; what I want is a script / command line that instead reports the number of independent matches.
So in the above example, "19 & 20" should be grouped together, making the total count 8 "independent" lines. (A line counts as independent if it is more than 1 away from every other number.)
If the above example also contained a line starting with 21, the output would still be 8, because it would join the "19 & 20" group. It is also possible for lines to share the same number, for example "19 & 19".
I'm not sure how feasible this is without writing a more complex script to handle these requirements, but I've seen some impressive sed/awk one-liners in my time, so it may be a job for one of those.
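For what it's worth, here is a minimal awk sketch of the idea, assuming the minute numbers always appear in descending order from top to bottom (as in the sample file): start a new group on the first line, and again whenever the gap to the previous line's number exceeds 1. The filename corners.txt is just a placeholder for illustration.

```shell
# Recreate the sample file from the question.
cat > corners.txt <<'EOF'
78' Corner, Bristol City. Conceded by Wes Hoolahan.
75' Corner, Bristol City. Conceded by Ahmed Hegazi.
60' Corner, Bristol City. Conceded by Ahmed Hegazi.
51' Corner, Bristol City. Conceded by Sam Johnstone.
20' Corner, West Bromwich Albion. Conceded by Niki Mäenpää.
19' Corner, West Bromwich Albion. Conceded by Adam Webster.
13' Corner, Bristol City. Conceded by Ahmed Hegazi.
7' Corner, Bristol City. Conceded by Sam Johnstone.
2' Corner, Bristol City. Conceded by Sam Johnstone.
EOF

# $1+0 coerces a field like "78'" to the number 78 (awk stops at the
# first non-numeric character).  A gap of 0 or 1 to the previous
# number keeps us in the same group; anything larger starts a new one.
awk '{ n = $1 + 0
       if (NR == 1 || prev - n > 1) groups++
       prev = n }
     END { print groups }' corners.txt
# prints 8
```

If the descending-order assumption cannot be relied on, piping the file through sort -rn first would restore it before the awk pass.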