How to delete two lines after every three lines in a file with a very large number of lines? [duplicate]

Question

For example, if I have:

1st line (keep)
2nd line (keep)
3rd line (keep)
4rth lines (delete)
5th (del)
6th (keep)
7nth (keep)
8th lines (keep)
9th (del)
10th (del)
11th (keep)
12th (keep)
13th (keep)
14th (del)
15th (del)

etc.

Increment a line counter (zero-indexed) for each line read, and print when (line counter modulo 5) < 3.
– ChuckCottrill, Commented Mar 30, 2019 at 4:04
The duplicate is worded slightly differently, but it is the same problem looked at in a different way. This question amounts to printing lines 1, 2 and 3 out of every 5 lines, for example: seq 15 | awk 'BEGIN { a[1] a[2] a[3] }; NR % 5 in a' and seq 15 | sed -n 'p;n;p;n;p;n;n'
– Sundeep, Commented Mar 30, 2019 at 7:27
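Sundeep's array idea generalizes to any set of positions. The following is a sketch, not from the thread: the function name keep_positions is hypothetical, and it assumes a POSIX awk. It uses (NR - 1) % n + 1 instead of NR % n so that positions run from 1 to n; with NR % n the last line of each block maps to 0, which is easy to trip over when keeping, say, lines 4 and 5.

```shell
#!/bin/sh
# keep_positions N "P1 P2 ..." [FILE...]
# Hypothetical helper: keep the lines whose 1-based position within each
# block of N lines is listed in the second argument; delete the rest.
keep_positions() {
    n=$1
    keep=$2
    shift 2
    awk -v n="$n" -v keep="$keep" '
        BEGIN {
            # Turn the space-separated list into a set (array keys).
            cnt = split(keep, pos, " ")
            for (i = 1; i <= cnt; i++) want[pos[i]]
        }
        # Position of the current line inside its block: 1..n
        ((NR - 1) % n + 1) in want
    ' "$@"
}

# Keep lines 1, 2 and 3 of every 5 lines.
seq 15 | keep_positions 5 "1 2 3"
```

Likewise, seq 15 | keep_positions 5 "4 5" prints only the lines the question wants deleted (4, 5, 9, 10, 14 and 15).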
Also, the sed version above might be faster than the awk one for large files.
– Sundeep, Commented Mar 30, 2019 at 7:47
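To see why sed -n 'p;n;p;n;p;n;n' works, here is the same script with one command per line. The wrapper function keep3of5 is a hypothetical name; the sed commands themselves are unchanged.

```shell
#!/bin/sh
# The sed script p;n;p;n;p;n;n written one command per line.
# With -n, sed prints nothing unless told to, and "n" replaces the
# pattern space with the next input line without printing it.
keep3of5() {
    sed -n '
        p
        n
        p
        n
        p
        n
        n
    ' "$@"
}

# line 1: p, line 2: p, line 3: p, lines 4 and 5: skipped by n; repeat.
seq 15 | keep3of5
```

Each pass through the script consumes exactly five lines, so the cycle never drifts. If the input ends in the middle of a cycle, n finds no next line and sed quits without printing anything more.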

Kusalananda · Accepted Answer · 2019-03-30 08:44:41Z

13

Try:

awk '(NR-1)%5<3' file

For example:

$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)

How it works

The condition (NR-1)%5<3 tells awk to print any line for which it is true. In awk, NR is the line number, with the first line counting as 1, so (NR-1)%5 cycles through 0, 1, 2, 3, 4. For every five lines in the file, the condition is therefore true for the first three.
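A quick way to see the cycle is to print the remainder next to each line number. This is an illustrative sketch; explain_mod5 is a hypothetical helper name, and seq is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: annotate each line number with r = (NR-1)%5 and
# say whether the condition (NR-1)%5<3 would keep or delete that line.
explain_mod5() {
    awk '{
        r = (NR - 1) % 5
        printf "NR=%d r=%d %s\n", NR, r, (r < 3 ? "keep" : "delete")
    }'
}

seq 10 | explain_mod5
```

For lines 1 to 10 this prints r = 0 1 2 3 4 0 1 2 3 4, marking lines 1-3 and 6-8 as keep and lines 4, 5, 9 and 10 as delete.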

edited Mar 30, 2019 at 8:44

Kusalananda♦


answered Mar 30, 2019 at 4:38

John1024


Thank you, but I found that this script starts deleting after the 3rd line, which is okay, but on the next round it counts again from the beginning, where the line order has changed and decreased by two lines, so I can't delete the lines I want. Any suggestions?

– Jaguar Jom, Commented Apr 1, 2019 at 0:38
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?

– John1024, Commented Apr 1, 2019 at 0:47
Yes, actually I got a different result.

– Jaguar Jom, Commented Apr 1, 2019 at 5:00
To check, I just copied-and-pasted your input and copied-and-pasted my command and ran it, and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?

– John1024, Commented Apr 1, 2019 at 5:08
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file.
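The same formula generalizes: for blocks of n lines where the first k are kept, the condition is (NR-1)%n<k. The five-line case in the answer is k=3, n=5, and the six-line case in this comment is k=4, n=6. A small sketch, with keep_first as a hypothetical wrapper name:

```shell
#!/bin/sh
# keep_first K N [FILE...]
# Hypothetical helper: in every block of N lines, keep the first K
# lines and delete the remaining N-K.
keep_first() {
    k=$1
    n=$2
    shift 2
    awk -v k="$k" -v n="$n" '(NR - 1) % n < k' "$@"
}

# Five-line cycle from the question: keep 3, delete 2.
seq 15 | keep_first 3 5
# Six-line cycle: keep 4, delete 2.
seq 12 | keep_first 4 6
```

The first call prints 1 2 3 6 7 8 11 12 13 and the second prints 1 2 3 4 7 8 9 10, one number per line.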

– John1024, Commented Apr 1, 2019 at 5:51


Prvt_Yadav · Accepted Answer · 2019-03-30 09:42:20Z

A simple command is:

awk '{if((NR-1) % 5<=2){print $0}}' file

It prints only the first 3 lines of every 5 lines: (NR-1)%5 cycles through 0 1 2 3 4, and only the first three of those values are less than or equal to 2, so only those lines are printed.

I have a file with these contents:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

The output is:

1
2
3
6
7
8
11
12
13

Or, as suggested in the comments, you can use:

awk '(NR - 1) % 5 <= 2' file

Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file
– Kusalananda ♦, Commented Mar 30, 2019 at 8:39
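The idiomatic form works because in awk a pattern with no action defaults to { print $0 }. A minimal check, assuming seq is available, that both spellings produce the same lines:

```shell
#!/bin/sh
# Explicit if/print form from the answer.
explicit=$(seq 15 | awk '{ if ((NR - 1) % 5 <= 2) { print $0 } }')
# Pattern-only form: the missing action defaults to { print $0 }.
idiomatic=$(seq 15 | awk '(NR - 1) % 5 <= 2')

if [ "$explicit" = "$idiomatic" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```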
About awk '{if((NR-1) % 5<=2){print $0}}' file: thank you, this works very well for me, but with the 5 increased by one: awk '{if((NR-1) % 6<=2){print $0}}' file
– Jaguar Jom, Commented Apr 1, 2019 at 0:42

ChuckCottrill · Accepted Answer · 2019-03-30 04:37:41Z

Basically, you want something like 'Fizz-Buzz' in awk ...

awk '{ if (i++%5 < 3) print $0;}'

To show that this works:

for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done | awk '{ if (i++%5 < 3) print $0;}'

If your file is named mybigfile.csv:

awk '{ if (i++%5 < 3) print $0;}' < mybigfile.csv > mybigfile-123.csv

You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill, Commented Mar 30, 2019 at 4:38
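One difference worth knowing when more than one file is given: i (like NR) keeps counting across all input files, while FNR starts again at 1 for each file. The sketch below uses two hypothetical temporary files of four lines each.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
seq 4 > "$dir/a"
seq 4 > "$dir/b"

# i (like NR) continues across files, so the 5-line cycle spans both.
across=$(awk '{ if (i++ % 5 < 3) print $0 }' "$dir/a" "$dir/b" | tr '\n' ' ')
# FNR restarts at 1 for each file, so each file gets its own cycle.
per_file=$(awk '(FNR - 1) % 5 < 3' "$dir/a" "$dir/b" | tr '\n' ' ')

echo "i counter:   $across"
echo "FNR counter: $per_file"
```

Here the i counter keeps 1 2 3 from the first file and 2 3 4 from the second, while FNR keeps 1 2 3 from each file.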

Kusalananda · Accepted Answer · 2019-03-30 13:07:45Z

A generic solution for masking out a particular pattern of lines from a file:

#!/bin/sh

# The pattern is given on the command line.
pattern=$1

# The period is simply the length of the pattern.
period=${#pattern}

# Use bc to convert the binary pattern to an integer.
mask=$( printf 'ibase=2; %s\n' "$pattern" | bc )

awk -v mask="$mask" -v period="$period" '
    BEGIN { p = lshift(1, period-1) }
    and(rshift(p, (FNR-1) % period), mask)'

This relies on awk implementing the non-standard functions and() (bitwise AND), rshift() and lshift() (bitwise right and left shift), which GNU awk and some BSD implementations of awk provide, but mawk does not.

This takes a pattern, which is a binary number representing both the cyclic period and which lines within each period should be kept or masked out. A 1 means "keep" and a 0 means "delete".

For example, the line pattern that should be applied in your question is 11100, which means "for each set of five lines, keep the first three and delete the others".
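For the pattern 11100, the script computes period = 5, mask = 28 (binary 11100) and p = lshift(1, 4) = 16 (binary 10000). The same bit test can be reproduced with POSIX shell arithmetic, which supports <<, >> and &. This is only an illustration of the logic; mask_verdict is a hypothetical helper, not part of the script.

```shell
#!/bin/sh
# Reproduce, with shell arithmetic, the test the awk script performs:
# and(rshift(p, (FNR-1) % period), mask), for the pattern 11100.
period=5
mask=28                     # binary 11100
p=$((1 << (period - 1)))    # 16, binary 10000

# Hypothetical helper: print keep/delete for line number $1.
mask_verdict() {
    bit=$(( (p >> (($1 - 1) % period)) & mask ))
    if [ "$bit" -ne 0 ]; then echo keep; else echo delete; fi
}

for line in 1 2 3 4 5 6 7 8 9 10; do
    echo "line $line: $(mask_verdict "$line")"
done
```

Lines 1, 2 and 3 shift p down to 16, 8 and 4, which all overlap the mask, so they are kept; lines 4 and 5 give 2 and 1, which do not, so they are deleted; line 6 starts the cycle again.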

Using 01001000 would delete all but the 2nd and 5th lines in every 8 lines.

The awk program could also be written without the BEGIN block as

and(lshift(1, (period-1) - (FNR-1) % period), mask)

Left-shifting 1 by (period-1) - (FNR-1) % period positions is the same as raising 2 to that power, but I'm using lshift() since awk does its arithmetic in floating point rather than in exact integer arithmetic.

Since the code relies on the binary representation of the pattern, very long patterns may not work well.
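If only mawk (or another awk without the bitwise functions) is available, or the pattern is long, one alternative that is not from the original answer is to skip the number conversion altogether and index into the pattern string with the standard substr() and length() functions. A sketch, with mask_lines as a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical portable variant: keep line FNR when the character at
# position ((FNR-1) % length(pattern)) + 1 of the pattern is "1".
# Uses only POSIX awk functions, so no bitwise functions and no bc.
mask_lines() {
    pattern=$1
    shift
    awk -v pattern="$pattern" '
        substr(pattern, (FNR - 1) % length(pattern) + 1, 1) == "1"
    ' "$@"
}

seq 15 | mask_lines 11100
```

With the patterns from this answer, mask_lines 11100 keeps lines 1-3, 6-8 and 11-13 of seq 15, and mask_lines 00011 keeps the complementary lines. Because it compares characters, it does not depend on integer precision.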

Testing:

Removing the lines you want to remove:

$ sh script.sh 11100 <file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)

Inverting the pattern:

$ sh script.sh 00011 <file
4rth lines (delete)
5th (del)
9th (del)
10th (del)
14th (del)
15th (del)

tomsmeding · Accepted Answer · 2019-03-30 13:47:09Z

This can be solved using GNU sed:

sed '4~5,5~5d' file

Note that this uses a GNU-specific extension to the sed standard, and thus doesn't work with, e.g., BSD sed on macOS. However, GNU sed can be installed on macOS using Homebrew (brew install gnu-sed), after which it can be used as gsed. On most Linux distributions, GNU sed is the default.

This prints every line that is not the fourth or fifth line of a group of five. For a clearer example, sed '3~10,6~10d' will select lines 1, 2, 7, 8, 9 and 10 of every group of 10 lines by deleting lines 3 through 6.
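Both sed examples can be checked on numbered input. This assumes GNU sed (or gsed on macOS), since first~step addresses are a GNU extension.

```shell
#!/bin/sh
# Requires GNU sed: first~step addresses are a GNU extension.

# Delete lines 4-5 of every 5; prints 1 2 3 6 7 8 11 12 13.
seq 15 | sed '4~5,5~5d'

# Delete lines 3-6 of every 10; prints 1 2 7 8 9 10 11 12 17 18 19 20.
seq 20 | sed '3~10,6~10d'
```

The first deletion can also be written without a range, as sed '4~5d;5~5d', since 4~5 matches lines 4, 9, 14, ... and 5~5 matches lines 5, 10, 15, ...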

The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 to 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is reasonable, since sed is generally a simpler tool and can thus work faster than the more complex, but more full-featured, awk.
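To repeat the comparison, first build the same kind of input and make sure the two one-liners agree. A sketch, assuming GNU sed, seq and mktemp are available; the timings themselves will depend on the machine and on which awk is installed.

```shell
#!/bin/sh
# Build a file with the numbers 1 to 2 million and check that the awk
# and GNU sed one-liners produce identical output before timing them.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT
seq 2000000 > "$tmp"

awk_sum=$(awk '(NR-1)%5<3' "$tmp" | cksum)
sed_sum=$(sed '4~5,5~5d' "$tmp" | cksum)

if [ "$awk_sum" = "$sed_sum" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

In bash, ksh or zsh, prefix either one-liner with the time keyword and send its output to /dev/null to get figures comparable to the ones in this answer.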

Praveen Kumar BS · Accepted Answer · 2019-03-30 07:52:59Z

1

I tried the command below and it worked fine:

for ((i = 1; i <= 20; i += 5)); do j=$((i + 2)); sed -n "${i},${j}p" filename; done

output

1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)

answered Mar 30, 2019 at 7:52

Praveen Kumar BS


1

That is nice, but you have to know how many lines you have in advance, and sed re-reads the file from the beginning on every round. It cannot be used on a stream, and it gets more inefficient the bigger the data gets, so since the OP says the number of lines is very large, this is not the best solution.

– Law29, Commented Mar 30, 2019 at 11:45

