How to delete line if longer than XY?

Question

How can I delete a line if it is longer than, e.g., 2048 characters?

Do you insist on using sed? This is easy in Python, for example, and no doubt even easier in Perl. The question is not terribly well defined, though: do you want to copy a file while removing all lines longer than 2048 characters, or something else?
– Faheem Mitha, Commented Mar 23, 2011 at 18:21

Wildcard · Accepted Answer · 2016-11-01 00:43:27Z

40

sed '/^.\{2048\}./d' input.txt > output.txt

edited Nov 1, 2016 at 0:43

Wildcard


answered Mar 23, 2011 at 18:26

forcefsck


5

I get the error message sed: 1: "/^.\{2048\}..*/d": RE error: invalid repetition count(s) (Mac OS X)

– wedi, Commented Oct 13, 2014 at 15:47
1

@wedi you probably want to install the GNU version instead of the BSD version that ships with Mac. This is easy with brew

– Freedom_Ben, Commented Jul 6, 2016 at 0:00
The question says "if longer than XY (e.g., 2048 chars)". So it must be > 2048 and not >= 2048.
– acgbox, Commented Aug 28, 2019 at 13:53
1

@ajcg, it is > 2048. Notice that there's an extra period at the end of the regex to match the 2049th character.
– forcefsck, Commented Aug 30, 2019 at 14:02
@forcefsck Wouldn't it be better to take away the "^"? (With your command you are only removing lines that "start with XYZ"; if XYZ is in another part of the line, it does not delete it.)
– acgbox, Commented Aug 30, 2019 at 16:13


Kusalananda · Accepted Answer · 2021-02-21 07:23:47Z

Here's a solution which deletes lines that have 2049 or more characters:

sed '/.\{2049\}/d' <file.in >file.out

The regular expression .\{2049\} matches any line that contains a substring of 2049 characters (another way of saying "at least 2049 characters"). The d command deletes those lines from the input, leaving only the shorter lines in the output.
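To see why the unanchored pattern is enough, here is a small sketch in Python rather than sed. It is not part of the original answer: the limit is shrunk to 5 so the effect is visible, `is_too_long` is a hypothetical helper name, and Python's `re` module writes the interval as `{n}` where sed's basic syntax writes `\{n\}`.

```python
import re

LIMIT = 5  # stand-in for 2048, kept small for illustration (assumption)

# Unanchored: matches any line containing LIMIT + 1 consecutive characters,
# i.e. any line with more than LIMIT characters.
too_long = re.compile(r".{%d}" % (LIMIT + 1))

def is_too_long(line: str) -> bool:
    """Return True if the line (without its newline) exceeds LIMIT characters."""
    return too_long.search(line.rstrip("\n")) is not None

lines = ["12345\n", "123456\n", "1234567890\n", "\n"]
kept = [line for line in lines if not is_too_long(line)]
print(kept)
```

Only the lines with at most 5 characters survive, which mirrors what the d command leaves behind.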

BSD sed (on e.g. macOS) can only handle repetition counts of up to 255 in the \{...\} operator (the value of RE_DUP_MAX; see getconf RE_DUP_MAX in the shell). On these systems, you may instead use awk:

awk 'length <= 2048' <file.in >file.out

Mimicking the sed solution literally with awk:

awk 'length >= 2049 { next } { print }' <file.in >file.out

Note that any awk implementation is only guaranteed to be able to handle records of lengths up to LINE_MAX bytes (see getconf LINE_MAX in the shell), but may support longer ones. On macOS, LINE_MAX is 2048.
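Since LINE_MAX is expressed in bytes while "2048 chars" may mean characters, the two limits can differ for non-ASCII text. The following is a rough Python sketch of filtering by byte length, not something from the answer itself; the name `filter_by_byte_length`, the UTF-8 encoding, and the tiny limit of 4 are assumptions chosen for illustration.

```python
import io

def filter_by_byte_length(src, dst, max_bytes=2048):
    """Copy binary stream src to dst, skipping lines longer than max_bytes.

    The line terminator is not counted toward the length.
    """
    for raw in src:
        if len(raw.rstrip(b"\r\n")) <= max_bytes:
            dst.write(raw)

# "é" is one character but two bytes in UTF-8, so the first line is
# 3 characters long but 6 bytes long.
text = "é" * 3 + "\n" + "abc\n"
src = io.BytesIO(text.encode("utf-8"))
dst = io.BytesIO()
filter_by_byte_length(src, dst, max_bytes=4)
result = dst.getvalue().decode("utf-8")
print(repr(result))
```

With real files, the same function could be called on streams opened in binary mode ("rb" and "wb").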

MaratC · Accepted Answer · 2014-01-29 17:14:12Z

5

perl -lne 'print if length($_) <= 2048' infile > outfile

answered Jan 29, 2014 at 17:14

MaratC


1

Does not work for me. Perl v5.16.2. Warning: Use of "length" without parentheses is ambiguous at -e line 1. Unterminated <> operator at -e line 1.

– wedi, Commented Oct 13, 2014 at 15:51
You may try length($_) <= 2048 && print. A bare length is shorthand for length($_) anyway.
– MaratC, Commented Nov 17, 2014 at 12:10
Had to use ' instead of "

– Larsen, Commented Sep 30, 2021 at 13:31


Faheem Mitha · Accepted Answer · 2011-03-23 18:33:40Z

3

Something like this should work in Python.

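# Copy "orig" to "new", dropping lines longer than 2048 characters.
with open("orig") as of, open("new", "w") as nf:
    for line in of:
        # Do not count the trailing newline toward the length.
        if len(line.rstrip("\n")) <= 2048:
            nf.write(line)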
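A slightly more general sketch of the same loop, with the threshold as a parameter and an option to keep only the long lines instead. The name `filter_lines` and its parameters are illustrative, not from the answer, and in-memory streams stand in for the files so the example runs on its own.

```python
import io

def filter_lines(src, dst, max_len=2048, invert=False):
    """Copy lines from src to dst, keeping those with at most max_len characters.

    With invert=True, keep only the lines longer than max_len instead.
    The trailing newline is not counted.
    """
    for line in src:
        fits = len(line.rstrip("\n")) <= max_len
        if fits != invert:
            dst.write(line)

sample = "short\n" + "x" * 2048 + "\n" + "y" * 2049 + "\n"

# Default behaviour: drop lines longer than 2048 characters.
out = io.StringIO()
filter_lines(io.StringIO(sample), out)
kept_lengths = [len(l) for l in out.getvalue().splitlines()]
print(kept_lengths)

# Inverted: keep only the lines longer than 2048 characters.
long_only = io.StringIO()
filter_lines(io.StringIO(sample), long_only, invert=True)
long_lengths = [len(l) for l in long_only.getvalue().splitlines()]
print(long_lengths)
```

For real files, the streams would come from open("orig") and open("new", "w") as in the loop above.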

answered Mar 23, 2011 at 18:33

Faheem Mitha


1

Personally, @Faheem, I prefer your answer. The reason why is that it was very easy for me to turn it around into 'delete all lines smaller than x'. I don't use Python all the time, but when I do I always feel I should learn it well.

– ixtmixilix, Commented May 22, 2011 at 18:18
1

@ixtmixilix: Yes, using a full featured language like Python is pretty flexible. Thanks for the comment.

– Faheem Mitha, Commented May 24, 2011 at 16:46
If you love Python but also like using the CLI without having to write and run a separate script for this task, check out pz: github.com/CZ-NIC/pz. It brings Python to shell pipes. For this question the solution would be cat input | pz 's if len(s) < 2048 else ""' > output
– Chris, Commented Feb 17, 2022 at 21:40


DomainsFeatured · Accepted Answer · 2016-09-15 21:28:59Z

1

The above answers do not work for me on Mac OS X 10.9.5.

The following command does work (note that it deletes lines of 2048 or more characters):

sed '/.\{2048\}/d'

Although not asked for, the reverse (keeping only those lines) can be achieved with the following command:

sed '/.\{2048\}/!d'

edited Sep 15, 2016 at 21:28

DomainsFeatured


answered Oct 13, 2014 at 16:02

wedi


lol, but sed: 1: "/.\{2048\}/d": RE error: invalid repetition count(s) (Mac OS X, 10.10.4)

– alex gray, Commented Jul 24, 2015 at 13:29
Ah. I installed the GNU version instead of the BSD version that ships with Mac as @Freedom_Ben suggested above. But Kusalananda found the switch to enable extended regex. So you should go with his solution if you still have that problem. ;)

– wedi, Commented Nov 30, 2018 at 19:40


user unknown · Accepted Answer · 2018-11-30 00:17:49Z

With GNU sed, you can use the -r flag (or -E) to enable extended regular expressions and avoid typing the backslashes, and a trailing comma to define an open-ended interval:

sed -r "/.{2049,}/d" input.txt > output.txt

with:

x{2049} meaning exactly 2049 xs
x{2049,3072} meaning from 2049 to 3072 xs
x{2049,} meaning at least 2049 xs
x{,2049} meaning at most 2049 xs

Because an unanchored pattern also matches inside longer lines, a bounded interval needs line anchors. For example, to delete only lines of 32 to 64 characters:

sed -r "/^.{32,64}$/d" input.txt > output.txt
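The effect of the anchors can be checked with a short Python sketch. It uses Python's `re` module rather than sed, and the sample line lengths are made up for illustration; the interval syntax is the same as in sed's extended mode.

```python
import re

# Four sample lines of 10, 32, 64 and 100 characters (illustrative data).
lines = ["a" * n for n in (10, 32, 64, 100)]

unanchored = re.compile(r".{32,64}")
anchored = re.compile(r"^.{32,64}$")

# Unanchored: any line with at least 32 characters matches, because the
# pattern only needs to match somewhere inside the line.
deleted_unanchored = [len(s) for s in lines if unanchored.search(s)]

# Anchored: only lines whose total length is between 32 and 64 match.
deleted_anchored = [len(s) for s in lines if anchored.search(s)]

print(deleted_unanchored)
print(deleted_anchored)
```

Without anchors the 100-character line is deleted as well; with them, only the 32- and 64-character lines are.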

Chris · Accepted Answer · 2022-02-17 21:48:07Z

The sed solutions all become very slow when lines are very long. This is the disadvantage of matching line length with regexes. (The advantage, of course, is that sed is available everywhere.)

If you like the speed of the Perl solution, but prefer using Python, the pz CLI tool makes this really easy. It brings Python to shell pipes.

With pz the solution would be:

cat input | pz 's if len(s) < 2048 else ""' > output

DanieleGrassini · Accepted Answer · 2022-02-19 01:03:06Z

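Split the record into one field per character by setting FS to the empty string, then compare the field count NF (this works in GNU awk and several other implementations, but POSIX leaves the behavior of an empty FS unspecified):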

awk 'BEGIN{FS=""} NF <= 2048' file

Test with:

perl -e 'print "z"x2048' | awk 'BEGIN{FS=""} NF <= 2048'   # prints the line
perl -e 'print "z"x2049' | awk 'BEGIN{FS=""} NF <= 2048'   # prints nothing

Jordan Brough · Accepted Answer · 2022-05-04 10:36:25Z

With Ruby:

ruby -ne 'print if $_.size <= 2048' input.txt > output.txt

Or to edit in place and create a backup:

ruby -i.bak -ne 'print if $_.size <= 2048' file.txt

Without a backup:

ruby -i -ne 'print if $_.size <= 2048' file.txt

Note: $_.size includes the trailing newline, if any. You can use $_.chomp.size to ignore trailing newlines.

You could also check line size via a regex, like some of the other examples, but it will be slower:

# slow
ruby -ne 'print unless /.{2048}./' input.txt > output.txt
