How to delete all lines containing more than a certain number of letters?

Question

How can I delete all of the lines in a file which contain more than a given number of letters? E.g.

bear
rabbit
tree
elephant

If I restrict it to lines of 5 letters or fewer, the output would be:

bear
tree

The file contains various foreign characters; each of these should count as one letter.
Punctuation symbols can also count as one letter.

Related: serverfault.com/questions/355321/…

– Ciro Santilli OurBigBook.com, Commented Mar 12, 2014 at 21:31

kev · Accepted Answer · 2012-04-12 06:43:55Z

20

$ awk 'length<=5' input.txt
bear
tree
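The limit does not have to be hard-coded. Here is a minimal sketch that passes it in as an awk variable; `keep_short` is a hypothetical helper name, not part of the original answer. Whether awk's `length` counts characters or bytes for non-ASCII text depends on the awk implementation and the locale, so check this on the target system if the file contains foreign characters.

```shell
# keep_short MAX [FILE...]  (hypothetical helper name)
# Print only the lines that are at most MAX characters long.
# With no FILE arguments, awk reads standard input.
keep_short() {
    max=$1
    shift
    awk -v max="$max" 'length($0) <= max' "$@"
}

# Example: the sample data from the question with a limit of 5.
printf 'bear\nrabbit\ntree\nelephant\n' | keep_short 5
```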


rwos · Accepted Answer · 2012-04-12 06:42:54Z

The following would do the trick:

sed -i '/^.\{6,\}$/d' FILE

What that means is this:

Delete (/ [...] /d) in-place (-i switch) all lines matching the following pattern:

line beginning (^)
followed by any character (.) repeated 6 or more times (\{6,\}), so that lines of 5 characters or fewer are kept
followed by line ending ($)

from the file named FILE.
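Because -i rewrites the file, it can be worth previewing the result first by running the same expression without -i. Here is a sketch under that assumption; `drop_long` is a hypothetical helper name. To keep lines of up to N characters, the pattern has to match N+1 or more.

```shell
# drop_long MAX [FILE...]  (hypothetical helper name)
# Delete lines longer than MAX characters. Output goes to stdout;
# the input files are left untouched because -i is not used.
drop_long() {
    min=$(( $1 + 1 ))   # shortest line length that gets deleted
    shift
    # Expands to e.g. /^.\{6,\}$/d when MAX is 5.
    sed "/^.\{${min},\}\$/d" "$@"
}

# Preview on the sample data from the question.
printf 'bear\nrabbit\ntree\nelephant\n' | drop_long 5
```

Once the preview looks right, the same expression can be run with -i on the real file.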

paxdiablo · Accepted Answer · 2012-04-12 06:35:12Z

grep -v '......' myfile.txt

will output only the lines of five characters or fewer.

It does this by "selecting" lines containing six characters or more, then reversing the action with -v, to only print out those that don't match.

grep -Ev '.{6,}' is more general and golfs identically in this case. It is however faster to type 6 dots than .{6,}.
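The general form extends to any limit: match lines that have one character more than the limit, and let -v drop them. Here is a sketch with a hypothetical helper name, `keep_short_grep`. Note that grep exits with status 1 when it prints nothing, which can matter in scripts that run under `set -e`.

```shell
# keep_short_grep MAX [FILE...]  (hypothetical helper name)
# Print only the lines that are at most MAX characters long.
keep_short_grep() {
    min=$(( $1 + 1 ))   # a line with this many characters is too long
    shift
    # Expands to e.g. grep -Ev '.{6,}' when MAX is 5.
    grep -Ev ".{${min},}" "$@"
}

# Example: the sample data from the question with a limit of 5.
printf 'bear\nrabbit\ntree\nelephant\n' | keep_short_grep 5
```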

pizza · Accepted Answer · 2012-04-12 08:26:34Z

"The file contains various foreign characters, each of these should count as one letter." Assuming your input data is in UTF-8, this bash filter script should do it:

#!/bin/bash
function px {
    local a="$@"
    local i=0
    while [ $i -lt ${#a} ]
    do
        printf \\x${a:$i:2}
        i=$(($i+2))
    done
}
(iconv -f UTF8 -t UTF16 | od -x | cut -b 9- | xargs -n 1) |
if read utf16header
then
    px $utf16header
    cnt=0
    out=''
    while read line
    do
        cnt=$(($cnt+1))
        if [ "$line" == "000a" ]
        then
            if [[ $cnt -le 5+1 ]] ; then
                out=$out$line
                px $out
            fi
            cnt=0
            out=''
        else
            out=$out$line
        fi
    done
fi | iconv -f UTF16 -t UTF8
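For comparison, here is a much shorter sketch that relies on the shell's own character counting instead of converting to UTF-16. It is an alternative technique, not the method of the script above. It assumes a shell such as bash running under a UTF-8 locale, where `${#line}` counts characters; some other shells count bytes, which would miscount multi-byte characters. `keep_short_chars` is a hypothetical name.

```shell
# keep_short_chars MAX  (hypothetical helper name)
# Read standard input and print only lines of at most MAX characters.
# Assumption: ${#line} counts characters (bash in a UTF-8 locale);
# in shells that count bytes, multi-byte characters are over-counted.
keep_short_chars() {
    max=$1
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -le "$max" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Example: the sample data from the question with a limit of 5.
printf 'bear\nrabbit\ntree\nelephant\n' | keep_short_chars 5
```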
