4

How can I delete all of the lines in a file which contain more than a given number of letters? E.g.

bear
rabbit
tree
elephant

If I restrict it to words of 5 letters or less, the output would be:

bear
tree

  • The file contains various foreign characters; each of these should count as one letter.
  • Punctuation symbols can also count as one letter.

4 Answers

20
$ awk 'length<=5' input.txt
bear
tree
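The one-liner works because `length` with no argument is the length of the current line, and an awk pattern with no action prints every line that matches. Below is a minimal sketch of the same idea with the limit passed in as a variable. The names `keep_short` and `max` are made up for illustration. Whether a non-ASCII letter counts as one character depends on the awk implementation and the locale (gawk in a UTF-8 locale counts characters, while some other awks count bytes), so that part is an assumption to check on your system.

```shell
# keep_short MAX [FILE...]: print only lines of at most MAX characters.
# "keep_short" is a hypothetical helper name used for illustration.
keep_short() {
  max=$1
  shift
  # length($0) is the length of the whole current line
  awk -v max="$max" 'length($0) <= max' "$@"
}

# Sample data from the question, fed on stdin
printf 'bear\nrabbit\ntree\nelephant\n' | keep_short 5
```

With no file arguments the helper reads standard input, so it can sit in the middle of a pipeline.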

9

The following would do the trick:

sed -i '/^.\{6,\}$/d' FILE

What that means is this:

Delete (/ [...] /d) in-place (-i switch) all lines matching the following pattern:

  • line beginning (^)
  • followed by any character (.) repeated 6 or more times (\{6,\}), i.e. more than 5
  • followed by line ending ($)

from the file named FILE.
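Because `-i` overwrites the file, it can be worth previewing the result before editing in place. Here is a sketch on a throwaway sample file created with `mktemp`. It uses `\{6,\}` so that lines of up to five characters are kept, as in the question's example. The `.bak` suffix is an illustrative choice. The attached-suffix form `-i.bak` is, as far as I know, accepted by both GNU and BSD sed, whereas a bare `-i` with no suffix is GNU-style.

```shell
# Throwaway sample file holding the data from the question
f=$(mktemp)
printf 'bear\nrabbit\ntree\nelephant\n' > "$f"

# Preview: without -i, sed writes the result to stdout and leaves the file alone
sed '/^.\{6,\}$/d' "$f"

# Edit in place, keeping the original contents in "$f.bak"
sed -i.bak '/^.\{6,\}$/d' "$f"
cat "$f"
```

If the preview looks wrong, nothing has been changed yet. After the in-place run, the backup copy still holds the original lines.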

4
grep -v '......' myfile.txt 

will print only the lines of five characters or fewer.

It does this by "selecting" lines containing six characters or more, then reversing the action with -v, to only print out those that don't match.
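To avoid counting dots by hand, the same inverted match can be written with a bounded repetition in extended regular expressions (`-E`). This is a minimal sketch with the limit as a parameter. `keep_upto` and `max` are hypothetical names, not part of grep.

```shell
# keep_upto MAX [FILE...]: hypothetical helper that drops lines longer than MAX.
keep_upto() {
  max=$1
  shift
  # A line is too long if it contains at least MAX+1 characters;
  # -v inverts the match so only the shorter lines are printed.
  grep -Ev ".{$((max + 1)),}" "$@"
}

# Sample data from the question, fed on stdin
printf 'bear\nrabbit\ntree\nelephant\n' | keep_upto 5
```

Note that grep exits with status 1 when it prints no lines at all, which matters if the helper is used in a script running under `set -e`.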

1 Comment

grep -Ev '.{6,}' is more general and golfs identically in this case. It is however faster to type 6 dots than .{6,}.
1

"The file contains various foreign characters, each of these should count as one letter." Assuming your input data is in UTF8, this bash filter script should do it.

#!/bin/bash
function px {
  local a="$@"
  local i=0
  while [ $i -lt ${#a} ]
  do
    printf \\x${a:$i:2}
    i=$(($i+2))
  done
}
(iconv -f UTF8 -t UTF16 | od -x | cut -b 9- | xargs -n 1) |
if read utf16header
then
  px $utf16header
  cnt=0
  out=''
  while read line
  do
    cnt=$(($cnt+1))
    if [ "$line" == "000a" ]
    then
      if [[ $cnt -le 5+1 ]] ; then
        out=$out$line
        px $out
      fi
      cnt=0
      out=''
    else
      out=$out$line
    fi
  done
fi |
iconv -f UTF16 -t UTF8
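If the shell itself runs in a UTF-8 locale, a shorter route may be possible. This is a different technique from the script above, not a rewrite of it. In a multibyte locale, bash's `${#var}` expansion counts characters rather than bytes, so a plain read loop can do the filtering. A minimal sketch follows, where `keep_lines_bash` is a made-up name and a UTF-8 locale is assumed for non-ASCII input:

```shell
# keep_lines_bash MAX: hypothetical helper; reads stdin and prints the lines
# of at most MAX characters. In bash, ${#line} counts characters according
# to the current locale (assumption: a UTF-8 locale for non-ASCII input).
keep_lines_bash() {
  max=$1
  # IFS= and -r keep leading whitespace and backslashes intact;
  # the "|| [ -n ... ]" part handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "${#line}" -le "$max" ]; then
      printf '%s\n' "$line"
    fi
  done
}

# Sample data from the question, fed on stdin
printf 'bear\nrabbit\ntree\nelephant\n' | keep_lines_bash 5
```

A shell loop like this is likely to be much slower than awk, sed, or grep on large files, so it is mainly useful when everything has to stay inside one bash script.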
