
I have a file with multiple paragraphs separated by blank lines. Technically they are not paragraphs, just sections of text separated by blank lines.

I want to number the paragraphs, so to speak, by inserting a number at the start of the first line of each section, that is, the first line of the file and every line that follows a blank line. So if my file says:

    This is text.
    This is more text.
    Even more text!

    This is text in section two.
    Some more text.
    You get the point...

I want to make it say:

    1This is text.
    This is more text.
    Even more text!

    2This is text in section two.
    Some more text.
    You get the point...

2 Answers

1

Try this with bash builtin commands:

    #!/bin/bash

    l=1                          # paragraph counter
    echo -n "$l"                 # print paragraph counter without newline
    while IFS= read -r x; do     # read current line from file, see last line
      if [[ $x == "" ]]; then    # empty line?
        echo                     # print empty line
        IFS= read -r x           # read next line from file, see last line
        ((l++))                  # increment paragraph counter
        echo -n "$l"             # print paragraph counter without newline
      fi
      echo "$x"                  # print current line
    done < file
2

In general, using the shell for text-parsing is very slow and cumbersome. Here are some other options:

  1. Perl in "paragraph mode"

    perl -00pe 's/^/$./' file 

    Explanation

    The -00 turns on paragraph mode, where "lines" (records) are separated by blank lines (consecutive \n\n), paragraphs in other words. The s/^/$./ inserts the current "line" (paragraph) number, $., at the start of the "line" (^). The -p tells perl to print each "line" of the input file after running the script given by -e on it.

  2. Awk

    awk -vRS='\n\n' -vORS='\n\n' '{print NR$0}' file 

    Explanation

    -vRS='\n\n' sets awk's record separator to two consecutive newline characters. Like perl's paragraph mode, this makes it treat paragraphs as "lines". Note that a multi-character RS is an extension (gawk and mawk support it; POSIX leaves the behaviour unspecified). We then tell it to print the current line number (NR) followed by the current "line" ($0). The -vORS='\n\n' sets the output record separator to two consecutive newlines so that paragraphs are separated by blank lines in the output as well. Note that this will add 2 empty lines at the end of the output. To avoid that, you can use head (the negative count in -n -2 is a GNU extension):

    awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 

By way of comparison, here are the times that the various solutions took on my system when run on a 10M test file:

    $ time a.sh > /dev/null   ## a.sh is Cyrus's solution

    real    0m1.419s
    user    0m1.308s
    sys     0m0.104s

    $ time perl -00pe 's/^/$./' file > /dev/null

    real    0m0.087s
    user    0m0.084s
    sys     0m0.000s

    $ time awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 >/dev/null

    real    0m0.074s
    user    0m0.056s
    sys     0m0.020s

As you can see above, both the perl and awk solutions are an order of magnitude faster than the shell approach.
