How to insert text at the beginning of each paragraph in bash

Question

I have a file with multiple paragraphs separated by blank lines. Technically they are not paragraphs, just sections of text separated by blank lines.

I want to number the paragraphs, so to speak, by inserting a number at the start of the first line of the file and of each line that follows a blank line. So if my file says:

    This is text.
    This is more text.
    Even more text!

    This is text in section two.
    Some more text.
    You get the point...

I want to make it say:

    1This is text.
    This is more text.
    Even more text!

    2This is text in section two.
    Some more text.
    You get the point...

Cyrus · Accepted Answer · 2015-05-18 15:43:38Z

Try this with bash builtin commands:

    #!/bin/bash

    l=1                          # paragraph counter
    echo -n $l                   # print paragraph counter without new line
    while read x; do             # read current line from file, see last line
      if [[ $x == "" ]]; then    # empty line?
        echo                     # print empty line
        read x                   # read next line from file, see last line
        ((l++))                  # increment paragraph counter
        echo -n $l               # print paragraph counter without new line
      fi
      echo "$x"                  # print current line
    done < file
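
The loop above uses a plain read, which strips leading and trailing whitespace and treats backslashes as escape characters, and it numbers whatever line follows a blank line even if that line is blank too. As a minimal sketch of a more defensive variant, assuming the input arrives on stdin, the following avoids bash-only syntax so it runs in bash and in a POSIX sh. The function name number_paragraphs is made up for this illustration.

```shell
# number_paragraphs is a hypothetical helper name, not part of the answer above.
# It reads text on stdin and prefixes the first line of every paragraph with
# the paragraph number. The body runs in a subshell "( ... )" so its variables
# do not leak into the caller.
number_paragraphs() (
    count=0          # paragraphs seen so far
    in_paragraph=0   # 1 while we are inside a paragraph
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
    # and the "|| [ -n ... ]" part handles a last line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            in_paragraph=0
            printf '\n'                          # keep blank lines as they are
        elif [ "$in_paragraph" -eq 1 ]; then
            printf '%s\n' "$line"                # later lines stay unchanged
        else
            in_paragraph=1
            count=$((count + 1))
            printf '%s%s\n' "$count" "$line"     # first line gets the number
        fi
    done
)

# Example run: two short paragraphs separated by two blank lines.
printf '%s\n' 'This is text.' 'More text.' '' '' 'Section two.' | number_paragraphs
```

Called as number_paragraphs < file, it should behave like the loop above for well-formed input, while runs of blank lines are passed through without consuming a paragraph number.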

terdon · Accepted Answer · 2015-05-18 14:19:45Z

In general, using the shell for text-parsing is very slow and cumbersome. Here are some other options:

Perl in "paragraph mode"
```
perl -00pe 's/^/$./' file 
```
Explanation

The -00 turns on paragraph mode, where "lines" (records) are delimited by consecutive \n\n, in other words they are paragraphs. The s/^/$./ inserts the current "line" (paragraph) number, which perl keeps in the variable $., at the start of each record (^). The -p tells perl to print each record of the input file after running the script given by -e on it.
Awk
```
awk -vRS='\n\n' -vORS='\n\n' '{print NR$0}' file 
```
Explanation

-vRS='\n\n' sets awk's record separator to two consecutive newline characters. Like perl's paragraph mode, this makes it treat paragraphs as "lines". We then tell it to print the current line number (NR) followed by the current "line" ($0). The -vORS='\n\n' sets the output record separator to two consecutive newlines so that paragraphs are separated by blank lines in the output as well. Note that a multi-character RS is an extension (supported by GNU awk and mawk, for example); POSIX leaves its behaviour unspecified. Note also that this will add 2 empty lines at the end of the output. To avoid that, you can use head (the negative line count is a GNU head extension):
```
awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 
```
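
As an alternative sketch, and a different technique from the command above: POSIX awk has its own paragraph mode, enabled by setting RS to the empty string. Printing the blank separator before every record except the first avoids the trailing empty lines, so head is not needed. The function name number_paragraphs_awk is made up for this illustration.

```shell
# number_paragraphs_awk is a hypothetical helper name.
number_paragraphs_awk() {
    # RS= (empty) turns on awk's paragraph mode: records are separated by one
    # or more blank lines, and $0 does not include the trailing newlines.
    # A blank line is printed before every record except the first, so the
    # output ends right after the last paragraph.
    awk -v RS= 'NR > 1 { print "" } { print NR $0 }' "$@"
}

# Example run; a file name can be passed instead of piping input.
printf '%s\n' 'This is text.' 'More text.' '' 'Section two.' | number_paragraphs_awk
```

One design consequence: in this mode awk ignores leading blank lines and treats a run of blank lines as a single separator, so the output always has exactly one blank line between paragraphs.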

By way of comparison, here are the times that the various solutions took on my system when run on a 10M test file:

    $ time a.sh > /dev/null   ## a.sh is Cyrus's solution

    real    0m1.419s
    user    0m1.308s
    sys     0m0.104s

    $ time perl -00pe 's/^/$./' file > /dev/null

    real    0m0.087s
    user    0m0.084s
    sys     0m0.000s

    $ time awk -v RS='\n\n' -vORS='\n\n' '{print NR$0}' file | head -n -2 >/dev/null

    real    0m0.074s
    user    0m0.056s
    sys     0m0.020s

As you can see above, both the perl and awk solutions are more than an order of magnitude faster than the shell approach (roughly 16 and 19 times faster by wall-clock time, respectively).
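
The answer does not say how the 10M test file was built. As one possible sketch for anyone who wants to repeat the comparison, the helper below (make_test_file, a made-up name) writes a given number of three-line paragraphs to a file; the paragraph text and the count of 200000 are assumptions chosen to land near 10 MB.

```shell
# make_test_file PATH COUNT: hypothetical helper that writes COUNT three-line
# paragraphs, separated by single blank lines, to PATH.
make_test_file() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i > 1) print ""          # blank line between paragraphs
            print "This is text."
            print "This is more text."
            print "Even more text!"
        }
    }' > "$1"
}

# 200000 paragraphs of about 50 bytes each give a file of roughly 10 MB.
testfile=$(mktemp)
make_test_file "$testfile" 200000

# In bash, each command can then be timed against the file, for example:
#   time perl -00pe 's/^/$./' "$testfile" > /dev/null

rm -f "$testfile"   # remove the temporary file when done
```

Absolute timings will differ between machines, so only the relative gap between the approaches is meaningful.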
