
I want to split a file into multiple files. My input is

Report : 1
ABC
DEF
GHI
JKL
End of Report
$
Report : 2
ABC
DEF
GHI
JKL
$
Report : 2
ABC
DEF
GHI
JKL
End of Report
$
Report : 3
ABC
DEF
GHI
JKL
End of Report
$

The output should be:

File 1

Report : 1
ABC
DEF
GHI
JKL
End of Report
$

File 2

Report : 2
ABC
DEF
GHI
JKL
$
Report : 2
ABC
DEF
GHI
JKL
End of Report
$

File 3

Report : 3
ABC
DEF
GHI
JKL
End of Report
$

I have tried:

awk '{print $0 "Report :"> "/tmp/File" NR}' RS="END OF" test.txt

but I'm not getting the expected output.

Any guidance would be appreciated.

  • I'd add the technology you want to use to the question title, e.g. "split file based on content using bash". (Commented Jan 5, 2015 at 10:01)

3 Answers


You can try something like:

awk '/^Report/{filename++} {print > "FILE"filename}' input

Test

$ awk '/^Report/{filename++} {print > "FILE"filename}' input
$ cat FILE1
Report : 1
ABC
DEF
GHI
JKL
End of Report
$
$ cat FILE2
Report : 2
ABC
DEF
GHI
JKL
$
$ cat FILE3
Report : 2
ABC
DEF
GHI
JKL
End of Report
$
$ cat FILE4
Report : 3
ABC
DEF
GHI
JKL
End of Report
$

What it does

  • /^Report/ is true for lines that start with Report, i.e. the first line of each record, so the action next to it runs once per record.

  • {filename++} increments filename by one. awk variables start out empty (0 when used as a number), so the first record goes to FILE1, the next one to FILE2, and so on.

  • {print > "FILE"filename} has no pattern, so it runs for every line and writes that line to the file for the current record.

    E.g. if filename is 1, this is the same as

    print > "FILE1"

    This is output redirection, similar to the one used in bash. Unlike the shell, awk truncates the file only the first time it opens it; later prints to the same name append to it until the file is closed.

    Note that print is given no arguments here. When the argument is omitted, awk prints the entire record, so this is the same as writing print $0 > "FILE"filename
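To see exactly what the counter-based command does, here is a small reproducible run. It is a sketch that assumes the sample input has one token per line with a literal "$" line closing each record (the question's layout is a guess on that point). Because filename is bumped on every Report line, the two "Report : 2" records land in separate files, FILE2 and FILE3.

```shell
#!/bin/sh
# Recreate the sample input. Assumption: one token per line and a literal "$"
# line after each record; adjust if your real file is laid out differently.
printf '%s\n' \
  'Report : 1' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 2' ABC DEF GHI JKL '$' \
  'Report : 2' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 3' ABC DEF GHI JKL 'End of Report' '$' > input

# Remove output from earlier runs: awk's ">" truncates the files it writes,
# but it never deletes files it no longer needs.
rm -f FILE[0-9]*

# The answer's command: every line starting with "Report" bumps the counter,
# and every line is written to FILE<counter>.
awk '/^Report/{filename++} {print > "FILE"filename}' input

# Show what each file received.
for f in FILE*; do
  echo "== $f"
  cat "$f"
done
```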


10 Comments

Can you explain the command? I'm running the same thing, but it's not working for me.
I don't want filename = $3; it can be anything. I tried this, but it copies every single line into its own file: awk '/^Report/{print > "tmp/File" NR}' test.txt
@Viru It does that because NR is incremented for every line, so each line goes into its own file.
@Viru Or, if you want the file names to increment for each record, you can try awk '/^Report/{filename++} {print > "FILE"filename}' input
Can you please give me the solution? I want the file names to be incremental.

Try this,

csplit input.txt '/End of Report$/' '{*}' 

Explanation

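  • csplit is a Unix utility that splits a file into two or more smaller files at context lines, i.e. lines matching the patterns you give it. By default the pieces are named xx00, xx01, and so on.

  • input.txt is the file to be split.

  • '/End of Report$/' is the pattern to split at: every line ending in "End of Report". Each piece contains the lines up to, but not including, the matching line, so the matching line starts the next piece.

  • '{*}' repeats the previous pattern as many times as possible, i.e. through the whole file (this repeat count is supported by GNU csplit).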
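A quick way to check where csplit puts each line is to run the command on the same sample input. This sketch assumes GNU csplit (the '{*}' repeat is not available in every csplit) and the same one-token-per-line layout as the question; -s only suppresses the byte counts csplit normally prints. Since a /regexp/ split stops just before the matching line, each "End of Report" line ends up at the top of the next piece instead of the bottom of the current one.

```shell
#!/bin/sh
# Sample input (assumed layout: one token per line, literal "$" after each record).
printf '%s\n' \
  'Report : 1' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 2' ABC DEF GHI JKL '$' \
  'Report : 2' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 3' ABC DEF GHI JKL 'End of Report' '$' > input.txt

# Clear pieces left by an earlier run; csplit names them xx00, xx01, ...
rm -f xx[0-9][0-9]

# The answer's command; -s stops csplit from printing each piece's size.
csplit -s input.txt '/End of Report$/' '{*}'

# Show what each piece received.
for f in xx*; do
  echo "== $f"
  cat "$f"
done
```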

1 Comment

This seems to be close but not exactly right. It puts the End of Report lines into the wrong files. Matching on the Report lines instead helps somewhat (but gets you a blank first split file) and doesn't keep records with the same number together.

Here's another awk answer:

awk '/^Report/{n=$3} {print > "File"n}' input 

This is similar to nu11p01n73R's answer but uses the third field of each Report line to determine the file number.

  • When /^Report/ matches a line, set n to $3, the report number.
  • Use n to build the name of the file that each line is printed to.
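Running the $3-based command on the same sample input shows the difference from the counter version: both "Report : 2" records end up in File2, and only three files are created. As before, this is a sketch that assumes one token per line in the input.

```shell
#!/bin/sh
# Sample input (assumed layout: one token per line, literal "$" after each record).
printf '%s\n' \
  'Report : 1' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 2' ABC DEF GHI JKL '$' \
  'Report : 2' ABC DEF GHI JKL 'End of Report' '$' \
  'Report : 3' ABC DEF GHI JKL 'End of Report' '$' > input

# Remove files from earlier runs.
rm -f File[0-9]*

# The answer's command: a Report line has the fields "Report", ":", "<number>",
# so $3 is the report number and names the file for the following lines.
awk '/^Report/{n=$3} {print > "File"n}' input

# Show what each file received.
for f in File*; do
  echo "== $f"
  cat "$f"
done
```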

If you have a large number of these blocks, you may hit the limit on how many files awk can keep open at once, so you might need to close files as you go. In that case you could use this command instead:

awk '/^Report/{f="File"$3; if(lf != f) {close(lf); lf=f}} {print > f}' input 
  • When /^Report/ matches a line, build the file name f from $3.
  • If lf (the last file name) differs from f, close lf first and then set lf to f. Calling close() before lf has been set is harmless.
  • Print every line to f.
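To exercise the close() variant with many blocks, the sketch below generates a hypothetical input of 300 short reports (the count and the one-line body are made up for illustration) and checks that one file per report number is produced.

```shell
#!/bin/sh
# Hypothetical input: 300 reports, each "Report : <n>", one body line,
# "End of Report" and a literal "$" line.
awk 'BEGIN {
  for (i = 1; i <= 300; i++)
    printf "Report : %d\nABC\nEnd of Report\n$\n", i
}' > input

# Remove files from earlier runs.
rm -f File[0-9]*

# The answer's command: close the previous file whenever a new report number
# starts, so only one output file is open at a time.
awk '/^Report/{f="File"$3; if(lf != f) {close(lf); lf=f}} {print > f}' input

# Count the files produced.
ls File[0-9]* | wc -l
```

Closing only works cleanly when all lines for a report number are contiguous, as in the question: if a number showed up again after its file had been closed, print > f would reopen that file and truncate it.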
