I have a huge log file that needs to be streamed over HTTP. For performance reasons, I want to capture/collect every n lines and send them in one go. So, basically, what I want is n-line buffered output from a file. Is there a tail/head or any other Linux command to achieve this?
1 Answer
I would suggest a combination of awk for the splitting and a separate inotifywait watching your "outgoing data" directory. E.g. create a directory called "outgoing", and whenever a new file appears in it, we'll send it out.
Script 1: split with awk every 10 lines and write to a new file "bufferX", where X is an increasing number - adapt the batch size as required.
    $ cat split.awk
    NR%10==1 {buffer="buffer"++i}
    {
        print > buffer
        if (NR%10==0) {
            close(buffer)                      # flush the finished batch before handing it over
            system("mv "buffer" outgoing/")
        }
    }
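If you would rather not hard-code the batch size, a small variant (just a sketch; the file name split_n.awk and the variable name n are my choices) can take it from the command line via -v:

    $ cat split_n.awk
    # Run as: <your_command_creating_output> | awk -v n=1000 -f split_n.awk
    NR%n==1 {buffer="buffer"++i}               # start a new buffer file every n lines
    {
        print > buffer
        if (NR%n==0) {
            close(buffer)
            system("mv "buffer" outgoing/")
        }
    }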
Script 2: watch the outgoing directory and send the data whenever a new log batch appears. I just assumed you use curl for sending - adapt accordingly.

    $ cat watch_dir.sh
    #!/bin/bash
    inotifywait -m -o watch.logs -e moved_to --format '%w%f' outgoing/ |\
    while read bufferfile
    do
        curl -T "${bufferfile}" http://target.url && rm "${bufferfile}"
    done

Here inotifywait watches the directory "outgoing" for the moved_to event (-e), runs indefinitely in monitor mode (-m), logs to "watch.logs" (-o), and prints each detected file with its path and file name (--format '%w%f'). The while loop reads that last part for the curl command and deletes the file after a successful upload.
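To sanity-check the watcher on its own (just a test sketch, not part of the scripts above), move any file into outgoing/ by hand and then look at watch.logs and the target URL:

    echo "test line" > buffer_test
    mv buffer_test outgoing/        # produces a moved_to event in outgoing/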
Create the outgoing directory, then run:
    bash watch_dir.sh &
    <your_command_creating_output> | awk -f split.awk
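For the log-file case in the question, <your_command_creating_output> could for instance be the file itself or a tail -f on it (the path below is only a placeholder):

    bash watch_dir.sh &
    awk -f split.awk /path/to/huge.log                 # one-shot pass over an existing file
    # or, if the log keeps growing:
    tail -n +1 -f /path/to/huge.log | awk -f split.awk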
man split, assuming you have sufficient space to store the same data volume twice. At least you could then send the parts at intervals. A script looping tail/head would not use extra space, but would need either to send all the data consecutively, or with a fixed sleep between sections, or to set up a series of jobs in at: gzip, [split,] transfer, [concatenate,] decompress.
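A minimal sketch of that split-based route (the input file huge.log, the chunk prefix part_ and the upload URL are assumptions; curl -T just mirrors the answer above):

    # Needs enough disk space for a second copy of the data
    split -l 1000 huge.log part_                 # cut the log into 1000-line pieces
    for part in part_*; do
        curl -T "${part}" http://target.url && rm "${part}"
        sleep 5                                  # optional fixed pause between sections
    done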