Reading a file incrementally [duplicate]

Question

I have a requirement to read only the part of a file that was written since the last read. For example, if I last read the file at 2016-07-26T01:30 and run again at 2016-07-26T02:30, by which time 100 new records have been written to the file, I need to read only those 100 records instead of the whole file.

The file format is:

[2016-07-26T16:26:31.953-04:00] [AnalyticProviderServices0] [ERROR] [] [oracle.EPMOHPS] [tid: 17] [userId: <anonymous>] [ecid: 0000LGXnLUEComOpyg4EyW1N4iIi000002,1:28342] [APP: APS#11.1.2.0] Unable to resolve 'jdbc.EPMSystemRegistry'. Resolved 'jdbc'[[
[2016-07-26T16:26:31.954-04:00] [AnalyticProviderServices0] [WARNING] [] [oracle.EPMOHPS] [tid: 17] [userId: <anonymous>] [ecid: 0000LGXnLUEComOpyg4EyW1N4iIi000002,1:28342] [APP: APS#11.1.2.0] Failure while getting the active Essbase node for cluster [SWESSPROD1]. Runtime Provider Services Error: [Unable to resolve 'jdbc.EPMSystemRegistry'. Resolved 'jdbc']

What exactly are your requirements? If you can have a long-lived process, then tail -f or similar should work; if you need to remember where you were, store the last offset and then seek there when you want to continue reading.
– dhag, Commented Jul 26, 2016 at 20:45
... and deal with inode changes, should the file change out from under you ...
– thrig, Commented Jul 26, 2016 at 21:19

traal · Accepted Answer · 2017-07-25 09:29:11Z

There is a command line utility (from 2003) called Re-Tail or "retail", which does incremental log file reading each time you run the program on the log file.

This is great for cron jobs that run every hour, for instance.

Re-Tail saves state in an "offset file"; for each file you run it on, it will store the last line number, and also the text that was on that line number.

The next time you run the program, it will try to seek to the stored line number and compare the contents. If there is a match, it will output the rest of the file, starting from the following line. If there are fewer lines in the file on disk, or if the line contents don't match, the file is assumed to have been purged or rotated, in which case it will start over from the first line.

Finally, retail will update the saved line number and contents.
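
The logic described above can be approximated in plain shell. This is only a rough sketch of the idea, not Re-Tail's actual implementation: the function name new_lines and the two-line state file format are made up for illustration, and it ignores the small race where the file grows while the function is running.

```shell
# new_lines FILE STATE: print the lines added to FILE since the previous call.
# STATE (hypothetical format): line 1 = last line number, line 2 = that line's text.
new_lines() {
    nl_file=$1 nl_state=$2
    nl_num=0 nl_text=''
    if [ -f "$nl_state" ]; then
        nl_num=$(sed -n '1p' "$nl_state")
        nl_text=$(sed -n '2p' "$nl_state")
    fi
    nl_num=${nl_num:-0}
    nl_total=$(awk 'END { print NR }' "$nl_file")
    # Start over if the file is now shorter or the remembered line has changed.
    if [ "$nl_num" -gt "$nl_total" ]; then
        nl_num=0
    elif [ "$nl_num" -gt 0 ] &&
         [ "$(sed -n "${nl_num}{p;q;}" "$nl_file")" != "$nl_text" ]; then
        nl_num=0
    fi
    tail -n +"$((nl_num + 1))" "$nl_file"
    # Save the new last line number and its text for the next run.
    { awk 'END { print NR }' "$nl_file"; tail -n 1 "$nl_file"; } > "$nl_state"
}
```

Comparing the stored text as well as the line number is what lets the function notice a rotated file that happens to have grown past the old line count.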

The software is at: http://xjack.org/retail/

When I run retail as root, I like to put the saved state in /var/lib/retail. For instance, on one machine, I run retail every hour to make a report about SSH logins, using a script containing the following command line:

/usr/local/bin/retail -p /var/lib/retail/ /var/log/secure >"$tempfile"

Good luck!

Stéphane Chazelas · Accepted Answer · 2016-07-27 13:12:27Z

You can leave the file open:

exec 3< file
cat <&3
sleep 3600
echo After one hour, these records were added:
cat <&3

That means it has to be the same process invoking those cats one hour apart.

If access times are enabled on the file system, and your script is the only thing reading that file, you can also read the lines whose timestamps postdate the last access time. On a GNU system:

awk -v last_access="$(find file -prune -printf '[%AFT%AT')" '$0 > last_access' < file

That assumes the -04:00 in the log file corresponds to the current timezone offset.
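
If access times cannot be relied on (for example on a noatime mount), a variation on the same idea is to remember the newest timestamp yourself. The sketch below is not part of the answer above: the function name since_stamp and the state file are hypothetical, and it assumes that every record starts with a bracketed timestamp in the format shown in the question, that timestamps never decrease, and that the timezone offset stays the same. Records written later with exactly the same timestamp as the saved one would be missed.

```shell
# since_stamp FILE STATE: print records whose leading [timestamp] sorts after the saved one.
# STATE is a hypothetical file holding the newest timestamp field already reported.
since_stamp() {
    ss_file=$1 ss_state=$2
    ss_last=''
    if [ -f "$ss_state" ]; then
        ss_last=$(cat "$ss_state")
    fi
    awk -v last="$ss_last" -v state="$ss_state" '
        # A line starting with "[" and a digit begins a new record.
        /^\[[0-9]/ { keep = ($1 > last); if (keep) newest = $1 }
        # Continuation lines follow the decision made for their record.
        keep
        # Only update the state when something new was printed.
        END { if (newest != "") print newest > state }
    ' "$ss_file"
}
```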

Another approach is to record the current file position somewhere, for example in file.pos:

{
  if [ -e file.pos ]; then
    pos=$(cat file.pos)
  else
    pos=0
  fi
  tail -c +"$((pos+1))"
  perl -le 'print tell STDIN' > file.pos
} < file

Or with ksh93:

{
  if [ -e file.pos ]; then
    pos=$(<file.pos)
  else
    pos=0
  fi
  cat <#((pos))
  exec <#((pos=CUR))
  echo "$pos" > file.pos
} < file

Or with zsh:

zmodload zsh/system
{
  if [ -e file.pos ]; then
    pos=$(<file.pos)
  else
    pos=0
  fi
  sysseek $pos
  cat
  echo "$((systell(0)))" > file.pos
} < file
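
None of the position-file snippets above notice when the file is rotated or truncated, which is the inode concern raised in the comments on the question. One possible extension, as a minimal sketch: the function name read_new and the "inode offset" state format are hypothetical, and head -c is not specified by POSIX, although GNU and BSD systems provide it.

```shell
# read_new FILE STATE: print the bytes appended to FILE since the previous call.
# STATE (hypothetical format) holds one line: "<inode> <byte offset>".
read_new() {
    rn_file=$1 rn_state=$2
    rn_inode=$(ls -i "$rn_file" | awk '{ print $1 }')
    rn_size=$(( $(wc -c < "$rn_file") ))
    rn_old_inode='' rn_pos=0
    if [ -f "$rn_state" ]; then
        read -r rn_old_inode rn_pos < "$rn_state"
    fi
    rn_pos=${rn_pos:-0}
    # Start from the beginning if the file was replaced or has shrunk.
    if [ "$rn_inode" != "$rn_old_inode" ] || [ "$rn_size" -lt "$rn_pos" ]; then
        rn_pos=0
    fi
    # Print only up to the size measured above, so the saved offset matches the output.
    tail -c +"$((rn_pos + 1))" < "$rn_file" | head -c "$((rn_size - rn_pos))"
    echo "$rn_inode $rn_size" > "$rn_state"
}
```

A file that is truncated and then grows back past the saved offset between two runs still goes undetected; catching that case would need a content check as well.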

Kusalananda · Accepted Answer · 2016-07-27 12:14:02Z

#!/bin/bash

logfile="$1"

test -f "$logfile" || exit 1

lastline="$( basename "$logfile" )-last"

if [ -f "$lastline" ]; then
    place=$( <"$lastline" )
else
    place=1
fi

tmpfile="$( mktemp )"
trap 'rm -f "$tmpfile"' EXIT

sed -n -e "$place,\$p" -e '$=' "$logfile" |
tee "$tmpfile" |
tail -n 1 >"$lastline"

sed '$d' "$tmpfile"

This little script will take a log file on the command line and show all lines in it added since you last used the script. It does not understand log file rotation in its current form, so you would need to manually remove the ...-last file that it creates in the current directory if the log is rotated.

What it does:

When first run, it uses sed to output all lines of the given logfile to a temporary file, followed by the line number of the last line. This number is also stored into a file in the current directory with the same name as the logfile, suffixed with -last. The temporary file, sans the last line containing the line number, is then outputted to the terminal (pipe the output of the script to less if you want). When the script exits, the temporary file is removed.

When run again, the line number is read from the ...-last file in the current directory, and the log file is processed from that line to the end in the same way as before.

If nothing has been written to the logfile between runs of this script, the last line of the logfile will be displayed.

Running it:

$ bash script.sh /var/log/system.log
[lots of output]

$ ls system*
system.log-last

$ cat system.log-last
14758

$ bash script.sh /var/log/system.log
[a few lines of output, with the first line being the same as the last of the previous run]

$ cat system.log-last
14768

You can avoid the tmpfile by doing < "$logfile" awk 'NR > ENVIRON["place"];END{print NR > ENVIRON["lastline"]}'
– Stéphane Chazelas, Commented Jul 27, 2016 at 13:17
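
Building on that comment, the whole script could be written without the temporary file roughly as follows. This is a sketch rather than a tested replacement: the function name show_new is hypothetical, the two variables are passed to awk through its environment so that ENVIRON can see them, and counting starts at 0 instead of 1 so that the last line of the previous run is not repeated.

```shell
# show_new LOGFILE: print the lines added to LOGFILE since the previous call.
# The line count is kept in "<basename>-last" in the current directory, as in the script above.
show_new() {
    sn_log=$1
    test -f "$sn_log" || return 1
    sn_last="$(basename "$sn_log")-last"
    sn_place=0
    if [ -f "$sn_last" ]; then
        sn_place=$(cat "$sn_last")
    fi
    # Print every line numbered above the saved count, then save the new count.
    place="$sn_place" lastline="$sn_last" awk '
        NR > ENVIRON["place"]
        END { print NR > (ENVIRON["lastline"]) }
    ' < "$sn_log"
}
```

Like the original script, this does not handle log rotation, so the ...-last file still has to be removed by hand when the log is rotated.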
