While troubleshooting the files in /var/log after a recent series of crashes, what should I look for if I believe low memory or low disk space is to blame? Is there a general term used in Linux error messages for hardware faults of this kind? And which system processes, such as the kernel, would be affected by a critical shortage of memory?


Just as background, I was working on a Drupal site hosted on my Fedora 17 sandbox project laptop when I experienced these system crashes. I had recently downloaded some rather large files (which I've since moved to other media) and was down to about 1.8G of hard disk space.

I found some useful posts here about monitoring memory usage with top or current disk usage with du. This question, however, is specifically about log files. While searching for an explanation of FPrintObject, I found a similar post at Fedora Forums, which led me to run Memtest, but it reported nothing bad.

1 Answer

The information you are looking for is not found in usual syslog logs. For viewing performance history from the command line, sysstat is an excellent tool.

With sysstat, the sadc collector gathers system information and writes it to a log file. The log file is in a binary format, but it can be viewed with the sar command.

Here is an example of sar output with no options:

$ sar
09:15:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
10:05:01 AM     all     77.49      0.37     22.13      0.00      0.00      0.00
10:15:01 AM     all     77.30      0.40     22.29      0.00      0.00      0.00
10:25:01 AM     all     77.19      0.38     22.42      0.00      0.00      0.00
10:35:01 AM     all     39.31      0.35     23.80      0.01      0.00     36.53
10:45:01 AM     all     32.22      0.34     24.26      0.03      0.00     43.15
10:55:01 AM     all     32.80      0.33     23.78      0.01      0.00     43.08
11:05:01 AM     all     32.70      0.33     23.76      0.00      0.00     43.20
Average:        all     63.90      0.39     22.79      0.00      0.00     12.91

The information you see is the same information provided by top, but is historical data. You can also see detailed information about RAM, network, and disk utilization. Here is an example for RAM usage:

$ sar -r
09:15:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
02:15:01 PM    457076   1357116     74.81    277876    810948    205520      5.40
02:25:01 PM    456836   1357356     74.82    277876    811168    205384      5.40
02:35:01 PM    456976   1357216     74.81    277876    811256    204728      5.38
02:45:01 PM    457036   1357156     74.81    277876    811368    204840      5.38
02:55:01 PM    456588   1357604     74.83    277896    811492    204924      5.38
Average:       332452   1481740     81.67    277720    793953    416953     10.96
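If you would rather not eyeball a long sar -r listing, a short script can flag the intervals where the buffer and cache columns collapse, which is one sign that the machine was running out of RAM. The following is only a rough sketch in Python: it assumes the column layout shown above (other sysstat versions print different columns), the function names and the 50% threshold are my own choices, and the last row of the sample is hypothetical data added purely to show what a drop looks like.

```python
def parse_sar_memory(text):
    """Parse `sar -r` text output into a list of row dicts.

    Assumes a header line containing `kbmemfree` followed by numeric rows,
    as in the listing above. Lines that do not fit are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if "kbmemfree" in fields:
            # Header line: keep the column names that follow the timestamp.
            header = fields[fields.index("kbmemfree"):]
            continue
        if header is None or not fields or fields[0] == "Average:":
            continue
        if len(fields) < len(header) + 1:
            continue
        try:
            row = dict(zip(header, (float(v) for v in fields[-len(header):])))
        except ValueError:
            continue  # not a data row
        row["time"] = " ".join(fields[:-len(header)])
        rows.append(row)
    return rows


def find_cache_drops(rows, threshold=0.5):
    """Return (time, before, after) for samples where kbbuffers + kbcached
    fell by more than `threshold` (a fraction) since the previous sample."""
    drops = []
    for prev, cur in zip(rows, rows[1:]):
        before = prev["kbbuffers"] + prev["kbcached"]
        after = cur["kbbuffers"] + cur["kbcached"]
        if before > 0 and (before - after) / before > threshold:
            drops.append((cur["time"], before, after))
    return drops


# First two data rows are from the listing above; the third is HYPOTHETICAL.
SAMPLE = """\
09:15:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
02:15:01 PM 457076 1357116 74.81 277876 810948 205520 5.40
02:25:01 PM 456836 1357356 74.82 277876 811168 205384 5.40
02:35:01 PM 12000 1802192 99.34 4100 21500 2405000 63.20
"""

if __name__ == "__main__":
    for time, before, after in find_cache_drops(parse_sar_memory(SAMPLE)):
        print(f"{time}: buffers+cache fell from {before:.0f} kB to {after:.0f} kB")
```

To use it on real data, feed it the text that sar -r prints for the day of the crash instead of SAMPLE, and adjust the threshold to taste.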

Besides running sar locally, there are many monitoring systems that show performance trending data. Munin, Cacti, and Zabbix are some examples. These have the benefit of graphing the data and keeping it for multiple servers in a centralized location.

Update to answer from comments:

The sar command will tell you if you ran out of RAM prior to the crash. This will be obvious, as kbbuffers and kbcached will drop dramatically. You can also check dmesg for messages from the OOM (out-of-memory) killer, but dmesg output is only written to the log files if klogd is installed. You won't see any logs about running out of disk space unless an application specifically reports its failure to write to disk. Keep in mind, however, that if the disk is full, syslog won't be able to write the log to disk either.
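If kernel messages do end up in your log files, searching them for OOM-killer activity can be scripted. This is a minimal sketch in Python, not a definitive tool: the two message fragments in the pattern are ones the kernel commonly prints, but the exact wording varies between kernel versions, and the default path /var/log/messages is an assumption about where your syslog daemon writes kernel messages. The function name is hypothetical.

```python
import re
import sys

# Assumed message fragments; the exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines (without trailing newline) that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    # The default path is an assumption; pass another log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as log:
            for hit in find_oom_lines(log):
                print(hit)
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

No output means no matching lines were found in that file, which does not by itself rule out a memory shortage, since the messages may never have been written to disk.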

  • While this is certainly a useful tool, I'm not sure you've answered the questions about classes or specific terms in a log file related to faults of this kind. This is a real and ongoing problem, and even with this info I'm not clear what I'm looking for. Commented Aug 25, 2012 at 11:46
  • @xtian - sar will tell you if you ran out of RAM prior to the crash. This will be obvious as kbbuffers and kbcached will drop dramatically. You can also check dmesg for OOM (out of memory) killer, but dmesg is only written to logs if klogd is installed. You won't see any logs about out of disk space, unless an application specifically reports its failure to write to disk. However, if the disk is full, syslog won't be able to write the log to disk either. Commented Aug 25, 2012 at 21:06
  • +1 for kbbuffers, kbcached, OOM, klogd, syslog (^_^) The system was not completely out of memory, there was something like 700MB left over. abrtd wrote a directory/core dump each time I tried to relaunch Firefox resulting in 1.1G of core dump files. Therefore, the memory loss was not all at once, but incrementally as I stubbornly continued to relaunch Firefox (>_<). If you add the comment to the answer I will certainly accept it. Commented Aug 28, 2012 at 15:55
  • @xtian - as another note, you can configure /etc/security/limits.conf to disallow core dumps from being written. Commented Aug 28, 2012 at 15:57
  • @xtian - Comment added to the answer. Commented Aug 28, 2012 at 15:58