SD card related random freezes

Question

I am using Raspberry Pi 2 (B+) boards in a biology lab to do real-time tracking of small insects. Basically, we have a Python/OpenCV program that detects the positions of animals and saves them in a MySQL database. We typically run experiments for 2 weeks on sometimes as many as 30 Pis simultaneously. Often, some of our Pis will freeze: the green ACT LED no longer blinks and they do not respond to ping/ssh. The power LED remains on, and the only way I have found to reboot the devices is to power them off by hand. This is very hard to reproduce, since a device can run fine for several days and then crash (or not). journalctl does not provide any clues after it happens.

We keep a log of crashing devices, and it appears that some have never crashed while others keep crashing after a few days.

For several reasons, we think it is related to faulty SD cards:

We had a much higher propensity to crash with alternative cards (Verbatim microSDHC, 32GB class 10).
Swapping cards between devices indicates that the issue is related to the card, as opposed to the power supply or another hardware issue.
It does not seem that we run out of RAM either.

I have tried to:

Reburn card from img file
Update firmware
Use Pi3

Because of the random nature of the bug, testing every possible solution is a matter of time and statistics, so I am not quite sure where to start.

Technical details:

    $ uname -a
    Linux e043 4.4.13-1-ARCH #1 SMP Wed Jun 8 19:31:47 MDT 2016 armv7l GNU/Linux

The SD cards we use are '32G Samsung EVO SD card':

    $ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
    /sys/class/mmc_host/mmc0/mmc0:0001/cid:1b534d303030303010c337142500f147
    /sys/class/mmc_host/mmc0/mmc0:0001/csd:400e00325b590000ee7f7f800a404055
    /sys/class/mmc_host/mmc0/mmc0:0001/date:01/2015
    /sys/class/mmc_host/mmc0/mmc0:0001/erase_size:512
    /sys/class/mmc_host/mmc0/mmc0:0001/fwrev:0x0
    /sys/class/mmc_host/mmc0/mmc0:0001/hwrev:0x1
    /sys/class/mmc_host/mmc0/mmc0:0001/manfid:0x00001b
    /sys/class/mmc_host/mmc0/mmc0:0001/name:00000
    /sys/class/mmc_host/mmc0/mmc0:0001/oemid:0x534d
    /sys/class/mmc_host/mmc0/mmc0:0001/preferred_erase_size:4194304
    /sys/class/mmc_host/mmc0/mmc0:0001/scr:02b5800200000000
    /sys/class/mmc_host/mmc0/mmc0:0001/serial:0xc3371425
    /sys/class/mmc_host/mmc0/mmc0:0001/type:SD
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:DRIVER=mmcblk
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MMC_TYPE=SD
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MMC_NAME=00000
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MODALIAS=mmc:block

Boot config file (Pi3):

    $ /opt/vc/bin/vcgencmd get_config int
    arm_freq=1200
    audio_pwm_mode=1
    config_hdmi_boost=5
    core_freq=400
    desired_osc_freq=0x36ee80
    disable_camera_led=1
    disable_commandline_tags=2
    disable_l2cache=1
    force_eeprom_read=1
    force_pwm_open=1
    framebuffer_ignore_alpha=1
    framebuffer_swap=1
    gpu_freq=300
    hdmi_force_cec_address=65535
    init_uart_clock=0x2dc6c00
    lcd_framerate=60
    over_voltage_avs=0x13d62
    overscan_bottom=48
    overscan_left=48
    overscan_right=48
    overscan_top=48
    pause_burst_frames=1
    program_serial_random=1
    sdram_freq=450
    temp_limit=85

"journalctl does not show anything at all before it happens" is a little like saying "My car was fine before it drove off the cliff". That's a little snarky and of course for some problems -- particularly I/O failure, which is what you've said seems a likely candidate -- it might. However: Have you tried examining /var/log/syslog for clues after rebooting? Have you tried plugging in a screen and keyboard before rebooting to see if it is possible to do some diagnostics that way?
– goldilocks, Commented Jun 15, 2016 at 11:49
SD card manufacturers do not support using them for an OS, so this is not entirely unexpected. You could consider retiring cards which do not perform and/or try different brands. NOTE that there is no advantage in using Class 10 cards on a Pi, even though they are better at HD video (they are optimised for high-speed sequential writing). More practically, you should try implementing the watchdog timer (included on the SoC) to detect failure and perform a graceful restart. This is the normal engineering approach because ANY system can fail for inexplicable reasons.
– Milliways, Commented Jun 15, 2016 at 12:34
@goldilocks There is no /var/log/syslog on my system. My understanding is that logging is centralised in systemd. I meant that I looked at the system log, using journalctl after the crash, to investigate what happened before.
– Quentin Geissmann, Commented Jun 15, 2016 at 12:39
@Milliways. Thanks, I think this is a good idea. I am working on making my software persist across reboots, and I will investigate how to set up a watchdog.
– Quentin Geissmann, Commented Jun 15, 2016 at 12:42
I'll strongly disagree with Milliways WRT "there is no advantage in using Class 10 cards" in that while the top speed of the Pi's SD card interface is slower than the top speed of a class 10 card, pretty much all of the cards reported as achieving the maximum write speed (20-25 MB/s) on the Pi are class 10 or better. I'll concur with him about the usefulness of a watchdog timer here though.
– goldilocks, Commented Jun 15, 2016 at 12:49
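
The watchdog idea raised in the comments could be sketched roughly as follows. This is only an illustration, not anyone's actual setup: it assumes the kernel exposes the hardware watchdog as /dev/watchdog, that nothing else (systemd, for instance) already holds that device, and that the driver honours the usual Linux "magic close" character 'V'. The function name feed_watchdog, the is_healthy callback, and the interval are made up for the example; the interval must be shorter than the driver's timeout.

```python
import time


def feed_watchdog(is_healthy, device="/dev/watchdog", interval_s=5.0, max_feeds=None):
    """Feed the watchdog while is_healthy() returns True.

    Hypothetical helper: if the health check fails (or the whole system
    hangs), feeding stops and the hardware is expected to reset the board
    once its timeout expires. Returns the number of feeds written.
    """
    feeds = 0
    # Unbuffered binary writes so every feed reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        while max_feeds is None or feeds < max_feeds:
            if not is_healthy():
                # Leave without the magic close so the timer keeps running.
                break
            wd.write(b"\0")  # any write counts as a feed
            feeds += 1
            time.sleep(interval_s)
        else:
            # Clean stop (max_feeds reached): ask the driver to disarm.
            wd.write(b"V")
    return feeds
```

In the tracking setup, is_healthy could for example check that the tracking program wrote a new row recently, so that a stalled SD card (which also stalls that check) ends in a reboot rather than a silent freeze.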

Woodoo · Accepted Answer · 2020-06-17 18:04:17Z

Faced this problem in Jun 2020. Just turn swap file off:

    sudo dphys-swapfile swapoff
    sudo dphys-swapfile uninstall
    sudo update-rc.d dphys-swapfile remove

In /etc/dphys-swapfile set CONF_SWAPSIZE=0 (it was 100 for me).

This fixed the random freezing in my case. Check the result with free before the change, after it, and again after a reboot. Swap should be 0.
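
On a fleet of Pis, the same check can be scripted instead of reading free by eye. The sketch below assumes a Linux /proc/meminfo with a SwapTotal line; the function names are invented for this example.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       102396 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal line not found")


def swap_is_off(path="/proc/meminfo"):
    """True if the kernel reports no configured swap at all."""
    with open(path) as f:
        return swap_total_kb(f.read()) == 0
```

Calling swap_is_off() on each device after the reboot should return True if the change took effect.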

Rebroad · Accepted Answer · 2016-08-10 10:46:22Z

I have experienced problems with corrupting SD cards on the Pi 3, so I suspect this is what you are experiencing also. My solution was to use a different SD card. Another solution is to copy the ext4 filesystem to a USB memory stick and change the config on the boot partition of the SD card so that it points to the new location of the root filesystem.

tlhIngan · Accepted Answer · 2017-07-05 05:50:37Z

SD cards are flimsy little things. I had a similar issue with a fleet of BeagleBone Black. Since you've tracked down the issue to the SD card itself, and reflashing the card didn't solve your issue, replace the SD card. They are cheap.

user2497 · Accepted Answer · 2017-08-11 03:07:03Z

I always try to stick with class 4 (or lower) microSD cards, and have never had this problem. Do you supply enough current? Do you have some USB gadget that pulls >100mA? Insufficient power is a nasty microSD cause-of-death. My Pi2B+ gets 2A.

I am not putting much faith in the 'use highest quality samsung/knownbrand SDHC' lore. All my cards are noname and cheap. I leave a bit (15-20%) unallocated, since I expect they will degrade faster. Just rename the resize script to autoresizeforfuturereference.sh and move it to /root before booting a freshly imaged microSD card to prevent the autoresize.

If your application requires heavy disk I/O, use a USB harddisk with a USB Y-cable, and power it from a sufficient source.

user91822 · Accepted Answer · 2018-09-30 13:08:12Z

I don't know if the following will do any good, but it won't hurt to give it a try.

I usually use F3 (an alternative to the h2testw utility) to run robust read/write tests on my microSD cards and/or USB memory sticks before I use them.
As a double check, I also compute full checksums on a microSD card after using dd to write an OS image to it.
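
One way to do that checksum comparison, as a rough sketch: hash the image file, then hash the same number of bytes read back from the card, and compare. The function names and the choice of SHA-256 are assumptions for illustration. Reading a raw device such as /dev/mmcblk0 normally needs root, and a read immediately after dd may be served from the page cache rather than the card, so re-inserting the card first gives a more honest result.

```python
import hashlib
import os


def sha256_prefix(path, length, chunk_size=1024 * 1024):
    """SHA-256 of the first `length` bytes of a file or block device."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # source is shorter than expected
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def image_matches_device(image_path, device_path):
    """Compare an image with the equally sized prefix of the target device."""
    size = os.path.getsize(image_path)
    return sha256_prefix(image_path, size) == sha256_prefix(device_path, size)
```

Only the first len(image) bytes of the card are compared, because the card is usually larger than the image written to it.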
