I am using Raspberry Pi 2 (B+) boards in a biology lab to do real-time tracking of small insects. Basically, we have a Python/OpenCV program that detects the positions of animals and saves them in a MySQL database. We typically run experiments for 2 weeks, sometimes on as many as 30 Pis simultaneously. Often, some of our Pis will freeze: the green ACT LED stops blinking and they no longer respond to ping or ssh. The power LED remains on, and the only way I have found to reboot the devices is to power them off by hand. This is very hard to reproduce, insofar as a device can run fine for several days and then crash (or not). journalctl does not provide any clues after it happens.
We keep a log of crashing devices, and it appears that some have never crashed while others keep crashing after a few days.
For several reasons, we think it is related to faulty SD cards:
- We had a much higher propensity to crash with alternative cards (Verbatim microSDHC, 32GB class 10).
- Swapping cards between devices indicates that the issue follows the card, rather than the power supply or another hardware issue.
- It does not seem that we run out of RAM either
I have tried to:
- Reburn card from img file
- Update firmware
- Use Pi3
Because of the random nature of the bug, testing every possible solution is a matter of time and statistics, so I am not quite sure where to start.
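To make the "time and statistics" part concrete: one possible way to judge whether a change (say, a different card model) really affects the crash rate is a Fisher exact test on a 2x2 table of crashed versus non-crashed devices. The following is only a minimal sketch. The helper name `fisher_exact_two_sided` is hypothetical, and the counts in the usage lines are made up for illustration; they are not measurements from our lab.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups being compared (e.g. card model A / card model B),
    columns are "crashed" / "did not crash". Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x crashes in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 7 of 10 devices crashed with one card model,
# 1 of 10 with another.
p_value = fisher_exact_two_sided(7, 3, 1, 9)
print(f"p = {p_value:.4f}")
```

With only a handful of devices per group the test has little power, so a fix may need to run on many Pis for several weeks before a difference becomes convincing.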
Technical details:

    $ uname -a
    Linux e043 4.4.13-1-ARCH #1 SMP Wed Jun 8 19:31:47 MDT 2016 armv7l GNU/Linux
The SD cards we use are '32G Samsung EVO SD card':

    $ grep . /sys/class/mmc_host/mmc0/mmc0:*/* 2>/dev/null
    /sys/class/mmc_host/mmc0/mmc0:0001/cid:1b534d303030303010c337142500f147
    /sys/class/mmc_host/mmc0/mmc0:0001/csd:400e00325b590000ee7f7f800a404055
    /sys/class/mmc_host/mmc0/mmc0:0001/date:01/2015
    /sys/class/mmc_host/mmc0/mmc0:0001/erase_size:512
    /sys/class/mmc_host/mmc0/mmc0:0001/fwrev:0x0
    /sys/class/mmc_host/mmc0/mmc0:0001/hwrev:0x1
    /sys/class/mmc_host/mmc0/mmc0:0001/manfid:0x00001b
    /sys/class/mmc_host/mmc0/mmc0:0001/name:00000
    /sys/class/mmc_host/mmc0/mmc0:0001/oemid:0x534d
    /sys/class/mmc_host/mmc0/mmc0:0001/preferred_erase_size:4194304
    /sys/class/mmc_host/mmc0/mmc0:0001/scr:02b5800200000000
    /sys/class/mmc_host/mmc0/mmc0:0001/serial:0xc3371425
    /sys/class/mmc_host/mmc0/mmc0:0001/type:SD
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:DRIVER=mmcblk
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MMC_TYPE=SD
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MMC_NAME=00000
    /sys/class/mmc_host/mmc0/mmc0:0001/uevent:MODALIAS=mmc:block
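In case it helps to compare card batches across devices, the `cid` value above can be decoded field by field. This is a minimal sketch that assumes the standard SD card CID register layout (manufacturer ID, OEM ID, product name, revision, serial number, manufacturing date, CRC). The function name `decode_cid` is made up for illustration. For the card shown here, the decoded fields agree with the separate `manfid`, `oemid`, `name`, `serial` and `date` entries in the sysfs output.

```python
def decode_cid(cid_hex):
    """Split a 128-bit SD CID register (32 hex characters) into its fields.

    Assumes the standard SD CID layout; eMMC/MMC devices use a different one.
    """
    raw = bytes.fromhex(cid_hex)
    if len(raw) != 16:
        raise ValueError("expected 32 hex characters")
    # 12-bit manufacturing date: 8 bits of year offset (from 2000), 4 bits of month
    mdt = ((raw[13] & 0x0F) << 8) | raw[14]
    return {
        "manfid": raw[0],
        "oemid": raw[1:3].decode("ascii", errors="replace"),
        "name": raw[3:8].decode("ascii", errors="replace"),
        "hwrev": raw[8] >> 4,
        "fwrev": raw[8] & 0x0F,
        "serial": int.from_bytes(raw[9:13], "big"),
        "year": 2000 + (mdt >> 4),
        "month": mdt & 0x0F,
    }


# CID copied from the sysfs output above
info = decode_cid("1b534d303030303010c337142500f147")
for field, value in info.items():
    print(field, value)
```

Running this over the `cid` of every card and grouping crash counts by manufacturing date or serial range would show whether the crashing cards cluster in one batch.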
Boot config file (Pi3):

    $ /opt/vc/bin/vcgencmd get_config int
    arm_freq=1200
    audio_pwm_mode=1
    config_hdmi_boost=5
    core_freq=400
    desired_osc_freq=0x36ee80
    disable_camera_led=1
    disable_commandline_tags=2
    disable_l2cache=1
    force_eeprom_read=1
    force_pwm_open=1
    framebuffer_ignore_alpha=1
    framebuffer_swap=1
    gpu_freq=300
    hdmi_force_cec_address=65535
    init_uart_clock=0x2dc6c00
    lcd_framerate=60
    over_voltage_avs=0x13d62
    overscan_bottom=48
    overscan_left=48
    overscan_right=48
    overscan_top=48
    pause_burst_frames=1
    program_serial_random=1
    sdram_freq=450
    temp_limit=85
[...] /var/log/syslog for clues after rebooting? Have you tried plugging in a screen and keyboard before rebooting to see if it is possible to do some diagnostics that way?

[...] /var/log/syslog in my system. My understanding is that it is centralised to systemd. I meant I looked at the system log, using journalctl after the crash, to investigate what happened before.