We are working on a custom embedded Linux board with a PowerPC-based SoC and a display panel.
We are debugging an issue where the Linux kernel simply freezes during video playback; it does not respond even to sysrq-trigger.
If we enable ftrace, the issue disappears.
We also tried various kernel hacking options, such as soft-IRQ debugging and hung-task detection. We haven't tried kdb or kgdb yet, and we are not sure they would help in this case.
Our Lauterbach setup is not in working condition, so we can't use it :(
The chip vendor provided the kernel and platform code, but they are out of business, so there is no support from them :(
Approach 1
- When it didn't respond to `sysrq-trigger`, I suspected that it was getting stuck in an interrupt handler.
- So in the working state (before the freeze), I monitored `/proc/interrupts` and figured out which interrupts are involved in the use case. Then I added flags in a `noinit` section and updated them on entry to and exit from each IRQ handler. I print them before `request_irq`.
- After reproducing the issue and letting the h/w watchdog reboot the system, I looked at `dmesg`: the flags had values indicating that the IRQ handlers had exited.
- One IRQ I haven't looked at is the `timer`. I don't suspect it, but if I don't find anything else I am going to add a flag toggle there as well (though I'm not hopeful).
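The flag scheme in Approach 1 can be sketched roughly as below. This is a plain userspace C model of the idea, not the actual kernel patch: the names `irq_trace`, `irq_trace_enter`, `irq_trace_leave` and `irq_trace_stuck_mask`, the slot count, and the magic value are all hypothetical. On the board, the struct would sit in memory that survives the watchdog reset (the `noinit` section mentioned above), and each slot would correspond to one IRQ seen in `/proc/interrupts`. Counters are used instead of single flags so that a handler that was entered but never left shows up as a mismatch.

```c
#include <assert.h>
#include <stdint.h>

#define IRQ_TRACE_MAGIC 0x49525154u /* arbitrary marker value ("IRQT"), an assumption */
#define IRQ_TRACE_SLOTS 4u          /* hypothetical: one slot per IRQ in the use case */

struct irq_trace {
    uint32_t magic;                  /* distinguishes our data from power-on garbage */
    uint32_t enter[IRQ_TRACE_SLOTS]; /* incremented first thing in a handler */
    uint32_t leave[IRQ_TRACE_SLOTS]; /* incremented last thing in a handler */
};

/* Nonzero if the record carries our marker, i.e. a previous run wrote it. */
static inline int irq_trace_valid(const volatile struct irq_trace *t)
{
    return t->magic == IRQ_TRACE_MAGIC;
}

/* Clear all counters and stamp the marker; call after printing the old values. */
static inline void irq_trace_reset(volatile struct irq_trace *t)
{
    unsigned int i;

    for (i = 0; i < IRQ_TRACE_SLOTS; i++) {
        t->enter[i] = 0;
        t->leave[i] = 0;
    }
    t->magic = IRQ_TRACE_MAGIC;
}

/* Call at the very top of the handler assigned to this slot. */
static inline void irq_trace_enter(volatile struct irq_trace *t, unsigned int slot)
{
    assert(slot < IRQ_TRACE_SLOTS);
    t->enter[slot]++;
}

/* Call on every return path of the handler assigned to this slot. */
static inline void irq_trace_leave(volatile struct irq_trace *t, unsigned int slot)
{
    assert(slot < IRQ_TRACE_SLOTS);
    t->leave[slot]++;
}

/* Bitmask of slots whose handler was entered but did not leave. */
static inline uint32_t irq_trace_stuck_mask(const volatile struct irq_trace *t)
{
    uint32_t mask = 0;
    unsigned int i;

    for (i = 0; i < IRQ_TRACE_SLOTS; i++) {
        if (t->enter[i] != t->leave[i])
            mask |= (uint32_t)1 << i;
    }
    return mask;
}
```

On the next boot one would check `irq_trace_valid`, print the counters and `irq_trace_stuck_mask`, and then call `irq_trace_reset`. A mask of zero after a freeze, as observed above, suggests the CPU was not inside any instrumented handler. On real hardware that conclusion only holds if the writes reached RAM (not just the data cache) before the watchdog reset, which this sketch does not model.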
Question set 1
A) What could cause the freeze, apart from the kernel getting stuck in an IRQ handler?
B) Any suggestions to refine this approach?
C) Are there any other debugging techniques we could try?
Approach 2
- We tested different past firmware versions and found that two unrelated kernel changes are causing this issue (one is a kernel module that has nothing to do with video playback, the other is a small change in kernel code). If we remove both changes, the kernel freeze goes away.
- However, there is more to the story: if we keep either one of the two kernel changes, the freeze comes back.
- We will be examining these changes critically to work out how such unrelated changes can affect video playback.
Question set 2
A) From the points in Approach 2, does anyone suspect anything in particular? (I feel I don't know what question to ask, hence asking it this way.)
B) Any suggestions to refine this approach further?
Run `free` to see if you have swap space; read `man mkswap swapon fstab fallocate` to create some. Swap space must be contiguous. Use `mkswap` or `fallocate`, not `dd`. Traditionally, swap space of 1.5 × RAM has been recommended, but YMMV. If you don't plan to hibernate your system, you can have less than 1.0 × RAM.