[minicoredumper] stack detection fails with kernel >= v5.18
John Ogness
john.ogness at linutronix.de
Wed Sep 27 10:08:04 CEST 2023
Hi Holger,
On 2023-08-22, Holger Brunck <holger.brunck at hitachienergy.com> wrote:
> I am currently trying to integrate the minicoredumper into our
> embedded SW. I am using the latest version 2.06. With kernel 5.4.x it
> works pretty well. But with boards using kernel 6.1.x I saw problems.
> The stacks were missing in the core file generated by the
> minicoredumper.
>
> Our application is multithreaded and minicoredumper reports when
> processing the core file:
>
> Aug 15 08:18:45 unit user.err minicoredumper: unable to find thread #14's (386) stack
> Aug 15 08:18:45 unit user.err minicoredumper: unable to find thread #15's (387) stack
> Aug 15 08:18:45 unit user.err minicoredumper: unable to find thread #16's (388) stack
> Aug 15 08:18:45 unit user.err minicoredumper: unable to find thread #17's (389) stack
>
> I was able to reproduce this problem in an x86 qemu environment with a
> mainline kernel and parts of our application code. After bisecting the
> kernel I saw that this was introduced with kernel v5.18 due to the
> following commit:
>
> 7b1b610f coredump: Don't perform any cleanups before dumping core
Yes, that commit broke /proc/PID/stat. The commit does not properly take
the non-crashing threads into account. It has been on my TODO list for a
while to post a proper fix. I would also like to add kernel tests
because this is not the first time that a developer has broken
/proc/PID/stat.
> Btw when using the regular core file gdb is able to show the stacks
> from the threads as expected.
gdb manually parses the dump information to retrieve the stack
pointers. This is quite complicated because different architectures do
things differently.
The minicoredumper project has no interest in implementing all that. (It
has been suggested in the past [0].) Instead, /proc/PID/stat is used,
which already provides that information. However, that information is
only available for tasks that are no longer executing (are shutting down
or have crashed). And that is what is currently broken in mainline.
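
To illustrate the /proc/PID/stat approach described above, here is a
minimal sketch in Python. It is not the minicoredumper implementation
(which is written in C); the function names are hypothetical. It assumes
the field layout documented in proc(5), where field 29 (kstkesp) is the
stack pointer, and it reads the per-thread files under
/proc/PID/task/TID/stat.

```python
import os

# Field number of kstkesp (stack pointer) in /proc/PID/stat, per proc(5).
KSTKESP_FIELD = 29


def parse_stack_pointer(stat_line):
    """Return the kstkesp value from one stat line (hypothetical helper)."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    _, sep, rest = stat_line.rpartition(")")
    if not sep:
        raise ValueError("no comm field found in stat line")
    fields = rest.split()
    # fields[0] is field 3 (state), so field N is at index N - 3.
    return int(fields[KSTKESP_FIELD - 3])


def read_thread_stack_pointers(pid):
    """Map each thread ID of pid to its reported stack pointer."""
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        with open(os.path.join(task_dir, tid, "stat")) as f:
            result[int(tid)] = parse_stack_pointer(f.read())
    return result
```

A core dump helper could call read_thread_stack_pointers() with the PID
of the crashed process and then look up which memory mapping contains
each value. My understanding is that the kernel reports 0 in this field
when it does not expose the stack pointer for a task, so a 0 for the
non-crashing threads would be consistent with the errors quoted above,
but treat that as an assumption.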
I have attached a workaround patch for the kernel that seems to fix the
issue. However, a proper solution (and a regression test!) still needs
to be developed and sent to mainline.
John Ogness
[0] https://lists.linutronix.de/pipermail/minicoredumper/2017-August/000052.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-minicoredumper.patch
Type: text/x-diff
Size: 1378 bytes
Desc: not available
URL: <http://lists.linutronix.de/pipermail/minicoredumper/attachments/20230927/f3a6908e/attachment.patch>