default crashkernel does not work for baremetal server

Asked by norman shen

Hello,

We are testing kdump on a bare-metal server with the following spec:

cpu: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz * 2
ram: 376G

The default crashkernel configuration, which works on a virtual machine with less RAM, does not
work on the bare-metal server. The symptom is that when a kernel panic is triggered manually,
the host hangs and gives no response, even with a monitor connected. Surprisingly, if I
change crashkernel to 512M-:512M, kdump works properly and the problem is solved.

My questions are: why would a smaller reserved memory region cause the host to hang, and what
would be a more suitable value for crashkernel? Thank you.
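
For readers unfamiliar with the 512M-:512M form: the kernel's kdump documentation describes it as a list of range:size pairs, where the size is reserved if the amount of system RAM falls inside the range (an open upper bound means "and above"). A minimal Python sketch of that selection rule follows. It assumes only the range syntax, does not handle forms such as a plain size or the ,high suffix, and the function names are hypothetical.

```python
import re

# Binary multipliers for the size suffixes used in crashkernel= values.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert a size such as '512M' or '2G' to bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def reserved_bytes(crashkernel, system_ram):
    """Return the bytes a range-style crashkernel= value would reserve.

    Returns 0 when no range matches, which means nothing is reserved.
    """
    for entry in crashkernel.split(","):
        span, _, size = entry.partition(":")
        size = size.split("@")[0]          # drop an optional @offset
        start, _, end = span.partition("-")
        lo = parse_size(start)
        hi = parse_size(end) if end else None  # empty end: no upper bound
        if system_ram >= lo and (hi is None or system_ram < hi):
            return parse_size(size)
    return 0


# The value from this question on a 376G host, reported in MiB.
print(reserved_bytes("512M-:512M", 376 << 30) // (1 << 20))  # 512
```

With a range-style value, a large-memory host and a small VM can therefore end up with the same reservation, because the size depends only on which range matches and not on how far above the lower bound the host is.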

Best,
Norman

Question information

Language:
English
Status:
Answered
For:
Ubuntu
Assignee:
No assignee
Bernard Stafford (bernard010) said :
#1
norman shen (jshen28) said :
#2

Thank you for answering, but how can I find out the uncompressed size of the kernel + initrd, so that
I can compute a reasonable value?
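
One rough way to approach this measurement is to unpack the kdump initrd into a scratch directory and add up what comes out. The Python sketch below assumes the initrd has already been extracted (on Ubuntu, unmkinitramfs from initramfs-tools can do this) into a hypothetical directory such as /tmp/kdump-initrd. The function names are illustrative, and the kernel figure is the on-disk size of the compressed vmlinuz, so the total is a lower bound and not an exact memory requirement.

```python
import os


def tree_size(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.lstat(path).st_size
    return total


def estimate_footprint(kernel_path, extracted_initrd_dir):
    """Return (kernel_bytes, initrd_bytes, total_bytes).

    kernel_bytes is the on-disk size of the kernel image (symlinks are
    followed); initrd_bytes is the size of the already-extracted tree.
    """
    kernel = os.path.getsize(kernel_path)
    initrd = tree_size(extracted_initrd_dir)
    return kernel, initrd, kernel + initrd
```

For example, estimate_footprint("/var/lib/kdump/vmlinuz", "/tmp/kdump-initrd") would report the two sizes and their sum in bytes. The crash kernel also needs working memory beyond these files, so the result is only a starting point for choosing a crashkernel value.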

Best
Norman

Bernard Stafford (bernard010) said :
#3

It depends on which version of Ubuntu and which kernel you are using. The latest kernel from https://www.kernel.org/
is 5.13.8, which is 114MB compressed. The extracted kernel is 1GB in size.
I downloaded the kernel, saved it to a file, then extracted it to get the total.

norman shen (jshen28) said :
#4

Thank you for answering. I tried to reserve 2048M of memory, but it failed. In dmesg I saw:

[Sat Aug 7 06:32:50 2021] crashkernel reservation failed - No suitable area found.

Am I able to reserve a large memory block? Thank you.

Best,
Norman

norman shen (jshen28) said :
#5

FYI, I have tried increasing crashkernel to 10G, but it still does not work on one machine:

~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-72-generic root=/dev/mapper/vgroot-lvroot ro crashkernel=10240M,high
~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x66000000
0x3dff000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.0-72-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.0-72-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.0-72-generic root=/dev/mapper/vgroot-lvroot ro systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

On some machines I can make it work with a crashkernel of 512MB, but on others even 10G still does not work...
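
To check how much memory was actually reserved, as opposed to what was requested on the command line, the "Crash kernel" entries in /proc/iomem can be added up. The Python sketch below parses that text. The start addresses in the sample are the two shown by kdump-config above, while the end addresses are hypothetical values chosen for illustration, and the function name is an assumption. On recent kernels /proc/iomem shows real addresses only to root.

```python
def crash_regions(iomem_text):
    """Return (start, size_in_bytes) for each 'Crash kernel' region."""
    regions = []
    for line in iomem_text.splitlines():
        span, sep, label = line.partition(" : ")
        if not sep or label.strip() != "Crash kernel":
            continue
        start_hex, _, end_hex = span.strip().partition("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        regions.append((start, end - start + 1))  # end address is inclusive
    return regions


# Hypothetical excerpt: only the start addresses come from the thread.
SAMPLE = """\
00100000-6fffffff : System RAM
  66000000-75ffffff : Crash kernel
100000000-5f7fffffff : System RAM
  3dff000000-407effffff : Crash kernel
"""

for start, size in crash_regions(SAMPLE):
    print(hex(start), size // (1 << 20), "MiB")
```

On a live system the same function can be fed the contents of /proc/iomem read as root. If the list is empty, no reservation was made, which matches the "No suitable area found" message in dmesg.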

Best,
Norman

I found on the BMC web page that there is a PCIe uncorrectable error, "Bus Uncorrectable Error - Asserted", after triggering the kernel panic. not sure if thi

Bernard Stafford (bernard010) said :
#6

You might look at this bug, which describes a similar problem:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1764246
Quote: "By increasing the size reserved to 256MB it's working."
I would file a bug report. In a terminal: ubuntu-bug kdump
