[NMI watchdog is available for x86 and x86-64 architectures]

Is your system locking up unpredictably? No keyboard activity, just
a frustrating, complete hard lockup? Do you want to help us debug
such lockups? If the answer to all of these is yes, then this document
is definitely for you.

On many x86/x86-64 machines there is a hardware feature that enables
us to generate 'watchdog NMI interrupts'.  (NMI: Non-Maskable Interrupt,
which is executed even if the system is otherwise locked up hard.)
This can be used to debug hard kernel lockups.  By handling periodic
NMI interrupts, the kernel can monitor whether any CPU has locked up,
and print out debugging messages if so.

In order to use the NMI watchdog, you need to have APIC support in your
kernel. For SMP kernels, APIC support gets compiled in automatically. For
UP kernels, enable either CONFIG_X86_UP_APIC (Processor type and features
-> Local APIC support on uniprocessors) or CONFIG_X86_UP_IOAPIC (Processor
type and features -> IO-APIC support on uniprocessors) in your kernel
config. CONFIG_X86_UP_APIC is for uniprocessor machines without an
IO-APIC; CONFIG_X86_UP_IOAPIC is for uniprocessor machines with an
IO-APIC. [Note: certain kernel debugging options, such as Kernel Stack
Meter or Kernel Tracer, may implicitly disable the NMI watchdog.]

For x86-64, the needed APIC is always compiled in.
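As a rough illustration of the configuration rule above, the following Python sketch scans the text of a 32-bit x86 kernel .config and reports whether one of the APIC-providing options is enabled. It assumes the usual .config convention, where enabled options appear as `CONFIG_FOO=y` and disabled ones as `# CONFIG_FOO is not set`. The helper names are hypothetical, and the sketch does not detect debugging options that might implicitly disable the watchdog.

```python
import re

# Options named above: an SMP kernel gets APIC support automatically,
# while a UP kernel needs one of the two UP APIC options.
APIC_OPTIONS = frozenset(("CONFIG_SMP", "CONFIG_X86_UP_APIC", "CONFIG_X86_UP_IOAPIC"))

# Matches enabled options such as "CONFIG_SMP=y". Disabled options are
# written as "# CONFIG_SMP is not set" and therefore never match.
ENABLED_RE = re.compile(r"^(CONFIG_[A-Z0-9_]+)=y$")


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to 'y' in .config text."""
    enabled = set()
    for line in config_text.splitlines():
        match = ENABLED_RE.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def apic_support_enabled(config_text):
    """Hypothetical helper: True if any APIC-providing option is enabled."""
    return bool(enabled_options(config_text) & APIC_OPTIONS)
```

For example, a UP config containing `# CONFIG_SMP is not set` and `CONFIG_X86_UP_APIC=y` makes `apic_support_enabled` return True.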

Using the local APIC (nmi_watchdog=2) requires the first performance
counter register, so you can't use it for other purposes (such as
high-precision performance profiling). However, at least oprofile and
the perfctr driver disable the local APIC NMI watchdog automatically.

To actually enable the NMI watchdog, use the 'nmi_watchdog=N' boot
parameter.  E.g. the relevant lilo.conf entry:

        append="nmi_watchdog=1"
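One way to confirm which value the running kernel was actually booted with is to look for the parameter in /proc/cmdline, which holds the kernel command line. The minimal Python sketch below assumes parameters are whitespace-separated and, as a labelled assumption, that the last occurrence wins if the parameter is repeated. Both function names are hypothetical.

```python
def boot_nmi_watchdog(cmdline):
    """Return N from an 'nmi_watchdog=N' boot parameter, or None if absent.

    Assumption: parameters are separated by whitespace and, if the
    parameter appears more than once, the last numeric value is used.
    """
    value = None
    for token in cmdline.split():
        key, sep, arg = token.partition("=")
        if key == "nmi_watchdog" and sep and arg.isdigit():
            value = int(arg)
    return value


def read_boot_nmi_watchdog(path="/proc/cmdline"):
    """Read the running kernel's command line and extract nmi_watchdog=N."""
    with open(path) as f:
        return boot_nmi_watchdog(f.read())
```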

For SMP machines and UP machines with an IO-APIC, use nmi_watchdog=1.
For UP machines without an IO-APIC, use nmi_watchdog=2; this only works
for some processor types.  If in doubt, boot with nmi_watchdog=1 and
check the NMI count in /proc/interrupts; if the count is zero, then
reboot with nmi_watchdog=2 and check the NMI count again.  If it is still
zero, then report a problem: you probably have a processor that needs to
be added to the NMI code.
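The trial procedure above can be sketched in Python. The sketch assumes /proc/interrupts contains a line whose first field is `NMI:`, followed by one decimal count per CPU and possibly some trailing description text. Treat that exact layout as an assumption, since it can vary between kernel versions. The helper names and the returned advice strings are hypothetical.

```python
def nmi_count(interrupts_text):
    """Sum the per-CPU counts on the NMI: line of /proc/interrupts text.

    Returns None if there is no NMI: line at all.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at any trailing description text
                total += int(field)
            return total
    return None


def read_nmi_count(path="/proc/interrupts"):
    """Read the live interrupt table and return the total NMI count."""
    with open(path) as f:
        return nmi_count(f.read())


def next_step(current_mode, count):
    """Follow the boot-and-check procedure described above."""
    # None (no NMI: line) and 0 both mean "no watchdog NMIs were seen".
    if count:
        return "nmi_watchdog=%d works" % current_mode
    if current_mode == 1:
        return "reboot with nmi_watchdog=2"
    return "report a problem: the processor may need adding to the NMI code"
```

For example, after booting with nmi_watchdog=1, `next_step(1, read_nmi_count())` suggests whether to keep that setting or retry with nmi_watchdog=2.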

A 'lockup' is the following scenario: if any CPU in the system does not
execute the periodic local timer interrupt for more than 5 seconds, then
the NMI handler generates an oops and kills the process. This
'controlled crash' (and the resulting kernel messages) can be used to
debug the lockup. Thus, whenever a lockup happens, wait 5 seconds and
the oops will show up automatically. If the kernel produces no messages,
then the system has crashed so hard (e.g. at the hardware level) that
either it cannot even accept NMI interrupts, or the crash has made the
kernel unable to print messages.
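The 5-second rule can be modelled in a few lines. The Python sketch below is a simplified simulation, not the kernel's implementation. It assumes each CPU keeps a running count of local timer interrupts, that watchdog NMIs arrive at a fixed, hypothetical rate `nmi_hz`, and that a CPU is reported once its timer count has stayed unchanged across `5 * nmi_hz` consecutive NMIs. The class and method names are illustrative only.

```python
LOCKUP_SECONDS = 5  # threshold taken from the description above


class LockupDetector:
    """Simplified model of a per-CPU lockup check driven by periodic NMIs."""

    def __init__(self, num_cpus, nmi_hz):
        self.limit = LOCKUP_SECONDS * nmi_hz  # stalled NMIs before reporting
        self.last_timer_count = [0] * num_cpus
        self.stalled_nmis = [0] * num_cpus

    def on_nmi(self, cpu, timer_count):
        """Handle one watchdog NMI on `cpu`, given its timer-interrupt count.

        Returns True exactly once per stall: on the NMI at which the CPU's
        timer count has been frozen for LOCKUP_SECONDS worth of NMIs.
        """
        if timer_count != self.last_timer_count[cpu]:
            # The local timer interrupt ran since the last NMI: CPU is alive.
            self.last_timer_count[cpu] = timer_count
            self.stalled_nmis[cpu] = 0
            return False
        self.stalled_nmis[cpu] += 1
        return self.stalled_nmis[cpu] == self.limit


if __name__ == "__main__":
    detector = LockupDetector(num_cpus=2, nmi_hz=1)
    timer_counts = [0, 0]
    for second in range(1, 11):
        timer_counts[0] += 100  # CPU0 keeps taking timer interrupts
        if second <= 3:
            timer_counts[1] += 100  # CPU1 runs normally for 3 seconds...
        # ...after which its timer count freezes, as in a hard lockup.
        for cpu in (0, 1):
            if detector.on_nmi(cpu, timer_counts[cpu]):
                # prints: CPU1: lockup detected at t=8s
                print("CPU%d: lockup detected at t=%ds" % (cpu, second))
```

In this model, CPU1 stops servicing its timer after t=3s and is reported after five further NMIs without progress.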

Be aware that when using the local APIC, the frequency of the NMI
interrupts it generates depends on the system load. Lacking a better
source, the local APIC NMI watchdog uses the "cycles unhalted" event.
As you may guess, this event doesn't tick while the CPU is in the halted
state (which happens when the system is idle). If your system locks up
anywhere other than on the "hlt" processor instruction, the watchdog
will trigger very soon, because the "cycles unhalted" event happens on
every clock tick. If it locks up on "hlt", then you are out of luck:
the event will not happen at all and the watchdog won't trigger. This is
a shortcoming of the local APIC watchdog; unfortunately, there is no
"clock ticks" event that would work all the time. The I/O APIC watchdog
is driven externally and has no such shortcoming. However, its NMI
frequency is much higher, resulting in a more significant hit to overall
system performance.
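The difference between the two NMI sources can be illustrated with a toy model. In this hypothetical Python sketch, a CPU's state for each clock tick is either "run" or "hlt". The local APIC source is modelled as counting only unhalted ticks, and the I/O APIC source as counting every tick. One NMI is delivered per `ticks_per_nmi` counted ticks, which is an invented parameter for illustration.

```python
def nmis_delivered(states, source, ticks_per_nmi):
    """Count watchdog NMIs a CPU would receive over a trace of tick states.

    states: sequence of "run" or "hlt", one entry per clock tick.
    source: "local_apic" counts only unhalted ticks ("cycles unhalted");
            "io_apic" is externally driven and counts every tick.
    """
    if source not in ("local_apic", "io_apic"):
        raise ValueError("unknown NMI source: %r" % (source,))
    counted = 0
    for state in states:
        if source == "io_apic" or state != "hlt":
            counted += 1
    return counted // ticks_per_nmi
```

A CPU stuck on "hlt" receives no NMIs from the local APIC source, so the lockup check never runs. The I/O APIC source keeps delivering NMIs regardless.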

On x86, the NMI watchdog is disabled by default, so you have to enable
it with a boot-time parameter.

It's possible to disable the NMI watchdog at run time by writing "0" to
/proc/sys/kernel/nmi_watchdog. Writing "1" to the same file will
re-enable the NMI watchdog. Note that you still need to use the
"nmi_watchdog=" parameter at boot time.
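A minimal Python sketch of the run-time toggle described above follows. It writes "0" or "1" to /proc/sys/kernel/nmi_watchdog. Writing there requires root. The function names are hypothetical, and the `path` argument exists only so the logic can be exercised against an ordinary file.

```python
NMI_WATCHDOG_SYSCTL = "/proc/sys/kernel/nmi_watchdog"  # path from the text above


def set_nmi_watchdog(enabled, path=NMI_WATCHDOG_SYSCTL):
    """Write "1" (re-enable) or "0" (disable) to the nmi_watchdog file.

    This only makes sense if the kernel was booted with nmi_watchdog=N.
    """
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def nmi_watchdog_enabled(path=NMI_WATCHDOG_SYSCTL):
    """Return True if the file currently holds a non-zero value."""
    with open(path) as f:
        return int(f.read().strip()) != 0
```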

NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
on x86 SMP boxes.

[ feel free to send bug reports, suggestions and patches to
  Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
  list at <linux-smp@vger.kernel.org> ]