###############
Timerlat tracer
###############

The timerlat tracer aims to help preemptive kernel developers find
sources of wakeup latency in real-time threads. Like cyclictest,
the tracer sets a periodic timer that wakes up a thread. The thread then
computes a *wakeup latency* value as the difference between the *current
time* and the *absolute time* that the timer was set to expire. The main
goal of timerlat is to trace in a way that helps kernel developers.

Usage
-----

Write the ASCII text "timerlat" into the current_tracer file of the
tracing system (generally mounted at /sys/kernel/tracing).

For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer

It is possible to follow the trace by reading the trace file::

  [root@f32 tracing]# cat trace
  # tracer: timerlat
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            || /
  #                            ||||             ACTIVATION
  #         TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
  #            | |         |   ||||      |         |                  |                       |
          <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
           <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
          <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
           <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
          <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
           <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
          <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
           <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns


The tracer creates a per-cpu kernel thread with real-time priority that
prints two lines at every activation. The first is the *timer latency*
observed at the *hardirq* context before the activation of the thread.
The second is the *timer latency* observed by the thread. The ACTIVATION
ID field serves to relate the *irq* execution to its respective *thread*
execution.
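
As a rough illustration of how the ACTIVATION ID ties the two lines
together, the sketch below pairs each *irq* line with its *thread* line
on the same CPU. The function name pair_timerlat is hypothetical and not
part of any kernel tooling, and the parsing assumes the trace layout
shown above, in which the latency value immediately follows the
timer_latency field.

```shell
# pair_timerlat: hypothetical helper, not part of the kernel tools.
# Reads timerlat trace lines from the given files (or from stdin) and
# prints one line per activation with the irq and thread latencies.
pair_timerlat() {
    awk '
    {
        cpu = ""; pos = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^\[[0-9]+\]$/) cpu = $i      # CPU column, e.g. [000]
            if ($i == "timer_latency") pos = i     # anchor field
        }
        # Skip header lines and any event that is not a timerlat sample.
        if (pos < 4 || cpu == "") next

        key = cpu " " $(pos - 3)                   # CPU plus activation ID
        if ($(pos - 1) == "irq") {
            irq[key] = $(pos + 1)                  # remember the irq latency
        } else if ($(pos - 1) == "thread" && (key in irq)) {
            printf "%s irq=%s thread=%s\n", key, irq[key], $(pos + 1)
            delete irq[key]
        }
    }' "$@"
}
```

For instance, running ``pair_timerlat trace`` on the output above would
report ``[000] #1 irq=932 thread=11700`` for the first activation on
CPU 0.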

The *irq*/*thread* split is important to clarify in which context an
unexpectedly high value originates. The *irq* context can be delayed by
hardware-related actions, such as SMIs, NMIs, and IRQs, or by a thread
masking interrupts. Once the timer fires, the wakeup can be further
delayed by blocking caused by threads, for example by postponing the
scheduler execution via preempt_disable(), by the scheduler execution
itself, or by masking interrupts. Threads can also be delayed by
interference from other threads and IRQs.

Tracer options
---------------------

The timerlat tracer is built on top of the osnoise tracer, so its
configuration is also done in the osnoise/ config directory. The
timerlat configs are:

 - cpus: CPUs on which a timerlat thread will execute.
 - timerlat_period_us: the period of the timerlat thread.
 - stop_tracing_us: stop the system tracing if a
   timer latency at the *irq* context higher than the configured
   value happens. Writing 0 disables this option.
 - stop_tracing_total_us: stop the system tracing if a
   timer latency at the *thread* context higher than the configured
   value happens. Writing 0 disables this option.
 - print_stack: save the stack of the IRQ occurrence, and print
   it after the *thread context* event.
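
The options above are plain files, so they can be set with ordinary
writes. The following is a minimal sketch, not an official tool: the
function name timerlat_configure and the TRACEFS variable are
hypothetical, the values in the usage line are arbitrary, and the cpus
file is assumed to accept the usual CPU-list format (such as 0-3).

```shell
# timerlat_configure CPUS PERIOD_US IRQ_LIMIT_US THREAD_LIMIT_US
# Hypothetical helper: writes the timerlat options into the osnoise/
# directory. TRACEFS may point elsewhere (for example, to a scratch
# directory for a dry run); it defaults to the usual mount point.
timerlat_configure() {
    dir="${TRACEFS:-/sys/kernel/tracing}/osnoise"

    echo "$1" > "$dir/cpus" &&                  # CPUs running timerlat threads
    echo "$2" > "$dir/timerlat_period_us" &&    # period of the timerlat thread
    echo "$3" > "$dir/stop_tracing_us" &&       # irq context limit, 0 disables
    echo "$4" > "$dir/stop_tracing_total_us"    # thread context limit, 0 disables
}
```

With these assumptions, ``timerlat_configure 0-3 1000 0 500`` would
restrict the threads to CPUs 0 to 3, use a 1000 us period, leave the
*irq* limit disabled, and stop tracing when a *thread* latency exceeds
500 us.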

timerlat and osnoise
----------------------------

The timerlat tracer can also take advantage of the osnoise: trace events.
For example::

        [root@f32 ~]# cd /sys/kernel/tracing/
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# echo 1 > events/osnoise/enable
        [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# tail -10 trace
             cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
             cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
             cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
             cc1-87882   [005] d...3..   548.771102: thread_noise:      cc1:87882 start 548.771078243 duration 9909 ns
      timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency     39960 ns

In this case, the timer latency does not have a single root cause but
several. First, the timer IRQ was delayed for 13 us, which may point to
a long IRQ-disabled section (see the IRQ stacktrace section). Then the
timer interrupt that wakes up the timerlat thread took 7597 ns, and the
qxl:21 device IRQ took 7139 ns. Finally, the cc1 thread noise took
9909 ns before the context switch. Such pieces of evidence help the
developer choose other tracing methods to debug and optimize the system.

It is worth mentioning that the *duration* values reported
by the osnoise: events are *net* values. For example, the
thread_noise does not include the duration of the overhead caused
by the IRQ execution (which accounted for 14736 ns, that is,
7597 ns + 7139 ns). But the values reported by the timerlat tracer
(timer_latency) are *gross* values.
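
To make the net/gross distinction concrete, the sketch below sums the
*net* durations of the osnoise: events seen before each *thread* sample
and prints them next to the *gross* timer_latency. The function name
noise_vs_latency is hypothetical, and the script assumes its input comes
from a single CPU (for example, a per_cpu/cpuN/trace file) in the format
shown above.

```shell
# noise_vs_latency: hypothetical helper, not part of the kernel tools.
# Adds up the net irq_noise and thread_noise durations and reports
# them together with the gross thread timer_latency that follows.
noise_vs_latency() {
    awk '
    /irq_noise:|thread_noise:/ {
        # The net duration is the value right after the "duration" field.
        for (i = 1; i < NF; i++)
            if ($i == "duration") net += $(i + 1)
    }
    /context +thread +timer_latency/ {
        gross = $(NF - 1)              # value just before the trailing "ns"
        printf "net_noise=%d ns gross_latency=%d ns\n", net, gross
        net = 0                        # start over for the next activation
    }' "$@"
}
```

Fed the five trace lines of the example above, this would report
``net_noise=24645 ns gross_latency=39960 ns``: the three net durations
(7597 + 7139 + 9909 ns) add up to less than the gross latency observed
by the thread.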

The art below illustrates a CPU timeline and how the timerlat tracer
observes it at the top and the osnoise: events at the bottom. Each "-"
in the timelines means circa 1 us, and the time moves ==>::

      External     timer irq                   thread
       clock        latency                    latency
       event        13585 ns                   39960 ns
         |             ^                         ^
         v             |                         |
         |-------------|                         |
         |-------------+-------------------------|
                       ^                         ^
  ========================================================================
                    [tmr irq]  [dev irq]
  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
  =========================================================================
                    |-------|  |-------|
                            |--^       v-------|
                            |          |       |
                            |          |       + thread_noise: 9909 ns
                            |          +-> irq_noise: 7139 ns
                            +-> irq_noise: 7597 ns

IRQ stacktrace
---------------------------

The osnoise/print_stack option is helpful for cases in which thread
noise is the major contributor to the timer latency because preemption
or IRQs are disabled. For example::

        [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
        [root@f32 tracing]# echo 500 > osnoise/print_stack
        [root@f32 tracing]# echo timerlat > current_tracer
        [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
          insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
          insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
          insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
          insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
          insmod-1026    [007] d...3..   200.203444: thread_noise:   insmod:1026 start 200.202586933 duration 838681 ns
      timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
      timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
  => timerlat_irq
  => __hrtimer_run_queues
  => hrtimer_interrupt
  => __sysvec_apic_timer_interrupt
  => asm_call_irq_on_stack
  => sysvec_apic_timer_interrupt
  => asm_sysvec_apic_timer_interrupt
  => delay_tsc
  => dummy_load_1ms_pd_init
  => do_one_initcall
  => do_init_module
  => __do_sys_finit_module
  => do_syscall_64
  => entry_SYSCALL_64_after_hwframe

In this case, it is possible to see that the thread added the highest
contribution to the *timer latency* and the stack trace, saved during
the timerlat IRQ handler, points to a function named
dummy_load_1ms_pd_init, which had the following code (on purpose)::

        static int __init dummy_load_1ms_pd_init(void)
        {
                preempt_disable();
                mdelay(1);
                preempt_enable();
                return 0;
        }