=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function have other use cases as well, such
as live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care as to what can be done inside a callback. A callback
  can be called outside the protective scope of RCU.

There are helper functions to protect against recursion and to make sure
RCU is watching. These are explained below.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

 struct ftrace_ops ops = {
       .func                    = my_callback_func,
       .flags                   = MY_FTRACE_FLAGS,
       .private                 = any_private_data_structure,
 };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above are declared in the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is
no longer being called by functions after unregister_ftrace_function()
returns. Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
        This is the instruction pointer of the function that is being traced
        (where the fentry or mcount is within the function).

@parent_ip
        This is the instruction pointer of the function that called
        the function being traced (where the call of the function occurred).

@op
        This is a pointer to the ftrace_ops that was used to register the
        callback. This can be used to pass data to the callback via the
        private pointer.

@regs
        If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        flags are set in the ftrace_ops structure, then this will be pointing
        to the pt_regs structure like it would be if a breakpoint was placed
        at the start of the function where ftrace was tracing. Otherwise it
        either contains garbage, or NULL.
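
As an illustration of passing data through the private pointer, the following
is a minimal userspace sketch, not kernel code. The struct ftrace_ops and
ftrace_func_t definitions here are simplified stand-ins for the ones in
include/linux/ftrace.h, and struct hit_stats and fake_function_entry() are
hypothetical names introduced only for this example.

```c
#include <stddef.h>

/* Userspace stand-ins, NOT the kernel definitions from <linux/ftrace.h>. */
struct pt_regs;
struct ftrace_ops;

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
                              struct ftrace_ops *op, struct pt_regs *regs);

struct ftrace_ops {
        ftrace_func_t   func;
        unsigned long   flags;
        void            *private;
};

/* Hypothetical per-ops state that the callback reaches through op->private. */
struct hit_stats {
        unsigned long   hits;
        unsigned long   last_ip;
};

/* Same shape as the callback prototype documented above. */
static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                             struct ftrace_ops *op, struct pt_regs *regs)
{
        struct hit_stats *stats = op->private;

        (void)parent_ip;
        (void)regs;     /* NULL or garbage unless a SAVE_REGS flag is set */

        /* A real callback would need atomic or per-CPU counters here. */
        stats->hits++;
        stats->last_ip = ip;
}

/* Stand-in for ftrace invoking the callback on entry to a traced function. */
static void fake_function_entry(struct ftrace_ops *op, unsigned long ip,
                                unsigned long parent_ip)
{
        op->func(ip, parent_ip, op, NULL);
}
```

In this model, setting .private to the address of a struct hit_stats before
the ops is used lets every invocation of the callback find its own state
without relying on a global variable.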

Protect your callback
=====================

As functions can be called from anywhere, and it is possible that a function
called by a callback may also be traced, and call that same callback,
recursion protection must be used. There are two helper functions that
can help in this regard. If you start your code with:

.. code-block:: c

        int bit;

        bit = ftrace_test_recursion_trylock(ip, parent_ip);
        if (bit < 0)
                return;

and end it with:

.. code-block:: c

        ftrace_test_recursion_unlock(bit);

The code in between will be safe to use, even if it ends up calling a
function that the callback is tracing. Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled). The instruction pointer (ip) and its parent (parent_ip) are passed to
ftrace_test_recursion_trylock() to record where the recursion happened
(if CONFIG_FTRACE_RECORD_RECURSION is set).
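
The effect of this bracketing can be sketched with a small userspace model.
This is not kernel code: model_trylock() and model_unlock() are hypothetical
stand-ins for the two helpers, and the single flag used here ignores the
preemption handling and the separate execution contexts that the real helpers
deal with.

```c
#include <stdbool.h>

/*
 * Userspace model of the guard pattern. model_trylock()/model_unlock() are
 * hypothetical stand-ins for ftrace_test_recursion_trylock() and
 * ftrace_test_recursion_unlock(); preemption and per-context state are
 * deliberately left out.
 */
static bool in_callback;
static int bodies_run;
static int recursions_blocked;

static int model_trylock(void)
{
        if (in_callback)
                return -1;      /* already inside the callback: recursion */
        in_callback = true;
        return 0;
}

static void model_unlock(int bit)
{
        (void)bit;
        in_callback = false;
}

static void traced_helper(void);

/* The callback body, bracketed the way the text above describes. */
static void my_callback(void)
{
        int bit;

        bit = model_trylock();
        if (bit < 0) {
                recursions_blocked++;
                return;
        }

        bodies_run++;
        traced_helper();        /* this helper is itself "traced" */

        model_unlock(bit);
}

/* A function whose entry is traced by the same callback. */
static void traced_helper(void)
{
        my_callback();
}
```

Each outer call to my_callback() runs the body once, while the nested call
made through traced_helper() returns early instead of recursing without bound.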

Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for recursion for the callback and no recursion test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.

If your callback accesses any data or critical section that requires RCU
protection, it is best to make sure that RCU is "watching", otherwise
that data or critical section will not be protected as expected. In this
case add:

.. code-block:: c

        if (!rcu_is_watching())
                return;

Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for rcu_is_watching for the callback and no other test needs to be done.
But this is at the expense of slightly more overhead from an extra
function call.


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
        If the callback requires reading or modifying the pt_regs
        passed to the callback, then it must set this flag. Registering
        an ftrace_ops with this flag set on an architecture that does not
        support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        Similar to SAVE_REGS, but registering an
        ftrace_ops on an architecture that does not support passing of regs
        will not fail with this flag set. But the callback must check if
        regs is NULL or not to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION
        By default, it is expected that the callback can handle recursion.
        But if the callback is not that worried about overhead, then
        setting this bit will add the recursion protection around the
        callback by calling a helper function that will do the recursion
        protection and only call the callback if it did not recurse.

        Note, if this flag is not set, and recursion does occur, it could
        cause the system to crash, and possibly reboot via a triple fault.

        Note, if this flag is set, then the callback will always be called
        with preemption disabled. If it is not set, then it is possible
        (but not guaranteed) that the callback will be called in
        preemptible context.

FTRACE_OPS_FL_IPMODIFY
        Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
        the traced function (have another function called instead of the
        traced function), it requires setting this flag. This is what live
        kernel patching uses. Without this flag the pt_regs->ip cannot be
        modified.

        Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
        registered to any given function at a time.

FTRACE_OPS_FL_RCU
        If this is set, then the callback will only be called by functions
        where RCU is "watching". This is required if the callback function
        performs any rcu_read_lock() operation.

        RCU stops watching when the system goes idle, the time when a CPU
        is taken down and comes back online, and when entering from kernel
        to user space and back to kernel space. During these transitions,
        a callback may be executed and RCU synchronization will not protect
        it.

FTRACE_OPS_FL_PERMANENT
        If this is set on any ftrace ops, then the tracing cannot be disabled
        by writing 0 to the proc sysctl ftrace_enabled. Equally, a callback
        with the flag set cannot be registered if ftrace_enabled is 0.

        Livepatch uses it so that the function redirection is not lost and
        the system stays protected.


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
        The ops to set the filter with.

@buf
        The string that holds the function filter text.

@len
        The length of the string.

@reset
        Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.
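
The precedence rule can be written down as a small decision function. The
sketch below is a userspace model only: struct model_lists and
callback_would_fire() are hypothetical names, and POSIX fnmatch() merely
approximates the glob matching that ftrace itself performs.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ops' two lists; not a kernel data structure. */
struct model_lists {
        const char *const *filter;      /* glob patterns, NULL-terminated */
        const char *const *notrace;     /* glob patterns, NULL-terminated */
};

static bool list_is_empty(const char *const *list)
{
        return list == NULL || list[0] == NULL;
}

static bool list_matches(const char *const *list, const char *func)
{
        size_t i;

        if (list == NULL)
                return false;
        for (i = 0; list[i] != NULL; i++) {
                /* fnmatch() only approximates ftrace's own glob matching. */
                if (fnmatch(list[i], func, 0) == 0)
                        return true;
        }
        return false;
}

/* Decide whether entering func would invoke the callback. */
static bool callback_would_fire(const struct model_lists *l, const char *func)
{
        /* "notrace" wins over "filter". */
        if (list_matches(l->notrace, func))
                return false;
        /* An empty filter list means every function is enabled. */
        if (list_is_empty(l->filter))
                return true;
        return list_matches(l->filter, func);
}
```

With a filter of "sched*" and a notrace entry of "schedule_timeout", the model
lets schedule through and rejects schedule_timeout. Giving both lists the same
patterns rejects every function, matching the rule stated above.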

.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is done the same way as clearing the filter list:

.. code-block:: c

  ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch to the new filter happens within the
ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, NULL, 0, 1);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

As the latter will have a short time where all functions will call
the callback, between the time of the reset, and the time of the
new setting of the filter.

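
The difference between the two sequences can be made concrete with a userspace
model of the filter list. This is a sketch under simplifying assumptions:
model_set_filter() and model_traces() are hypothetical names, the model matches
exact names only, and it drops the @len parameter of the real
ftrace_set_filter().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 8

/* Hypothetical stand-in for one ops' filter list (exact names only). */
struct model_filter {
        const char *names[MODEL_MAX];
        size_t count;           /* 0 means "all functions are traced" */
};

/* Mirrors the documented @buf/@reset semantics of ftrace_set_filter(). */
static int model_set_filter(struct model_filter *f, const char *buf, int reset)
{
        if (reset)
                f->count = 0;
        if (buf == NULL)
                return 0;
        if (f->count == MODEL_MAX)
                return -1;
        f->names[f->count++] = buf;
        return 0;
}

/* Would a function with this name invoke the callback right now? */
static bool model_traces(const struct model_filter *f, const char *func)
{
        size_t i;

        if (f->count == 0)
                return true;
        for (i = 0; i < f->count; i++) {
                if (strcmp(f->names[i], func) == 0)
                        return true;
        }
        return false;
}
```

In the model, replacing "schedule" with "try_to_wake_up" in a single call with
a non-zero reset never leaves the list empty. Resetting with a NULL buffer
first does leave it empty, and until the second call lands, model_traces()
reports true for every function name.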