                     ====================================
                     SLOW WORK ITEM EXECUTION THREAD POOL
                     ====================================

By: David Howells <dhowells@redhat.com>

The slow work item execution thread pool is a pool of threads for performing
things that take a relatively long time, such as making mkdir calls.
Typically, when processing something, these items will spend a lot of time
blocking a thread on I/O, thus making that thread unavailable for doing other
work.

The standard workqueue model is unsuitable for this class of work item as it
limits the owner to a single thread or a single thread per CPU.  For some
tasks, however, more threads - or fewer - are required.

There is just one pool per system.  It contains no threads unless something
wants to use it - and that something must register its interest first.  When
the pool is active, the number of threads it contains is dynamic, varying
between a maximum and minimum setting, depending on the load.


====================
CLASSES OF WORK ITEM
====================

This pool supports two classes of work items:

 (*) Slow work items.

 (*) Very slow work items.

The former are expected to finish much quicker than the latter.

An operation of the very slow class may, for instance, do a batch combination
of several lookups, mkdirs and a create.

An operation of the ordinarily slow class may, for example, write data or
expand files, provided the time taken to do so isn't too long.

Operations of both types may sleep during execution, thus tying up the thread
loaned to them.

A further class of work item is available, based on the slow work item class:

 (*) Delayed slow work items.

These are slow work items that have a timer to defer queueing of the item for
a while.


THREAD-TO-CLASS ALLOCATION
--------------------------

Not all the threads in the pool are available to work on very slow work items.
The number will be between one and one fewer than the number of active threads.
This is configurable (see the "Pool Configuration" section).

All the threads are available to work on ordinarily slow work items, but a
percentage of the threads will prefer to work on very slow work items.

The configuration ensures that at least one thread will be available to work on
very slow work items, and at least one thread will be available that won't work
on very slow work items at all.
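
The allocation rule above can be pictured with a small userspace model.  This
C sketch is hypothetical and is not the kernel's scheduling code: it assumes
that threads 0 to nr_vslow-1 form the share allowed to run very slow items,
and that such a thread takes a very slow item whenever one is queued.  The
names pick_queue() and is_vslow_thread() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of thread-to-class allocation; not kernel code. */
enum pick { PICK_NONE, PICK_SLOW, PICK_VSLOW };

/* Assumption: threads 0 .. nr_vslow-1 form the very slow share. */
bool is_vslow_thread(int thread_id, int nr_vslow)
{
	return thread_id < nr_vslow;
}

/* Decide which queue an idle thread should take its next item from. */
enum pick pick_queue(int thread_id, int nr_vslow,
		     int slow_queued, int vslow_queued)
{
	/* Threads in the very slow share prefer very slow items... */
	if (is_vslow_thread(thread_id, nr_vslow) && vslow_queued > 0)
		return PICK_VSLOW;
	/* ...but every thread may run ordinarily slow items. */
	if (slow_queued > 0)
		return PICK_SLOW;
	return PICK_NONE;
}
```

With nr_vslow bounded to between 1 and one fewer than the number of active
threads, thread 0 always serves very slow items in this model and the
highest-numbered thread never does.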


=====================
USING SLOW WORK ITEMS
=====================

Firstly, a module or subsystem wanting to make use of slow work items must
register its interest:

        int ret = slow_work_register_user(struct module *module);

This will return 0 if successful, or a -ve error upon failure.  The module
pointer should be the module interested in using this facility (almost
certainly THIS_MODULE).
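
The rule that the pool only holds threads while at least one user is
registered can be modelled with a simple user count.  This userspace C sketch
is illustrative only: struct pool_model, model_register_user() and
model_unregister_user() are invented, and starting exactly min_threads on the
first registration is an assumption, not the kernel's documented behaviour.

```c
#include <assert.h>

/* Hypothetical userspace model of slow-work user registration. */
struct pool_model {
	int users;        /* number of registered users */
	int nr_threads;   /* threads currently in the pool */
	int min_threads;  /* threads started when the pool becomes active */
};

/* Returns 0 on success, or -1 standing in for a -ve error. */
int model_register_user(struct pool_model *p)
{
	if (p->min_threads < 1)
		return -1;                     /* model-only failure case */
	/* The first user activates the pool. */
	if (p->users++ == 0)
		p->nr_threads = p->min_threads;
	return 0;
}

void model_unregister_user(struct pool_model *p)
{
	assert(p->users > 0);
	/* When the last user goes away, the pool is emptied again. */
	if (--p->users == 0)
		p->nr_threads = 0;
}
```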


Slow work items may then be set up by:

 (1) Declaring a variable of type struct slow_work:

        #include <linux/slow-work.h>

        struct slow_work myitem;

 (2) Declaring the operations to be used for this item:

        struct slow_work_ops myitem_ops = {
                .get_ref = myitem_get_ref,
                .put_ref = myitem_put_ref,
                .execute = myitem_execute,
        };

     [*] For a description of the ops, see section "Item Operations".

 (3) Initialising the item:

        slow_work_init(&myitem, &myitem_ops);

     or:

        delayed_slow_work_init(&myitem, &myitem_ops);

     or:

        vslow_work_init(&myitem, &myitem_ops);

     depending on its class.
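
The three set-up steps can be mirrored in userspace to show how an operations
table and an initialiser fit together.  This is a mock under stated
assumptions: struct mock_work, struct mock_work_ops, the very_slow field and
the mock_*_init() helpers are invented stand-ins for the real definitions in
<linux/slow-work.h>, and delayed items are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct slow_work and struct slow_work_ops. */
struct mock_work;

struct mock_work_ops {
	int  (*get_ref)(struct mock_work *work);   /* optional */
	void (*put_ref)(struct mock_work *work);   /* optional */
	void (*execute)(struct mock_work *work);   /* required */
};

struct mock_work {
	const struct mock_work_ops *ops;
	bool very_slow;                           /* assumed class marker */
};

/* Counterpart of step (3) for an ordinarily slow item. */
void mock_work_init(struct mock_work *work, const struct mock_work_ops *ops)
{
	work->ops = ops;
	work->very_slow = false;
}

/* Counterpart of step (3) for a very slow item. */
void mock_vwork_init(struct mock_work *work, const struct mock_work_ops *ops)
{
	mock_work_init(work, ops);
	work->very_slow = true;
}

static int myitem_runs;

static void myitem_execute(struct mock_work *work)
{
	(void)work;
	myitem_runs++;   /* the item's real work would go here */
}

/* Step (2): only ->execute() is filled in; the others stay NULL. */
static const struct mock_work_ops myitem_ops = {
	.execute = myitem_execute,
};
```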

A suitably set up work item can then be enqueued for processing:

        int ret = slow_work_enqueue(&myitem);

This will return a -ve error if the thread pool is unable to gain a reference
on the item, or 0 otherwise.  A delayed work item is enqueued with a delay
given in jiffies:

        int ret = delayed_slow_work_enqueue(&myitem, my_jiffy_delay);
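
The enqueue contract can be sketched as follows, assuming a simple
linked-list queue in userspace: ->get_ref() is called (when supplied) before
the item is queued, and a refusal is passed straight back to the caller with
nothing queued.  struct demo_item, demo_enqueue() and the 'dying' flag are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of enqueueing; not kernel code. */
struct demo_item;

struct demo_ops {
	int (*get_ref)(struct demo_item *item);   /* optional */
};

struct demo_item {
	const struct demo_ops *ops;
	struct demo_item *next;    /* link in the queue */
	int refs;                  /* references held on the item */
	int dying;                 /* set once the owner starts tearing it down */
};

static struct demo_item *queue_head;
static int nr_queued;

/* Returns a -ve error if no reference could be obtained, 0 otherwise. */
int demo_enqueue(struct demo_item *item)
{
	if (item->ops->get_ref) {
		int ret = item->ops->get_ref(item);
		if (ret < 0)
			return ret;       /* nothing has been queued */
	}
	/* LIFO push keeps the sketch short; a real queue appends at the tail. */
	item->next = queue_head;
	queue_head = item;
	nr_queued++;              /* the reference is now held by the queue */
	return 0;
}

static int demo_get_ref(struct demo_item *item)
{
	if (item->dying)
		return -EAGAIN;       /* refuse to pin an item being destroyed */
	item->refs++;
	return 0;
}

static const struct demo_ops my_demo_ops = { .get_ref = demo_get_ref };
```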


The items are reference counted, so there ought to be no need for a flush
operation.  But as the reference counting is optional, means to cancel
existing work items are also provided:

        cancel_slow_work(&myitem);
        cancel_delayed_slow_work(&myitem);

These can be used to cancel pending work.  They wait for existing work to
finish executing (or prevent it from executing, depending on timing).
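
The two possible outcomes of cancellation (pending work is prevented from
running, running work is waited for) can be modelled with a mutex and a
condition variable.  This pthreads sketch is an illustration under
assumptions, not the kernel's implementation; struct citem, citem_run() and
citem_cancel() are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of cancel semantics; not the kernel's implementation. */
enum citem_state { CI_IDLE, CI_QUEUED, CI_EXECUTING };

struct citem {
	pthread_mutex_t  lock;
	pthread_cond_t   done;    /* signalled when execution finishes */
	enum citem_state state;
	int              refs;    /* references held on the item */
};

/* Pool side: run a queued item; returns false if it was cancelled first. */
bool citem_run(struct citem *it, void (*fn)(struct citem *))
{
	pthread_mutex_lock(&it->lock);
	if (it->state != CI_QUEUED) {
		pthread_mutex_unlock(&it->lock);
		return false;
	}
	it->state = CI_EXECUTING;
	pthread_mutex_unlock(&it->lock);

	fn(it);                          /* may sleep; lock not held */

	pthread_mutex_lock(&it->lock);
	it->state = CI_IDLE;
	it->refs--;                      /* release the queue's reference */
	pthread_cond_broadcast(&it->done);
	pthread_mutex_unlock(&it->lock);
	return true;
}

/* Cancel: stop pending work from running, or wait for running work. */
void citem_cancel(struct citem *it)
{
	pthread_mutex_lock(&it->lock);
	if (it->state == CI_QUEUED) {
		it->state = CI_IDLE;         /* it will now never execute */
		it->refs--;                  /* drop the queue's reference */
	}
	while (it->state == CI_EXECUTING)
		pthread_cond_wait(&it->done, &it->lock);
	pthread_mutex_unlock(&it->lock);
}
```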


When all of a module's slow work items have been processed, and the module has
no further interest in the facility, it should unregister its interest:

        slow_work_unregister_user(struct module *module);

The module pointer is used to wait for all outstanding work items for that
module before completing the unregistration.  This prevents the put_ref() code
from being taken away before it completes.  The module argument should almost
certainly be THIS_MODULE.
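
The wait performed at unregistration can be modelled as a per-module count of
outstanding items that unregistration waits to drain.  This pthreads sketch
uses invented names (struct owner_model, owner_item_start(),
owner_item_done(), owner_unregister()) and stands in for whatever mechanism
the kernel actually uses, which this document does not describe.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-module bookkeeping; not kernel code. */
struct owner_model {
	pthread_mutex_t lock;
	pthread_cond_t  idle;          /* signalled when outstanding hits 0 */
	int             outstanding;   /* items queued or executing */
};

/* Called when an item belonging to the module is queued. */
void owner_item_start(struct owner_model *o)
{
	pthread_mutex_lock(&o->lock);
	o->outstanding++;
	pthread_mutex_unlock(&o->lock);
}

/* Called only after the item's put_ref() has returned. */
void owner_item_done(struct owner_model *o)
{
	pthread_mutex_lock(&o->lock);
	if (--o->outstanding == 0)
		pthread_cond_broadcast(&o->idle);
	pthread_mutex_unlock(&o->lock);
}

/* Unregistration: block until nothing belonging to the module is in flight. */
void owner_unregister(struct owner_model *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->outstanding > 0)
		pthread_cond_wait(&o->idle, &o->lock);
	pthread_mutex_unlock(&o->lock);
}
```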


================
HELPER FUNCTIONS
================

The slow-work facility provides a function by which it can be determined
whether or not an item is queued for later execution:

        bool queued = slow_work_is_queued(struct slow_work *work);

If it returns false, then the item is not on the queue (it may be executing
with a requeue pending).  This can be used to work out whether an item on which
another depends is on the queue, thus allowing a dependent item to be queued
after it.
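
The dependency test just described might look like this in a userspace
model.  It assumes a strict first-in, first-out queue of item ids, so a
dependent item queued behind its dependency is guaranteed to run after it;
struct fifo and queue_after_dependency() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QLEN 16

/* Hypothetical FIFO of item ids, standing in for the slow-work queue. */
struct fifo {
	int ids[QLEN];
	int head, tail;
};

/* Model of slow_work_is_queued(): is this id still waiting? */
bool fifo_is_queued(const struct fifo *q, int id)
{
	for (int i = q->head; i < q->tail; i++)
		if (q->ids[i] == id)
			return true;
	return false;
}

bool fifo_push(struct fifo *q, int id)
{
	if (q->tail == QLEN)
		return false;
	q->ids[q->tail++] = id;
	return true;
}

/*
 * Queue 'dependent' behind 'dep' if 'dep' is still queued.  A false return
 * means 'dep' is not on the queue: the caller must then find out whether it
 * is still executing (and wait) or has already finished.
 */
bool queue_after_dependency(struct fifo *q, int dep, int dependent)
{
	if (!fifo_is_queued(q, dep))
		return false;
	return fifo_push(q, dependent);
}
```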

If the above shows that an item on which another depends is not queued, then
the owner of the dependent item might need to wait.  However, to avoid locking
up the threads unnecessarily by sleeping in them, it can make sense under some
circumstances to return the work item to the queue, thus deferring it until
some other items have had a chance to make use of the yielded thread.

To yield a thread and defer an item, the work function should simply enqueue
the work item again and return.  However, this doesn't work if there's nothing
actually on the queue, as the thread just vacated will jump straight back into
the item's work function, thus busy waiting on a CPU.

Instead, the item should use the thread to wait for the dependency to go away,
but rather than using schedule() or schedule_timeout() to sleep, it should use
the following function:

        bool requeue = slow_work_sleep_till_thread_needed(
                        struct slow_work *work,
                        signed long *_timeout);

This will add a second wait and then sleep, such that the thread will be woken
up either if something appears on the queue that could usefully make use of
the thread (and behind which this item can be queued), or if the event that
the caller set up to wait for happens.  True will be returned if something
else appeared on the queue and this work function should perhaps return, or
false if something else woke it up.  The timeout is as for schedule_timeout().

For example:

        wait_queue_head_t *wq;
        signed long timeout = 30 * HZ;  /* example: wait up to 30s */
        bool requeue = false;
        DEFINE_WAIT(wait);

        wq = bit_waitqueue(&my_flags, MY_BIT);
        do {
                prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
                if (!test_bit(MY_BIT, &my_flags))
                        break;  /* dependency has gone away */
                requeue = slow_work_sleep_till_thread_needed(&my_work,
                                                             &timeout);
        } while (timeout > 0 && !requeue);
        finish_wait(wq, &wait);
        if (!test_bit(MY_BIT, &my_flags))
                goto do_my_thing;
        if (requeue)
                return; /* yield the thread back to slow-work */


===============
ITEM OPERATIONS
===============

Each work item requires a table of operations of type struct slow_work_ops.
Only ->execute() is required; the getting and putting of a reference and the
describing of an item are all optional.

 (*) Get a reference on an item:

        int (*get_ref)(struct slow_work *work);

     This allows the thread pool to attempt to pin an item by getting a
     reference on it.  This function should return 0 if the reference was
     granted, or a -ve error otherwise.  If an error is returned,
     slow_work_enqueue() will fail.

     The reference is held whilst the item is queued and whilst it is being
     executed.  The item may then be requeued with the same reference held, or
     the reference will be released.

 (*) Release a reference on an item:

        void (*put_ref)(struct slow_work *work);

     This allows the thread pool to unpin an item by releasing the reference on
     it.  The thread pool will not touch the item again once this has been
     called.

 (*) Execute an item:

        void (*execute)(struct slow_work *work);

     This should perform the work required of the item.  It may sleep, it may
     perform disk I/O and it may wait for locks.

 (*) View an item through debugfs:

        void (*desc)(struct slow_work *work, struct seq_file *m);

     If supplied, this should print to 'm' a small string describing the work
     the item is to do.  This should be no more than about 40 characters, and
     shouldn't include a newline character.

     See the 'Viewing executing and queued items' section below.
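
A common pattern is to embed the work item in a larger, reference-counted
object and recover that object in each callback.  The sketch below shows the
pattern in userspace C with a locally defined container_of() and C11 atomics;
struct my_object, its fields, obj_new() and the mock work/ops types are
hypothetical, and desc() writes into a plain buffer rather than a struct
seq_file.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work;                                  /* mock of struct slow_work */

struct work_ops {
	int  (*get_ref)(struct work *w);
	void (*put_ref)(struct work *w);
	void (*execute)(struct work *w);
	void (*desc)(struct work *w, char *buf, size_t len);
};

struct work { const struct work_ops *ops; };

struct my_object {
	atomic_int usage;                     /* reference count */
	int id;
	int done;
	struct work work;                     /* embedded work item */
};

static int obj_get_ref(struct work *w)
{
	struct my_object *obj = container_of(w, struct my_object, work);
	atomic_fetch_add(&obj->usage, 1);
	return 0;                             /* reference granted */
}

static void obj_put_ref(struct work *w)
{
	struct my_object *obj = container_of(w, struct my_object, work);
	if (atomic_fetch_sub(&obj->usage, 1) == 1)
		free(obj);                    /* last reference gone */
}

static void obj_execute(struct work *w)
{
	struct my_object *obj = container_of(w, struct my_object, work);
	obj->done = 1;                        /* the real work would go here */
}

/* Short description with no newline, as ->desc() should produce. */
static void obj_desc(struct work *w, char *buf, size_t len)
{
	struct my_object *obj = container_of(w, struct my_object, work);
	snprintf(buf, len, "OBJ%x: DEMO", (unsigned)obj->id);
}

static const struct work_ops obj_ops = {
	.get_ref = obj_get_ref,
	.put_ref = obj_put_ref,
	.execute = obj_execute,
	.desc    = obj_desc,
};

struct my_object *obj_new(int id)
{
	struct my_object *obj = calloc(1, sizeof(*obj));
	if (!obj)
		return NULL;
	atomic_init(&obj->usage, 1);          /* the creator's reference */
	obj->id = id;
	obj->work.ops = &obj_ops;
	return obj;
}
```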


==================
POOL CONFIGURATION
==================

The slow-work thread pool has a number of configurables:

 (*) /proc/sys/kernel/slow-work/min-threads

     The minimum number of threads that should be in the pool whilst it is in
     use.  This may be anywhere between 2 and max-threads.

 (*) /proc/sys/kernel/slow-work/max-threads

     The maximum number of threads that should be in the pool.  This may be
     anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.

 (*) /proc/sys/kernel/slow-work/vslow-percentage

     The percentage of active threads in the pool that may be used to execute
     very slow work items.  This may be between 1 and 99.  The resultant number
     is bounded to between 1 and one fewer than the number of active threads.
     This ensures there is always at least one thread that can process very
     slow work items, and always at least one thread that won't.
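
The bounds listed above can be expressed as a few clamping helpers.  This is
a sketch, assuming the percentage is applied to the number of active threads
with integer division; the rounding the kernel actually uses is not stated in
this document, and the nr_cpus parameter stands in for NR_CPUS.

```c
#include <assert.h>

static int clamp_int(int v, int lo, int hi)
{
	return v < lo ? lo : v > hi ? hi : v;
}

/* Upper limit for max-threads: 255 or NR_CPUS * 2, whichever is greater. */
int max_threads_limit(int nr_cpus)
{
	return nr_cpus * 2 > 255 ? nr_cpus * 2 : 255;
}

/* min-threads may be anywhere between 2 and max-threads. */
int clamp_min_threads(int min_threads, int max_threads)
{
	return clamp_int(min_threads, 2, max_threads);
}

/* max-threads may be anywhere between min-threads and the limit above. */
int clamp_max_threads(int max_threads, int min_threads, int nr_cpus)
{
	return clamp_int(max_threads, min_threads, max_threads_limit(nr_cpus));
}

/*
 * Threads allowed to run very slow items: vslow-percentage (1..99) of the
 * active threads, bounded to 1 .. nr_active - 1.  Assumes nr_active >= 2,
 * which the minimum of 2 for min-threads provides.
 */
int vslow_thread_count(int nr_active, int vslow_percentage)
{
	int pct = clamp_int(vslow_percentage, 1, 99);
	int n = nr_active * pct / 100;

	assert(nr_active >= 2);
	return clamp_int(n, 1, nr_active - 1);
}
```

For example, with 4 active threads and a setting of 99, the raw figure of 3
already leaves one thread that never runs very slow items; with 10 threads
and a setting of 1, the raw figure of 0 is raised to 1.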


==================================
VIEWING EXECUTING AND QUEUED ITEMS
==================================

If CONFIG_SLOW_WORK_DEBUG is enabled, a debugfs file is made available:

        /sys/kernel/debug/slow_work/runqueue

through which the list of work items being executed and the queues of items to
be executed may be viewed.  The owner of a work item is given the chance to
add some information of its own.

The contents look something like the following:

    THR PID   ITEM ADDR        FL MARK  DESC
    === ===== ================ == ===== ==========
      0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
      1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
      2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
      3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
      4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
      5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
      6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
      7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
    vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
    vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
    vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
    vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
    vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
    vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
    vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
    vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
    vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
    vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
    vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
    vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
    vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
    vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued items indicate which queue they're on.  'PID' shows the process ID of
a slow-work thread that's executing something.  'FL' shows the work item flags.
'MARK' indicates how long since an item was queued or began executing.  Lastly,
the 'DESC' column permits the owner of an item to give some information.
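
For scripted inspection, each line of the runqueue file can be classified as
an executing or a queued entry.  This C sketch assumes the layout shown above:
a numeric 'THR' value means the item is executing on that thread, while a
non-numeric value such as vsq names the queue it is waiting on.  The function
classify_runqueue_line() and its return convention are invented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum rq_kind { RQ_OTHER, RQ_EXECUTING, RQ_QUEUED };

/*
 * Classify one line of the runqueue file.  Blank, header ("THR ...") and
 * separator ("=== ...") lines are reported as RQ_OTHER.  For queued items
 * the queue name is copied into 'queue', which must hold at least 8 bytes.
 */
enum rq_kind classify_runqueue_line(const char *line, char *queue)
{
	char thr[8];

	if (sscanf(line, " %7s", thr) != 1)
		return RQ_OTHER;
	if (strcmp(thr, "THR") == 0 || thr[0] == '=')
		return RQ_OTHER;
	if (isdigit((unsigned char)thr[0]))
		return RQ_EXECUTING;
	strcpy(queue, thr);
	return RQ_QUEUED;
}
```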