qemu/docs/multiple-iothreads.txt
Copyright (c) 2014 Red Hat Inc.

This work is licensed under the terms of the GNU GPL, version 2 or later.  See
the COPYING file in the top-level directory.


This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop.  The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.
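
As a rough illustration of the pattern just described, the following standalone
sketch runs one event loop iteration with poll(2).  It is not QEMU code:
FdWatch, FdReadHandler, event_loop_iterate() and drain_handler() are
hypothetical names invented for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: invoked when the watched fd is readable. */
typedef void FdReadHandler(int fd, void *opaque);

/* One monitored file descriptor and the callback registered for it. */
typedef struct {
    int fd;
    FdReadHandler *on_readable;
    void *opaque;
} FdWatch;

enum { MAX_WATCHES = 16 };

/*
 * Run a single event loop iteration: wait up to timeout_ms for any watched
 * fd to become readable, then invoke its callback.  Returns the number of
 * callbacks invoked, or -1 if poll() failed.
 */
int event_loop_iterate(const FdWatch *watches, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_WATCHES];
    int dispatched = 0;
    size_t i;

    assert(n <= MAX_WATCHES);
    for (i = 0; i < n; i++) {
        pfds[i].fd = watches[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            watches[i].on_readable(watches[i].fd, watches[i].opaque);
            dispatched++;
        }
    }
    return dispatched;
}

/*
 * Example callback: read what is available and add the byte count to the
 * size_t counter passed as opaque.
 */
void drain_handler(int fd, void *opaque)
{
    char buf[64];
    ssize_t len = read(fd, buf, sizeof(buf));

    if (len > 0) {
        *(size_t *)opaque += (size_t)len;
    }
}
```

The sketch only covers file descriptor readability.  A caller would invoke
event_loop_iterate() repeatedly, passing one FdWatch per connection.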

The default event loop is called the main loop (see main-loop.c).  It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely.  Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work.  The main loop is a
scalability bottleneck on hosts with many CPUs.  Work can be spread across
several IOThreads instead of just one main loop.  When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code.  This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.

All I/O processing is done in a single main loop, and the QEMU global mutex is
contended by all vCPU threads and the main loop.  Both facts explain why it is
desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h).  Code that only works in the main loop
implicitly uses the main loop's AioContext.  Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error on POSIX hosts)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread.  They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
Code that takes an AioContext argument works both in IOThreads and in the main
loop, depending on which AioContext instance the caller passes in.
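
The difference between the implicit and the explicit style can be sketched
with a small standalone model.  This is not the QEMU API: ModelLoop,
model_get_main_loop(), model_queue_work() and model_legacy_queue_work() are
hypothetical names that only mirror the shape of the functions listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an event loop object such as AioContext. */
typedef struct {
    const char *name;
    int pending;        /* number of work items queued on this loop */
} ModelLoop;

static ModelLoop model_main_loop = { "main-loop", 0 };

/* Stand-in for an accessor that returns the main loop's context. */
ModelLoop *model_get_main_loop(void)
{
    return &model_main_loop;
}

/* New-style function: the caller decides which loop the work belongs to. */
void model_queue_work(ModelLoop *loop)
{
    assert(loop != NULL);
    loop->pending++;
}

/*
 * Legacy-style wrapper: always targets the main loop, so it cannot be used
 * for work that must run in another loop.
 */
void model_legacy_queue_work(void)
{
    model_queue_work(model_get_main_loop());
}
```

In this model, passing a different ModelLoop to model_queue_work() is all it
takes to place the work elsewhere, whereas model_legacy_queue_work() offers no
such choice.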

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can be called safely from file descriptor, event
notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
necessary.

2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.

aio_context_acquire()/aio_context_release() calls may be nested.  This
means you can call them if you're not sure whether #1 applies.
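
The nesting behaviour can be modelled outside QEMU with a POSIX recursive
mutex.  The sketch below is an analogy under that assumption, not QEMU's
implementation: ModelContext, model_context_init(), model_context_acquire(),
model_context_release() and model_touch_state() are hypothetical names.

```c
/* Needed for PTHREAD_MUTEX_RECURSIVE when compiling in strict ISO C mode. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a context that supports nested acquire. */
typedef struct {
    pthread_mutex_t lock;
    int depth;          /* nesting depth, only touched while lock is held */
} ModelContext;

/* Returns 0 on success or a pthread error number. */
int model_context_init(ModelContext *ctx)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret != 0) {
        return ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (ret == 0) {
        ret = pthread_mutex_init(&ctx->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    ctx->depth = 0;
    return ret;
}

void model_context_acquire(ModelContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->depth++;
}

void model_context_release(ModelContext *ctx)
{
    assert(ctx->depth > 0);
    ctx->depth--;
    pthread_mutex_unlock(&ctx->lock);
}

/*
 * Helper that does not know whether its caller already holds the context.
 * Because acquire/release nest, taking the context again is harmless.
 */
void model_touch_state(ModelContext *ctx, int *state)
{
    model_context_acquire(ctx);
    (*state)++;
    model_context_release(ctx);
}
```

model_touch_state() behaves the same whether or not the calling thread has
already acquired the context, which is the situation the paragraph above
describes.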

There is currently no lock ordering rule if a thread needs to acquire multiple
AioContexts simultaneously.  Therefore, it is only safe for code holding the
QEMU global mutex to acquire other AioContexts.

Side note: the best way to schedule a function call across threads is to create
a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
sure to acquire the AioContext for aio_bh_new() if necessary.
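
The create-beforehand, schedule-later pattern can be modelled with a
lock-free flag.  The following is a simplified standalone sketch, not QEMU's
BH implementation: ModelBH, model_bh_init(), model_bh_schedule(),
model_bh_poll() and model_increment() are hypothetical names, and the model
leaves out waking an event loop that is blocked waiting for events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void ModelBHFunc(void *opaque);

/* Hypothetical deferred callback owned by one event loop thread. */
typedef struct {
    ModelBHFunc *cb;
    void *opaque;
    atomic_bool scheduled;  /* the only field written by other threads */
} ModelBH;

/* Set up the BH beforehand, before other threads get a pointer to it. */
void model_bh_init(ModelBH *bh, ModelBHFunc *cb, void *opaque)
{
    assert(cb != NULL);
    bh->cb = cb;
    bh->opaque = opaque;
    atomic_init(&bh->scheduled, false);
}

/* May be called from any thread: a single atomic store, no lock taken. */
void model_bh_schedule(ModelBH *bh)
{
    atomic_store(&bh->scheduled, true);
}

/*
 * Called only by the owning event loop thread.  Runs every BH that was
 * scheduled since the last call and returns how many callbacks ran.
 */
int model_bh_poll(ModelBH *bhs, size_t n)
{
    int ran = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (atomic_exchange(&bhs[i].scheduled, false)) {
            bhs[i].cb(bhs[i].opaque);
            ran++;
        }
    }
    return ran;
}

/* Example callback: increment the int passed as opaque. */
void model_increment(void *opaque)
{
    (*(int *)opaque)++;
}
```

In this model, scheduling the same BH twice before the loop polls results in
a single callback invocation, and the callback always runs in the thread that
calls model_bh_poll().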

The relationship between AioContext and the block layer
-------------------------------------------------------
The AioContext originates from the QEMU block layer because it provides a
scoped way of running event loop iterations until all work is done.  This
feature is used to complete all in-flight block I/O requests (see
bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
used by any QEMU subsystem.

The block layer has support for AioContext integrated.  Each BlockDriverState
is associated with an AioContext using bdrv_set_aio_context() and
bdrv_get_aio_context().  This allows block layer code to process I/O inside the
right AioContext.  Other subsystems may wish to follow a similar approach.

Block layer code must therefore expect to run in an IOThread and avoid using
old APIs that implicitly use the main loop.  See the "How to program for
IOThreads" section above for information on how to do that.

If main loop code such as a QMP function wishes to access a BlockDriverState it
must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
IOThread does not run in parallel.

Long-running jobs (usually in the form of coroutines) are best scheduled in the
BlockDriverState's AioContext to avoid the need to acquire/release around each
bdrv_*() call.  Be aware that there is currently no mechanism to get notified
when bdrv_set_aio_context() moves this BlockDriverState to a different
AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
may need to add this if you want to support long-running jobs.