Security
========

Overview
--------

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

Security Requirements
---------------------

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

Virtualization Use Case
'''''''''''''''''''''''

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

- Guest
- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
- Network protocols (e.g. NBD, live migration)
- User-supplied files (e.g. disk images, kernels, device trees)
- Passthrough devices (e.g. PCI, USB)

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases and treated as security bugs if this is the case.

Non-virtualization Use Case
'''''''''''''''''''''''''''

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

Architecture
------------

This section describes the design principles that ensure the security
requirements are met.

Guest Isolation
'''''''''''''''

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host, this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At this point the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

Principle of Least Privilege
''''''''''''''''''''''''''''

The principle of least privilege states that each component only has access to
the privileges necessary for its function.  In the case of QEMU this means that
each process only has access to resources belonging to the guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way the guest does not gain anything by escaping into the
QEMU process since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
``a.img`` and not guest B's disk image file ``b.img``.
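
One way to realize this separation on a POSIX host is to run each guest under
its own user account and keep each disk image accessible only to that account.
The sketch below is illustrative and not part of QEMU: the helper name
``is_private_to_owner`` and the one-account-per-guest layout are assumptions
made for this example.  It checks that an image file is owned by the expected
user and grants no permissions to group or others.

```python
import os
import stat


def is_private_to_owner(path, owner_uid):
    """Return True if `path` belongs to `owner_uid` and nobody else can use it.

    Hypothetical helper: it only inspects classic UNIX ownership and mode
    bits, not ACLs or SELinux/AppArmor labels.
    """
    info = os.stat(path)
    # Any read, write, or execute bit for group or others fails the check.
    group_other_bits = stat.S_IRWXG | stat.S_IRWXO
    return info.st_uid == owner_uid and (info.st_mode & group_other_bits) == 0
```

A management tool could run such a check on ``a.img`` before launching guest A
and refuse to start the guest if it fails.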

In reality certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

Isolation mechanisms
''''''''''''''''''''

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
huge security risk.  File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
Some Linux distributions already ship with UNIX groups for these devices by
default.
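
File descriptor passing can be sketched in a few lines of Python.  The example
assumes a POSIX host; ``spawn_with_fd`` and its parameters are hypothetical
names, and an ordinary file can stand in for a host device.  A launcher with
the necessary access opens the resource, then starts the child process with
only that descriptor inherited.  QEMU accepts such descriptors through options
like ``-netdev tap,fd=N``; opening and configuring the TAP device itself, and
dropping privileges before starting QEMU, are omitted here.

```python
import os
import subprocess


def spawn_with_fd(path, flags, build_argv, **run_kwargs):
    """Open `path` in the launcher and run a child that inherits the descriptor.

    `build_argv` is a callable that receives the descriptor number and returns
    the child's argument list, so the number can be placed on the command line.
    Extra keyword arguments are forwarded to subprocess.run().
    """
    fd = os.open(path, flags)
    try:
        # pass_fds keeps this descriptor open in the child under the same
        # number; other descriptors above stderr are closed in the child.
        return subprocess.run(build_argv(fd), pass_fds=(fd,), check=True,
                              **run_kwargs)
    finally:
        # The launcher's own copy is not needed once the child has exited.
        os.close(fd)
```

For QEMU, ``build_argv`` might return a list containing ``-netdev`` and
``"tap,id=net0,fd=%d" % fd``, with the remaining arguments supplied by the
management tool.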

- SELinux and AppArmor make it possible to confine processes beyond the
  traditional UNIX process and file permissions model.  They restrict the QEMU
  process from accessing processes and files on the host system that are not
  needed by QEMU.

- Resource limits and cgroup controllers provide throughput and utilization
  limits on key resources such as CPU time, memory, and I/O bandwidth.

- Linux namespaces can be used to make process, file system, and other system
  resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
  those resources that were granted to it.

- Linux seccomp is available via the QEMU ``--sandbox`` option.  It disables
  system calls that are not needed by QEMU, thereby reducing the host kernel
  attack surface.
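
As an illustration of the seccomp item above, a launcher could assemble the
``-sandbox`` argument programmatically.  This is a sketch, not QEMU code: the
helper name ``sandbox_args`` is hypothetical, and the suboption names and
values are those listed in the QEMU manual at the time of writing.  Whether a
given QEMU binary accepts them depends on its version and on seccomp support
being compiled in, so check the ``qemu(1)`` man page for the build in use.

```python
# Suboptions of -sandbox and the values each one accepts (assumed from the
# QEMU manual; verify against the installed version).
_ALLOWED = {
    "obsolete": ("allow", "deny"),
    "elevateprivileges": ("allow", "deny", "children"),
    "spawn": ("allow", "deny"),
    "resourcecontrol": ("allow", "deny"),
}


def sandbox_args(**overrides):
    """Return a -sandbox argument pair, with every suboption set to "deny"
    unless overridden by keyword argument."""
    settings = {name: "deny" for name in _ALLOWED}
    for name, value in overrides.items():
        if name not in _ALLOWED:
            raise ValueError("unknown sandbox suboption: %s" % name)
        if value not in _ALLOWED[name]:
            raise ValueError("invalid value for %s: %s" % (name, value))
        settings[name] = value
    suboptions = ",".join("%s=%s" % item for item in settings.items())
    return ["-sandbox", "on," + suboptions]
```

Starting from "deny everything" and relaxing individual suboptions only when a
feature needs them follows the principle of least privilege described earlier.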

Sensitive configurations
------------------------

There are aspects of QEMU that can have security implications which users and
management applications must be aware of.

Monitor console (QMP and HMP)
'''''''''''''''''''''''''''''

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the ``migrate`` command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream. The
``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is also important to consider the security of the character device backend
over which the monitor console is exposed. It needs to have protection against
malicious third parties which might try to make unauthorized connections, or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.
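
A minimal sketch of a client for a monitor exposed on a UNIX domain socket
follows.  It assumes QEMU was started with something like
``-qmp unix:/path/to/qmp.sock,server=on,wait=off`` (the path is a placeholder)
and that QMP messages arrive as one JSON object per line.  The function name
``qmp_execute`` is made up for this example and is not a QEMU API.  The socket
should be placed in a directory that only the management user can access,
because anyone able to connect gains the privileges described above.

```python
import json
import socket


def qmp_execute(socket_path, command, arguments=None):
    """Connect to a QMP UNIX socket, negotiate capabilities, run one command,
    and return the decoded reply (a dict with "return" or "error")."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("r", encoding="utf-8") as reader:

            def send(message):
                sock.sendall((json.dumps(message) + "\n").encode("utf-8"))

            def read_reply():
                # Skip asynchronous events until a command reply arrives.
                while True:
                    line = reader.readline()
                    if not line:
                        raise ConnectionError("monitor closed the connection")
                    message = json.loads(line)
                    if "return" in message or "error" in message:
                        return message

            # The server speaks first with a greeting containing a "QMP" key.
            greeting = reader.readline()
            if not greeting or "QMP" not in json.loads(greeting):
                raise ConnectionError("no QMP greeting received")

            # Commands are only accepted after capabilities negotiation.
            send({"execute": "qmp_capabilities"})
            negotiation = read_reply()
            if "error" in negotiation:
                raise RuntimeError("capabilities negotiation failed")

            request = {"execute": command}
            if arguments:
                request["arguments"] = arguments
            send(request)
            return read_reply()
```

For example, ``qmp_execute("/path/to/qmp.sock", "query-status")`` would ask a
running QEMU instance for its run state.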

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.