qemu/docs/security.texi
@node Security
@chapter Security

@section Overview

This chapter explains the security requirements that QEMU is designed to meet
and the principles for securely deploying QEMU.

@section Security Requirements

QEMU supports many different use cases, some of which have stricter security
requirements than others.  The community has agreed on the overall security
requirements that users may depend on.  These requirements define what is
considered supported from a security perspective.

@subsection Virtualization Use Case

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization.  These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

@itemize
@item Guest
@item User-facing interfaces (e.g. VNC, SPICE, WebSocket)
@item Network protocols (e.g. NBD, live migration)
@item User-supplied files (e.g. disk images, kernels, device trees)
@item Passthrough devices (e.g. PCI, USB)
@end itemize

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases, and are treated as security bugs if they can.

@subsection Non-virtualization Use Case

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG).  In principle, the TCG and device emulation code used in the
non-virtualization use case should meet the same security requirements as
the virtualization use case.  However, for historical reasons, much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time.  Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

@section Architecture

This section describes the design principles that ensure the security
requirements are met.

@subsection Guest Isolation

Guest isolation is the confinement of guest code to the virtual machine.  When
guest code gains control of execution on the host, this is called escaping the
virtual machine.  Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network.  Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU.  Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU.  At that point, the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them.  A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

@subsection Principle of Least Privilege

The principle of least privilege states that each component only has access to
the privileges necessary for its function.  In the case of QEMU, this means that
each QEMU process only has access to resources belonging to its guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest.  This way, the guest does not gain anything by escaping into the
QEMU process, since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements.  For example, guest A only has access to its own disk image file
@code{a.img} and not guest B's disk image file @code{b.img}.

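One way a management tool might approximate the @code{a.img} and @code{b.img}
separation on a POSIX host is to run each guest under its own user account and
keep each image accessible to that account only.  The following is a minimal
sketch of such a permission check, not part of QEMU: the helper name
@code{accessible_beyond_owner} is hypothetical, and the check looks only at the
traditional UNIX mode bits, ignoring ACLs and mandatory access control policy.

```python
import os
import stat


def accessible_beyond_owner(path):
    """Return True if group or other users hold any permission bit on path.

    Hypothetical helper for illustration: a launcher could refuse to start a
    guest whose disk image is readable or writable by anyone but its owner.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return bool(mode & 0o077)
```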
In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function.  For example, host system calls are
necessary for QEMU but are not exposed to guests.  A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

@subsection Isolation mechanisms

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege.  With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt.  They are also platform-specific, so only the
Linux mechanisms are described here, and only briefly.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users.  It can seem more convenient to launch QEMU as root to
give it access to host devices (e.g. @code{/dev/net/tun}), but this poses a
serious security risk: a guest that escapes then acts as root on the host.
File descriptor passing can be used to give an otherwise
unprivileged QEMU process access to host devices without running QEMU as root.
It is also possible to launch QEMU as a non-root user and configure UNIX groups
for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other device nodes.
Some Linux distributions already ship with UNIX groups for these devices by default.

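The idea behind file descriptor passing can be sketched as follows.  This is an
illustration under stated assumptions, not QEMU or libvirt code: a temporary
file stands in for a device node, a small Python child stands in for QEMU, and
the helper names are hypothetical.  The launcher opens the resource and the
child receives only the already-open descriptor, never the path.  In a real
deployment the launcher would hold the privilege to open the device and start
the child as an unprivileged user, which this sketch does not do.  QEMU's tap
network backend accepts such a descriptor through its @code{fd=} parameter.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for QEMU: reads from the descriptor number given as its argument.
CHILD = "import os, sys; sys.stdout.write(os.read(int(sys.argv[1]), 64).decode())"


def run_with_inherited_fd(path):
    """Open path in the launcher and pass only the descriptor to the child."""
    fd = os.open(path, os.O_RDONLY)
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD, str(fd)],
            pass_fds=(fd,),  # keep this one descriptor open across exec
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
    finally:
        os.close(fd)


def demo():
    """Use a temporary file as a stand-in for a device node."""
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("opened by the launcher")
    try:
        return run_with_inherited_fd(tmp.name)
    finally:
        os.unlink(tmp.name)
```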
@itemize
@item SELinux and AppArmor make it possible to confine processes beyond the
traditional UNIX process and file permissions model.  They restrict the QEMU
process from accessing processes and files on the host system that are not
needed by QEMU.

@item Resource limits and cgroup controllers provide throughput and utilization
limits on key resources such as CPU time, memory, and I/O bandwidth.

@item Linux namespaces can be used to make process, file system, and other system
resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
those resources that were granted to it.

@item Linux seccomp is available via the QEMU @option{--sandbox} option.  It disables
system calls that are not needed by QEMU, thereby reducing the host kernel
attack surface.
@end itemize

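Two of the mechanisms listed above can be sketched from the point of view of a
launcher.  This is a minimal sketch, not how libvirt or any particular tool is
implemented, and the function names are hypothetical.  The first function
assembles the sandbox option with the sub-options @code{obsolete},
@code{elevateprivileges}, @code{spawn}, and @code{resourcecontrol} set to
@code{deny}; which sub-options are available depends on the QEMU version and
build, and @code{spawn=deny} is incompatible with features that start helper
processes.  The second applies a per-process resource limit in the child before
it executes, using an open-file limit of 64 purely as an example value.

```python
import resource
import subprocess
import sys


def sandbox_args():
    """Return the sandbox option with each optional restriction denied."""
    filters = [
        "obsolete=deny",
        "elevateprivileges=deny",
        "spawn=deny",
        "resourcecontrol=deny",
    ]
    return ["-sandbox", "on," + ",".join(filters)]


def apply_limits(max_open_files):
    """Cap the number of open files; runs in the child between fork and exec."""
    resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, max_open_files))


def run_limited(argv, max_open_files=64):
    """Run argv with the limit applied; argv would be the QEMU command line."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: apply_limits(max_open_files),
        capture_output=True,
        text=True,
        check=True,
    )
```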
@section Sensitive configurations

Some aspects of QEMU have security implications that users and
management applications must be aware of.

@subsection Monitor console (QMP and HMP)

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation.  Many of the
exposed commands instruct QEMU to access content on the host file system
or to spawn external processes.

For example, the @code{migrate} command allows for the spawning of arbitrary
processes for the purpose of tunnelling the migration data stream.  The
@code{blockdev-add} command instructs QEMU to open arbitrary files, exposing
their content to the guest as a virtual disk.

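To make the risk concrete, the sketch below builds the kind of QMP messages a
monitor client could send.  It only serialises JSON and does not talk to QEMU.
The helper name @code{qmp_command}, the node name, and the file paths are
illustrative assumptions; the command names and the @code{exec:} migration URI
follow the QMP documentation as best understood here and should be checked
against the QEMU version in use.

```python
import json


def qmp_command(name, arguments):
    """Serialise one QMP command as a JSON string (hypothetical helper)."""
    return json.dumps({"execute": name, "arguments": arguments})


# Asks QEMU to run a shell pipeline on the host as the migration target.
spawn_process = qmp_command(
    "migrate", {"uri": "exec:gzip -c > /tmp/guest-state.gz"}
)

# Asks QEMU to open a host file of the client's choosing as a block node.
open_host_file = qmp_command(
    "blockdev-add",
    {"driver": "file", "node-name": "node0", "filename": "/etc/passwd"},
)
```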
Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

The security of the character device backend over which the monitor console is
exposed is equally important.  The backend must protect against
malicious third parties that might try to make unauthorized connections or
perform man-in-the-middle attacks.  Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only.  The TCP-based
character device backend is inappropriate unless it is configured to use both
TLS encryption and an authorization control policy for client connections.

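A launcher following the UNIX domain socket recommendation might look roughly
like the sketch below.  The function names, the @code{mon0} identifier, and the
@code{qmp.sock} file name are hypothetical.  The @code{server=on,wait=off}
spelling is the one used by recent QEMU releases; older releases may expect a
different form.  Placing the socket in a directory that only the owning user
can enter means that file system permissions decide who may connect.

```python
import os
import stat
import tempfile


def private_socket_dir(parent):
    """Create a directory that only the owning user can enter (mode 0700)."""
    path = tempfile.mkdtemp(prefix="qemu-mon-", dir=parent)
    os.chmod(path, 0o700)  # mkdtemp already uses 0700; stated explicitly here
    return path


def monitor_args(socket_dir):
    """Return options exposing a QMP monitor on a UNIX socket in socket_dir."""
    sock = os.path.join(socket_dir, "qmp.sock")
    return [
        "-chardev", "socket,id=mon0,path=" + sock + ",server=on,wait=off",
        "-mon", "chardev=mon0,mode=control",
    ]
```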
In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.