QEMU virtio-fs shared file system daemon
========================================

Synopsis
--------

**virtiofsd** [*OPTIONS*]

Description
-----------

Share a host directory tree with a guest through a virtio-fs device.  This
program is a vhost-user backend that implements the virtio-fs device.  Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user.  See the Examples section below.

This program must be run as the root user.  The program drops privileges where
possible during startup although it must be able to create and access files
with any uid/gid:

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root. This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.
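
For instance, inside a container whose runtime has already set up the
namespaces, the "chroot" sandbox could be selected as sketched below.  The
socket and source paths are illustrative only and are borrowed from the
Examples section; adjust them to the actual environment.

::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock \
        -o source=/var/lib/fs/vm001 \
        -o sandbox=chroot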

Options
-------

.. program:: virtiofsd

.. option:: -h, --help

  Print help.

.. option:: -V, --version

  Print version.

.. option:: -d

  Enable debug output.

.. option:: --syslog

  Print log messages to syslog instead of stderr.

.. option:: -o OPTION

  * debug -
    Enable debug output.

  * flock|no_flock -
    Enable/disable flock.  The default is ``no_flock``.

  * modcaps=CAPLIST -
    Modify the list of capabilities allowed; CAPLIST is a colon-separated
    list of capabilities, each preceded by either + or -, e.g.
    ``+sys_admin:-chown``.

  * log_level=LEVEL -
    Print only log messages matching LEVEL or more severe.  LEVEL is one of
    ``err``, ``warn``, ``info``, or ``debug``.  The default is ``info``.

  * posix_lock|no_posix_lock -
    Enable/disable remote POSIX locks.  The default is ``no_posix_lock``.

  * readdirplus|no_readdirplus -
    Enable/disable readdirplus.  The default is ``readdirplus``.

  * sandbox=namespace|chroot -
    Sandbox mode:

    - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
      the shared directory.
    - chroot: chroot(2) into shared directory (use in containers).

    The default is "namespace".

  * source=PATH -
    Share host directory tree located at PATH.  This option is required.

  * timeout=TIMEOUT -
    I/O timeout in seconds.  The default depends on the cache= option.

  * writeback|no_writeback -
    Enable/disable writeback cache. The cache allows the FUSE client to buffer
    and merge write requests.  The default is ``no_writeback``.

  * xattr|no_xattr -
    Enable/disable extended attributes (xattr) on files and directories.  The
    default is ``no_xattr``.

  * posix_acl|no_posix_acl -
    Enable/disable POSIX ACL support.  POSIX ACLs are disabled by default.

  * security_label|no_security_label -
    Enable/disable security label support. Security labels are disabled by
    default. When enabled, the client can send a MAC label for a file at
    file creation time. Typically this is an SELinux security label. The
    server tries to set that label on the newly created file atomically
    wherever possible.

.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM.  The default
  is 64.

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance.  ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance.  ``auto`` behaves similarly to NFS with a 1 second metadata cache
  timeout.  ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

Extended attribute (xattr) mapping
----------------------------------

By default the names of xattrs used by the client are passed through to the
server file system.  This can be a problem where either those xattr names are
used by something on the server (e.g. SELinux client/server confusion) or if
``virtiofsd`` is running in a container with restricted privileges where it
cannot access some attributes.

Mapping syntax
~~~~~~~~~~~~~~

A mapping of xattr names can be made using ``-o xattrmap=mapping`` where the
``mapping`` string consists of a series of rules.

The first matching rule terminates the mapping.
The set of rules must include a terminating rule to match any remaining attributes
at the end.

Each rule consists of a number of fields separated with a separator that is the
first non-white space character in the rule.  This separator must then be used
for the whole rule.
White space may be added before and after each rule.

Using ':' as the separator a rule is of the form:

``:type:scope:key:prepend:``

**scope** is:

- 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr
- 'server' - match 'prepend' against an xattr name from the server
  for listxattr
- 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

**type** is one of:

- 'prefix' - is designed to prepend and strip a prefix; the modified
  attributes are then passed on to the client/server.

- 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged.
  It is intended both as a way of explicitly terminating
  the list of rules, and to allow some xattrs to skip following rules.

- 'bad' - If a client tries to use a name matching 'key' it's
  denied using EPERM; when the server passes an attribute
  name matching 'prepend' it's hidden.  In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

- 'unsupported' - If a client tries to use a name matching 'key' it's
  denied using ENOTSUP; when the server passes an attribute
  name matching 'prepend' it's hidden.  In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

**key** is a string tested as a prefix on an attribute name originating
on the client.  It may be empty, in which case a 'client' rule
will always match on client names.

**prepend** is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix.  It may be empty,
in which case a 'server' rule will always match on all names from
the server.

e.g.:

  ``:prefix:client:trusted.:user.virtiofs.:``

  will match 'trusted.' attributes in client calls and prefix them before
  passing them to the server.

  ``:prefix:server::user.virtiofs.:``

  will strip 'user.virtiofs.' from all server replies.

  ``:prefix:all:trusted.:user.virtiofs.:``

  combines the previous two cases into a single rule.

  ``:ok:client:user.::``

  will allow get/set xattr for 'user.' xattrs and ignore
  following rules.

  ``:ok:server::security.:``

  will pass 'security.' xattrs in listxattr from the server
  and ignore following rules.

  ``:ok:all:::``

  will terminate the rule search passing any remaining attributes
  in both directions.

  ``:bad:server::security.:``

  would hide 'security.' xattrs in listxattr from the server.

A simpler 'map' type provides a shorter syntax for the common case:

``:map:key:prepend:``

The 'map' type adds a number of separate rules to add **prepend** as a prefix
to the matched **key** (or all attributes if **key** is empty).
There may be at most one 'map' rule and it must be the last rule in the set.

Note: When the 'security.capability' xattr is remapped, the daemon has to do
extra work to remove it during many operations, which the host kernel normally
does itself.

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

Operating systems typically partition the xattr namespace using
well-defined name prefixes. Each partition may have different
access controls applied. For example, on Linux there are multiple
partitions:

 * ``system.*`` - access varies depending on attribute & filesystem
 * ``security.*`` - only processes with CAP_SYS_ADMIN
 * ``trusted.*`` - only processes with CAP_SYS_ADMIN
 * ``user.*`` - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name
prefixes and access control rules.

When remapping attributes on the host, it is important to
ensure that the remapping does not allow a guest user to
evade the guest access control rules.

Consider if ``trusted.*`` from the guest was remapped to
``user.virtiofs.trusted.*`` in the host. An unprivileged
user in a Linux guest has the ability to write to xattrs
under ``user.*``. Thus the user can evade the access
control restriction on ``trusted.*`` by instead writing
to ``user.virtiofs.trusted.*``.
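
The bypass described above can be sketched from inside a Linux guest.  This
is an illustration under stated assumptions, not a tested recipe: the mapping
in the comment is a deliberately insecure rule set, the mount point
``/mnt``, the file name and the attribute name are hypothetical, and
``setfattr``/``getfattr`` are assumed to be installed in the guest.

::

  # Assumed (insecure) mapping used when starting virtiofsd on the host:
  #   -o xattrmap=":prefix:all:trusted.:user.virtiofs.::ok:all:::"

  # As an unprivileged guest user with write permission on the file:
  guest$ setfattr -n user.virtiofs.trusted.test -v 1 /mnt/file

  # As root in the guest, the same value may then be visible under trusted.*:
  guest# getfattr -n trusted.test /mnt/file

The write is not matched by the 'prefix' rule, so it falls through to the
final 'ok' rule and reaches the host unchanged, landing on exactly the name
that guest ``trusted.test`` is remapped to.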

As noted above, the partitions used and access controls
applied will vary across guest operating systems, so it is
not wise to try to predict what the guest OS will use.

The simplest way to avoid an insecure configuration is
to remap all xattrs at once, to a given fixed prefix.
This is shown in example (1) below.

If selectively mapping only a subset of xattr prefixes,
then rules must be added to explicitly block direct
access to the target of the remapping. This is shown
in example (2) below.

Mapping examples
~~~~~~~~~~~~~~~~

1) Prefix all attributes with 'user.virtiofs.'

::

 -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"


This uses two rules, using : as the field separator;
the first rule prefixes and strips 'user.virtiofs.',
the second rule hides any non-prefixed attributes that
the host set.

This is equivalent to the 'map' rule:

::

 -o xattrmap=":map::user.virtiofs.:"

2) Prefix 'trusted.' attributes, allow others through

::

   "/prefix/all/trusted./user.virtiofs./
    /bad/server//trusted./
    /bad/client/user.virtiofs.//
    /ok/all///"


Here there are four rules, using / as the field
separator, and also demonstrating that new lines can
be included between rules.
The first rule is the prefixing of 'trusted.' and
stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes
on the host.
The third rule stops a guest from explicitly setting
the 'user.virtiofs.' path directly to prevent access
control bypass on the target of the earlier prefix
remapping.
Finally, the fourth rule lets all remaining attributes
through.

This is equivalent to the 'map' rule:

::

 -o xattrmap="/map/trusted./user.virtiofs./"

3) Hide 'security.' attributes, and allow everything else

::

    "/bad/all/security./security./
     /ok/all///"

The first rule combines what could be separate client and server
rules into a single 'all' rule, matching 'security.' in either
client arguments or lists returned from the host.  This stops
the client seeing any 'security.' attributes on the server and
stops it setting any.
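
As a sketch of how such a mapping might appear on a complete command line,
the rule set from example (3) is passed below as a single quoted argument,
with the two rules separated by white space.  The socket and source paths
are illustrative, and ``-o xattr`` is included on the assumption that
extended attribute support has to be enabled for the mapping to have any
effect.

::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock \
        -o source=/var/lib/fs/vm001 \
        -o xattr \
        -o xattrmap="/bad/all/security./security./ /ok/all///"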

SELinux support
---------------

SELinux support can be enabled by running virtiofsd with the option
``-o security_label``. However, this will try to save the guest's security
context in the xattr security.selinux on the host, and it might fail if the
host's SELinux policy does not permit virtiofsd to do this operation.

Hence, it is preferable to remap the guest's "security.selinux" xattr to, say,
"trusted.virtiofs.security.selinux" on the host:

``-o xattrmap=:map:security.selinux:trusted.virtiofs.:``

This ensures that the guest's and host's SELinux xattrs on the same file
remain separate and do not interfere with each other. It also allows both
host and guest to implement their own separate SELinux policies.

Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
capability must be added to the daemon:

``-o modcaps=+sys_admin``

Granting CAP_SYS_ADMIN increases the risk to the system. virtiofsd becomes
more powerful and, if it is compromised, it can do a lot of damage to the host
system. Keep this trade-off in mind when making a decision.
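
Put together, the options discussed in this section might be combined as in
the following sketch.  The socket and source paths are illustrative, and
``-o xattr`` is included on the assumption that extended attribute support
has to be enabled for the remapping to take effect.

::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock \
        -o source=/var/lib/fs/vm001 \
        -o xattr \
        -o security_label \
        -o xattrmap=:map:security.selinux:trusted.virtiofs.: \
        -o modcaps=+sys_admin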

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

.. parsed-literal::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# |qemu_system| \\
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \\
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \\
        -object memory-backend-memfd,id=mem,size=4G,share=on \\
        -numa node,memdev=mem \\
        ...
  guest# mount -t virtiofs myfs /mnt
