QEMU virtio-fs shared file system daemon
========================================

Synopsis
--------

**virtiofsd** [*OPTIONS*]

Description
-----------

Share a host directory tree with a guest through a virtio-fs device.  This
program is a vhost-user backend that implements the virtio-fs device.  Each
virtio-fs device instance requires its own virtiofsd process.

This program is designed to work with QEMU's ``--device vhost-user-fs-pci``
but should work with any virtual machine monitor (VMM) that supports
vhost-user.  See the Examples section below.

This program must be run as the root user.  The program drops privileges where
possible during startup, although it must be able to create and access files
with any uid/gid:

* The ability to invoke syscalls is limited using seccomp(2).
* Linux capabilities(7) are dropped.

In "namespace" sandbox mode the program switches into a new file system
namespace and invokes pivot_root(2) to make the shared directory tree its root.
New pid and net namespaces are also created to isolate the process.

In "chroot" sandbox mode the program invokes chroot(2) to make the shared
directory tree its root.  This mode is intended for container environments where
the container runtime has already set up the namespaces and the program does
not have permission to create namespaces itself.

Both sandbox modes prevent "file system escapes" due to symlinks and other file
system objects that might lead to files outside the shared directory.

Options
-------

.. program:: virtiofsd

.. option:: -h, --help

  Print help.

.. option:: -V, --version

  Print version.

.. option:: -d

  Enable debug output.

.. option:: --syslog

  Print log messages to syslog instead of stderr.

.. option:: -o OPTION

  * debug -
    Enable debug output.

  * flock|no_flock -
    Enable/disable flock.  The default is ``no_flock``.

  * modcaps=CAPLIST -
    Modify the list of capabilities allowed; CAPLIST is a colon separated
    list of capabilities, each preceded by either + or -, e.g.
    ``+sys_admin:-chown``.

  * log_level=LEVEL -
    Print only log messages matching LEVEL or more severe.  LEVEL is one of
    ``err``, ``warn``, ``info``, or ``debug``.  The default is ``info``.

  * posix_lock|no_posix_lock -
    Enable/disable remote POSIX locks.  The default is ``no_posix_lock``.

  * readdirplus|no_readdirplus -
    Enable/disable readdirplus.  The default is ``readdirplus``.

  * sandbox=namespace|chroot -
    Sandbox mode:

    - namespace: Create mount, pid, and net namespaces and pivot_root(2) into
      the shared directory.
    - chroot: chroot(2) into shared directory (use in containers).

    The default is "namespace".

  * source=PATH -
    Share host directory tree located at PATH.  This option is required.

  * timeout=TIMEOUT -
    I/O timeout in seconds.  The default depends on the ``cache=`` option.

  * writeback|no_writeback -
    Enable/disable writeback cache.  The cache allows the FUSE client to buffer
    and merge write requests.  The default is ``no_writeback``.

  * xattr|no_xattr -
    Enable/disable extended attributes (xattr) on files and directories.  The
    default is ``no_xattr``.

  * posix_acl|no_posix_acl -
    Enable/disable POSIX ACL support.  POSIX ACLs are disabled by default.

  * security_label|no_security_label -
    Enable/disable security label support.  Security labels are disabled by
    default.  When enabled, the client can send a MAC label for a file at
    creation time.  This is typically expected to be an SELinux security
    label.  The server tries to set that label on the newly created file
    atomically wherever possible.

.. option:: --socket-path=PATH

  Listen on vhost-user UNIX domain socket at PATH.

.. option:: --socket-group=GROUP

  Set the vhost-user UNIX domain socket gid to GROUP.

.. option:: --fd=FDNUM

  Accept connections from vhost-user UNIX domain socket file descriptor FDNUM.
  The file descriptor must already be listening for connections.

.. option:: --thread-pool-size=NUM

  Restrict the number of worker threads per request queue to NUM.  The default
  is 64.

.. option:: --cache=none|auto|always

  Select the desired trade-off between coherency and performance.  ``none``
  forbids the FUSE client from caching to achieve best coherency at the cost of
  performance.  ``auto`` acts similar to NFS with a 1 second metadata cache
  timeout.  ``always`` sets a long cache lifetime at the expense of coherency.
  The default is ``auto``.

Extended attribute (xattr) mapping
----------------------------------

By default the names of xattrs used by the client are passed through to the
server file system.  This can be a problem either where those xattr names are
used by something on the server (e.g. SELinux client/server confusion) or where
``virtiofsd`` is running in a container with restricted privileges and
cannot access some attributes.

Mapping syntax
~~~~~~~~~~~~~~

A mapping of xattr names can be made using ``-o xattrmap=mapping`` where the
``mapping`` string consists of a series of rules.

The first matching rule terminates the mapping.
The set of rules must include a terminating rule to match any remaining attributes
at the end.

Each rule consists of a number of fields separated with a separator that is the
first non-white space character in the rule.  This separator must then be used
for the whole rule.
White space may be added before and after each rule.

Using ':' as the separator a rule is of the form:

``:type:scope:key:prepend:``

**scope** is:

- 'client' - match 'key' against an xattr name from the client for
  setxattr/getxattr/removexattr
- 'server' - match 'prepend' against an xattr name from the server
  for listxattr
- 'all' - can be used to make a single rule where both the server
  and client matches are triggered.

**type** is one of:

- 'prefix' - prepends and strips a prefix; the modified attribute
  names are then passed on to the client/server.

- 'ok' - Causes the rule set to be terminated when a match is found
  while allowing matching xattrs through unchanged.
  It is intended both as a way of explicitly terminating
  the list of rules, and to allow some xattrs to skip following rules.

- 'bad' - If a client tries to use a name matching 'key' it is
  denied using EPERM; when the server passes an attribute
  name matching 'prepend' it is hidden.  In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

- 'unsupported' - If a client tries to use a name matching 'key' it is
  denied using ENOTSUP; when the server passes an attribute
  name matching 'prepend' it is hidden.  In many ways its use is very like
  'ok' as either an explicit terminator or for special handling of certain
  patterns.

**key** is a string tested as a prefix on an attribute name originating
on the client.  It may be empty, in which case a 'client' rule
will always match on client names.

**prepend** is a string tested as a prefix on an attribute name originating
on the server, and used as a new prefix.  It may be empty,
in which case a 'server' rule will always match on all names from
the server.

e.g.:

  ``:prefix:client:trusted.:user.virtiofs.:``

  will match 'trusted.' attributes in client calls and prefix them before
  passing them to the server.

  ``:prefix:server::user.virtiofs.:``

  will strip 'user.virtiofs.' from all server replies.

  ``:prefix:all:trusted.:user.virtiofs.:``

  combines the previous two cases into a single rule.

  ``:ok:client:user.::``

  will allow get/set xattr for 'user.' xattrs and ignore
  following rules.

  ``:ok:server::security.:``

  will pass 'security.' xattrs in listxattr from the server
  and ignore following rules.

  ``:ok:all:::``

  will terminate the rule search passing any remaining attributes
  in both directions.

  ``:bad:server::security.:``

  would hide 'security.' xattrs in listxattr from the server.

A simpler 'map' type provides a shorter syntax for the common case:

``:map:key:prepend:``

The 'map' type adds a number of separate rules to add **prepend** as a prefix
to the matched **key** (or all attributes if **key** is empty).
There may be at most one 'map' rule and it must be the last rule in the set.

Note: When the 'security.capability' xattr is remapped, the daemon has to do
extra work to remove it during many operations, which the host kernel normally
does itself.

Security considerations
~~~~~~~~~~~~~~~~~~~~~~~

Operating systems typically partition the xattr namespace using
well defined name prefixes.  Each partition may have different
access controls applied.  For example, on Linux there are multiple
partitions:

 * ``system.*`` - access varies depending on attribute & filesystem
 * ``security.*`` - only processes with CAP_SYS_ADMIN
 * ``trusted.*`` - only processes with CAP_SYS_ADMIN
 * ``user.*`` - any process granted by file permissions / ownership

Other operating systems, such as FreeBSD, have different name prefixes
and access control rules.

When remapping attributes on the host, it is important to
ensure that the remapping does not allow a guest user to
evade the guest access control rules.

Consider the case where ``trusted.*`` from the guest is remapped to
``user.virtiofs.trusted.*`` in the host.  An unprivileged
user in a Linux guest has the ability to write to xattrs
under ``user.*``.  Thus the user can evade the access
control restriction on ``trusted.*`` by instead writing
to ``user.virtiofs.trusted.*``.

As noted above, the partitions used and access controls
applied will vary across guest OSes, so it is not wise to
try to predict what the guest OS will use.

The simplest way to avoid an insecure configuration is
to remap all xattrs at once, to a given fixed prefix.
This is shown in example (1) below.

If selectively mapping only a subset of xattr prefixes,
then rules must be added to explicitly block direct
access to the target of the remapping.  This is shown
in example (2) below.

Mapping examples
~~~~~~~~~~~~~~~~

1) Prefix all attributes with 'user.virtiofs.'

::

 -o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"


This uses two rules, using : as the field separator;
the first rule prefixes and strips 'user.virtiofs.',
the second rule hides any non-prefixed attributes that
the host set.

This is equivalent to the 'map' rule:

::

 -o xattrmap=":map::user.virtiofs.:"

2) Prefix 'trusted.' attributes, allow others through

::

   "/prefix/all/trusted./user.virtiofs./
    /bad/server//trusted./
    /bad/client/user.virtiofs.//
    /ok/all///"


Here there are four rules, using / as the field
separator, and also demonstrating that new lines can
be included between rules.
The first rule is the prefixing of 'trusted.' and
stripping of 'user.virtiofs.'.
The second rule hides unprefixed 'trusted.' attributes
on the host.
The third rule stops a guest from explicitly setting
the 'user.virtiofs.' path directly to prevent access
control bypass on the target of the earlier prefix
remapping.
Finally, the fourth rule lets all remaining attributes
through.

This is equivalent to the 'map' rule:

::

 -o xattrmap="/map/trusted./user.virtiofs./"

3) Hide 'security.' attributes, and allow everything else

::

   "/bad/all/security./security./
    /ok/all///"

The first rule combines what could be separate client and server
rules into a single 'all' rule, matching 'security.' in either
client arguments or lists returned from the host.  This stops
the client seeing any 'security.' attributes on the server and
stops it setting any.

SELinux support
---------------

Support for SELinux can be enabled by running virtiofsd with the option
``-o security_label``.  However, this will try to save the guest's security
context in the ``security.selinux`` xattr on the host, and it might fail if the
host's SELinux policy does not permit virtiofsd to perform this operation.

Hence, it is preferable to remap the guest's ``security.selinux`` xattr to,
say, ``trusted.virtiofs.security.selinux`` on the host:

::

 -o xattrmap=:map:security.selinux:trusted.virtiofs.:

This keeps the guest's and the host's SELinux xattrs on the same file
separate so that they do not interfere with each other, and allows both
host and guest to implement their own separate SELinux policies.

Setting a trusted xattr on the host requires CAP_SYS_ADMIN, so this
capability needs to be added to the daemon:

::

 -o modcaps=+sys_admin

Giving CAP_SYS_ADMIN increases the risk to the system.  virtiofsd becomes more
powerful and, if it gets compromised, it can do a lot of damage to the host
system.  Keep this trade-off in mind when making a decision.
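
The options discussed in this section can be combined on a single command
line.  The following is a minimal sketch rather than a recommended
configuration: the socket path and shared directory are placeholders reused
from the Examples section, and ``print_virtiofsd_selinux_cmd`` is a
hypothetical helper name that only prints the command so it can be reviewed
before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a virtiofsd command line that
# combines the SELinux-related options described in this section.
print_virtiofsd_selinux_cmd() {
    # Placeholder paths reused from the Examples section.
    sock=/var/run/vm001-vhost-fs.sock
    src=/var/lib/fs/vm001

    # Each argument is printed followed by a space, then a final newline.
    printf '%s ' virtiofsd \
        "--socket-path=$sock" \
        -o "source=$src" \
        -o security_label \
        -o xattrmap=:map:security.selinux:trusted.virtiofs.: \
        -o modcaps=+sys_admin
    printf '\n'
}

print_virtiofsd_selinux_cmd
```

Whether ``+sys_admin`` is acceptable depends on the trade-off described
above.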

Examples
--------

Export ``/var/lib/fs/vm001/`` on vhost-user UNIX domain socket
``/var/run/vm001-vhost-fs.sock``:

.. parsed-literal::

  host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
  host# |qemu_system| \\
        -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \\
        -device vhost-user-fs-pci,chardev=char0,tag=myfs \\
        -object memory-backend-memfd,id=mem,size=4G,share=on \\
        -numa node,memdev=mem \\
        ...
  guest# mount -t virtiofs myfs /mnt
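
An xattr mapping that spans several lines, such as mapping example (2), has to
be quoted so that the shell passes it to virtiofsd as one argument.  The sketch
below assumes the same socket path and shared directory as the example above;
``show_virtiofsd_args`` is a hypothetical helper name, and it only prints the
argument list, one argument per ``[...]`` pair, instead of starting the daemon.

```shell
#!/bin/sh
# Rules from mapping example (2): one rule per line, '/' as field separator.
XATTRMAP='/prefix/all/trusted./user.virtiofs./
/bad/server//trusted./
/bad/client/user.virtiofs.//
/ok/all///'

# Hypothetical helper: builds the argument list and prints each argument
# wrapped in brackets, so embedded new lines stay visibly inside one argument.
show_virtiofsd_args() {
    set -- virtiofsd \
        --socket-path=/var/run/vm001-vhost-fs.sock \
        -o source=/var/lib/fs/vm001 \
        -o "xattrmap=$XATTRMAP"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_virtiofsd_args
```

The double quotes around ``xattrmap=$XATTRMAP`` are what keep the four rules
together; without them the shell would split the mapping at each new line.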