<head>
<style> p { max-width:50em} ol, ul {max-width: 40em}</style>
</head>

autofs - how it works
=====================

Purpose
-------

The goal of autofs is to provide on-demand mounting and race-free
automatic unmounting of various other filesystems.  This provides two
key advantages:

1. There is no need to delay boot until all filesystems that
   might be needed are mounted.  Processes that try to access those
   slow filesystems might be delayed but other processes can
   continue freely.  This is particularly important for
   network filesystems (e.g. NFS) or filesystems stored on
   media with a media-changing robot.

2. The names and locations of filesystems can be stored in
   a remote database and can change at any time.  The content
   in that database at the time of access will be used to provide
   a target for the access.  The interpretation of names in the
   filesystem can even be programmatic rather than database-backed,
   allowing wildcards for example, and can vary based on the user who
   first accessed a name.

Context
-------

The "autofs4" filesystem module is only one part of an autofs system.
There also needs to be a user-space program which looks up names
and mounts filesystems.  This will often be the "automount" program,
though other tools including "systemd" can make use of "autofs4".
This document describes only the kernel module and the interactions
required with any user-space program.  Subsequent text refers to this
program as the "automount daemon" or simply "the daemon".

"autofs4" is a Linux kernel module which provides the "autofs"
filesystem type.  Several "autofs" filesystems can be mounted and they
can each be managed separately, or all managed by the same daemon.

Content
-------

An autofs filesystem can contain three sorts of objects: directories,
symbolic links and mount traps.  Mount traps are directories with
extra properties as described in the next section.

Objects can only be created by the automount daemon: symlinks are
created with a regular `symlink` system call, while directories and
mount traps are created with `mkdir`.  The determination of whether a
directory should be a mount trap or not is quite _ad hoc_, largely for
historical reasons, and is determined in part by the
*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.

If neither the *direct* nor the *offset* mount option is given (so the
mount is considered to be *indirect*), then the root directory is
always a regular directory; otherwise it is a mount trap when it is
empty and a regular directory when not empty.  Note that *direct* and
*offset* are treated identically, so a concise summary is that the root
directory is a mount trap only if the filesystem is mounted *direct*
and the root is empty.

Directories created in the root directory are mount traps only if the
filesystem is mounted *indirect* and they are empty.

Whether directories further down the tree are mount traps depends on
the *maxproto* mount option, and particularly on whether it is less
than five or not.  When *maxproto* is five, no directories further down
the tree are ever mount traps; they are always regular directories.
When *maxproto* is four (or three), these directories are mount traps
precisely when they are empty.

So: non-empty (i.e. non-leaf) directories are never mount traps.  Empty
directories are sometimes mount traps, and sometimes not, depending on
where in the tree they are (root, top level, or lower), the *maxproto*,
and whether the mount was *indirect* or not.
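
The rules above can be condensed into a single predicate.  The
following is a minimal user-space sketch, not kernel code: the
`mount_kind` enum, the `depth` convention and the name `is_mount_trap`
are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; not kernel identifiers. */
enum mount_kind { MOUNT_INDIRECT, MOUNT_DIRECT, MOUNT_OFFSET };

/*
 * depth: 0 = root of the autofs filesystem, 1 = directory created in
 * the root, 2 or more = anything further down the tree.
 */
static bool is_mount_trap(enum mount_kind kind, int maxproto,
                          int depth, bool is_empty)
{
        assert(depth >= 0);
        if (!is_empty)
                return false;   /* non-empty directories are never traps */
        if (depth == 0)         /* root: direct and offset behave alike */
                return kind != MOUNT_INDIRECT;
        if (depth == 1)         /* top level: traps only for indirect mounts */
                return kind == MOUNT_INDIRECT;
        return maxproto < 5;    /* lower levels: traps only before protocol 5 */
}
```

For example, an empty top-level directory of an *indirect* mount is
reported as a trap, while the same directory under a *direct* mount is
not.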

Mount Traps
-----------

A core element of the implementation of autofs is the mount trap
mechanism provided by the Linux VFS.  Any directory provided by a
filesystem can be designated as a trap.  This involves two separate
features that work together to allow autofs to do its job.

**DCACHE_NEED_AUTOMOUNT**

If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
the inode has S_AUTOMOUNT set, or can be set directly) then it is
(potentially) a mount trap.  Any access to this directory beyond a
"`stat`" will (normally) cause the `d_op->d_automount()` dentry operation
to be called.  The task of this method is to find the filesystem that
should be mounted on the directory and to return it.  The VFS is
responsible for actually mounting the root of this filesystem on the
directory.

autofs doesn't find the filesystem itself but sends a message to the
automount daemon asking it to find and mount the filesystem.  The
autofs `d_automount` method then waits for the daemon to report that
everything is ready.  It will then return "`NULL`" indicating that the
mount has already happened.  The VFS doesn't try to mount anything but
follows down the mount that is already there.

This functionality is sufficient for some users of mount traps such
as NFS which creates traps so that mountpoints on the server can be
reflected on the client.  However it is not sufficient for autofs.  As
mounting onto a directory is considered to be "beyond a `stat`", the
automount daemon would not be able to mount a filesystem on the 'trap'
directory without some way to avoid getting caught in the trap.  For
that purpose there is another flag.

**DCACHE_MANAGE_TRANSIT**

If a dentry has DCACHE_MANAGE_TRANSIT set then two very different but
related behaviors are invoked, both using the `d_op->d_manage()`
dentry operation.

Firstly, before checking to see if any filesystem is mounted on the
directory, `d_manage()` will be called with the `rcu_walk` parameter set
to `false`.  It may return one of three things:

-  A return value of zero indicates that there is nothing special
   about this dentry and normal checks for mounts and automounts
   should proceed.

   autofs normally returns zero, but first waits for any
   expiry (automatic unmounting of the mounted filesystem) to
   complete.  This avoids races.

-  A return value of `-EISDIR` tells the VFS to ignore any mounts
   on the directory and to not consider calling `->d_automount()`.
   This effectively disables the **DCACHE_NEED_AUTOMOUNT** flag
   causing the directory not to be a mount trap after all.

   autofs returns this if it detects that the process performing the
   lookup is the automount daemon and that the mount has been
   requested but has not yet completed.  How it determines this is
   discussed later.  This allows the automount daemon not to get
   caught in the mount trap.

   There is a subtlety here.  It is possible for a second autofs
   filesystem to be mounted below the first and for both of them to
   be managed by the same daemon.  For the daemon to be able to mount
   something on the second it must be able to "walk" down past the
   first.  This means that `d_manage` cannot *always* return `-EISDIR` for
   the automount daemon.  It must only return it when a mount has
   been requested, but has not yet completed.

   `d_manage` also returns `-EISDIR` if the dentry shouldn't be a
   mount trap, either because it is a symbolic link or because it is
   not empty.

-  Any other negative value is treated as an error and returned
   to the caller.

   autofs can return

   - `-ENOENT` if the automount daemon failed to mount anything,
   - `-ENOMEM` if it ran out of memory,
   - `-EINTR` if a signal arrived while waiting for expiry to
     complete,
   - or any other error sent down by the automount daemon.

The second use case only occurs during an "RCU-walk" and so `rcu_walk`
will be set.

An RCU-walk is a fast and lightweight process for walking down a
filename path (i.e. it is like running on tip-toes).  RCU-walk cannot
cope with all situations so when it finds a difficulty it falls back
to "REF-walk", which is slower but more robust.

RCU-walk will never call `->d_automount`; the filesystems must already
be mounted or RCU-walk cannot handle the path.
To determine if a mount-trap is safe for RCU-walk mode it calls
`->d_manage()` with `rcu_walk` set to `true`.

In this case `d_manage()` must avoid blocking and should avoid taking
spinlocks if at all possible.  Its sole purpose is to determine if it
would be safe to follow down into any mounted directory and the only
reason that it might not be is if an expiry of the mount is
underway.

In the `rcu_walk` case, `d_manage()` cannot return `-EISDIR` to tell the
VFS that this is a directory that doesn't require `d_automount`.  If
RCU-walk sees a dentry with DCACHE_NEED_AUTOMOUNT set but nothing
mounted, it *will* fall back to REF-walk.  `d_manage()` cannot make the
VFS remain in RCU-walk mode, but can only tell it to get out of
RCU-walk mode by returning `-ECHILD`.

So `d_manage()`, when called with `rcu_walk` set, should return
`-ECHILD` if there is any reason to believe it is unsafe to enter the
mounted filesystem, and should otherwise return 0.

autofs will return `-ECHILD` if an expiry of the filesystem has been
initiated or is being considered, otherwise it returns 0.
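
The two calling modes of `d_manage()` described above can be modelled
as one small decision function.  This is a simplified user-space
sketch, not the autofs implementation: `struct trap_state` and its
fields are hypothetical, and the blocking wait for expiry in REF-walk
mode is assumed to have already finished, so the `-EINTR` and `-ENOMEM`
outcomes are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-lookup state; the kernel tracks this differently. */
struct trap_state {
        bool caller_is_daemon;  /* lookup comes from the automount daemon */
        bool mount_pending;     /* mount requested but not yet completed */
        bool is_symlink;
        bool is_empty;
        bool expiring;          /* expiry initiated or being considered */
};

/* Model of the value d_manage() hands back to the VFS. */
static int model_d_manage(const struct trap_state *s, bool rcu_walk)
{
        assert(s);
        if (rcu_walk)
                /* Must not block: only report whether it is safe to enter. */
                return s->expiring ? -ECHILD : 0;

        /*
         * REF-walk: the real code first sleeps until any expiry has
         * finished; this model assumes that wait has already completed.
         */
        if (s->caller_is_daemon && s->mount_pending)
                return -EISDIR; /* let the daemon past its own trap */
        if (s->is_symlink || !s->is_empty)
                return -EISDIR; /* not a mount trap after all */
        return 0;               /* normal mount and automount checks */
}
```

Note that the daemon gets `-EISDIR` only while a mount is pending, which
is what lets it walk through one autofs filesystem to reach another
mounted below it.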

Mountpoint expiry
-----------------

The VFS has a mechanism for automatically expiring unused mounts,
much as it can expire any unused dentry information from the dcache.
This is guided by the MNT_SHRINKABLE flag.  This only applies to
mounts that were created by `d_automount()` returning a filesystem to be
mounted.  As autofs doesn't return such a filesystem but leaves the
mounting to the automount daemon, it must involve the automount daemon
in unmounting as well.  This also means that autofs has more control
of expiry.

The VFS also supports "expiry" of mounts using the MNT_EXPIRE flag to
the `umount` system call.  Unmounting with MNT_EXPIRE will fail unless
a previous attempt had been made, and the filesystem has been inactive
and untouched since that previous attempt.  autofs4 does not depend on
this but has its own internal tracking of whether filesystems were
recently used.  This allows individual names in the autofs directory
to expire separately.

With version 4 of the protocol, the automount daemon can try to
unmount any filesystems mounted on the autofs filesystem or remove any
symbolic links or empty directories any time it likes.  If the unmount
or removal is successful the filesystem will be returned to the state
it was in before the mount or creation, so that any access of the name
will trigger normal auto-mount processing.  In particular, `rmdir` and
`unlink` do not leave negative entries in the dcache as a normal
filesystem would, so an attempt to access a recently-removed object is
passed to autofs for handling.

With version 5, this is not safe except for unmounting from top-level
directories.  As lower-level directories are never mount traps, other
processes will see an empty directory as soon as the filesystem is
unmounted.  So it is generally safest to use the autofs expiry
protocol described below.

Normally the daemon only wants to remove entries which haven't been
used for a while.  For this purpose autofs maintains a "`last_used`"
time stamp on each directory or symlink.  For symlinks it genuinely
does record the last time the symlink was "used" or followed to find
out where it points to.  For directories the field is a slight
misnomer.  It actually records the last time that autofs checked if
the directory or one of its descendants was busy and found that it
was.  This is just as useful and doesn't require updating the field so
often.

The daemon is able to ask autofs if anything is due to be expired,
using an `ioctl` as discussed later.  For a *direct* mount, autofs
considers if the entire mount-tree can be unmounted or not.  For an
*indirect* mount, autofs considers each of the names in the top level
directory to determine if any of those can be unmounted and cleaned
up.

There is an option with indirect mounts to consider each of the leaves
that has been mounted on instead of considering the top-level names.
This is intended for compatibility with version 4 of autofs and should
be considered as deprecated.

When autofs considers a directory it checks the `last_used` time and
compares it with the "timeout" value set when the filesystem was
mounted, though this check is ignored in some cases.  It also checks if
the directory or anything below it is in use.  For symbolic links,
only the `last_used` time is ever considered.

If both checks support expiring the directory or symlink, an action
is taken.
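
The two checks can be written down as a small predicate.  This is an
illustrative model only: `struct expiry_candidate`, `may_expire` and
the tick-based time values are assumptions, and the `immediate`
parameter stands for the cases where the `last_used` comparison is
skipped (for instance the **AUTOFS_EXP_IMMEDIATE** flag described
later in this document).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one expiry candidate; not a kernel structure. */
struct expiry_candidate {
        unsigned long last_used;  /* time of last recorded use, in ticks */
        bool is_symlink;
        bool busy;                /* directory or something below it is in use */
};

static bool may_expire(const struct expiry_candidate *c, unsigned long now,
                       unsigned long timeout, bool immediate)
{
        assert(c);
        /* Unsigned subtraction stays correct if the tick counter wraps. */
        bool idle_long_enough = immediate || (now - c->last_used >= timeout);

        if (c->is_symlink)
                return idle_long_enough;  /* symlinks: only last_used matters */
        return idle_long_enough && !c->busy;
}
```

A busy directory is never selected, even with `immediate` set, whereas
a symlink is judged on its idle time alone.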

There are two ways to ask autofs to consider expiry.  The first is to
use the **AUTOFS_IOC_EXPIRE** ioctl.  This only works for indirect
mounts.  If it finds something in the root directory to expire it will
return the name of that thing.  Once a name has been returned the
automount daemon needs to unmount any filesystems mounted below the
name normally.  As described above, this is unsafe for non-toplevel
mounts in a version-5 autofs.  For this reason the current `automount`
daemon does not use this ioctl.

The second mechanism uses either the **AUTOFS_DEV_IOCTL_EXPIRE_CMD** or
the **AUTOFS_IOC_EXPIRE_MULTI** ioctl.  This will work for both direct and
indirect mounts.  If it selects an object to expire, it will notify
the daemon using the notification mechanism described below.  This
will block until the daemon acknowledges the expiry notification.
This implies that the "`EXPIRE`" ioctl must be sent from a different
thread than the one which handles notification.

While the ioctl is blocking, the entry is marked as "expiring" and
`d_manage` will block until the daemon affirms that the unmount has
completed (together with removing any directories that might have been
necessary), or has been aborted.

Communicating with autofs: detecting the daemon
-----------------------------------------------

There are several forms of communication between the automount daemon
and the filesystem.  As we have already seen, the daemon can create and
remove directories and symlinks using normal filesystem operations.
autofs knows whether a process requesting some operation is the daemon
or not based on its process-group id number (see getpgid(2)).

When an autofs filesystem is mounted, the pgid of the mounting
process is recorded unless the "pgrp=" option is given, in which
case that number is recorded instead.  Any request arriving from a
process in that process group is considered to come from the daemon.
If the daemon ever has to be stopped and restarted a new pgid can be
provided through an ioctl as will be described below.
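
As a rough illustration of how a daemon might assemble the mount
options mentioned in this document, the sketch below formats the
"pgrp=" option together with "fd=" (described in the next section),
*maxproto* and the mount type.  The helper name
`build_autofs_options` and its signature are hypothetical, and real
daemons may pass further options that are not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Writes "fd=<n>,pgrp=<n>,maxproto=<n>,<kind>" into buf.
 * Returns 0 on success, -1 if kind is unknown or buf is too small.
 */
static int build_autofs_options(char *buf, size_t size, int pipe_write_fd,
                                pid_t pgrp, int maxproto, const char *kind)
{
        assert(buf && kind);
        /* Only the three mount types named in this document are accepted. */
        if (strcmp(kind, "direct") && strcmp(kind, "indirect") &&
            strcmp(kind, "offset"))
                return -1;

        int n = snprintf(buf, size, "fd=%d,pgrp=%ld,maxproto=%d,%s",
                         pipe_write_fd, (long)pgrp, maxproto, kind);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A daemon would normally pass its own process group, for example the
result of `getpgrp()`, and then hand the resulting string to mount(2)
as the filesystem-specific data for an "autofs" mount.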

Communicating with autofs: the event pipe
-----------------------------------------

When an autofs filesystem is mounted, the 'write' end of a pipe must
be passed using the 'fd=' mount option.  autofs will write
notification messages to this pipe for the daemon to respond to.
For version 5, the format of the message is:

        struct autofs_v5_packet {
                int proto_version;              /* Protocol version */
                int type;                       /* Type of packet */
                autofs_wqt_t wait_queue_token;
                __u32 dev;
                __u64 ino;
                __u32 uid;
                __u32 gid;
                __u32 pid;
                __u32 tgid;
                __u32 len;
                char name[NAME_MAX+1];
        };

where the type is one of

        autofs_ptype_missing_indirect
        autofs_ptype_expire_indirect
        autofs_ptype_missing_direct
        autofs_ptype_expire_direct

so messages can indicate that a name is missing (something tried to
access it but it isn't there) or that it has been selected for expiry.

The pipe will be set to "packet mode" (equivalent to passing
`O_DIRECT` to _pipe2(2)_) so that a read from the pipe will return at
most one packet, and any unread portion of a packet will be discarded.

The `wait_queue_token` is a unique number which can identify a
particular request to be acknowledged.  When a message is sent over
the pipe the affected dentry is marked as either "active" or
"expiring" and other accesses to it block until the message is
acknowledged using one of the ioctls below and the relevant
`wait_queue_token`.
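
The daemon side of this exchange might look roughly like the sketch
below, which reads one notification and decides whether it is a mount
or an expiry request.  Everything named here is an assumption for
illustration: `struct v5_packet` is a local stand-in for the structure
shown above, the width chosen for `wait_queue_token` and the numeric
packet type values are believed to match the kernel header but should
be taken from that header in real code, and `classify_packet` and
`read_packet` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Local stand-in for the kernel's packet; real code should use the header. */
struct v5_packet {
        int proto_version;
        int type;
        unsigned int wait_queue_token;  /* autofs_wqt_t: assumed width */
        uint32_t dev;
        uint64_t ino;
        uint32_t uid, gid, pid, tgid;
        uint32_t len;
        char name[256];                 /* NAME_MAX + 1 on Linux */
};

/* Assumed numeric values; take them from the kernel header in real code. */
enum {
        PTYPE_MISSING_INDIRECT = 3,
        PTYPE_EXPIRE_INDIRECT  = 4,
        PTYPE_MISSING_DIRECT   = 5,
        PTYPE_EXPIRE_DIRECT    = 6,
};

enum request { REQ_UNKNOWN, REQ_MOUNT, REQ_EXPIRE };

/* Map a packet type onto the action the daemon has to take. */
static enum request classify_packet(const struct v5_packet *pkt)
{
        assert(pkt);
        switch (pkt->type) {
        case PTYPE_MISSING_INDIRECT:
        case PTYPE_MISSING_DIRECT:
                return REQ_MOUNT;
        case PTYPE_EXPIRE_INDIRECT:
        case PTYPE_EXPIRE_DIRECT:
                return REQ_EXPIRE;
        default:
                return REQ_UNKNOWN;
        }
}

/*
 * Read one notification.  In packet mode a single read() returns at
 * most one packet, so anything other than a full packet is rejected.
 */
static bool read_packet(int pipe_read_fd, struct v5_packet *pkt)
{
        memset(pkt, 0, sizeof(*pkt));
        ssize_t n = read(pipe_read_fd, pkt, sizeof(*pkt));
        return n == (ssize_t)sizeof(*pkt);
}
```

After handling the request, the daemon would acknowledge it with the
`wait_queue_token` taken from the packet.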

Communicating with autofs: root directory ioctls
------------------------------------------------

The root directory of an autofs filesystem will respond to a number of
ioctls.  The process issuing the ioctl must have the CAP_SYS_ADMIN
capability, or must be the automount daemon.

The available ioctl commands are:

- **AUTOFS_IOC_READY**: a notification has been handled.  The argument
    to the ioctl command is the "wait_queue_token" number
    corresponding to the notification being acknowledged.
- **AUTOFS_IOC_FAIL**: similar to above, but indicates failure with
    the error code `ENOENT`.
- **AUTOFS_IOC_CATATONIC**: causes the autofs filesystem to enter
    "catatonic" mode, meaning that it stops sending notifications to
    the daemon.  This mode is also entered if a write to the pipe fails.
- **AUTOFS_IOC_PROTOVER**: returns the protocol version in use.
- **AUTOFS_IOC_PROTOSUBVER**: returns the protocol sub-version which
    is really a version number for the implementation.  It is
    currently 2.
- **AUTOFS_IOC_SETTIMEOUT**: this passes a pointer to an unsigned
    long.  The value is used to set the timeout for expiry, and
    the current timeout value is stored back through the pointer.
- **AUTOFS_IOC_ASKUMOUNT**: returns, in the pointed-to `int`, 1 if
    the filesystem could be unmounted.  This is only a hint as
    the situation could change at any instant.  This call can be
    used to avoid a more expensive full unmount attempt.
- **AUTOFS_IOC_EXPIRE**: as described above, this asks if there is
    anything suitable to expire.  A pointer to a packet:

        struct autofs_packet_expire_multi {
                int proto_version;              /* Protocol version */
                int type;                       /* Type of packet */
                autofs_wqt_t wait_queue_token;
                int len;
                char name[NAME_MAX+1];
        };

    is required.  This is filled in with the name of something
    that can be unmounted or removed.  If nothing can be expired,
    `errno` is set to `EAGAIN`.  Even though a `wait_queue_token`
    is present in the structure, no "wait queue" is established
    and no acknowledgment is needed.
- **AUTOFS_IOC_EXPIRE_MULTI**: this is similar to
    **AUTOFS_IOC_EXPIRE** except that it causes notification to be
    sent to the daemon, and it blocks until the daemon acknowledges.
    The argument is an integer which can contain two different flags.

    **AUTOFS_EXP_IMMEDIATE** causes `last_used` time to be ignored
    and objects are expired if they are not in use.

    **AUTOFS_EXP_LEAVES** will select a leaf rather than a top-level
    name to expire.  This is only safe when *maxproto* is 4.
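
As a small example of the first two commands, the sketch below
acknowledges a notification with either **AUTOFS_IOC_READY** or
**AUTOFS_IOC_FAIL**.  It assumes that the kernel's user-space header
`<linux/auto_fs.h>` is installed and provides those two definitions;
the wrapper name `acknowledge_request` and its return convention are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/auto_fs.h>

/*
 * Acknowledge one notification on root_fd, a descriptor open on the
 * root directory of the autofs filesystem.
 * Returns 0 on success or a negative errno value.
 */
static int acknowledge_request(int root_fd, unsigned long token, bool mounted)
{
        /* The token is passed by value, not through a pointer. */
        if (ioctl(root_fd, mounted ? AUTOFS_IOC_READY : AUTOFS_IOC_FAIL,
                  token) < 0)
                return -errno;
        return 0;
}
```

The `token` argument is the `wait_queue_token` taken from the
notification packet that is being answered.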

Communicating with autofs: char-device ioctls
---------------------------------------------

It is not always possible to open the root of an autofs filesystem,
particularly a *direct* mounted filesystem.  If the automount daemon
is restarted there is no way for it to regain control of existing
mounts using any of the above communication channels.  To address this
need there is a "miscellaneous" character device (major 10, minor 235)
which can be used to communicate directly with the autofs filesystem.
It requires CAP_SYS_ADMIN for access.

The `ioctl`s that can be used on this device are described in a separate
document `autofs4-mount-control.txt`, and are summarized briefly here.
Each ioctl is passed a pointer to an `autofs_dev_ioctl` structure:

        struct autofs_dev_ioctl {
                __u32 ver_major;
                __u32 ver_minor;
                __u32 size;             /* total size of data passed in
                                         * including this struct */
                __s32 ioctlfd;          /* automount command fd */

                __u32 arg1;             /* Command parameters */
                __u32 arg2;

                char path[0];
        };

For the **OPEN_MOUNT** and **IS_MOUNTPOINT** commands, the target
filesystem is identified by the `path`.  All other commands identify
the filesystem by the `ioctlfd` which is a file descriptor open on the
root, and which can be returned by **OPEN_MOUNT**.

The `ver_major` and `ver_minor` are in/out parameters which check that
the requested version is supported, and report the maximum version
that the kernel module can support.
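
Because `path` is a variable-length trailer, the argument block has to
be allocated with room for the string and `size` has to cover both.
The following sketch shows one way to do that.  `struct dev_ioctl_arg`
is a local stand-in for the structure above, `new_dev_ioctl_arg` is a
hypothetical helper, initialising `ioctlfd` to -1 is this sketch's own
convention for "no descriptor yet", and the version numbers are left to
the caller rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in with the fields listed above; real code uses the header. */
struct dev_ioctl_arg {
        uint32_t ver_major;
        uint32_t ver_minor;
        uint32_t size;          /* struct plus path, including the NUL */
        int32_t  ioctlfd;
        uint32_t arg1;
        uint32_t arg2;
        char     path[];        /* C99 flexible array member */
};

/*
 * Allocate and fill an argument block.  path may be NULL for commands
 * that identify the filesystem through ioctlfd instead.
 * Returns NULL if allocation fails; the caller frees the result.
 */
static struct dev_ioctl_arg *new_dev_ioctl_arg(uint32_t ver_major,
                                               uint32_t ver_minor,
                                               const char *path)
{
        size_t path_len = path ? strlen(path) + 1 : 0;
        size_t total = sizeof(struct dev_ioctl_arg) + path_len;
        struct dev_ioctl_arg *arg;

        assert(total <= UINT32_MAX);
        arg = calloc(1, total);
        if (!arg)
                return NULL;
        arg->ver_major = ver_major;
        arg->ver_minor = ver_minor;
        arg->size = (uint32_t)total;
        arg->ioctlfd = -1;      /* no descriptor yet */
        if (path)
                memcpy(arg->path, path, path_len);
        return arg;
}
```

The resulting block would then be passed to `ioctl` on the character
device together with one of the commands listed in this section.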

Commands are:

- **AUTOFS_DEV_IOCTL_VERSION_CMD**: does nothing, except validate and
    set version numbers.
- **AUTOFS_DEV_IOCTL_OPENMOUNT_CMD**: return an open file descriptor
    on the root of an autofs filesystem.  The filesystem is identified
    by name and device number, which is stored in `arg1`.  Device
    numbers for existing filesystems can be found in
    `/proc/self/mountinfo`.
- **AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD**: same as `close(ioctlfd)`.
- **AUTOFS_DEV_IOCTL_SETPIPEFD_CMD**: if the filesystem is in
    catatonic mode, this can provide the write end of a new pipe
    in `arg1` to re-establish communication with a daemon.  The
    process group of the calling process is used to identify the
    daemon.
- **AUTOFS_DEV_IOCTL_REQUESTER_CMD**: `path` should be a
    name within the filesystem that has been auto-mounted on.
    `arg1` is the dev number of the underlying autofs.  On successful
    return, `arg1` and `arg2` will be the UID and GID of the process
    which triggered that mount.
- **AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD**: check if `path` is a
    mountpoint of a particular type - see separate documentation for
    details.
- **AUTOFS_DEV_IOCTL_PROTOVER_CMD**:
- **AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD**:
- **AUTOFS_DEV_IOCTL_READY_CMD**:
- **AUTOFS_DEV_IOCTL_FAIL_CMD**:
- **AUTOFS_DEV_IOCTL_CATATONIC_CMD**:
- **AUTOFS_DEV_IOCTL_TIMEOUT_CMD**:
- **AUTOFS_DEV_IOCTL_EXPIRE_CMD**:
- **AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD**: these all have the same
    function as the similarly named **AUTOFS_IOC** ioctls, except
    that **FAIL** can be given an explicit error number in `arg1`
    instead of assuming `ENOENT`, and this **EXPIRE** command
    corresponds to **AUTOFS_IOC_EXPIRE_MULTI**.

Catatonic mode
--------------

As mentioned, an autofs mount can enter "catatonic" mode.  This
happens if a write to the notification pipe fails, or if it is
explicitly requested by an `ioctl`.

When entering catatonic mode, the pipe is closed and any pending
notifications are acknowledged with the error `ENOENT`.

Once in catatonic mode attempts to access non-existing names will
result in `ENOENT` while attempts to access existing directories will
be treated in the same way as if they came from the daemon, so mount
traps will not fire.

When the filesystem is mounted a _uid_ and _gid_ can be given which
set the ownership of directories and symbolic links.  When the
filesystem is in catatonic mode, any process with a matching UID can
create directories or symlinks in the root directory, but not in other
directories.

Catatonic mode can only be left via the
**AUTOFS_DEV_IOCTL_OPENMOUNT_CMD** ioctl on `/dev/autofs`.

autofs, name spaces, and shared mounts
--------------------------------------

With bind mounts and name spaces it is possible for an autofs
filesystem to appear at multiple places in one or more filesystem
name spaces.  For this to work sensibly, the autofs filesystem should
always be mounted "shared", e.g.

> `mount --make-shared /autofs/mount/point`

The automount daemon is only able to manage a single mount location for
an autofs filesystem and if mounts on that are not 'shared', other
locations will not behave as expected.  In particular access to those
other locations will likely result in the `ELOOP` error:

> Too many levels of symbolic links
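
A daemon that wants to do the equivalent of `mount --make-shared`
itself, rather than running the command, could use mount(2) with the
`MS_SHARED` propagation flag.  This is a minimal sketch: the helper
name `make_shared` and its return convention are hypothetical, and the
call needs sufficient privilege (normally CAP_SYS_ADMIN) to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Mark an existing mount point as shared.
 * Returns 0 on success or a negative errno value.
 */
static int make_shared(const char *mount_point)
{
        assert(mount_point);
        /* Only the propagation flag matters; source, type and data are unused. */
        if (mount(NULL, mount_point, NULL, MS_SHARED, NULL) < 0)
                return -errno;
        return 0;
}
```

This changes only the propagation type of the mount at `mount_point`;
it does not mount or remount anything.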