File management in the Linux kernel
-----------------------------------

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro:

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n <= fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the fcheck() or the fcheck_files() API. These
   take care of barrier requirements due to lock-free lookup.
   An example:

        struct file *file;

        rcu_read_lock();
        file = fcheck(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that the look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count:

        rcu_read_lock();
        file = fcheck_files(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count))
                        *fput_needed = 1;
                else
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.
   For example:

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
                spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
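
The increment-unless-zero step from rule 5 can be illustrated outside the
kernel. The following is a minimal userspace sketch using C11 atomics, not
the kernel implementation: refcount_inc_not_zero() and refcount_put() are
hypothetical names chosen for this example, standing in for
atomic_long_inc_not_zero() and the final put on ->f_count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for atomic_long_inc_not_zero(). */
static bool refcount_inc_not_zero(atomic_long *count)
{
        long old = atomic_load(count);

        while (old != 0) {
                /* On failure, old is refreshed with the current value. */
                if (atomic_compare_exchange_weak(count, &old, old + 1))
                        return true;
        }
        return false;   /* count already reached zero: do not revive the object */
}

/* Drop one reference; returns true if this was the last one. */
static bool refcount_put(atomic_long *count)
{
        long old = atomic_fetch_sub(count, 1);

        assert(old > 0);        /* a put without a matching reference is a bug */
        return old == 1;
}
```

A lookup that finds the object but gets false from refcount_inc_not_zero()
must behave as if the object was not found, just as fget()/fget_light()
fail in that case.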
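
Rules 6 and 7 can be sketched the same way in userspace. This is only an
analogy under stated assumptions: struct table, struct table_owner and the
owner_*() helpers are hypothetical names, a pthread mutex stands in for
->file_lock, and C11 release/acquire operations stand in for
rcu_assign_pointer()/rcu_dereference(). Unlike locate_fd(), owner_expand()
does not drop the lock, but the hazard is the same: the table may have been
replaced during the call, so the pointer is loaded afterwards. Reclaiming
the old table safely is left to the caller; the kernel uses RCU for that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct fdtable and files_struct. */
struct table {
        size_t max_slots;
        _Atomic(void *) *slots;
};

struct table_owner {
        pthread_mutex_t lock;           /* plays the role of ->file_lock */
        _Atomic(struct table *) tab;    /* plays the role of the fdtable pointer */
};

/* Allocate a table of n slots, copying entries from old if it is non-NULL. */
static struct table *table_alloc(size_t n, const struct table *old)
{
        struct table *t = malloc(sizeof(*t));
        size_t i;

        assert(n > 0);
        if (!t)
                return NULL;
        t->slots = malloc(n * sizeof(*t->slots));
        if (!t->slots) {
                free(t);
                return NULL;
        }
        t->max_slots = n;
        for (i = 0; i < n; i++) {
                void *p = NULL;

                if (old && i < old->max_slots)
                        p = atomic_load_explicit(&old->slots[i],
                                                 memory_order_relaxed);
                atomic_init(&t->slots[i], p);
        }
        return t;
}

/* Reader side: an acquire load standing in for rcu_dereference(). */
static struct table *owner_table(struct table_owner *o)
{
        return atomic_load_explicit(&o->tab, memory_order_acquire);
}

static int owner_init(struct table_owner *o, size_t n)
{
        struct table *t = table_alloc(n, NULL);

        if (!t)
                return -1;
        pthread_mutex_init(&o->lock, NULL);
        atomic_init(&o->tab, t);
        return 0;
}

/*
 * Make room for 'want' slots. The caller holds o->lock. On success, *old_out
 * is the replaced table (or NULL if nothing changed); it may only be freed
 * once no reader can still hold it.
 */
static int owner_expand(struct table_owner *o, size_t want,
                        struct table **old_out)
{
        struct table *old = owner_table(o);
        struct table *fresh;

        *old_out = NULL;
        if (want <= old->max_slots)
                return 0;
        fresh = table_alloc(want, old);
        if (!fresh)
                return -1;
        /* Initialise fully, then publish: stands in for rcu_assign_pointer(). */
        atomic_store_explicit(&o->tab, fresh, memory_order_release);
        *old_out = old;
        return 0;
}

/* Store item at idx, growing the table if needed. Returns 0 or -1. */
static int owner_install(struct table_owner *o, size_t idx, void *item,
                         struct table **old_out)
{
        struct table *t;
        int ret;

        pthread_mutex_lock(&o->lock);
        ret = owner_expand(o, idx + 1, old_out);
        if (ret == 0) {
                /* owner_expand() may have replaced the table: load the ptr now. */
                t = owner_table(o);
                atomic_store_explicit(&t->slots[idx], item, memory_order_release);
        }
        pthread_mutex_unlock(&o->lock);
        return ret;
}
```

Had owner_install() loaded the table pointer before calling owner_expand(),
a store past the old table's max_slots would write outside the old array.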