=====================
Linux Filesystems API
=====================

The Linux VFS
=============

The Filesystem types
--------------------

.. kernel-doc:: include/linux/fs.h
   :internal:

The Directory Cache
-------------------

.. kernel-doc:: fs/dcache.c
   :export:

.. kernel-doc:: include/linux/dcache.h
   :internal:

Inode Handling
--------------

.. kernel-doc:: fs/inode.c
   :export:

.. kernel-doc:: fs/bad_inode.c
   :export:

Registration and Superblocks
----------------------------

.. kernel-doc:: fs/super.c
   :export:

File Locks
----------

.. kernel-doc:: fs/locks.c
   :export:

.. kernel-doc:: fs/locks.c
   :internal:

Other Functions
---------------

.. kernel-doc:: fs/mpage.c
   :export:

.. kernel-doc:: fs/namei.c
   :export:

.. kernel-doc:: fs/buffer.c
   :export:

.. kernel-doc:: block/bio.c
   :export:

.. kernel-doc:: fs/seq_file.c
   :export:

.. kernel-doc:: fs/filesystems.c
   :export:

.. kernel-doc:: fs/fs-writeback.c
   :export:

.. kernel-doc:: fs/block_dev.c
   :export:

The proc filesystem
===================

sysctl interface
----------------

.. kernel-doc:: kernel/sysctl.c
   :export:

proc filesystem interface
-------------------------

.. kernel-doc:: fs/proc/base.c
   :internal:

Events based on file descriptors
================================

.. kernel-doc:: fs/eventfd.c
   :export:

The Filesystem for Exporting Kernel Objects
===========================================

.. kernel-doc:: fs/sysfs/file.c
   :export:

.. kernel-doc:: fs/sysfs/symlink.c
   :export:

The debugfs filesystem
======================

debugfs interface
-----------------

.. kernel-doc:: fs/debugfs/inode.c
   :export:

.. kernel-doc:: fs/debugfs/file.c
   :export:

The Linux Journalling API
=========================

Overview
--------

Details
~~~~~~~

The journalling layer is easy to use. First, create a journal_t data
structure. There are two calls to do this, depending on how you decide to
allocate the physical media on which the journal resides. The
:c:func:`jbd2_journal_init_inode` call is for journals stored in filesystem
inodes, and the :c:func:`jbd2_journal_init_dev` call is for journals stored
on a raw device (in a contiguous range of blocks). Both hand back a pointer
to a journal_t (a typedef for a struct), so when you are finally finished
make sure you call :c:func:`jbd2_journal_destroy` on it to free up any used
kernel memory.

Once you have got your journal_t object you need to 'mount' or load the
journal file. The journalling layer expects the space for the journal to
have been allocated and initialized properly by the userspace tools. When
loading the journal you must call :c:func:`jbd2_journal_load` to process the
journal contents. If the client file system detects that the journal
contents do not need to be processed (or need not even be valid), it may
call :c:func:`jbd2_journal_wipe` to clear the journal contents before
calling :c:func:`jbd2_journal_load`.

Note that jbd2_journal_wipe(..,0) calls
:c:func:`jbd2_journal_skip_recovery` for you if it detects any outstanding
transactions in the journal, and similarly :c:func:`jbd2_journal_load` will
call :c:func:`jbd2_journal_recover` if necessary. I would advise reading
:c:func:`ext4_load_journal` in fs/ext4/super.c for an example of this stage.
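
The difference between wiping and loading can be pictured with a small
userspace model. This is only a sketch under assumptions: ``toy_journal``,
``toy_wipe``, ``toy_load`` and ``toy_mount`` are hypothetical names invented
for illustration, not JBD2 symbols, and the model reduces recovery to a
counter.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for a journal; not the kernel's journal_t. */
    struct toy_journal {
        int pending;   /* transactions left in the journal by a crash */
        int replayed;  /* transactions replayed into the filesystem */
        int loaded;    /* set once the journal has been loaded */
    };

    /* Discard the journal contents without replaying them. */
    static void toy_wipe(struct toy_journal *j)
    {
        assert(j != NULL && !j->loaded);
        j->pending = 0;
    }

    /* Load the journal, replaying whatever is still pending. */
    static int toy_load(struct toy_journal *j)
    {
        assert(j != NULL);
        if (j->loaded)
            return -1;
        j->replayed += j->pending;
        j->pending = 0;
        j->loaded = 1;
        return 0;
    }

    /* Mount-time sequence: an optional wipe, then the mandatory load. */
    static int toy_mount(struct toy_journal *j, int contents_unneeded)
    {
        if (contents_unneeded)
            toy_wipe(j);
        return toy_load(j);
    }

In this model a plain ``toy_mount(j, 0)`` replays the pending transactions,
while ``toy_mount(j, 1)`` throws them away first, so nothing is replayed.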

Now you can go ahead and start modifying the underlying filesystem.
Almost.

You still need to actually journal your filesystem changes; this is done
by wrapping them into transactions. You also need to wrap the modification
of each buffer with calls to the journal layer, so that it knows which
modifications you are actually making. To do this, use
:c:func:`jbd2_journal_start`, which returns a transaction handle.

:c:func:`jbd2_journal_start` and its counterpart :c:func:`jbd2_journal_stop`,
which indicates the end of a transaction, are nestable calls, so you can
reenter a transaction if necessary. Remember that you must call
:c:func:`jbd2_journal_stop` the same number of times as
:c:func:`jbd2_journal_start` before the transaction is completed (or, more
accurately, leaves the update phase). Ext4/VFS makes use of this feature to
simplify handling of inode dirtying, quota support, etc.
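
The nesting rule can be pictured with a small userspace model. This is only
a sketch under assumptions: ``toy_task``, ``toy_handle``, ``toy_start`` and
``toy_stop`` are hypothetical names invented for illustration, not JBD2
symbols, and the real layer does considerably more than count references.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical stand-in for a transaction handle. */
    struct toy_handle {
        int ref;  /* number of start calls not yet matched by a stop */
    };

    /* Hypothetical stand-in for a task: at most one open handle. */
    struct toy_task {
        struct toy_handle *current_handle;
        struct toy_handle storage;
        int updates_finished;  /* times an outermost stop has run */
    };

    /* Start a handle, or re-enter the one the task already holds. */
    static struct toy_handle *toy_start(struct toy_task *task)
    {
        assert(task != NULL);
        if (task->current_handle) {
            task->current_handle->ref++;
            return task->current_handle;
        }
        task->storage.ref = 1;
        task->current_handle = &task->storage;
        return task->current_handle;
    }

    /* Drop one reference; returns 1 only when the outermost stop ran. */
    static int toy_stop(struct toy_task *task)
    {
        struct toy_handle *h;

        assert(task != NULL && task->current_handle != NULL);
        h = task->current_handle;
        if (--h->ref > 0)
            return 0;
        task->current_handle = NULL;
        task->updates_finished++;
        return 1;
    }

In this model two nested ``toy_start`` calls return the same handle, and
only the second ``toy_stop`` ends the update.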

Inside each transaction you need to wrap the modifications to the
individual buffers (blocks). Before you start to modify a buffer you
need to call :c:func:`jbd2_journal_get_create_access()` /
:c:func:`jbd2_journal_get_write_access()` /
:c:func:`jbd2_journal_get_undo_access()` as appropriate; this allows the
journalling layer to copy the unmodified data if it needs to. After all,
the buffer may be part of a previously uncommitted transaction. At this
point you are at last ready to modify a buffer, and once you have done so
you need to call :c:func:`jbd2_journal_dirty_metadata`. Or, if you've asked
for access to a buffer that you now know no longer needs to be pushed back
to the device, you can call :c:func:`jbd2_journal_forget` in much the same
way as you might have used :c:func:`bforget` in the past.
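
The order of calls around a buffer modification can be sketched with a
userspace model. Everything here is an assumption made for illustration:
``toy_bh`` and the ``toy_*`` functions are hypothetical stand-ins, not JBD2
symbols, and the model always keeps a copy of the old contents, whereas the
real layer copies only when it needs to.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define TOY_BLOCK_SIZE 8

    /* Hypothetical stand-in for a buffer tracked by the journal. */
    struct toy_bh {
        char data[TOY_BLOCK_SIZE];
        char saved[TOY_BLOCK_SIZE];  /* copy of the unmodified contents */
        int have_access;             /* write access was declared */
        int dirty;                   /* the modification was reported */
    };

    /* Declare the intent to modify; the model saves the old contents. */
    static int toy_get_write_access(struct toy_bh *bh)
    {
        assert(bh != NULL);
        memcpy(bh->saved, bh->data, TOY_BLOCK_SIZE);
        bh->have_access = 1;
        return 0;
    }

    /* Report the modification; the model refuses if access was skipped. */
    static int toy_dirty_metadata(struct toy_bh *bh)
    {
        assert(bh != NULL);
        if (!bh->have_access)
            return -1;
        bh->dirty = 1;
        return 0;
    }

    /* The order described above: get access, modify, then mark dirty. */
    static int toy_update_block(struct toy_bh *bh, char value)
    {
        int err = toy_get_write_access(bh);

        if (err)
            return err;
        memset(bh->data, value, TOY_BLOCK_SIZE);
        return toy_dirty_metadata(bh);
    }

The error return in ``toy_dirty_metadata`` exists only to make a skipped
access call visible in the model.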

A :c:func:`jbd2_journal_flush` may be called at any time to commit and
checkpoint all your transactions.

Then at umount time, in your :c:func:`put_super`, you can call
:c:func:`jbd2_journal_destroy` to clean up your in-core journal object.

Unfortunately there are a couple of ways the journal layer can cause a
deadlock. The first thing to note is that each task can only have a
single outstanding transaction at any one time; remember that nothing
commits until the outermost :c:func:`jbd2_journal_stop`. This means you
must complete the transaction at the end of each file/inode/address etc.
operation you perform, so that the journalling system isn't re-entered on
another journal. This matters because transactions can't be
nested/batched across differing journals, and a filesystem other than
yours (say ext4) may be modified in a later syscall.

The second case to bear in mind is that :c:func:`jbd2_journal_start` can block
if there isn't enough space in the journal for your transaction (based
on the passed nblocks param). When it blocks it merely(!) needs to wait
for transactions from other tasks to complete and be committed, so
essentially we are waiting for :c:func:`jbd2_journal_stop`. So to avoid
deadlocks you must treat :c:func:`jbd2_journal_start` /
:c:func:`jbd2_journal_stop` as if they were semaphores and include them in
your semaphore ordering rules. Note that :c:func:`jbd2_journal_extend` has
similar blocking behaviour to :c:func:`jbd2_journal_start`, so you can
deadlock here just as easily as on :c:func:`jbd2_journal_start`.

Try to reserve the right number of blocks the first time. ;-) This will
be the maximum number of blocks you are going to touch in this
transaction. I advise having a look at ext4_jbd.h, at least, to see the
basis on which ext4 makes these decisions.

Another wriggle to watch out for is your on-disk block allocation
strategy. Why? Because, if you do a delete, you need to ensure you
haven't reused any of the freed blocks until the transaction freeing
these blocks commits. If you reuse these blocks and a crash happens,
there is no way to restore the contents of the reallocated blocks as of
the end of the last fully committed transaction. One simple way of
ensuring this is to mark blocks as free in the internal in-memory block
allocation structures only after the transaction freeing them commits.
Ext4 uses a journal commit callback for this purpose.

With journal commit callbacks you can ask the journalling layer to call
a callback function when the transaction is finally committed to disk,
so that you can do some of your own management. You request the callback
by simply setting the ``journal->j_commit_callback`` function pointer;
that function is then called after each transaction commit. You can also
use ``transaction->t_private_list`` for attaching entries to a transaction
that need processing when the transaction commits.
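
The idea of freeing blocks only once the freeing transaction has committed
can be sketched with a userspace model. This is an illustration under
assumptions: ``toy_alloc``, ``toy_txn`` and the ``toy_*`` functions are
hypothetical names, the pending array merely plays the role of entries
attached to a transaction, and none of this is ext4's actual code.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    #define TOY_NBLOCKS     8
    #define TOY_MAX_PENDING 4

    /* Hypothetical in-memory allocator state: 1 means block is in use. */
    struct toy_alloc {
        unsigned char in_use[TOY_NBLOCKS];
    };

    /* Hypothetical transaction carrying the blocks it has freed. */
    struct toy_txn {
        unsigned int pending[TOY_MAX_PENDING];
        size_t npending;
    };

    /* Record a free on the transaction; the block stays in use for now. */
    static int toy_free_block(struct toy_txn *txn, unsigned int blk)
    {
        assert(txn != NULL);
        if (blk >= TOY_NBLOCKS || txn->npending >= TOY_MAX_PENDING)
            return -1;
        txn->pending[txn->npending++] = blk;
        return 0;
    }

    /* Hand out the first block not in use, or -1 if there is none. */
    static int toy_alloc_block(struct toy_alloc *a)
    {
        unsigned int i;

        assert(a != NULL);
        for (i = 0; i < TOY_NBLOCKS; i++) {
            if (!a->in_use[i]) {
                a->in_use[i] = 1;
                return (int)i;
            }
        }
        return -1;
    }

    /* Plays the role of a commit callback: now the frees take effect. */
    static void toy_commit_callback(struct toy_alloc *a, struct toy_txn *txn)
    {
        size_t i;

        assert(a != NULL && txn != NULL);
        for (i = 0; i < txn->npending; i++)
            a->in_use[txn->pending[i]] = 0;
        txn->npending = 0;
    }

In this model a block passed to ``toy_free_block`` cannot be handed out
again by ``toy_alloc_block`` until ``toy_commit_callback`` has run.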

JBD2 also provides a way to block all transaction updates via
:c:func:`jbd2_journal_lock_updates()` /
:c:func:`jbd2_journal_unlock_updates()`. Ext4 uses this when it wants a
window with a clean and stable fs for a moment. E.g.

::

        jbd2_journal_lock_updates()   // stop new stuff happening..
        jbd2_journal_flush()          // checkpoint everything.
        ..do stuff on stable fs
        jbd2_journal_unlock_updates() // carry on with filesystem use.

The opportunities for abuse and DoS attacks with this should be obvious
if you allow unprivileged userspace to trigger codepaths containing
these calls.

Summary
~~~~~~~

Using the journal is a matter of wrapping the different context changes
(each mount, each modification (transaction) and each changed buffer) to
tell the journalling layer about them.

Data Types
----------

The journalling layer uses typedefs to 'hide' the concrete definitions
of the structures used. As a client of the JBD2 layer you can just rely
on using the pointer as a magic cookie of some sort. Obviously the
hiding is not enforced, as this is 'C'.

Structures
~~~~~~~~~~

.. kernel-doc:: include/linux/jbd2.h
   :internal:

Functions
---------

The functions here are split into two groups: those that affect a journal
as a whole, and those that are used to manage transactions.

Journal Level
~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/journal.c
   :export:

.. kernel-doc:: fs/jbd2/recovery.c
   :internal:

Transaction Level
~~~~~~~~~~~~~~~~~

.. kernel-doc:: fs/jbd2/transaction.c

See also
--------

`Journaling the Linux ext2fs Filesystem, LinuxExpo 98, Stephen
Tweedie <http://kernel.org/pub/linux/kernel/people/sct/ext3/journal-design.ps.gz>`__

`Ext3 Journalling FileSystem, OLS 2000, Dr. Stephen
Tweedie <http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html>`__

splice API
==========

splice is a method for moving blocks of data around inside the kernel,
without continually transferring them between the kernel and user space.

.. kernel-doc:: fs/splice.c

pipes API
=========

Pipe interfaces are all for in-kernel (builtin image) use. They are not
exported for use by modules.

.. kernel-doc:: include/linux/pipe_fs_i.h
   :internal:

.. kernel-doc:: fs/pipe.c
 318