                             ====================
                             CREDENTIALS IN LINUX
                             ====================

By: David Howells <dhowells@redhat.com>

Contents:

 (*) Overview.

 (*) Types of credentials.

 (*) File markings.

 (*) Task credentials.

     - Immutable credentials.
     - Accessing task credentials.
     - Accessing another task's credentials.
     - Altering credentials.
     - Managing credentials.

 (*) Open file credentials.

 (*) Overriding the VFS's use of credentials.


========
OVERVIEW
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 (1) Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs.  Linux has a variety of actionable objects, including:

        - Tasks
        - Files/inodes
        - Sockets
        - Message queues
        - Shared memory segments
        - Semaphores
        - Keys

     As a part of the description of all these objects there is a set of
     credentials.  What's in the set depends on the type of object.

 (2) Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object.  This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.
  59
  60 (3) The objective context.
  61
  62     Also amongst the credentials of those objects, there will be a subset that
  63     indicates the 'objective context' of that object.  This may or may not be
  64     the same set as in (2) - in standard UNIX files, for instance, this is the
  65     defined by the UID and the GID marked on the inode.
  66
  67     The objective context is used as part of the security calculation that is
  68     carried out when an object is acted upon.
  69
  70 (4) Subjects.
  71
  72     A subject is an object that is acting upon another object.
  73
  74     Most of the objects in the system are inactive: they don't act on other
  75     objects within the system.  Processes/tasks are the obvious exception:
  76     they do stuff; they access and manipulate things.
  77
  78     Objects other than tasks may under some circumstances also be subjects.
  79     For instance an open file may send SIGIO to a task using the UID and EUID
  80     given to it by a task that called fcntl(F_SETOWN) upon it.  In this case,
  81     the file struct will have a subjective context too.
  82
  83 (5) The subjective context.
  84
  85     A subject has an additional interpretation of its credentials.  A subset
  86     of its credentials forms the 'subjective context'.  The subjective context
  87     is used as part of the security calculation that is carried out when a
  88     subject acts.
  89
  90     A Linux task, for example, has the FSUID, FSGID and the supplementary
  91     group list for when it is acting upon a file - which are quite separate
  92     from the real UID and GID that normally form the objective context of the
  93     task.
  94
  95 (6) Actions.
  96
  97     Linux has a number of actions available that a subject may perform upon an
  98     object.  The set of actions available depends on the nature of the subject
  99     and the object.
 100
 101     Actions include reading, writing, creating and deleting files; forking or
 102     signalling and tracing tasks.

 (7) Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made.  This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     (a) Discretionary access control (DAC):

         Sometimes the object will include sets of rules as part of its
         description.  This is an 'Access Control List' or 'ACL'.  A Linux
         file may supply more than one ACL.

         A traditional UNIX file, for example, includes a permissions mask that
         is an abbreviated ACL with three fixed classes of subject ('user',
         'group' and 'other'), each of which may be granted certain privileges
         ('read', 'write' and 'execute' - whatever those map to for the object
         in question).  UNIX file permissions do not allow the arbitrary
         specification of subjects, however, and so are of limited use.

         A Linux file might also sport a POSIX ACL.  This is a list of rules
         that grants various permissions to arbitrary subjects.

     (b) Mandatory access control (MAC):

         The system as a whole may have one or more sets of rules that get
         applied to all subjects and objects, regardless of their source.
         SELinux and Smack are examples of this.

         In the case of SELinux and Smack, each object is given a label as part
         of its credentials.  When an action is requested, they take the
         subject label, the object label and the action and look for a rule
         that says that this action is either granted or denied.
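
As an illustration of the discretionary check in (7)(a), the following is a
minimal userspace sketch, not kernel code.  It takes a subjective context
(FSUID, FSGID and supplementary groups), an objective context (UID, GID and
mode) and a requested action, and evaluates the traditional permissions mask.
The names used (subj_ctx, obj_ctx, dac_permits() and the MAY_* values) are
assumptions made for this example; capabilities, POSIX ACLs and MAC are
deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified contexts; these are not the kernel's structures. */
struct subj_ctx {                       /* subjective context of a task */
        uid_t fsuid;
        gid_t fsgid;
        const gid_t *groups;            /* supplementary groups */
        size_t ngroups;
};

struct obj_ctx {                        /* objective context of a file */
        uid_t uid;
        gid_t gid;
        unsigned int mode;              /* low nine bits: rwxrwxrwx */
};

enum { MAY_EXEC = 1, MAY_WRITE = 2, MAY_READ = 4 };

/* Does the subject's FSGID or supplementary group list contain 'gid'? */
static bool in_group(const struct subj_ctx *s, gid_t gid)
{
        size_t i;

        if (s->fsgid == gid)
                return true;
        for (i = 0; i < s->ngroups; i++)
                if (s->groups[i] == gid)
                        return true;
        return false;
}

/* Return true if the mode bits grant every action bit set in 'want'. */
static bool dac_permits(const struct subj_ctx *s, const struct obj_ctx *o,
                        unsigned int want)
{
        unsigned int bits = o->mode;

        if (s->fsuid == o->uid)
                bits >>= 6;             /* 'user' class */
        else if (in_group(s, o->gid))
                bits >>= 3;             /* 'group' class */
        /* otherwise the 'other' class, held in the lowest three bits */

        return (bits & want & 7) == want;
}
```

Only one class is ever consulted: an owner whose 'user' bits deny an action is
refused even if the 'other' bits would have granted it.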


====================
TYPES OF CREDENTIALS
====================

The Linux kernel supports the following types of credentials:

 (1) Traditional UNIX credentials.

        Real User ID
        Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases they have to be invented (FAT or CIFS files for example, which
     are derived from Windows).  These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

        Effective, Saved and FS User ID
        Effective, Saved and FS Group ID
        Supplementary groups

     These are additional credentials used by tasks only.  Usually, the
     EUID/EGID/GROUPS will be used as the subjective context, and the real
     UID/GID will be used as the objective.  For tasks, it should be noted that
     this is not always true.

 (2) Capabilities.

        Set of permitted capabilities
        Set of inheritable capabilities
        Set of effective capabilities
        Capability bounding set

     These are only carried by tasks.  They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the capset() system
     call.

     The permitted capabilities are those caps that the process might grant
     itself in its effective set through capset().  The inheritable set might
     also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     execve().

     The bounding set limits the capabilities that may be inherited across
     execve(), especially when a binary is executed that will execute as UID 0.

 (3) Secure management flags (securebits).

     These are only carried by tasks.  These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     execve().  They aren't used directly as objective or subjective
     credentials.

 (4) Keys and keyrings.

     These are only carried by tasks.  They carry and cache security tokens
     that don't fit into the other standard UNIX credentials.  They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about the security details involved.

     Keyrings are a special type of key.  They carry sets of other keys and can
     be searched for the desired key.  Each process may subscribe to a number
     of keyrings:

        Per-thread keyring
        Per-process keyring
        Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see Documentation/keys.txt.

 (5) LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do.  Currently Linux supports two main
     alternate LSM options: SELinux and Smack.

     Both work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 (6) AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.
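
From userspace, most of the traditional task credentials listed in (1) can be
read back with getresuid() and getresgid(), and on Linux the FSUID can be
queried through setfsuid().  The following is a minimal sketch of that; the
struct task_ids type and the read_task_ids() helper are names invented for
this example and are not part of any library:

```c
#define _GNU_SOURCE             /* for getresuid() and getresgid() */
#include <assert.h>
#include <sys/fsuid.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs read back by this example. */
struct task_ids {
        uid_t ruid, euid, suid, fsuid;
        gid_t rgid, egid, sgid;
};

/* Fill 'ids' for the calling process; return 0 on success, -1 on error. */
static int read_task_ids(struct task_ids *ids)
{
        if (getresuid(&ids->ruid, &ids->euid, &ids->suid) < 0)
                return -1;
        if (getresgid(&ids->rgid, &ids->egid, &ids->sgid) < 0)
                return -1;

        /*
         * setfsuid() returns the previous FSUID whether or not it succeeds.
         * -1 is not a valid user ID, so this call is expected to change
         * nothing and simply report the current value.
         */
        ids->fsuid = (uid_t)setfsuid((uid_t)-1);
        return 0;
}
```

In a process that has not changed its IDs, the FSUID would normally be
expected to match the effective UID, since the kernel keeps the two in step
unless setfsuid() is used to separate them.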


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created.  This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation.  An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


=============
FILE MARKINGS
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.  Depending on the type of filesystem,
this may include one or more of the following:

 (*) UNIX UID, GID, mode;

 (*) Windows user ID;

 (*) Access control list;

 (*) LSM security label;

 (*) UNIX exec privilege escalation bits (SUID/SGID);

 (*) File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result.  In the case of execve(), the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.


================
TASK CREDENTIALS
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers (groups,
keys, LSM security).  Each task points to its credentials by a pointer called
'cred' in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 (1) its reference count may be changed;

 (2) the reference count on the group_info struct it points to may be changed;

 (3) the reference count on the security data it points to may be changed;

 (4) the reference count on any keyrings it points to may be changed;

 (5) any keyrings it points to may be revoked, expired or have their security
     attributes changed; and

 (6) the contents of any keyrings to which it points may be changed (the whole
     point of keyrings being a shared set of credentials, modifiable by anyone
     with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to.  First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy.  There are wrappers to aid
with this (see below).
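
The copy-and-replace principle can be sketched in userspace roughly as
follows.  This is only a model under stated assumptions: struct model_cred,
struct model_task and the model_*() helpers are invented for this example, a
C11 atomic store stands in for the RCU pointer update, and the old set is freed
immediately on its last put where the kernel would defer destruction through
RCU:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct model_cred {
        atomic_int usage;               /* reference count */
        uid_t uid;
        gid_t gid;
};

struct model_task {
        _Atomic(const struct model_cred *) cred;
};

static struct model_cred *model_cred_alloc(uid_t uid, gid_t gid)
{
        struct model_cred *c = malloc(sizeof(*c));

        if (c) {
                atomic_init(&c->usage, 1);
                c->uid = uid;
                c->gid = gid;
        }
        return c;
}

static void model_put_cred(const struct model_cred *cred)
{
        struct model_cred *c = (struct model_cred *)cred;

        /* fetch_sub returns the old value: 1 means this was the last ref */
        if (atomic_fetch_sub(&c->usage, 1) == 1)
                free(c);
}

/* Change the task's UID by copy-and-replace; 0 on success, -1 if no memory. */
static int model_set_uid(struct model_task *t, uid_t uid)
{
        const struct model_cred *old = atomic_load(&t->cred);
        struct model_cred *new;

        new = model_cred_alloc(old->uid, old->gid);     /* 1. take a copy */
        if (!new)
                return -1;
        new->uid = uid;                                 /* 2. alter the copy */
        atomic_store(&t->cred, new);                    /* 3. publish it */
        model_put_cred(old);            /* drop the task's ref on the old set */
        return 0;
}
```

Anyone still holding a reference to the old set continues to see it unchanged,
which is what allows a committed set to be treated as immutable.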

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials.  This means the capset() system call is no
longer permitted to take any PID other than the one of the current process.
Also, the keyctl_instantiate() and keyctl_negate() functions no longer permit
attachment to process-specific keyrings in the requesting process as the
instantiating process may need to create them.


IMMUTABLE CREDENTIALS
---------------------

Once a set of credentials has been made public (by calling commit_creds() for
example), it must be considered immutable, barring two exceptions:

 (1) The reference count may be altered.

 (2) Whilst the keyring subscriptions of a set of credentials may not be
     changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as get_cred() and put_cred() operate on const pointers,
thus rendering casts unnecessary in their callers, but they have to discard the
const qualification temporarily to be able to alter the reference count.
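
The const-pointer arrangement can be modelled in userspace as below.  This is
a sketch only: struct demo_cred, struct demo_file and the demo_*() helpers are
hypothetical names, and a plain int is used for the count where the kernel uses
an atomic counter.  Holders see the credentials through a const pointer, and
only the get and put helpers cast the qualifier away, and then only to touch
the reference count:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct demo_cred {
        int usage;                      /* reference count (not atomic here) */
        uid_t uid;
};

struct demo_file {
        const struct demo_cred *f_cred; /* const: holders may not alter it */
};

static struct demo_cred *demo_cred_alloc(uid_t uid)
{
        struct demo_cred *c = malloc(sizeof(*c));

        if (c) {
                c->usage = 1;
                c->uid = uid;
        }
        return c;
}

/* Takes and returns a const pointer, so callers never need a cast. */
static const struct demo_cred *demo_get_cred(const struct demo_cred *cred)
{
        struct demo_cred *nonconst = (struct demo_cred *)cred;

        nonconst->usage++;              /* the only field altered */
        return cred;
}

static void demo_put_cred(const struct demo_cred *cred)
{
        struct demo_cred *nonconst = (struct demo_cred *)cred;

        if (--nonconst->usage == 0)
                free(nonconst);
}

/*
 * With 'struct demo_file f', a statement such as
 *         f.f_cred->uid = 0;
 * is rejected by the compiler because f_cred points to a const object.
 */
```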


ACCESSING TASK CREDENTIALS
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
- which simplifies things greatly.  It can just call:

        const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case):

        uid_t current_uid(void)         Current's real UID
        gid_t current_gid(void)         Current's real GID
        uid_t current_euid(void)        Current's effective UID
        gid_t current_egid(void)        Current's effective GID
        uid_t current_fsuid(void)       Current's file access UID
        gid_t current_fsgid(void)       Current's file access GID
        kernel_cap_t current_cap(void)  Current's effective capabilities
        void *current_security(void)    Current's LSM security pointer
        struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials:

        void current_uid_gid(uid_t *, gid_t *);
        void current_euid_egid(uid_t *, gid_t *);
        void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials:

        const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred:

        struct user_struct *get_current_user(void);
        struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with put_cred(),
free_uid() or put_group_info() as appropriate.


ACCESSING ANOTHER TASK'S CREDENTIALS
------------------------------------

Whilst a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials.  It
must use the RCU read lock and rcu_dereference().

The rcu_dereference() is wrapped by:

        const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example:

        void foo(struct task_struct *t, struct foo_data *f)
        {
                const struct cred *tcred;
                ...
                rcu_read_lock();
                tcred = __task_cred(t);
                f->uid = tcred->uid;
                f->gid = tcred->gid;
                f->groups = get_group_info(tcred->groups);
                rcu_read_unlock();
                ...
        }

A function need not get the RCU read lock to use __task_cred() if it is holding
a spinlock at the time as this implicitly holds the RCU read lock.

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep whilst doing so, then the caller should get a
reference on them using:

        const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it.  The caller must call put_cred() on
the credentials so obtained when they're finished with.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller:

        uid_t task_uid(task)            Task's real UID
        uid_t task_euid(task)           Task's effective UID

If the caller is holding a spinlock or the RCU read lock at the time anyway,
then:

        __task_cred(task)->uid
        __task_cred(task)->euid

should be used instead.  Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock or a spinlock should be held,
__task_cred() called, the result stored in a temporary pointer and then the
credential aspects read through that pointer before dropping the lock.  This
prevents the potentially expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used:

        task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.  For instance:

        uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic.  This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.
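
The shape of such an accessor can be sketched in userspace as below.  This is
an illustrative model rather than the kernel's definition: the toy_* names are
invented for the example, the lock and unlock functions merely count nesting
depth in place of real RCU, and the macro relies on the GNU C statement
expression and __typeof__ extensions (accepted by GCC and Clang).  The point it
shows is that the member's value is copied out while the read-side section is
held, so only the copy escapes:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct toy_cred { uid_t uid, euid, suid; };
struct toy_task { const struct toy_cred *cred; };

/* Placeholder for RCU: just tracks read-side nesting depth in this model. */
static int toy_rcu_depth;
static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

/* Copy one non-pointer member out of the task's credentials. */
#define toy_task_cred_xxx(task, member)                         \
({                                                              \
        __typeof__(((struct toy_cred *)0)->member) ___val;      \
        toy_rcu_read_lock();                                    \
        ___val = (task)->cred->member;                          \
        toy_rcu_read_unlock();                                  \
        ___val;                                                 \
})
```

Returning a pointer member this way would hand the caller something whose
target is no longer protected once the unlock has run, which is why the
restriction to non-pointer members exists.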


ALTERING CREDENTIALS
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task.  This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling:

        struct cred *prepare_creds(void);

This locks current->cred_replace_mutex and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.  It returns NULL if not successful (out of memory).

The mutex prevents ptrace() from altering the ptrace state of a process whilst
security checks on credentials construction and changing are taking place as
the ptrace state may alter the outcome, particularly in the case of execve().

The new credentials set should be altered appropriately, and any security
checks and hooks done.  Both the current and the proposed sets of credentials
are available for this purpose as current_cred() will return the current set
still at this point.


When the credential set is ready, it should be committed to the current process
by calling:

        int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually
commit the new credentials to current->cred, it will release
current->cred_replace_mutex to allow ptrace() to take place, and it will notify
the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as sys_setresuid().

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call put_cred() on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after prepare_creds()
has been called, then the following function should be invoked:

        void abort_creds(struct cred *new);

This releases the lock on current->cred_replace_mutex that prepare_creds() got
and then releases the new credentials.


A typical credentials alteration function would look something like this:

        int alter_suid(uid_t suid)
        {
                struct cred *new;
                int ret;

                /* Take a private, still-mutable copy of current's creds. */
                new = prepare_creds();
                if (!new)
                        return -ENOMEM;

                new->suid = suid;
                ret = security_alter_suid(new);
                if (ret < 0) {
                        /* Discard the copy; the old set stays in force. */
                        abort_creds(new);
                        return ret;
                }

                /* Consumes the reference to 'new' and always returns 0. */
                return commit_creds(new);
        }


MANAGING CREDENTIALS
--------------------

There are some functions to help manage credentials:

 (*) void put_cred(const struct cred *cred);

     This releases a reference to the given set of credentials.  If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 (*) const struct cred *get_cred(const struct cred *cred);

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 (*) struct cred *get_new_cred(struct cred *cred);

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


=====================
OPEN FILE CREDENTIALS
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as 'f_cred' in place of
'f_uid' and 'f_gid'.  Code that used to access file->f_uid and file->f_gid
should now access file->f_cred->fsuid and file->f_cred->fsgid.

It is safe to access f_cred without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).
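
The effect of pinning credentials at open time can be modelled in userspace as
follows.  This is a sketch under stated assumptions: the sim_* types and
helpers are invented for the example, the reference count is a plain int, and
the task switches credentials by installing a fresh set in the copy-and-replace
manner.  The file keeps the set that was current when it was opened:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct sim_cred { int usage; uid_t fsuid; gid_t fsgid; };
struct sim_task { const struct sim_cred *cred; };
struct sim_file { const struct sim_cred *f_cred; };

static struct sim_cred *sim_cred_alloc(uid_t fsuid, gid_t fsgid)
{
        struct sim_cred *c = malloc(sizeof(*c));

        if (c) {
                c->usage = 1;
                c->fsuid = fsuid;
                c->fsgid = fsgid;
        }
        return c;
}

static void sim_put_cred(const struct sim_cred *cred)
{
        struct sim_cred *c = (struct sim_cred *)cred;

        if (--c->usage == 0)
                free(c);
}

/* Opening takes a reference on the opener's credentials for the file. */
static void sim_open(struct sim_file *f, const struct sim_task *t)
{
        ((struct sim_cred *)t->cred)->usage++;
        f->f_cred = t->cred;
}

static void sim_release(struct sim_file *f)
{
        sim_put_cred(f->f_cred);
        f->f_cred = NULL;
}

/* The task installs a new set; the old set is left untouched. */
static int sim_change_fsuid(struct sim_task *t, uid_t fsuid)
{
        struct sim_cred *new = sim_cred_alloc(fsuid, t->cred->fsgid);

        if (!new)
                return -1;
        sim_put_cred(t->cred);          /* the file may still hold a ref */
        t->cred = new;
        return 0;
}
```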


=======================================
OVERRIDING THE VFS'S USE OF CREDENTIALS
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as vfs_mkdir()
with a different set of credentials.  This is done in the following places:

 (*) sys_faccessat().

 (*) do_coredump().

 (*) nfs4recover.c.
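
The general pattern of such an override can be sketched in userspace as below.
This is a model only and does not show the kernel's interface: the ovr_* names
are invented for the example, and ovr_may_access() is a stand-in for a VFS
permission check that simply compares the FSUID with an owner.  The example
imitates an access()-style check, which is made with the real UID rather than
the filesystem UID: a temporary set is installed, the operation is performed,
and the original set is put back:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct ovr_cred { uid_t uid, fsuid; };
struct ovr_task { const struct ovr_cred *cred; };

/* Install 'tmp' as the task's credentials and return the previous set. */
static const struct ovr_cred *ovr_override(struct ovr_task *t,
                                           const struct ovr_cred *tmp)
{
        const struct ovr_cred *old = t->cred;

        t->cred = tmp;
        return old;
}

/* Put back the set returned by ovr_override(). */
static void ovr_revert(struct ovr_task *t, const struct ovr_cred *old)
{
        t->cred = old;
}

/* Stand-in for a VFS permission check: owner-only access, by FSUID. */
static bool ovr_may_access(const struct ovr_task *t, uid_t owner)
{
        return t->cred->fsuid == owner;
}

/* Run the check with the real UID substituted for the FSUID. */
static bool ovr_access_as_real_user(struct ovr_task *t, uid_t owner)
{
        struct ovr_cred tmp = *t->cred; /* alter a copy, never the original */
        const struct ovr_cred *old;
        bool ok;

        tmp.fsuid = tmp.uid;
        old = ovr_override(t, &tmp);
        ok = ovr_may_access(t, owner);
        ovr_revert(t, old);
        return ok;
}
```

The original credentials are never modified; only the pointer the check looks
through is switched for the duration of the operation.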