linux/Documentation/MSI-HOWTO.txt
                The MSI Driver Guide HOWTO
        Tom L Nguyen tom.l.nguyen@intel.com
                        10/03/2003
        Revised Feb 12, 2004 by Martine Silbermann
                email: Martine.Silbermann@hp.com
        Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function.  In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables a
device function to request service by sending an Inbound Memory Write
on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all transaction
conditions, such as a Retry, Master-Abort, Target-Abort or normal
completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during initial
device configuration.

An MSI-capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.
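
To illustrate how a capability list is searched, here is a minimal
user-space sketch; it is not kernel code. It walks a 256-byte copy of
a function's configuration space looking for the MSI (ID 0x05) or
MSI-X (ID 0x11) capability. The helper name find_capability() and the
snapshot buffer are assumptions made for this example only.

```c
#include <stdint.h>

/* Standard PCI configuration-space offsets and capability IDs. */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10   /* status bit 4: capability list present */
#define PCI_CAPABILITY_LIST   0x34   /* offset of the first capability pointer */
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11

/*
 * Hypothetical helper: search a 256-byte snapshot of a function's
 * configuration space for capability 'id'. Returns the offset of the
 * capability structure, or 0 if it is not present.
 */
int find_capability(const uint8_t cfg[256], uint8_t id)
{
        int ttl = 48;           /* bound the walk in case the list loops */
        uint8_t pos;

        if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
                return 0;

        pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
        while (pos >= 0x40 && ttl--) {
                if (cfg[pos] == id)
                        return pos;
                pos = cfg[pos + 1] & 0xfc;      /* next-capability pointer */
        }
        return 0;
}
```

A function that implements both structures yields a non-zero offset
for each ID. Inside the kernel a driver does not need to open-code
this walk, since pci_find_capability() performs the same search.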

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device. The
Message Address register specifies the target address and the Message
Data register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.
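
As a rough illustration of what the Message Control register reports,
the user-space sketch below decodes a 16-bit value using the bit
positions given in the PCI specification. The MSI_CTRL_* names, the
struct msi_caps type and msi_decode_control() are local to this
example and are not kernel identifiers.

```c
#include <stdint.h>

/* Bit positions of the MSI Message Control register (local names). */
#define MSI_CTRL_ENABLE    0x0001  /* bit 0: MSI enabled */
#define MSI_CTRL_QMASK     0x000e  /* bits 3:1: log2(vectors the function supports) */
#define MSI_CTRL_QSIZE     0x0070  /* bits 6:4: log2(vectors currently enabled) */
#define MSI_CTRL_64BIT     0x0080  /* bit 7: 64-bit message address capable */
#define MSI_CTRL_MASKBIT   0x0100  /* bit 8: per-vector masking capable */

/* Hypothetical decoded view of the register. */
struct msi_caps {
        int enabled;
        int vectors_supported;
        int vectors_enabled;
        int addr_64bit;
        int per_vector_mask;
};

struct msi_caps msi_decode_control(uint16_t ctrl)
{
        struct msi_caps c;

        c.enabled           = !!(ctrl & MSI_CTRL_ENABLE);
        c.vectors_supported = 1 << ((ctrl & MSI_CTRL_QMASK) >> 1);
        c.vectors_enabled   = 1 << ((ctrl & MSI_CTRL_QSIZE) >> 4);
        c.addr_64bit        = !!(ctrl & MSI_CTRL_64BIT);
        c.per_vector_mask   = !!(ctrl & MSI_CTRL_MASKBIT);
        return c;
}
```

A driver only ever inspects these fields; as stated above, writing
the registers is left to the bus driver.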

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure as described below.

        - Support a larger maximum number of vectors per function.

        - Provide the ability for system software to configure
        each vector with an independent message address and message
        data, specified by a table that resides in Memory Space.

        - MSI and MSI-X both support per-vector masking. Per-vector
        masking is an optional extension of MSI but a required
        feature for MSI-X. Per-vector masking provides the kernel the
        ability to mask/unmask a single MSI while running its
        interrupt service routine. If per-vector masking is
        not supported, then the device driver should provide the
        hardware/software synchronization to ensure that the device
        generates MSI when the driver wants it to do so.

4. Why use MSI?

MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another step towards a
legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge), while
MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on any device that
supports this capability. The CONFIG_PCI_MSI kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel config.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Due to the non-contiguous fashion in vector assignment of the
existing Linux kernel, this version does not support multiple
messages regardless of whether a device function is capable of
supporting more than one vector. Enabling MSI on a device function's
MSI capability structure requires a device driver to call the
function pci_enable_msi() explicitly.

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

With this new API, a device driver that wants to have MSI
enabled on its device function must call this API to enable MSI.
A successful call will initialize the MSI capability structure
with ONE vector, regardless of whether a device function is
capable of supporting multiple messages. This vector replaces the
pre-assigned dev->irq with a new MSI vector. To avoid a conflict
between the newly assigned vector and the existing pre-assigned
vector, a device driver must call this API before calling
request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq with
the pre-assigned IOAPIC vector and switches a device's interrupt
mode to PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI vector
that it has done request_irq() on before calling this API. Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector.

5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

         ------------   pci_enable_msi   ------------------------
        |            | <=============== |                        |
        | MSI MODE   |                  | PIN-IRQ ASSERTION MODE |
        |            | ===============> |                        |
         ------------   pci_disable_msi  ------------------------


Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode. Legacy
in this context means PCI pin-irq assertion or PCI-Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
a device's interrupt mode to MSI mode. The pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem and a newly
assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector. Otherwise, the PCI subsystem restores the
device's dev->irq with the pre-assigned IOAPIC vector and marks the
released MSI vector as unused.
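
The ordering rules above can be modelled with a small user-space
sketch. Everything prefixed with mock_ is a hypothetical stand-in
written for this example; none of it is the kernel's implementation,
and where the kernel described here would hit BUG_ON() the model
simply returns an error.

```c
/* User-space stand-in for the few pci_dev fields this model needs. */
struct mock_pci_dev {
        int irq;            /* what a driver would pass to request_irq() */
        int saved_irq;      /* pin-IRQ vector saved while MSI is enabled */
        int msi_enabled;
        int irq_requested;  /* set between request_irq() and free_irq() */
};

/* Models pci_enable_msi(): save dev->irq and install one MSI vector. */
int mock_pci_enable_msi(struct mock_pci_dev *dev, int msi_vector)
{
        if (dev->msi_enabled)
                return -1;
        dev->saved_irq = dev->irq;
        dev->irq = msi_vector;
        dev->msi_enabled = 1;
        return 0;
}

/* Models pci_disable_msi(): refuses while the vector is still requested. */
int mock_pci_disable_msi(struct mock_pci_dev *dev)
{
        if (!dev->msi_enabled || dev->irq_requested)
                return -1;
        dev->irq = dev->saved_irq;
        dev->msi_enabled = 0;
        return 0;
}

/* Driver-side ordering: enable MSI, request, run, free, disable. */
int mock_driver_cycle(struct mock_pci_dev *dev, int msi_vector)
{
        if (mock_pci_enable_msi(dev, msi_vector))
                return -1;          /* stay in pin-IRQ assertion mode */
        dev->irq_requested = 1;     /* stands in for request_irq(dev->irq, ...) */
        /* ... device operates in MSI mode ... */
        dev->irq_requested = 0;     /* stands in for free_irq(dev->irq, ...) */
        return mock_pci_disable_msi(dev);
}
```

The point of the model is the ordering: dev->irq is read only after
the enable call succeeds, and the disable call comes only after the
vector has been freed.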

Once the vector is marked as unused, there is no guarantee that the
PCI subsystem will reserve this MSI vector for the device. Depending
on the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI vector may be
re-assigned.

If the PCI subsystem re-assigns this MSI vector to another driver, a
request to switch back to MSI mode may result in a different MSI
vector being assigned, or in a failure if no more vectors are
available.

5.3 Configuring for MSI-X support

Because system software can configure each vector of the MSI-X
capability structure with an independent message address and message
data, the non-contiguous fashion in vector assignment of the existing
Linux kernel has no impact on supporting multiple messages on an
MSI-X capable device function. Enabling MSI-X on a device function's
MSI-X capability structure requires its device driver to call the
function pci_enable_msix() explicitly.

The function pci_enable_msix(), once invoked, enables either
all or nothing, depending on the current availability of PCI vector
resources. If the PCI vector resources are available for the number
of vectors requested by a device driver, this function will configure
the MSI-X table of the MSI-X capability structure of a device with
the requested messages. A driver need not request every vector the
hardware offers: for example, a device may be capable of supporting a
maximum of 32 vectors while its software driver requests only 4. It
is recommended that the device driver call this function once, during
the initialization phase of the device driver.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector, because the PCI subsystem writes the 1:1 vector-to-entry
mapping into the field 'vector' of each element contained in the
second argument. Note that the pre-assigned IOAPIC dev->irq is valid
only if the device operates in PIN-IRQ assertion mode. In MSI-X mode,
any attempt by the device driver to use dev->irq to request interrupt
service may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable
this vector with its corresponding interrupt service handler. It is
a device driver's choice to assign all vectors with the same
interrupt service handler or each vector with a unique interrupt
service handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification has an implementation note stating that the
MMIO address space for a device's MSI-X structure should be isolated
so that system software can set different pages for controlling
accesses to the MSI-X structure. The implementation of MSI support
requires the PCI subsystem, not a device driver, to maintain full
control of the MSI-X table/MSI-X PBA (Pending Bit Array) and the MMIO
address space of the MSI-X table/MSI-X PBA.  A device driver is
prohibited from requesting the MMIO address space of the MSI-X
table/MSI-X PBA. Otherwise, the PCI subsystem will fail to enable
MSI-X on the hardware device when the driver calls the function
pci_enable_msix().

5.3.2 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request the PCI subsystem
to enable MSI-X messages on its hardware device. Depending on
the availability of PCI vector resources, the PCI subsystem enables
either all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in /driver/pci/msi.h:

struct msix_entry {
        u16     vector; /* kernel uses to write alloc vector */
        u16     entry; /* driver uses to specify entry */
};

A device driver is responsible for initializing the field 'entry' of
each element with a unique entry supported by the MSI-X table.
Otherwise, -EINVAL will be returned as a result. A successful return
of zero indicates the PCI subsystem completed initializing each of the
requested entries of the MSI-X table with message address and message
data. Last but not least, the PCI subsystem will write the 1:1
vector-to-entry mapping into the field 'vector' of each element. A
device driver is responsible for keeping track of allocated MSI-X
vectors in its internal data structure.

A return of zero indicates that the requested number of MSI-X vectors
was successfully allocated. A return of greater than zero indicates
an MSI-X vector shortage. A return of less than zero indicates
a failure. This failure may be a result of duplicate entries
specified in the second argument, of no vector being available,
or of failing to initialize MSI-X table entries.
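
The driver-side half of this contract can be sketched in user space
as follows. The struct mirrors the layout shown above, with uint16_t
standing in for u16; msix_entries_init() and msix_entries_check() are
hypothetical helpers for this example, not kernel functions.

```c
#include <errno.h>
#include <stdint.h>

/* User-space copy of the layout shown above (uint16_t for u16). */
struct msix_entry {
        uint16_t vector;  /* written by the kernel on success */
        uint16_t entry;   /* MSI-X table index chosen by the driver */
};

/* Fill 'entries' with table indices 0..nvec-1, unique by construction. */
void msix_entries_init(struct msix_entry *entries, int nvec)
{
        int i;

        for (i = 0; i < nvec; i++) {
                entries[i].entry = (uint16_t)i;
                entries[i].vector = 0;
        }
}

/* Returns 0 if every 'entry' field is distinct, -EINVAL otherwise. */
int msix_entries_check(const struct msix_entry *entries, int nvec)
{
        int i, j;

        for (i = 0; i < nvec; i++)
                for (j = i + 1; j < nvec; j++)
                        if (entries[i].entry == entries[j].entry)
                                return -EINVAL;
        return 0;
}
```

A driver would prepare its array this way before the single call to
pci_enable_msix(), and on a return of zero read back the 'vector'
field of each element for use with request_irq().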

5.3.3 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON(),
and the device will be left with MSI-X enabled and will leak its
vectors.

5.3.4 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-X capable device function between MSI-X mode and
PIN-IRQ assertion mode (legacy).

         ------------   pci_enable_msix(,,n) ------------------------
        |            | <===============     |                        |
        | MSI-X MODE |                      | PIN-IRQ ASSERTION MODE |
        |            | ===============>     |                        |
         ------------   pci_disable_msix     ------------------------

Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches a
device's interrupt mode to MSI-X mode. The pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however,
unlike MSI mode, the PCI subsystem will not replace dev->irq with an
assigned MSI-X vector, because the PCI subsystem already writes the
1:1 vector-to-entry mapping into the field 'vector' of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
a device driver should always call free_irq() on all MSI-X vectors it
has done request_irq() on before calling pci_disable_msix(). Failure
to do so results in a BUG_ON(), and the device will be left with MSI-X
enabled and will leak its vectors. Otherwise, the PCI subsystem
switches the device function's interrupt mode from MSI-X mode to
legacy mode and marks all allocated MSI-X vectors as unused.

Once the vectors are marked as unused, there is no guarantee that the
PCI subsystem will reserve these MSI-X vectors for the device.
Depending on the availability of current PCI vector resources and the
number of MSI/MSI-X requests from other drivers, these MSI-X vectors
may be re-assigned.

If the PCI subsystem re-assigns these MSI-X vectors to other drivers,
a request to switch back to MSI-X mode may result in another set of
MSI-X vectors being assigned, or in a failure if no more vectors are
available.

5.4 Handling function implementing both MSI and MSI-X capabilities

For the case where a function implements both MSI and MSI-X
capabilities, the PCI subsystem enables a device to run either in MSI
mode or MSI-X mode but not both. A device driver determines whether it
wants MSI or MSI-X enabled on its hardware device. Once a device
driver requests MSI, for example, it is prohibited from requesting
MSI-X; in other words, a device driver is not permitted to ping-pong
between MSI mode and MSI-X mode at run time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 Required x86 hardware support

Since the target of the MSI address is the CPU's local APIC, enabling
MSI/MSI-X support in the Linux kernel is dependent on whether existing
system hardware supports local APIC. Users should verify that their
system supports local APIC operation by testing that it runs when
CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because each
MSI/MSI-X vector is newly allocated at runtime and MSI/MSI-X support
does not depend on BIOS support. This key independence enables
MSI/MSI-X support on future IOxAPIC free platforms.

5.5.2 Device hardware support

The hardware device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during the system boot. In other words,
the device function is running in its default pin assertion mode.
Note that in many cases hardware that supports MSI has bugs,
which may result in system hangs. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi() or not. A return of zero indicates the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is now running in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on device function

At the driver level, a return of zero from the function call of
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function is initialized successfully and ready to run in
MSI/MSI-X mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows MSI
mode enabled on an Adaptec 39320D Ultra320 SCSI controller.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI-MSI  aic79xx
201:         30          0         PCI-MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0
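
A script or test tool can make the same check programmatically. The
sketch below classifies one line of /proc/interrupts by looking for
the "PCI-MSI" and "PCI-MSI-X" labels mentioned above. The enum and
classify_irq_line() are names invented for this example, and the
substring match is only a heuristic: it assumes the device name
column does not itself contain either label.

```c
#include <string.h>

enum irq_kind { IRQ_OTHER, IRQ_MSI, IRQ_MSIX };

/* Classify one line of /proc/interrupts by its interrupt-type label. */
enum irq_kind classify_irq_line(const char *line)
{
        /* Test the longer label first: "PCI-MSI-X" contains "PCI-MSI". */
        if (strstr(line, "PCI-MSI-X"))
                return IRQ_MSIX;
        if (strstr(line, "PCI-MSI"))
                return IRQ_MSI;
        return IRQ_OTHER;
}
```

Applied to the listing above, the two aic79xx lines classify as MSI
and the aic7xxx lines as neither.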

6. MSI quirks

Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally

6.1. Disabling MSI on a single device

Under some circumstances it might be required to disable MSI on a
single device.  This may be achieved by either not calling
pci_enable_msi() at all, or setting the pci_dev->no_msi flag before
calling it (most of the time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between busses. In this case, MSI has to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.
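
The parent-bus check described above amounts to walking up the bus
hierarchy until the root is reached. The user-space sketch below
models that walk; struct mock_pci_bus, MOCK_BUS_FLAGS_NO_MSI and
mock_msi_allowed() are stand-ins invented for this example and are
not the kernel's pci_msi_supported() implementation.

```c
#include <stddef.h>

#define MOCK_BUS_FLAGS_NO_MSI 0x1  /* stand-in for PCI_BUS_FLAGS_NO_MSI */

/* Minimal model of a bus: a link to its parent and a flags word. */
struct mock_pci_bus {
        struct mock_pci_bus *parent;  /* NULL at the root bus */
        unsigned int bus_flags;
};

/* Returns 1 if no bus between 'bus' and the root forbids MSI, else 0. */
int mock_msi_allowed(const struct mock_pci_bus *bus)
{
        for (; bus != NULL; bus = bus->parent)
                if (bus->bus_flags & MOCK_BUS_FLAGS_NO_MSI)
                        return 0;
        return 1;
}
```

Because the walk visits every ancestor, flagging only the subordinate
bus of the broken bridge is enough to cover everything below it.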

Some bridges allow MSI support to be enabled or disabled dynamically
by changing some bits in their PCI configuration space (especially
the Hypertransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be required to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e", do:

        echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

Assuming that MSI is not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
is enabled (1) or disabled (0). If 0 is found in the msi_bus file of
any single bridge above the device, MSI cannot be enabled.
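
For tools that need to perform this check, a minimal user-space
sketch of reading one msi_bus file follows. The function name
read_msi_bus() is an assumption for this example, and the caller is
expected to supply the sysfs path of each bridge found with lspci -t.

```c
#include <stdio.h>

/*
 * Read a sysfs msi_bus file.
 * Returns 1 if MSI is enabled, 0 if disabled, and -1 if the file
 * cannot be read or holds anything other than 0 or 1.
 */
int read_msi_bus(const char *path)
{
        FILE *f = fopen(path, "r");
        int value = -1;

        if (!f)
                return -1;
        if (fscanf(f, "%d", &value) != 1 || (value != 0 && value != 1))
                value = -1;
        fclose(f);
        return value;
}
```

For example, read_msi_bus("/sys/bus/pci/devices/0000:00:0e/msi_bus")
returns 0 when that bridge blocks MSI for the devices below it.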

7. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3, IPIs are transmitted on the APIC local
bus, and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has the
redirection hint bit cleared, it should work.
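
To make the 0xfeexxxxx address concrete, the user-space sketch below
composes an x86 message address and message data value from a
destination APIC ID and a vector number. The field positions follow
the Intel architecture manuals; the macro and function names are
local to this example and the choice of fixed, edge-triggered delivery
is an assumption made to keep it short.

```c
#include <stdint.h>

#define MSI_ADDR_BASE        0xfee00000u  /* bits 31:20 are always 0xfee */
#define MSI_ADDR_DEST_SHIFT  12           /* bits 19:12: destination APIC ID */
#define MSI_ADDR_RH          (1u << 3)    /* redirection hint */
#define MSI_ADDR_DM_LOGICAL  (1u << 2)    /* destination mode */

/* Message address with the redirection hint and destination mode clear. */
uint32_t msi_compose_addr(uint8_t dest_apic_id)
{
        return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << MSI_ADDR_DEST_SHIFT);
}

/* Message data for fixed delivery, edge-triggered: only bits 7:0 are set. */
uint16_t msi_compose_data(uint8_t vector)
{
        return vector;
}
```

With the redirection hint clear, the write is delivered to the one
processor named in the destination ID field.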

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
of errors occurring during the inbound memory write, then the
driver may wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc. We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).