================================
POWER9 XIVE interrupt controller
================================

The POWER9 processor comes with a new interrupt controller
architecture called XIVE, for "eXternal Interrupt Virtualization
Engine".

Compared to the previous architecture, the main characteristics of
XIVE are support for a larger number of interrupt sources and the
ability to deliver interrupts directly to virtual processors without
hypervisor assistance. This removes the context switches required for
the delivery process.


XIVE architecture
=================

The XIVE IC is composed of three sub-engines, each taking care of a
processing layer of external interrupts:

- Interrupt Virtualization Source Engine (IVSE), or Source Controller
  (SC). These are found in PCI PHBs, in the Processor Service
  Interface (PSI) host bridge Controller, but also inside the main
  controller for the core IPIs and other sub-chips (NX, CAP, NPU) of
  the chip/processor. They are configured to feed the IVRE with
  events.
- Interrupt Virtualization Routing Engine (IVRE), or Virtualization
  Controller (VC). It handles event coalescing and performs interrupt
  routing by matching an event source number with an Event
  Notification Descriptor (END).
- Interrupt Virtualization Presentation Engine (IVPE), or Presentation
  Controller (PC). It maintains the interrupt context state of each
  thread and handles the delivery of the external interrupt to the
  thread.

::

                XIVE Interrupt Controller
                +------------------------------------+      IPIs
                | +---------+ +---------+ +--------+ |    +-------+
                | |IVRE     | |Common Q | |IVPE    |----> | CORES |
                | |     esb | |         | |        |----> |       |
                | |     eas | |  Bridge | |   tctx |----> |       |
                | |SC   end | |         | |    nvt | |    |       |
    +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
    | RAM  |    +------------------|-----------------+      | | |
    |      |                       |                        | | |
    |      |                       |                        | | |
    |      |  +--------------------v------------------------v-v-v--+    other
    |      <--+                     Power Bus                      +--> chips
    |  esb |  +---------+-----------------------+------------------+
    |  eas |            |                       |
    |  end |         +--|------+                |
    |  nvt |       +----+----+ |           +----+----+
    +------+       |IVSE     | |           |IVSE     |
                   |         | |           |         |
                   | PQ-bits | |           | PQ-bits |
                   | local   |-+           |  in VC  |
                   +---------+             +---------+
                      PCIe                 NX,NPU,CAPI


    PQ-bits: 2 bits source state machine (P:pending Q:queued)
    esb: Event State Buffer (Array of PQ bits in an IVSE)
    eas: Event Assignment Structure
    end: Event Notification Descriptor
    nvt: Notification Virtual Target
    tctx: Thread interrupt Context registers



XIVE internal tables
--------------------

Each of the sub-engines uses a set of tables to redirect interrupts
from event sources to CPU threads.

::

                                            +-------+
    User or O/S                             |  EQ   |
        or                          +------>|entries|
    Hypervisor                      |       |  ..   |
      Memory                        |       +-------+
                                    |           ^
                                    |           |
               +-------------------------------------------------+
                                    |           |
    Hypervisor      +------+    +---+--+    +---+--+   +------+
      Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
     (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                      ^  |        ^  |        ^  |       ^
                      |  |        |  |        |  |       |
               +-------------------------------------------------+
                      |  |        |  |        |  |       |
                      |  |        |  |        |  |       |
                 +----|--|--------|--|--------|--|-+   +-|-----+    +------+
                 |    |  |        |  |        |  | |   | | tctx|    |Thread|
     IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
    HW events    |                                 |   |       |    |      |
                 |             IVRE                |   | IVPE  |    +------+
                 +---------------------------------+   +-------+


Each IVSE has a 2-bit state machine per source, P for pending and Q
for queued, which controls whether an event is let through. The PQ
bits are stored in an Event State Buffer (ESB) array and can be
controlled by MMIOs.
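
The PQ state machine can be illustrated with a small C sketch. The bit
encoding (P as bit 1, Q as bit 0), the ``OFF`` state used to mask a
source and all ``demo_``/``DEMO_`` names are assumptions made for this
example, and the MMIO interface is left out entirely. A trigger is let
through only from the reset state; further triggers are coalesced in
the Q bit until an EOI replays them.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed encoding of the 2-bit PQ state: P is bit 1, Q is bit 0. */
    #define DEMO_ESB_VAL_P    0x2
    #define DEMO_ESB_VAL_Q    0x1

    #define DEMO_ESB_RESET    0x0                 /* idle, events let through */
    #define DEMO_ESB_OFF      DEMO_ESB_VAL_Q      /* source masked (assumed)  */
    #define DEMO_ESB_PENDING  DEMO_ESB_VAL_P      /* one event forwarded      */
    #define DEMO_ESB_QUEUED   (DEMO_ESB_VAL_P | DEMO_ESB_VAL_Q) /* coalesced */

    /*
     * Apply a trigger to the PQ bits of one source.
     * Returns true if the event is let through to the IVRE.
     */
    static bool demo_esb_trigger(uint8_t *pq)
    {
        assert(*pq <= DEMO_ESB_QUEUED);

        switch (*pq) {
        case DEMO_ESB_RESET:
            *pq = DEMO_ESB_PENDING;
            return true;
        case DEMO_ESB_PENDING:
        case DEMO_ESB_QUEUED:
            *pq = DEMO_ESB_QUEUED;   /* remember that more events arrived */
            return false;
        default:                     /* DEMO_ESB_OFF: source stays masked */
            return false;
        }
    }

    /*
     * Apply an EOI to the PQ bits of one source.
     * Returns true if a coalesced event must now be forwarded.
     */
    static bool demo_esb_eoi(uint8_t *pq)
    {
        assert(*pq <= DEMO_ESB_QUEUED);

        switch (*pq) {
        case DEMO_ESB_QUEUED:
            *pq = DEMO_ESB_PENDING;
            return true;
        case DEMO_ESB_PENDING:
            *pq = DEMO_ESB_RESET;
            return false;
        default:                     /* RESET or OFF: nothing to do */
            return false;
        }
    }

Starting from ``DEMO_ESB_RESET``, a first ``demo_esb_trigger()`` returns
true and leaves the source pending, a second one returns false and sets
Q, and the following ``demo_esb_eoi()`` returns true so that the
coalesced event can be forwarded.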

If the event is let through, the IVRE looks up the Event Assignment
Structure (EAS) table to find the Event Notification Descriptor (END)
configured for the source. Each END defines a notification path to a
CPU and an in-memory Event Queue (EQ), in which EQ data is enqueued
for the O/S to pull.
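
The lookup chain can be sketched in C as follows. This is only an
illustration under stated assumptions: the ``DemoEas`` and ``DemoEnd``
layouts, their ``valid`` and ``masked`` flags, the tiny fixed-size
queue and the ``demo_route()`` helper are all hypothetical, and real
descriptors carry more state than is shown here.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define DEMO_EQ_SIZE 4          /* tiny queue, for illustration only */

    /* Hypothetical EAS entry: which END a source is assigned to. */
    typedef struct {
        bool     valid;
        bool     masked;            /* assumed per-source mask flag */
        uint32_t end_idx;           /* index into the END table */
        uint32_t eq_data;           /* data enqueued for the O/S */
    } DemoEas;

    /* Hypothetical END: notification path plus in-memory event queue. */
    typedef struct {
        bool     valid;
        uint32_t nvt_idx;           /* target NVT */
        uint8_t  priority;
        uint32_t eq[DEMO_EQ_SIZE];
        uint32_t eq_index;          /* next slot to write */
    } DemoEnd;

    /*
     * Route an event for source number 'lisn'. On success the EQ data is
     * enqueued and the END is returned so that the caller can notify the
     * presenter; NULL means the event was dropped.
     */
    static DemoEnd *demo_route(const DemoEas *eat, size_t nr_eas,
                               DemoEnd *endt, size_t nr_ends, uint32_t lisn)
    {
        assert(eat != NULL && endt != NULL);

        if (lisn >= nr_eas || !eat[lisn].valid || eat[lisn].masked) {
            return NULL;
        }

        const DemoEas *eas = &eat[lisn];
        if (eas->end_idx >= nr_ends || !endt[eas->end_idx].valid) {
            return NULL;
        }

        DemoEnd *end = &endt[eas->end_idx];
        end->eq[end->eq_index] = eas->eq_data;
        end->eq_index = (end->eq_index + 1) % DEMO_EQ_SIZE;
        return end;
    }

The returned END carries the ``nvt_idx`` and ``priority`` that the
presenter would then use to find a thread able to handle the event.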

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in an NVT table.

XIVE thread interrupt context
-----------------------------

The XIVE presenter can generate four different exceptions to its
HW threads:

- hypervisor exception
- O/S exception
- Event-Based Branch (user level)
- msgsnd (doorbell)

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt
acknowledgment among other things. The most important ones are:

- Pending Interrupt Priority Register (PIPR)
- Interrupt Pending Buffer            (IPB)
- Current Processor Priority          (CPPR)
- Notification Source Register        (NSR)

TIMA
~~~~

The Thread Interrupt Management registers are accessible through a
specific MMIO region, called the Thread Interrupt Management Area
(TIMA), which consists of four aligned pages, each exposing a
different view of the registers. The first page (page address ending
in ``0b00``) gives access to the entire context and is reserved for
the ring 0 view for the physical thread context. The second (page
address ending in ``0b01``) is for the hypervisor, ring 1 view. The
third (page address ending in ``0b10``) is for the operating system,
ring 2 view. The fourth (page address ending in ``0b11``) is for user
level, ring 3 view.
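
As a minimal sketch, the view selected by an access can be derived from
the low two bits of the page number within the region. The 64 KiB page
size (``DEMO_TIMA_PAGE_SHIFT``) and the ``Demo``/``demo_`` names are
assumptions made for this example, not values stated above.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define DEMO_TIMA_PAGE_SHIFT 16     /* assumption: 64 KiB pages */

    /* One view per page, in the order the pages are laid out. */
    typedef enum {
        DEMO_TIMA_VIEW_PHYS = 0,        /* 0b00: ring 0, physical thread */
        DEMO_TIMA_VIEW_HV   = 1,        /* 0b01: ring 1, hypervisor */
        DEMO_TIMA_VIEW_OS   = 2,        /* 0b10: ring 2, operating system */
        DEMO_TIMA_VIEW_USER = 3,        /* 0b11: ring 3, user level */
    } DemoTimaView;

    /* Map a byte offset inside the four-page TIMA region to its view. */
    static DemoTimaView demo_tima_view(uint64_t offset)
    {
        assert(offset < (UINT64_C(4) << DEMO_TIMA_PAGE_SHIFT));

        return (DemoTimaView)((offset >> DEMO_TIMA_PAGE_SHIFT) & 0x3);
    }

With these assumptions, an access at offset ``0x20010`` lands in the
third page and is therefore handled with the operating system view.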

Interrupt flow from an O/S perspective
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After event data has been enqueued in the O/S Event Queue, the IVPE
raises the bit corresponding to the priority of the pending interrupt
in the Interrupt Pending Buffer (IPB) register to indicate that an
event is pending in one of the 8 priority queues. The Pending
Interrupt Priority Register (PIPR) is also updated using the IPB. This
register represents the priority of the most favored pending
notification.

The PIPR is then compared to the Current Processor Priority
Register (CPPR). If it is more favored (numerically less than), the
CPU interrupt line is raised and the EO bit of the Notification Source
Register (NSR) is updated to notify the presence of an exception for
the O/S. The O/S acknowledges the interrupt with a special load in the
Thread Interrupt Management Area.
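
The register updates described above can be sketched as follows. The
bit layout of the IPB (priority 0 stored in the most significant bit),
the ``0xff`` value used for "nothing pending" and all ``Demo``/``demo_``
names are assumptions made for this example. A single ``irq_raised``
flag stands in for both the CPU interrupt line and the NSR exception
bit, and the acknowledgment step is not modeled.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define DEMO_PRIORITY_MAX 7         /* 8 priority queues: 0..7 */

    /* Hypothetical subset of one thread interrupt context. */
    typedef struct {
        uint8_t ipb;                    /* one pending bit per priority */
        uint8_t pipr;                   /* most favored pending priority */
        uint8_t cppr;                   /* current processor priority */
        bool    irq_raised;             /* stands in for CPU line + NSR */
    } DemoRing;

    /* Assumed layout: priority 0 (most favored) is bit 0x80. */
    static uint8_t demo_priority_to_ipb(uint8_t prio)
    {
        assert(prio <= DEMO_PRIORITY_MAX);
        return (uint8_t)(0x80u >> prio);
    }

    /* Most favored (numerically lowest) pending priority, 0xff if none. */
    static uint8_t demo_ipb_to_pipr(uint8_t ipb)
    {
        for (uint8_t prio = 0; prio <= DEMO_PRIORITY_MAX; prio++) {
            if (ipb & (0x80u >> prio)) {
                return prio;
            }
        }
        return 0xff;
    }

    /* Record a pending event and decide whether to signal the thread. */
    static void demo_present(DemoRing *ring, uint8_t prio)
    {
        ring->ipb |= demo_priority_to_ipb(prio);
        ring->pipr = demo_ipb_to_pipr(ring->ipb);
        ring->irq_raised = ring->pipr < ring->cppr;
    }

With a CPPR of 4, presenting a priority 6 event only updates the IPB
and PIPR, whereas a later priority 2 event is more favored than the
CPPR and raises the interrupt.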

The O/S handles the interrupt and, when done, performs an EOI using an
MMIO operation on the ESB management page of the associated source.

Overview of the QEMU models for XIVE
====================================

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the EAS and END tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the END ESB MMIOs
of the Event Queues which are used for coalescing event notifications
and for escalation. It is not used in the field, only to sync the EQ
cache in OPAL.

Finally, the XiveTCTX contains the interrupt state context of a
thread: four sets of registers, one for each exception that can be
delivered to a CPU. These contexts are scanned by the IVPE to find a
matching VP when a notification is triggered. It also models the
Thread Interrupt Management Area (TIMA), which exposes the thread
context registers to the CPU for interrupt management.
