===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as less common features such
as scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. Additionally, they are provided
with special status bits to make sure the information isn't overwritten by
another peer. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers.  The term "client" is used here to mean an upper layer
component making use of the NTB API.  The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.  After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed.  The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

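The probe and remove behavior described above can be illustrated with a small
user-space model.  The sketch below does not use the kernel API: every mock\_*
type and function is a hypothetical stand-in, and the toy core handles only a
single client.  It only shows when the callbacks fire: probe is called for
devices that already exist when the client registers and for devices that
appear later, and remove is called when a bound device goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_DEVS 4

/* Hypothetical stand-ins for an NTB device and an NTB client. */
struct mock_ntb_dev {
	const char *name;
	int bound;	/* non-zero once the client's probe succeeded */
};

struct mock_ntb_client {
	const char *name;
	int (*probe)(struct mock_ntb_client *client, struct mock_ntb_dev *dev);
	void (*remove)(struct mock_ntb_client *client, struct mock_ntb_dev *dev);
};

/* A toy "core": at most one client and a small table of devices. */
static struct mock_ntb_client *core_client;
static struct mock_ntb_dev *core_devs[MAX_DEVS];

/* Bind a device to the registered client, if there is one. */
void mock_core_try_bind(struct mock_ntb_dev *dev)
{
	if (core_client && !dev->bound &&
	    core_client->probe(core_client, dev) == 0)
		dev->bound = 1;
}

/* Called by a client: probe runs for every device already present. */
void mock_register_client(struct mock_ntb_client *client)
{
	assert(client && client->probe && client->remove);
	core_client = client;
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (core_devs[i])
			mock_core_try_bind(core_devs[i]);
}

/* Called by a hardware driver when a device appears. */
int mock_register_device(struct mock_ntb_dev *dev)
{
	for (size_t i = 0; i < MAX_DEVS; i++) {
		if (!core_devs[i]) {
			core_devs[i] = dev;
			mock_core_try_bind(dev);
			return 0;
		}
	}
	return -1;	/* device table is full */
}

/* Called by a hardware driver when a device goes away. */
void mock_unregister_device(struct mock_ntb_dev *dev)
{
	for (size_t i = 0; i < MAX_DEVS; i++) {
		if (core_devs[i] != dev)
			continue;
		if (dev->bound) {
			core_client->remove(core_client, dev);
			dev->bound = 0;
		}
		core_devs[i] = NULL;
	}
}

/* Example client callbacks that only count and log their invocations. */
static int probe_calls, remove_calls;

int demo_probe(struct mock_ntb_client *client, struct mock_ntb_dev *dev)
{
	printf("%s: probe %s\n", client->name, dev->name);
	probe_calls++;
	return 0;
}

void demo_remove(struct mock_ntb_client *client, struct mock_ntb_dev *dev)
{
	printf("%s: remove %s\n", client->name, dev->name);
	remove_calls++;
}
```

In a real client, the equivalent probe and remove callbacks are supplied
through the client structure declared in include/linux/ntb.h; the model above
only mirrors the order of events.
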
Typical NTB Client Driver Implementation
----------------------------------------

The primary purpose of NTB is to share a piece of memory between at least two
systems. NTB device features such as scratchpad and message registers are
therefore mainly used to perform the proper memory window initialization.
Typically there are two types of memory window interfaces supported by the NTB
API: inbound translation configured on the local NTB port, and outbound
translation configured by the peer on the peer NTB port. The first type is
depicted in the following figure:

Inbound translation:
 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

A typical scenario for initializing the first type of memory window looks like
this: 1) allocate a memory region, 2) put the translated address into the NTB
config, 3) somehow notify the peer device of the performed initialization,
4) the peer device maps the corresponding outbound memory window to gain access
to the shared memory region.

The second type of interface, in which the shared windows are initialized by
the peer device, is depicted in the following figure:

Outbound translation:
 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario for initializing the second type of interface would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window to gain access to the shared
memory region.

As one can see, the described scenarios can be combined into one portable
algorithm.
 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the
     allocated region (this may fail if local memory window initialization
     is unsupported)
  3) Send the translated address and memory window index to the peer device
 Peer device:
  1) Initialize the memory window with the received address of the memory
     region allocated by the other device (this may fail if peer memory
     window initialization is unsupported)
  2) Map the outbound memory window

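The portable algorithm above can be sketched as a user-space model.  This is
not kernel code: struct mock\_port, struct mw\_msg and the mock\_* functions
are hypothetical stand-ins for the NTB ports and for ntb_mw_set_trans() and
ntb_peer_mw_set_trans().  The model assumes that the shared window is usable
as soon as either side managed to program the translated address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of one NTB port; not a kernel structure. */
struct mock_port {
	int inbound_ok;		/* local (inbound) translation setup supported */
	int outbound_ok;	/* peer-side (outbound) translation setup supported */
	uint64_t xlat_addr;	/* translated address programmed into the window */
	int xlat_valid;		/* non-zero once xlat_addr has been programmed */
};

/* Message that carries local step 3 to the peer device. */
struct mw_msg {
	int midx;
	uint64_t addr;
};

/* Local step 2: stand-in for ntb_mw_set_trans(); may be unsupported. */
int mock_mw_set_trans(struct mock_port *local, uint64_t addr)
{
	if (!local->inbound_ok)
		return -1;
	local->xlat_addr = addr;
	local->xlat_valid = 1;
	return 0;
}

/* Peer step 1: stand-in for ntb_peer_mw_set_trans(); may be unsupported. */
int mock_peer_mw_set_trans(struct mock_port *peer, uint64_t addr)
{
	if (!peer->outbound_ok)
		return -1;
	peer->xlat_addr = addr;
	peer->xlat_valid = 1;
	return 0;
}

/*
 * Run both halves of the portable algorithm.  Returns 0 when the translation
 * ended up programmed on either port, -1 when neither side could do it.
 */
int setup_shared_window(struct mock_port *local, struct mock_port *peer,
			uint64_t region_addr, int midx)
{
	struct mw_msg msg;
	int local_rc, peer_rc;

	assert(local && peer);

	/* Local device, steps 1-3 (the allocation is given as region_addr). */
	local_rc = mock_mw_set_trans(local, region_addr);
	msg.midx = midx;
	msg.addr = region_addr;

	/* Peer device, step 1: use the address received in the message. */
	peer_rc = mock_peer_mw_set_trans(peer, msg.addr);

	printf("mw %d: local set_trans %s, peer set_trans %s\n", msg.midx,
	       local_rc ? "failed" : "ok", peer_rc ? "failed" : "ok");

	/* Peer step 2 (mapping the window) needs one of the two to succeed. */
	return (local_rc == 0 || peer_rc == 0) ? 0 : -1;
}
```

Trying both calls and tolerating a failure of one of them is what keeps the
sequence portable across hardware that supports only one of the two
translation types.
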
In accordance with this scenario, the NTB Memory Window API can be used as
follows:
 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges that can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size, so that the memory can be
     properly allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (this may fail if setting the local translated address is not
     supported).
  5) Send the translated base address (usually together with the memory
     window number) to the peer device using, for instance, scratchpad or
     message registers.
 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified
     memory window. This may fail if the received address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address used to map the
     memory window and so gain access to the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.

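Steps 2) and 3) of the local-device sequence amount to fitting a requested
buffer into the restrictions reported for the window.  The following is a
minimal user-space sketch of that arithmetic.  struct mw_align and the helper
names are hypothetical; the three fields are assumed to correspond to the
address alignment, size alignment and maximum size restrictions retrieved in
step 2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the restrictions retrieved in step 2). */
struct mw_align {
	uint64_t addr_align;	/* required alignment of the translated address */
	uint64_t size_align;	/* required granularity of the window size */
	uint64_t size_max;	/* largest size the window can translate */
};

/*
 * Round v up to the next multiple of a.  The alignment does not have to be
 * a power of two.  Values close to UINT64_MAX would overflow and are not
 * handled in this sketch.
 */
uint64_t round_up_to(uint64_t v, uint64_t a)
{
	assert(a > 0);
	return (v + a - 1) / a * a;
}

/*
 * Compute the size to allocate for a window that must hold `wanted` bytes.
 * Returns 0 and stores the size on success, -1 if the request is empty or
 * does not fit into the window.
 */
int plan_mw_size(const struct mw_align *al, uint64_t wanted, uint64_t *size_out)
{
	uint64_t size = round_up_to(wanted, al->size_align);

	if (size == 0 || size > al->size_max)
		return -1;
	*size_out = size;
	return 0;
}

/* Check that an allocated address satisfies the address alignment. */
int mw_addr_is_usable(const struct mw_align *al, uint64_t addr)
{
	assert(al->addr_align > 0);
	return addr % al->addr_align == 0;
}
```

For example, with a size alignment of 4096 bytes a request for 5000 bytes is
rounded up to 8192 bytes, and the resulting buffer is acceptable only if its
address is a multiple of the address alignment.
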
NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev.  These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data.  The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.  The NTB Netdev then creates an ethernet device using a
Transport queue pair.  Network data is copied between socket buffers and the
Transport queue pair buffer.  The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as a simple example of an NTB
client.  Ping Pong enables the link when started, waits for the NTB link to
come up, and then proceeds to read and write the doorbell and scratchpad
registers of the NTB.  The peers interrupt each other using a bit mask of
doorbell bits, which is shifted by one in each round, to test the behavior of
multiple doorbell bits and interrupt vectors.  The Ping Pong driver also reads
the first local scratchpad, and writes the value plus one to the first peer
scratchpad, each round before writing the peer doorbell register.

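The round structure described above can be modeled in user space as follows.
This is a sketch based only on the description in this section, not on the
driver source: struct pp_side and both functions are hypothetical, and
valid_mask is an assumed parameter describing which doorbell bits the hardware
implements.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical view of one side's registers as written by the other side. */
struct pp_side {
	uint64_t db;	/* doorbell bits rung by the other side */
	uint32_t spad0;	/* first scratchpad, written by the other side */
};

/*
 * Compute the doorbell bits for the next round: shift by one, and start a
 * new series from init_db once every bit has left the valid range.
 */
uint64_t pp_next_db(uint64_t db_bits, uint64_t valid_mask, uint64_t init_db)
{
	assert((init_db & valid_mask) != 0);
	db_bits = (db_bits << 1) & valid_mask;
	return db_bits ? db_bits : init_db;
}

/*
 * One round: `self` handles its doorbell, bumps the scratchpad value and
 * rings the peer.  Returns the doorbell bits written to the peer.
 */
uint64_t pp_round(struct pp_side *self, struct pp_side *peer,
		  uint64_t valid_mask, uint64_t init_db)
{
	uint64_t next = pp_next_db(self->db, valid_mask, init_db);

	self->db = 0;			/* clear the local doorbell */
	peer->spad0 = self->spad0 + 1;	/* local spad 0 plus one -> peer spad 0 */
	peer->db = next;		/* then ring the peer doorbell */

	printf("ring 0x%llx, spad 0x%x\n", (unsigned long long)next,
	       (unsigned)peer->spad0);
	return next;
}
```

With three valid doorbell bits and a series starting at bit 0, successive
rounds ring 0x2, 0x4 and then wrap back to the initial 0x1, while the
scratchpad value grows by one in every round.
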
Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
        registers.  By default, Ping Pong will not attempt to exercise such
        hardware.  You may override this behavior at your own risk by setting
        unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
        interrupt event and setting the peer doorbell register for the next
        round.
* init\_db - Specify the doorbell bits to start a new series of rounds.  A new
        series begins once all the doorbell bits have been shifted out of
        range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
        then to observe debugging output on the console.

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool.  This directory is shortened to *hw*
        below.
* *hw*/db
        This file is used to read, set, and clear the local doorbell.  Not
        all operations may be supported by all hardware.  To read the doorbell,
        read the file.  To set the doorbell, write `s` followed by the bits to
        set (e.g. `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
        followed by the bits to clear.
* *hw*/mask
        This file is used to read, set, and clear the local doorbell mask.
        See *db* for details.
* *hw*/peer\_db
        This file is used to read, set, and clear the peer doorbell.
        See *db* for details.
* *hw*/peer\_mask
        This file is used to read, set, and clear the peer doorbell
        mask.  See *db* for details.
* *hw*/spad
        This file is used to read and write local scratchpads.  To read
        the values of all scratchpads, read the file.  To write values, write a
        series of pairs of scratchpad number and value
        (e.g. `echo '4 0x123 7 0xabc' > spad`
        to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
        This file is used to read and write peer scratchpads.  See
        *spad* for details.

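A test program may prefer to drive these files directly rather than through
the shell.  The sketch below builds the same strings as the `echo` examples
above, including the trailing newline that `echo` adds, and writes them to a
file.  All three helper names are hypothetical; nothing here is provided by
the Tool itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a doorbell command: op is 's' (set) or 'c' (clear).
 * Returns the string length, or -1 on a bad op or a too-small buffer.
 */
int tool_format_db_cmd(char *buf, size_t len, char op, uint64_t bits)
{
	int n;

	assert(buf);
	if (op != 's' && op != 'c')
		return -1;
	n = snprintf(buf, len, "%c 0x%llx\n", op, (unsigned long long)bits);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Build a scratchpad command from `count` (number, value) pairs.
 * Returns the string length, or -1 if the buffer is too small.
 */
int tool_format_spad_cmd(char *buf, size_t len, const unsigned *idx,
			 const uint32_t *val, size_t count)
{
	size_t used = 0;
	int n;

	assert(buf);
	for (size_t i = 0; i < count; i++) {
		n = snprintf(buf + used, len - used, "%s%u 0x%x",
			     i ? " " : "", idx[i], (unsigned)val[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	n = snprintf(buf + used, len - used, "\n");
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return (int)(used + (size_t)n);
}

/* Write a prepared command to one of the debugfs files.  0 on success. */
int tool_write_file(const char *path, const char *cmd)
{
	size_t n = strlen(cmd);
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fwrite(cmd, 1, n, f) == n) ? 0 : -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

The path passed to tool_write_file() would be the *hw*/db or *hw*/spad file
of the device under test, below wherever debugfs is mounted on the system.
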
NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver.  After
registering, the clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
        If the peer NTB is to be accessed via a memory window, then use
        this memory window to access the peer NTB.  A value of zero or positive
        starts from the first mw idx, and a negative value starts from the last
        mw idx.  Both sides MUST set the same value here!  The default value is
        `-1`.
* b2b\_mw\_share
        If the peer NTB is to be accessed via a memory window, and if
        the memory window is large enough, still allow the client to use the
        second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
        If using B2B topology on Xeon hardware, use
        this 64 bit address on the bus between the NTB devices for the window
        at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.

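The sign convention of b2b\_mw\_idx can be illustrated with a short helper.
This is a sketch of the rule as described above, not the driver's code:
resolve_b2b_mw_idx() and its mw_count argument (the number of memory windows
the device exposes) are hypothetical, and the out-of-range handling is an
assumption.

```c
#include <assert.h>

/*
 * Translate a b2b_mw_idx parameter value into an absolute window index.
 * Zero or positive values count from the first window, negative values
 * count back from the last one.  Returns -1 if the result is out of range.
 */
int resolve_b2b_mw_idx(int b2b_mw_idx, int mw_count)
{
	int idx;

	assert(mw_count > 0);
	idx = (b2b_mw_idx < 0) ? mw_count + b2b_mw_idx : b2b_mw_idx;
	return (idx < 0 || idx >= mw_count) ? -1 : idx;
}
```

With three memory windows, the default value of -1 selects the last window
(index 2), while a value of 0 selects the first one.
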