.. include:: <isonum.txt>

=====================
VFIO Mediated devices
=====================

:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
:Author: Neo Jia <cjia@nvidia.com>
:Author: Kirti Wankhede <kwankhede@nvidia.com>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.


Virtual Function I/O (VFIO) Mediated devices[1]
===============================================

The number of use cases for virtualizing DMA devices that do not have built-in
SR-IOV capability is increasing. Previously, to virtualize such devices,
developers had to create their own management interfaces and APIs, and then
integrate them with user space software. To simplify integration with user space
software, we have identified common requirements and a unified management
interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is
an IOMMU/device-agnostic framework for exposing direct device access to user
space in a secure, IOMMU-protected environment. This framework is used for
multiple devices, such as GPUs, network adapters, and compute accelerators. With
direct device access, virtual machines or user space applications have direct
access to the physical device. This framework is reused for mediated devices.

The mediated core driver provides a common interface for mediated device
management that can be used by drivers of different devices. This module
provides a generic interface to perform these operations:

* Create and destroy a mediated device
* Add a mediated device to and remove it from a mediated bus driver
* Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver.
For example, the mediated VFIO mdev driver is designed for mediated devices and
supports VFIO APIs. The mediated bus driver adds a mediated device to and
removes it from a VFIO group.

The following high-level block diagram shows the main components and interfaces
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
devices as examples, as these devices are the first devices to use this module::

     +---------------+
     |               |
     | +-----------+ |  mdev_register_driver() +--------------+
     | |           | +<------------------------+              |
     | |  mdev     | |                         |              |
     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
     | |  driver   | |     probe()/remove()    |              |    APIs
     | |           | |                         +--------------+
     | +-----------+ |
     |               |
     |  MDEV CORE    |
     |   MODULE      |
     |   mdev.ko     |
     | +-----------+ |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
     | |  device   | |  mdev_register_device() +--------------+
     | | interface | +<------------------------+              |
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+


Registration Interfaces
=======================

The mediated core driver provides the following types of registration
interfaces:

* Registration interface for a mediated bus driver
* Physical device driver interface

Registration Interface for a Mediated Bus Driver
------------------------------------------------

The registration interface for a mediated bus driver provides the following
structure to represent a mediated device's driver::

     /*
      * struct mdev_driver [2] - Mediated device's driver
      * @name: driver name
      * @probe: called when a new device is created
      * @remove: called when a device is removed
      * @driver: device driver structure
      */
     struct mdev_driver {
             const char *name;
             int  (*probe)  (struct device *dev);
             void (*remove) (struct device *dev);
             struct device_driver    driver;
     };

A mediated bus driver for mdev should use this structure in the function calls
to register and unregister itself with the core driver:

* Register::

    extern int  mdev_register_driver(struct mdev_driver *drv,
                                     struct module *owner);

* Unregister::

    extern void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding mediated devices to the VFIO
group when devices are bound to the driver and removing mediated devices from
the VFIO group when devices are unbound from the driver.


Physical Device Driver Interface
--------------------------------

The physical device driver interface provides the mdev_parent_ops[3] structure
to define the APIs to manage work in the mediated core driver that is related
to the physical device.

The structures in the mdev_parent_ops structure are as follows:

* dev_attr_groups: attributes of the parent device
* mdev_attr_groups: attributes of the mediated device
* supported_config: attributes to define supported configurations

The functions in the mdev_parent_ops structure are as follows:

* create: allocate basic resources in a driver for a mediated device
* remove: free resources in a driver when a mediated device is destroyed

The callbacks in the mdev_parent_ops structure are as follows:

* open: open callback of mediated device
* close: close callback of mediated device
* ioctl: ioctl callback of mediated device
* read: read emulation callback
* write: write emulation callback
* mmap: mmap emulation callback

A driver should use the mdev_parent_ops structure in the function call to
register itself with the mdev core driver::

        extern int  mdev_register_device(struct device *dev,
                                         const struct mdev_parent_ops *ops);

However, the mdev_parent_ops structure is not required in the function call
that a driver should use to unregister itself with the mdev core driver::

        extern void mdev_unregister_device(struct device *dev);


Mediated Device Management Interface Through sysfs
==================================================

The management interface through sysfs enables user space software, such as
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
This management interface provides flexibility to the underlying physical
device's driver to support features such as:

* Mediated device hot plug
* Multiple mediated devices in a single virtual machine
* Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory
-------------------------------------

The /sys/class/mdev_bus/ directory contains links to devices that are registered
with the mdev core driver.

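The links in this directory can be enumerated from a shell. The following is a
minimal sketch, not part of the kernel tree: the function name
``mdev_list_parents`` is hypothetical, and the optional argument exists only so
that the helper can be pointed at a different directory.

```shell
# List the parent devices registered with the mdev core driver.
# $1 is the class directory; it defaults to /sys/class/mdev_bus.
mdev_list_parents() {
    class_dir=${1:-/sys/class/mdev_bus}
    for link in "$class_dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -e "$link" ] || continue
        # Each entry is a symlink; print its name and resolved target.
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}
```

Calling ``mdev_list_parents`` with no argument prints one line per registered
parent device, or nothing if no driver has registered with the mdev core.
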
Directories and Files Under the sysfs for Each Physical Device
--------------------------------------------------------------

::

  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]

* [mdev_supported_types]

  The list of currently supported mediated device types and their details.

  [<type-id>], device_api, and available_instances are mandatory attributes
  that should be provided by the vendor driver.

* [<type-id>]

  The [<type-id>] name is created by adding the device driver string as a prefix
  to the string provided by the vendor driver. The format of this name is as
  follows::

        sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  (or using mdev_parent_dev(mdev) to arrive at the parent device outside
  of the core mdev code)

* device_api

  This attribute should show which device API is being created, for example,
  "vfio-pci" for a PCI device.

* available_instances

  This attribute should show the number of devices of type <type-id> that can be
  created.

* [devices]

  This directory contains links to the devices of type <type-id> that have been
  created.

* name

  This attribute should show a human-readable name. This attribute is optional.

* description

  This attribute should show a brief description of the features of the type.
  This attribute is optional.

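The mandatory attributes described above can be read with ordinary file
operations. The following sketch assumes only the directory layout shown in
this section; the function name ``mdev_list_types`` is hypothetical.

```shell
# Print one line per supported type:
#   <type-id> <device_api> <available_instances>
# $1 is the sysfs directory of the parent physical device.
mdev_list_types() {
    parent=$1
    for type_dir in "$parent"/mdev_supported_types/*; do
        # Skip the unexpanded glob when no types are exposed.
        [ -d "$type_dir" ] || continue
        printf '%s %s %s\n' "$(basename "$type_dir")" \
            "$(cat "$type_dir/device_api")" \
            "$(cat "$type_dir/available_instances")"
    done
}
```

Only the mandatory attributes are read, so the helper does not depend on the
optional name and description files being present.
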
Directories and Files Under the sysfs for Each mdev Device
----------------------------------------------------------

::

  |- [parent physical device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]

* remove (write only)

  Writing '1' to the 'remove' file destroys the mdev device. The vendor driver
  can fail the remove() callback if that device is active and the vendor driver
  doesn't support hot unplug.

Example::

        # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove

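Because the vendor driver can fail the remove() callback, a script should check
the result of the write. The following is a minimal sketch; the function name
``mdev_remove`` and its optional second argument are hypothetical.

```shell
# Destroy the mediated device with the given UUID by writing 1 to 'remove'.
# $1 is the UUID; $2 is the devices directory (default /sys/bus/mdev/devices).
mdev_remove() {
    uuid=$1
    devices_dir=${2:-/sys/bus/mdev/devices}
    if [ ! -e "$devices_dir/$uuid/remove" ]; then
        echo "no such mediated device: $uuid" >&2
        return 1
    fi
    # The write fails, and so does this function, if the vendor driver
    # rejects the remove() callback.
    echo 1 > "$devices_dir/$uuid/remove"
}
```

The function returns a non-zero status both when the device does not exist and
when the write to the remove file is rejected.
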
Mediated Device Hot Plug
------------------------

Mediated devices can be created and assigned at runtime. The procedure to hot
plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices
=====================================

The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

        extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
                                  int npage, int prot, unsigned long *phys_pfn);

        extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
                                    int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules
need to provide these two callback functions.

Using the Sample Code
=====================

mtty.c in the samples/vfio-mdev/ directory is a sample driver program to
demonstrate how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/

   Files in this device directory in sysfs are similar to the following::

     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step::

     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" >    \
              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm::

     -device vfio-pci,\
      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as follows::

     # lspci -s 00:05.0 -xxvv
     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
             Subsystem: Device 4348:3253
             Physical Slot: 5
             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
     Stepping- SERR- FastB2B- DisINTx-
             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
     <TAbort- <MAbort- >SERR- <PERR- INTx-
             Interrupt: pin A routed to IRQ 10
             Region 0: I/O ports at c150 [size=8]
             Region 1: I/O ports at c158 [size=8]
             Kernel driver in use: serial
     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

   In the Linux guest VM, dmesg output for the device is as follows::

     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A


5. In the Linux guest VM, check the serial ports::

     # setserial -g /dev/ttyS*
     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   Data is looped back from the host's mtty driver.

8. Destroy the mediated device that you created::

     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove

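The create step (step 2) can be wrapped in a small helper that first checks
whether any instances of the requested type are left. This is a sketch under
stated assumptions, not part of the sample: the function name ``mdev_create``
is hypothetical, and it relies only on the available_instances and create
files described earlier.

```shell
# Create a mediated device of the given type under a parent device.
# $1 is the parent sysfs directory, $2 the type-id, $3 the UUID to assign.
mdev_create() {
    type_dir=$1/mdev_supported_types/$2
    uuid=$3
    # Fail early if the type does not exist or cannot be read.
    avail=$(cat "$type_dir/available_instances") || return 1
    if [ "$avail" -le 0 ]; then
        echo "no instances of type $2 left" >&2
        return 1
    fi
    # Writing the UUID to 'create' asks the vendor driver for a new device.
    echo "$uuid" > "$type_dir/create"
}
```

For the sample driver, and assuming the uuidgen utility is installed, a call
could look like
``mdev_create /sys/devices/virtual/mtty/mtty mtty-2 "$(uuidgen)"``.
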
References
==========

1. See Documentation/vfio.txt for more information on VFIO.
2. struct mdev_driver in include/linux/mdev.h
3. struct mdev_parent_ops in include/linux/mdev.h
4. struct vfio_iommu_driver_ops in include/linux/vfio.h
