QEMU Virtual NVDIMM
===================

This document explains the usage of the virtual NVDIMM (vNVDIMM) feature,
which is available since QEMU v2.6.0.

QEMU currently implements only the persistent memory mode of the vNVDIMM
device, not the block window mode.

Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup time is to use the
following command line options:

 -machine pc,nvdimm
 -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
 -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
 -device nvdimm,id=nvdimm1,memdev=mem1

Where,

 - the "nvdimm" machine option enables the vNVDIMM feature.

 - "slots=$N" should be equal to or larger than the total number of
   normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.

 - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
   of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
   >= $RAM_SIZE + $NVDIMM_SIZE here.

 - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
   creates a backend storage of size $NVDIMM_SIZE on the file $PATH. All
   accesses to the virtual NVDIMM device go to the file $PATH.

   "share=on/off" controls the visibility of guest writes. If
   "share=on", guest writes are applied to the backend file. If
   another guest uses the same backend file with "share=on", those
   writes are visible to it as well. If "share=off", guest writes
   are not applied to the backend file and are thus invisible to
   other guests.

 - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
   device whose storage is provided by the above memory backend.

Multiple vNVDIMM devices can be created by providing multiple pairs of
"-object" and "-device" options.

With the above command line options, a guest OS that has the proper
NVDIMM driver (e.g. "CONFIG_ACPI_NFIT=y" under Linux) should be able to
detect an NVDIMM device that is in persistent memory mode and whose
size is $NVDIMM_SIZE.
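
As a concrete illustration of the options above, the following shell
sketch fills in the placeholders for a guest with 4G of RAM and a single
4G vNVDIMM, and derives the smallest "maxmem" value that satisfies the
rule given above. The sizes, the backend path /tmp/nvdimm.img and all
variable names are assumptions chosen for illustration only.

```shell
#!/bin/sh
# Illustrative values only; adjust to the actual guest configuration.
RAM_GIB=4        # size of normal RAM, in GiB
NVDIMM_GIB=4     # size of the vNVDIMM backend, in GiB
SLOTS=2          # one RAM device plus one vNVDIMM device, as in the text
MAXMEM_GIB=$((RAM_GIB + NVDIMM_GIB))   # smallest value allowed by the maxmem rule

# Hypothetical backend file. A name other than PATH is used on purpose,
# because assigning to PATH would clobber the shell's command search path.
NVDIMM_PATH=/tmp/nvdimm.img

QEMU_OPTS="-machine pc,nvdimm \
-m ${RAM_GIB}G,slots=${SLOTS},maxmem=${MAXMEM_GIB}G \
-object memory-backend-file,id=mem1,share=on,mem-path=${NVDIMM_PATH},size=${NVDIMM_GIB}G \
-device nvdimm,id=nvdimm1,memdev=mem1"

# Print the options; they would be appended to a QEMU invocation.
printf '%s\n' "$QEMU_OPTS"
```

The script only prints the option string and does not start QEMU, so it
can be used to review the values before launching a guest.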

Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size" option,
   QEMU truncates the backend file with ftruncate(2), which corrupts
   the existing data in the backend file, especially in the shrink
   case.

   QEMU v2.8.0 and later check the backend file size against the "size"
   option. If they do not match, QEMU reports an error and aborts in
   order to avoid data corruption.

2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
   option of memory-backend-file, e.g. 4KB alignment on x86.  However,
   QEMU v2.7.0 puts an additional alignment requirement, which may
   require a larger value than the basic one, e.g. 2MB on x86. This
   change breaks the usage of memory-backend-file that only satisfies
   the basic alignment.

   QEMU v2.8.0 and later remove the additional alignment on non-s390x
   architectures, so the broken memory-backend-file can work again.
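
Because a mismatch between the backend file size and the "size" option
is either destructive (before v2.8.0) or fatal (v2.8.0 and later), it
can help to verify the size before starting QEMU. The following is a
minimal sketch of such a check for regular files. The helper name
check_backend_size is hypothetical and not part of QEMU, and the demo
file is a temporary file created only for this example.

```shell
#!/bin/sh
# check_backend_size FILE EXPECTED_BYTES
# Returns 0 if FILE is a regular file of exactly EXPECTED_BYTES bytes,
# and 1 otherwise. (Hypothetical helper; regular files only.)
check_backend_size() {
    [ -f "$1" ] || return 1
    # The arithmetic expansion strips any padding that wc may print.
    actual=$(($(wc -c < "$1")))
    [ "$actual" -eq "$2" ]
}

# Demo: a 1 MiB temporary file stands in for a real backend file.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=1024 count=1024 2>/dev/null

if check_backend_size "$demo" $((1024 * 1024)); then
    echo "size matches the size option"
else
    echo "size mismatch: do not start QEMU with this size option"
fi
rm -f "$demo"
```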

Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices.
To enable labels on a vNVDIMM device, add the "label-size=$SZ" option
to "-device nvdimm", e.g.

 -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimal label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of backend storage.
   If a memory backend file that was previously used as the backend
   of a vNVDIMM device without labels is now used for a vNVDIMM
   device with labels, the data in the label area at the end of the
   file will be inaccessible to the guest. If any useful data (e.g.
   the metadata of a file system) was stored there, the latter usage
   may result in guest data corruption (e.g. breakage of the guest
   file system).
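
To judge whether existing data would fall into the label area, the
start of that area can be computed from the backend size and the label
size. The arithmetic below is a sketch that assumes the label area is
exactly the last "label-size" bytes of the backend, as described in
note 2; the sizes are example values.

```shell
#!/bin/sh
# Example values: a 4 GiB backend with the minimal 128 KiB label area.
BACKEND_BYTES=$((4 * 1024 * 1024 * 1024))
LABEL_BYTES=$((128 * 1024))

# Labels are stored at the end of the backend, so the label area
# begins at this byte offset within the backend file.
LABEL_OFFSET=$((BACKEND_BYTES - LABEL_BYTES))

echo "label area starts at byte offset ${LABEL_OFFSET}"
```

Any data previously stored at or beyond that offset is what the guest
can no longer reach once labels are enabled.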

Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM
devices. Similarly to RAM hotplug, vNVDIMM hotplug is accomplished
by two monitor commands, "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

 (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
 (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure the memory option "-m ...,slots=N" specifies
   a sufficient number of slots, i.e.
     N >= number of RAM devices +
          number of statically plugged vNVDIMM devices +
          number of hotplugged vNVDIMM devices

2. A similar requirement applies to the memory option "-m ...,maxmem=M", i.e.
     M >= size of RAM devices +
          size of statically plugged vNVDIMM devices +
          size of hotplugged vNVDIMM devices
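
The two inequalities above can be evaluated before starting a guest
that is expected to receive hotplugged devices. The following sketch
uses made-up device counts and sizes (in GiB) and computes the minimum
"slots" and "maxmem" values; all variable names are illustrative.

```shell
#!/bin/sh
# Made-up plan: one 4 GiB RAM device, one statically plugged 4 GiB
# vNVDIMM, and room for two hotplugged 4 GiB vNVDIMMs.
RAM_DEVICES=1
RAM_GIB=4
STATIC_NVDIMMS=1
STATIC_NVDIMM_GIB=4
HOTPLUG_NVDIMMS=2
HOTPLUG_NVDIMM_GIB=4

# N >= RAM devices + static vNVDIMMs + hotplugged vNVDIMMs
MIN_SLOTS=$((RAM_DEVICES + STATIC_NVDIMMS + HOTPLUG_NVDIMMS))

# M >= total size of RAM + static vNVDIMMs + hotplugged vNVDIMMs
RAM_TOTAL=$((RAM_DEVICES * RAM_GIB))
STATIC_TOTAL=$((STATIC_NVDIMMS * STATIC_NVDIMM_GIB))
HOTPLUG_TOTAL=$((HOTPLUG_NVDIMMS * HOTPLUG_NVDIMM_GIB))
MIN_MAXMEM_GIB=$((RAM_TOTAL + STATIC_TOTAL + HOTPLUG_TOTAL))

echo "-m ${RAM_GIB}G,slots=${MIN_SLOTS},maxmem=${MIN_MAXMEM_GIB}G"
```

Larger values are also valid; the computed numbers are only the lower
bounds given by the notes above.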

Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. In that case, QEMU v2.12.0 and later provide the 'align' option
of memory-backend-file to allow users to specify the proper alignment.

For example, device dax requires 2 MB alignment, so the following
QEMU command line options can be used to make it (/dev/dax0.0) the
backend of a vNVDIMM:

 -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
 -device nvdimm,id=nvdimm1,memdev=mem1

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
the only backends that can guarantee guest write persistence are:

A. DAX device (e.g., /dev/dax0.0), or
B. DAX file (on a file system mounted with the dax option)

When using B (a file supporting direct mapping of persistent memory)
as a backend, write persistence is guaranteed if the host kernel has
support for the MAP_SYNC flag in the mmap system call (available
since Linux 4.15 and on certain distro kernels) and, additionally,
both the 'pmem' and 'share' flags are set to 'on' on the backend.

If these conditions are not satisfied, i.e. if either 'pmem' or 'share'
is not set, if the backend file does not support DAX, or if MAP_SYNC
is not supported by the host kernel, write persistence is not
guaranteed after a system crash. For compatibility reasons, these
conditions are ignored if not satisfied. Currently, no way is
provided to test for them.
For more details, please refer to the mmap(2) man page:
http://man7.org/linux/man-pages/man2/mmap.2.html.

When using other types of backends, it's suggested to set the 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
guest NVDIMM region mapping structure.  This unarmed flag indicates to
guest software that this vNVDIMM device contains a region that cannot
accept persistent writes. As a result, for example, the guest Linux
NVDIMM driver marks such a vNVDIMM device as read-only.
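
The two situations described in this section could translate into
option sets such as those in the following sketch. The DAX file path
/mnt/pmem0/nvdimm.img and the regular file path /var/tmp/nvdimm.img are
hypothetical, as are the ids and sizes; 'pmem', 'share' and 'unarmed'
are the options discussed above.

```shell
#!/bin/sh
# Case B: a file on a DAX-mounted file system (hypothetical path).
# Both 'pmem' and 'share' are set to 'on', as required above.
# Note that pmem=on also needs a QEMU built with libpmem support.
DAX_FILE_OPTS="-object memory-backend-file,id=mem1,share=on,pmem=on,mem-path=/mnt/pmem0/nvdimm.img,size=4G \
-device nvdimm,id=nvdimm1,memdev=mem1"

# Any other backend type (here a regular file, hypothetical path):
# mark the device as unarmed so that guest software is told the
# region cannot accept persistent writes.
OTHER_OPTS="-object memory-backend-file,id=mem2,share=on,mem-path=/var/tmp/nvdimm.img,size=4G \
-device nvdimm,id=nvdimm2,memdev=mem2,unarmed=on"

# Print both option sets for review; nothing is started here.
printf '%s\n' "$DAX_FILE_OPTS" "$OTHER_OPTS"
```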

NVDIMM Persistence
------------------

ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
which allows the platform to communicate what features it supports related to
NVDIMM data persistence.  Users can provide a persistence value to a guest via
the optional "nvdimm-persistence" machine command line option:

    -machine pc,accel=kvm,nvdimm,nvdimm-persistence=cpu

There are currently two valid values for this option:

"mem-ctrl" - The platform supports flushing dirty data from the memory
             controller to the NVDIMMs in the event of power loss.

"cpu"      - The platform supports flushing dirty data from the CPU cache to
             the NVDIMMs in the event of power loss.  This implies that the
             platform also supports flushing dirty data through the memory
             controller on power loss.

If the vNVDIMM backend is in host persistent memory that can be accessed in
SNIA NVM Programming Model [1] (e.g., Intel NVDIMM), it's suggested to set
the 'pmem' option of memory-backend-file to 'on'. When 'pmem' is 'on' and QEMU
is built with libpmem [2] support (configured with --enable-libpmem), QEMU
will take the necessary operations to guarantee the persistence of its own
writes to the vNVDIMM backend (e.g., in vNVDIMM label emulation and live
migration). If 'pmem' is 'on' while there is no libpmem support, QEMU will
exit and report a "lack of libpmem support" message to ensure the persistence
is available. For example, to ensure persistence for some backend file,
use the QEMU command line:

    -object memory-backend-file,id=nv_mem,mem-path=/XXX/yyy,size=4G,pmem=on

References
----------

[1] NVM Programming Model (NPM)
        Version 1.2
    https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf
[2] Persistent Memory Development Kit (PMDK), formerly known as NVML project, home page:
    http://pmem.io/pmdk/