LZMA SDK 9.20
-------------

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the famous LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
decompression speed high and memory requirements for decompression
low.



LICENSE
-------

LZMA SDK is written and placed in the public domain by Igor Pavlov.

Some code in LZMA SDK is based on public domain code from other developers:
  1) PPMd var.H (2001): Dmitry Shkarin
  2) SHA-256: Wei Dai (Crypto++ library)


LZMA SDK Contents
-----------------

LZMA SDK includes:

  - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows


UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA encoder/decoder, go to
the directory
CPP/7zip/Bundles/LzmaCon
and call make to recompile it:
  make -f makefile.gcc clean all

In some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, you can use
LIB = -lm -static


Files
-----
lzma.txt     - LZMA SDK description (this file)
7zFormat.txt - 7z Format description
7zC.txt      - 7z ANSI-C Decoder description
methods.txt  - Compression method IDs for .7z
lzma.exe     - Compiled file->file LZMA encoder/decoder for Windows
7zr.exe      - 7-Zip with 7z/lzma/xz support.
history.txt  - history of the LZMA SDK


Source code structure
---------------------

C/  - C files
        7zCrc*.*   - CRC code
        Alloc.*    - Memory allocation functions
        Bra*.*     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
        LzFind.*   - Match finder for LZ (LZMA) encoders
        LzFindMt.* - Multithreaded match finder for LZ (LZMA) encoders
        LzHash.h   - Additional file for LZ match finder
        LzmaDec.*  - LZMA decoding
        LzmaEnc.*  - LZMA encoding
        LzmaLib.*  - LZMA Library for DLL calling
        Types.h    - Basic types for other .c files
        Threads.*  - The code for multithreading.

    LzmaLib  - LZMA Library (.DLL for Windows)

    LzmaUtil - LZMA Utility (file->file LZMA encoder/decoder).

    Archive - files related to archiving
      7z     - 7z ANSI-C Decoder

CPP/ - C++ files

  Common  - common files for C++ projects
  Windows - common files for Windows-related code

  7zip    - files related to 7-Zip Project

    Common   - common files for 7-Zip

    Compress - files related to compression/decompression

    Archive - files related to archiving

      Common   - common files for archive handling
      7z       - 7z C++ Encoder/Decoder

    Bundles    - Modules that are bundles of other modules

      Alone7z           - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
      LzmaCon           - lzma.exe: LZMA compression/decompression
      Format7zR         - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
      Format7zExtractR  - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.

    UI        - User Interface files

      Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
      Common   - Common UI files
      Console  - Code for console archiver



CS/ - C# files
  7zip
    Common   - some common files for 7-Zip
    Compress - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      LzmaAlone    - file->file LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)

Java/  - Java files
  SevenZip
    Compression    - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)


The C/C++ source code of the LZMA SDK is part of the 7-Zip project.
7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/



LZMA features
-------------
  - Variable dictionary size (up to 1 GB)
  - Estimated compressing speed: about 2 MB/s on a 2 GHz CPU
  - Estimated decompressing speed:
      - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
      - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (16 KB + DictionarySize)
  - Small code size for decompressing: 5-8 KB

The LZMA decoder uses only integer operations and can be implemented
on any modern 32-bit CPU (or on a 16-bit CPU under some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (the penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations
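All three operations appear in the inner step of a binary adaptive range decoder, which is the core of LZMA decoding. The sketch below decodes one bit; a matching encoder is included only so the example can be round-trip tested. The constants (11-bit probabilities, adaptation shift 5, normalization threshold 2^24) follow the LZMA range coder as we understand it, and the names RcDec, RcEnc, rc_dec_bit and so on are hypothetical, not the LzmaDec.c API.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BIT_MODEL_TOTAL_BITS 11  /* probabilities are 11-bit */
#define BIT_MODEL_TOTAL (1u << NUM_BIT_MODEL_TOTAL_BITS)
#define NUM_MOVE_BITS 5              /* adaptation speed */
#define TOP_VALUE (1u << 24)         /* normalization threshold */

/* Hypothetical names; this is a sketch, not the LZMA SDK API. */
typedef struct { uint32_t range, code; const uint8_t *in; size_t pos; } RcDec;

void rc_dec_init(RcDec *d, const uint8_t *in)
{
  int i;
  d->range = 0xFFFFFFFFu; d->code = 0; d->in = in; d->pos = 0;
  for (i = 0; i < 5; i++)
    d->code = (d->code << 8) | d->in[d->pos++];
}

/* Decodes one bit; *prob estimates P(bit == 0) scaled to 2048. */
unsigned rc_dec_bit(RcDec *d, uint16_t *prob)
{
  uint32_t bound = (d->range >> NUM_BIT_MODEL_TOTAL_BITS) * *prob; /* 32*16 multiply */
  unsigned bit;
  if (d->code < bound) {             /* data-dependent, hard-to-predict branch */
    d->range = bound;
    *prob = (uint16_t)(*prob + ((BIT_MODEL_TOTAL - *prob) >> NUM_MOVE_BITS));
    bit = 0;
  } else {
    d->range -= bound;
    d->code -= bound;
    *prob = (uint16_t)(*prob - (*prob >> NUM_MOVE_BITS));
    bit = 1;
  }
  if (d->range < TOP_VALUE) {        /* normalize with 32-bit shifts */
    d->range <<= 8;
    d->code = (d->code << 8) | d->in[d->pos++];
  }
  return bit;
}

/* Matching encoder, included only to produce test input. */
typedef struct {
  uint64_t low, cache_size;
  uint32_t range;
  uint8_t cache;
  uint8_t *out;
  size_t pos;
} RcEnc;

void rc_enc_init(RcEnc *e, uint8_t *out)
{
  e->low = 0; e->cache_size = 1; e->range = 0xFFFFFFFFu;
  e->cache = 0; e->out = out; e->pos = 0;
}

static void rc_enc_shift_low(RcEnc *e)
{
  if ((uint32_t)e->low < 0xFF000000u || (e->low >> 32) != 0) {
    uint8_t carry = (uint8_t)(e->low >> 32);
    uint8_t temp = e->cache;
    do {                             /* emit pending bytes plus any carry */
      e->out[e->pos++] = (uint8_t)(temp + carry);
      temp = 0xFF;
    } while (--e->cache_size != 0);
    e->cache = (uint8_t)(e->low >> 24);
  }
  e->cache_size++;
  e->low = (e->low & 0x00FFFFFFu) << 8;
}

void rc_enc_bit(RcEnc *e, uint16_t *prob, unsigned bit)
{
  uint32_t bound = (e->range >> NUM_BIT_MODEL_TOTAL_BITS) * *prob;
  if (bit == 0) {
    e->range = bound;
    *prob = (uint16_t)(*prob + ((BIT_MODEL_TOTAL - *prob) >> NUM_MOVE_BITS));
  } else {
    e->low += bound;
    e->range -= bound;
    *prob = (uint16_t)(*prob - (*prob >> NUM_MOVE_BITS));
  }
  while (e->range < TOP_VALUE) {
    e->range <<= 8;
    rc_enc_shift_low(e);
  }
}

void rc_enc_flush(RcEnc *e)
{
  int i;
  for (i = 0; i < 5; i++)
    rc_enc_shift_low(e);
}
```

In a full decoder, bits like these are combined into literals, match lengths and distances; the sketch only isolates the per-bit cost that the list above refers to.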

The speed of LZMA decompression mostly depends on CPU speed.
Memory speed matters little, but if your CPU has a small data cache,
memory speed has a somewhat larger effect on overall speed.


How To Use
----------

Using LZMA encoder/decoder executable
--------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with the LZMA method. Benchmark shows a rating in MIPS (million
     instructions per second). The rating is calculated from the
     measured speed and normalized to Intel Core 2 results.
     Benchmark also checks for possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only the -d parameter.
     You can also change the number of iterations. Example for 30 iterations:
       LZMA b 30
     The default number of iterations is 10.

<Switches>


  -a{N}:  set compression mode 0 = fast, 1 = normal
          default: 1 (normal)

  -d{N}:  set dictionary size - [0, 30], default: 23 (8 MB)
          The maximum value for dictionary size is 1 GB = 2^30 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with
          dictionary size D = 2^N, you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a larger number gives a slightly better compression
          ratio and slower compression.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives a gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          The lp switch is intended for periodical data when the period
          is equal to 2^N. For example, for 32-bit (4-byte)
          periodical data you can use lp=2. If you change the lp switch,
          it is often better to set lc0.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          The pb switch is intended for periodical data
          when the period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work pretty fast in combination with
              fast mode (-a0).

              Memory requirements depend on dictionary size
              (parameter "d" in the table below).

                MF_ID  Memory          Description

                bt2    d *  9.5 + 4MB  Binary Tree with 2 bytes hashing.
                bt3    d * 11.5 + 4MB  Binary Tree with 3 bytes hashing.
                bt4    d * 11.5 + 4MB  Binary Tree with 4 bytes hashing.
                hc4    d *  7.5 + 4MB  Hash Chain with 4 bytes hashing.

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          the eos marker, since the LZMA decoder knows the uncompressed
          size stored in the .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout
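The dictionary-size rule and the match-finder table above can be turned into a small estimator. The helper names below are hypothetical (not part of the SDK), and the results are only as approximate as the table itself.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: dictionary size for -d{N} is 2^N bytes, N in [0, 30]. */
uint32_t dict_size_from_switch(unsigned n)
{
  return (uint32_t)1 << n;
}

/* Hypothetical helper: approximate match-finder memory in bytes for a
 * dictionary of d bytes, from the table: d * factor + 4 MB.
 * Returns 0 for an unknown MF_ID. */
uint64_t match_finder_memory(const char *mf_id, uint32_t d)
{
  unsigned twice_factor; /* factor * 2, so 9.5 becomes 19: integer math only */
  if (strcmp(mf_id, "bt2") == 0)
    twice_factor = 19;
  else if (strcmp(mf_id, "bt3") == 0 || strcmp(mf_id, "bt4") == 0)
    twice_factor = 23;
  else if (strcmp(mf_id, "hc4") == 0)
    twice_factor = 15;
  else
    return 0;
  return (uint64_t)d * twice_factor / 2 + ((uint64_t)4 << 20);
}
```

For example, the default -d23 with the default bt4 match finder gives 8 MB * 11.5 + 4 MB = 96 MB for the match finder.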


Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with a 64 KB dictionary (2^16 = 64K)
and 0 literal context bits. -lc0 reduces the memory requirements
for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.
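Why do -lc0 and -lp2 suit 32-bit periodical data? The lc, lp and pb switches choose which probability tables are used for each byte. The hypothetical helpers below are modeled on the literal context computation in the ANSI-C decoder (LzmaDec.c) as we read it: the literal coder is selected by the low lp bits of the position and the high lc bits of the previous byte, so there are 2^(lc + lp) literal coders. With -lc0 -lp2, the coder depends only on the byte's offset inside a 4-byte word.

```c
#include <stdint.h>

/* Hypothetical helper: number of literal coders, 2^(lc + lp). */
unsigned literal_coder_count(unsigned lc, unsigned lp)
{
  return 1u << (lc + lp);
}

/* Hypothetical helper: index of the literal coder for the byte at
 * position pos, given the previous byte. */
unsigned literal_coder_index(uint64_t pos, uint8_t prev_byte,
                             unsigned lc, unsigned lp)
{
  unsigned lp_mask = (1u << lp) - 1;
  unsigned pos_bits = (unsigned)(pos & lp_mask);        /* low lp bits of pos */
  unsigned ctx_bits = (unsigned)prev_byte >> (8 - lc);  /* high lc bits of prev byte */
  return (pos_bits << lc) + ctx_bits;
}

/* Hypothetical helper: pos_state uses the low pb bits of the position. */
unsigned pos_state(uint64_t pos, unsigned pb)
{
  return (unsigned)(pos & ((1u << pb) - 1));
}
```

Fewer literal coders also means smaller decoder state, which is why -lc0 in example 1 reduces memory requirements.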


Compression ratio hints
-----------------------

Recommendations
---------------

To increase the LZMA compression ratio, it is desirable to have aligned
data (if possible) and to order the data so that code is grouped in one
place and data in another (this is better than interleaving code, data,
code, data, ...).


Filters
-------
You can increase the compression ratio for some data types by using
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find the C source code of these filters in the C/Bra*.* files.

You can check the compression ratio gain of these filters with the
following 7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=arm -m1=lzma

It works as follows:
Compressing    = Filter_encoding + LZMA_encoding
Decompressing  = LZMA_decoding + Filter_decoding

These filters are very fast at both compressing and decompressing,
so they do not increase decompression time much.
Moreover, they reduce the time spent in LZMA_decoding,
since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible.
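The relative-to-absolute conversion can be illustrated with a simplified x86 CALL (opcode 0xE8) filter. This sketch assumes a plain "E8 + 32-bit little-endian offset" encoding and converts every E8 it finds; it is not the algorithm in C/Bra86.c, which adds heuristics about which operands to convert, and call_filter is a hypothetical name. Two calls to the same procedure from different places carry different relative offsets but the same absolute address after filtering, which gives LZMA repeated bytes to match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not C/Bra86.c). */
static uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v)
{
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* encoding != 0: relative -> absolute; encoding == 0: absolute -> relative.
 * ip is the address at which data[0] would be loaded. */
void call_filter(uint8_t *data, size_t size, uint32_t ip, int encoding)
{
  size_t i = 0;
  while (i + 5 <= size) {
    if (data[i] == 0xE8) {
      /* CALL offsets are relative to the next instruction's address */
      uint32_t next = ip + (uint32_t)i + 5;
      uint32_t v = read_le32(data + i + 1);
      write_le32(data + i + 1, encoding ? v + next : v - next);
      i += 5; /* skip the operand so encoder and decoder scan identically */
    } else {
      i++;
    }
  }
}
```

Because the decoder skips exactly the same operand bytes as the encoder, running the filter with encoding == 0 restores the original data.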

For some ISAs (for example, MIPS) it's impossible to get a gain from such a filter.


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 (all bytes 0xFF) means unknown size
 13         Compressed data
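A minimal sketch of reading the 13-byte header from the table. The decoding of the properties byte (lc = d % 9, lp = (d / 9) % 5, pb = d / 45, valid only when d < 225) is our reading of the property parser in LzmaDec.c; LzmaHeaderInfo and parse_lzma_header are hypothetical names, not SDK API. In the real decoder, the first 5 bytes are passed as propData.

```c
#include <stdint.h>

/* Hypothetical type, not part of the SDK. */
typedef struct
{
  unsigned lc, lp, pb;   /* decoded from the properties byte */
  uint32_t dict_size;    /* bytes 1..4, little endian */
  uint64_t unpack_size;  /* bytes 5..12, little endian */
  int size_known;        /* 0 when unpack_size is -1 (all bytes 0xFF) */
} LzmaHeaderInfo;

/* Hypothetical helper: parses a .lzma header. Returns 0 on success,
 * -1 if the properties byte is out of range. */
int parse_lzma_header(const uint8_t h[13], LzmaHeaderInfo *info)
{
  unsigned d = h[0];
  int i;
  if (d >= 9 * 5 * 5)    /* lc < 9, lp < 5, pb < 5 */
    return -1;
  info->lc = d % 9;
  d /= 9;
  info->lp = d % 5;
  info->pb = d / 5;
  info->dict_size = 0;
  for (i = 0; i < 4; i++)
    info->dict_size |= (uint32_t)h[1 + i] << (8 * i);
  info->unpack_size = 0;
  for (i = 0; i < 8; i++)
    info->unpack_size |= (uint64_t)h[5 + i] << (8 * i);
  info->size_known = (info->unpack_size != UINT64_MAX);
  return 0;
}
```

With the default settings (lc=3, lp=0, pb=2), the properties byte is (2 * 5 + 0) * 9 + 3 = 93 = 0x5D.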


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version
of the LZMA SDK from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + Types.h
LzmaUtil/LzmaUtil.c is an example application that uses these files.


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of the LZMA decoding function for local variables is not
larger than 200-400 bytes.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + 1.5 * 2^(lc + lp)) KB
By default (lc=3, lp=0), state_size = 4 + 1.5 * 8 = 16 KB.
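The state_size formula can be checked with a few lines of C. The helper names are hypothetical, and the results are approximations, exactly as the formula is.

```c
#include <stdint.h>

/* Hypothetical helper: state_size from the formula above, in bytes:
 * (4 + 1.5 * 2^(lc + lp)) KB = 4096 + 1536 * 2^(lc + lp). */
uint64_t lzma_state_size(unsigned lc, unsigned lp)
{
  return 4096u + ((uint64_t)1536 << (lc + lp));
}

/* Hypothetical helper: approximate total decoder memory,
 * internal state plus the dictionary buffer. */
uint64_t lzma_decoder_memory(unsigned lc, unsigned lp, uint32_t dict_size)
{
  return lzma_state_size(lc, lp) + dict_size;
}
```

lzma_state_size(3, 0) gives 16384 bytes (16 KB); with lc=0, lp=0 it drops to 5632 bytes (5.5 KB), which is why -lc0 reduces decoder memory.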


How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must provide an external allocator:
Example:
#include <stdlib.h>  /* malloc, free */
#include "Types.h"   /* ISzAlloc */

void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

The p = p; statement only suppresses "unused parameter" compiler warnings.


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches the output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END when you know that the
                           current output buffer covers the last bytes of the stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size
    status   - decoding status (see Output below)

  Output:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If the LZMA decoder sees the end marker before reaching the output limit,
  it returns SZ_OK, and the output value of destLen will be less than the
  output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check the result and the "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Constr(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init the LzmaDec structure before any new LZMA stream, then call
   LzmaDec_DecodeToBuf in a loop:

  LzmaDec_Init(&state);
  for (;;)
  {
    /* ... refill inBuf (inSize bytes, inPos = 0) when it is empty ... */
    SizeT inProcessed = inSize - inPos;
    SizeT outProcessed = OUT_BUF_SIZE;
    ELzmaStatus status;
    res = LzmaDec_DecodeToBuf(&state, outBuf, &outProcessed,
        inBuf + inPos, &inProcessed, LZMA_FINISH_ANY, &status);
    inPos += inProcessed;
    /* ... write outProcessed bytes of outBuf; stop on error or end of stream ... */
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For a full code example, look at C/LzmaUtil/LzmaUtil.c.


How To compress data
--------------------

Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size
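The same kind of estimate for the encoder, as a sketch with a hypothetical helper name and integer arithmetic (11.5 is applied as * 23 / 2):

```c
#include <stdint.h>

/* Hypothetical helper: approximate encoder memory in bytes from the
 * formula above, (dictSize * 11.5 + 6 MB) + state_size. */
uint64_t lzma_encoder_memory(uint32_t dict_size, uint64_t state_size)
{
  return (uint64_t)dict_size * 23 / 2 + ((uint64_t)6 << 20) + state_size;
}
```

For the default 8 MB dictionary and a 16 KB state this gives 92 MB + 6 MB + 16 KB, about 98 MB.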

The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

See C/LzmaUtil/LzmaUtil.c for an example.

When to use: file->file compressing

1) You must implement callback structures for these interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

/* MyAlloc, MyFree, MyRead and MyWrite are your own functions. */
static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) { p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create a CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize the CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send the LZMA properties to the LZMA Encoder:

  res = LzmaEnc_SetProps(enc, &props);

5) Write the encoded properties to the header:

    Byte header[LZMA_PROPS_SIZE + 8];
    SizeT headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call the encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy the LZMA Encoder object:
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);
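For comparison, the header written in step 5 can be reproduced without the SDK. The hypothetical helper below builds the same 13-byte layout: the properties byte (pb * 5 + lp) * 9 + lc, the 4-byte little-endian dictionary size, then the 8-byte size written as in step 5. It is a sketch under one stated assumption: LzmaEnc_WriteProperties may store a normalized dictionary size, whereas this helper stores dict_size exactly as given.

```c
#include <stdint.h>

#define HDR_PROPS_SIZE 5                 /* same value as LZMA_PROPS_SIZE */
#define HDR_SIZE (HDR_PROPS_SIZE + 8)

/* Hypothetical helper, not part of the SDK. Returns -1 for
 * out-of-range lc/lp/pb, 0 otherwise. */
int build_lzma_header(uint8_t header[HDR_SIZE], unsigned lc, unsigned lp,
                      unsigned pb, uint32_t dict_size, uint64_t file_size)
{
  int i;
  if (lc > 8 || lp > 4 || pb > 4)
    return -1;
  header[0] = (uint8_t)((pb * 5 + lp) * 9 + lc);     /* encoded lc/lp/pb */
  for (i = 0; i < 4; i++)
    header[1 + i] = (uint8_t)(dict_size >> (8 * i)); /* little endian */
  for (i = 0; i < 8; i++)
    header[5 + i] = (uint8_t)(file_size >> (8 * i)); /* as in step 5 */
  return 0;
}
```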


If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.


Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)



Defines
-------

_LZMA_SIZE_OPT - Enable some optimizations in the LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use the size_t type.


_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support the PPMD method in the ANSI-C .7z decoder.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces, so if you want to use it,
you may want to study the basics of COM/OLE.
The C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you handle the "new" operator correctly.
7-Zip can be compiled with MSVC 6.0, whose "new" operator returns NULL
instead of throwing an exception.
So 7-Zip uses "CPP\Common\NewHandler.cpp", which redefines the "new" operator:

void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}

If you use an MSVC version whose "new" operator throws an exception, you can
compile without "NewHandler.cpp", so the standard exception will be used.
Also, some code of 7-Zip catches any exception in internal code and converts
it to an HRESULT code. So you don't need to catch CNewException if you call
COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html