
compress.q

Jaskirat Rajasansir edited this page Dec 2, 2022 · 3 revisions

On-Disk Compression Functions

This library provides functions to assist with the compression of HDB splayed and partitioned tables.

Objects

.compress.defaults

Provides a symbol reference to a default compression mode for each supported compression type:

Symbol    Compression (logicalBlockSize; algorithm; zipLevel)
`none     (0; 0; 0)
`qipc     (17; 1; 0)
`gzip     (17; 2; 7)
`snappy   (17; 3; 0)
`lz4hc    (17; 4; 9)
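
The triples above are in the same order that kdb+ itself uses for compression parameters. As a minimal sketch, assuming `.compress.defaults` can be indexed like a dictionary keyed by symbol (the page only calls it a "symbol reference") and using a hypothetical scratch path `/tmp/example/col`, the gzip default could be looked up and compared with a native compressed write:

```q
/ Assumption: .compress.defaults behaves like a dictionary keyed by symbol
params:.compress.defaults `gzip    / expected to hold the gzip triple shown above

/ kdb+ accepts the same triple natively when writing a compressed file:
/ (target; logicalBlockSize; algorithm; zipLevel) set data
(`:/tmp/example/col; 17; 2; 7) set til 100000

/ -21! reports the parameters actually applied to the file on disk
-21! `:/tmp/example/col
```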

Functions

.compress.getSplayStats[splayPath]

Provides the compression statistics (via -21!) for all columns in the specified splayed table folder, including any additional columns for nested lists or anymaps, and returns a table.

Any uncompressed columns will have a null compressedLength value.

q) .compress.getSplayStats `:/tmp/hdb/2021.10.29/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time   8000560          8000016            qipc         1         17               0
sym    1182897          10101304           qipc         1         17               0
price  8000560          8000016            qipc         1         17               0
vol    3500240          8000016            qipc         1         17               0

q) .compress.getSplayStats `:/tmp/hdb/2021.01.23/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time                    16                 none         0         0                0
sym                     4096               none         0         0                0
price                   16                 none         0         0                0
vol                     16                 none         0         0                0
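
Because the result is an ordinary table, it can be queried directly. The sketch below assumes compress.q is loaded and reuses the first example path above; it derives a per-column compression ratio and lists columns where compression did not reduce the file size:

```q
/ Assumes compress.q is loaded and the example path above exists
stats:.compress.getSplayStats `:/tmp/hdb/2021.10.29/trade

/ Ratio above 1 means the column shrank; below 1 means compression made it larger
select column, ratio:uncompressedLength % compressedLength from stats where not null compressedLength

/ Columns where compression did not pay off (uncompressed columns are excluded by the null)
select column from stats where compressedLength >= uncompressedLength
```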

.compress.getPartitionStats[hdbRoot; partVal]

Provides the compression statistics (via -21!) for all columns in all tables within the specified partition of the specified HDB. The returned table is the same as that of .compress.getSplayStats, with part and table columns added.

Note that this function is not par.txt aware. If using a segmented HDB, the hdbRoot parameter should be the segment root.

q) select sum uncompressedLength by part, table from .compress.getPartitionStats[`:/tmp/hdb; 2021.01.23]
part       table    | uncompressedLength
--------------------| ------------------
2021.01.23 tbl      | 4392
2021.01.23 tbl10    | 40
2021.01.23 tbl2     | 40
2021.01.23 trade    | 4144
2021.01.23 tradeComp| 4144
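
To compare several partitions at once, the function can be projected on the HDB root and applied to each partition value. This is a sketch only, assuming compress.q is loaded and that both illustrative dates exist under the same root:

```q
/ Partition values are illustrative; substitute the dates present in your HDB
parts:2021.01.23 2021.10.29

/ Project on hdbRoot, apply to each partition and join the per-partition tables
stats:raze .compress.getPartitionStats[`:/tmp/hdb] each parts

/ sum skips nulls, so fully uncompressed tables show a compressedLength of 0
select sum compressedLength, sum uncompressedLength by part, table from stats
```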

.compress.splay[sourceSplayPath; targetSplayPath; compressType; options]

Compresses a splayed table.

  • compressType: Can either be a symbol (one of none, qipc, gzip, snappy, lz4hc) or a 3-element integer list describing the compression type
  • options: A dictionary of options to modify the function's behaviour
    • recompress: If true, any compressed files will be recompressed (default is false)
    • inplace: If true, targetSplayPath can be the same as sourceSplayPath (default is false)
    • parallel: If true, compress the columns within the splay in parallel, provided none of the columns has the copy write mode (default is true)
    • dryrun: If true, don't actually run the compression; just return the result table describing what would be done. Note that the dryrun column in the result table will be true in this case (default is false)
    • gc: If true, perform a Garbage Collection after compression (default is false)

The function doesn't always compress every column in the splay. It returns a table describing the operation performed on each column; the writeMode column shows what was done, for the reasons listed beneath each mode:

  • compress: The file was compressed
    • The file is uncompressed, or is compressed and the recompress option is true
  • copy: The file was copied (via the OS-specific copy command)
    • The file is either empty (0 = count) or is already compressed and the recompress option is missing or false
  • ignore: The file was ignored
    • The file would have been copied (as above), but the operation is in place, so there is nothing to do
    • Additional files for nested lists are not compressed directly; they are created when the primary list is compressed

q) .compress.getSplayStats `:/tmp/hdb/2021.11.08/trade
column  compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
--------------------------------------------------------------------------------------------
time                     8016               none         0         0                0
sym                      44696              none         0         0                0
price                    8016               none         0         0                0
vol                      8016               none         0         0                0
prices                   12096              none         0         0                0
prices#                  30096              none         0         0                0

q) .compress.splay[`:/tmp/hdb/2021.11.08/trade; `:/tmp/hdb/2021.11.08/tradeComp; `lz4hc; ()!()]
...
column  source                             target                                 compressed inplace empty writeMode dryrun parallel time
---------------------------------------------------------------------------------------------------------------------------------------------------------
time    :/tmp/hdb/2021.11.08/trade/time    :/tmp/hdb/2021.11.08/tradeComp/time    0          0       0     compress  0      1        0D00:00:00.008962000
sym     :/tmp/hdb/2021.11.08/trade/sym     :/tmp/hdb/2021.11.08/tradeComp/sym     0          0       0     compress  0      1        0D00:00:00.010372000
price   :/tmp/hdb/2021.11.08/trade/price   :/tmp/hdb/2021.11.08/tradeComp/price   0          0       0     compress  0      1        0D00:00:00.009941000
vol     :/tmp/hdb/2021.11.08/trade/vol     :/tmp/hdb/2021.11.08/tradeComp/vol     0          0       0     compress  0      1        0D00:00:00.009440000
prices  :/tmp/hdb/2021.11.08/trade/prices  :/tmp/hdb/2021.11.08/tradeComp/prices  0          0       0     compress  0      1        0D00:00:00.004127000
prices# :/tmp/hdb/2021.11.08/trade/prices# :/tmp/hdb/2021.11.08/tradeComp/prices# 0          0       0     ignore    0      1

q) .compress.getSplayStats `:/tmp/hdb/2021.11.08/tradeComp
column  compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
--------------------------------------------------------------------------------------------
time    8072             8016               lz4hc        4         17               9
sym     10907            44696              lz4hc        4         17               9
price   7011             8016               lz4hc        4         17               9
vol     4864             8016               lz4hc        4         17               9
prices  4115             12096              lz4hc        4         17               9
prices# 212              30096              lz4hc        4         17               9
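
The options dictionary and the list form of compressType can be combined to preview an operation before running it. A hedged sketch, assuming compress.q is loaded, that the 3-element list follows the (logicalBlockSize; algorithm; zipLevel) order of .compress.defaults, and that a long vector is accepted as the "integer list":

```q
/ Preview only: dryrun is true, so no files should be written
opts:`dryrun`recompress!11b

/ 17 2 9 is gzip at level 9 rather than the `gzip default of level 7
plan:.compress.splay[`:/tmp/hdb/2021.11.08/trade; `:/tmp/hdb/2021.11.08/tradeComp; 17 2 9; opts]

/ Summarise how many columns fall into each writeMode
select cnt:count i by writeMode from plan
```

If the plan looks right, the same call with dryrun set to 0b (or omitted) would perform the compression.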

.compress.partition[sourceRoot; targetRoot; partVal; tbls; compressType; options]

Compresses multiple splayed tables within a HDB partition.

  • tbls: Either a list of tables to compress or COMP_ALL can be specified to compress all tables
  • options: A dictionary of options to modify the function's behaviour
    • recompress: If true, any compressed files will be recompressed (default is false)
    • inplace: If true, sourceRoot can be the same as targetRoot (default is false)
    • srcParTxt: If true, any par.txt in sourceRoot will be used to find the specified partition (default is true)
    • tgtParTxt: If true, any par.txt in targetRoot will be used to write the specified partition (default is true)
    • parallel: If true, compress the columns within each splay in parallel, provided none of the columns has the copy write mode (default is true)
    • dryrun: If true, don't actually run the compression; just return the result table describing what would be done. Note that the dryrun column in the result table will be true in this case (default is false)
    • gc: If true, perform a Garbage Collection after compression (default is false)

NOTE: There is no interaction with the sym file in the source or target HDBs with this function. It is expected that the sym file is shared across both the source and target.

The same information is returned as .compress.splay with part and table columns added.
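
As an illustrative sketch only: the call below previews compressing every table in one partition into a separate HDB root. It assumes compress.q is loaded, that COMP_ALL is passed as a symbol, and that `/tmp/hdbComp` is a hypothetical target root; none of these details are confirmed by this page.

```q
/ Preview only, and bypass par.txt lookups on both sides for this example
opts:`dryrun`srcParTxt`tgtParTxt!100b

/ Assumption: COMP_ALL is supplied as the symbol `COMP_ALL
plan:.compress.partition[`:/tmp/hdb; `:/tmp/hdbComp; 2021.11.08; `COMP_ALL; `lz4hc; opts]

/ Per-table view of what would be compressed, copied or ignored
select cnt:count i by part, table, writeMode from plan
```

Since the function does not touch the sym file, the target root in a real run would need access to the same sym file as the source.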
