compress.q
This library provides functions to assist with the compression of HDB splayed and partitioned tables.
The library provides a symbol reference to a default compression mode for each supported compression type. Each mode is a `(logicalBlockSize; algorithm; zipLevel)` list:

Symbol | Compression |
---|---|
`` `none `` | `(0; 0; 0)` |
`` `qipc `` | `(17; 1; 0)` |
`` `gzip `` | `(17; 2; 7)` |
`` `snappy `` | `(17; 3; 0)` |
`` `lz4hc `` | `(17; 4; 9)` |
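
For orientation, these triples use the same (logical block size; algorithm; level) order that kdb+ itself expects in `.z.zd`. The following is a minimal sketch using only kdb+ built-ins; the `modes` dictionary is a hypothetical local copy of the table above, not the library's own variable.

```q
/ Hypothetical local copy of the defaults table above (not the library's variable name)
modes:`none`qipc`gzip`snappy`lz4hc!(0 0 0;17 1 0;17 2 7;17 3 0;17 4 9)

/ Make subsequent file writes in this process use the gzip default
.z.zd:modes`gzip

/ Remove the default again so subsequent writes are uncompressed
\x .z.zd
```

Setting `.z.zd` affects every write in the process, which is why a per-table function that takes the mode as an argument is usually easier to reason about.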
`.compress.getSplayStats` provides the compression statistics (via `-21!`) for all columns in the specified splayed table folder, including any additional files for nested lists or anymaps, and returns a table.
Any uncompressed columns will have a null `compressedLength` value.
```q
q) .compress.getSplayStats `:/tmp/hdb/2021.10.29/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time   8000560          8000016            qipc         1         17               0
sym    1182897          10101304           qipc         1         17               0
price  8000560          8000016            qipc         1         17               0
vol    3500240          8000016            qipc         1         17               0

q) .compress.getSplayStats `:/tmp/hdb/2021.01.23/trade
column compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
-------------------------------------------------------------------------------------------
time                    16                 none         0         0                0
sym                     4096               none         0         0                0
price                   16                 none         0         0                0
vol                     16                 none         0         0                0
```
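
The two length columns can be combined into a per-column compression ratio. This is a minimal sketch, assuming `compress.q` is loaded and the example path above exists; the `ratio` column name is our own choice and is not produced by the library.

```q
/ Assumes compress.q is loaded; the path is the one used in the example above
stats:.compress.getSplayStats `:/tmp/hdb/2021.10.29/trade

/ ratio above 1 means the column shrank on disk
/ uncompressed columns have a null compressedLength, so their ratio is null
update ratio:uncompressedLength % compressedLength from stats
```

With the figures shown above, `time` and `price` would come out just under 1, because their compressed files are slightly larger than the uncompressed data.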
`.compress.getPartitionStats` provides the compression statistics (via `-21!`) for all columns in all tables within the specified partition of the specified HDB. The returned table is the same as that of `.compress.getSplayStats`, with `part` and `table` columns added.

Note that this function is not `par.txt` aware. If using a segmented HDB, the `hdbRoot` parameter should be the segment root.
```q
q) select sum uncompressedLength by part, table from .compress.getPartitionStats[`:/tmp/hdb; 2021.01.23]
part       table    | uncompressedLength
--------------------| ------------------
2021.01.23 tbl      | 4392
2021.01.23 tbl10    | 40
2021.01.23 tbl2     | 40
2021.01.23 trade    | 4144
2021.01.23 tradeComp| 4144
```
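
Because the function is not `par.txt` aware, a segmented HDB needs the segment roots resolved by the caller. The following is a rough sketch of one way to do that, assuming `compress.q` is loaded, that `/tmp/hdb/par.txt` exists, and that it lists one absolute segment path per line; the variable names are illustrative only.

```q
/ Assumptions: compress.q is loaded; par.txt holds one absolute segment path per line
hdbRoot:`:/tmp/hdb
part:2021.01.23

/ Segment roots as file symbols
segs:hsym `$read0 ` sv hdbRoot,`par.txt

/ Keep only the segments that contain this partition directory
segs:segs where (`$string part) in/: key each segs

/ Collect the stats from each matching segment root
stats:raze .compress.getPartitionStats[;part] each segs
```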
`.compress.splay` compresses a splayed table.

- `compressType`: Can either be a symbol (one of `none`, `qipc`, `gzip`, `snappy`, `lz4hc`) or a 3-element integer list describing the compression type
- `options`: A dictionary of options to modify the function's behaviour
  - `recompress`: If true, any compressed files will be recompressed (default is `false`)
  - `inplace`: If true, `targetSplayPath` can be the same as `sourceSplayPath` (default is `false`)
  - `parallel`: Compress columns within the splay in parallel if none of the columns have `copy` write mode (default is `true`)
  - `dryrun`: If true, don't actually run the compression, just return the table result of what would be done. Note the `dryrun` column in the result table will be true in this case (default is `false`)
  - `gc`: If true, perform a garbage collection after compression (default is `false`)

The function doesn't always compress every column in the splay. It returns a table describing the operation that was performed on each file; `writeMode` details what was done and why:

- `compress`: The file was compressed
  - The file is uncompressed, or is compressed and the `recompress` option is true
- `copy`: The file was copied (via the OS-specific copy command)
  - The file is either empty (0 = count) or is already compressed and the `recompress` option is missing or false
- `ignore`: The file was ignored
  - The file would have been copied (as above) but the copy was in place, so there was nothing to do
  - Additional files for nested lists should not be compressed directly; they are created when the primary list is compressed

```q
q) .compress.getSplayStats `:/tmp/hdb/2021.11.08/trade
column  compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
--------------------------------------------------------------------------------------------
time                     8016               none         0         0                0
sym                      44696              none         0         0                0
price                    8016               none         0         0                0
vol                      8016               none         0         0                0
prices                   12096              none         0         0                0
prices#                  30096              none         0         0                0

q) .compress.splay[`:/tmp/hdb/2021.11.08/trade; `:/tmp/hdb/2021.11.08/tradeComp; `lz4hc; ()!()]
...
column  source                             target                                 compressed inplace empty writeMode dryrun parallel time
---------------------------------------------------------------------------------------------------------------------------------------------------------
time    :/tmp/hdb/2021.11.08/trade/time    :/tmp/hdb/2021.11.08/tradeComp/time    0          0       0     compress  0      1        0D00:00:00.008962000
sym     :/tmp/hdb/2021.11.08/trade/sym     :/tmp/hdb/2021.11.08/tradeComp/sym     0          0       0     compress  0      1        0D00:00:00.010372000
price   :/tmp/hdb/2021.11.08/trade/price   :/tmp/hdb/2021.11.08/tradeComp/price   0          0       0     compress  0      1        0D00:00:00.009941000
vol     :/tmp/hdb/2021.11.08/trade/vol     :/tmp/hdb/2021.11.08/tradeComp/vol     0          0       0     compress  0      1        0D00:00:00.009440000
prices  :/tmp/hdb/2021.11.08/trade/prices  :/tmp/hdb/2021.11.08/tradeComp/prices  0          0       0     compress  0      1        0D00:00:00.004127000
prices# :/tmp/hdb/2021.11.08/trade/prices# :/tmp/hdb/2021.11.08/tradeComp/prices# 0          0       0     ignore    0      1

q) .compress.getSplayStats `:/tmp/hdb/2021.11.08/tradeComp
column  compressedLength uncompressedLength compressMode algorithm logicalBlockSize zipLevel
--------------------------------------------------------------------------------------------
time    8072             8016               lz4hc        4         17               9
sym     10907            44696              lz4hc        4         17               9
price   7011             8016               lz4hc        4         17               9
vol     4864             8016               lz4hc        4         17               9
prices  4115             12096              lz4hc        4         17               9
prices# 212              30096              lz4hc        4         17               9
```
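
Before compressing a large splay it can be useful to preview what would happen. This is a minimal sketch of a dry run, assuming `compress.q` is loaded; the paths mirror the example above, and `src`, `tgt`, `opts` and `plan` are illustrative names.

```q
/ Assumes compress.q is loaded; paths follow the example above
src:`:/tmp/hdb/2021.11.08/trade
tgt:`:/tmp/hdb/2021.11.08/tradeComp

/ Options that are not supplied keep their documented defaults
opts:`dryrun`recompress!11b

/ With dryrun true nothing should be written; the result describes the planned action per file
plan:.compress.splay[src; tgt; `gzip; opts]
select column, writeMode, dryrun from plan
```

An option dictionary with a single key needs `enlist` on both sides, for example ``(enlist `dryrun)!enlist 1b``.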
Compresses multiple splayed tables within a HDB partition.

- `tbls`: Either a list of tables to compress, or `COMP_ALL` can be specified to compress all tables
- `options`: A dictionary of options to modify the function's behaviour
  - `recompress`: If true, any compressed files will be recompressed (default is `false`)
  - `inplace`: If true, `sourceRoot` can be the same as `targetRoot` (default is `false`)
  - `srcParTxt`: If true, any `par.txt` in `sourceRoot` will be used to find the specified partition (default is `true`)
  - `tgtParTxt`: If true, any `par.txt` in `targetRoot` will be used to write the specified partition (default is `true`)
  - `parallel`: Compress columns within the splay in parallel if none of the columns have `copy` write mode (default is `true`)
  - `dryrun`: If true, don't actually run the compression, just return the table result of what would be done. Note the `dryrun` column in the result table will be true in this case (default is `false`)
  - `gc`: If true, perform a garbage collection after compression (default is `false`)

NOTE: This function does not interact with the `sym` file in the source or target HDBs. It is expected that the `sym` file is shared across both the source and target.

The same information is returned as `.compress.splay`, with `part` and `table` columns added.
Copyright (C) Sport Trades Ltd 2017 - 2020, John Keys and Jaskirat Rajasansir 2020 - 2024