- Name
- Synopsis
- Description
- Options
- Selection File
- Input List File
- Input File Range
- Match Or Reject List File
- Archive Format
- Archive Format Examples
- Leap Second List File
- Error Handling And Return Codes
- Author
datafilter [options] file1 [file2 file3 ...]
datafilter filters miniSEED data. Various data selection criteria are available including by identifier, time or lists of arbitrary identifiers and times.
By default, records that match all criteria and at least partially match a selected time range are written to the output. Output data may optionally be pruned, i.e. trimmed, at the sample level.
Input files will be read and processed in the order specified.
Files on the command line prefixed with a '@' character are input list files and are expected to contain a simple list of input files, see INPUT LIST FILE for more details.
Each input file may be specified with an explicit byte range to read. The program will begin reading at the specified start offset and stop reading at the specified end offset. See INPUT FILE RANGE for more details.
-V
Print program version and exit.
-h
Print program usage and exit.
-H
Print verbose program usage including details of archive format specification and exit.
-v
Be more verbose. This flag can be used multiple times ("-v -v" or "-vv") for more verbosity.
-s selectfile
Limit processing to miniSEED records that match a selection in the specified file. The selection file contains parameters to match the network, station, location, channel, quality and time range for input records. As a special case, specifying "-" will result in selection lines being read from stdin. For more details see the SELECTION FILE section below.
-ts time
Limit processing to miniSEED records that start after or contain time. The format of the time argument is: 'YYYY[,DDD,HH,MM,SS.FFFFFF]' where valid delimiters are either commas (,), colons (:) or periods (.), except the seconds and fractional seconds must be separated by a period (.).
-te time
Limit processing to miniSEED records that end before or contain time. The format of the time argument is: 'YYYY[,DDD,HH,MM,SS.FFFFFF]' where valid delimiters are either commas (,), colons (:) or periods (.), except the seconds and fractional seconds must be separated by a period (.).
-M match
Limit input to records that match this regular expression. The match is tested against the full source name: 'NET_STA_LOC_CHAN_QUAL'. If the match expression begins with an '@' character it is assumed to indicate a file containing a list of expressions to match; see the MATCH OR REJECT LIST FILE section below.
-R reject
Limit input to records that do not match this regular expression. The reject is tested against the full source name: 'NET_STA_LOC_CHAN_QUAL'. If the reject expression begins with an '@' character it is assumed to indicate a file containing a list of expressions to reject; see the MATCH OR REJECT LIST FILE section below.
-m match
This is effectively the same as -M except that match is evaluated as a globbing expression instead of regular expression. Otherwise undocumented as it is primarily useful at the IRIS DMC.
-o file
Write all output data to output file. If '-' is specified as the output file, all output data will be written to standard output. By default the output file will be overwritten; changing the option to +o file appends to the output file instead.
-A format
All output records will be written to a directory/file layout defined by format. All directories implied in the format string will be created if necessary. The option may be used multiple times to write input records to multiple archives. See the ARCHIVE FORMAT section below for more details including pre-defined archive layouts.
-CHAN directory
-QCHAN directory
-CDAY directory
-SDAY directory
-BUD directory
-SDS directory
-CSS directory
Pre-defined output archive formats. See the ARCHIVE FORMAT section below for more details.
-Ps
Prune, i.e. trim, records at the sample level according to the time range criteria. Record trimming requires a supported data encoding. If the encoding is unsupported (primarily older encodings), the record will be written to the output untrimmed.
-out file
Print a summary of output records to the specified file. Any existing file will be appended to. Specify the file as '-' to print to stdout or '--' to print to stderr. Each line contains network, station, location, channel, quality, start time, end time, byte count and sample count for each output trace segment.
-outprefix prefix
Include the specified prefix string at the beginning of each line of summary output when using the -out option. This is useful to identify the summary output in a stream that is potentially mixed with other output.
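The time string accepted by the -ts and -te options above can be illustrated with a short Python sketch. This is not the program's own parser: the function name parse_time is hypothetical, and the defaults applied to omitted fields (day 1, hour 0, minute 0, second 0) are an assumption.

```python
import re
from datetime import datetime, timedelta

def parse_time(text):
    """Parse 'YYYY[,DDD,HH,MM,SS.FFFFFF]' into a datetime (illustrative only)."""
    # Fields may be separated by commas, colons or periods.
    parts = re.split(r"[,:.]", text)
    if not 1 <= len(parts) <= 6 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized time string: {text!r}")
    # A sixth field is the fractional second and must follow a period.
    if len(parts) == 6 and text[-len(parts[5]) - 1] != ".":
        raise ValueError("fractional seconds must follow a period")
    year = int(parts[0])
    # Assumption: omitted fields default to day 1, 00:00:00.
    defaults = [1, 0, 0, 0]  # day of year, hour, minute, second
    given = [int(p) for p in parts[1:5]]
    doy, hour, minute, second = given + defaults[len(given):]
    micro = int(parts[5].ljust(6, "0")[:6]) if len(parts) == 6 else 0
    return datetime(year, 1, 1) + timedelta(
        days=doy - 1, hours=hour, minutes=minute,
        seconds=second, microseconds=micro)
```

For example, parse_time("2008,100,10,00,00") yields 10:00:00 on day 100 of 2008 (April 9), and parse_time("2008") yields the start of that year.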
A selection file is used to match input data records based on network, station, location and channel information. Optionally a quality and time range may also be specified for more refined selection. The non-time fields may use the '*' wildcard to match multiple characters and the '?' wildcard to match single characters. Character sets may also be used, for example '[ENZ]' will match either E, N or Z. The '#' character indicates the remaining portion of the line will be ignored.
Example selection file entries (the first four fields are required):
#net sta  loc chan    qual start             end
IU   ANMO *   BH?
II   *    *   *       Q
IU   COLA 00  LH[ENZ] R
IU   COLA 00  LHZ     *    2008,100,10,00,00 2008,100,10,30,00
Warning: with a selection file it is possible to specify multiple, arbitrary selections. Some combinations of these selections are not possible. See CAVEATS AND LIMITATIONS for more details.
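To make the wildcard rules concrete, the following Python sketch matches record identifiers against selection lines using the standard fnmatch module, which supports '*', '?' and character sets. It is an approximation only: the function names are hypothetical, the optional time range fields are ignored, and treating an omitted quality field as '*' is an assumption.

```python
from fnmatch import fnmatchcase

def parse_selections(text):
    """Return a list of (net, sta, loc, chan, qual) patterns from selection text."""
    selections = []
    for line in text.splitlines():
        # Everything after '#' is ignored.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if len(fields) < 4:
            raise ValueError(f"need at least 4 fields: {line!r}")
        # Assumption: an omitted quality field matches any quality.
        qual = fields[4] if len(fields) > 4 else "*"
        selections.append((*fields[:4], qual))
    return selections

def matches(selections, net, sta, loc, chan, qual):
    """True if the identifiers match every pattern of at least one selection."""
    record = (net, sta, loc, chan, qual)
    return any(
        all(fnmatchcase(value, pattern) for value, pattern in zip(record, sel))
        for sel in selections)
```

With the example entries above, a record identified as IU ANMO 00 BHZ with quality D would match the first selection, while II BFO 00 BHZ with quality D would match none because the II selection requires quality Q.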
A list file can be used to specify input files, one file per line. The initial '@' character indicating a list file is not considered part of the file name. As an example, if the following command line option was used:
@files.list
The 'files.list' file might look like this:
data/day1.mseed
data/day2.mseed
data/day3.mseed
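The list file convention can be sketched in a few lines of Python. The helper name expand_inputs and its read parameter are hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```python
from pathlib import Path

def expand_inputs(args, read=lambda path: Path(path).read_text()):
    """Replace each '@listfile' argument with the file names it contains."""
    files = []
    for arg in args:
        if arg.startswith("@"):
            # The leading '@' is not part of the list file name.
            for line in read(arg[1:]).splitlines():
                if line.strip():  # assumption: blank lines are skipped
                    files.append(line.strip())
        else:
            files.append(arg)
    return files
```

The order of the arguments is preserved, in keeping with input files being processed in the order specified.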
Each input file may be specified with an associated byte range to read. The program will begin reading at the specified start offset and finish reading when at or beyond the end offset. The range is specified by appending an '@' character to the filename, followed by the start and end offsets separated by a colon:
filename.mseed@[startoffset][:][endoffset]
For example: "filename.mseed@4096:8192". Both the start and end offsets are optional. The colon separator is optional if no end offset is specified.
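As an illustration of the range syntax, this Python sketch splits such an argument into its parts. The function name parse_file_range is hypothetical, and the sketch assumes the last '@' in the argument introduces the range; list file arguments, which begin with '@', are handled separately and are not covered here.

```python
def parse_file_range(spec):
    """Split 'filename@[start][:][end]' into (filename, start, end).

    Offsets that are not given are returned as None.
    """
    name, sep, byte_range = spec.rpartition("@")
    if not sep:
        # No '@' present: the whole file is read.
        return spec, None, None
    start, _, end = byte_range.partition(":")
    return name, int(start) if start else None, int(end) if end else None
```

Here parse_file_range("filename.mseed@4096:8192") gives ("filename.mseed", 4096, 8192), and "filename.mseed@4096" gives a start offset of 4096 with no end offset.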
A list file used with either the -M or -R option contains a list of regular expressions (one on each line) that will be combined into a single compound expression. The initial '@' character indicating a list file is not considered part of the file name. As an example, if the following command line option was used:
-M @match.list
The 'match.list' file might look like this:
IU_ANMO_.*
IU_ADK_00_BHZ.*
II_BFO_00_BHZ_Q
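One plausible way to combine such a list into a single compound expression is to group each line and join the groups with '|'. The Python sketch below does this; the exact form the program uses is not specified here, so the grouping is an assumption, as is the function name compile_list. Python's re syntax also differs in places from other regular expression dialects.

```python
import re

def compile_list(lines):
    """Combine one expression per line into a single alternation."""
    patterns = [line.strip() for line in lines if line.strip()]
    # Each expression is wrapped in a group so that '|' cannot
    # change the meaning of an individual line.
    return re.compile("|".join(f"({p})" for p in patterns))

compound = compile_list(["IU_ANMO_.*", "IU_ADK_00_BHZ.*", "II_BFO_00_BHZ_Q"])
```

A source name is then accepted when compound.search(name) finds a match, for instance for 'IU_ANMO_00_BHZ_D' but not for 'II_BFO_00_BHZ_D'.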
The pre-defined archive layouts are as follows:
-CHAN dir  :: dir/%n.%s.%l.%c
-QCHAN dir :: dir/%n.%s.%l.%c.%q
-CDAY dir  :: dir/%n.%s.%l.%c.%Y:%j:#H:#M:#S
-SDAY dir  :: dir/%n.%s.%Y:%j
-BUD dir   :: dir/%n/%s/%s.%n.%l.%c.%Y.%j
-SDS dir   :: dir/%Y/%n/%s/%c.D/%n.%s.%l.%c.D.%Y.%j
-CSS dir   :: dir/%Y/%j/%s.%c.%Y:%j:#H:#M:#S
An archive format is expanded for each record using the following substitution flags:
n : network code, white space removed
s : station code, white space removed
l : location code, white space removed
c : channel code, white space removed
Y : year, 4 digits
y : year, 2 digits zero padded
j : day of year, 3 digits zero padded
H : hour, 2 digits zero padded
M : minute, 2 digits zero padded
S : second, 2 digits zero padded
F : fractional seconds, 4 digits zero padded
q : single character record quality indicator (D, R, Q)
L : data record length in bytes
r : sample rate (Hz) as a rounded integer
R : sample rate (Hz) as a float with 6 digit precision
% : the percent (%) character
# : the number (#) character
The flags are prefaced with either the % or # modifier. The % modifier indicates a defining flag while the # indicates a non-defining flag. All records with the same set of defining flags will be written to the same file. Non-defining flags will be expanded using the values in the first record for the resulting file name.
Time flags are based on the start time of the given record.
The format string for the predefined BUD layout:
/archive/%n/%s/%s.%n.%l.%c.%Y.%j
would expand to day length files named something like:
/archive/NL/HGN/HGN.NL..BHE.2003.055
As an example of using non-defining flags the format string for the predefined CSS layout:
/data/%Y/%j/%s.%c.%Y:%j:#H:#M:#S
would expand to:
/data/2003/055/HGN.BHE.2003:055:14:17:54
resulting in day length files because the hour, minute and second are specified with the non-defining modifier. The hour, minute and second fields are from the first record in the file.
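The difference between defining and non-defining flags can be demonstrated with a small Python sketch that expands a format string for one record and also returns the values of the defining flags. This is an illustration under stated assumptions, not the program's implementation: the function name expand and the record dictionary keys are hypothetical, and the L, r and R flags are omitted for brevity.

```python
from datetime import datetime

def expand(fmt, rec):
    """Return (path, key): the expanded path and the defining-flag values."""
    t = rec["start"]  # time flags use the record start time
    values = {
        "n": rec["net"].replace(" ", ""), "s": rec["sta"].replace(" ", ""),
        "l": rec["loc"].replace(" ", ""), "c": rec["chan"].replace(" ", ""),
        "Y": f"{t.year:04d}", "y": f"{t.year % 100:02d}",
        "j": f"{t.timetuple().tm_yday:03d}",
        "H": f"{t.hour:02d}", "M": f"{t.minute:02d}", "S": f"{t.second:02d}",
        "F": f"{t.microsecond // 100:04d}",
        "q": rec["qual"], "%": "%", "#": "#",
    }
    path, key = [], []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch in "%#" and i + 1 < len(fmt) and fmt[i + 1] in values:
            flag = fmt[i + 1]
            path.append(values[flag])
            # Only flags introduced by '%' decide which file a record goes to.
            if ch == "%" and flag not in "%#":
                key.append(values[flag])
            i += 2
        else:
            path.append(ch)
            i += 1
    return "".join(path), tuple(key)

first = {"net": "NL", "sta": "HGN", "loc": "", "chan": "BHE", "qual": "D",
         "start": datetime(2003, 2, 24, 14, 17, 54)}
later = dict(first, start=datetime(2003, 2, 24, 18, 0, 0))
css = "/data/%Y/%j/%s.%c.%Y:%j:#H:#M:#S"
```

Both records produce the same key for the CSS layout, so under the rule above they would be written to the same file, named from the first record: /data/2003/055/HGN.BHE.2003:055:14:17:54.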
If the environment variable LIBMSEED_LEAPSECOND_FILE is set it is expected to indicate a file containing a list of leap seconds as published by NIST and IETF, usually available here: https://www.ietf.org/timezones/data/leap-seconds.list
Specifying this file is highly recommended when pruning records at the sample level.
If present, the leap seconds listed in this file will be used to adjust the time coverage for records that contain a leap second. Also, leap second indicators in the miniSEED headers will be ignored.
To suppress the warning printed by the program without specifying a leap second file, set LIBMSEED_LEAPSECOND_FILE=NONE.
Any significant error message will be prefixed with "ERROR", which can be parsed to detect run-time errors. Additionally, the program will return an exit code of 0 on successful operation and 1 when any errors were encountered.
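A calling script might check both signals together, as in the following Python sketch. The helper names are hypothetical, and the sketch assumes that the program is installed on the PATH as datafilter and that error messages are written to stderr.

```python
import subprocess

def error_lines(output):
    """Return the lines that carry the "ERROR" prefix."""
    return [line for line in output.splitlines()
            if line.lstrip().startswith("ERROR")]

def run_datafilter(args):
    """Run datafilter and raise if it reports any error.

    Assumptions: 'datafilter' is on the PATH and errors go to stderr.
    """
    proc = subprocess.run(["datafilter", *args],
                          stderr=subprocess.PIPE, text=True)
    errors = error_lines(proc.stderr)
    if proc.returncode != 0 or errors:
        raise RuntimeError("datafilter failed:\n" + "\n".join(errors))
```

Only stderr is captured here, so miniSEED data written to standard output with "-o -" passes through untouched.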
Chad Trabant IRIS Data Management Center
(man page 2018/5/22)