datafilter - miniSEED data filtering

Name
Synopsis
Description
Options
Selection File
Input List File
Input File Range
Match Or Reject List File
Archive Format
Archive Format Examples
Leap Second List File
Error Handling And Return Codes
Author

Synopsis

datafilter [options] file1 [file2 file3 ...]

Description

datafilter filters miniSEED data. Data can be selected by identifier, by time, or by lists of arbitrary identifiers and times.

By default, records that match all criteria and at least partially match a selected time range are written to the output. Output data optionally may be pruned, i.e. trimmed, at the sample level.

Input files will be read and processed in the order specified.

Files on the command line prefixed with a '@' character are input list files and are expected to contain a simple list of input files, see INPUT LIST FILE for more details.

Each input file may be specified with an explicit byte range to read. The program will begin reading at the specified start offset and stop reading at the specified end offset. See INPUT FILE RANGE for more details.

Options

-V

Print program version and exit.

-h

Print program usage and exit.

-H

Print verbose program usage including details of archive format specification and exit.

-v

Be more verbose. This flag can be used multiple times ("-v -v" or "-vv") for more verbosity.

-s selectfile

Limit processing to miniSEED records that match a selection in the specified file. The selection file contains parameters to match the network, station, location, channel, quality and time range for input records. As a special case, specifying "-" will result in selection lines being read from stdin. For more details see the SELECTION FILE section below.

-ts time

Limit processing to miniSEED records that start after or contain time. The format of the time argument is: 'YYYY[,DDD,HH,MM,SS.FFFFFF]' where valid delimiters are either commas (,), colons (:) or periods (.), except the seconds and fractional seconds must be separated by a period (.).

-te time

Limit processing to miniSEED records that end before or contain time. The format of the time argument is: 'YYYY[,DDD,HH,MM,SS.FFFFFF]' where valid delimiters are either commas (,), colons (:) or periods (.), except the seconds and fractional seconds must be separated by a period (.).
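The time format accepted by -ts and -te, and the way the two limits combine, can be sketched in Python. This is only an illustration, not the program's own parser: the names parse_time and record_selected are hypothetical, and the choice to default omitted fields to the start of the year or day is an assumption.

```python
import re
from datetime import datetime, timedelta

# YYYY[,DDD,HH,MM,SS.FFFFFF]: fields separated by ',', ':' or '.',
# but the fractional seconds must follow a period.
_TIME_RE = re.compile(
    r"^(\d{4})"
    r"(?:[,:.](\d{1,3})"
    r"(?:[,:.](\d{1,2})"
    r"(?:[,:.](\d{1,2})"
    r"(?:[,:.](\d{1,2})"
    r"(?:\.(\d{1,6}))?"
    r")?)?)?)?$"
)

def parse_time(text):
    """Parse a -ts/-te style time string into a datetime (hypothetical helper)."""
    m = _TIME_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    year, doy, hour, minute, second, frac = m.groups()
    # Assumption: omitted fields default to day 1, 00:00:00.
    micro = int(frac.ljust(6, "0")) if frac else 0
    return datetime(int(year), 1, 1) + timedelta(
        days=int(doy or 1) - 1,
        hours=int(hour or 0),
        minutes=int(minute or 0),
        seconds=int(second or 0),
        microseconds=micro,
    )

def record_selected(rec_start, rec_end, ts=None, te=None):
    """True if a record starts after or contains ts, and ends before or contains te."""
    if ts is not None and rec_end < ts:
        return False
    if te is not None and rec_start > te:
        return False
    return True
```

The sketch does not validate field ranges (for example a day of year of 000), which a real parser would need to reject.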

-M match

Limit input to records that match this regular expression. The match is tested against the full source name: 'NET_STA_LOC_CHAN_QUAL'. If the match expression begins with an '@' character it is assumed to indicate a file containing a list of expressions to match, see the MATCH OR REJECT LIST FILE section below.

-R reject

Limit input to records that do not match this regular expression. The reject is tested against the full source name: 'NET_STA_LOC_CHAN_QUAL'. If the reject expression begins with an '@' character it is assumed to indicate a file containing a list of expressions to reject, see the MATCH OR REJECT LIST FILE section below.

-m match

This is effectively the same as -M except that match is evaluated as a globbing expression instead of regular expression. Otherwise undocumented as it is primarily useful at the IRIS DMC.

-o file

Write all output data to output file. If '-' is specified as the output file all output data will be written to standard out. By default the output file will be overwritten. Changing the option to +o file appends to the output file instead.

-A format

All output records will be written to a directory/file layout defined by format. All directories implied in the format string will be created if necessary. The option may be used multiple times to write input records to multiple archives. See the ARCHIVE FORMAT section below for more details including pre-defined archive layouts.

-CHAN directory

-QCHAN directory

-CDAY directory

-SDAY directory

-BUD directory

-SDS directory

-CSS directory

Pre-defined output archive formats, see the Archive Format section below for more details.

-Ps

Prune, i.e. trim, records at the sample level according to the time range criteria. Record trimming requires a supported data encoding. If the encoding is unsupported (primarily older encodings) the record will be written to the output untrimmed.

-out file

Print a summary of output records to the specified file. Any existing file will be appended to. Specify the file as '-' to print to stdout or '--' to print to stderr. Each line contains network, station, location, channel, quality, start time, end time, byte count and sample count for each output trace segment.

-outprefix prefix

Include the specified prefix string at the beginning of each line of summary output when using the -out option. This is useful to identify the summary output in a stream that is potentially mixed with other output.

Selection File

A selection file is used to match input data records based on network, station, location and channel information. Optionally a quality and time range may also be specified for more refined selection. The non-time fields may use the '*' wildcard to match multiple characters and the '?' wildcard to match single characters. Character sets may also be used, for example '[ENZ]' will match either E, N or Z. The '#' character indicates the remaining portion of the line will be ignored.

Example selection file entries (the first four fields are required):

#net sta  loc  chan  qual  start             end
IU   ANMO *    BH?
II   *    *    *     Q
IU   COLA 00   LH[ENZ] R
IU   COLA 00   LHZ   *     2008,100,10,00,00 2008,100,10,30,00

Warning: with a selection file it is possible to specify multiple, arbitrary selections. Some combinations of these selections are not possible. See CAVEATS AND LIMITATIONS for more details.
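The wildcard rules for the non-time fields can be illustrated with Python's fnmatch module, which supports the same '*', '?' and '[...]' patterns. This is a sketch under stated assumptions rather than the program's matching code: parse_selection_line and matches are hypothetical names, an omitted quality field is assumed to match any quality, and the optional time range is parsed but not evaluated.

```python
from fnmatch import fnmatchcase

FIELDS = ("net", "sta", "loc", "chan", "qual", "start", "end")

def parse_selection_line(line):
    """Return a dict of selection fields, or None for blank or comment-only lines."""
    line = line.split("#", 1)[0].strip()  # '#' starts a comment
    if not line:
        return None
    values = line.split()
    if len(values) < 4:
        raise ValueError("net, sta, loc and chan are required")
    sel = dict(zip(FIELDS, values))
    # Assumption: an omitted quality field matches any quality.
    sel.setdefault("qual", "*")
    return sel

def matches(sel, net, sta, loc, chan, qual):
    """Test the non-time fields of one record against one selection."""
    actual = {"net": net, "sta": sta, "loc": loc, "chan": chan, "qual": qual}
    return all(fnmatchcase(actual[k], sel[k]) for k in actual)
```

For example, the entry "IU   COLA 00   LH[ENZ] R" matches channel LHZ with quality R but not channel LH1.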

Input List File

A list file can be used to specify input files, one file per line. The initial '@' character indicating a list file is not considered part of the file name. As an example, if the following command line option was used:

@files.list

The 'files.list' file might look like this:

data/day1.mseed
data/day2.mseed
data/day3.mseed

Input File Range

Each input file may be specified with an associated byte range to read. The program will begin reading at the specified start offset and finish reading when at or beyond the end offset. The range is specified by appending an '@' character to the filename with the start and end offsets separated by a colon:

filename.mseed@[startoffset][:][endoffset]

For example: "filename.mseed@4096:8192". Both the start and end offsets are optional. The colon separator is optional if no end offset is specified.
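As an illustration of the syntax, the following Python sketch splits such an argument into its parts. The function name split_file_range is hypothetical, and the sketch assumes the last '@' in the argument introduces the range, so a path that contains '@' for other reasons would need different handling.

```python
def split_file_range(spec):
    """Split 'name@[start][:][end]' into (name, start, end).

    Offsets that are not given are returned as None.
    """
    name, sep, rng = spec.rpartition("@")
    if not sep:
        # No '@' present: the whole argument is the file name
        return spec, None, None
    start_text, _, end_text = rng.partition(":")
    start = int(start_text) if start_text else None
    end = int(end_text) if end_text else None
    return name, start, end
```

With this sketch, "filename.mseed@4096:8192" yields the offsets 4096 and 8192, "filename.mseed@4096" yields only a start offset, and "filename.mseed@:8192" yields only an end offset.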

Match Or Reject List File

A list file used with either the -M or -R contains a list of regular expressions (one on each line) that will be combined into a single compound expression. The initial '@' character indicating a list file is not considered part of the file name. As an example, if the following command line option was used:

-M @match.list

The 'match.list' file might look like this:

IU_ANMO_.*
IU_ADK_00_BHZ.*
II_BFO_00_BHZ_Q
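One plausible way to picture the compound expression is as an alternation of the listed expressions. The Python sketch below makes that assumption explicit. It is not taken from the program's source: compile_list is a hypothetical name, Python's regular expression dialect differs in places from the one the program uses, and testing with an unanchored search is also an assumption.

```python
import re

def compile_list(lines):
    """Combine one-expression-per-line input into a single compound pattern."""
    patterns = [ln.strip() for ln in lines if ln.strip()]
    # Assumption: the compound expression is an alternation of the entries.
    return re.compile("|".join(f"(?:{p})" for p in patterns))

match_list = compile_list([
    "IU_ANMO_.*",
    "IU_ADK_00_BHZ.*",
    "II_BFO_00_BHZ_Q",
])

def is_match(source_name):
    """Test a full source name such as 'NET_STA_LOC_CHAN_QUAL'."""
    return match_list.search(source_name) is not None
```

Under these assumptions a record named IU_ANMO_00_BHZ_Q would be kept by -M @match.list, while IU_COLA_00_BHZ_Q would not.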

Archive Format

The pre-defined archive layouts are as follows:

-CHAN dir   :: dir/%n.%s.%l.%c
-QCHAN dir  :: dir/%n.%s.%l.%c.%q
-CDAY dir   :: dir/%n.%s.%l.%c.%Y:%j:#H:#M:#S
-SDAY dir   :: dir/%n.%s.%Y:%j
-BUD dir    :: dir/%n/%s/%s.%n.%l.%c.%Y.%j
-SDS dir    :: dir/%Y/%n/%s/%c.D/%n.%s.%l.%c.D.%Y.%j
-CSS dir    :: dir/%Y/%j/%s.%c.%Y:%j:#H:#M:#S

An archive format is expanded for each record using the following substitution flags:

  n : network code, white space removed
  s : station code, white space removed
  l : location code, white space removed
  c : channel code, white space removed
  Y : year, 4 digits
  y : year, 2 digits zero padded
  j : day of year, 3 digits zero padded
  H : hour, 2 digits zero padded
  M : minute, 2 digits zero padded
  S : second, 2 digits zero padded
  F : fractional seconds, 4 digits zero padded
  q : single character record quality indicator (D, R, Q)
  L : data record length in bytes
  r : sample rate (Hz) as a rounded integer
  R : sample rate (Hz) as a float with 6 digit precision
  % : the percent (%) character
  # : the number (#) character

The flags are prefaced with either the % or # modifier. The % modifier indicates a defining flag while the # indicates a non-defining flag. All records with the same set of defining flags will be written to the same file. Non-defining flags will be expanded using the values in the first record for the resulting file name.

Time flags are based on the start time of the given record.

Archive Format Examples

The format string for the predefined BUD layout:

/archive/%n/%s/%s.%n.%l.%c.%Y.%j

would expand to day length files named something like:

/archive/NL/HGN/HGN.NL..BHE.2003.055

As an example of using non-defining flags the format string for the predefined CSS layout:

/data/%Y/%j/%s.%c.%Y:%j:#H:#M:#S

would expand to:

/data/2003/055/HGN.BHE.2003:055:14:17:54

resulting in day length files because the hour, minute and second are specified with the non-defining modifier. The hour, minute and second fields are from the first record in the file.
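The difference between defining and non-defining flags can be sketched in Python. The function below is illustrative only: expand_archive_format and the record dictionary layout are hypothetical, and the L, r and R flags are omitted for brevity. It returns a grouping key built from the defining (%) flags alongside the fully expanded path, so two records from the same day share a key even when their start times differ.

```python
from datetime import datetime

def _squeeze(code):
    """Remove white space from an identifier code."""
    return "".join(code.split())

def expand_archive_format(fmt, rec):
    """Return (group_key, path) for one record (hypothetical helper).

    rec is a dict with 'net', 'sta', 'loc', 'chan', 'qual' strings and a
    'start' datetime. Only a subset of the documented flags is handled.
    """
    t = rec["start"]
    values = {
        "n": _squeeze(rec["net"]), "s": _squeeze(rec["sta"]),
        "l": _squeeze(rec["loc"]), "c": _squeeze(rec["chan"]),
        "Y": f"{t.year:04d}", "y": f"{t.year % 100:02d}",
        "j": f"{t.timetuple().tm_yday:03d}",
        "H": f"{t.hour:02d}", "M": f"{t.minute:02d}", "S": f"{t.second:02d}",
        "F": f"{t.microsecond // 100:04d}",
        "q": rec["qual"], "%": "%", "#": "#",
    }
    key, path = [], []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch in "%#" and i + 1 < len(fmt) and fmt[i + 1] in values:
            value = values[fmt[i + 1]]
            path.append(value)
            # Only defining (%) flags contribute to the grouping key;
            # non-defining (#) flags are left unexpanded in the key.
            key.append(value if ch == "%" else fmt[i:i + 2])
            i += 2
        else:
            key.append(ch)
            path.append(ch)
            i += 1
    return "".join(key), "".join(path)
```

In use, a writer would keep a mapping from group key to output file and take the file name from the path of the first record seen for each key, which matches the behavior described above for the CSS layout.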

Leap Second List File

If the environment variable LIBMSEED_LEAPSECOND_FILE is set it is expected to indicate a file containing a list of leap seconds as published by NIST and IETF, usually available here: https://www.ietf.org/timezones/data/leap-seconds.list

Specifying this file is highly recommended when pruning records at the sample level.

If present, the leap seconds listed in this file will be used to adjust the time coverage for records that contain a leap second. Also, leap second indicators in the miniSEED headers will be ignored.

To suppress the warning printed by the program without specifying a leap second file, set LIBMSEED_LEAPSECOND_FILE=NONE.

Error Handling And Return Codes

Any significant error message will be prefixed with "ERROR", which can be parsed to determine run-time errors. Additionally the program will return an exit code of 0 on successful operation and 1 when any errors were encountered.

Author

Chad Trabant
IRIS Data Management Center

(man page 2018/5/22)
