timecolumns

Overview

The timecolumns statement sets the start time and end time columns in a DSET.

Syntax

timecolumns start_time_col end_time_col

timecolumns clear

Details

The usage data stored in a DSET may or may not be time sensitive. By default it is not, and every record is treated as representing usage for the entire day. In many cases, however, the usage data contains start and end times for each record, which define the exact period within the day for which the record is valid.

If the usage data contains start and end times that are required for functions such as aggregation or reporting, the column(s) containing those times need to be marked so that they can be identified later in the processing pipeline. This marking is done using the timecolumns statement.

The timecolumns statement does not perform any validation of the values in either of the columns it is flagging. This is by design, as it may be that the values in the columns will be updated by subsequent statements.

The values in the columns will be validated by the finish statement.

If the timecolumns statement is executed more than once, then only the columns named by the latest execution of the statement will be flagged. It is not possible to have more than one start time and one end time column.
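As a minimal sketch of this behaviour, assuming a DSET that contains four timestamp columns (the names ALT_START and ALT_END are hypothetical and used only for illustration):

```
# Flag an initial pair of columns
timecolumns START_TIME END_TIME
# A later execution replaces the earlier flags, so only
# ALT_START and ALT_END remain flagged from this point on
timecolumns ALT_START ALT_END
```

After the second statement, START_TIME and END_TIME are ordinary columns again as far as time sensitivity is concerned.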

Both the start_time_col and end_time_col parameters may be fully qualified column names, but they must both belong to the same DSET.
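A brief sketch of the fully qualified form follows. It assumes a DSET named usage.data containing the columns START_TIME and END_TIME, and it assumes that a fully qualified column name is written as source.alias.column:

```
# Both columns are qualified with the same DSET (usage.data)
timecolumns usage.data.START_TIME usage.data.END_TIME
```

Naming columns from two different DSETs in the same statement is not permitted, as noted above.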

It is possible to use the same column as both the start and end times. In such cases the usage record is treated as spanning 1 second of time. To do this, simply reference it twice in the statement:

timecolumns timestamp_col timestamp_col

Clearing the flagged timestamp columns

To clear both the start and end time columns, so that the default DSET goes back to treating each record as spanning the entire day, use the statement timecolumns clear.

Currently, the statement timecolumns clear will only clear the timestamp columns in the default DSET.

This can be useful in the following use case:

  • The DSET is loaded and timestamp columns are created

  • finish is used to create a time-sensitive RDF

  • The timestamp columns are cleared

  • The DSET is renamed using the rename dset statement

  • Further processing is done on the DSET as required

  • finish is used to create a second RDF which is not time-sensitive
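The steps above might be sketched as follows. This is an illustration only: it reuses the file, column names and template from the example on this page, the new DSET name usage.daily is hypothetical, and the exact form of the rename dset statement shown here is an assumption that should be checked against the documentation for that statement.

```
# Load the DSET and create the timestamp columns
import system/extracted/usage_data.csv source usage alias data
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using usageStartTime template ${template}
timestamp END_TIME using usageEndTime template ${template}
timecolumns START_TIME END_TIME
# Create the time-sensitive RDF
finish
# Clear the flagged timestamp columns in the default DSET
timecolumns clear
# Rename the DSET (assumed syntax, hypothetical new name)
rename dset usage.data to usage.daily
# ... further processing as required ...
# Create a second RDF which is not time-sensitive
finish
```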

Example

# Read data from file into a DSET called usage.data
import system/extracted/usage_data.csv source usage alias data
# Create two UNIX-format timestamp columns from the columns
# usageStartTime and usageEndTime, each of which records a time
# in the format '2017-05-17T17:00:00-07:00'
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using usageStartTime template ${template}
timestamp END_TIME using usageEndTime template ${template}
# Flag the two columns we just created as being the start and
# end time columns
timecolumns START_TIME END_TIME
# Create the time-sensitive RDF
finish