Transform

Last updated 5 months ago

Transcript executes user-definable scripts (termed tasks) in order to produce one or more Reporting Database Files (RDFs) from one or more input Dataset files in CSV format. These RDFs are later used by the reporting engine to generate results.

Overview

Transcript tasks are located in system/config/transcript/ and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported.

Statements

Each statement in a Transcript task must be contained on a single line. Statements consist of a keyword indicating the action to perform, followed by zero or more parameters, separated by white-space, required by the statement. Documentation for all the possible statements can be found in the Transcript language reference guide.

Quotes and escapes

By default a space, tab or newline will mark the end of a word in a Transcript task. To include white-space in a parameter (for example to reference a column name with a space in it) then this can be done by enclosing it in double quotes or escaping it by preceding it with \.

Examples: create columns from "Meter Name" using Quantity create columns from Meter\ Name using Quantity

The following table summarises the behaviour of quotes and escapes:

Characters

Meaning

" ... "

Anything inside the quotes, except for a newline or an escape character is treated as literal text

\"

Whether within quotes or not, this is expanded to a double quote - " - character

\t

When used outside quotes, this is expanded to a TAB character

\

When used outside quotes, a space following the \ is treated as a literal character

\\

Whether within quotes or not, this is expanded to a backslash - \ - character

Comments

Comments in a Transcript task start with a # character that is either of:

  • the first character of a line in the Transcript task

  • the first character in a word

Comments always end at the end of the line they are started on.

# This is a comment
import usage from Azure # The text from '#' onwards is a comment
import usage#1 from Azure # The '#' in 'usage#1' does not start a comment

Variables

Transcript statements may contain variables. Variables have a name and a value. When a variable name is encountered during execution of the task, the name is replaced with the value of the variable with that name.

To separate them from normal statement words, variable names are always preceded with ${ and followed by }. Therefore the variable with the name dataDate is referenced as ${dataDate} in the transcript task. As well as user defined variables (created using the var statement), the following default variables are supported by Exivity:

Variable

Meaning

${dataDate}

The datadate currently in effect, in yyyyMMdd format

${dataDay}

The day value in the dataDatevariable, expressed as a 2 digit number padded with a leading zero if necessary

${dataMonth}

The month value in the dataDate variable, expressed as a 2 digit number padded with a leading zero if necessary

${dataMonthDays}

The number of days in the month in the dataMonth variable

${dataDateStart}

00:00:00 on the day in the dataDate variable, expressed as a UNIX timestamp

${dataDateEnd}

23:59:59 on the day in the dataDate variable, expressed as a UNIX timestamp

${dataYear}

The year value in the dataDate variable, expressed as a 4 digit number

${homeDir}

The base working directory currently in effect

${exportDir}

This is the equivalent of ${baseDir}\exported

Variable names ...

  • may be used multiple times in a single statement

  • are case sensitive - ${dataDate} is different to ${datadate}

  • may not be nested

  • may be embedded within surrounding text - xxx${dataDate}yyy

  • may be used within quotes: import "${baseDir}\to_import\AzureJuly${dataDate}.ccr" source AzureJuly

  • may appear as words of their own in a transcript statement - create column Date value ${dataDate}

Regular Expression variables

A regular expression variable is a special type of variable used to match the name of a column in a DSET. It is enclosed by ${/ and /} and the text within this enclosure can take either of the following two forms:

  1. ${/expression/}

    • The regular expression described by expression will be applied to the default DSET

  2. ${/dset.id/expression/}

    • If the text preceding the central / character is a valid DSET ID then the expression after that / will be applied to the column names in that DSET

    • If the text preceding the / character is not a valid DSET ID then the entire text of the variable between the ${/ and /} enclosure is treated as a regular expression and will be applied to the default DSET

Once the DSET ID and the expression have been established by the above, the expression is tested against each column name in the DSET and the first matching column name is returned. If no match is found, then an error is logged and the transcript task will fail.

The regular expression may contain a subgroup, which is enclosed within parentheses - ( and ). If no subgroup is present, and a match is made, then the entire column name will be returned. If a subgroup is present and a match is made, then only the characters matching the portion of the expression within the parentheses are returned. For example:

# Rename a column with 'Operations' in its name such that its
# new name is whatever came before 'Operations' in the original name
var prefix = ${/(.*)Operations/}
rename column ${/.*Operations/} to ${prefix}

The expression does not have to match the entire column name. Assuming no subgroup is specified, as long as a match is made then the variable will be expanded to the whole column name.

Regular expression variables are powerful tools when combined with the rename statement, as they can be used to transform an uncertain column name into a known one.

Examples:

# Rename a column containing 'Transfer' or 'transfer' in
# its name, such that it is called 'Transfer':
rename column ${/.*[Tt]ransfer/} to Transfer
# As above, but specifically for the 'Azure.usage' DSET
rename column ${/Azure.usage/.*[Tt]ransfer/} to Transfer
# Rename a column with 'Operations' in its name such that its
# new name is whatever came before 'Operations' in the original name
var prefix = ${/(.*)Operations/}
rename column ${/.*Operations/} to ${prefix}

Importing Data

A Transcript task cannot manipulate data on disk directly, so it is necessary to import one or more Datasets in CSV format at runtime in order to process the data within them. When a Dataset is imported the following sequence of actions takes place:

  1. The Dataset (in CSV format) is read from disk

  2. A number of checks are done on the data to ensure it meets the requirements to qualify as a Dataset

  3. The data is converted into an internal format called a DSET

  4. The DSET is assigned two tags (source and alias) which when combined together form a unique ID to identify the DSET (see Core concepts for more information)

  5. An index is constructed, which facilitates high speed manipulation of the data in the DSET

  6. The DSET is added to the list of DSETs available for use by subsequent statements in the Transcript task

Once these actions have been completed, a DSET can be identified through the unique combination of source.alias. This permits Transcript statements to specify which DSET to operate on.

In addition, a default DSET can be specified, which will be used if no alternative DSET is specified. Full details of these mechanism are detailed in the reference guide, specifically in the import and default articles.

Exporting Data

Data can be exported in one of two ways during the execution of a Transcript task:

Export on demand

Many Transcript statements change the data in the DSET in some way. Columns may be created, renamed or deleted and rows may be added and removed for example.

At any point in the Transcript process the current state of a DSET can be rendered to disk as an output CSV file. This is accomplished via use of the export statement. This permits snapshots of a DSET to be created for debugging or audit purposes, as well as the creation of partially processed CSV files for import into a later Transcript process.

Finishing

The finish statement creates a Reporting Database File (RDF) containing the data in a DSET. This RDF can then be used by the reporting engine.