create

Overview

The create statement is used to add one more more new columns to an existing DSET.

Syntax

create columnNewColumnName[valueValue]

create columns fromColumnName[usingValueColumnName]

create mergedcolumnNewColumn[separatorsep]from [stringliteral] Column [/regex/] [Column [/regex/]|stringliteral]

Details

Explicit single column creation

Syntax

create columnNewColumnName[valueValue]

Details

This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.

A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional value *Value* portion of the statement is present, in which case all the cells in the new column will be set to Value.

Examples

Create a new empty column called Cost in the default DSET: create column Cost

Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0

Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

Automated single/multiple column creation

Syntax

create columns fromColumnName[usingValueColumnName]

Details

This statement is used to create multiple columns in a single operation. As is the case for create columns above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

The statement create columns from ServiceName using Count will create the result shown below:

The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:

SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99

When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,

If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:

SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,

When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.

This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

Example

The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.

import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions

Merging column values to create a new column

Syntax

create mergedcolumnNewColumn[separatorsep]from [stringliteral] Column [/regex/] [ ... Column [/regex/]|stringliteral]

If preferred, the wordusingmay be used instead of the wordfrom(both work in an identical fashion)

Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:

Parameter

Required

Meaning

NewColumn

Yes

The name of the new column to create

sep

No

If specified after the separator keyword, sep is a string to insert between values extracted from the source columns

Column

Yes (at least one)

The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)

/regex/

No

If specified, the expression enclosed by the / characters is applied to the values in source column specified by the preceding ColumnN argument

string literal

No

If specified, literal will be add to column value. Relative order of Column and literal bits is observed

The separator may be more than one character in length (up to 31 characters may be specified)

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.

If a regex is not specified, then the entire value in the source column will be used.

Options

By default the value extracted from a source column will be blank in the following two cases:

  • There is a blank value in a source column

  • No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:

option merge_blank = some_text_here

This option will use the string some_text_here in place of any blank source column value.

option merge_nomatch = some_text_here

This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.

Examples

Given the following dataset:

name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing

The following examples illustrate some uses of the create mergedcolumn statement:

Example 1

# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

Example 2

If no regular expression is specified then the values in the source column will be used in their entirety:

# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix
create mergedcolumn key separator : from string prefix department user_id
# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928

Example 3

Let us add a new row to the sample dataset which has a non-compliant value for the user_id:

name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing

By default a non-matching value will result in a blank component of the merged result:

# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:

option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513