Search…
create

Overview

The create statement is used to add one more more new columns to an existing DSET.

Syntax

create columnNewColumnName[valueValue]
create columns fromColumnName[usingValueColumnName]
create mergedcolumnNewColumn[separatorsep]from [stringliteral] Column [/regex/] [Column [/regex/]|stringliteral]

Details

Explicit single column creation

Syntax

create columnNewColumnName[valueValue]

Details

This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.
Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.
A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.
The new column will be created with no values in any cells, unless the optional value *Value* portion of the statement is present, in which case all the cells in the new column will be set to Value.

Examples

Create a new empty column called Cost in the default DSET: create column Cost
Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0
Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

Automated single/multiple column creation

Syntax

create columns fromColumnName[usingValueColumnName]

Details

This statement is used to create multiple columns in a single operation. As is the case for create columns above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.
Given this example dataset:
The statement create columns from ServiceName using Count will create the result shown below:
The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:
1
SubscriptionID,ServiceName,Quantity
2
FE67,StorageGB,30
3
1377,Small_VM,2
4
EDED,Medium_VM,8
5
8E1B,Large_VM,1
6
99AA,Small_VM,99
Copied!
When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):
1
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
2
FE67,StorageGB,30,,,,
3
1377,Small_VM,2,,,,
4
EDED,Medium_VM,8,,,,
5
8E1B,Large_VM,1,,,,
6
99AA,Small_VM,99,,,,
Copied!
If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:
1
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
2
FE67,StorageGB,30,30,,,
3
1377,Small_VM,2,,2,,
4
EDED,Medium_VM,8,,,8,
5
8E1B,Large_VM,1,,,,1
6
99AA,Small_VM,99,,99,,
Copied!
When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.
This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

Example

The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.
1
import system/extracted/Services.csv source custom
2
import usage from Azure
3
default dset Azure.usage
4
create columns from custom.Services.ServiceDefinitions
Copied!

Merging column values to create a new column

Syntax

create mergedcolumnNewColumn[separatorsep]from [stringliteral] Column [/regex/] [ ... Column [/regex/]|stringliteral]
If preferred, the wordusingmay be used instead of the wordfrom(both work in an identical fashion)

Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:
Parameter
Required
Meaning
NewColumn
Yes
The name of the new column to create
sep
No
If specified after the separator keyword, sep is a string to insert between values extracted from the source columns
Column
Yes (at least one)
The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)
/regex/
No
If specified, the expression enclosed by the / characters is applied to the values in source column specified by the preceding ColumnN argument
string literal
No
If specified, literal will be add to column value. Relative order of Column and literal bits is observed
The separator may be more than one character in length (up to 31 characters may be specified)
If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.
The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.
If a regex is not specified, then the entire value in the source column will be used.

Options

By default the value extracted from a source column will be blank in the following two cases:
    There is a blank value in a source column
    No match for a regular expression is found in the value of a source column
In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.
This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:
option merge_blank = some_text_here
This option will use the string some_text_here in place of any blank source column value.
option merge_nomatch = some_text_here
This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.
Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.

Examples

Given the following dataset:
1
name,user_id,department
2
Eddy,123-456-123456,Development
3
Tim,654-321-654321,Project Management
4
Joram,555-222-999111,Development
5
Joost,826-513-284928,Sales and Marketing
Copied!
The following examples illustrate some uses of the create mergedcolumn statement:
Example 1
1
# Create a new column called 'key' which combines the 'department'
2
# with the middle three digits of the 'user_id', separated by :
3
4
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
5
6
# Result:
7
name,user_id,department,key
8
Eddy,123-456-123456,Development,Development:456
9
Tim,654-321-654321,Project Management,Project Management:321
10
Joram,555-222-999111,Development,Development:222
11
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
Copied!
Example 2
If no regular expression is specified then the values in the source column will be used in their entirety:
1
# Create a new column called 'key' which combines the 'department'
2
# and 'user_id' columns separated by ":", with prefix
3
4
create mergedcolumn key separator : from string prefix department user_id
5
6
# Result:
7
name,user_id,department,key
8
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
9
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
10
Joram,555-222-999111,Development,prefix:Development:555-222-999111
11
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
Copied!
Example 3
Let us add a new row to the sample dataset which has a non-compliant value for the user_id:
1
name,user_id,department
2
Eddy,123-456-123456,Development
3
Tim,654-321-654321,Project Management
4
John,xxx-xxx-xxxxxx,Pending
5
Joram,555-222-999111,Development
6
Joost,826-513-284928,Sales and Marketing
Copied!
By default a non-matching value will result in a blank component of the merged result:
1
# Create a new column called 'key' which combines the 'department'
2
# with the middle three digits of the 'user_id', separated by :
3
4
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
5
6
# Result:
7
name,user_id,department,key
8
Eddy,123-456-123456,Development,Development:456
9
Tim,654-321-654321,Project Management,Project Management:321
10
John,xxx-xxx-xxxxxx,Pending,Pending
11
Joram,555-222-999111,Development,Development:222
12
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
Copied!
In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:
1
option merge_nomatch = [none]
2
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
3
4
# Result:
5
name,user_id,department,key
6
Eddy,123-456-123456,Development,Development:456
7
Tim,654-321-654321,Project Management,Project Management:321
8
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
9
Joram,555-222-999111,Development,Development:222
10
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
Copied!
Last modified 2yr ago