replace

Overview

The replace statement is used to search for, remove, and optionally replace with new values, substrings of the values in a column.

Syntax

replacesubstringincolName[withreplacement]

Details

The replace statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:

Parameter

Notes

substring

A string to search the values in colName for

colName

The column to search for the substring

replacement

The string to replace occurrences of substring with

The colName argument may or may not be fully qualified, but must reference an existing column.

If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.

If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.

The replace statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information

Example

Given the following sample data:

PayerAccountName,TaxationAddress,ProductCode,ProductName,ItemDescription
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from EU (Ireland)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AmazonS3,Amazon Simple Storage Service,$0.0245 per GB - first 50 TB / month of storage used
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US West (Northern California)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US East (Northern Virginia)

The following script ...

import "system\extracted\example.csv" source test alias data
replace "John Doe" in PayerAccountName with "Finance Dept"
replace "RH15 0HZ, " in TaxationAddress
replace "United Kingdom" in TaxationAddress with UK
replace "AWS" in ProductCode
where ([ProductCode] == "AmazonS3") {
replace "Amazon" in ProductCode
}
replace "Amazon " in ProductName
replace "AWS " in ProductName
replace "transfer from" in ItemDescription with -

... will produce the following output data.

"PayerAccountName","TaxationAddress","ProductCode","ProductName","ItemDescription"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - EU (Ireland)"
"Finance Dept","123 Big St, Largetown, UK","S3","Simple Storage Service","$0.0245 per GB - first 50 TB / month of storage used"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US West (Northern California)"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US East (Northern Virginia)"