Your
First Script Component in
SQL Server 2005 Integration Services
A sample from
The Rational Guide to Scripting SQL Server 2005 Integration Services
by Donald Farmer
In Chapter
2, you learned about the three types of Data Flow components
in SSIS: sources,
transformations and destinations. This chapter describes these
different types and show you
how to create your first transformation component using script.
Once you have mastered
the Script Component, you will find that you can readily integrate
almost any functionality
into the SSIS Data Flow. Custom scripted data sources can handle
otherwise unsupported
file formats; custom transformations can call functions in
managed assemblies, including
the .NET Framework; and custom scripted destinations enable
SSIS to output data in very
flexible ways. In fact, it is quite possible to write sophisticated
data flows for ETL and data
integration using only script components!
Script
Component Types
You will typically use script
somewhat differently in each of the three component
types.
Script Source Component
Source components have no input columns,
but do have output columns. The purpose of
the Script Component in this case is to deliver data
to the output columns. For example,
the component author could write a script using file
and string handling functions to parse a complex text file
whose format is not suitable for parsing with the Flat File
Connection
Manager.
Script
Transformation Component
Transformation components have input
columns and output columns. In these components,
the script typically will transform the data in some way between
inputs and outputs. What
these transformations are is up to you—that is why scripts
are so flexible. Chapter 2
discussed two different patterns of transformation: synchronous
and asynchronous . Script
components can handle both patterns quite easily.
A synchronous component, you will remember, is particularly
useful for row-by-row
transformations. For example, I may have incoming data that
includes customer names.
Perhaps for easy cross-referencing with other customer records,
I would like to calculate
a SOUNDEX value for each customer name. SOUNDEX is an algorithm
invented by
the US Census for codifying names to take account of different
spellings. T-SQL has a
SOUNDEX function, but SSIS does not.
An SSIS developer can code a SOUNDEX function in Visual Basic
.NET script. With
this, the script can transform every incoming customer name
and emit a SOUNDEX value
at the output.

An asynchronous
component is useful for performing operations which change
the shape
of the data significantly, or where incoming rows do not have
related rows at the output.
A good use of an asynchronous script would be to aggregate
text. The SSIS Aggregate
component is very powerful, but it can only perform Min and
Max calculations against
numeric columns. If you need to be able to calculate the Min
and Max values of a string
column, this can easily be achieved in Visual Basic .NET script,
as we shall see in Chapter
8.
Script
Destination Component
As mentioned
previously, it is possible to have a text file that
the Flat File Connection
Manager cannot parse, but which a script source component
can handle. In an enterprise
where text files like this are important to legacy applications,
you may also need the
ability to write data to a file in this format for the
legacy application to read. The script
destination component is useful in these circumstances.
You will have guessed by now
that a script destination component has input columns
but no output columns. Instead, the
Visual Basic .NET script handles the data, perhaps using
file and string routines from the
.NET Framework to output text files in the appropriate
format.
Adding a Script Component to Your Package
The
first Script Component we are going to look at it is
a transformation component.
In fact,
we are going to build a simple SOUNDEX component to
transform a column containing a
name to a codified value representing the sound of
the name, just as the US Census would
do. As this first component is a transformation, it
requires some data to work with. We can
quickly build a package containing a Data Flow and
a source component to get started.

Preparing the Package
Use the following steps to prepare the package:
- Create a new SSIS package.
- In the designer, drag a Data Flow Task from the Control
Flow Items tab of
the Toolbox to the Control Flow design surface.
- Double-click
the Data Flow Task to open the Data Flow design surface.
- Drag an OLE DB Source component from the Data Flow Sources
tab of
the Toolbox .
- Double-click the OLE DB Source component shape on the design
surface to
open the OLE DB Source Editor .
- Click the New button to create a new OLE DB Connection
Manager .
- Select an existing connection to your AdventureWorksDW
database, or
create a new one now. This assumes you have installed the
sample databases
with SQL Server.
- Click the OK button to return to the OLE DB Source Editor
.
- Select the Table or View Data Access Mode .
- Select
the [dbo].[DimCustomer] table. ( DimCustomer stands for
Customer
Dimension and is not a reflection on the intelligence of
AdventureWorks
customers! DimEmployee, however, may be a different matter.)
- Click
the Preview button if you would like to see the data in
this table. Note
the Last Name field that we shall be using later.
- Click the OK button to close the OLE DB Source Editor .
Now we have some source data to work with. At this stage,
your package design should
look like the example in Figure 7.1.

Figure 7.1: Package to Which We Will Add Our First Script
Component.
Adding the Script Component
- In the designer, drag a Script Component from the Data
Flow
Transformations tab of the Toolbox to the Data Flow design
surface.
- When
you drag the Script Component, the Select Script Component
Type
Dialog will appear, prompting you to select the kind of
component you
want to create (see Figure 7.2). This is important, as
the configuration of the
Script Component is somewhat different for each type. This
dialog box sets
up the component for you automatically, saving some work
in creating this
configuration yourself.

Figure 7.2: Select Component Type Dialog
Box.
- Select Transformation (the default) and click the OK button.
- Now connect the output of the OLE DB Source to the Script
Component .
At this stage your package should look like Figure 7.3. Note
the warning icon
that appears in the Script Component . You can mouse over
that to see the
text of the warning. In this case, we have not yet added
any script code, so
the component at this point is in an invalid state.
Figure 7.3: Package with OLE DB Source and Script Component
Connected.
Adding Columns to Your Script Component
The next
step is to tell your Script Component which input columns to
work with. The Script
Component requires you explicitly to say in advance which columns
are required. This
is because the component creates a wrapper that exposes these
columns to the scripting
environment. It would be expensive and redundant to expose
all the input columns by
default when you may only be using one of them.
Selecting the input columns is easy. Just follow these steps:
- Double-click the Script Component shape on the design surface
to open the
Script Transformation Editor .
- The default view is of the Input Columns tab. In this case,
check the box
next to the Last Name column. The component editor should
now look like
Figure 7.4.
Figure
7.4: Component Editor with Last Name Column Selected.
Now we can use the Last Name column in our component. Note
that we have left the
Usage Type of the column as Read Only . This is because we
do not want to alter the last
name. Instead, we want to calculate a new column from the values
in this column. But
where will these new values go? We need to create a new column
to hold these new values.
This new column will not appear at the input. It is being calculated
by the script, so it can
only appear at the output.
We should add an Output Column, as follows:
- Select the Inputs and Outputs tab of the Script Transformation
Editor .
- Expand the Output node of the Inputs and Outputs tree view.
- Select the Output Columns folder under the Output node.
- Click the Add Column button. This will add a new Output
Column , named
Column by default. The data type will be a four-byte signed
integer. You can
see this in Figure 7.5.
Figure 7.5: New Output Column.
This default column is not quite what we need. We can, however,
edit its
properties.
- Edit the Name property of the column to be Soundex. The
column name
should change in the tree view, too.
- SOUNDEX codes are strings, so select the Data Type property
of the
column and use the drop-down list to select the type String
[DT_STR] .

At this stage, your component should look Figure 7.6.

Figure 7.6: Renamed Output Column.
We have selected an Input Column ( LastName ) and created
an Output Column ( Soundex )
to hold the transformed value. Now we are ready to write some
code!
Elements
of the Script
Navigate
to the Script tab of the component. The component editor will
now look like
Figure 7.7.

Figure 7.7: Script Tab of the Script Component.
There are some useful properties available in the property
grid, but they are, in fact,
optional. However, it is a good idea to set the PreCompile
property to True .

To get started, just click the Design Script button. The VSA
editor will open, looking like
Figure 7.8.

Figure 7.8: Script Editing Environment for a Component.
In Chapter 3, you learned about the VSA environment and its
various elements. These
should be familiar to you now, but some aspects of the script
for a Script Component
will certainly be new, compared to the Script Task we looked
at earlier. First, look at the
Imports statements for the Script Component in Listing 7.1.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Listing 7.1: Imports Statements for a Script Component.
You can see that rather than importing the Dts.Runtime namespace,
there are two “
wrappers” to be imported. These wrappers enable a script
within a Data Flow Task to “
see out” from the task to objects such as variables.
In Chapter 8, you will learn that this
introduces a slightly different way of accessing variables.
In the Script Task, the Main() subroutine was where most of
the work was done, but here
in the Script Component, we see something a little more involved,
as in Listing 7.2.
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
'
' Add your code here
'
End Sub
End Class
Listing
7.2: ScriptMain Class in the Script Component.
The ScriptMain class performs those duties we mentioned earlier – wrapping
and
exposing the Data Flow columns and objects to the script. Within
ScriptMain you can
see the following subroutine: Input_ProcessInputRow(ByVal Row
As InputBuffer) .What
does this mean? We will cover this in more detail later, but
for now, all we need to know is
that when the input to the Script Component is processed, SSIS
in effect calls this routine
for every row. In practice, this means that we can immediately
start writing useful script
code.
Editing
the Script Component
The example
we are going to implement is a function to convert the customer’s
last name
to a SOUNDEX code. Listing 7.3 shows the function that we will
use for SOUNDEX,
written in Visual Basic .NET code.
Function CalcSoundex(ByVal sName As String) As String
Dim i, acode, dcode, prevCode As Integer
Const codes As String = "01230120022455012623010202"
SName = UCase(SName)
CalcSoundex = Left(SName, 1)
prevCode = Asc(Mid(codes, Asc(sName) - 64))
For i = 2 To Len(SName)
acode = Asc(Mid(SName, i, 1)) - 64
' we are not interested in symbols or numbers
If acode >= 1 And acode <= 26 Then
' convert the character to a digit based on soundex code
dcode = Asc(Mid(codes, acode, 1))
' ignore repeats
If dcode <> 48 And dcode <> prevCode Then
CalcSoundex = CalcSoundex & Chr(dcode)
If Len(CalcSoundex) = 4 Then Exit For
End If
prevCode = dcode
End If
Next
End Function
Listing
7.3: A Simple SOUNDEX Function in VB.NET Script.
We do not really need to understand much about the internals
of this function for this
example. It returns a code based on the consonants in a string
according to the SOUNDEX
algorithm described earlier. For now, write this function into
your script code, immediately
after the Input_ProcessInputRow function, but still within
ScriptMain() .
The next step is to reference the function in our script.
For each row coming in to the
component (in the input buffer) we will set the value of the
Soundex column to the
value returned by CalcSoundex based on the LastName column.
Just below the line Add
your code here , type: Row . That is, Row followed by a period.
As shown in Figure 7.9,
IntelliSense (within the editor) offers a list of options to
complete this object, including the
available columns (in this case LastName and Soundex ).

Figure
7.9: Using IntelliSense in the Scripting Environment.
In this case, select Soundex and complete the rest of the
statement as shown in Listing 7.4,
using IntelliSense if you like to complete the other references
in this statement:
Row.Soundex = CalcSoundex(Row.LastName)
That is all! Your script is ready to run. Close the scripting
environment. SSIS automatically
saves the script for you. Now click the OK button in the Script
Transformation Editor and
return to the Data Flow design surface. The validation icon
has now gone—your script is
valid and you are ready to complete the Data Flow for testing.
Debugging
the Script Component
In
Chapter 5, you learned how to debug the Script Task
by setting breakpoints within the
script itself. The VSA design environment does enable
you to set breakpoints in a Script
Component, but the SSIS designer and debugger ignores
them when executing a package.
There are a number of reasons for this, but it is mainly due
to the nature of the Script
Component , which (unlike the Script Task) does not call a
script only once, but many
times within an execution.
However, SSIS does provide some very elegant techniques for
debugging Data Flows
in general, and we can use these techniques to understand if
our script is performing as
expected. Later we will look at more fine-grained, row-by-row
debugging. For now, we
will see how to debug the script working against the Data Flow
as many rows pass through
it.
Debugging
a Data Flow with a RowCount Component
You might think that you could execute the SSIS package
right now, with just an OLE DB
Source and a Script Component. In fact, this is possible,
but not much will happen. If you
were to try it, you would see no activity. You could
find the explanation on the Progress
tab of the designer, where you would see the following
message:
Warning: Source "OLE DB Source Output" ()
will not be
read because none of its data ever becomes visible
outside the Data Flow Task.
SSIS has recognized that your data is not going anywhere,
so it has optimized the Data
Flow by removing the redundant components! Of course, the data
from your Script
Component (including your Soundex column) is not going anywhere,
so you have no
opportunity to see this data.
In most data-integration applications, you would send the
data to a temporary destination
in order to examine it and see if the process had transformed
the data correctly. SSIS
provides an excellent way to debug your data without having
to create temporary tables
or files.
The Row Count component simply counts the passing rows wherever
you add it to the
Data Flow. This component can be very useful for debugging.
It gives you a count of the
rows at a particular point and SSIS will not optimize it out
of the Data Flow. This enables
you to execute the Data Flow with a Row Count component at
the end of the flow, even
though you have added no destination. In other words, you can
execute your SSIS package
and debug it, even though the data is not going anywhere—it
is just counting rows!
In Chapter 5, you learned how to create SQL Server Integration
Services variables. Create
a variable now, named RowCount of Int32 type, as follows:
- Select SSIS > Variables to show the Variables window.
- Click the New Variable button to create a new variable.
This variable will
be of type Int32 by default.
- Rename it to RowCount (see Figure 7.10). You can now close
the Variables
window if you like.
Figure 7.10: Data Flow Designer and Variables Window with RowCount
Variable Added.
- Drag a Row Count Component from the Data Flow Transformations
tab
of the Toolbox to the Data Flow design surface.
- Now connect the output of the Script Component to the Row
Count
Component .
- Edit the Row Count component by double-clicking the component
shape in
the designer.
- In the Advanced Editor for Row Count , set the VariableName
property to
the name of the variable that you created: RowCount .
If you were to execute the package now, the number of rows
passing through the Row
Count component would be written to the named variable.

Now that you can execute the package and the Data Flow within
it, you can start to debug
the output of the script. You shall see whether the script
has calculated the SOUNDEX
value correctly. The best way to see the output from the Script
Component is to add a data
viewer to the Data Flow on the path between the output of the
Script Component and the
input of the Row Count Component.
- Right-click the line between path between the output of
the Script
Component and the input of the Row Count Component .
- Select Data Viewers .
- The Data Flow Path Editor will appear, with the Data Viewers
tab visible
by default (see Figure 7.11).
Figure 7.11: Data Flow Path Editor
- Click the Add … button in the Data Flow Path Editor
to show the
Configure Data Viewer dialog box (see Figure 7.12).
Figure 7.12: Configure Data Viewer Dialog
Box.
- Select the Grid type of Data Viewer, then click the Grid
tab.
- The Grid tab shows all the columns currently available
selected on the right
hand side. In fact, we only need to see the LastName and
Soundex columns
in order to debug our script. Use the arrow buttons to move
all the other rows
to the Unused Columns list on the left-hand side. In fact,
it may be easier to
move all the columns over to the left-hand side and reselect
LastName and
Soundex (see Figure 7.13).
Figure 7.13: Grid Data Viewer Configured to
Show Only Required Columns.
- Click the OK button to close the Configure Data Viewer
dialog, and click
OK again to close the Data Flow Path Editor .
Now the Data Flow is ready for debugging. The Data Flow designer
shows the Data
Viewer on the path between the Script Component and the Row
Count Component with a “
spectacles” icon (see Figure 7.14).

Figure 7.14: Completed Data Flow.
Running
a Data Flow with a Script Component
To
execute the package, and thus the Data Flow, you can
use the F5 key, or select
Debug? Start Debugging from the main menu, or right-click
the package itself in the
Solution Explorer and select Execute Package . When
the package executes, you will
notice that the Data Viewer window appears. You can
drag and dock this Data Viewer to
any size or convenient location in your designer or
on your screen.
On execution, the Data Flow task reads a buffer of data from
the OLE DB Source
component. The Script Component acts on this buffer, transforming
each row using our
function. The Row Count component should next count these rows,
but before it can do so,
the Data Viewer will pause execution of the Data Flow and the
package.
At this point, all the components in the flow will be colored
yellow, to indicate that they
are in progress and have not yet completed. When the Data Viewer
has received a buffer of
data, it displays the columns you selected in the viewer. As
shown in Figure 7.15, the grid
now shows the results of our SOUNDEX calculation for the first
buffer of data—4872
rows.

Figure 7.15: Data Flow Executed.
You can examine all the values to ensure that the function
is working correctly. In this
example they certainly appear to be correct. If necessary,
you can even copy the data to
the clipboard and paste it into other applications, such as
Microsoft Excel, for further
analysis.
To see the next buffer of data, click the green Continue button
in the top left-hand corner
of the Data Viewer. The Data Viewer will show the next buffer
of data and pause again.
Alternatively, if you have finished examining the data, you
could click the Detach button
and the flow will continue without pausing. Or, you could
just close the Data Viewer.

When the Data Flow is complete, all the components in the
Data Flow should be colored
green to show that they have all completed successfully. You
can now stop debugging,
using F5 if you like, just like any other SQL SSIS package.
Summary
As
you can see, script components are versatile. They
are easy to add to your Data Flow,
quite simple to program, and give excellent performance.
Add to that the power of visual
debugging and you can see why they are such an exciting
feature of SSIS. Mastering the
Script Component can be your key to a vast range of
data integration functionality.
Copyright © 2005
by Mann Publishing Group.
|