Development SSIS String Variables in Derived Columns
I was recently burned by a small quirk in SQL Server Integration Services (SSIS) 2008 the other day. It had to do with how SSIS handles string variables in derived columns and because I didn’t account for this, I was getting a steady stream of errors and failures that I took me forever to track down (well, about 10 minutes, but when you’re on a roll, that feels like forever). In all fairness, it was entirely my fault, but I would hope that SSIS would pop up a warning when you’re doing something this silly.
My ETL scenario
I am working on a data warehousing project, and part of the ETL process ( Extract, Transform, Load ) requires that I iterate over a few hundred directories, each containing a handful of XML files which I need to read in, parse, transform into a useful format, and send over to a SQL database. Recursing through directories to parse through XML files feels really archaic, but due to security permissions and setup of the overall system, this is unfortunately the only way to do it.
Added to the fun is that the directory containing the XML files has a unique identifier and some metadata that I need for my dimensions.
So, while this isn’t an ideal ETL setup, it’s not uncommon and SSIS is more than up to the challenge. Actually… Let me take that back… SSIS is mostly up to the challenge. What Integration Services cannot do is apply XSLTs to XML files at even a half-reasonable speed. It is SO SLOW! But, I digress, I’ll leave this XSLT issue for another post so I can fully discuss those workarounds.
Back to string variables
As I said, I need to parse out the directory and regex my way to a unique identifier and some metadata. I wrote a quick C# script task, because the Expression I would need to write would not be maintainable (there are about 7-8 different pieces of information to parse from the directory name, with different separators).
To make use of these pieces of data, I created a string variable for each of them and used them as needed. My troubles came when I tried to insert some of these variables into a SQL table using a derived column. The image on the right shows a trivial example of this setup (I removed the scripting task though).
When I ran the program, I kept running into a “A truncation error occurred on the specified object of the specified component.” error. After some troubleshooting, I discovered the issue.
When you insert a string variable into a derived column, by default, SSIS retrieves the string, takes its length, and creates a column with a Unicode string data type and sets the length. My problem was that I created several user variables with NO content (because they would be filled by the scripting task).
So, what happened?
When SSIS tried to make a new column, of Unicode string type, it grabbed the length of the variable. Unfortunately for me, the length was 0!
My simple solution
I don’t know if this is a solution, or a workaround, or best practice, or what it’s classified as. It works though and that’s what matters to me.
Note: Anyone who uses Integration Services has probably already found that there are lots of solutions to any problem. I’ll state now that this just one of many ways around this issue.
The way I took care of this issue was to use the Advanced Editor to manually set a maximum length for my derived column. Another possible solution would be to add an explicit cast in the Expression to your desired maximum length, but I found my solution a little cleaner looking, code-wise (although, a bit more hidden).
To do this, simply right-click on the Derived Column task and select ‘Show Advanced Editor’
Then, from the ‘Input and Output Properties’ tab, select your derived column from ‘Derived Column Output’ -> ‘Output Columns’ -> [Your Column Name]
Then, on the properties editor to the right, scroll down to ‘Data Type Properties’ and select ‘Length’.
Enter the maximum length of your string - which you hopefully know. If you don’t know this, I recommend examining your data (or a sample of it) to decide what the max length likely is. For the sake of prototyping, you can set the max to 255 (or any size really), but I highly recommend reducing that before going into production.
And there you go! Problem solved!
I hope this post helps anyone who has run into errors or failures in SQL Server Integration Services’ handling of zero-length string variables. Knowing this beforehand definitely would have saved me some frustration.
If it helped, please let me know. Or, if you have any other workarounds for this same problem, I’d love to hear them!