Update: I received some feedback from Phil Brammer that it might be a good idea to provide some recommendations at the end of the post on what you can do as you work with the SSIS Catalog. I think that was good advice, so I’ve updated the post to reflect my recommendations for you, and a couple of recommendations for Microsoft. Enjoy, and thanks Phil!
Microsoft’s Integration Services Management Offering
Photo by Chris Gladis
With the release of SQL Server 2012, Microsoft has added some nice features. Recently I’ve spent time with one of those, the SQL Server Integration Services (SSIS) Catalog and its supporting database. For those of you unfamiliar with it, the SSIS catalog is the new model for logging in SSIS as well as a new deployment model.
If you are new to the SSIS Catalog and want a good overview of how it works, hit up this post by Jamie Thomson (blog|twitter) over at SQLBlog.com. Jamie walks you through the key concepts and how to access the data stored in the SSIS Catalog database. Today’s post gives you some highlights of the feature, as well as points out some issues we’ve seen working with the catalog.
Now With Logging Baked Inside
In ETL, the more logging you have, the easier it is to troubleshoot issues. Prior to 2012, you could accomplish this in a couple of ways. You could either include logging tasks within your control flow, or configure the event handlers for existing tasks within your packages. In either case, the logging mechanism of choice would write out to custom logging tables in your database1.
With the new catalog, SSIS tracks all events during the execution of a package and writes them out to the catalog database. This means that you don’t have to customize each database and package in order to support logging. Logging is now baked into Integration Services by default.
This idea bears repeating. SSIS logging to the catalog means you don’t have the overhead and complexity of maintaining logging code across your entire enterprise deployment. Integration Services is doing it for you. While you may only have one or two packages, larger organizations often have hundreds of packages to maintain. This feature saves a lot of time.
The SSIS Catalog also provides configuration of the logging levels. You can choose between Verbose, Performance, Basic, and None. Each level provides different levels of detail in your logs. Just be aware that your choice can affect package execution performance. Fortunately, you can configure the level at run time if you choose. This means you can set your packages to Basic as the default and run Performance or Verbose when you need to troubleshoot issues.
Although It May Be Undercooked
Of course the first version of any feature is going to have some issues and our first is performance. Working with the catalog, some members of the SQL Server community noticed slow query performance and deadlocking issues. Ultimately, this led Phil Brammer (blog|twitter) to post recommendations for optimizing the SSIS Catalog by adding neglected indexes. On our team, we implemented Phil’s recommendations to great effect. However, I did extend them to more fully cover some of our queries. As a disclaimer, modifying the catalog database may put you into an unsupported configuration.
Even with the additional indexes, you need to know that querying the data may not be as fast as you’d like. We had to be careful querying from certain tables to avoid excessive query times, or worse blocking executions from completing. One of our key findings was to make sure we weren’t locking on these tables at all in our queries. We did this by adding nolock hints, but it is possible that changing to a Read Committed Snapshot Isolation level could help as well, we just haven’t tested that out yet.
Also note, the SSIS Catalog retains data for a year by default. This setting could cause problems if you do a lot of processing. Not only could the size of your catalog database grow faster than you expect, but your queries will get progressively slower as the data increases. At present, we have queries that return quickly for small sets of packages but never return past a certain threshold. You should work out the best retention policy for your organization and set the configuration value appropriately. You may also want to replicate data out into a separate reporting database for trending analysis.
Finally, the built in reports are very basic in nature. SSIS includes some canned reports showing package executions, runtime settings, and event messages from each package execution. Unfortunately, these reports aren’t really responsive, and the layout often makes it hard to interpret the data. To answer this, Jamie Thomson created a reporting pack that makes the information much easier to consume. It also includes executions over time, which is nice since that is missing from the reports included with the catalog.
Don’t Throw It Out Just Yet!
The SSIS Catalog is a much needed feature for the management of large SSIS deployments. Embedded logging, without having to create your own for each package, is a huge benefit. Unfortunately, there are performance issues with this first version of the feature. To deal with those issues, you can follow these recommendations:
- Be concious of pulling data from the catalog views as efficiently as possible in your queries. This means filtering down your sets as quickly as possible and making sure you are only pulling data you really need.
- Perform any analysis of the data in a separate workspace. You could use temporary tables or a separate database, but you shouldn’t be performing the analysis directly on the catalog views. It will be slow for you and can cause executions to be blocked.
- Index the catalog tables for performance where necessary. Phil’s post on the catalog indexes is excellent and will work for most cases. Just remember that implementing them may put you in an unsupported situation.
- Use query hints or database settings to minimize the locks placed on the catalog views while querying. In our situation we used nolock hints to make sure we were interfering with the catalog as little as possible. You may also be able to use Read Committed Snapshot Isolation, although that would take a lot of thorough testing.
- Finally, you should define a retention policy for the data in the catalog that is tailored to your organizations needs. Most of us probably don’t need to keep 365 days of executions in the database to get the information we need. If you do, consider archiving off that data on a regular basis into a separate database to reduce the load on the catalog database.
These recommendations will help make the current version usable. There are a couple of things that I would like to see Microsoft respond to as possible upgrades in the next version:
- While the execution_path is extremely cool, at times it is unwieldy to work with. The datatype is large as an nvarchar(max) and requires string manipulation to follow the descendant components in the execution tree. I would be interested to see if this information couldn’t be stored using a more suitable datatype such as hierarchyID. That may help walk the execution tree more efficient while also helping when joining to other tables within the catalog database. I am not sure if this is even feasible, but I think it’s worth looking into.
- The idea of using Read Committed Snapshot Isolation appeals to me. It would mean we could query the catalog views without fear of adversely affecting package execution. That being said, it would require some very thorough testing to make sure that the change in isolation levels didn’t affect processing itself. I would like to hear whether the SSIS development team have considered this change and whether it is even a viable option. If they are planning on testing it, or already have, I would also love to see them publish their results.
Ultimately, your organization must determine whether the benefits of embedded logging outweigh the required workarounds for performance. After spending time with the catalog over the last few weeks, we have decided use both custom logging and the SSIS catalog for the time being. We feel the need to do more testing before committing to transition completely over. Hopefully we’ll have an answer before Microsoft releases the next version.
Remember, if you like the posts you find here, you can always subscribe to our RSS or via email subscription.
1. Again, Jamie has a really good post on implementing event handlers for logging, if you’re interested. As a side note, if you’re doing SSIS work and not following Jamie’s blog, you should be.