
Microsoft SQL Server 2008: Checkpoints

As you learn about Integration Services, you’ll be able to tackle larger and more complex ETL projects, with dozens or even hundreds of tasks moving data among many data stores and performing multiple transformations on it along the way. You may also have individual tasks that take hours to run because of the volume of data they have to move and process, or because of slow resources such as network speeds.

A package might fail after many tasks have completed successfully, or after a single hours-long task completes. If you have to re-run the entire package after fixing the problem, you'll have to wait patiently for hours while the earlier tasks duplicate their work before execution even reaches the point where the package failed the first time. That can be a painful experience, and the local database administrator will likely not be pleased that you're taking up so many resources for so long, repeatedly.

To get around these kinds of problems, Integration Services packages are restartable using a feature called checkpoints. When you implement checkpoints, the package creates a checkpoint file that tracks the execution of the package. As each task completes, Integration Services writes state information to the file, including the current values of variables that are in
scope. If the package completes without errors, Integration Services deletes the checkpoint file. If the package fails, the file contains complete information about which tasks completed and which failed, as well as a reference to where the error occurred. After you fix the error, you execute the package again and the package restarts at the point of failure—not at the beginning—with the same state it had at failure. Checkpoints are an incredibly useful feature, especially in long-running packages.

Checkpoints are not enabled on a package by default. You have to set three package-level properties to configure checkpoints:

  • CheckpointFileName: Specifies the name and location of the checkpoint file. You must set this property, but the name can be any valid Windows filename, with any extension.
  • CheckpointUsage: Determines how the package uses the checkpoint file while the package executes. It has three settings:
    • Always: The package will always use the checkpoint file and will fail if the file does not exist.
    • IfExists: The package will use the checkpoint file if it exists to restart the package at the previous point of failure. Otherwise, execution begins at the first Control Flow task. This is the usual setting for using checkpoints.
    • Never: The package will not use the checkpoint file even if it exists. This means that the package will never restart, and will only execute from the beginning.
  • SaveCheckpoints: Specifies whether the package should write checkpoints to the file.

This combination of properties gives you the flexibility to configure checkpoints for a package once, and then turn the feature on and off before execution without losing the checkpoint configuration.
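
As a concrete illustration, here is a minimal sketch that sets these three properties through the Integration Services runtime API (Microsoft.SqlServer.Dts.Runtime). The package path and checkpoint filename are placeholders invented for this sketch, not values from the course:

    using Microsoft.SqlServer.Dts.Runtime;

    class EnableCheckpoints
    {
        static void Main()
        {
            Application app = new Application();
            // Load an existing package from disk (placeholder path).
            Package pkg = app.LoadPackage(@"C:\ETL\LoadWarehouse.dtsx", null);

            // The three package-level properties that configure checkpoints.
            pkg.CheckpointFileName = @"C:\ETL\LoadWarehouse.chk"; // any valid Windows filename
            pkg.CheckpointUsage = DTSCheckpointUsage.IfExists;    // restart if the file exists
            pkg.SaveCheckpoints = true;                           // write state as tasks complete

            // Persist the modified package back to disk.
            app.SaveToXml(@"C:\ETL\LoadWarehouse.dtsx", pkg, null);
        }
    }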

In order for checkpoints to work, a task failure has to cause the package to fail. Otherwise, the package will continue executing beyond the failure, recording more data in the checkpoint file for subsequent tasks. So you must also set the FailPackageOnFailure property to true for each task that you want to serve as a restart point. If it is set to false for a task and the task fails, Integration Services doesn't write any data to the checkpoint file. Because the checkpoint data is incomplete, the next time you execute the package it will start from the beginning.
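
FailPackageOnFailure can be set the same way. Purely for illustration, this sketch applies it to every top-level task in the package; in practice you would set it only on the tasks you want to serve as restart points (the path is again a placeholder):

    using Microsoft.SqlServer.Dts.Runtime;

    class SetRestartPoints
    {
        static void Main()
        {
            Application app = new Application();
            Package pkg = app.LoadPackage(@"C:\ETL\LoadWarehouse.dtsx", null); // placeholder path

            // Each top-level task or container is a DtsContainer, which exposes
            // FailPackageOnFailure. When true, a failure in that task fails the
            // whole package, so the checkpoint file records a valid restart point.
            foreach (Executable exec in pkg.Executables)
            {
                DtsContainer container = exec as DtsContainer;
                if (container != null)
                    container.FailPackageOnFailure = true;
            }

            app.SaveToXml(@"C:\ETL\LoadWarehouse.dtsx", pkg, null);
        }
    }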

TIP: Checkpoints only record data for Control Flow tasks. This includes a Data Flow task, but Integration Services does not save checkpoint data for the individual steps within a Data Flow. A package can therefore restart at a Data Flow task, but not within a Data Flow itself. In other words, you cannot use a checkpoint to restart a package partway through a Data Flow; you can only rerun the entire Data Flow.

At the start of the package, Integration Services checks for the existence of the checkpoint file. If the file exists, Integration Services scans the contents of the checkpoint file to determine the starting point in the package. Integration
Services writes to the checkpoint file while the package executes. The contents of the checkpoint file are stored as XML and include the following information:

  • Package ID: A GUID stamped onto the file at the beginning of the execution phase.
  • Execution Results: A log of each task that executes successfully in order of execution. Based on these results, Integration Services knows where to begin executing the package the next time.
  • Variable Values: Integration Services saves the values of package variables in the checkpoint file. When execution begins again, those values are read from the checkpoint file and set on the package.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2008 Integration Services course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

Transaction Support in Integration Services

A transaction is a core concept of relational database systems. It is one of the major mechanisms through which a database server protects the integrity of data by making sure that the data remains internally consistent. If any part of a transaction fails, the entire set of operations can roll back, so that no changes are persisted to the database. SQL Server has always had rich support for transactions, and Integration Services hooks into that support.

A key concept in relational database transactions is the ACID test. To ensure predictable behavior, all transactions must possess the ACID properties:

  • Atomic: A transaction must work as a unit that is either fully committed or fully abandoned when complete.
  • Consistent: All data must be in a consistent state when the transaction is complete. All data integrity rules must be enforced and all internal storage mechanisms must be correct when the transaction is complete.
  • Isolated: All transactions must be independent of the data operations of other concurrent transactions. A transaction can see other transactions' data only as it was before those operations began, or after they complete.
  • Durable: After the transaction is complete, the effects are permanent, even in the event of system failure.

Integration Services ensures reliable creation, updating, and insertion of rows through the use of ACID transactions. For example, if an error occurs in a package that uses transactions, the transaction rolls back any data that was previously inserted or updated, thereby preserving database integrity. This eliminates orphaned rows and restores updated data to its previous values so that the data remains consistent. There is no partial success or failure when the tasks in a package have transactions enabled: they succeed or fail together.
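
The same all-or-nothing behavior is easy to see outside a package with a plain ADO.NET transaction. This sketch assumes a hypothetical Orders/OrderDetails pair of tables and a placeholder connection string; neither comes from the course:

    using System.Data.SqlClient;

    class AtomicInsert
    {
        static void Main()
        {
            // Placeholder connection string for this sketch.
            string connStr = @"Server=.;Database=Sales;Integrated Security=true;";

            using (SqlConnection conn = new SqlConnection(connStr))
            {
                conn.Open();
                SqlTransaction tran = conn.BeginTransaction();
                try
                {
                    // Hypothetical parent/child inserts: both succeed or neither does.
                    new SqlCommand("INSERT INTO Orders (OrderID) VALUES (1)",
                                   conn, tran).ExecuteNonQuery();
                    new SqlCommand("INSERT INTO OrderDetails (OrderID, Qty) VALUES (1, 5)",
                                   conn, tran).ExecuteNonQuery();
                    tran.Commit();   // both rows persist
                }
                catch
                {
                    tran.Rollback(); // neither row persists, so no orphaned detail rows
                    throw;
                }
            }
        }
    }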

Tasks can enlist in the parent container's transaction or create their own. The properties that are required to enable transactions are as follows (the sketch after this list shows how to set them):

  • TransactionOption: Set this property of a task or container to enable transactions. The options are:
    • Required: The task or container enrolls in the transaction of the parent container if one exists; otherwise it creates a new transaction for its own use.
    • Supported: The task uses a parent’s transaction, if one is available. This is the default setting.
    • NotSupported: The task does not support and will not use a transaction, even if the parent container is using one.
  • IsolationLevel: This property determines how strictly the transaction is isolated from other concurrent transactions, using the same isolation levels available in T-SQL. The options are:
    • Serializable: The most restrictive isolation level of all. It ensures that if a query is reissued inside the same transaction, existing rows won’t look any different and new rows won’t suddenly appear. It employs a range of locks that prevents edits or insertions until the transaction is completed.
    • Read Committed: Ensures that shared locks are issued when data is being read and prevents “dirty reads.” A dirty read reads data that is in the process of being edited, but has not been committed or rolled back. However, other transactions can still change data before the end of your transaction, resulting in nonrepeatable reads or phantom data.
    • Read Uncommitted: The least restrictive isolation level and the opposite of READ COMMITTED; it allows “dirty reads” of the data. It ignores locks that other operations may have issued and does not acquire any locks of its own. The read is “dirty” because underlying data may change within the transaction and the query would not be aware of it.
    • Snapshot: Reads data as it was when the transaction started, ignoring any changes since then. As a result, it doesn’t represent the current state of the data, but it represents a consistent state of the database as of the beginning of the transaction.
    • Repeatable Read: Prevents others from updating data until a transaction is completed, but does not prevent others from inserting new rows. The inserted rows are known as phantom rows, because they are not visible to a transaction that was started prior to their insertion. This is the minimum level of isolation required to prevent lost updates, which occur when two separate transactions select a row and then update it based on the selected data. The second update would be lost since the criteria for update would no longer match.
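
As a sketch of how these two properties look when set through the runtime API: the property names and enumeration values below are the real ones, while the package path and the choice of task are placeholders for illustration:

    using System.Data; // IsolationLevel
    using Microsoft.SqlServer.Dts.Runtime;

    class TransactionSettings
    {
        static void Main()
        {
            Application app = new Application();
            Package pkg = app.LoadPackage(@"C:\ETL\LoadWarehouse.dtsx", null); // placeholder

            // The package starts a transaction that child tasks can enlist in.
            pkg.TransactionOption = DTSTransactionOption.Required;
            pkg.IsolationLevel = IsolationLevel.Serializable;

            // A child task that joins the parent's transaction if one exists
            // (Supported is the default setting).
            DtsContainer firstTask = pkg.Executables[0] as DtsContainer;
            if (firstTask != null)
                firstTask.TransactionOption = DTSTransactionOption.Supported;

            app.SaveToXml(@"C:\ETL\LoadWarehouse.dtsx", pkg, null);
        }
    }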

Integration Services supports two types of transactions. The first is a Distributed Transaction Coordinator (DTC) transaction, which lets you include multiple resources in the transaction. For example, you might have a single transaction that involves data in a SQL Server database, an Oracle database, and an Access database. This type of transaction can span connections, tasks, and packages. The downside is that it requires the DTC service to be running and tends to be very slow.
The other type is a Native transaction, which uses SQL Server's built-in support for transactions within its own databases. A native transaction uses a single connection to a database and T-SQL commands to manage the transaction.
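
The same distinction shows up in .NET code outside of Integration Services: a SqlTransaction on a single connection is a native SQL Server transaction, while a System.Transactions TransactionScope escalates to DTC once a connection to a second server enlists. A sketch of the DTC-style case, with placeholder server names and hypothetical tables:

    using System.Data.SqlClient;
    using System.Transactions;

    class DistributedUpdate
    {
        static void Main()
        {
            // Placeholder connection strings to two different servers.
            string sales = @"Server=SalesSrv;Database=Sales;Integrated Security=true;";
            string audit = @"Server=AuditSrv;Database=Audit;Integrated Security=true;";

            using (TransactionScope scope = new TransactionScope())
            {
                using (SqlConnection c1 = new SqlConnection(sales))
                {
                    c1.Open(); // enlists in the ambient transaction
                    new SqlCommand("UPDATE Accounts SET Balance = Balance - 100 WHERE Id = 1",
                                   c1).ExecuteNonQuery(); // hypothetical table
                }
                using (SqlConnection c2 = new SqlConnection(audit))
                {
                    c2.Open(); // a second server joins, so the transaction escalates to DTC
                    new SqlCommand("INSERT INTO AuditLog (Msg) VALUES ('moved 100')",
                                   c2).ExecuteNonQuery(); // hypothetical table
                }
                scope.Complete(); // commit on both servers; otherwise both roll back
            }
        }
    }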

Integration Services supports a great deal of flexibility with transactions. It supports a variety of scenarios, such as a single transaction within a package, multiple independent transactions in a single package, transactions that span packages, and others. You’ll be hard pressed to find a scenario that you can’t implement with a bit of careful thought using Integration Services transactions.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2008 Integration Services course written by expert Don Kiely.
