Tag Archives: SQL Server 2012

Isolation Levels and Transactions – Microsoft SQL

An isolation level defines how much a transaction is isolated from changes made by other, concurrent transactions. The current isolation level setting determines the side effects that can occur due to uncommitted changes that other transactions make. SQL Server supports all four of the isolation levels defined in the SQL ISO standard as well as two others related to row versioning:

  • The default READ COMMITTED isolation level is in effect with autocommit transactions. This ensures that every transaction meets the requirements of the ACID test without requiring extra code. READ COMMITTED prevents concurrent transactions from reading uncommitted data or interfering with the transaction until it is complete. This level results in locks on resources, which can affect concurrency and the volume of transactions that a server can handle.
  • The READ UNCOMMITTED isolation level lets a transaction read changed data that other concurrent transactions have not yet committed, known as dirty reads. This isolation level does not issue as many locks as READ COMMITTED, so it has a lighter touch on resources and allows higher transaction rates.
  • The REPEATABLE READ isolation level is similar to READ COMMITTED in that the transaction cannot read data that other transactions have changed but not yet committed. But it goes further: this isolation level prevents other transactions from modifying data read by the current transaction until it completes. The benefit is that a transaction at this isolation level can repeatedly read the data and get the same results, even if there are other transactions pending.
  • The SERIALIZABLE isolation level is similar to the REPEATABLE READ isolation level, but takes it up a notch: it also prevents other transactions from inserting new rows into the range of data the current transaction has read. It ensures that if a query is reissued inside the same transaction, existing rows won’t change and new rows won’t suddenly appear. SERIALIZABLE employs range locks that prevent edits, deletions, or insertions until the transaction is complete.
  • The SNAPSHOT isolation level essentially saves a snapshot of data at the start of the transaction to ensure that any data read during the life of the transaction will be the same as at the start of the transaction, no matter what changes have been made to the data in the meantime by other transactions (the current transaction will see its own data changes). This isolation level generally doesn’t require locks on data until the current transaction attempts to update data.
  • The READ COMMITTED SNAPSHOT isolation level is similar to SNAPSHOT except that instead of providing a snapshot of the data when the transaction started, READ COMMITTED SNAPSHOT provides one when the statement started. The behavior of this isolation level is similar to READ COMMITTED except that it doesn’t acquire shared locks and changes aren’t blocked when resources are exclusively locked.

NOTE Isolation levels are described in the “Isolation Levels in the Database Engine” topic in SQL Server Books Online (BOL). As of this writing, Microsoft has not updated this topic for SQL Server 2012, but the SQL Server 2008 version is still valid. This is the case for some of the other BOL references included in this chapter.

You can set the isolation level of the current session using the SET TRANSACTION ISOLATION LEVEL <isolation level name> statement, as well as with a table hint for a specific query using the WITH clause.
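
For example, here is a minimal sketch of both approaches. The table, customer ID, and values are illustrative only, not from the course:

    -- Session-level setting: applies to subsequent transactions in this session.
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    BEGIN TRANSACTION;
        SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = 'ALFKI';
        -- Other transactions cannot modify the rows read above until this commits.
    COMMIT TRANSACTION;

    -- Query-level table hint: overrides the session setting for one table reference.
    SELECT COUNT(*) FROM dbo.Orders WITH (READUNCOMMITTED);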

Isolation levels can affect what data your code reads when there are other concurrent transactions executing. The choice of isolation level you make can significantly impact the amount of activity an instance of SQL Server can support, so if you have performance issues on a heavily used database you might want to tweak the isolation levels used by your code.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

User-Defined Function Overview – Microsoft SQL

User-defined functions (UDFs) were introduced in SQL Server 2000 to provide coding functionality similar to what is available in general-purpose programming languages. UDFs provide a level of flexibility and reusability that would be impossible to duplicate by using only stored procedures, triggers, and views. These functions, like those in other programming languages, provide reusable code modules that you can invoke with parameters, that perform some kind of calculation or action, and that return various kinds of data.

SQL Server supports three kinds of user-defined functions, each with its own specific features and benefits:

  • Scalar functions are what most developers think of as user-defined functions, because they are most like the functions available in other programming languages and the built-in Transact-SQL (T-SQL) functions. A scalar function can take parameters, contain complex processing, and return a single value of almost any SQL data type.
  • Inline table-valued functions are similar to views in that they each consist of a single SELECT statement and each returns a table object. However, inline table-valued functions can take parameters, whereas views cannot. Like views, you can use inline table-valued functions in statements that update, insert, or delete data, and you can join to them. An inline table-valued function is essentially a view with parameters.
  • Multi-statement table-valued functions are similar to stored procedures, but they return an in-memory table object (a set of rows and columns that you can select from) instead of a static result set. This makes them a powerful view/stored procedure hybrid, because you can perform complex processing in them as in a stored procedure and select data from them as in a view. However, the tables that these functions return are not updatable.
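
To make the first two kinds concrete, here is a minimal sketch of a scalar function and an inline table-valued function. The object names (and the dbo.Orders table the second one queries) are illustrative assumptions, not the course's examples:

    -- Scalar UDF: returns a single value.
    CREATE FUNCTION dbo.LineTotal (@price money, @qty int, @discount real)
    RETURNS money
    AS
    BEGIN
        RETURN @price * @qty * (1 - @discount);
    END;
    GO

    -- Inline table-valued UDF: a single SELECT, essentially a view with a parameter.
    CREATE FUNCTION dbo.OrdersForCustomer (@customerID nchar(5))
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE CustomerID = @customerID
    );
    GO

    -- Use them like a built-in function and like a parameterized view:
    SELECT dbo.LineTotal(10.00, 3, 0.1);
    SELECT * FROM dbo.OrdersForCustomer(N'ALFKI');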

TIP: Another way to categorize the types of functions available in SQL Server is as scalar and table-valued, based on the type of data returned. With that categorization, there are then two types of table-valued functions.

UDFs are not allowed to have side effects, which is a bit of a limitation on their use. This means that they cannot make any lasting changes in the database, such as changing data or altering the schema of any object. But some more subtle actions also count as side effects and therefore are not allowed in UDFs. For example, the RAND function, which generates random numbers, retains state information about the last random number it generated; each call changes that state, which SQL Server has to store somewhere.

User-defined functions combine some of the best elements of stored procedures and views. Many tasks that you previously performed by using views and stored procedures you can now handle better with UDFs. You can invoke these functions from views, stored procedures, other user-defined functions, batches, and scripts, as well as from external applications.

NOTE: SQL Server 2000 did not permit nondeterministic functions in the body of a user-defined function. A deterministic function, like LEN, always returns the same result anytime it is called with the same argument values. A nondeterministic function, such as GETDATE(), can return a different value each time it is called. However, in SQL Server 2005 and later, things have loosened up: GETDATE() is now permissible in UDFs.
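
For instance, a sketch like the following (the function name and logic are hypothetical) is legal in SQL Server 2005 and later precisely because GETDATE() is now allowed inside a UDF:

    CREATE FUNCTION dbo.DaysSince (@start datetime)
    RETURNS int
    AS
    BEGIN
        -- GETDATE() is nondeterministic but permitted in UDFs since SQL Server 2005.
        RETURN DATEDIFF(day, @start, GETDATE());
    END;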

Common Language Runtime UDFs

You can create each type of UDF in SQLCLR code as well as in Transact-SQL. Transact-SQL is recommended for functions that rely on set-related data manipulation, and SQLCLR functions are recommended for functions that require intensive computations or string manipulations, neither of which T-SQL is well suited for. In some cases, the .NET Framework contains base classes that supply functionality that would be difficult to duplicate in Transact-SQL, such as string processing that uses the Regex class. However, if your functions involve fairly straightforward data manipulation, Transact-SQL is the best choice. This chapter will focus on T-SQL UDFs.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.


Types of Spatial Data

SQL Server includes two spatial data types—Geometry and Geography—that encapsulate the database’s support for spatial data. Each data type includes pretty much the same point, line, shape, and collection objects, as well as rich sets of methods and properties for working with these shapes. The figure below shows the hierarchy of available objects. You define a field or variable as either the Geometry or Geography data type, then assign to that field or variable data that describes the point, line, shape, or collection you want to store. The yellow/lighter objects shown in the figure are abstract objects, while the blue/darker objects are concrete objects that you can store in the spatial data type. The only difference between the Geometry and Geography object hierarchies is that in Geography there is a FullGlobe object that represents the complete surface of the earth. It appears in the figure in a lighter color with a dashed line because it is not part of the Geometry object hierarchy.
[Figure: Hierarchy of the Geometry and Geography spatial object types]

The shapes defined by the various spatial objects are vector objects, which are collections of points, lines, curves, polygons, and compound curves. The following spatial objects are the ones you’re likely to use most often to store and manipulate data within SQL Server:

  • Point: Identified as an exact location by a pair of X and Y coordinates, and can also include Z (elevation) and M (measure) values. SQL Server doesn’t use the Z and M coordinates, but lets you store the data so that client applications can use them.
  • LineString: A path along a sequence of points. It is a one-dimensional shape: it has length but no area, even if multiple connected LineStrings look like a closed shape. The LineString is simple if it doesn’t cross itself, or a ring if the starting point is the same as the ending point.
  • CircularString: Similar to a LineString, but defines a curve as a portion of a circle instead of a straight line.
  • Polygon: A closed 2-dimensional shape; a ring. A polygon has both a length property and an area. It can have holes in its interior, the shapes of which are defined by other, smaller polygons. The area of a hole is excluded from the area of the outer polygon.

You can also create collections of these basic shapes, including a MultiPoint, MultiLineString, and MultiPolygon, each of which can contain only the specified shapes. If you need a collection of any type of shape, including the multi-shapes, you can use a GeometryCollection. This makes it easy to work with groups of shapes as a single unit.

The Geography and Geometry data types are .NET classes implemented as system SQLCLR objects with several dozen methods. Many of the methods, about two-thirds, have names that start with the prefix ST, such as STArea, STLength, STDistance, and STGeomFromText. These ST methods are implementations of Open Geospatial Consortium standard methods (the ST stands for spatiotemporal). These methods provide the most commonly used, basic features for working with spatial data.

Microsoft also added some custom extension methods to the objects to support features not part of the OGC standards. These method names don’t have the ST prefix, such as Parse, ToString, and Reduce.

The nice thing about the Geography and Geometry data types is that they share a lot of methods that work the same across the two types. The main difference is that with the Geography type you’ll use latitude and longitude coordinates but with the Geometry type you’ll generally use x and y coordinates.
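
As a brief illustration of that difference, here is a sketch with illustrative coordinates (the points and polygon are assumptions, not the course's figures):

    -- Geometry: planar X/Y coordinates; SRID 0 means no particular coordinate system.
    DECLARE @shape geometry = geometry::STGeomFromText(
        'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
    SELECT @shape.STArea() AS Area, @shape.STLength() AS Perimeter;

    -- Geography: latitude/longitude with an SRID, here 4326 (WGS 84).
    DECLARE @seattle  geography = geography::Point(47.6062, -122.3321, 4326);
    DECLARE @portland geography = geography::Point(45.5152, -122.6784, 4326);
    SELECT @seattle.STDistance(@portland) AS DistanceInMeters;
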
This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.


The HierarchyID Data Type

As the name suggests, relational databases are best at storing and retrieving relational data. That’s the kind of data that fits neatly into multiple tables, each consisting of rows and columns of relatively homogeneous data, all related through mechanisms such as foreign keys. Most of the material in this course covers relational data, since SQL Server is a relational database system.

But the reality is that sometimes you have chunks of data that don’t fit neatly into the relational model. One example of commonly used non-relational data is hierarchical data. This is data that defines some kind of hierarchy, such as an organizational chart or part subassemblies to build a complex object like an automobile. If the structure is rigidly set with a limited number of levels, you can do a decent job of representing the hierarchy using a table for each level, or other techniques. But it can be very difficult to represent the hierarchy if the structure needs to be flexible.

You can always store hierarchical data outside of SQL Server, but over the years, people have come up with various ways of adapting hierarchical data to fit the relational model. Some of these include:

  •  Table self-joins: In a table self-join, you use a single table to contain the hierarchical data. The table has a foreign key to itself—thus the name “self-join”—to reference another row in the table. A good example of this is the Northwind Employees table, which has a ReportsTo foreign key field to identify the person to whom an employee reports. You can use that field to construct the Northwind sales organizational chart, as the sketch following this list shows.
  • XML: XML data is inherently hierarchical, consisting of elements that entirely contain one or more child elements in a structured hierarchy. SQL Server 2005 introduced XML as a native data type, and Microsoft integrated it well with relational data. XML is handy because you have a great deal of flexibility in how you define the hierarchy. But you have to learn specialized methods to work with the data, and searching and sorting on the data can be problematic.
  • Roll your own: SQL Server 2005 also introduced the capability to create your own custom data types using the .NET Common Language Runtime (CLR). Many people took advantage of this feature to create their own hierarchical data type, with all the features to support their needs. SQLCLR types support properties and methods, so the type could implement behaviors as well as custom data structures to support hierarchical data. Such custom types were far from trivial to implement, and it took a lot of work to get it right.
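
As a quick sketch of the self-join approach, the following query against the Northwind Employees table mentioned above lists each employee along with his or her manager:

    SELECT e.FirstName + ' ' + e.LastName AS Employee,
           m.FirstName + ' ' + m.LastName AS Manager
    FROM dbo.Employees AS e
    LEFT JOIN dbo.Employees AS m
        ON e.ReportsTo = m.EmployeeID;   -- the foreign key back to the same table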

Because developers and administrators frequently need to store and work with hierarchical data in SQL Server, Microsoft introduced the HierarchyID data type in SQL Server 2008. This is a compact object you can use to manage hierarchical data using materialized paths. It is compact in that it uses very little storage, and it materializes paths and manages the hierarchy for you. It is a System CLR type, which means that Microsoft implements it via .NET code.

Because it is a system type, you don’t have to enable custom SQLCLR code in a database before using the HierarchyID data type, as you must for your own custom SQLCLR code.

You can use the HierarchyID type to represent organization charts, map a file directory system on your hard drive, create an assembly part chart with subassemblies, represent project tasks, and for many other uses. The type has rich support with methods that provide many ways to maintain the hierarchy, such as making arbitrary insertions and deletions, as well as relocating sections of the hierarchy to other places.

There is no magic to the HierarchyID. Although SQL Server gives you everything you need to maintain the integrity of the hierarchy, it doesn’t automatically enforce hierarchical relationships. You have to do that in your code, using the tools that are part of the data type. The HierarchyID type also does not guarantee uniqueness, but you can enforce uniqueness in other ways, such as with a unique constraint on a HierarchyID field in a table.
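
Here is a minimal sketch of the type in action, using a hypothetical dbo.Org table (the names and paths are illustrative):

    CREATE TABLE dbo.Org (
        OrgNode  hierarchyid PRIMARY KEY,
        OrgLevel AS OrgNode.GetLevel(),      -- computed depth in the tree
        Name     nvarchar(50) NOT NULL
    );

    INSERT INTO dbo.Org (OrgNode, Name) VALUES
        (hierarchyid::GetRoot(),      N'CEO'),
        (hierarchyid::Parse('/1/'),   N'VP of Sales'),
        (hierarchyid::Parse('/1/1/'), N'Sales Rep');

    -- Return the VP of Sales and everyone below that node in the hierarchy.
    DECLARE @vp hierarchyid = hierarchyid::Parse('/1/');
    SELECT Name, OrgNode.ToString() AS Path
    FROM dbo.Org
    WHERE OrgNode.IsDescendantOf(@vp) = 1;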

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.


Advanced Query Techniques


The T-SQL INSERT, UPDATE, and DELETE statements provide a lot of power and flexibility for maintaining data in a relational database. But they can also be unwieldy. Say that you have a table with rows that you need to use to update another table. Depending on the data in the source table, you might have to update or delete an existing row in the target or, if no related row exists, insert a new one. That can require several statements to perform the various actions, and it can be a pain to get all the comparisons working just right.

The MERGE statement addresses this issue. It lets you insert, update, and delete table data with just a single statement, based on the similarities and differences in the data. It joins a data source table with a target table or view to perform the various actions. The operation it performs depends on the results of the join:

  • If data exists in both the source and the target, update the changed data in the target.
  • If data in the source doesn’t exist in the target, insert data into the target.
  • If data exists in the target but not in the source, delete the data from the target.

You have full control over how you define whether and how each of these actions will occur for a particular row in the source table. The MERGE statement is transactional, so you don’t need to explicitly define a transaction. If any of the operations fails, the entire statement rolls back.

The MERGE statement uses five clauses:

  • The MERGE clause specifies the target data, such as a table, view, or other query.
  • The USING clause defines the source data.
  • The ON clause specifies how to join the tables, much like a regular JOIN clause.
  • One or more WHEN clauses define the actions to take based on the join matching.
  • The OUTPUT clause returns a row for each action taken.

The WHEN clause is where all the actions occur, and you’re likely to have more than one in a typical MERGE statement. The WHEN clause has three forms:

  •  WHEN MATCHED [AND condition]

Here is where you either update or delete an existing row in the target data. You can have at most two of these clauses, and if you have two, one must have a condition associated with it. You can define either an UPDATE or DELETE statement in this clause. If more than one source row matches the same target row, you get an error, because you can’t update any row in the target more than once. Similarly, you can’t update and delete the same row.

  •  WHEN NOT MATCHED [BY TARGET] [AND condition]

This is where you can use an INSERT statement, when a row in the source doesn’t match any row in the target. You can have only one WHEN clause of this form in any MERGE statement. The BY TARGET keywords are optional, because the target is the default.

  •  WHEN NOT MATCHED BY SOURCE [AND condition]

This form of the WHEN clause kicks in when rows in the target don’t match a row in the source; in this case you can either update or delete the row in the target. You can have at most two of these in a MERGE statement, and if you have two, one must have a condition associated with it.

The optional condition on some of the versions of the WHEN clause can be just about anything you can include in a WHERE clause. You have to include at least one WHEN clause in the MERGE statement, although the order in which you include the different forms doesn’t matter at all. What does matter is that if you include the same form with and without a condition, the clause with the condition must be first.
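
Putting the pieces together, here is a sketch of a MERGE statement that uses all three WHEN forms. The dbo.Inventory and dbo.InventoryStaging tables and their columns are hypothetical:

    MERGE dbo.Inventory AS target
    USING dbo.InventoryStaging AS source
        ON target.ProductID = source.ProductID
    WHEN MATCHED AND target.Quantity <> source.Quantity THEN
        UPDATE SET target.Quantity = source.Quantity
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductID, Quantity) VALUES (source.ProductID, source.Quantity)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE
    OUTPUT $action, inserted.ProductID, deleted.ProductID;  -- one row per action taken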

Any triggers that exist on the affected target table will fire normally, but there is no guarantee of the order in which triggers fire if multiple operations are performed on the target table. So be careful of side effects. All normal permissions apply, so the user or security context must be able to perform the various actions.
This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.


LINQ to SQL Read-Only Queries

If you are looking for maximum performance, you can execute your queries in read-only mode. To do that, you can set the ObjectTrackingEnabled property of the DataContext to false. This turns off change tracking, which has some overhead. Because LINQ to SQL does not track changes in this mode, performance improves.
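
Here is a minimal sketch of the idea. It assumes a typical designer-generated NorthwindDataContext with Customers and Orders tables; the names are assumptions, not the chapter's exact sample:

    using System;
    using System.Linq;

    class ReadOnlyQueryDemo
    {
        static void Main()
        {
            using (var db = new NorthwindDataContext())
            {
                // Turning off change tracking also turns off deferred loading.
                db.ObjectTrackingEnabled = false;

                var customers =
                    from c in db.Customers
                    where c.Country == "USA"
                    select c;

                foreach (var customer in customers)
                {
                    // With deferred loading off, customer.Orders is not populated.
                    Console.WriteLine("{0}, {1}: {2} orders",
                        customer.CompanyName, customer.City, customer.Orders.Count);
                }
            }
        }
    }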

When you select Read Only Queries in the chapter’s sample application, you will execute two queries. The first retrieves a list of customers in the United States.

The second query illustrates an important consequence of setting ObjectTrackingEnabled to false: it sets DeferredLoadingEnabled to false and therefore turns off deferred loading.

The figure below shows the result of running this code.

[Figure: Deferred loading is disabled, so the number of orders does not display.]

The code first executes a query to retrieve information for a customer. It then displays the customer’s name, city, and region. Next, the code displays the number of orders for this customer. In the previous example, LINQ to SQL then generated a SQL statement to retrieve the customer’s orders. However, when you turn off change tracking you turn off deferred loading, so in this case, LINQ to SQL does not send the SQL statement to retrieve the customer’s orders.

The code next defines the following query to retrieve the ID and value for each order. However, customer.Orders is empty, so LINQ to SQL does not generate the SQL statement to retrieve the data.

 

TIP: Use read-only queries with caution. The absence of deferred loading means you will not retrieve all of the data you would otherwise. If your code relies on the missing data, you could receive runtime errors.

This post is an excerpt from the online courseware for our Microsoft LINQ Using Visual C# 2010 course written by expert Ken Getz.

Ken Getz is a Visual Studio expert with over 25 years of experience as a successful developer and consultant. He is a nationally recognized author and speaker, as well as a featured instructor for LearnNowOnline.

LINQ and Relational Data

At its most basic level, LINQ provides the ability to query any data source that supports the IEnumerable or generic IEnumerable(T) interface. The data you want to query in an application can come from a variety of sources. The data may reside in in-memory objects. If so, you can use LINQ to Objects. The data may reside in XML. If so, you can use LINQ to XML.

The primary source of data in most applications is a relational database, such as SQL Server. If you have built database applications with Visual Studio, you are familiar with using ADO.NET and building SQL statements to query and modify data in a relational database.

In this chapter, you will see how to use LINQ to SQL to query and modify data in a SQL Server database. You may find yourself thinking that ADO.NET works great, so why do I need another way to work with relational data? What advantage does LINQ to SQL offer me?

LINQ provides a consistent model for querying all types of data. With LINQ, a query that works with objects looks similar to a query that works with XML. It also looks similar to a query that works with relational data. If you know how to use LINQ to Objects to write a query, you already know most of what you need to write a query by using LINQ to SQL.

LINQ to SQL is part of the ADO.NET family of technologies. It abstracts away the various ADO.NET classes such as SqlConnection, SqlCommand, SqlDataAdapter, SqlDataReader, etc. You get the same functionality by writing less code. You can also easily mix LINQ to SQL with existing ADO.NET code.

LINQ to SQL also bridges the gap between object programming and traditional database programming. As a .NET developer, you build applications based on objects. The .NET Framework consists of classes. To use the functionality that a class provides, you create an instance of the class and then use its properties, methods, and events.

To query a database, you build a SQL statement as a string. You can then use ADO.NET classes to send the query to the database. You can also store the results by using the ADO.NET DataSet and DataTable classes. Is this object programming? Yes and no. The use of ADO.NET is object programming; however, the data model is based on tables and rows, not on objects.

To model the data as objects, you can create classes for each table in the database. For example, you could create a Customer class with properties for company name, address, city, region, and so on. When you query the Customer table, you store the results in one or more instances of the Customer class.

LINQ to SQL provides a runtime infrastructure for managing relational data as objects. To use LINQ to SQL, you map the structure of a relational database to an object model. Each table in the database maps to a class. This class is an entity class. At runtime, the .NET Framework translates LINQ queries into SQL and sends them to the database. When the database returns results, the runtime populates the objects with the returned data.
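
A minimal sketch of such a mapping might look like the following. The Customer entity and its columns are hypothetical, not the course's generated code:

    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    [Table(Name = "Customers")]
    public class Customer
    {
        [Column(IsPrimaryKey = true)]
        public string CustomerID { get; set; }

        [Column]
        public string CompanyName { get; set; }

        [Column]
        public string City { get; set; }
    }

    public static class CustomerQueries
    {
        public static void Run(string connectionString)
        {
            // The DataContext translates the LINQ query into a SQL SELECT at runtime.
            using (var db = new DataContext(connectionString))
            {
                var londonCustomers =
                    from c in db.GetTable<Customer>()
                    where c.City == "London"
                    select c;
            }
        }
    }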

Once you create your object model, you can use LINQ to SQL to query and change data in a SQL Server database.

Note: Microsoft built LINQ to SQL to support any relational database. However, the implementation of LINQ to SQL that ships with the .NET Framework 3.5 and 4.0 supports only SQL Server. It is possible that Microsoft will support other databases in future releases.

 

This post is an excerpt from the online courseware for our Windows 8 Using XAML: Views, Resources, and Toasts course written by expert Ken Getz.


Roots of SQL Server Data Tools

Microsoft has long supported application database development with features built into Visual Studio, ranging from support for data access to the ability to create a connection to a database and manipulate database objects and data in simple ways. Various add-ons over the years have expanded those basic abilities, but even so, developers were constantly having to bounce between Visual Studio and either Enterprise Manager (in earlier versions of SQL Server) or Management Studio (in more recent versions, including SQL Server 2012) to perform data-related tasks, such as to create database tables, set security features, or execute stored procedures. Neither Visual Studio nor the SQL Server tools fully supported database development.

Then several years ago, Microsoft released its first version of the Team System Edition of Visual Studio 2008. This was a huge package of group software collaboration tools, including a long overdue successor to its Visual SourceSafe source code control product. Over time, Team System spawned Architecture, Testing, and Development editions. But for a long time, one piece was missing: a database edition.

Microsoft soon rectified that and released the product code-named Data Dude. Officially, it is Visual Studio 2008 Team System Database Edition. Definitely a case where the code name was way cooler than the official name!

Data Dude, er, Database Edition, was a set of tools for managing the full database development lifecycle. Software developers were intimately familiar with the process, since it mirrors the code, build, deploy, rinse, repeat cycle common in software development. (And it can handle agile development methodologies as well.) Database administrators had to get used to some things, but could most certainly be part of the process.

Microsoft had several important high-level goals for the product:

  • Provide integrated change management, just like source control systems. This lets you go back in time to earlier versions of the database project and also provides controlled access to the project components to various developers for collaborative, simultaneous development.
  •  Manage database projects by allowing you to monitor progress and perform code reviews.
  •  Find and understand differences between versions. This way you can easily find the differences that cause problems in order to resolve them quickly.
  •  Make changes and see their effectiveness, without necessarily needing to build an entire project.
  • Support isolated development environments, so that each developer can work on their part of the project and test the changes before checking them in and distributing them to the rest of the team.
  • Test solutions to validate the design and code. You can use Team System’s support for unit tests to make sure that the design, code, and data don’t introduce any breaking changes.
  • Simplify deployment, of both the entire database and changes. This supports a variety of deployment scenarios during development and after the database goes into production, including finding the changes made to a test version of the database relative to a production version of the database.
  • Facilitate collaborative development of databases, so that team members can work together productively.

Database Edition was a huge step forward to make the process of creating database projects similar to creating application projects, particularly with its collaboration tools, support for source control, and test integration. But even with this step, things just weren’t as deeply integrated as they could be, in three primary ways:

  • Database Edition wasn’t fully integrated with the database server. It enabled developers to perform more database-specific actions than earlier tools, but you still had to pop over to Management Studio too often.
  • Database developers had one tool to create database objects, and another tool—the Business Intelligence Development Studio (BIDS)—for BI projects, adding a significant third tool to the mix in addition to Visual Studio and Management Studio.
  • Database Edition made many simple things very easy, but once you went past the simple stuff, things got very hard. This is a rather subjective assessment, we realize, but there just seemed to be too many workarounds for what should have been easy.

The Database Edition was a great set of tools, particularly relative to what came before it. But with SQL Server 2012, Microsoft introduced what has so far proven to be the best set of tools yet, truly integrating database development tools into Visual Studio 2010 and 2012.

This post is an excerpt from the online courseware for our SQL Server 2012: Advanced SQL CLR course written by expert Don Kiely.


SQL 2012 Developer: Aggregates

Aggregates

One of the more exciting types of SQLCLR code modules is custom aggregates. Earlier versions of SQL Server didn’t offer an easy way to extend the rather short list of T-SQL aggregates, such as SUM, AVG, MAX, MIN, and COUNT. You still can’t use T-SQL to create custom aggregates, but you can write your own in SQLCLR. It’s not something you’ll need to do often, but it can greatly simplify application coding in situations where you need it. 

Unlike other SQLCLR code modules, you can define only a single custom aggregate within a class. The class must be public and requires four non-static methods, which SQL Server calls as it processes the data to aggregate:

  • Init initializes storage values. SQL Server calls this once at the beginning of the query that uses the aggregate.
  • Accumulate does the work of combining values. SQL Server calls it once for each row that is aggregated, and it must support multiple instances of the aggregation code that is executing so that SQL Server can utilize multiple threads to make operations more efficient.
  • Merge is called at the end of the operation to merge the aggregations of multiple instances into one final value.
  • Terminate is the last method called and returns the final aggregated value.

You also have to deal with serialization issues because instances of the aggregation class have to be moved between threads. This complicates the code only a little, since the .NET Framework has rich support for serialization. Most commonly, you’ll implement the IBinarySerialize interface to let .NET do all the serialization work, but there are other options if you have a special need.

Consider a sample aggregation. A common requirement in applications is to create a concatenation of strings within a table, delimited by a single character such as a semicolon. Some applications require information in this format for processing. You’ll create a Concatenate aggregation that you can use to concatenate all string values in a field. For example, once you create the aggregation, you could run the following T-SQL code to produce a single string with all product names from the Northwind Products table that have a category ID of 4, which is Dairy Products.
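
The exact listing isn't reproduced in this excerpt, but the query the text describes would look essentially like this sketch (Northwind table and column names):

    SELECT dbo.Concatenate(ProductName) AS DairyProducts
    FROM dbo.Products
    WHERE CategoryID = 4;   -- 4 = Dairy Products in Northwind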

This code produces the following string (with line breaks added to fit on the printed page):

The first part of the SQLCLR code defines the aggregation class and decorates it with Serializable and SqlUserDefinedAggregate attributes. The SqlUserDefinedAggregate attribute needs the following parameters, most of which the query processor uses to perform the aggregation properly.

  • Format specifies the serialization format, either Format.Native or Format.UserDefined. You’ll nearly always need to use UserDefined, although SQL Server has enhanced support for the Native format.
  • IsInvariantToDuplicates indicates whether the aggregation result is the same if any of the items are duplicates. The Concatenate aggregation returns a different result if duplicates exist (they’ll appear multiple times in the resulting string), so set this value to false.
  • IsInvariantToNulls indicates whether the aggregation result is the same if any of the items are nulls. Concatenate ignores null values, so set this value to true because the same string will result whether there are nulls or not.
  • IsNullIfEmpty indicates whether the aggregation result is null if no items were aggregated, such as when the WHERE clause returns no records to aggregate. If no strings exist, Concatenate returns a null, so set this to true.
  • MaxByteSize is required by the UserDefined serialization format and indicates the largest aggregation value that can be returned. This is set to 8000 bytes in Concatenate.

Concatenate is implemented in the following code as a structure that implements the IBinarySerialize interface so that SQL Server can properly use multiple instances of the aggregate, using the parameters listed above.
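
The course's full listing isn't reproduced in this excerpt; the following C# sketch is consistent with the description in the next several paragraphs (member names follow the text where it names them):

    using System;
    using System.Data.SqlTypes;
    using System.IO;
    using System.Text;
    using Microsoft.SqlServer.Server;

    [Serializable]
    [SqlUserDefinedAggregate(Format.UserDefined,
        IsInvariantToDuplicates = false,   // duplicates change the result
        IsInvariantToNulls = true,         // nulls are ignored
        IsNullIfEmpty = true,              // no rows means a NULL result
        MaxByteSize = 8000)]
    public struct Concatenate : IBinarySerialize
    {
        private StringBuilder sb;

        // Called once before SQL Server starts aggregating values.
        public void Init()
        {
            sb = new StringBuilder();
        }

        // Called once per row; ignores NULLs, otherwise appends the value and a delimiter.
        public void Accumulate(SqlString value)
        {
            if (value.IsNull) return;
            sb.Append(value.Value).Append(';');
        }

        // Called to fold in the partial result of another instance (parallel plans).
        public void Merge(Concatenate other)
        {
            sb.Append(other.sb.ToString());
        }

        // Called last; strips the trailing delimiter and returns the final string.
        public SqlString Terminate()
        {
            if (sb == null || sb.Length == 0) return SqlString.Null;
            string result = sb.ToString(0, sb.Length - 1);
            sb.Length = 0;
            return new SqlString(result);
        }

        // IBinarySerialize members move the intermediate state between instances.
        public void Read(BinaryReader r) { sb = new StringBuilder(r.ReadString()); }
        public void Write(BinaryWriter w) { w.Write(sb.ToString()); }
    }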

SQL Server calls the Init() method before aggregation begins. This is the point where you should initialize any variables used in the aggregation. Concatenate uses a StringBuilder object in an sb variable to hold the aggregated values, so that variable is instantiated here.

SQL Server calls the Accumulate method when it has a new value to add to the aggregation. Concatenate works only on string type data, so the method receives a value of type SqlString. If the value is null, it returns immediately without doing any aggregation task. Otherwise, it calls the Append method of the StringBuilder variable to concatenate the string.

The Merge method is used to aggregate multiple instances of the Concatenate object. In this case it simply concatenates the value passed to it by the SQL Server query processor to the current instance’s aggregation value. The method is passed an object of type Concatenate and calls the ToString method of that object’s StringBuilder to get the current aggregation value.

Often you’ll need to do more work here; for example, when the resulting aggregation needs to be in alphabetical order. In that case, you may want to use a sortable .NET list object to store the strings internally while processing the aggregate. But here a simple append is sufficient, since the order of the items in the final string is not important.

The last of the four required aggregation methods is Terminate. SQL Server calls this after it processes the last value that it aggregates. This method needs to return the final aggregated value, so it first checks whether the current string is of zero length. If it is, it returns a null. Otherwise it strips off the final semicolon and returns the value. It also clears the sb object in case this instance of Concatenate is used for another aggregation.

The aggregation object also has two other methods, required because the object implements the IBinarySerialize interface. The Read method is called when the object is deserialized and restores the value of the sb variable. Write puts the current string value of sb into the serialization object w.

This simple but useful example shows what is necessary to create a SQLCLR aggregate. Other custom aggregates may require more complex code to implement all the required features, and you may have to use multiple class-level variables to keep track of multiple values during the course of calculation, but it really isn’t all that hard to do once you understand the basic principles. You’ll need to have a good handle on how .NET does serialization. But these techniques really become valuable when you implement your own user-defined types in SQLCLR, which open up whole worlds of possibilities in SQL Server.

 

This post is an excerpt from the online courseware for our SQL Server 2012: Advanced SQL CLR course written by expert Don Kiely.

Schemas and Naming in SQL Server

Whenever you execute a data selection query in SQL Server, you’ll be accessing one or more database objects, so it is important that you understand how things are named. In SQL Server, database object names use a convention that can contain four parts, any of which can be blank, except the object name:

  • The server_name specifies a linked server name or remote server name. A blank implies the current server.
  • The database_name specifies the database name. A blank implies the current database context.
  • The schema_name specifies the name of the schema that contains the object. A blank implies the default schema for the current user or the dbo schema if no other default schema is assigned to the current user.
  • The object_name specifies the name of the object.

In most situations, it is not necessary to use all four parts. However, the recommendation is to use the schema name with the object name, as shown in the following two examples. The first example, for the Northwind database, will work with or without dbo, because the server uses dbo when no schema is explicitly defined and no default schema is explicitly assigned to the current user. The second query, for the AdventureWorks2012 database, will fail if the schema name Sales is omitted, unless the user has Sales set as her default schema, because the Store table was created in the Sales schema; without the schema name, the query works only for a user whose default schema is Sales.
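
The two queries the text describes look essentially like the following sketch. Sales.Store is the AdventureWorks2012 table the text mentions; the Northwind table name is an illustrative assumption:

    -- Northwind: works with or without the schema name, because dbo is the fallback.
    SELECT * FROM dbo.Customers;

    -- AdventureWorks2012: the schema name is required here unless the user's
    -- default schema happens to be Sales.
    SELECT * FROM Sales.Store;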

It might be tempting to deal with schemas by keeping all database objects assigned to dbo and avoid creating or assigning any other schemas. However, schemas can be a useful way of creating multiple namespaces in a database, just as namespaces make it easier for .NET programmers to keep track of classes. The AdventureWorks2012 database provides a good example of using schemas as namespaces. You can also assign permissions on a schema that grant the permission to all objects within it, which makes schemas a powerful security tool for protecting data access. 

NOTE

In versions before SQL Server 2005, a schema was created automatically for each database user. When a user created an object, the object was automatically created in that user’s schema—unless the user was a database owner, in which case the object was created in the dbo schema. In more recent versions of SQL Server, each user does not automatically have a schema. Schemas are created independently of users; users must explicitly be assigned rights to a schema and can be assigned a default schema. Many users can have rights to use any schema and many users can have the same schema as their default.
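
For example, the following sketch (with hypothetical names) creates a schema independently of any user, gives a user Sales as a default schema, and grants a schema-wide permission:

    CREATE SCHEMA Sales AUTHORIZATION dbo;
    GO
    CREATE USER Maria FOR LOGIN Maria WITH DEFAULT_SCHEMA = Sales;
    GRANT SELECT ON SCHEMA::Sales TO Maria;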

 

This post is an excerpt from the online courseware for our SQL Server 2012: Select, Where and Clauses course written by expert Don Kiely.