Tag Archives: SQL 2012

SQL 2012: Developer: NULLs and SqlBoolean

When integrating T-SQL with the CLR, remember to declare variables, parameters, and return values using the data types exposed through the System.Data.SqlTypes namespace. Doing so guarantees behavior much closer to that of T-SQL.

As described in the previous section, the outcome of arithmetic, bitwise, and logical comparisons between two variables can be inconsistent when one or both values are NULL. The ANSI_NULLS option in T-SQL shows how different the results can be; and as you saw in the simple Visual Basic .NET example, not using the SqlTypes data types leads to the same confusion.

Fortunately, there is the SqlBoolean data type. Exposed as part of the SqlTypes namespace, the SqlBoolean data type can represent three distinct states—true, false, and unknown. In addition, the comparison of two SqlTypes data types always returns a SqlBoolean, which again ensures consistent behavior.

The SqlBoolean data type exposes three important properties:

  • IsTrue: Returns true when the comparison produces a TRUE value.
  • IsFalse: Returns true when the comparison produces a FALSE value.
  • IsNull: Returns true when the comparison produces an unknown (NULL) result.

Keeping these concepts in mind, look at the Visual Basic .NET code behind the SqlBooleans button on the switchboard form.


The code makes the following comparison:
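The original listing isn't reproduced in this excerpt. A minimal sketch of what the routine likely looks like, assuming the variable names described below (intX, intY, blnResult) and an equality comparison:

    Imports System.Data.SqlTypes

    Module SqlBooleanDemo
        Sub Main()
            Dim intX As SqlInt32 = New SqlInt32(5)
            Dim intY As SqlInt32 = SqlInt32.Null

            ' Comparing two SqlTypes values always yields a SqlBoolean.
            Dim blnResult As SqlBoolean = (intX = intY)

            MsgBox("IsTrue: " & blnResult.IsTrue.ToString() & vbCrLf & _
                   "IsFalse: " & blnResult.IsFalse.ToString() & vbCrLf & _
                   "IsNull: " & blnResult.IsNull.ToString())
        End Sub
    End Module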


The code compares intX—a SqlInt32 assigned the value 5—with intY, another SqlInt32 explicitly assigned a NULL value. The result is a SqlBoolean data type with properties that contain the outcome of the comparison, such as blnResult.IsTrue.
Figure 1 shows the MsgBox that displays the outcome of this routine.


Figure 1. SqlBooleans provide consistency when you work with NULL values.

WARNING! Remember that a SqlBoolean data type represents three states—IsTrue, IsFalse, and IsNull. IsNull returns TRUE whenever the comparison produces an unknown result, which happens when either side of the comparison is NULL.

Frank Tillinghast is a senior consultant with MTOW Software Solutions. He is a Microsoft Certified Solution Developer and has been developing applications for over fifteen years. Most of his time is spent consulting for companies nationwide with troubled projects or mentoring projects to successful completion. When he is not developing software or mentoring clients, Frank is teaching other developers. He has taught thousands of application developers how to create business solutions with Visual Studio .NET: VB.NET, ASP.NET, Visual C#, and SQL Server.

Microsoft SQL 2012 Developer: Creating Partitioned Tables

SQL Server 2012 includes T-SQL extensions that support partitioned tables and indexes. Here are the specific steps you perform to create and maintain these database objects.

To create a partitioned table, complete the following tasks:

  1. Select an appropriate partition key.
  2. Create a partition function.
  3. Create a partition scheme.
  4. Create the table and identify its partition scheme.

Key Terms

  • Partition key: A single column in a table whose values determine the partition on which the data resides.
  • Partition function: A function that specifies how to partition the table by defining partition key boundaries for each subset of data.
  • Partition scheme: A mapping of individual partitions to filegroups.

Creating a Partition Function

With a partition key selected, the next step is to define the partitions. The CREATE PARTITION FUNCTION statement specifies how to divide the table by identifying specific boundaries for each partition. The basic syntax of the CREATE PARTITION FUNCTION statement is:
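The syntax block itself is omitted from this excerpt; its general form, per the documented T-SQL grammar (the names are placeholders), is:

    CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
    AS RANGE [ LEFT | RIGHT ]
    FOR VALUES ( [ boundary_value [ ,...n ] ] );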

Note the following syntax details: the input_parameter_type specifies the data type of the partition key, not the name of the partitioning column. The LEFT and RIGHT keywords specify the side of a partition boundary on which a boundary_value belongs. LEFT is the default if you do not explicitly state a side.

LEFT and RIGHT Boundaries

The best way to understand the difference between LEFT and RIGHT partition boundaries is to look at some examples. Consider the following two partition functions:

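The function definitions are not shown in this excerpt. Based on the boundary values in the table below, they would look something like this (the names pfLeft and pfRight are illustrative):

    CREATE PARTITION FUNCTION pfLeft (int)
    AS RANGE LEFT FOR VALUES (100, 1000, 10000, 100000);

    CREATE PARTITION FUNCTION pfRight (int)
    AS RANGE RIGHT FOR VALUES (100, 1000, 10000, 100000);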

The table below shows the specific partition boundaries for both the LEFT and RIGHT versions of the partition function.

Partition    Column Values (LEFT)       Column Values (RIGHT)
1            <= 100                     < 100
2            > 100 AND <= 1,000         >= 100 AND < 1,000
3            > 1,000 AND <= 10,000      >= 1,000 AND < 10,000
4            > 10,000 AND <= 100,000    >= 10,000 AND < 100,000
5            > 100,000                  >= 100,000

The table above shows partition range values defined according to LEFT and RIGHT boundaries.

A partition function that you define using LEFT boundaries places each boundary value in the partition to the boundary's left. Similarly, RIGHT boundaries place each boundary value in the partition to the boundary's right.

Because all partition key values need to map to a partition, SQL Server actually creates n+1 partitions for every partition function. Notice that both functions specify only four boundary values. However, values less than the minimum and greater than the maximum are part of the entire domain of values and therefore must map to a partition.

TIP: If you need to specify acceptable minimum or maximum values, apply a CHECK constraint on the table to reject inappropriate partition key values. For example, to prevent partition key values greater than 100,000, define the partition function by using RANGE LEFT and placing a CHECK constraint on the table to reject values greater than 100,000.

In addition, a CHECK constraint on the table effectively leaves either the leftmost or the rightmost partition empty—a benefit if you add or remove partitions later.
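A sketch of that approach follows; the table and column names here are illustrative, not from the original course:

    -- Partition function: RANGE LEFT, so partition 5 would hold values > 100,000.
    CREATE PARTITION FUNCTION pfSales (int)
    AS RANGE LEFT FOR VALUES (100, 1000, 10000, 100000);

    -- The CHECK constraint rejects values > 100,000, leaving partition 5 empty.
    ALTER TABLE dbo.Sales
    ADD CONSTRAINT CK_Sales_SaleID CHECK (SaleID <= 100000);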

Although not ideal, SQL Server allows partition key columns to contain NULL values and handles them using these rules:

  • By default, all records that have a partition key value of NULL exist on the leftmost partition.
  • If a partition function contains a NULL boundary value and is defined using RANGE RIGHT, the leftmost partition remains empty and rows that contain NULL values exist on the next partition.

Keep these considerations in mind when you design partition functions.

Frank Tillinghast is a senior consultant with MTOW Software Solutions. He is a Microsoft Certified Solution Developer and has been developing applications for over fifteen years. Most of his time is spent consulting for companies nationwide with troubled projects or mentoring projects to successful completion. When he is not developing software or mentoring clients, Frank is teaching other developers. He has taught thousands of application developers how to create business solutions with Visual Studio .NET: VB.NET, ASP.NET, Visual C#, and SQL Server.

SQL 2012: Aggregates

One of the more exciting types of SQLCLR code modules is custom aggregates. Earlier versions of SQL Server didn’t offer an easy way to extend the rather short list of T-SQL aggregates, such as SUM, AVG, MAX, MIN, and COUNT. You still can’t use T-SQL to create custom aggregates, but you can write your own in SQLCLR. It’s not something you’ll need to do often, but it can greatly simplify application coding in situations where you need it.

Unlike other SQLCLR code modules, you can define only a single custom aggregate within a class. The class must be public and requires four non-static methods, which SQL Server calls as it processes the data to aggregate:

  • Init initializes storage values. SQL Server calls this once at the beginning of the query that uses the aggregate.
  • Accumulate does the work of combining values. SQL Server calls it once for each row that is aggregated, and it must support multiple instances of the aggregation code that is executing so that SQL Server can utilize multiple threads to make operations more efficient.
  • Merge is called at the end of the operation to merge the aggregations of multiple instances into one final value.
  • Terminate is the last method called and returns the final aggregated value.

You also have to deal with serialization issues because instances of the aggregation class have to be moved between threads. This complicates the code only a little, since the .NET Framework has rich support for serialization. Most commonly, you’ll implement the IBinarySerialize interface to let .NET do all the serialization work, but there are other options if you have a special need.

Consider a sample aggregation. A common requirement in applications is to create a concatenation of strings within a table, delimited by a single character such as a semicolon. Some applications require information in this format for processing. You’ll create a Concatenate aggregation that you can use to concatenate all string values in a field. For example, once you create the aggregation, you could run the following T-SQL code to produce a single string with all product names from the Northwind Products table that have a category ID of 4, which is Dairy Products.
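The query isn't reproduced in this excerpt. Assuming the aggregate is registered in the database as dbo.Concatenate, the call would look something like this:

    SELECT dbo.Concatenate(ProductName) AS ProductNames
    FROM Products
    WHERE CategoryID = 4;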

This code produces the following string (with line breaks added to fit on the printed page):

The first part of the SQLCLR code defines the aggregation class and decorates it with Serializable and SqlUserDefinedAggregate attributes. The SqlUserDefinedAggregate attribute needs the following parameters, most of which the query processor uses to perform the aggregation properly.

  • Format specifies the serialization format, either Format.Native or Format.UserDefined. You’ll nearly always need to use UserDefined, although SQL Server has enhanced support for the Native format.
  • IsInvariantToDuplicates indicates whether the aggregation result is the same if any of the items are duplicates. The Concatenate aggregation returns a different result if duplicates exist (they’ll appear multiple times in the resulting string), so set this value to false.
  • IsInvariantToNulls indicates whether the aggregation result is the same if any of the items are nulls. Concatenate ignores null values, so set this value to true because the same string will result whether there are nulls or not.
  • IsNullIfEmpty indicates whether the aggregation result is null if no items were aggregated, such as when the WHERE clause returns no records to aggregate. If no strings exist, Concatenate returns a null, so set this to true.
  • MaxByteSize is required by the UserDefined serialization format and indicates the largest aggregation value that can be returned. This is set to 8000 bytes in Concatenate.

Concatenate is implemented in the following code as a structure that implements IBinarySerialize so that SQL Server can properly use multiple instances of the aggregate, using the listed parameters.
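The listing itself is omitted from this excerpt. Here is a C# sketch that matches the description in the paragraphs that follow; the sb field name comes from the text, and the rest is a reconstruction rather than the course's exact code:

    using System;
    using System.Data.SqlTypes;
    using System.IO;
    using System.Text;
    using Microsoft.SqlServer.Server;

    [Serializable]
    [SqlUserDefinedAggregate(
        Format.UserDefined,
        IsInvariantToDuplicates = false,
        IsInvariantToNulls = true,
        IsNullIfEmpty = true,
        MaxByteSize = 8000)]
    public struct Concatenate : IBinarySerialize
    {
        private StringBuilder sb;

        // Called once before aggregation begins.
        public void Init()
        {
            sb = new StringBuilder();
        }

        // Called once per row; null values are ignored.
        public void Accumulate(SqlString value)
        {
            if (value.IsNull) return;
            sb.Append(value.Value).Append(';');
        }

        // Combines the partial result from another instance.
        public void Merge(Concatenate other)
        {
            sb.Append(other.sb.ToString());
        }

        // Returns the final value, stripping the trailing semicolon.
        public SqlString Terminate()
        {
            if (sb == null || sb.Length == 0)
                return SqlString.Null;
            string result = sb.ToString(0, sb.Length - 1);
            sb.Length = 0;  // clear for possible reuse
            return new SqlString(result);
        }

        // IBinarySerialize: restore state after deserialization.
        public void Read(BinaryReader r)
        {
            sb = new StringBuilder(r.ReadString());
        }

        // IBinarySerialize: persist the current string value.
        public void Write(BinaryWriter w)
        {
            w.Write(sb.ToString());
        }
    }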

SQL Server calls the Init() method before aggregation begins. This is the point where you should initialize any variables used in the aggregation. Concatenate uses a StringBuilder object in an sb variable to hold the aggregated values, so that variable is instantiated here.

SQL Server calls the Accumulate method when it has a new value to add to the aggregation. Concatenate works only on string type data, so the method receives a value of type SqlString. If the value is null, it returns immediately without doing any aggregation task. Otherwise, it calls the Append method of the StringBuilder variable to concatenate the string.

The Merge method is used to aggregate multiple instances of the Concatenate object. In this case it simply concatenates the value passed to it by the SQL Server query processor to the current instance's aggregation value. The method is passed an object of type Concatenate and calls that object's StringBuilder ToString method to get the current aggregation value.

Often you’ll need to do more work here; for example, when the resulting aggregation needs to be in alphabetical order. In that case, you may want to use a sortable .NET list object to store the strings internally while processing the aggregate. But here a simple append is sufficient, since the order of the items in the final string is not important.

The last of the four required aggregation methods is Terminate. SQL Server calls this after it processes the last value that it aggregates. This method needs to return the final aggregated value, so it first checks whether the current string is of zero length. If it is, it returns a null. Otherwise it strips off the final semicolon and returns the value. It also clears the sb object in case this instance of Concatenate is used for another aggregation.

The aggregation object also has two other methods, required because the object implements the IBinarySerialize interface. The Read method is called when the object is deserialized and restores the value of the sb variable. Write puts the current string value of sb into the serialization object w.

This simple but useful example shows what is necessary to create a SQLCLR aggregate. Other custom aggregates may require more complex code to implement all the required features, and you may have to use multiple class-level variables to keep track of multiple values during the course of calculation, but it really isn’t all that hard to do once you understand the basic principles. You’ll need to have a good handle on how .NET does serialization. But these techniques really become valuable when you implement your own user-defined types in SQLCLR, which open up whole worlds of possibilities in SQL Server.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

The HierarchyID Data Type

As the name suggests, relational databases are best at storing and retrieving relational data. That’s the kind of data that fits neatly into multiple tables, each consisting of rows and columns of relatively homogeneous data, all related through mechanisms such as foreign keys. Most of the material in this course covers relational data, since SQL Server is a relational database system.

But the reality is that sometimes you have chunks of data that don’t fit neatly into the relational model. One example of commonly used non-relational data is hierarchical data. This is data that defines some kind of hierarchy, such as an organizational chart or part subassemblies to build a complex object like an automobile. If the structure is rigidly set with a limited number of levels, you can do a decent job of representing the hierarchy using a table for each level, or other techniques. But it can be very difficult to represent the hierarchy if the structure needs to be flexible.

You can always store hierarchical data outside of SQL Server, but over the years, people have come up with various ways of adapting hierarchical data to fit the relational model. Some of these include:

  • Table self-joins: In a table self-join, you use a single table to contain the hierarchical data. The table has a foreign key to itself—thus the name “self-join”—to reference another row in the table. A good example of this is the Northwind Employees table, which has a ReportsTo foreign key field to identify the person to whom an employee reports. You can use that field to construct the Northwind sales organizational chart.
  • XML: XML data is inherently hierarchical, consisting of elements that entirely contain one or more child elements in a structured hierarchy. SQL Server 2005 introduced XML as a native data type, and Microsoft integrated it well with relational data. XML is handy because you have a great deal of flexibility in how you define the hierarchy. But you have to learn specialized methods to work with the data, and searching and sorting on the data can be problematic.
  • Roll your own: SQL Server 2005 also introduced the capability to create your own custom data types using the .NET Common Language Runtime (CLR). Many people took advantage of this feature to create their own hierarchical data type, with all the features to support their needs. SQLCLR types support properties and methods, so the type could implement behaviors as well as custom data structures to support hierarchical data. Such custom types were far from trivial to implement, and it took a lot of work to get it right.

Because developers and administrators frequently need to store and work with hierarchical data in SQL Server, Microsoft introduced the HierarchyID data type in SQL Server 2008. This is a compact object you can use to manage hierarchical data using materialized paths. It is compact in that it uses very little storage, and it materializes paths and manages the hierarchy for you. It is a System CLR type, which means that Microsoft implements it via .NET code.

Because it is a system type, you don’t have to enable custom SQLCLR code in a database before using the HierarchyID data type, as you must for your own custom SQLCLR code.

You can use the HierarchyID type to represent organization charts, map a file directory system on your hard drive, create an assembly part chart with subassemblies, represent project tasks, and for many other uses. The type has rich support with methods that provide many ways to maintain the hierarchy, such as making arbitrary insertions and deletions, as well as relocating sections of the hierarchy to other places.
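The excerpt doesn't include code, but a small sketch shows the flavor of those methods. The table and names are illustrative; GetRoot, GetDescendant, GetLevel, and ToString are documented hierarchyid methods:

    -- An org chart keyed by hierarchyid.
    CREATE TABLE dbo.Org (
        Node hierarchyid PRIMARY KEY,
        NodeLevel AS Node.GetLevel(),
        EmployeeName nvarchar(50) NOT NULL
    );

    DECLARE @root hierarchyid = hierarchyid::GetRoot();
    INSERT dbo.Org (Node, EmployeeName) VALUES (@root, N'CEO');

    -- GetDescendant(NULL, NULL) generates the first child of @root.
    DECLARE @vp hierarchyid = @root.GetDescendant(NULL, NULL);
    INSERT dbo.Org (Node, EmployeeName) VALUES (@vp, N'VP Sales');

    -- Passing an existing child generates a sibling after it.
    INSERT dbo.Org (Node, EmployeeName)
    VALUES (@root.GetDescendant(@vp, NULL), N'VP Engineering');

    SELECT Node.ToString() AS Path, NodeLevel, EmployeeName
    FROM dbo.Org;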

There is no magic to the HierarchyID. Although SQL Server gives you everything you need to maintain the integrity of the hierarchy, it doesn’t automatically enforce hierarchical relationships. You have to do that in your code, using the tools that are part of the data type. The HierarchyID type also does not guarantee uniqueness, but you can enforce uniqueness in other ways, such as with a unique constraint on a HierarchyID field in a table.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

The Stages of Query Compilation

Compiling a query involves three stages: parsing, algebrization, and optimization. The process transforms the logical T-SQL code you write into the physical operators that define the steps SQL Server will perform to execute the query and return results. The figure below shows a very high-level view of the entire process.

The overall query compilation process in SQL Server.

When SQL Server receives a batch of T-SQL statements from a client application (which includes Management Studio), it first checks to see if a plan exists in the plan cache for the query. If the query was recently executed by this or another session, then the query doesn’t need compilation. The plan is passed to the execution engine, and the process is done.

If no match is found in the plan cache, SQL Server parses the query to check for correct T-SQL syntax, including the use of valid SQL identifiers for object names—although not yet verifying that all referenced objects exist—as well as the spelling and use of keywords.

Then, for each statement in the batch, the query processor checks whether there is an existing plan guide and whether it is in the plan cache. A plan guide is an object that you can define to specify a query plan for a query. If there is a match that is in the plan cache, the processor uses the plan guide for that statement and sends it off for execution. If there is no match, the query is parameterized, which essentially creates a version of the query with separate parameters, and the processor once again checks to see if the parameterized query is in the plan cache. If there is a version in the plan cache, it is sent to the execution engine.

Otherwise, if there is no matching plan guide or parameterized query in the plan cache, the query processor has to go through the process of creating an execution plan for the query. The first step is algebrization, also called normalization. This step creates a logical tree, sometimes called a parse tree, that represents the logical steps needed to execute the query. A primary task of algebrization is binding, which checks whether referenced tables and columns exist, loads metadata about those objects, and adds information about implicit conversions needed. The next step is optimization, which, as its name implies, performs any refinements that can make the execution plan more efficient. Then the plan is added to the plan cache and sent to the execution engine.

TIP: Not all execution plans are cached. For example, you can define a stored procedure with the WITH RECOMPILE option, which causes SQL Server to generate a new execution plan every time that code executes. In that case, SQL Server doesn’t bother caching the plan since there is no point.
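A minimal sketch of such a procedure; the object and column names here are illustrative, borrowed from Northwind:

    CREATE PROCEDURE dbo.GetOrdersByShipRegion
        @Region nvarchar(15)
    WITH RECOMPILE
    AS
    BEGIN
        -- A fresh plan is compiled on every execution; nothing is cached.
        SELECT OrderID, CustomerID, OrderDate
        FROM dbo.Orders
        WHERE ShipRegion = @Region;
    END;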

Caching execution plans is one of the major ways that SQL Server dramatically increases the efficiency of query execution. Creating an execution plan can be a time-consuming operation, so if most queries can use a cached plan the server will be able to process far more queries.

Even if an execution plan for a query is in the plan cache, SQL Server may decide to recompile the query anyway and throw out the old plan. This process is called recompilation, and it occurs for two broad reasons: correctness and optimality.

  • Correctness: The query processor might determine that an existing cached plan could return incorrect results. In that case, it recompiles the statement or batch. There are numerous reasons why a plan might return incorrect results, including schema changes; added, changed, or dropped indexes; statistics updates; use of WITH RECOMPILE; changed environment settings (usually SET statements); or an explicit call to the sp_recompile system stored procedure (see the example after this list).
  • Optimality: SQL Server tracks data distributions with statistics, including information about the number of rows in a table as well as changes to the data in individual columns. The query processor uses a threshold value to decide whether it should recompile a query. Essentially, if the data has changed too much, it could affect the choice of the optimal execution plan and so SQL Server recompiles it.
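For example, to mark cached plans that reference a given table for recompilation (a correctness trigger from the list above), you can call sp_recompile directly; the table name here is illustrative:

    EXEC sp_recompile N'dbo.Orders';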

The benefit of recompilation, even though it is a hit against performance, is that a query with an execution plan that is incorrect or suboptimal is likely to perform far worse than with a correct, optimal plan. It could even return incorrect results in extreme cases. So the performance hit of recompilation is usually well worth the performance benefits.

NOTE: In SQL Server 2005 and later, individual statements can be recompiled instead of the entire batch, an enormous performance boost for long batches or complicated stored procedures and other code modules that can take some time to compile.

It is important to understand that the query processor is not looking for the absolute best execution plan possible. A complex query might have many thousands of execution plans that return correct results, and it might take hours to find the best plan that saves mere milliseconds off query execution time. One of the tasks that the optimizer does is to look for a trivial plan, which has a known and constant cost, involves no parameters, and only uses system functions. An example of a query with a trivial plan is:
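The query from the course isn't shown here; any unfiltered scan of a small table fits the description that follows, for example (using Northwind's small Shippers table):

    SELECT * FROM dbo.Shippers;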

Because this query returns all rows and all columns from a relatively small table, no amount of analysis is likely to find one method of scanning the table that is substantially better, in terms of performance, than another. There just aren’t that many ways to perform this query. If the optimizer finds a trivial plan, it returns that execution plan and does no further work.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer: LocalDB Databases course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

SQL Server Security: The Threats


Relational databases are used in an amazing variety of applications with connections from a dizzying array of clients, ranging from handheld devices to mainframe Web service applications. This activity exposes data over widely distributed networks, particularly the Internet, which makes it accessible to almost anyone, anywhere. The databases hold a significant portion of human knowledge, including highly sensitive personal information and critical data that makes international commerce work.

These characteristics make databases attractive targets for people who want to steal data or harm its owner by tampering with it. Making sure that your data is secure is a critical part of installing and configuring SQL Server and developing applications that use it to store data.

SQL Server has everything you need to secure your server and data against today’s sophisticated attacks. But before you can use these security features effectively, you need to understand the threats you face and a few basic security concepts.

The Threats

Identifying threats to a particular set of data and its server is an important first step in understanding how to configure and use SQL Server to protect the data. A database you create to manage your grade school soccer team’s equipment inventory probably doesn’t require heavy security measures, although you’ll want at least minimal access control so that a team member can’t just randomly change the record of who has which box of soccer balls.

On the other hand, if the database has personal data about the minors on the team, such as home addresses and phone numbers, you’ll probably want to step up security protections (and may be legally required to do so). You might protect the privacy of the data by segregating access so that almost anyone with access to the database can change the equipment data but only a select few can access the personal data. If the data includes mom and dad’s credit card number, you’ll need to go to extreme lengths to protect that data.

TIP: Sometimes the best way to protect data is simply not to put it in the database—for example, credit card numbers.

The following list is a sample of the kinds of threats your data may be susceptible to, but it is by no means an exhaustive list. Plenty of resources are available on the Web that can help you analyze the risks for your specific situation. This list is intended to help you start thinking about threats and how to use the features of SQL Server to counter them or at least reduce your data’s exposure to them.

  • Theft of data: Theft of data covers various types of unauthorized access to your data, whether by an outsider hacking into your network or an insider scanning for dirt on famous people. It may involve the thrill of reading forbidden information or be motivated by the sale of stolen credit card numbers.
  • Data vandalism: A hacker who gains access to your data can change it, which can cause a whole range of problems, from public embarrassment to shutting down your entire operation when all of your customer records are deleted.
  • Protecting data integrity: One of the biggest benefits of storing data in a relational database is that the database can help protect the integrity of the data. Data integrity includes mandating that every order have an associated customer, that a date stored in a date field really represents a calendar date, and that a percentage field contains only values between 0 and 100. Data integrity probably isn’t the first thing you think of in connection with security, but it is an important part of protecting your data.
  • Illegal storage: In the past, the data you collected during the course of business was your own business. But now myriad laws exist in the U.S., the European Union, and other countries that control the kinds of personal data you can store, how you store it, and how you protect it. The penalties for violations can be severe—both monetary penalties and damage to the public image of your company. This is less a threat to the data than a threat to your organization.

You have to understand the threats to your data to know how to protect against them. Don’t waste time on measures that don’t protect against specific threats to your data. You’ll never be able to cover all hypothetical situations, and at worst you’ll make your database server completely unusable by its intended users. Security is always a compromise that balances the risks against the time and money necessary to implement and maintain safeguards.


This post is an excerpt from the online courseware for our SQL Server 2012: Security Fundamentals course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

T-SQL: Counting Rows & Column Data

One of the primary purposes of a relational database is to summarize information. Other than for data entry, you will rarely look at individual orders. Instead, using the database, you will do things like summarize all the orders by week, month, or year. SQL Server performs these kinds of summaries using aggregate functions. These aggregate functions, combined with the GROUP BY clause, can combine a tremendous amount of data into a clear and concise summary. This article looks at the COUNT aggregate function for counting rows & column data.

Counting Rows

Use the COUNT function to return the number of rows in a table, as shown in the following query. This query uses the asterisk (*) as the value of the argument to COUNT so that it counts all of the rows without considering column values and whether they have NULL values:
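The query isn't reproduced in this excerpt; a minimal version against Northwind's Employees table would be:

    SELECT COUNT(*) AS TotalEmployees
    FROM Employees;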

The figure below shows the results of the query, indicating that there are nine employees in the Employees table of the Northwind database.

Counting the number of rows in a table

TIP: Your aggregate queries will execute faster if you use the asterisk in the COUNT function instead of a column name. The asterisk instructs SQL Server to count only the number of rows. Using a column name forces SQL Server to retrieve every value in the column and check for nulls, which aren’t included in the count. If all you’re doing is counting the rows, the asterisk is more efficient.

Counting Column Data

When you specify a column name in the COUNT function, the count excludes any null values in the column. The following query counts the total number of rows in the Employees table, and then counts the non-null values in the Region column:
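Again, the original query is omitted here; a version matching that description would be:

    SELECT COUNT(*) AS TotalRows,
           COUNT(Region) AS RegionCount
    FROM Employees;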

Because the table contains null Region values, the result set in the figure below shows a discrepancy between the number of employees and the number of regions.

COUNT aggregate functions do not include null values

As you can see from these results, you’ll want to be careful about whether you use an asterisk or a column name with the COUNT aggregate function (and other aggregate functions). Most of the time an asterisk will work, unless you need to account for NULL values.


This post is an excerpt from the online courseware for our SQL Server 2012: T-SQL Select, Where, and Clauses course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.