Tag Archives: microsoft sql server

User-Defined Function Overview – Microsoft SQL

User-defined functions (UDFs) were introduced in SQL Server 2000 to provide coding functionality similar to what is available in general-purpose programming languages. UDFs provide a level of flexibility and reusability that would be impossible to duplicate by using only stored procedures, triggers, and views. Like functions in other programming languages, UDFs are reusable code modules that you can invoke with parameters, that perform calculations or actions, and that return various kinds of data.

SQL Server supports three kinds of user-defined functions (sketched in the examples after this list), each with its own specific features and benefits:

  • Scalar functions are what most developers think of as user-defined functions, because they are most like the functions available in other programming languages and the built-in Transact-SQL (T-SQL) functions. A scalar function can take parameters, contain complex processing, and return a single value of almost any SQL data type.
  • Inline table-valued functions are similar to views in that they each consist of a single SELECT statement and each returns a table object. However, inline table-valued functions can take parameters, whereas views cannot. Like views, you can use inline table-valued functions in statements that update, insert, or delete data, and you can join to them. An inline table-valued function is essentially a view with parameters.
  • Multi-statement table-valued functions are similar to stored procedures, but they return an in-memory table object (a set of rows and columns that you can select from) instead of a static result set. This makes them a powerful view/stored procedure hybrid, because you can perform complex processing in them as in a stored procedure and select data from them as in a view. However, the tables that these functions return are not updatable.
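
To make the three kinds concrete, here are minimal sketches of each. These assume a hypothetical dbo.Orders table with OrderID, CustomerID, OrderDate, and Total columns; only the shapes of the functions matter here.

    -- Scalar function: returns a single value
    CREATE FUNCTION dbo.CubeVolume (@side decimal(10,2))
    RETURNS decimal(20,4)
    AS
    BEGIN
        RETURN @side * @side * @side;
    END;
    GO

    -- Inline table-valued function: a single SELECT, essentially a view with parameters
    CREATE FUNCTION dbo.OrdersForCustomer (@CustomerID int)
    RETURNS TABLE
    AS
    RETURN (SELECT OrderID, OrderDate, Total
            FROM dbo.Orders
            WHERE CustomerID = @CustomerID);
    GO

    -- Multi-statement table-valued function: builds and returns a table variable
    CREATE FUNCTION dbo.TopOrders (@howMany int)
    RETURNS @result TABLE (OrderID int, Total money)
    AS
    BEGIN
        INSERT @result
            SELECT TOP (@howMany) OrderID, Total
            FROM dbo.Orders
            ORDER BY Total DESC;
        RETURN;
    END;
    GO

You can then query either table-valued function as if it were a view, for example SELECT * FROM dbo.OrdersForCustomer(42).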

TIP: Another way to categorize the types of functions available in SQL Server is as scalar and table-valued, based on the type of data returned. With that categorization, there are two types of table-valued functions.

UDFs are not allowed to have side effects, which limits their use somewhat. They may not make any lasting changes in the database, such as changing data or altering the schema of any object. But subtler actions also count as changes and are therefore not allowed in UDFs. For example, the RAND function, which generates random numbers, retains state information about the last random number it generated; each call changes that state, which SQL Server must store somewhere.

User-defined functions combine some of the best elements of stored procedures and views. Many tasks that you previously performed by using views and stored procedures can now be handled better with UDFs. You can invoke these functions from views, stored procedures, other user-defined functions, batches, and scripts, as well as from external applications.

NOTE: SQL Server 2000 did not permit nondeterministic functions in the body of a user-defined function. A deterministic function, like LEN, always returns the same result anytime it is called with the same argument values. A nondeterministic function, such as GETDATE(), can return a different result each time it is called, even with the same argument values. However, in SQL Server 2005 and later, things have loosened up: GETDATE() is now permissible in UDFs.
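
For example, a scalar UDF such as this hypothetical dbo.DaysSince compiles and runs on SQL Server 2005 and later, even though it calls the nondeterministic GETDATE():

    CREATE FUNCTION dbo.DaysSince (@start datetime)
    RETURNS int
    AS
    BEGIN
        -- GETDATE() is nondeterministic, but has been allowed in UDFs since SQL Server 2005
        RETURN DATEDIFF(day, @start, GETDATE());
    END;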

Common Language Runtime UDFs

You can create each type of UDF in SQLCLR code as well as in Transact-SQL. Transact-SQL is recommended for functions that rely on set-related data manipulation, while SQLCLR functions are recommended for functions that require intensive computations or string manipulation, neither of which T-SQL is well suited for. In some cases, the .NET Framework contains base classes that supply functionality that would be difficult to duplicate in Transact-SQL, such as string processing that uses the RegEx class. However, if your functions involve fairly straightforward data manipulation, Transact-SQL is the best choice. This chapter will focus on T-SQL UDFs.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

Types of Spatial Data

SQL Server includes two spatial data types—Geometry and Geography—that encapsulate the database’s support for spatial data. Each data type includes pretty much the same point, line, shape, and collection objects, as well as rich sets of methods and properties for working with these shapes. The figure below shows the hierarchy of available objects. You define a field or variable as either the Geometry or Geography data type, then assign to that field or variable data that describes the point, line, shape, or collection you want to store. The yellow/lighter objects shown in the figure are abstract objects, while the blue/darker objects are concrete objects that you can store in the spatial data type. The only difference between the Geometry and Geography object hierarchies is that Geography includes a FullGlobe object that represents the complete surface of the earth. It appears in the figure in a lighter color with a dashed line because it is not part of the Geometry object hierarchy.
[Figure: The hierarchy of objects available in the Geometry and Geography data types]

The shapes defined by the various spatial objects are vector objects, which are collections of points, lines, curves, polygons, and compound curves. You’re likely to use the following spatial objects most often to store and manipulate data within SQL Server:

  • Point: An exact location identified by X and Y coordinates, which can also include Z (elevation) and M (measure) values. SQL Server doesn’t use the Z and M coordinates, but lets you store the data so that client applications can use them.
  • LineString: A path along a sequence of points. It is a one-dimensional shape: it has length but no area, even if multiple connected LineStrings look like a closed shape. A LineString is simple if it doesn’t cross itself, closed if its starting point is the same as its ending point, and a ring if it is both simple and closed.
  • CircularString: Similar to a LineString, but defines a curve as a portion of a circle instead of a straight line.
  • Polygon: A closed two-dimensional shape defined by a ring. A polygon has both a length and an area. It can have holes in its interior, the shapes of which are defined by other, smaller polygons. The area of a hole is excluded from the area of the outer polygon.

You can also create collections of these basic shapes, including a MultiPoint, MultiLineString, and MultiPolygon, each of which can contain only the specified shapes. If you need a collection of any type of shape, including the multi-shapes, you can use a GeometryCollection. This makes it easy to work with groups of shapes as a single unit.

The Geography and Geometry data types are .NET classes implemented as system SQLCLR objects with several dozen methods each. Many of the methods, about two-thirds, have names that start with the prefix ST, such as STArea, STLength, STDistance, and STGeomFromText. These ST methods are implementations of Open Geospatial Consortium standard methods (the ST stands for spatiotemporal). These methods provide the most commonly used, basic features for working with spatial data.
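
As a minimal sketch of a few of the ST methods, the following builds a 10-by-10 square from well-known text (the coordinates and SRID 0 are arbitrary) and asks for its area and perimeter:

    DECLARE @square geometry = geometry::STGeomFromText(
        'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

    SELECT @square.STArea()   AS Area,       -- 100
           @square.STLength() AS Perimeter;  -- 40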

Microsoft also added some custom extension methods to the objects to support features not part of the OGC standards. These method names don’t have the ST prefix, such as Parse, ToString, and Reduce.

The nice thing about the Geography and Geometry data types is that they share a lot of methods that work the same across the two types. The main difference is that with the Geography type you’ll use latitude and longitude coordinates but with the Geometry type you’ll generally use x and y coordinates.
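
For example, you might compute the distance between two latitude/longitude points with the Geography type (the coordinates below are approximate and purely illustrative):

    DECLARE @seattle  geography = geography::Point(47.6062, -122.3321, 4326);
    DECLARE @portland geography = geography::Point(45.5152, -122.6784, 4326);

    -- With SRID 4326, STDistance returns the distance in meters
    SELECT @seattle.STDistance(@portland) AS DistanceInMeters;
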
This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

The HierarchyID Data Type

As the name suggests, relational databases are best at storing and retrieving relational data. That’s the kind of data that fits neatly into multiple tables, each consisting of rows and columns of relatively homogeneous data, all related through mechanisms such as foreign keys. Most of the material in this course covers relational data, since SQL Server is a relational database system.

But the reality is that sometimes you have chunks of data that don’t fit neatly into the relational model. One example of commonly used non-relational data is hierarchical data. This is data that defines some kind of hierarchy, such as an organizational chart or part subassemblies to build a complex object like an automobile. If the structure is rigidly set with a limited number of levels, you can do a decent job of representing the hierarchy using a table for each level, or other techniques. But it can be very difficult to represent the hierarchy if the structure needs to be flexible.

You can always store hierarchical data outside of SQL Server, but over the years, people have come up with various ways of adapting hierarchical data to fit the relational model. Some of these include:

  •  Table self-joins: In a table self-join, you use a single table to contain the hierarchical data. The table has a foreign key to itself—thus the name “self-join”—to reference another row in the table. A good example of this is the Northwind Employees table, which has a ReportsTo foreign key field to identify the person to whom an employee reports. You can use that field to construct the Northwind sales organizational chart, as sketched in the example after this list.
  • XML: XML data is inherently hierarchical, consisting of elements that entirely contain one or more child elements in a structured hierarchy. SQL Server 2005 introduced XML as a native data type, and Microsoft integrated it well with relational data. XML is handy because you have a great deal of flexibility in how you define the hierarchy. But you have to learn specialized methods to work with the data, and searching and sorting on the data can be problematic.
  • Roll your own: SQL Server 2005 also introduced the capability to create your own custom data types using the .NET Common Language Runtime (CLR). Many people took advantage of this feature to create their own hierarchical data type, with all the features to support their needs. SQLCLR types support properties and methods, so the type could implement behaviors as well as custom data structures to support hierarchical data. Such custom types were far from trivial to implement, and it took a lot of work to get it right.
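
As a sketch of the self-join approach, a recursive common table expression can walk a hypothetical Employees table (with EmployeeID, LastName, and ReportsTo columns, like Northwind’s) from the top of the hierarchy down:

    WITH OrgChart AS (
        -- anchor: employees who report to no one (the top of the hierarchy)
        SELECT EmployeeID, LastName, ReportsTo, 0 AS Depth
        FROM dbo.Employees
        WHERE ReportsTo IS NULL
        UNION ALL
        -- recursive step: everyone who reports to someone already in the results
        SELECT e.EmployeeID, e.LastName, e.ReportsTo, oc.Depth + 1
        FROM dbo.Employees AS e
        INNER JOIN OrgChart AS oc ON e.ReportsTo = oc.EmployeeID
    )
    SELECT EmployeeID, LastName, Depth
    FROM OrgChart
    ORDER BY Depth, LastName;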

Because developers and administrators frequently need to store and work with hierarchical data in SQL Server, Microsoft introduced the HierarchyID data type in SQL Server 2008. This is a compact object you can use to manage hierarchical data using materialized paths. It is compact in that it uses very little storage, and it materializes paths and manages the hierarchy for you. It is a System CLR type, which means that Microsoft implements it via .NET code.

Because it is a system type, you don’t have to enable custom SQLCLR code in a database before using the HierarchyID data type, as you must for your own custom SQLCLR code.

You can use the HierarchyID type to represent organization charts, map a file directory system on your hard drive, create an assembly part chart with subassemblies, represent project tasks, and for many other uses. The type has rich support with methods that provide many ways to maintain the hierarchy, such as making arbitrary insertions and deletions, as well as relocating sections of the hierarchy to other places.

There is no magic to the HierarchyID. Although SQL Server gives you everything you need to maintain the integrity of the hierarchy, it doesn’t automatically enforce hierarchical relationships. You have to do that in your code, using the tools that are part of the data type. The HierarchyID type also does not guarantee uniqueness, but you can enforce uniqueness in other ways, such as with a unique constraint on a HierarchyID field in a table.
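
Here is a minimal sketch of the type in action; the table and names are hypothetical:

    CREATE TABLE dbo.Org (
        Node hierarchyid PRIMARY KEY,
        EmployeeName nvarchar(50) NOT NULL
    );

    DECLARE @root hierarchyid = hierarchyid::GetRoot();
    INSERT dbo.Org VALUES (@root, N'CEO');
    -- GetDescendant generates a child node; passing NULL, NULL asks for the first child
    INSERT dbo.Org VALUES (@root.GetDescendant(NULL, NULL), N'VP of Sales');

    -- ToString() shows each node's materialized path, such as / and /1/
    SELECT Node.ToString() AS Path, Node.GetLevel() AS Level, EmployeeName
    FROM dbo.Org;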

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

The Stages of Query Compilation

Compiling a query involves three stages: parsing, algebrization, and optimization. The process transforms the logical T-SQL code you write into the physical operators that define the steps SQL Server will perform to execute the query and return results. The figure below shows a very high-level view of the entire process.

[Figure: The overall query compilation process in SQL Server]

When SQL Server receives a batch of T-SQL statements from a client application (which includes Management Studio), it first checks to see if a plan exists in the plan cache for the query. If the query was recently executed by this or another session, then the query doesn’t need compilation. The plan is passed to the execution engine, and the process is done.

If no match is found in the plan cache, SQL Server parses the query to check the correct T-SQL syntax, including the use of valid SQL identifiers for object names—although not yet to verify that all referenced objects exist—as well as the spelling and use of keywords.

Then, for each statement in the batch, the query processor checks whether a plan guide exists for the statement and whether its plan is in the cache. A plan guide is an object that you can define to specify a query plan for a query. If a matching plan guide is in the plan cache, the processor uses its plan for that statement and sends it off for execution. If there is no match, the query is parameterized, which essentially creates a version of the query with separate parameters, and the processor once again checks whether the parameterized query is in the plan cache. If there is a version in the plan cache, it is sent to the execution engine.

Otherwise, if there is no matching plan guide or parameterized query in the plan cache, the query processor has to go through the process of creating an execution plan for the query. The first step is algebrization, also called normalization. This step creates a logical tree, sometimes called a parse tree, that represents the logical steps needed to execute the query. A primary task of algebrization is binding, which checks whether referenced tables and columns exist, loads metadata about those objects, and adds information about implicit conversions needed. The next step is optimization, which, as its name implies, performs any refinements that can make the execution plan more efficient. Then the plan is added to the plan cache and sent to the execution engine.
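
As a rough sketch of a plan guide (the table, statement, and hint here are purely illustrative; see sp_create_plan_guide in Books Online for the full syntax), you might pin an OPTIMIZE FOR hint to a parameterized statement like this:

    EXEC sp_create_plan_guide
        @name   = N'CustomerLookupGuide',
        @stmt   = N'SELECT * FROM Sales.Customer WHERE CustomerID = @cust;',
        @type   = N'SQL',
        @module_or_batch = NULL,
        @params = N'@cust int',
        @hints  = N'OPTION (OPTIMIZE FOR (@cust = 12345))';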

TIP: Not all execution plans are cached. For example, you can define a stored procedure with the WITH RECOMPILE option, which causes SQL Server to generate a new execution plan every time that code executes. In that case, SQL Server doesn’t bother caching the plan since there is no point.
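
A minimal sketch, assuming a hypothetical dbo.Orders table:

    CREATE PROCEDURE dbo.GetOrdersByDate
        @from date,
        @to   date
    WITH RECOMPILE  -- a fresh plan is built on every execution, and nothing is cached
    AS
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate BETWEEN @from AND @to;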

Caching execution plans is one of the major ways that SQL Server dramatically increases the efficiency of query execution. Creating an execution plan can be a time-consuming operation, so if most queries can use a cached plan the server will be able to process far more queries.

Even if an execution plan for a query is in the plan cache, SQL Server may decide to recompile the query anyway and throw out the old plan. This process is called recompilation, and it occurs for two broad reasons: correctness and optimality.

  • Correctness: The query processor might determine that an existing cached plan could return incorrect results. In that case, it recompiles the statement or batch. There are numerous reasons why a plan might return incorrect results, including schema changes; added, changed, or dropped indexes; statistics updates; use of WITH RECOMPILE; changed environment settings (usually SET statements); or an explicit call to the sp_recompile system stored procedure.
  • Optimality: SQL Server tracks data distributions with statistics, including information about the number of rows in a table as well as changes to the data in individual columns. The query processor uses a threshold value to decide whether it should recompile a query. Essentially, if the data has changed too much, it could affect the choice of the optimal execution plan and so SQL Server recompiles it.

The benefit of recompilation, even though it is a hit against performance, is that a query with an execution plan that is incorrect or suboptimal is likely to perform far worse than with a correct, optimal plan. It could even return incorrect results in extreme cases. So the performance hit of recompilation is usually well worth the performance benefits.
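
As the Correctness bullet notes, you can also trigger recompilation explicitly with the sp_recompile system stored procedure. Continuing the hypothetical procedure from the earlier sketch, the following marks it so that its cached plan is discarded and rebuilt on its next execution:

    EXEC sp_recompile N'dbo.GetOrdersByDate';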

NOTE: In SQL Server 2005 and later, individual statements can be recompiled instead of the entire batch, an enormous performance boost for long batches or complicated stored procedures and other code modules that can take some time to compile.

It is important to understand that the query processor is not looking for the absolute best execution plan possible. A complex query might have many thousands of execution plans that return correct results, and it might take hours to find the best plan that saves mere milliseconds off query execution time. One of the tasks that the optimizer does is to look for a trivial plan, which has a known and constant cost, involves no parameters, and only uses system functions. An example of a query with a trivial plan is:

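    -- a representative query; HumanResources.Department is a small table
    -- in the AdventureWorks sample database
    SELECT * FROM HumanResources.Department;
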
Because this query returns all rows and all columns from a relatively small table, no amount of analysis is likely to find one method of scanning the table that is substantially better, in terms of performance, than another. There just aren’t that many ways to perform this query. If the optimizer finds a trivial plan, it returns that execution plan and does no further work.

This post is an excerpt from the online courseware for our Microsoft SQL Server 2012 Developer: LocalDB Databases course written by expert Don Kiely.

Don Kiely is a featured instructor on many of our SQL Server and Visual Studio courses. He is a nationally recognized author, instructor and consultant who travels the country sharing his expertise in SQL Server and security.

LINQ to SQL Read-Only Queries

If you are looking for maximum performance, you can execute your queries in read-only mode. To do that, you can set the ObjectTrackingEnabled property of the DataContext to false. This turns off change tracking, which has some overhead. Because LINQ to SQL does not track changes in this mode, performance improves.

When you select Read Only Queries in the chapter’s sample application, you will execute two queries. The first retrieves a list of customers in the United States.

The second query illustrates an important consequence of setting ObjectTrackingEnabled to false: it sets DeferredLoadingEnabled to false and therefore turns off deferred loading.

The figure below shows the result of running this code.

[Figure: Deferred loading is disabled, so the number of orders does not display.]

The code first executes a query to retrieve information for a customer. It then displays the customer’s name, city, and region. Next, the code displays the number of orders for this customer. In the previous example, LINQ to SQL then generated a SQL statement to retrieve the customer’s orders. However, when you turn off change tracking you turn off deferred loading, so in this case, LINQ to SQL does not send the SQL statement to retrieve the customer’s orders.

The code next defines the following query to retrieve the ID and value for each order. However, customer.Orders is empty, so LINQ to SQL does not generate the SQL statement to retrieve the data.


TIP: Use read-only queries with caution. The absence of deferred loading means you will not retrieve all of the data you would otherwise. If your code relies on the missing data, you could receive runtime errors.

This post is an excerpt from the online courseware for our Microsoft LINQ Using Visual C# 2010 course written by expert Ken Getz.

Ken Getz is a Visual Studio expert with over 25 years of experience as a successful developer and consultant. He is a nationally recognized author and speaker, as well as a featured instructor for LearnNowOnline.

Protecting Your Data in SQL Server

The transaction log is fundamental to the way SQL Server works and is created automatically when you create a new database or attach an existing database that doesn’t yet have a log file. The transaction log records almost all activity in the database and provides an extra layer of protection for your data. SQL Server uses a write-ahead strategy to maintain the log, taking the following actions for any data modification:

  1. Loads data pages into a buffer cache.
  2. Updates the copy in the buffer.
  3. Creates a log record in a log cache.
  4. Writes the log record to disk when the transaction commits (the write-ahead part of the strategy).
  5. Writes the changed data pages to disk, typically during the checkpoint process.

If a system failure occurs, you can restore the database from a backup and then apply transactions from the transaction log to recover data changes that occurred after the last backup and that would otherwise be lost. With complete backups of all transaction logs, you can restore the database to any point in time. This scheme provides remarkable resiliency to almost any kind of failure, short of a total system meltdown. SQL Server even includes features for surviving the devastating meltdown of a single server.

The transaction log is a separate data file (.ldf) that by default resides in the same location as the data file, but you’ll often want to place log files on a separate device from the data files for additional security and performance. SQL Server supports three recovery models for recovering from a database server meltdown: simple, full, and bulk-logged.

  • Simple: The log is truncated at a checkpoint, so that recovery takes the database back to the point of the last full or differential backup. A checkpoint is a known good point from which SQL Server can start applying changes from the log during recovery from a failure.
  • Full: All database operations are fully logged so that you can recover the database to the point of failure.
  • Bulk-Logged: Minimally logs bulk operations so that they run faster. You can still recover the database from backups, but you cannot restore to a point in time within a log backup that contains bulk-logged operations.

TIP: The Recovery Model property of a database determines how the database uses its transaction log. A Simple setting for the Recovery Model keeps the log small by truncating the log after each checkpoint process. With this setting, the log supports each transaction while it is in progress, but database recovery relies on the last full backup of the database and cannot use the transaction log to restore data changes that occurred after the last backup.
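
You can check or change a database’s recovery model with T-SQL; a minimal sketch, assuming the AdventureWorks2012 sample database:

    -- check the current recovery model
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = N'AdventureWorks2012';

    -- switch models: FULL, BULK_LOGGED, or SIMPLE
    ALTER DATABASE AdventureWorks2012 SET RECOVERY FULL;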


This post is an excerpt from the online courseware for our SQL Server 2012: Configuring Management Studio course written by expert Don Kiely.

SQL Server Management Studio: Displaying and Filtering Objects

SQL Server Management Studio displays server and database information in Object Explorer, which presents information about database objects in a treeview. In addition to displaying SQL Server Database Engine objects, Object Explorer also displays Analysis Services, Reporting Services, Integration Services, and other objects. You can also use it to administer SQL Server Security, SQL Server Agent, Replication, and Database Mail. It provides you with deep insight into just about every corner of an instance of SQL Server and all its services.

The Object Explorer displays information in a hierarchical treeview for all servers to which it connects. It populates the treeview on demand when you expand a node. You can double-click a node to expand it, and click the Stop button to stop the expansion.

Displaying and Filtering Objects

Object Explorer can display up to 65,536 objects. If you want to view additional objects, you need to close some nodes or apply a filter to reduce the number of displayed objects. To filter the view, select the parent node that you want to filter, then right-click and choose Filter|Filter Settings. For example, in the Tables node of the AdventureWorks2012 database you can filter by Name, Schema, Owner, or Creation Date, and you can choose which operator to use for each. Click OK to display only tables in the Production schema.

When you filter a list of items, Object Explorer displays that information in the node name, as shown in the Figure below. This way, you don’t forget that it is filtered when you’re looking for an object that the filter excluded.

[Figure: The filtered Tables node in Object Explorer]

To remove the filter, right-click the node where you applied the filter and choose Filter|Remove Filter. Now all the database tables again appear in the Tables node.

This post is an excerpt from the online courseware for our SQL Server 2012: Configuring Management Studio course written by expert Don Kiely.

Installing Multiple Instances of SQL on a Single Computer

One of the most powerful features of SQL Server is the ability to install multiple instances of the SQL Server relational database engine on a single computer. Each instance of SQL Server consists of both shared and instance features, which can save resources on a single server machine. All of the database objects and many server objects—including security settings—are completely contained within each instance. However, there are a few things you need to know before you install multiple SQL Server instances.

You can install one default instance and multiple named instances, or multiple named instances with no default instance. The only difference between a default and a named instance is how you access it: for the default instance you use only the machine name (for example, MYSERVER), while a named instance requires the machine name plus the instance name (for example, MYSERVER\Instance1).
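
One quick way to see which instance a connection reached is to ask the server itself; SERVERPROPERTY('InstanceName') returns NULL when you are connected to the default instance:

    SELECT SERVERPROPERTY('MachineName')  AS MachineName,
           SERVERPROPERTY('InstanceName') AS InstanceName,   -- NULL on a default instance
           @@SERVERNAME                   AS FullServerName; -- machine or machine\instance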

Instances other than the default instance must be SQL Server 2012, but the default instance can be SQL Server 2005 or later. Here are a few examples of how you can configure a single computer:

  • A default instance of SQL Server 2005, with multiple named instances of SQL Server 2012. Only SQL Server 2012 can operate as a named instance.
  • A default instance of SQL Server 2012 with up to 49 named instances, depending on which SQL Server edition you are using.
  • No default instance and up to 50 named instances, depending on which SQL Server edition you are using.

NOTE: The limit on the number of SQL Server 2012 instances you can install no longer depends on which edition of the product you are using. All editions across the board now support and allow 50 instances. It used to be that some editions were limited to 16 instances.

Multiple Instance Scenarios

Implementing multiple instances of SQL Server gives you great flexibility in the kinds of SQL Server applications you can develop and distribute. Here are a few examples of how you might find multiple instances useful:

  • Continue to run existing applications on earlier versions of SQL Server side-by-side with a SQL Server 2012 server on the same machine at the same time.
  • Run multiple instances of SQL Server on the virtual server of a SQL Server failover cluster. When you install SQL Server on each computer in a cluster, you create a named instance to distinguish that installation from the other instances of SQL Server that are on the same virtual server. Clustering is described fully in Failover Clustering in SQL Server Books Online.
  • Maintain a separate independent SQL Server environment for development and testing on the same computer that holds your production server. This is not generally a good idea, but if resources are tight it is doable.
  • Roll out a secured SQL Server database in a sandbox environment where administrative rights are curtailed. Structure security differently for each instance, with different administrators. One of the problems in past versions of SQL Server for independent software vendors and application developers was that the system administrator in SQL Server is all-powerful. Multiple instances provide a way to curtail administration on a server instance.
  • Install software that comes bundled with the Express edition of SQL Server on a machine that already has a running server.

The principal drawbacks of multiple instances are that administration becomes more complex, and multiple instances consume more resources than a single installation.

This post is an excerpt from the online courseware for our SQL Server 2012: Installing course written by expert Don Kiely.