Monday, November 7, 2016

Mastering Index – Index with included columns

This article is the part of series Mastering Index, in this particular article we will be exploring Index with included columns. Take a look at the content of this series –

Mastering Index – Index with included columns
One of the most interesting and talked about topic in database field is Index. I am writing this Series of article to cover all the different types of Index available in SQL Server. I will be writing this series in two categories Index and Special Index. Please find the List of topics below, which I will covering –

Mastering Index 
1. Clustered Index
2. Nonclustered Index
3. Unique Index
4. Index with included columns
5. Index on computed columns
6. Filtered Index

Special Index

1. XML Index
2. Full Text Index
3. Spatial Index
4. Columnstore Index

If you have read this series from the beginning, I am sure you have good understanding of Index, Index types, Index cost and benefits. Let me make it very clear Index with included columns is not a type of Index. It is just a variant of nonclustered Index, which can be very helpful in certain scenarios.
You may remember from Nonclustered Index article, when leaf node of the nonclustered Index cannot find all the data required, it uses the clustered Index to get the additional information. What if we can somehow store that additional information in nonclustered Index, so we can save time and resource used in searching clustered Index. We can do it using the include option in nonclustered Index but remember that it cost space in disk and can be useful only in certain situations. There are other benefits like we can include a column with data type which cannot be used as Index key and included columns are not consider by the SQL Server while calculating Index key size and number of Index key columns.
When we create a nonclustered Index with include option, leaf node actually stores the column value with Index key. So if query can find all the information in this Index there is no need to refer the clustered Index /heap. This kind of query is called covering query and Index is called covering Index.
Usage Recommendations-
Suppose you have nonclustered index with large index key size. In this case you can redesign the index, use only those columns as index key which are useful in searching and lookups and remaining columns use as included columns. So you have a smaller index key and stilling covering the query.
We have limitation on the index key size 16 columns or 900 bytes, if your index requires more columns, you can use them as included column. SQL Server does not consider them during calculating the size of index key or number of index column.

There are some limitation for include column-
We can use include columns only for nonclustered index
All data types can be included except text, ntext and image.
Computed columns can be used as included columns.
Included columns cannot be dropped from a table, if you want do so first drop the index.
We cannot change column nullability and length for varchar, nvarchar or vabinary.

Let’s start some coding, assume that we have a table called ‘Employee_Test’, you can create the table with below script.
Here we are using the AdventureWorks2014 database. Here we are creating a table and inserting records from original Employee table. I am copying the table records 1000 times to create enough amounts of data to show performance change. You can set the value as per your need.


USE [AdventureWorks2014]
GO
/****** Object:  Table [HumanResources].[Employee]    Script Date: 10/7/2016 12:38:57 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

Begin
CREATE TABLE [HumanResources].[Employee_Test](
       [BusinessEntityID] [int] Identity NOT NULL,
       [NationalIDNumber] [nvarchar](15) NOT NULL,
       [LoginID] [nvarchar](256) NOT NULL,
       [OrganizationNode] [hierarchyid] NULL,
       [OrganizationLevel]  int,
       [JobTitle] [nvarchar](50) NOT NULL,
       [BirthDate] [date] NOT NULL,
       [MaritalStatus] [nchar](1) NOT NULL,
       [Gender] [nchar](1) NOT NULL,
       [HireDate] [date] NOT NULL,
       [SalariedFlag] [dbo].[Flag] NOT NULL   DEFAULT ((1)),
       [VacationHours] [smallint] NOT NULL  DEFAULT ((0)),
       [SickLeaveHours] [smallint] NOT NULL   DEFAULT ((0)),
       [CurrentFlag] [dbo].[Flag] NOT NULL   DEFAULT ((1)),
       [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL   DEFAULT (newid()),
       [ModifiedDate] [datetime] NOT NULL  DEFAULT (getdate()),
)

Declare @Count int
Declare @TotalRec int

Set @Count=0
Set @TotalRec=1000

WHILE (@TotalRec>@Count)
BEGIN -- While Begin

Insert into [HumanResources].[Employee_Test]
(
  [NationalIDNumber]
      ,[LoginID]
      ,[OrganizationNode]
      ,[OrganizationLevel]
      ,[JobTitle]
      ,[BirthDate]
      ,[MaritalStatus]
      ,[Gender]
      ,[HireDate]
      ,[SalariedFlag]
      ,[VacationHours]
      ,[SickLeaveHours]
      ,[CurrentFlag]
      ,[rowguid]
      ,[ModifiedDate]
)
SELECT
      [NationalIDNumber]
      ,[LoginID]
      ,[OrganizationNode]
      ,[OrganizationLevel]
      ,[JobTitle]
      ,[BirthDate]
      ,[MaritalStatus]
      ,[Gender]
      ,[HireDate]
      ,[SalariedFlag]
      ,[VacationHours]
      ,[SickLeaveHours]
      ,[CurrentFlag]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks2014].[HumanResources].[Employee]


set @Count=@Count+1
END

CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex-PK] ON [HumanResources].[Employee_Test]
(
       [BusinessEntityID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)



End


Once our table is ready with data, we can learn how to create a nonclustered Index with include option and how it influence the query execution plan and performance. We can create Index from SQL Script or from Management Studio, if you are not sure how you can do it, please refer previous article of this series. I have explained step by step process to create Index. Here is the links for that –





Now we have our table with PK only, which is created on BusinessEntityID column and by default it is a clustered Index. Let’s run the below scrip and analyze the execution plan. Select the option to include the actual execution plan. You can do it from task bar or right click and select actual execution plan.

SELECT
       [NationalIDNumber]
      ,[LoginID]
      ,[JobTitle]
      ,[BirthDate]
  FROM [AdventureWorks2014].[HumanResources].[Employee_Test]
  Where [NationalIDNumber]= 295847284

As you can see the execution plan in below screenshot most of the cost (98%) belongs to clustered Index scan but you can see in our script we are not directly using the clustered Index (Primary key on BusinessEntityID). 


Let’s create a nonclustered Index on NationalIDNumber because we are using this as filter criteria in our script and see how it changes the execution plan of the script. Run the below script or create the nonclustered Index from management studio GUI and run the same script again.

USE [AdventureWorks2014]
GO

/****** Object:  Index [NonClusteredIndex_NoInclude]    Script Date: 10/18/2016 11:55:11 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex_NoInclude] ON [HumanResources].[Employee_Test]
(
       [NationalIDNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


When I ran the script, I got the below execution plan. As you can see in below screenshot, the cost of nonclustered Index is 26% and key lookup (clustered Index) is 70%. Here SQL Server first look for the NationalIDNumber value 295847284. Once it finds it in nonclustered Index, it reads Clustered Index key to fetch the other information (LoginID, JobTitle and BirthDate). Then SQL Server look for the Clustered Index key in clustered Index and reads the information from the leaf node. Now we will create a nonclustered Index with these three columns included and will see how SQL Server will fetch the information. 


Run the below script to create a nonclustered Index with include columns LoginID, JobTitle and BirthDate.

USE [AdventureWorks2014]
GO

/****** Object:  Index [NonClusteredIndex_WithInclude]    Script Date: 10/18/2016 11:59:10 AM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex_WithInclude] ON [HumanResources].[Employee_Test]
(
       [NationalIDNumber] ASC
)
INCLUDE (     [LoginID],
       [JobTitle],
       [BirthDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

You can create it from management studio also.


After creating the Index, let’s run the same script and analyze the execution plan. When I ran the script, I got the below execution plan. Here you can see SQL Server is using only NonClusteredIndex_WithInclude Index. Now it’s not using clustered Index because all the required information is available in nonclustered Index itself. 



Now you can understand clustered Index can be useful to improve the performance of certain kind of queries, but it has very limited use and performs good specific situations. So think wisely to use this feature, it’s a great feature if you use it smartly. 


Important DMVs 
sys.indexes
sys.index_columns
sys.dm_db_index_physical_stats function
sys.dm_db_database_page_allocations function
sys.dm_db_missing_index_details
sys.dm_db_missing_index_groups

Interview Questions 
Q. Can we create a clustered index with include column option ?
Q. When we create a index with include column, how does it effect the index key?
Q. Explain a scenario where include column option is useful while creating index?
Q. What are the considerations before creating index with included columns?

Recommended Book for this Topic

Microsoft SQL Server 2012 Internals (Microsoft Press)