Sunday, November 6, 2016

Mastering Index – Index on Computed Columns

This article is the part of series Mastering Index, in this particular article we will be exploring Index on computed columns. Take a look at the content of this series – 

Mastering Index – Introduction
One of the most interesting and talked about topic in database field is Index. I am writing this Series of article to cover all the different types of Index available in SQL Server. I will be writing this series in two categories Index and Special Index. Please find the List of topics below, which I will covering –

Mastering Index
1.       Clustered Index
2.       Nonclustered Index
3.       Unique Index
4.       Index with included columns
5.       Index on computed columns
6.       Filtered Index

Special Index
1.       XML Index
2.       Full Text Index
3.       Spatial Index
4.       Columnstore Index

If you have read this series from the beginning, I am sure you have good understanding of Index, Index types, Index cost and benefits. As the name suggest, we will be exploring the Index on computed columns. SQL Server provides the option to create Index on a dynamically calculated column. Computed columns are not being stored and when you query them SQL Server calculated the value and returns it. You can use PERSISTED option to store the calculated column value and in this case it acts as a normal column.

We can also create index on a computed column which is deterministic but not precise. In this case we need to declare the column as PERSISTED, while creating the table or Alter Table statement. So that SQL Server stores the computed values and updates those when a column data is altered that is being used in the calculation of computed column. PERSISTED option can change the performance of your queries. If you are using PERSISTED option in this case SQL Server will be calculating the values, storing it and updating the Index, so our inserts will be slower but reads will be faster. If you are not using the PERSISTED, your inserts will be faster because there will be con calculations and storing the values of computed columns but on the other hand reads will be slower because column values will be calculated on the fly.

Before we create any index on computed column, we need to make sure our computed column conforms to following criteria –
1.       Ownership requirements
2.       Determinism requirements
3.       Precision requirements
4.       Data type requirements
5.       SET option requirements

Ownership Requirements
Table and the functions used in computed column should have the same owner.

Determinism Requirements
All the function referenced in computed column should be precise and deterministic. All the columns which are used in computed column should belong to the same table and should not pull data from multiple rows or aggregate functions. Computed column logic or expression should not access the user data or system data.

Precision Requirements
Computed column expression should not have real or float data types.

Data Type Requirements

Computed column expression should not return data type test, ntext or image. Computed column can be index if return data type is allowable as index key.

SET Option Requirements

The connection on which the index is created, and all connections trying INSERT, UPDATE, or DELETE statements that will change values in the index, must have six SET options set to ON and one option set to OFF. The optimizer ignores an index on a computed column for any SELECT statement executed by a connection that does not have these same option settings.

o    The NUMERIC_ROUNDABORT option must be set to OFF, and the following options must be set to ON:
o    ANSI_NULLS
o    ANSI_PADDING
o    ANSI_WARNINGS
o    ARITHABORT
o    CONCAT_NULL_YIELDS_NULL
o    QUOTED_IDENTIFIER
Setting ANSI_WARNINGS to ON implicitly sets ARITHABORT to ON when the database compatibility level is set to 90 or higher.


Now let’s create a table with computed column, you can run the below scrip and can create table. Scrip also has the code to insert some dummy data. Please remember, I am using AdventureWorks2014 database. As you can see in the below script we are creating a column Total_Leave_Hours, which is the sum of VacationHours and SickLeaveHours.

Begin
set NoCount on
Declare @Count int
Declare @TotalRec int

If exists (SELECT OBJECT_ID(N'[AdventureWorks2014].[HumanResources].[Employee_Test]'))
Begin
Drop table [AdventureWorks2014].[HumanResources].[Employee_Test]
End
CREATE TABLE [HumanResources].[Employee_Test](
       [BusinessEntityID] [int] IDENTITY(1,1) NOT NULL Primary Key,
       [NationalIDNumber] [nvarchar](15) NOT NULL,
       [LoginID] [nvarchar](256) NOT NULL,
       [OrganizationNode] [hierarchyid] NULL,
       [OrganizationLevel] [int] NULL,
       [JobTitle] [nvarchar](50) NOT NULL,
       [BirthDate] [date] NOT NULL,
       [MaritalStatus] [nchar](1) NOT NULL,
       [Gender] [nchar](1) NOT NULL,
       [HireDate] [date] NOT NULL,
       [SalariedFlag] [dbo].[Flag] NOT NULL DEFAULT ((1)),
       [VacationHours] [smallint] NOT NULL DEFAULT ((0)),
       [SickLeaveHours] [smallint] NOT NULL DEFAULT ((0)),
       [CurrentFlag] [dbo].[Flag] NOT NULL DEFAULT ((1)),
       [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL DEFAULT (newid()),
       [ModifiedDate] [datetime] NOT NULL DEFAULT (getdate()),
       [Total_Leave_Hours] as [VacationHours]+[SickLeaveHours] Persisted
) ON [PRIMARY]



Set @Count=0
Set @TotalRec=1000

WHILE (@TotalRec>@Count)
BEGIN -- While Begin

INSERT INTO [HumanResources].[Employee_Test]
           ([NationalIDNumber]
           ,[LoginID]
           ,[OrganizationNode]
           ,[OrganizationLevel]
           ,[JobTitle]
           ,[BirthDate]
           ,[MaritalStatus]
           ,[Gender]
           ,[HireDate]
           ,[SalariedFlag]
           ,[VacationHours]
           ,[SickLeaveHours]
           ,[CurrentFlag]
           ,[rowguid]
           ,[ModifiedDate])
    
Select
            [NationalIDNumber]
           ,[LoginID]
           ,[OrganizationNode]
           ,[OrganizationLevel]
           ,[JobTitle]
           ,[BirthDate]
           ,[MaritalStatus]
           ,[Gender]
           ,[HireDate]
           ,[SalariedFlag]
           ,[VacationHours]
           ,[SickLeaveHours]
           ,[CurrentFlag]
           ,[rowguid]
           ,[ModifiedDate]
From [HumanResources].[Employee]


set @Count=@Count+1
END


End

Once the table is ready you can create the index on computed column Total_Leave_Hours from GUI or script.


USE [AdventureWorks2014]
GO

SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF

GO

/****** Object:  Index [NonClusteredIndex-20161102-154039]    Script Date: 11/2/2016 3:41:47 PM ******/
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20161102-154039] ON [HumanResources].[Employee_Test]
(
       [Total_Leave_Hours] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


Here we are not going to evaluate the performance impact of Index on computed columns but it has all the Pros and Cons of an Index. Impact of index depends on the many factors and kind of queries you are going to write.


Important DMVs 
sys.indexes
sys.index_columns
sys.dm_db_index_physical_stats function
sys.dm_db_database_page_allocations function
sys.dm_db_missing_index_details
sys.dm_db_missing_index_groups

Interview Questions
Q. Explain PERSISTED property of computed column?
Q. What are the prerequisites to create an index on computed column?
Q. Can we create a clustered index on a computed column? 
Q. Explain a scenario where index on a computed index can impact the performance? 

Recommended Links -