Friday, November 11, 2016

Special Index - XML Index

I will be writing this series of articles for the Special Indexes in SQL Server. In this series we will cover following Index –

1. XML Index
2. Full Text Index
3. Spatial Index
4. Columnstore Index

Prerequisites
1. If you are reading this article, I am assuming that you know Index basics like type of Index, how it works and their benefits.
2. You are aware of XML data type in SQL Server.

You might have guess from the name, XML Indexes can be created on XML type columns which are stored as Large Binary Objects (BLOBs). When we create XML Index, it Index path, tags and values of XML column and improve query performance. When we do not have XML Index and our query is fetching data from XML data type, in such scenarios SQL Server parses the XML during execution, which results in poor query performance due to extra over head of parsing XML and applying the predicates. Usually XML values are large and retrieved values are comparatively small, if we create XML Index, it can skip the parsing xml data during execution and can improve performance by using Index for searching which results in better query performance. XML Index may sound different from traditional Indexes (clustered and Nonclustered Index) but this is implemented in similar way by using internal tables and B- Tree structures. We cannot query internal tables used for XML Index.
As other Indexes, XML Index also has maintains cost. So we need to consider this before creating any XML Index. We will be discussing this in details later in the article.

There are two kind of XML Index
1. Primary XML Index
2. Secondary XML Index

If a table has more than one XML type column, we can create one Primary XML Index and multiple Secondary XML Indexes for each XML column. We can create up to 249 XML Indexes and 999 Nonclustered Indexes.
Note – The first Index on an XML type column has to be Primary XML Index. Secondary XML Index can be created on Primary XML Index. Please remember that XML is case sensitive even though your SQL Server instance is not.
Primary XML Index
As mentioned above, XML Index on a XML type column are implemented by internal tables and B- Tree structures. These internal tables are called Node tables. Primary XML Index is a clustered Index on a node table, but for creating a Primary XML Index table must have a primary key which is implemented as clustered Index. This primary key can have up to 15 columns and used by SQL Server to lookup the records. If you want to see the columns in Node table for a given Primary XML Index, you can run the below script –
SELECT
I.name as 'IndexName',
c.name as 'NodeTableColumnName'
FROM sys.columns c
JOIN sys.Indexes i ON i.object_id = c.object_id
WHERE i.name = 'PXML_ProductModel_CatalogDescription_Test' AND i.type = 1;

 I got the following output for my SQL Server instance.

IndexName
NodeTableColumnName
PXML_ProductModel_CatalogDescription_Test
id
PXML_ProductModel_CatalogDescription_Test
nid
PXML_ProductModel_CatalogDescription_Test
tagname
PXML_ProductModel_CatalogDescription_Test
taguri
PXML_ProductModel_CatalogDescription_Test
tid
PXML_ProductModel_CatalogDescription_Test
value
PXML_ProductModel_CatalogDescription_Test
lvalue
PXML_ProductModel_CatalogDescription_Test
lvaluebin
PXML_ProductModel_CatalogDescription_Test
hid
PXML_ProductModel_CatalogDescription_Test
xsinil
PXML_ProductModel_CatalogDescription_Test
xsitype
PXML_ProductModel_CatalogDescription_Test
pk1

Let’s create our first Primary XML Index -
Here we will be creating a copy of [Production].[ProductModel] table from AdventureWorks2014 database. We are not working on original table because if we messed up, we have the original table and we can restore our test table from it. Let’s call it [Production].[ProductModel_TestXMLIndex], you can directly run the following script to create the table. This table has two XML type columns, CatalogDescription and Instructions. 

-- rename all constraints and Index
CREATE TABLE [Production].[ProductModel_TestXMLIndex](
       [ProductModelID] [int] IDENTITY(1,1) NOT NULL,
       [Name] [dbo].[Name] NOT NULL,
       [CatalogDescription] [xml](CONTENT [Production].[ProductDescriptionSchemaCollection]) NULL,
       [Instructions] [xml](CONTENT [Production].[ManuInstructionsSchemaCollection]) NULL,
       [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_ProductModel_rowguid_test]  DEFAULT (newid()),
       [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_ProductModel_ModifiedDate_test]  DEFAULT (getdate()),
 CONSTRAINT [PK_ProductModel_ProductModelID_test] PRIMARY KEY CLUSTERED
(
       [ProductModelID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Go


Once the table is ready, please make sure we do not have any XML Index on it. You can check it in Management Studio under table properties. Now we know there is no XML Index on our test table but before we run any query, we need data in our table. I am using our original table and loading data into our test table. Please run the same query multiple times so we have enough amounts of data to see any performance improvement. I know this is not the best way to create the data but here we are just concern about volume of data not the quality.

-- Run this script for 20 to 30 times

Insert into [Production].[ProductModel_TestXMLIndex]
Select
T.Name,
T.CatalogDescription,
T.Instructions,
T.rowguid,
T.ModifiedDate
from [Production].[ProductModel] T
Go

Now our test table is ready with data. Let’s run a simple query and capture the query execution plan.

DBCC FREEPROCCACHE
Go

SELECT CatalogDescription.value('             
    declare namespace PD="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";            
       (/PD:ProductDescription/@ProductModelID)[1]', 'int') AS Result            
FROM [Production].[ ProductModel_TestXMLIndex]         
WHERE CatalogDescription IS NOT NULL            
ORDER BY Result DESC 
Go

When I ran the above script in my environment, I got the following execution plan. As you can see in screen shot below, cost of Table Valued function (XML Reader with XPath filter) is 98%. You can cross check in blue rectangle that there is no XML Index, we just have a primary key.


Now let’s see how XML Index can improve the performance. Run the below script to create the XML Index or you can create it from Management studio GUI.

CREATE PRIMARY XML INDEX [PXML_ProductModel_CatalogDescription_Test] ON [Production].[ProductModel_TestXMLIndex]
(
       [CatalogDescription]
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

Go to our test table then, right click the Index, there you can find the option of New Index. Select Primary XML Index. Then you get a pop up window.


Here you provide the name for you Index and use add button to select the column on which you want to create the Index. In this case it is CatalogDescription. 


Now let’s run our same script. In my environment, I got the following output (please see the below screen shot). Now you can see that we have Primary XML Index PXML_ProductModel_CatalogDescription_Test
As you can see in red rectangle cost is now 58% because now query optimizer is using the Primary XML Index.


Secondary XML Indexes
Once we have created the Primary XML Index, we can create Secondary XML Index on XML column. Secondary XML Indexes are Nonclustered Index on node table or internal table and these are completely optional. There are three types of Secondary XML Index (Path, Value and Property) and we can create them in any order. These help in certain types of XQuery execution. 
Let’s create a Secondary XML Index for Path (find the example below). In this case Path Secondary XML Index is a Nonclustered Index on the HID and Value column of the Node table. Similarly Value Index is Nonclustered Index on Value and HID column and Property Index is a Nonclustered Index on pk, HID and Value column of the Node table. You can find the example for creating the all three types of Secondary XML Index.

PATH Secondary XML Index

CREATE XML INDEX [SecondaryXmlIndexPath] ON [Production].[ProductModel_TestXMLIndex]
(
       [CatalogDescription]
)
USING XML INDEX [PXML_ProductModel_CatalogDescription_Test] FOR PATH WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

 

VALUE Secondary XML Index

CREATE XML INDEX [SecondaryXmlIndexValue] ON [Production].[ProductModel_TestXMLIndex]
(
       [CatalogDescription]
)
USING XML INDEX [PXML_ProductModel_CatalogDescription_Test] FOR VALUE WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

 

PROPERTY Secondary Index

CREATE XML INDEX [SecondaryXmlIndexProperty] ON [Production].[ProductModel_TestXMLIndex]
(
       [CatalogDescription]
)
USING XML INDEX [PXML_ProductModel_CatalogDescription_Test] FOR PROPERTY WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO


We can run simple scripts against DMVs and can pull XML Index list in our database.

select
O.name as 'TableName',
I.name as 'IndexName',
I.type_desc as 'IndexType'
from Sys.Indexes I
Inner Join Sys.Objects O on I.object_id=O.object_id
where I.type_desc like 'XML'

Or
select
O.name as 'TableName',
I.name as 'IndexName',
I.type_desc as 'IndexType'
from Sys.xml_Indexes I
Inner Join Sys.Objects O on I.object_id=O.object_id

TableName
IndexName
IndexType
ProductModel
PXML_ProductModel_CatalogDescription
XML
ProductModel
PXML_ProductModel_Instructions
XML
Store
PXML_Store_Demographics
XML
ProductModel_TestXMLIndex
PXML_ProductModel_CatalogDescription_Test
XML
ProductModel_TestXMLIndex
SecondaryXmlIndexPath
XML
ProductModel_TestXMLIndex
SecondaryXmlIndexProperty
XML
ProductModel_TestXMLIndex
SecondaryXmlIndexValue
XML
Person
PXML_Person_AddContact
XML
Person
PXML_Person_Demographics
XML
Person
XMLPATH_Person_Demographics
XML
Person
XMLPROPERTY_Person_Demographics
XML
Person
XMLVALUE_Person_Demographics
XML

As you can see in the output of below script, top row (marked in green), shows the internal table xml_Index_nodes_663673412_256000, which is created by SQL Server for our XML Index on table ProductModel_TestXMLIndex.

SELECT
T.name,
T.object_id,
T.schema_id,
T.type_desc,
T.create_date,
T.modify_date,
T.internal_type_desc,
T.is_ms_shipped
FROM sys.internal_tables T
Order by T.modify_date Desc

name
object_id
schema_id
type_desc
create_date
modify_date
internal_type_desc
is_ms_shipped
xml_Index_nodes_663673412_256000
791673868
4
INTERNAL_TABLE
9/28/16 15:31
9/28/16 15:39
XML_INDEX_NODES
0
ifts_comp_fragment_1589580701_5
535672956
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_COMP_FRAGMENT
0
ifts_comp_fragment_1077578877_3
519672899
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_COMP_FRAGMENT
0
ifts_comp_fragment_610101214_4
503672842
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_COMP_FRAGMENT
0
fulltext_docidfilter_1077578877
251147940
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_DOCID_FILTER
0
fulltext_Indexeddocid_1077578877
267147997
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_INDEXED_DOCID
0
fulltext_avdl_1077578877
283148054
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_AVDL
0
fulltext_Index_docidmap_1077578877
219147826
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_DOCID_MAP
0
fulltext_Index_docidstatus_1077578877
235147883
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_DOCID_STATUS
0
fulltext_Index_docidstatus_1589580701
155147598
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_DOCID_STATUS
0
fulltext_docidfilter_1589580701
171147655
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_DOCID_FILTER
0
fulltext_Indexeddocid_1589580701
187147712
4
INTERNAL_TABLE
7/17/14 16:11
7/17/14 16:11
FULLTEXT_INDEXED_DOCID
0
Below script shows the list of column in node table for Primary XML Index.

-- list the columns in primary XML Index in the internal node table
SELECT *
FROM sys.columns c
INNER JOIN sys.Indexes i ON i.object_id = c.object_id
WHERE i.name = 'PXML_ProductModel_CatalogDescription_Test' AND i.type = 1;