Monday, 30 March 2015

Advanced SQL Server Query Tuning Concepts with example

                       SQL statements can be used to retrieve data from the any database. To get same results we need to write different SQL queries. For better performance we need to use best, faster and efficient queries. So we need SQL query tuning based on the business and user requirements.
                       Microsoft SQL Server Performs different query tuning operations using in-memory sort & join technology. A join operator is a type of an algorithm which the SQL Server chooses in order to implement logical joins between two sets of data.

It depends on different scenarios based on the
Requested query (It performs what type of operation like intersect, union ... etc)
Available indexes (Cluster and Non Cluster indexes in columns of table)
Statistics information (Sorted order on the join column)
Number of estimated rows in each data set.

There are Three major types of join operations in SQL Server:
Nested Loops (loop) – If one dataset is small(fewer than 10 rows) and the other dataset is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons.
Merge Join – If the two datasets are not small but are stored on their join column, a merge join is the fastest join operation.
Hash Match – If both datasets are large, unsorted, nonindexed inputs. This join allows reduction in the use of denormalization. (Denormalization means achieve better performance by reducing join operations)
Briefly examples the above joins operations as follows in different cases

Example - 1:
Let’s create two tables with large data

create table Dataset1 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<200)
begin
insert into Dataset1 (name)
select name from master.dbo.spt_values
set @i=@i+1
end
--select COUNT(*) from dbo.Dataset1 --501600
go
create table Dataset2 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<200)
begin
insert into Dataset2 (name)
select name from master.dbo.spt_values
set @i=@i+1
end
-- select COUNT(*) from dbo.Dataset2 –501600

select * from dbo.Dataset1 a join dbo.Dataset2 b on (a.id=b.id)

Case1: Without Index


Case2: With Index
Let’s create a clustered index

create unique clustered index cx_Dataset1 on Dataset1 (id)
create unique clustered index cx_Dataset2 on Dataset2 (id) 




Note: If either of the table has indexed it goes hash join.

Example - 2:
Let’s create two tables with medium data

create table Dataset3 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<1)
begin
insert into Dataset3 (name)
select name from master.dbo.spt_values
set @i=@i+1
end
--select COUNT(*) from dbo.Dataset3 --2508
go
create table Dataset4 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<1)
begin
insert into Dataset4 (name)
select name from master.dbo.spt_values
set @i=@i+1
end

-- select COUNT(*) from dbo.Dataset4 –2508
select * from dbo.Dataset3 c join dbo.Dataset2 d on (c.id=d.id)

Case1: Without Index




Case2: With Index
Let’s create a clustered index

create unique clustered index cx_Dataset3 on Dataset3 (id)
create unique clustered index cx_Dataset4 on Dataset4 (id)


Note: If either of the table has indexed it goes hash join.

Example - 3:
Let’s create two tables with small amount of data means 10 lines only.

create table Dataset5 (id int identity,name varchar(50))
insert into Dataset5 (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.Dataset5 --10
create table Dataset6 (id int identity,name varchar(50))
insert into Dataset6 (name)
select top 10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.Dataset6 –10

select * from dbo.Dataset5 e join dbo.Dataset6 f on (e.id=f.id)

Case1: Without Index


Case2: With Index
Let’s create a clustered index

create unique clustered index cx_Dataset5 on Dataset5 (id)
create unique clustered index cx_Dataset6 on Dataset6 (id)







Note: If either of the table has indexed it goes Nested loop join.

From the above diagram:
• If both the tables have No index then the query optimizer will choose “Hash joins” internally.
• If both the tables have indexes then the query optimizer will choose “Merge (For Large Data in tables) / Nested loop (For small data in tables)” internally.
• If either of the tables have indexes then the query optimizer will choose “Merge (For medium data in tables) / Hash (For Large data in tables) / Nested loop (For small & Large Vs small tables)” internally.

Simply explain by table as follows

Table size of First Dataset
Table size of
 Second Dataset
With index (both)
Without index (Both)
Either of table has index
Large Data
Large Data
Merge
Hash
Hash
Medium Data
Medium Data
Merge
Hash
Merge
Small Data
Small Data
Nested Loop
Hash
Nested Loop
Large Data
Small Data
Nested Loop
Hash
Nested Loop
Large Data
Medium Data
Nested Loop
Hash
Nested Loop



No comments:

Post a Comment