SQL statements can be used to retrieve data from the any database. To get same results we need to write different SQL queries. For better performance we need to use best, faster and efficient queries. So we need SQL query tuning based on the business and user requirements.
Microsoft SQL Server Performs different query tuning operations using in-memory sort & join technology. A join operator is a type of an algorithm which the SQL Server chooses in order to implement logical joins between two sets of data.
It depends on different scenarios based on the
Requested query (It performs what type of operation like intersect, union ... etc)
Available indexes (Cluster and Non Cluster indexes in columns of table)
Statistics information (Sorted order on the join column)
Number of estimated rows in each data set.
There are Three major types of join operations in SQL Server:
• Nested Loops (loop) – If one dataset is small(fewer than 10 rows) and the other dataset is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons.
• Merge Join – If the two datasets are not small but are stored on their join column, a merge join is the fastest join operation.
• Hash Match – If both datasets are large, unsorted, nonindexed inputs. This join allows reduction in the use of denormalization. (Denormalization means achieve better performance by reducing join operations)
Briefly examples the above joins operations as follows in different cases
Example - 1:
Let’s create two tables with large data
Case2: With Index
Let’s create a clustered index
Note: If either of the table has indexed it goes hash join.
Example - 2:
Let’s create two tables with medium data
Case1: Without Index
Case2: With Index
Let’s create a clustered index
Note: If either of the table has indexed it goes hash join.
Example - 3:
Let’s create two tables with small amount of data means 10 lines only.
Case2: With Index
Let’s create a clustered index
Note: If either of the table has indexed it goes Nested loop join.
From the above diagram:
• If both the tables have No index then the query optimizer will choose “Hash joins” internally.
• If both the tables have indexes then the query optimizer will choose “Merge (For Large Data in tables) / Nested loop (For small data in tables)” internally.
• If either of the tables have indexes then the query optimizer will choose “Merge (For medium data in tables) / Hash (For Large data in tables) / Nested loop (For small & Large Vs small tables)” internally.
Simply explain by table as follows
Microsoft SQL Server Performs different query tuning operations using in-memory sort & join technology. A join operator is a type of an algorithm which the SQL Server chooses in order to implement logical joins between two sets of data.
It depends on different scenarios based on the
Requested query (It performs what type of operation like intersect, union ... etc)
Available indexes (Cluster and Non Cluster indexes in columns of table)
Statistics information (Sorted order on the join column)
Number of estimated rows in each data set.
There are Three major types of join operations in SQL Server:
• Nested Loops (loop) – If one dataset is small(fewer than 10 rows) and the other dataset is fairly large and indexed on its join columns, an index nested loops join is the fastest join operation because they require the least I/O and the fewest comparisons.
• Merge Join – If the two datasets are not small but are stored on their join column, a merge join is the fastest join operation.
• Hash Match – If both datasets are large, unsorted, nonindexed inputs. This join allows reduction in the use of denormalization. (Denormalization means achieve better performance by reducing join operations)
Briefly examples the above joins operations as follows in different cases
Example - 1:
Let’s create two tables with large data
create table
Dataset1 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<200)
begin
insert into
Dataset1 (name)
select name from
master.dbo.spt_values
set @i=@i+1
end
--select COUNT(*) from dbo.Dataset1 --501600
go
create table
Dataset2 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<200)
begin
insert into
Dataset2 (name)
select name from
master.dbo.spt_values
set @i=@i+1
end
-- select COUNT(*) from dbo.Dataset2 –501600
select *
from dbo.Dataset1
a join dbo.Dataset2
b on (a.id=b.id)
Case1: Without Index
Case2: With Index
Let’s create a clustered index
create unique
clustered index
cx_Dataset1 on Dataset1
(id)
create unique
clustered index
cx_Dataset2 on Dataset2
(id)
Note: If either of the table has indexed it goes hash join.
Example - 2:
Let’s create two tables with medium data
create table
Dataset3 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<1)
begin
insert into
Dataset3 (name)
select name from
master.dbo.spt_values
set @i=@i+1
end
--select COUNT(*) from dbo.Dataset3 --2508
go
create table
Dataset4 (id int identity ,name varchar(50))
declare @i int
set @i=0
while (@i<1)
begin
insert into
Dataset4 (name)
select name from
master.dbo.spt_values
set @i=@i+1
end
--
select COUNT(*) from dbo.Dataset4 –2508
select * from dbo.Dataset3 c join dbo.Dataset2 d on (c.id=d.id)Case1: Without Index
Case2: With Index
Let’s create a clustered index
create unique
clustered index
cx_Dataset3 on Dataset3
(id)
create unique
clustered index
cx_Dataset4 on Dataset4
(id)
Note: If either of the table has indexed it goes hash join.
Example - 3:
Let’s create two tables with small amount of data means 10 lines only.
create table
Dataset5 (id int identity,name varchar(50))
insert into
Dataset5 (name)
select top
10 name from master.dbo.spt_values
-- select COUNT(*) from dbo.Dataset5 --10
create table
Dataset6 (id int identity,name varchar(50))
insert into
Dataset6 (name)
select top
10 name from master.dbo.spt_values
--
select COUNT(*) from dbo.Dataset6 –10
select * from dbo.Dataset5 e join dbo.Dataset6 f on (e.id=f.id)
Case1: Without Index
Case2: With Index
Let’s create a clustered index
create unique
clustered index
cx_Dataset5 on Dataset5
(id)
create unique
clustered index
cx_Dataset6 on Dataset6
(id)
Note: If either of the table has indexed it goes Nested loop join.
From the above diagram:
• If both the tables have No index then the query optimizer will choose “Hash joins” internally.
• If both the tables have indexes then the query optimizer will choose “Merge (For Large Data in tables) / Nested loop (For small data in tables)” internally.
• If either of the tables have indexes then the query optimizer will choose “Merge (For medium data in tables) / Hash (For Large data in tables) / Nested loop (For small & Large Vs small tables)” internally.
Simply explain by table as follows
Table size of First Dataset
|
Table size of
Second
Dataset
|
With index (both)
|
Without index (Both)
|
Either of table has index
|
Large Data
|
Large Data
|
Merge
|
Hash
|
Hash
|
Medium Data
|
Medium Data
|
Merge
|
Hash
|
Merge
|
Small Data
|
Small Data
|
Nested Loop
|
Hash
|
Nested Loop
|
Large Data
|
Small Data
|
Nested Loop
|
Hash
|
Nested Loop
|
Large Data
|
Medium Data
|
Nested Loop
|
Hash
|
Nested Loop
|