SQL server allows us to create multiple indexes for each table. Some of us misuse this option by creating lots of duplicate and redundant indexes.
Some of which are not even used down the line or are used only one time at the end of the month.
In this article, we will go through a simple benchmark conclusions to understand the true impact of having multiple defined indexes on a table.
Setting up the benchmark had the following conditions:
1. I have created a temporary table with 5 columns from all types.
2. I have, for the most part, used a single identity column as the clustered primary key as in most SQL Server systems.
3. I have inserted 300,000 records upon each added \ subtracted index using single inserts. Bulk inserts work differently and will be described in future articles.
4. I have played with the size of the columns and the types.
This is the general template that you can also use to test it by yourself. if is not null) drop table #temporary
create table #temporary (col1 int not null identity(1,1), col2 nvarchar(500), col3 int, col4 nvarchar(500), col5 datetime not null default(getdate()))
create index ix1 on #temporary(col1,col2)
create index ix2 on #temporary(col2,col1)
create index ix3 on #temporary(col1,col3,col4)
create index ix4 on #temporary(col4,col1,col3)
create index ix5 on #temporary(col4,col1,col3,col5)
…
declare @i int =0
while (@i < 300000)
begin
insert into #temporary values ('43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb43sadfjkh2304ujasbfldkb'
,@i,'asdfjldhsadfjklhsafjdkhaskfhsdlfkjsdh',getdate())
set @i=@i+1
end
The results are:Let us make reach conclusions from this test:
1. Test #1 is where I used a heap (non clustered index table). It was the fastest being on 12 seconds for the inserts. This can be explained because there was no sorting done at all. So the engine simply found the next available space and allocated it for each insert.
2. Test #2 – Adding a simple int identity clustered. This test ran for 14 seconds, being 16% slower than the heap. This is normal and expected as now the rows that are being inserted are sorted and there is a more complex clustered structure to be maintained.
3. Test #3 – This is where I added a single 2 column index. We can see that 3 seconds were added to the overall performance due to another memory structure needed to be updated and sorted.
4. Test #4 – this is where the magic begins. I changed the amount of columns and the basic "index row size" by adding 3 more columns with more information.
What we can see here is that test #3 and test #4 both took 17 seconds to run.
This teaches us something very important – that the amount of columns and the AVG size of the columns are not what effects the speed of insert \ update \ delete statement. It is the amount of indexes.
Meaning that it costs the engine more to maintain an additional index than it costs it to have a "wider" index with more columns.
We can understand that the overall activity of page header maintenance, free row location and all the work around costs significantly more than the actual data IO operation.
This is due to the fact that once the row is located for the index it is usually written sequentially and in the same page. So there will hardly be any difference between having an index on 2 columns and 10 columns.
When you create your index plan you can design it to have more covering indexes and wider indexes containing more columns that can assist more queries.
5. Test #5 – We added another index of approximately the same row length. We jumped to 39 seconds. Almost double the previous 17 seconds. This is 325% worse than the heap solution and 280% worse than having just a primary key.
6. Test #6 – I have add an index with the same columns as in test #5 to show that the time doesn't change.
create index ix1 on #temporary(col1,col2)
create index ix2 on #temporary(col2,col1)
create index ix3 on #temporary(col1,col3,col4)
create index ix4 on #temporary(col4,col1,col3)
create index ix5 on #temporary(col4,col1,col3,col5)
create index ix6 on #temporary(col4,col5,col1,col3)
The reason for this is that ix5 and ix6 are very similar. Same columns in different order. The engine knows how to use this already sorted result in memory and optimize the insert statements.
7. Test #7-#11. You will notice there is a "gap" between the performance up to 2 indexes, than between 2-6 indexes and then a jump above 7 indexes.
Performance is drastically worsened when we have more than 7 indexes defined on the table. The overhead is just too big to handle and the workers that need to report to the main thread take longer.
Conclusions:
WHAT IS THE MEANING OF SQL SERVER BI (BUSINESS INTELLIGENCE)?
במציאות הישראלית, שבה בכל רגע אנחנו יכולים להיות נתונים להתקפת טילים ואף, חס וחלילה, מלחמה, נושא ההמשכיות העסקית הופך חשוב ביותר.