
SQL Server Problem Tombola: Too Many Indexes

When auditing SQL Server databases, one of the more common problems I see is that people add too many indexes to their tables. This has a significant performance impact on your SQL Server. First of all, your data modifications will be slower, because every update/delete/insert has to maintain all the indexes added to that specific table. But it also has an impact on your read performance. As we all know, SQL Server is pretty fast when it comes to reading data out of memory. However, when you have a lot of indexes, the data for those indexes also needs to be stored in memory, which increases the amount of memory needed to keep the full table in memory. If the data has to be fetched from disk, we will see a performance impact. So while indexes are created to increase read performance, you might actually hurt your read performance by adding indexes while your system is under memory pressure.

To show you what the actual effect of having too many indexes on your table is, I have created a few tests.

  1. We are going to test how fast we can insert/update/delete data while having a different number of indexes on our table (1 index, 2 indexes, 4 indexes).
    The indexes will be the biggest kind, AKA covering indexes.
  2. We are going to test how much extra memory we use for each index we add.

 

Inserts, Updates and Deletes

I have created a script which is going to insert 100 000 rows into a table

CREATE TABLE IndexMania
(
    IndexManiaID int identity(1,1),
    IndexColumn1 datetime,
    IndexColumn2 char(1000),
    IndexColumn3 char(500),
    IndexColumn4 decimal(18,2),
    IndexColumn5 int
)
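The insert script itself is not listed in the post; a minimal sketch of what a 100 000 row insert could look like is shown below. The GETDATE()/REPLICATE/CHECKSUM expressions are just placeholders to generate rows of roughly the right size, not the exact script used for the tests.

-- Sketch: insert 100 000 rows of roughly 1,5 KB each (placeholder values)
INSERT INTO dbo.IndexMania (IndexColumn1, IndexColumn2, IndexColumn3, IndexColumn4, IndexColumn5)
SELECT GETDATE(),
       REPLICATE('A', 1000),
       REPLICATE('B', 500),
       CAST(RAND(CHECKSUM(NEWID())) * 1000 AS decimal(18,2)),
       ABS(CHECKSUM(NEWID())) % 100000
FROM (SELECT TOP (100000) 1 AS n
      FROM sys.all_columns AS c1
      CROSS JOIN sys.all_columns AS c2) AS numbers;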

You can see that I added a char(1000) & char(500) column to make sure I get a good amount of data to be inserted. If my data inserts are too small, SQL Server will make use of the lazy writer and not show any writes with the insert statement. By increasing the size of one row and importing 100 000 rows I bypass this issue. I am also clearing the SQL Server memory before every statement.
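Clearing the memory and capturing the timings can be done along these lines; this is a sketch for a test instance only, since DBCC DROPCLEANBUFFERS flushes the entire buffer pool.

-- Flush dirty pages to disk, then drop all clean pages from the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Capture CPU/elapsed time and I/O for the statement under test
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- ... INSERT / UPDATE / DELETE statement under test goes here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;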

The indexes I created are the following

-- CLUSTERED INDEX ON ID
CREATE UNIQUE CLUSTERED INDEX [CI_IndexMania1] ON [dbo].[IndexMania]
(
    [IndexManiaID] ASC
)

-- CREATE COVERING DATE INDEX
CREATE NONCLUSTERED INDEX [NCI_1] ON [dbo].[IndexMania]
(
    [IndexColumn1] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn2],
    [IndexColumn3],
    [IndexColumn4],
    [IndexColumn5])
GO

-- 2 Additional NCI
CREATE NONCLUSTERED INDEX [NCI_2] ON [dbo].[IndexMania]
(
    [IndexColumn2] ASC,
    [IndexColumn3] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn1],
    [IndexColumn4],
    [IndexColumn5])

CREATE NONCLUSTERED INDEX [NCI_3] ON [dbo].[IndexMania]
(
    [IndexColumn4] ASC,
    [IndexColumn3] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn1],
    [IndexColumn2],
    [IndexColumn5])

 

The results

HEAP TABLE

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 20020

Memory Usage

Object     | Type       | Index          | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE |                | HEAP       | 20001        | 156

Time statistics

INSERT

SQL Server Execution Times:
CPU time = 688 ms, elapsed time = 1299 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 19 ms.

UPDATE

SQL Server Execution Times:
CPU time = 407 ms, elapsed time = 623 ms.
SQL Server parse and compile time:
CPU time = 15 ms, elapsed time = 20 ms.

DELETE

SQL Server Execution Times:
CPU time = 468 ms, elapsed time = 1129 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index Table

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 20040

Memory Usage

Object     | Type       | Index          | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED  | 20082        | 156

Time statistics

INSERT

SQL Server Execution Times:
CPU time = 1171 ms, elapsed time = 2745 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 25 ms.

UPDATE

SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 550 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 20 ms.

DELETE

SQL Server Execution Times:
CPU time = 734 ms, elapsed time = 3201 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index & 1 Covering Index

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 40694

Memory Usage

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20131        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
TOTAL      |            |                |              | 40214        | 313

Time statistics

INSERT

SQL Server Execution Times:
CPU time = 1953 ms, elapsed time = 4575 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 32 ms.

UPDATE

SQL Server Execution Times:
CPU time = 2375 ms, elapsed time = 5780 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 23 ms.

DELETE

SQL Server Execution Times:
CPU time = 1718 ms, elapsed time = 7761 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index & 3 Covering Indexes

Amount of writes

Insert Writes | Update Writes | Delete Writes
124154        | 182542        | 125677

Memory Usage

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_2          | NONCLUSTERED | 40275        | 314
IndexMania | USER_TABLE | NCI_3          | NONCLUSTERED | 23331        | 182
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20145        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
TOTAL      |            |                |              | 103834       | 809

Time statistics

INSERT

SQL Server Execution Times:
CPU time = 42468 ms, elapsed time = 47094 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 21 ms.

UPDATE

SQL Server Execution Times:
CPU time = 108063 ms, elapsed time = 125766 ms.
SQL Server parse and compile time:
CPU time = 10 ms, elapsed time = 10 ms.

DELETE

SQL Server Execution Times:
CPU time = 54922 ms, elapsed time = 73162 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.

Conclusion

As seen in our tests, we can conclude that each added index has an impact on our write performance and on the amount of memory needed to keep that index in cache.

Amount of writes

When inserting data into a heap table we needed about 20020 writes to insert 100 000 records, and there was almost no difference in the amount of writes when writing to a clustered index. When adding a covering index, the amount of writes doubled, which is normal because we now keep the data twice. When adding another two covering indexes we are doing about 4 times the original amount of writes. There are some discrepancies in the amount of writes, but this is because some of the queries had to spill to TEMPDB, which added extra writes as the number of indexes increased. But after multiple tests we can see that we need about 4 times the amount of writes purely for the indexes (DELETE, UPDATE & INSERT).
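If you want to verify these write counts on your own system, the total_logical_writes figures can be pulled from the plan cache; a sketch of such a query (filter on your own statement text):

-- Logical writes per cached statement (sketch)
SELECT TOP (10)
       qs.total_logical_writes,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS statement_start
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%IndexMania%'
ORDER BY qs.total_logical_writes DESC;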

Time consumed

When inserting data into a heap table we saw an average of about 1 second, and when adding a clustered index this increased to an average of 2-3 seconds. When adding 1 covering index this increased to 4-6 seconds, and with two additional covering indexes the time exploded to a whopping 45 to 50 seconds to insert the same workload.

Memory Used by object

After inserting 100 000 rows into the heap table we saw that we use 156MB of memory to keep this table in cache; when adding the clustered index this amount stays exactly the same. When adding 1 covering index this doubled, which means we are now keeping 2 copies of the data in memory. When adding the 2 additional covering indexes we see that we are keeping the same data 4 times in memory.
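The buffer_pages and buffer_mb numbers in the tables above can be reproduced with a query against sys.dm_os_buffer_descriptors; a sketch of such a query is shown below (the exact query used for the tests may differ, and LOB pages are not covered by this simple join):

-- Buffer pool usage per object/index (sketch)
SELECT o.name      AS [Object],
       o.type_desc AS [Type],
       i.name      AS [Index],
       i.type_desc AS [Index_Type],
       COUNT(*)       AS buffer_pages,
       COUNT(*) / 128 AS buffer_mb   -- 128 pages of 8 KB per MB
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p ON au.container_id = p.hobt_id
JOIN sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.objects AS o ON p.object_id = o.object_id
WHERE bd.database_id = DB_ID()
  AND o.is_ms_shipped = 0
GROUP BY o.name, o.type_desc, i.name, i.type_desc
ORDER BY buffer_pages DESC;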

General Conclusion

We should take care when adding additional indexes: this does not only have an impact on our insert/update/delete performance, it also impacts the amount of memory used for that specific table. We want to be able to serve all requests from cache, which takes nanoseconds instead of the milliseconds needed when retrieving data from disk, so we have to keep in mind that every index we add takes additional space in the cache, potentially pushing other objects out of the cache and slowing them down.

In today's example we were using covering indexes, which effectively means duplicating your data. Keep in mind, however, that this post is not a plea to stop using indexes, not even covering indexes. Indexes will help your performance tremendously if used in the correct way. As a general recommendation I would say: try to keep your indexes as small as possible and only include the fields you need, to reduce the overhead of your index.
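As a starting point for such a clean-up, a sketch of a query that compares how often an index is read versus how often it has to be maintained is shown below. Keep in mind that sys.dm_db_index_usage_stats only reflects usage since the last restart of the instance, so interpret the numbers with care.

-- Nonclustered indexes that are written to a lot but rarely read (review candidates)
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name                    AS index_name,
       us.user_seeks + us.user_scans + us.user_lookups AS reads,
       us.user_updates           AS writes
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i ON us.object_id = i.object_id AND us.index_id = i.index_id
WHERE us.database_id = DB_ID()
  AND i.type_desc = 'NONCLUSTERED'
ORDER BY us.user_updates DESC, reads ASC;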

Thank you for reading and stay tuned!

SQL Server Problem Tombola: MAXDOP 1

As you all may know, many vendors force you to set the MAXDOP of your SQL Server to 1 (SAP, SharePoint, AX). This is by design for these applications, because another MAXDOP setting might interfere with the otherwise "performant" workload of the application. Today I would like to show you what the impact of setting your MAXDOP to 1 on your server actually is.

To start off, I would like to give you a small introduction to what MAXDOP actually does. The MAXDOP setting limits the number of cores used by one query on your SQL Server instance (on SQL Server 2016 you can do this at database level, thank god!). For example, if you set your MAXDOP to 2, the query gets the option to create a parallel plan that uses 2 cores to execute. The MAXDOP setting works together with another instance-level setting: Cost Threshold for Parallelism (default 5). The CTFP is the value at which your instance starts evaluating whether a parallel plan is possible when you have MAXDOP 2/4/8/…. For example, if your CTFP is set to 60 and your query plan only has a cost of 10, your query will not be evaluated for a parallel plan; if you then have another query with a cost of 80, the query optimizer will check whether it is possible to run this query in parallel (depending on your MAXDOP setting).
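For reference, both settings live at instance level under sp_configure, and from SQL Server 2016 onwards MAXDOP can also be set per database or per query. A minimal sketch is shown below; the values are only examples, not recommendations, and dbo.SomeTable is just a placeholder.

-- Instance level settings (example values)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
EXEC sp_configure 'cost threshold for parallelism', 6;
RECONFIGURE;

-- SQL Server 2016+: MAXDOP per database
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 2;

-- Override for a single query (dbo.SomeTable is a placeholder)
SELECT COUNT(*) FROM dbo.SomeTable OPTION (MAXDOP 1);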

To show you what the impact on your system is, I am going to run an OLTP workload and an OLAP workload against a virtual machine running on my laptop. The OLTP workload will do updates, inserts and deletes; the OLAP workload will run some read-intensive queries against the system. The OLTP workload consists of small queries with a low cost, while the OLAP workload has heavier queries and calculations which give them a higher cost.

The virtual machine will be configured with 4 cores

To run the workload I will be using SQLQueryStress, which can be found at https://github.com/ErikEJ/SqlQueryStress. It will use 1 thread and do 100 iterations for the two workloads.

I will be running all my tests against in-memory tables to reduce any disk overhead and purely look at the CPU usage.

I will be running 5 tests with the following setup.

  1. MAXDOP 1 OLAP & OLTP together
  2. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  3. MAXDOP 4 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  4. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together
  5. MAXDOP 4 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together

I have altered my CTFP setting to 6 because then only my OLAP queries will be eligible for a parallel plan since they have a plan cost of 7.

Here are the results

Workload | MAXDOP | COST THRESHOLD | CPU/ITERATION | SECONDS/ITERATION | ELAPSED TIME | Client Seconds | LogicalReads
OLAP     | 1      | NONE           | 0,6654        | 1,0881            | 22,305       | 0,2113         | 21,4595
OLTP     | 1      | NONE           | 0,2831        | 0,5072            | 27,17        | 0,2522         | NONE
OLAP     | 2      | 1              | 0,6595        | 0,4862            | 25,263       | 0,2301         | 19,4194
OLTP     | 2      | 1              | 0,2898        | 0,2495            | 19,1247      | 0,1479         | NONE
OLAP     | 4      | 1              | 0,5921        | 0,4178            | 17,8526      | 0,1658         | 25,3333
OLTP     | 4      | 1              | 0,1635        | 0,1648            | 13,1542      | 0,1046         | NONE
OLAP     | 2      | 6              | 0,4739        | 0,3581            | 17,9613      | 0,1618         | 31,7011
OLTP     | 2      | 6              | 0,1929        | 0,1869            | 12,7859      | 0,102          | NONE
OLAP     | 4      | 6              | 0,4576        | 0,3861            | 21,2969      | 0,1918         | 32,5532
OLTP     | 4      | 6              | 0,3335        | 0,3118            | 19,9687      | 0,174          | NONE

 

When looking at the full elapsed time, we can see that the following configurations were the fastest.

Shortest OLTP run: 12,7859 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest OLAP run: 17,8526 seconds (MAXDOP 4, COST THRESHOLD FOR PARALLELISM 1)
Shortest full run: 30,7472 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the CPU used for every iteration, we can see that the following configurations used the least amount of CPU.

Smallest OLTP CPU/iteration: 0,1929 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Smallest OLAP CPU/iteration: 0,4739 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Smallest full CPU/iteration: 0,6668 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the number of seconds used by the client for every iteration, the following tests took the least amount of time.

Shortest OLTP client seconds/iteration: 0,1020 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest OLAP client seconds/iteration: 0,1618 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest full client seconds/iteration: 0,2638 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the full elapsed time, the following configurations were the slowest.

Slowest OLTP run: 27,1700 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Slowest OLAP run: 25,2630 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 1)
Slowest full run: 49,4750 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

When looking at the CPU usage for every iteration, we can see that the following configurations used the most CPU.

Largest OLTP CPU/iteration: 0,3335 (MAXDOP 4, COST THRESHOLD FOR PARALLELISM 6)
Largest OLAP CPU/iteration: 0,6654 (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Largest full CPU/iteration: 0,9485 (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

When looking at the number of seconds used by the client for every iteration, we see that the following tests were the slowest.

Slowest OLTP client seconds/iteration: 0,2522 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Slowest OLAP client seconds/iteration: 0,2301 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 1)
Slowest full client seconds/iteration: 0,4635 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

Conclusion OLTP & OLAP

 

As we can see in the results, the MAXDOP 1 setting is overall the slowest in comparison to all other tests. This means that in most cases you will limit the throughput of your SQL Server if you set your MAXDOP to 1.

The fastest configuration for my laptop has proven to be MAXDOP 2 with COST THRESHOLD FOR PARALLELISM 6, for this workload on this hardware configuration. This is however not an exact science; it has to be tested for every system.

MAXDOP 1 was the slowest because we are limiting our read-intensive workload to one core for every query it executes. This increases the time for every read query, which is what made this run the slowest overall.

The fastest configuration is when we allow the read-intensive workload to use 2 cores and keep the write/update/delete workload on 1 core. Because we are running both workloads together with 1 thread each, we are letting our read-intensive workload use 2 cores while leaving the 2 other cores available for our OLTP workload. This gives us an ideal throughput for this workload on this hardware.

What about data warehouses?

Our data warehouse will primarily have a read-intensive, aka OLAP, workload, with one or more daily loads (INSERT/UPDATE/DELETE) happening. This means that we should get the most throughput on our read queries. I have been to some clients who have set their MAXDOP to 1 for their data warehouse; this will have a negative effect on the speed with which your queries execute. I will show you that now by running just the OLAP workload, without the OLTP workload interfering.

For this I will do 3 tests

  1. 100 Iterations 1 thread OLAP workload MAXDOP 1
  2. 100 Iterations 1 thread OLAP workload MAXDOP 2
  3. 100 Iterations 1 thread OLAP workload MAXDOP 4
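Instead of changing the instance-wide MAXDOP between runs, the degree of parallelism for these three tests can also be pinned per query with the MAXDOP hint; a minimal sketch (dbo.FactSales and its columns are placeholders, not the actual test workload):

-- Placeholder OLAP-style query; repeat with OPTION (MAXDOP 2) and OPTION (MAXDOP 4) for the other runs
SELECT ProductKey, SUM(SalesAmount) AS total_sales
FROM dbo.FactSales
GROUP BY ProductKey
OPTION (MAXDOP 1);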

These are the results on my laptop

Workload | MAXDOP | CPU/Iteration | Seconds/Iteration | Elapsed Time
OLAP     | 1      | 0,5297        | 0,5353            | 27,5624
OLAP     | 2      | 0,3703        | 0,2058            | 10,9843
OLAP     | 4      | 0,3117        | 0,1769            | 9,3593

 

As you can see in the table above, the fastest way of doing read queries is to give the workload as many cores as possible, to increase the amount of reads that can be done in the least amount of time.

MAXDOP 1 is the slowest run, with MAXDOP 2 being 17 seconds faster. There is not a lot of difference between the MAXDOP 4 and MAXDOP 2 runs because of hardware limitations.

Conclusion OLAP

Because of the read-intensive workload of a data warehouse, we should not set our MAXDOP to 1. As seen in the test above, the more cores you use, the more read throughput you get, which will make the queries in your DWH go faster. In these tests we were 60% faster when using multiple cores for our workload instead of one core.

What about a pure write workload?

When you are just writing data (UPDATE, INSERT, DELETE), I would also like to know what MAXDOP setting would be best. To test this, I have created 3 test scenarios.

  1. 100 Iterations 1 thread OLTP workload MAXDOP 1
  2. 100 Iterations 1 thread OLTP workload MAXDOP 2
  3. 100 Iterations 1 thread OLTP workload MAXDOP 4

These are the results on my laptop

MAXDOP | CPU SECONDS | ACTUAL SECONDS | ELAPSED TIME | CLIENT SECONDS
1      | 0,0013      | 0,0048         | 1,1576       | 0,0467
2      | 0,0037      | 0,0183         | 1,3125       | 0,0568
4      | 0,006       | 0,0305         | 1,2988       | 0,0579

 

Conclusion OLTP

When inserting/updating/deleting data we are fastest when running these queries in a serial way. When running these processes in parallel we increase the amount of CPU needed for the query, and this increases the elapsed time. When writing to in-memory tables, as we are doing in this test, we see that in terms of CPU time writing data is about 50% faster on MAXDOP 1. This means that going serial will be faster than going parallel for this kind of workload.

Overall Conclusion

We should use caution when choosing our MAXDOP setting; we can see that the two different workloads have different needs. The read-intensive workload will in most cases benefit from parallel execution, while the write-intensive workload will benefit from serial execution. Because most systems are not read-only or write-only, I generally think the way to go is to filter out the read-intensive, high-cost plans and allow only those plans to make use of parallel execution, thus limiting the number of queries executed in parallel without removing the option altogether. This way we do not hold back our OLTP queries, and we can still speed up the queries that really benefit from parallel execution. This means you should always use the MAXDOP setting in combination with your COST THRESHOLD FOR PARALLELISM setting. There is, however, no silver bullet for these settings; they will always need to be tested against the workload of your system.

Thank you for reading and stay tuned!
