
SQL Server Problem Tombola: Giant Reads

A SQL Server is only as fast as you tune your workload. One of the things I try to do is tune the workload, i.e. the queries executing on my SQL Server, in order to get a better performing server. I start by looking for queries that do too many reads to retrieve a small result set. A problem, however, is that when working with vendor applications you might not be able to tune the application's queries. This limits your options, but there are still a lot of things you can do to optimize your workload.

Today we will do some testing with a giant read query and tune it in different ways, using real life scenarios.

First of all, we are going to create a database with two tables in it. We will be executing the scripts on SQL Server 2016, so some syntax might have to be changed when working with lower versions.

DROP DATABASE IF EXISTS BigRead
GO

CREATE DATABASE BigRead
GO

USE BigRead
GO

CREATE TABLE dbo.BigReadTable_1
(
    BID int identity(-2147483648,1),
    TheDate datetime,
    TheCategory varchar(20),
    TheLink int
)
GO

CREATE TABLE dbo.BigReadTable_2
(
    BID_2 int identity(-2147483648,1),
    TheDate datetime,
    TheCategory varchar(20),
    TheValue decimal(18,0)
)
GO

Once we have these tables we will insert some data into them.

-- INSERT 1000 ROWS
SET NOCOUNT ON

insert into dbo.BigReadTable_2
select DATEADD(day, ABS(CHECKSUM(NEWID()) % 10), getdate()),
       'Category ' + cast(ABS(CHECKSUM(NEWID()) % 10) as varchar),
       cast((rand() * 100000) as decimal(18,0))
GO 1000

insert into dbo.BigReadTable_1
select DATEADD(day, ABS(CHECKSUM(NEWID()) % 10), getdate()),
       'Category ' + cast(ABS(CHECKSUM(NEWID()) % 10) as varchar),
       (select top 1 BID_2 from dbo.BigReadTable_2 order by newid())
GO 1000

insert into dbo.BigReadTable_1
select TheDate, TheCategory, TheLink from dbo.BigReadTable_1
GO 9

insert into dbo.BigReadTable_2
select [TheDate], [TheCategory], [TheValue] FROM dbo.BigReadTable_2
GO 9

After inserting the rows we will add a clustered index for each table

CREATE CLUSTERED INDEX CI_1 ON dbo.BIGREADTABLE_2
(
    BID_2
)
GO

CREATE CLUSTERED INDEX CI_2 ON dbo.BIGREADTABLE_1
(
    TheLink
)

Now we are all set up to start our testing. The query we are going to use is the following

select *
from dbo.BigReadTable_2 C
inner join
(
    select AVG(TheValue) Average, TheCategory
    from dbo.BigReadTable_2
    group by TheCategory
) B on C.TheCategory = B.TheCategory
inner join
(
    select SUM(TheValue) Summer, TheCategory
    from dbo.BigReadTable_2
    GROUP BY TheCategory
) A on A.TheCategory = B.TheCategory
inner join
(
    select CAST(AVG(CAST(TheDate AS FLOAT)) AS DATETIME) AVGDate, TheLink
    FROM dbo.BigReadTable_1
    Group By TheLink
) D on D.TheLink = C.BID_2
OPTION(RECOMPILE)

This is the plan we get with this query at this moment

And these are the client statistics
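The original screenshots of the plan and the client statistics are not reproduced here. If you want to capture comparable numbers yourself, SSMS's Include Client Statistics option works, or you can switch on the session statistics below; this is a generic sketch, not necessarily the setup used for the screenshots.

-- Optional: emit I/O and timing figures for each statement in the session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;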

Now what are the different levels of tuning possible for this query?

  1. We are able to edit code
  2. We are not able to edit code
  3. We are not able to edit code & cannot add indexes

We are able to edit code

As we can see, this query is not the most optimal in filtering out data, so in this case our tuning might be quite simple. If you can agree with your business users to return only the fields they will actually use, you will already see a performance increase. Let's say for this example we do not need all the columns of BigReadTable_2 itself.

-- LEAVE OUT SOME FIELDS
select B.*, A.*, D.*, C.TheCategory
from dbo.BigReadTable_2 C
inner join
(
    select AVG(TheValue) Average, TheCategory
    from dbo.BigReadTable_2
    group by TheCategory
) B on C.TheCategory = B.TheCategory
inner join
(
    select SUM(TheValue) Summer, TheCategory
    from dbo.BigReadTable_2
    GROUP BY TheCategory
) A on A.TheCategory = B.TheCategory
inner join
(
    select CAST(AVG(CAST(TheDate AS FLOAT)) AS DATETIME) AVGDate, TheLink
    FROM dbo.BigReadTable_1
    Group By TheLink
) D on D.TheLink = C.BID_2
OPTION(RECOMPILE)

Here are the results for the client statistics (the plan will stay the same).

We see that we reduced the resources used compared to selecting all columns: without adding additional indexes or filtering data we have cut the number of bytes received by 25%. This shows how important it is to select only the fields you need instead of selecting everything.

There are many different ways of altering the code, but we will not cover all of these.

We are not able to edit the code

Now things are getting a little trickier: you no longer have control over your query. Good news though, we can still add indexes. So what can we do for our query?

In SQL Server 2016 I'm using a cool new trick which can be used even in Express Edition. The trick is adding an empty non-clustered columnstore index to the table to benefit from batch mode (for more information check out Niko's blog at http://www.nikoport.com/, or this post which explains the trick more thoroughly: http://sqlmag.com/sql-server/what-you-need-know-about-batch-mode-window-aggregate-operator-sql-server-2016-part-1).

Let's add 2 non-clustered columnstore indexes.

-- No code edit allowed: add empty nonclustered columnstore indexes
DROP INDEX IF EXISTS NCCI_PERF ON dbo.BigReadTable_2
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_PERF ON dbo.BigReadTable_2
(
    BID_2, TheDate, TheCategory, TheValue
)
-- contradictory filter keeps the index empty on purpose
WHERE BID_2 = 1 and BID_2 = 2
GO

DROP INDEX IF EXISTS NCCI_PERF2 ON dbo.BigReadTable_1
GO

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_PERF2 ON dbo.BigReadTable_1
(
    BID, TheDate, TheCategory, TheLink
)
WHERE BID = 1 and BID = 2
GO

Here is what we get as a result for our client statistics

We get the same plan, but we are using batch mode which means processing 1000 rows at a time instead of 1 row at a time.

By adding these 2 empty indexes we are reducing our execution time by 50%! Great result I feel for adding something empty!

No Code Edit and No Additional Indexes allowed

This is just annoying: almost all tools have been taken from you! But you can still tune the query! These queries will mostly be among the highest-cost queries on your server, so by lowering your cost threshold for parallelism you might just force them into parallel execution, speeding them up!

Let’s check the cost of the plan:

The plan has a cost of 15, so let’s lower the value for my cost threshold for parallelism to 14 (current value = 50) and see what result we get.
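For reference, this is roughly how the setting can be changed at instance level (the value 14 simply comes from the plan cost above; adjust it to your own workload and test carefully, since this affects every query on the instance):

-- Lower the cost threshold for parallelism (sketch; test before using in production)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 14;
RECONFIGURE;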

As you can see, we lowered the total execution time of our query by 30-40% by altering a server setting! If we look at the plan,

you can see that this specific query benefits from the parallel execution! This, however, can also have some downsides!

Conclusion

When dealing with big read queries you have a few tricks up your sleeve when working with SQL Server. In most cases you have 3 different scenarios:
1. You can change the code
2. You can’t change the code but can add indexes
3. You can’t change the code and can’t add indexes

For the first scenario things are "easy": you have multiple ways of tuning your query, and only one was shown here. For the second scenario things get more complicated: you might be able to add an index to gain a seek, but a silver bullet for big reads does not exist. The columnstore trick does help in most cases, but you will have to evaluate this for yourself. The last scenario is the trickiest: you can alter your server settings, but this might come at a price. In the example given today the parallel plan was beneficial for this query, but it might slow down other processes or even overload your processor, slowing everything down.

Thank you for reading & stay tuned!

 

 


SQL Server Problem Tombola: Too Many Indexes

When auditing SQL Server databases, one of the more common problems I see is that people add too many indexes to their tables. This has a significant performance impact on your SQL Server: first of all, your data inserts will be slower, because for every update/delete/insert all indexes added to that specific table have to be maintained. But it also has an impact on your read performance. As we all know, SQL Server is pretty fast when it comes to reading data out of memory; however, when you have a lot of indexes, the data for those indexes also needs to be stored in memory, which increases the amount of memory needed to hold the full table in memory. If the data has to be fetched from disk, we will see a performance impact. So while indexes are created to increase read performance, they might actually hurt read performance when added to a system that is under memory pressure.

To show you what the actual effects of having too many indexes on your table is I have created a few tests.

  1. We are going to test how fast we can insert/update/delete data with different numbers of indexes on the table (1 index, 2 indexes, 4 indexes)
    The indexes will be the biggest kind, i.e. covering indexes
  2. We are going to test how much memory we are using extra for each index we add.

 

Inserts, Updates and Deletes

I have created a script which is going to insert 100 000 rows into a table

CREATE TABLE IndexMania
(
    IndexManiaID int identity(1,1),
    IndexColumn1 datetime,
    IndexColumn2 char(1000),
    IndexColumn3 char(500),
    IndexColumn4 decimal(18,2),
    IndexColumn5 int
)

You can see that I added a char(1000) & char(500) field to make sure I get a good amount of data to insert. If my data inserts are too small, SQL Server will make use of the lazy writer and not show any writes for the insert statement. By increasing the size of one row & importing 100 000 rows I bypass this issue. I am also clearing the SQL Server buffer cache before every statement.
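The exact insert and cache-clearing script isn't included in the post; a minimal sketch of what such a test harness could look like is shown below. The column values and batch count are assumptions, and DBCC DROPCLEANBUFFERS should only ever be run on a test system.

-- Clear the buffer cache between tests (test systems only!)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

-- Hypothetical insert of 100 000 rows, one row per batch iteration
SET NOCOUNT ON
INSERT INTO dbo.IndexMania (IndexColumn1, IndexColumn2, IndexColumn3, IndexColumn4, IndexColumn5)
VALUES (GETDATE(),
        REPLICATE('A', 1000),
        REPLICATE('B', 500),
        CAST(RAND() * 10000 AS decimal(18, 2)),
        ABS(CHECKSUM(NEWID())) % 1000);
GO 100000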

The indexes I created are the following

-- CLUSTERED INDEX ON ID
CREATE UNIQUE CLUSTERED INDEX [CI_IndexMania1] ON [dbo].[IndexMania]
(
    [IndexManiaID] ASC
)

-- CREATE COVERING DATE INDEX
CREATE NONCLUSTERED INDEX [NCI_1] ON [dbo].[IndexMania]
(
    [IndexColumn1] ASC
)
INCLUDE ([IndexManiaID],
    [IndexColumn2],
    [IndexColumn3],
    [IndexColumn4],
    [IndexColumn5])
GO

-- 2 Additional NCI
CREATE NONCLUSTERED INDEX [NCI_2] ON [dbo].[IndexMania]
(
    [IndexColumn2] ASC,
    [IndexColumn3] ASC
)
INCLUDE ([IndexManiaID],
    [IndexColumn1],
    [IndexColumn4],
    [IndexColumn5])

CREATE NONCLUSTERED INDEX [NCI_3] ON [dbo].[IndexMania]
(
    [IndexColumn4] ASC,
    [IndexColumn3] ASC
)
INCLUDE ([IndexManiaID],
    [IndexColumn1],
    [IndexColumn2],
    [IndexColumn5])

 

The results

HEAP TABLE

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 20020

Memory Usage

Object     | Type       | Index | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE |       | HEAP       | 20001        | 156

Time statistics

INSERT

SQL Server Execution Times:

CPU time = 688 ms, elapsed time = 1299 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 19 ms.

UPDATE

SQL Server Execution Times:

CPU time = 407 ms, elapsed time = 623 ms.

SQL Server parse and compile time:

CPU time = 15 ms, elapsed time = 20 ms.
DELETE

SQL Server Execution Times:

CPU time = 468 ms, elapsed time = 1129 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index Table

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 20040

Memory Usage

Object     | Type       | Index          | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED  | 20082        | 156

INSERT

SQL Server Execution Times:

CPU time = 1171 ms, elapsed time = 2745 ms.

SQL Server parse and compile time:

CPU time = 16 ms, elapsed time = 25 ms.
UPDATE

SQL Server Execution Times:

CPU time = 297 ms, elapsed time = 550 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 20 ms.
DELETE

SQL Server Execution Times:

CPU time = 734 ms, elapsed time = 3201 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index & 1 Covering Index

Amount of writes (about the same for delete, insert and update)

total_logical_writes: 40694

Memory Usage

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20131        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
TOTAL      |            |                |              | 40214        | 313

INSERT

SQL Server Execution Times:

CPU time = 1953 ms, elapsed time = 4575 ms.

SQL Server parse and compile time:

CPU time = 16 ms, elapsed time = 32 ms.
UPDATE

SQL Server Execution Times:

CPU time = 2375 ms, elapsed time = 5780 ms.

SQL Server parse and compile time:

CPU time = 16 ms, elapsed time = 23 ms.
DELETE

SQL Server Execution Times:

CPU time = 1718 ms, elapsed time = 7761 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index & 3 Covering Indexes

Amount of writes

Insert Writes | Update Writes | Delete Writes
124154        | 182542        | 125677

Memory Usage

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_2          | NONCLUSTERED | 40275        | 314
IndexMania | USER_TABLE | NCI_3          | NONCLUSTERED | 23331        | 182
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20145        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
Total      |            |                |              | 103834       | 809

INSERT

SQL Server Execution Times:

CPU time = 42468 ms, elapsed time = 47094 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 21 ms.

UPDATE

SQL Server Execution Times:

CPU time = 108063 ms, elapsed time = 125766 ms.

SQL Server parse and compile time:

CPU time = 10 ms, elapsed time = 10 ms.

DELETE

SQL Server Execution Times:

CPU time = 54922 ms, elapsed time = 73162 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Conclusion

As seen in our test, we can conclude that each added index has an impact on our write performance & on the amount of data stored in memory to facilitate the index.

Amount of writes

When inserting data into a heap table we needed about 20020 writes to insert 100 000 records, and there was almost no difference in the number of writes when writing to a clustered index. When adding a covering index the number of writes doubled, which is normal because we are now keeping the data twice. When adding another two covering indexes we are doing about 4 times the original number of writes. There are some discrepancies in the write counts, but this is because some of the queries had to spill to tempdb, which produced a larger number of writes as the number of indexes increased. But after multiple tests we can see that we need about 4 times the number of writes purely for the indexes (DELETE, UPDATE & INSERT).
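The post doesn't show how the write counts were captured; one common way to get figures like total_logical_writes is from the plan-cache statistics DMV, roughly as follows (a sketch, not necessarily the author's method):

-- Top statements by logical writes currently in the plan cache
SELECT TOP (10)
       qs.total_logical_writes,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_writes DESC;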

Time consumed

When inserting data we have seen that inserting into a heap table takes on average about 1 second; when adding a clustered index this increases to an average of 2-3 seconds. When adding 1 covering index this increases to 4-6 seconds, and with two additional covering indexes the time explodes to a whopping 45-50 seconds for the same workload.

Memory Used by object

After inserting 100 000 rows into the heap table we saw that we use 156MB of memory to facilitate this table; when adding the clustered index this amount stays exactly the same. When adding 1 covering index this doubles, which means we are now keeping 2 copies of the data in memory. When adding the 2 additional covering indexes we see that we are keeping the same data 4 times in memory.
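The buffer_pages and buffer_mb figures in the tables above can be approximated with a query along these lines against sys.dm_os_buffer_descriptors (a sketch; the author's exact query isn't shown, and LOB allocation units are ignored here):

-- Approximate buffer pool usage per index for one table (8 KB pages, 128 pages per MB)
SELECT  o.name         AS [Object],
        o.type_desc    AS [Type],
        i.name         AS [Index],
        i.type_desc    AS [Index_Type],
        COUNT(*)       AS buffer_pages,
        COUNT(*) / 128 AS buffer_mb
FROM    sys.dm_os_buffer_descriptors AS bd
JOIN    sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN    sys.partitions       AS p  ON p.hobt_id = au.container_id
JOIN    sys.indexes          AS i  ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN    sys.objects          AS o  ON o.object_id = p.object_id
WHERE   bd.database_id = DB_ID()
  AND   o.object_id    = OBJECT_ID('dbo.IndexMania')
GROUP BY o.name, o.type_desc, i.name, i.type_desc
ORDER BY buffer_pages DESC;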

General Conclusion

We should take care when adding additional indexes: this does not only have an impact on our insert/update/delete performance, it also impacts the amount of memory used for that specific table. We want to be able to serve our reads from cache, which takes nanoseconds instead of the milliseconds needed to retrieve data from disk, so keep in mind that every index you add takes additional space inside your cache, potentially pushing other objects out of the cache & slowing them down.

In today's example we were using covering indexes, which effectively means duplicating the data. Keep in mind, however, that this post is not a plea to stop using indexes, not even covering indexes. Indexes will help your performance tremendously if used in the correct way. As a general recommendation, try to keep your indexes as small as possible & only include the fields you need, to reduce the overhead of your index.

Thank you for reading and stay tuned!

 

 

 


SQL Server Problem Tombola: MAXDOP 1

As you all may know, many vendors force you to set the MAXDOP of your SQL Server to 1 (SAP, SharePoint, AX). This is by design for these applications, because another MAXDOP setting might interfere with the otherwise "performant" workload of the application. Today I would like to show you what the impact of setting your MAXDOP to 1 on your server actually is.

To start off I would like to give you a small introduction to what MAXDOP actually does. The MAXDOP setting limits the number of cores used by one query in your SQL Server instance (on SQL Server 2016 you can do this at database level, thank god!). For example, if you set your MAXDOP to 2, the query gets the option to create a parallel plan using 2 cores. The MAXDOP setting works together with another instance-level setting: cost threshold for parallelism (default 5). The CTFP is the value above which your instance starts evaluating the possibility of using a parallel plan, provided you have MAXDOP 2/4/8/.... For example, if your CTFP is set to 60 and your query plan only has a cost of 10, your query will not be evaluated to run in parallel; if, however, another query has a cost of 80, the query optimizer will check whether it is possible to run it in parallel (depending on your MAXDOP setting).
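For reference, these are roughly the statements involved (the values below are only placeholders, and the database-scoped option requires SQL Server 2016 or later):

-- Instance level: max degree of parallelism and cost threshold for parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;

-- SQL Server 2016+: MAXDOP per database
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 2;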

To show you what the impact on your system is, I am going to run an OLTP workload & an OLAP workload on a virtual machine running on my laptop. The OLTP workload will do updates, inserts and deletes; the OLAP workload will run some read-intensive queries against the system. The OLTP workload consists of small queries with a low cost, while the OLAP queries involve more data and calculations, which gives them a higher cost.

The virtual machine will be configured with 4 cores

To run the workload I will be using SQLQueryStress, which can be found at https://github.com/ErikEJ/SqlQueryStress. It will use 1 thread and do 100 iterations for each of the two workloads.

I will be running all my tests against in-memory tables to reduce any disk overhead and look purely at the CPU usage.
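The post doesn't show the test tables; for completeness, a memory-optimized table along these lines could be used (the schema is entirely assumed, and the database needs a MEMORY_OPTIMIZED_DATA filegroup first):

-- Hypothetical in-memory test table
CREATE TABLE dbo.WorkloadTest
(
    WorkloadTestID int IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    SomeDate  datetime2     NOT NULL,
    SomeValue decimal(18,2) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);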

I will be running 5 tests with following setup.

  1. MAXDOP 1 OLAP & OLTP together
  2. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  3. MAXDOP 4 COST THRESHOLD FOR PARALLELISM 1 OLAP & OLTP together
  4. MAXDOP 2 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together
  5. MAXDOP 4 COST THRESHOLD FOR PARALLELISM 6 OLAP & OLTP together

I have altered my CTFP setting to 6 because then only my OLAP queries will be eligible for a parallel plan since they have a plan cost of 7.

Here are the results

Workload | MAXDOP | COST THRESHOLD | CPU/ITERATION | SECONDS/ITERATION | ELAPSED TIME | Client Seconds | LogicalReads
OLAP     | 1      | NONE           | 0,6654        | 1,0881            | 22,305       | 0,2113         | 21,4595
OLTP     | 1      | NONE           | 0,2831        | 0,5072            | 27,17        | 0,2522         | NONE
OLAP     | 2      | 1              | 0,6595        | 0,4862            | 25,263       | 0,2301         | 19,4194
OLTP     | 2      | 1              | 0,2898        | 0,2495            | 19,1247      | 0,1479         | NONE
OLAP     | 4      | 1              | 0,5921        | 0,4178            | 17,8526      | 0,1658         | 25,3333
OLTP     | 4      | 1              | 0,1635        | 0,1648            | 13,1542      | 0,1046         | NONE
OLAP     | 2      | 6              | 0,4739        | 0,3581            | 17,9613      | 0,1618         | 31,7011
OLTP     | 2      | 6              | 0,1929        | 0,1869            | 12,7859      | 0,102          | NONE
OLAP     | 4      | 6              | 0,4576        | 0,3861            | 21,2969      | 0,1918         | 32,5532
OLTP     | 4      | 6              | 0,3335        | 0,3118            | 19,9687      | 0,174          | NONE

 

When looking at the full elapsed time we can see that the following configurations were the fastest.

Shortest OLTP run: 12,7859 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest OLAP run: 17,8526 seconds (MAXDOP 4, COST THRESHOLD FOR PARALLELISM 1)
Shortest full run: 30,7472 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the CPU used for every iteration we can see that the following configurations used the least amount of CPU.

Smallest OLTP CPU/iteration: 0,1929 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Smallest OLAP CPU/iteration: 0,4739 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Smallest full CPU/iteration: 0,6668 (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the number of seconds used by the client for every iteration, the following tests took the least amount of time.

Shortest OLTP client/iteration: 0,1020 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest OLAP client/iteration: 0,1618 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)
Shortest full client/iteration: 0,2638 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6)

 

When looking at the full elapsed time, the following configurations were the slowest.

Slowest OLTP run: 27,1700 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Slowest OLAP run: 25,2630 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 1)
Slowest full run: 49,4750 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

When looking at the CPU usage for every iteration we can see that the following configurations used the most CPU.

Largest OLTP CPU/iteration: 0,3335 (MAXDOP 4, COST THRESHOLD FOR PARALLELISM 6)
Largest OLAP CPU/iteration: 0,6654 (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Largest full CPU/iteration: 0,9485 (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

When looking at the number of seconds used by the client for every iteration, the following tests were the slowest.

Slowest OLTP client/iteration: 0,2522 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)
Slowest OLAP client/iteration: 0,2301 seconds (MAXDOP 2, COST THRESHOLD FOR PARALLELISM 1)
Slowest full client/iteration: 0,4635 seconds (MAXDOP 1, COST THRESHOLD FOR PARALLELISM NONE)

 

Conclusion OLTP & OLAP

 

As we can see in the results, the MAXDOP 1 setting is overall the slowest in comparison to all other tests. This means that in most cases you will limit the throughput of your SQL Server if you set your MAXDOP to 1.

The fastest configuration for my laptop has been proven to be MAXDOP 2 with COST THRESHOLD FOR PARALLELISM 6 for this workload with this hardware configuration. This is however not an exact science; this has to be tested for every system.

MAXDOP 1 was the slowest because we are limiting our read-intensive workload to one core for every query it executes. This increases the time for every read query, which makes the whole workload the slowest.

The fastest configuration is when we allow the read-intensive workload to use 2 cores and limit the write/update/delete workload to 1 core. Because we are running both workloads together with 1 thread each, the read-intensive workload uses 2 cores and leaves the 2 other cores available for our OLTP workload. This means we get an ideal throughput for this workload on this hardware.

What with Data warehouses?

Our data warehouse will primarily have a read-intensive, aka OLAP, workload, with one or more daily loads (inserts/updates/deletes). This means we should aim for the most throughput on our read queries. I have been to some clients who have set their MAXDOP to 1 for their data warehouse; this has a negative effect on the speed with which their queries execute. I will show you that now by running just the OLAP workload, without the OLTP workload interfering.

For this I will do 3 tests

  1. 100 Iterations 1 thread OLAP workload MAXDOP 1
  2. 100 Iterations 1 thread OLAP workload MAXDOP 2
  3. 100 Iterations 1 thread OLAP workload MAXDOP 4

These are the results on my laptop

Workload | MAXDOP | CPU/Iteration | Seconds/Iteration | Elapsed Time
OLAP     | 1      | 0,5297        | 0,5353            | 27,5624
OLAP     | 2      | 0,3703        | 0,2058            | 10,9843
OLAP     | 4      | 0,3117        | 0,1769            | 9,3593

 

As you can see in the table above, the fastest way of doing read queries is to give the workload as many cores as possible, to increase the amount of reads that can be done in the least amount of time.

MAXDOP 1 is the slowest run, with MAXDOP 2 being 17 seconds faster. There is not a lot of difference between the MAXDOP 4 and MAXDOP 2 runs because of hardware limitations.

Conclusion OLAP

Because of the read-oriented workload of a data warehouse, we should not set our MAXDOP to 1. As seen in the test above, the more cores you have, the more read throughput you get, which will make the queries in your DWH go faster. In these tests we were 60% faster when using multiple cores for our workload instead of one core.

What with pure write workload?

When you are just writing data (UPDATE, INSERT, DELETE), I would also like to know which MAXDOP setting would be best. To test this I have created 3 test scenarios:

  1. 100 Iterations 1 thread OLTP workload MAXDOP 1
  2. 100 Iterations 1 thread OLTP workload MAXDOP 2
  3. 100 Iterations 1 thread OLTP workload MAXDOP 4

These are the results on my laptop

MAXDOP | CPU Seconds | Actual Seconds | Elapsed Time | Client Seconds
1      | 0,0013      | 0,0048         | 1,1576       | 0,0467
2      | 0,0037      | 0,0183         | 1,3125       | 0,0568
4      | 0,006       | 0,0305         | 1,2988       | 0,0579

 

Conclusion OLTP

When inserting/updating/deleting data, we are fastest when running these queries in a serial way. When running these processes in parallel we increase the amount of CPU needed for the query, and this increases the elapsed time. When writing to in-memory tables, as we are doing in this test, we see that in terms of CPU time writing data is about 50% faster with MAXDOP 1. This means that going serial will be faster than going parallel for this kind of workload.

Overall Conclusion

We should use caution when setting our MAXDOP: we can see that the two different workloads have different needs. The read-intensive workload will in most cases benefit from parallel execution, while the write-intensive workload will benefit from serial execution. Because most systems are not read- or write-only, I generally think the way to go is to filter out the read-intensive, high-cost plans and allow those to make use of parallel execution, thus limiting the number of queries executed in parallel without removing the option by default. This way we do not hold back our OLTP queries, and we can still speed up the queries that really benefit from parallel execution. This means you should always use the MAXDOP setting in combination with your COST THRESHOLD FOR PARALLELISM setting. There is, however, no silver bullet for these settings, and they will always need to be tested against the workload of your system.

Thank you for reading and stay tuned!


Migration Gone Wrong! The Fallback Scenario

Today I'll be handling an issue I had with the migration of a SQL Server 2000 database to SQL Server 2012. About a week ago we migrated the database away from the old and grumpy, and more importantly unsupported, Windows Server 2003 machine and its SQL Server 2005 instance. All was well, and we had no immediate issues when users started working with the application. The migration was seamless and we did not expect any future problems; the Upgrade Advisor did not give us any errors either!

But then lightning struck this morning: somebody who had been on holiday came back to work and started working with the application on the migrated database. Apparently this user was a bit different from the other users who tested: this guy used the application to its full capacity, unravelling a waterfall of syntax errors & DB errors. The impact was uncontrollable, so there was only one option: we had to fall back to the old & abandoned SQL Server 2000 instance.

Now here starts the tricky part: as users had been using the migrated database in production for several days, they had uploaded a lot of data which had to be present when we fell back to the SQL Server 2000 instance. But as you all know, once a database is on SQL Server 2012, even in a lower compatibility mode, it will not restore on a version lower than the running version! It is also not possible to detach and attach the database, as changes have happened at system table level. Due to these restrictions there is only one option left: scripting out the database and exporting the data to the old SQL Server 2000 database.

I will now continue on explaining how I executed the fall back scenario.

First we will generate the scripts off the database

We choose to script out the entire database & all database objects! Watch out for triggers: these are not scripted out, so you will have to add them manually later!
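One way to get a list of the trigger definitions that need to be re-created by hand is a query along these lines (a sketch, run against the source database):

-- List user trigger definitions so they can be scripted out manually
SELECT t.name                    AS trigger_name,
       OBJECT_NAME(t.parent_id)  AS table_name,
       m.definition
FROM sys.triggers AS t
JOIN sys.sql_modules AS m ON m.object_id = t.object_id
WHERE t.is_ms_shipped = 0;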

Click next to create the scripts and copy them to clipboard/file/new query window whatever has your preference

 

And click next, it will now script out your database.

 

Execute the script on the destination server; here it was a SQL Server 2005 instance. You might get some errors for adding members & altering the compatibility mode, depending on the SQL Server version you are working with.

You can alter the statements to statements that work on your version. Also you might want to set your database owner to sa at this point.

Now that we have our database on our SQL Server 2005 instance, we want to pump the data into it from the new "old" database. :)

Go to the database you want to export the data from and click export data.

Enter the information of your source database

Afterwards insert the information of the destination database

 

Then choose to copy data from table

In the next window select all your tables, and then check them table by table by clicking Edit Mappings.

 

Don't forget to turn on Enable Identity Insert for the tables!

Also if you have foreign keys, don’t forget to disable them when loading your data between the instances, you can easily do that with this command.

-- Disable all the constraints in the database
EXEC sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT all'

When you have done all the steps above you should be able to export your data to your SQL server 2000 database.

 

After you did the export don’t forget to re-enable your foreign keys

-- Enable all the constraints in the database
EXEC sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT all'

 

This way we more or less restored a SQL Server 2012 database on a SQL Server 2005 instance, so the fallback was successful! All data was present and the business was happy. Now we can start testing the application again for compatibility with SQL Server 2012, which will prove to be a long and tough process!

But the application is working again to its full extent; unfortunately it is on a SQL Server 2005 instance with a compatibility mode of SQL Server 2000 on Windows Server 2003 (unsupportedception :) ).

 

Hope you enjoyed reading this blog and stay tuned!


Openrowset across the CMS: The Proxy Way of Life

Today I came across an issue I was having with my CMS.
On my CMS server I have a database with a few tables holding some details about my servers' state (memory/disk space/...).
I wrote a few stored procs to gather this data with OPENROWSET, in combination with the server list that I get from the msdb on the Central Management Server.
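The author's stored procs aren't shown; as a rough, hedged sketch of the pattern described (server, column and table picks are assumptions, and OPENROWSET requires 'Ad Hoc Distributed Queries' to be enabled), collecting from one CMS-registered server could look something like this:

-- Take one server from the CMS registration view and query its state remotely
DECLARE @server sysname, @sql nvarchar(max);

SELECT TOP (1) @server = s.server_name
FROM msdb.dbo.sysmanagement_shared_registered_servers AS s;

SET @sql = N'SELECT r.* FROM OPENROWSET(''SQLNCLI'',
             ''Server=' + @server + N';Trusted_Connection=yes;'',
             ''SELECT @@SERVERNAME AS server_name, cpu_count FROM sys.dm_os_sys_info'') AS r;';

EXEC sys.sp_executesql @sql;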

When I executed this stored procedure under my Windows account in SSMS, nothing failed and I was able to get the data across my CMS. But when I tried to automate this, I came across an interesting issue which I would love to share with you guys.

The stored procs that I wrote gather server state information and feed it into tables of a database on my CMS, which in turn feeds some reports on a report server. Since I want these stored procs to run periodically, I decided to execute them using a SQL Server Agent job.

So I created the job and executed it; so far so good. No errors, but when I looked at the data in the tables I saw that the only data collected was the data of the CMS itself, and nothing was collected from the other servers. This is because the SQL Server Agent account could not connect to all the servers in the CMS.

How to solve this pickle? I decided to change my job step from a SQL statement (the execute of my stored proc) to a PowerShell statement which executes the stored procs. Using PowerShell I can use a proxy to execute these statements under an account with sufficient rights (VIEW SERVER STATE & rights to the databases you want to read from) on all my servers.

I will continue by explaining how I did this in a few screenshots.

Firstly I had to configure a proxy account.

Go to SQL Server agent and right click Proxies and select New Proxy


Enter your information and click OK to create the proxy.
When you have created the proxy go to your job step which needs to execute across your CMS

Pick your proxy under run as and then this job step will run under the account you defined under the proxy.

This solves the problem and executes the stored proc against your whole CMS, or against the subset of servers you selected from the msdb of your CMS.

NOTE: Don't forget to set your SqlCmd.CommandTimeout to the amount of time you need for your query to finish. If you do not do this, the job will be reported as successful but will only have collected the data of the servers it could reach within the current timeout of the query.

Thanks for reading!

Stay tuned!


Loading Data with Dwloader & SSIS for MS PDW

In this video post I will show you how fast you can load data into your PDW with dwloader & SSIS. This is a demo accompanying my previous posts on dwloader, and it also contains the demo for SSIS.


How to achieve Ultimate PDW load speed (DWLOADER PART 2)

In previous posts I talked about dwloader and the distribution & replication theory; today I'm going to explain how you can combine this knowledge to achieve the ultimate PDW load speed with dwloader.

There are 3 things that determine your load speed

  1. File size
  2. Destination Table Design
  3. Structure of the file

 

I will start off by explaining how file size affects your load speed. Say you have a fact table with about 1 000 000 000 rows: if you load this file straight into your PDW with dwloader, you will see that your load speed is not giving the results you would expect from the InfiniBand. How did I solve this issue? The solution for this decrease in load speed is quite straightforward: you use a file-split program to split your file into the desired number of rows; I tend to use files which contain 100 000 000 rows. You can achieve this with a tool such as GSplit. After splitting the files I loaded them in parallel using a batch script to max out the InfiniBand.

Now once you have split up your file, we can move on to the second thing that affects your load speed, namely your table design and more specifically your choice of distribution key. If you want more information on how to select a good distribution key, I suggest you read my previous post (https://swyssql.wordpress.com/2014/01/21/pdw-distribution-replication/). If you have followed the four steps for selecting your distribution key, your load speed will not be affected by your table design; but when you notice that your load speed is a lot slower than expected, it might be worth looking again at the key you chose.

The last and final part that defines your load speed is the structure of your file. As you know, PDW achieves incredible results when it comes to processing unstructured data; however, when the data you load into your PDW is structured, it might hurt your load speed. The thing you have to avoid is choosing a distribution key on which your source file is already ordered. And this brings me to the most important conclusion I made while tuning dwloader: sometimes it is better to load your data into your PDW using a staging table with a different distribution key than your ideal one, and afterwards use the CTAS magic to transform the table into one with the best-performing distribution key for your queries (the ELT principle, https://swyssql.wordpress.com/2013/12/13/basic-introduction-to-ms-pdw/).
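As a rough illustration of that staging-then-CTAS pattern (table and column names are invented; CTAS is PDW's CREATE TABLE AS SELECT syntax):

-- Load lands in a staging table distributed on a key that is not ordered in the source file,
-- then CTAS redistributes the data on the ideal query key.
CREATE TABLE dbo.FactSales_Final
WITH
(
    DISTRIBUTION = HASH (CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT SaleID, CustomerKey, SaleDate, Amount
FROM dbo.FactSales_Staging;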

If you keep these 3 hints in mind, you should normally achieve your ultimate dwloader speed (1.3 GBps is what we achieved so far :) ).

In future posts I will be talking about the Polybase feature from PDW and how you set up your very own Hadoop cluster to combine Big Data using External table commands.

Stay tuned!


PDW Distribution & Replication

Today I want to talk a bit more about distribution and replication and how they are used in a PDW. I'm going to start by answering two questions.

What is Distribution?

Because MS PDW is an MPP system (https://swyssql.wordpress.com/2013/12/13/basic-introduction-to-ms-pdw/), it uses different nodes to ensure fast querying! With the distribution theory we take our large tables and distribute them over the different nodes, thus avoiding the I/O, Ethernet and disk bottlenecks.

In a basic DWH you will choose your fact table as your distributed table; in the case of a quarter-stack PDW this means that your fact table gets distributed over your 2 data nodes using the distribution key, while the control node contains metadata about the distribution of the fact table. There are cases where you will replicate your fact table, but then your fact table will be very small.

What is Replication?

With the replication theory we replicate a certain table to all your nodes to make your queries return results faster. In most data warehouses these will be your dimension tables, which are small compared to the fact table. Because these tables only need a tiny amount of space per node, it is better for your query/JOIN performance to replicate them to all your nodes.

Now that we know what distribution & replication are, I'm going to continue by explaining how this affects your table design in PDW. The first thing you will have to do is identify which tables are replicated or distributed; this analysis can be done quite quickly by checking the size of these tables in your original system.
Once you have decided which tables are distributed & replicated, you can start on your table design.

Replicated table

I will start off with giving you an example create table statement for a replicated table.

CREATE TABLE DIM_FUNCTION
(
    [FUNCTION_ID] [int] NOT NULL,
    [FUNCTION_NAME] [varchar](30) NOT NULL,
    [FUNCTION_NAME_SHORT] [char](5) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);

The first part of your CREATE TABLE statement is exactly the same as in a normal DWH, but the WITH section is something new. To define this table as a replicated table you put DISTRIBUTION = REPLICATE in your table definition, and PDW will automatically replicate this table on the different nodes of your PDW.
You can choose to add the CLUSTERED COLUMNSTORE INDEX part of the statement; in my experience PDW queries work better with a clustered columnstore index. This is because the data is now stored column per column (versus row by row) and compressed in an efficient way, so if your query only selects a few columns, e.g. SELECT LASTNAME, FIRSTNAME, you will not read the job name or function or .... This means you read less from disk, which makes your queries go a lot faster!

That's all there is to creating a replicated table; I will now continue with the distributed table, which is a bit more complex.

Distributed table

Again I will start by giving you an example statement for a distributed table.

CREATE TABLE [FACT]
(
    [file_id] [int] NOT NULL,
    [Function] [int] NOT NULL,
    [Place] [int] NOT NULL
)
WITH
(
    DISTRIBUTION = HASH (file_id),
    CLUSTERED COLUMNSTORE INDEX
);

As said in the replicated table part, the first part of your statement is exactly the same as on a normal SQL Server, but the WITH part is new! You see that this time in the DISTRIBUTION clause we are hashing a key: this is your distribution key, and it is the most important thing you will configure in your database; the right choice of key will result in lightning-fast querying!

How do you choose your Distribution Key?

When I choose my distribution key I analyze all my columns by asking 4 questions.

  1. Is this column being used in a lot of my queries?
    1. If your column is used in many joins, it might be a good candidate for the distribution key. The most intensive process in your PDW while querying is the data movement shuffle (recreating a distributed table on a single node to make a join compatible), aka DMS, which will significantly slow down your queries. If you choose a column which is used in a lot of queries, this will avoid incompatible joins and ensure excellent query performance.
  2. Does this column have a high cardinality?
    1. This question is really about how to avoid skew, which appears when you distribute your table on the wrong column. If the column contains a lot of NULL values & does not have many distinct values, your system will not be able to distribute your data evenly over the different nodes. This results in one node holding a lot of data while the other node holds just a small amount. When this happens you are effectively using only one node of your MPP system, and I don't think I have to explain why that is bad for your query performance. (A quick way to check for skew is shown in the sketch after this list.)
  3. Is this column compatible with my aggregates?
    1. If your column is compatible with your aggregates, this will improve your query performance.
  4. Will this column ensure a nice balance between workload & query behaviour per node?
    1. This is what you ask yourself after the 3 previous questions; then you can determine the best distribution key for your DWH. I tend to use a point system to compare the different columns, so I can easily choose the most logical distribution key.
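To check for skew on a distributed table, PDW's DBCC PDW_SHOWSPACEUSED can be used; it reports rows and space per distribution, so a heavily uneven row count points at a poor key (shown here against the example FACT table above):

-- Rows and space used per distribution for the example fact table
DBCC PDW_SHOWSPACEUSED ("dbo.FACT");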

In future posts I will be explaining the relation between your table design and your load speed (DWLOADER PART 2), and how you will tune queries until you have maximum performance.

Stay tuned!


Load your data into your PDW (part 1 the DWLOADER)

Today I will enlighten you about the incredibly fast load times you will get with a Microsoft Parallel Data Warehouse!

The first thing you have to do when you work with an MS PDW is load your data into the PDW. This initial load will be the most "time consuming", so with this post I hope to help you avoid common mistakes which I made myself with the dwloader statement.

The first way to insert data is by developing SSIS packages and inserting the data into the PDW using the package. I will not go deeper into this subject, since I prefer working with the second way.

The second way is an exclusive feature of PDW: dwloader.exe. This command-line loading tool ensures the fastest loading times over the InfiniBand connection. There are many parameters to configure dwloader, and in this post I will explain how changing them will make your load a success and affect your load speed.

I will start off by giving you guys a sample dwloader statement; from this statement I'll explain the different values you can configure to increase your loading speed, and the key parameters of dwloader (without these parameters your load will fail!).

  •  dwloader.exe -M fastappend -m -b 2000000 -i FILEPATH\FILE -T TABLE -R FILEPATH\FILENAME.bad -t \t -r \r\n -dt FILEPATH\dateformatDR.txt -E -S 172.16.254.5 -U BatchUser -P B@tch1234 > "FILEPATH\DWLOADLOGvx.LOG"

  1. dwloader.exe: this is used to call the dwloader program.
  2. -M fastappend: using fastappend will increase your load rate significantly! Other options are append, upsert & reload.
  3. -b: defines after how many rows the load is committed into the columnstore table. It is better to split this up using a custom batch size to avoid the tuple mover (the tuple mover stores data in a rowstore first and dumps it into the columnstore afterwards). So it might be interesting to tweak your batch size so that you don't trigger this rowstore and insert directly into the columnstore.
  4. -i: the path of the file you want to load into your PDW (best on a location connected to the InfiniBand, to avoid the bottleneck of your Ethernet connection).
  5. -T: the destination table created in your PDW into which dwloader will insert the data.
  6. -R: the destination path where the log file is kept if something goes wrong with your load (values that can't be loaded, e.g. a wrong date format).
  7. -t: the field delimiter used in the file (in the sample statement above this is \t, which means it's a tab-delimited file).
  8. -r: defines the end of your row; in the example this is \r\n, i.e. a line break.
  9. -D: if you don't have multiple date formats in your file, you can define the default format for your date fields with the -D parameter, e.g. -D "DD-mm-YYYY".
  10. -dt: if you have multiple date formats in your file, you will have to add the format for every date field in a txt file in which you put the name of the field followed by the format it uses,
    e.g. Birthdate= yyyy-MM-dd
    Startdate= yyyy-MM-dd HH:mm:ss.fff
  11. -S: defines which connection will be used to transfer the data from your flat file to your PDW; be sure to check the IPs of your Ethernet & InfiniBand connections. Using the IP address of your InfiniBand will ensure much faster load speeds.
  12. -E: if you have empty values in your flat file, you will have to use -E, which converts these empty values to NULL. Without this parameter your load will fail!
  13. -U & -P: define your username and password for your PDW.
  14. For additional logging, redirect the output with >; this way you can monitor your load speed a lot better and spot problems a lot faster!

This was an example of a sample dwloader statement; however, this statement alone does not guarantee the best loading speed. That is determined by the combination of your dwloader configuration, flat file layout and table design. In my next post I will go deeper into table design in PDW, explaining the distribution & replication theory and how it affects your load speed.
