SQL Server Agent for Azure SQL Database, Azure Elastic Database Pools & Azure Managed Instance

As you all know, SQL Server Agent is not available in Azure SQL Database & Elastic Database Pools. When Microsoft introduced Azure Managed Instance, the agent was included in the Managed Instance.
Most vendors I work with, when deploying their database, also request a few jobs to be created which execute various tasks at specific moments during the day. Today I will show you what options you have for those tasks with Azure SQL Database, Azure Elastic Database Pools & Azure Managed Instance.

The main objective I would like to achieve is to execute the following query at a regular interval of 15 minutes.

declare @orderid int = (select top 1 OrderID from dbo.Sales where payed = 0);
update dbo.Sales
set payed = 1
where OrderID = @orderid;

Scenario 1: Azure SQL Database & Elastic Database Pools

As you know, SQL Server Agent is not available in Azure SQL Database & Elastic Database Pools, so we will have to use other components in the Azure stack to get this to work.

The option I chose and will show you is executing this command using Azure Data Factory.

I start by creating an Azure Data Factory in my Azure portal. This is quite straightforward; you can find a tutorial at https://docs.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-portal

The thing I will need to create is called a pipeline in ADF, in the GUI on the Azure portal you can click on create pipeline.

After clicking this we get the following screen

When looking at the possible activities, we don’t see a SQLCMD, but under general we can find stored procedure.

As we are SQL people, turning this SQL command into a Stored Procedure is not the hardest thing to do.

CREATE PROCEDURE dbo.UpdateSales
AS
BEGIN
declare @orderid int = (select top 1 OrderID from dbo.Sales where payed = 0);
update dbo.Sales
set payed = 1
where OrderID = @orderid;
END

With the stored procedure created, we can now start configuring the pipeline & stored procedure component.

We start by naming the pipeline something more useful than pipeline1; let's call it Update Sales.

We then drag the Stored Procedure into the pipeline to the right

We then give the stored procedure component a name, let’s call it exec UpdateSales, and then go to the SQL Account tab

We click on the +New button and get the following screen.

We fill in the name, select the type, our Azure SQL Database & the server name where the database is hosted.

We choose the database name & SQL Authentication and enter the SQL user which is going to execute the stored procedure. We test and click Finish.

After creating the Linked Service, we go to the tab of Stored Procedure and select the stored procedure we want to execute, which is dbo.UpdateSales in our case.

After this our Pipeline is ready, we press validate & publish all.

Now we need to schedule this task to run every 15 minutes; in ADF this can be done using a trigger. We open up the pipeline and click on Trigger.

We choose New/Edit and click on New in the following screen.

We fill in the name & the start date in UTC, set the recurrence to minutes and choose every 15 minutes. We want this to run indefinitely, so we choose no end date.

We click next to continue

We then click finish & publish the changes

We then wait for the trigger to fire on its schedule & check the Monitor tab on the left; as we can see, the stored procedure executed successfully and ran in 8 seconds.

So it is possible to create recurring, agent-like tasks using Azure Data Factory.

But when presenting the solution to our vendor, he says: we will deploy multiple new jobs every month, we only know how to deploy them to a SQL Server Agent, and we do not want to do double coding just because you want to use Azure SQL Databases. Is there another option where I do not have to change a thing & can still be in the cloud? Yes, there is: Azure Managed Instance.

Scenario 2: Azure Managed Instance

I start off by connecting to my Azure Managed instance

As you can see we have an agent available here, so we can just create the job with the SQL Command in there

We then schedule the job to run every 15 minutes & this all works just like an on-premises SQL Server.
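The post creates the job through the SSMS GUI; for reference, a comparable job can also be scripted with the msdb procedures. This is a minimal sketch (the job, step and database names are my own placeholders, not taken from the post):

USE msdb;
GO
-- Create the job, a T-SQL step that runs the procedure, and a 15-minute schedule
EXEC dbo.sp_add_job @job_name = N'Update Sales';

EXEC dbo.sp_add_jobstep
    @job_name = N'Update Sales',
    @step_name = N'exec UpdateSales',
    @subsystem = N'TSQL',
    @database_name = N'SalesDb',          -- placeholder database name
    @command = N'EXEC dbo.UpdateSales;';

EXEC dbo.sp_add_jobschedule
    @job_name = N'Update Sales',
    @name = N'Every 15 minutes',
    @freq_type = 4,                       -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,                -- units of minutes
    @freq_subday_interval = 15;

EXEC dbo.sp_add_jobserver @job_name = N'Update Sales';
GO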

This means our vendor does not have to change a thing, which means the vendor is now happy & comfortable to move to the Cloud!

Conclusion

We can set up a sort of agent using Azure Data Factory for Azure SQL Databases & Azure Elastic Database Pools. We can do this by using a pipeline & a trigger with the Stored Procedure component. This is a workaround, but it works fine; however, it will require some redevelopment from the vendor.
Using managed instances, we do not have to change anything in the vendor's code and are able to run the jobs on the Managed Instance in the cloud.

The next post will be on SSIS packages and how to handle them with Azure SQL Databases, Azure Elastic Database Pools & Azure Managed Instances.

Stay Tuned!


Cross Database Queries in Azure SQL Database, Elastic Database Pool & Managed Instance

When suggesting to move third-party applications to the cloud, we generally get a pessimistic response from the vendor. The vendors I work with usually have a lot of old legacy code in their databases which is holding back the move to PaaS services. One of those pitfalls for a successful move to Azure is cross database queries.

Today I will give you a scenario of a vendor who has 2 databases which are interconnected through cross database queries. For the sake of the demos I have created 2 Azure SQL Databases, 2 databases in an Azure Elastic Database Pool & 2 databases on an Azure Managed Instance.

The query I would like to be able to execute is the following

select
*
from ssd.Orders o
inner join RealtimeAnalyticsCrossQuery.dbo.Customer c on o.CustomerPersonID = c.CustomerID

Let's see what our options are to get the same results as the on-premises solution.

Scenario 1: Azure SQL Database & Elastic Database Pools

When I try to execute the query on an Azure SQL Database server which has the following databases in it

select
*
from ssd.Orders o
inner join RealtimeAnalyticsCrossQuery.dbo.Customer c on o.CustomerPersonID = c.CustomerID

I get the following error

Msg 40515, Level 15, State 1, Line 19
Reference to database and/or server name in ‘RealtimeAnalyticsCrossQuery.dbo.Customer’ is not supported in this version of SQL Server.

The message is quite straightforward; we cannot use cross database queries even if they are on the same server. It is not possible to get around this, so to get these queries to work we will have to rewrite the query.

First thing I tried was using openrowset.

SELECT
*
FROM ssd.Orders o
inner join OPENROWSET('SQLNCLI11',
'SERVER=stijnpassdemo.database.windows.net;DATABASE=RealtimeAnalyticsCrossQuery;Uid=SQLStijn;Pwd="*******";',
'select * from dbo.Customer') c
on o.CustomerPersonID = c.CustomerID

When executing this query, I get a different error

Msg 40526, Level 16, State 1, Line 5
‘OPENROWSET’ rowset provider not supported in this version of SQL Server.

Message is straightforward, not supported!

Another option is to try and create a synonym for the table in the other database.

CREATE SYNONYM CrossQueryCustomer
FOR RealtimeAnalyticsCrossQuery.dbo.Customer

Then we get the following error

Msg 40515, Level 15, State 1, Line 19
Reference to database and/or server name in ‘RealtimeAnalyticsCrossQuery.dbo.Customer’ is not supported in this version of SQL Server.

When looking in the documentation and as the error clearly says: it is not supported (More Information: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-synonym-transact-sql?view=sql-server-2017)

Windows Azure SQL Database supports the three-part name format database_name.[schema_name].object_name when the database_name is the current database or the database_name is tempdb and the object_name starts with #.

To get this to work we should use an external data source, let’s try to set it up and see if it works

To create an external data source, we must start off with creating a database master key.

CREATE MASTER KEY -- password is optional on Azure SQL Database

When using Azure SQL Databases, we do not have to give a password. (More information: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-master-key-transact-sql?view=sql-server-2017)

After creating the database master key, we will need a database scoped credential which will be used to create the external data source. (More information: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql?view=sql-server-2017)

CREATE DATABASE SCOPED CREDENTIAL [SQLStijn]
WITH IDENTITY = 'SQLStijn',
SECRET = '*********'

After creating this database scoped credential, we are able to create the external data source. (More information: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql?view=sql-server-2017)

CREATE EXTERNAL DATA SOURCE dsCrossQuery
WITH (TYPE = RDBMS,
      LOCATION = 'stijnpassdemo.database.windows.net',
      DATABASE_NAME = 'RealtimeAnalyticsCrossQuery',
      CREDENTIAL = SQLStijn
     )

After creating the external data source, we are able to create an external table, which can then be queried by aliasing it in the query I wanted to execute. I chose to create a new schema to differentiate external tables from tables in my current Azure database. (More Information: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017 )

CREATE SCHEMA [CROSSQ]
GO
CREATE EXTERNAL TABLE [CROSSQ].[Customer](
[CustomerID] [int] NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[MiddleName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NOT NULL
)
WITH
(
    DATA_SOURCE = dsCrossQuery,
    SCHEMA_NAME = 'dbo',
    OBJECT_NAME = 'Customer'
);

After creating the external table, we can alter the syntax of our query which we want to execute and see if we can now query data from the other SQL database.

select
    *
from ssd.Orders o
inner join CROSSQ.Customer c on o.CustomerPersonID = c.CustomerID

When we execute this query, we get our results and are able to do a cross database query using the external table.

This solution also works for Elastic Database Pools.

However, our third-party application vendor is not pleased: he will have to rework all his code to get his database to run on Azure SQL Databases. This will take a few years to develop, which means we will have to keep the on-premises servers for those applications until the code changes, if the code change ever comes. Is there another option? Yes, there is: Azure Managed Instance.

Scenario 2: Azure Managed Instance

I've set up two databases on my Azure Managed Instance: RealtimeAnalytics & RealtimeAnalyticsCrossQuery.

Let’s see if I can execute the query.

select
*
from ssd.Orders o
inner join RealtimeAnalyticsCrossQuery.dbo.Customer c on o.CustomerPersonID = c.CustomerID

As this is a full instance in the cloud, cross database queries are supported, so the results are there!!!

Awesome! This means we can easily move those legacy cross database query applications to the Azure Managed Instance without having them change the code. This greatly increases the chances of moving databases to the cloud.

In the next blogpost, I will be handling SQL Server agent jobs & how to deal with them on Azure SQL Databases & Managed Instances.

Stay Tuned


Speaking Engagements: Past & Upcoming

The last few weeks have been quite hectic in terms of speaking engagements; here is a recap of those that have already passed and a preview of the upcoming ones.

On the 16th of September I went to Toulouse for the SQL Saturday there; the event is organized by Jean-Pierre Riehl and the French user group GUSS. Toulouse is a nice city in the south of France, with lots of historical buildings and nice restaurants. Normally it should be warm there at that time of the year; we, however, had a lot of rain that weekend. But as this was a SQL Saturday, it was the perfect weather to be inside. The event itself was held in a building for startups in the center of Toulouse called La Cantine, with 3 different rooms. Because this is an event for the French-speaking community there was a mixture of French sessions & English sessions; since my French was/is not that good, I gave my session on Real-time Analytics in English. Apart from giving my own session I saw a very interesting session from Wolfgang Strasser on scale-out deployment for SSIS. It was a real pleasure being in Toulouse for the SQL Saturday; big thanks to Jean-Pierre for having me and I hope to be back again next year. (Event Details: http://www.sqlsaturday.com/658/eventhome.aspx )

The next weekend it was time for SQL Saturday Gothenburg: on the 23rd of September Swedish data enthusiasts gathered in the Lindholmen Science Park. This was my second year speaking at SQL Saturday Gothenburg, and I was very happy to be back again this year. Mikael Wedham organizes this event every year together with his family, and they do a very good job! The city of Gothenburg is a very nice town with a lot of stores, which is why my wife decided to join me this year! I delivered a session called SQL Server Problem Tombola, which is a session with a bingo machine containing ten common & uncommon problems with SQL Server. Every attendee writes down three numbers between 1 and 10; the first one to cross off their three numbers shouts bingo and wins a box of Belgian chocolates. After delivering my own session, I went and saw Alexander Klein's session on cognitive services, which was very nice to see and showed me the incredible possibilities of chatbots & facial APIs. In the afternoon I was baffled by Rob Sewell's session on Pester; this is a must-see session if you are managing a lot of SQL Servers. (Event Details: http://www.sqlsaturday.com/657/eventhome.aspx )

The week after Gothenburg I went to an event called SQLRelay; this is an actual road trip through the UK, doing a conference in 5 cities on five separate days. The events took place in Reading, Nottingham, Leeds, Birmingham & Bristol. I joined the tour this year from Tuesday until Friday, delivering my session on real-time analytics using filtered columnstore indexes. It is amazing how the people involved set up an entire event every day in an hour's time in the morning. Alex Whittles, Alex Yates, Rob Sewell, Paul Andrew and many more are doing the amazing yet crazy job of organizing and orchestrating this event, while also presenting themselves! During these 5 events they reach about 1200+ SQL Server professionals in the UK, with various national and international speakers. I saw a lot of sessions during the week apart from giving my own, and they were all of excellent quality! If you are based in the UK and have never been to the SQL Relay event, I would highly recommend it! As a speaker it is an exhausting week, but ever so satisfying. You get to meet an amazing bunch of people & get to join the SQLTour van. If you ever wanted to feel like a rock star, going from city to city and presenting, this is the event to submit to!
(Event Details : https://www.sqlrelay.co.uk/ )

It was an intense month, but the people I’ve met & the things I’ve learned are priceless.

Now what are the upcoming speaking engagements?

First off, I will be speaking at PASS Summit this year! This is the biggest event in the Data Platform world, with 4000+ attendees every year. The event is held in Seattle, a big city in Washington State in the United States of America. This is my first time going to the PASS Summit; I was going to attend this year anyway, but thought let's submit and see what happens. I was very surprised to have been selected to speak at an event next to heavyweight speakers like Brent Ozar, Glenn Berry, Itzik Ben-Gan, Thomas Larock and many more. I am very excited to go there for the first time, and equally nervous to speak at such an event! If you are interested in following my session you can find me on Friday, the 3rd of November, in room Tahoma 2. More details on http://www.pass.org/summit/2017/Sessions/Details.aspx?sid=65654

On the 9th of December I am going back to SQL Saturday Slovenia! This event is organized by Dejan Sarka, Matija Lah, Mladen Prajdic & Tomaž Kastrun. It is usually the last SQL Saturday of the year, but these guys definitely make it worth your while. I will be delivering a new session there on how to use columnstore indexes in a data warehouse environment (more details: http://www.sqlsaturday.com/687/Sessions/Details.aspx?sid=67830). The sessions selected this year are amazing and I'm finding it hard to figure out what I want to see, because I would like to see them all. Two sessions I will definitely attend are Crazy Data Science by Enrico & Tomaz and A Game of Hierarchies: Introduction to Graph Databases by Markus, because I want to get to know more about graph, and the title Crazy Data Science just instantly captivated me! If you would like to join us this year, don't forget to register fast, because the event tends to sell out! (Event Details: http://www.sqlsaturday.com/687/eventhome.aspx)

The last planned event is happening between the 12th and the 14th of February in NEW ZEALAND! The event is called Difinity and is organized by Reza Rad & Leila Etaati; it is a Data Platform conference with about 50 sessions presented in Auckland. I will be presenting two sessions at the conference: first my Real-Time Analytics session, and secondly my session on how to use columnstore indexes in a data warehouse environment. There are a lot of great sessions there, delivered by high-quality speakers like Pinal Dave, Denny Cherry, Reza Rad and many more. I am really looking forward to presenting at the conference and to visiting New Zealand, an amazing country, for the first time. If you would like to register for the event you can find the details on http://difinity.co.nz/ . Hope to see you there!

That is what is planned for the next months, but I hope to be able to speak at more events in 2018!

Thank you for reading & Stay Tuned

SQL Server Problem Tombola: Giant Reads

A SQL Server is only as fast as you tune your workload. One of the things I try to do is tune the workload, i.e. the queries executing on my SQL Server, in order to get a better performing server. I start by looking for problems with queries that do too many reads to retrieve a small result set. A problem, however, is that when working with vendor applications you might not be able to tune the application's queries. This limits your options, but there are still a lot of things you can do to optimize your workload.

Today we will do some testing with a giant read query and tune it in different ways using real-life scenarios.

First of all, we are going to create a database with two tables in it. We will be executing the scripts on SQL Server 2016, so some syntax might have to be changed when working with lower versions.

DROP DATABASE IF EXISTS BigRead
GO

CREATE DATABASE BigRead
GO

USE BigRead
GO

CREATE TABLE dbo.BigReadTable_1
(
    BID int identity (-2147483648, 1),
    TheDate datetime,
    TheCategory varchar(20),
    TheLink int
)
GO

CREATE TABLE dbo.BigReadTable_2
(
    BID_2 int identity(-2147483648, 1),
    TheDate datetime,
    TheCategory varchar(20),
    TheValue decimal(18,0)
)
GO

Once we have these tables we will insert some data into them.

-- INSERT 1000 ROWS
SET NOCOUNT ON

insert into dbo.BigReadTable_2
select
    DATEADD(day, ABS(Checksum(NewID()) % 10), getdate()),
    'Category ' + cast(ABS(Checksum(NewID()) % 10) as varchar),
    cast((rand() * 100000) as decimal(18,0))
GO 1000

insert into dbo.BigReadTable_1
select
    DATEADD(day, ABS(Checksum(NewID()) % 10), getdate()),
    'Category ' + cast(ABS(Checksum(NewID()) % 10) as varchar),
    (select top 1 BID_2 from dbo.BigReadTable_2 order by newid())
GO 1000

insert into dbo.BigReadTable_1
select TheDate, TheCategory, TheLink from dbo.BigReadTable_1
GO 9

insert into dbo.BigReadTable_2
select [TheDate], [TheCategory], [TheValue] FROM dbo.BigReadTable_2
GO 9

After inserting the rows we will add a clustered index for each table

CREATE CLUSTERED INDEX CI_1 ON dbo.BIGREADTABLE_2
(
    BID_2
)
GO

CREATE CLUSTERED INDEX CI_2 ON dbo.BIGREADTABLE_1
(
    TheLink
)

Now we are all set up to start our testing. The query we are going to use is the following

select *
from dbo.BigReadTable_2 C
inner join
(
    select AVG(TheValue) Average, TheCategory from dbo.BigReadTable_2
    group by TheCategory
) B on C.TheCategory = B.TheCategory
inner join
(
    select SUM(TheValue) Summer, TheCategory from dbo.BigReadTable_2
    GROUP BY TheCategory
) A on A.TheCategory = B.TheCategory
inner join
(
    select CAST(AVG(CAST(TheDate AS FLOAT)) AS DATETIME) AVGDate, TheLink
    FROM dbo.BigReadTable_1
    Group By TheLink
) D on D.TheLink = C.BID_2
OPTION(RECOMPILE)

This is the plan we get with this query at this moment

And these are the client statistics
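The execution plan and client statistics in the post are screenshots. If you want to capture comparable numbers yourself while replaying the scripts above, SQL Server's session statistics give a similar picture; a minimal sketch:

-- Show I/O and timing information for every statement in this session
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the big join query from above here

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;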

Now what are the different levels of tuning possible for this query?

  1. We are able to edit code
  2. We are not able to edit code
  3. We are not able to edit code & cannot add indexes

We are able to edit code

As we can see, this query is not the most optimal in filtering out data. In this case our tuning might be quite simple: if you can agree with your business users to return only the fields they will use, you will already see a performance increase. Let's say for this example we only need the data from BigReadTable_2.

-- LEAVE OUT SOME FIELDS
select B.*, A.*, D.*, C.TheCategory
from dbo.BigReadTable_2 C
inner join
(
    select AVG(TheValue) Average, TheCategory from dbo.BigReadTable_2
    group by TheCategory
) B on C.TheCategory = B.TheCategory
inner join
(
    select SUM(TheValue) Summer, TheCategory from dbo.BigReadTable_2
    GROUP BY TheCategory
) A on A.TheCategory = B.TheCategory
inner join
(
    select CAST(AVG(CAST(TheDate AS FLOAT)) AS DATETIME) AVGDate, TheLink
    FROM dbo.BigReadTable_1
    Group By TheLink
) D on D.TheLink = C.BID_2
OPTION(RECOMPILE)

Here are the results for the client statistics (the plan stays the same).

We see that we are using fewer resources than when selecting all columns; without adding additional indexes or filtering data we have reduced the number of bytes received by 25%. This shows how important it is to select only the fields you need instead of selecting everything.

There are many different ways of altering the code, but we will not cover all of these.

We are not able to edit the code

Now things are getting a little trickier: you no longer have control over your query. The good news is that we can still add indexes. So what can we do for our query?

In SQL Server 2016 I'm using a cool new trick which can be used even in the Express edition. The trick is adding an empty non-clustered columnstore index to the table to benefit from batch mode (for more information check out Niko's blog at http://www.nikoport.com/ or this post which explains the trick more thoroughly: http://sqlmag.com/sql-server/what-you-need-know-about-batch-mode-window-aggregate-operator-sql-server-2016-part-1 ).

Let’s add 2 non-clustered columnstores

-- No possible index, no code edit
DROP INDEX IF EXISTS NCCI_PERF ON dbo.BigReadTable_2
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_PERF ON dbo.BigReadTable_2
(
    BID_2, TheDate, TheCategory, TheValue
)
WHERE BID_2 = 1 and BID_2 = 2
GO

DROP INDEX IF EXISTS NCCI_PERF2 ON dbo.BigReadTable_1
GO
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_PERF2 ON dbo.BigReadTable_1
(
    BID, TheDate, TheCategory, TheLink
)
WHERE BID = 1 and BID = 2
GO

Here is what we get as a result for our client statistics

We get the same plan, but we are using batch mode, which means rows are processed in batches (up to roughly 900 rows at a time) instead of one row at a time.

By adding these 2 empty indexes we are reducing our execution time by 50%! Great result I feel for adding something empty!

No Code Edit and No Additional Indexes allowed

This is just annoying: almost all tools have been taken from you! But you can still tune the query! Often these queries will have the highest cost on your server, so by altering your cost threshold for parallelism you might just force those queries into parallel execution, speeding them up!

Let’s check the cost of the plan:

The plan has a cost of 15, so let’s lower the value for my cost threshold for parallelism to 14 (current value = 50) and see what result we get.
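The cost threshold for parallelism is an instance-wide setting; a minimal sketch of how this change could be made with sp_configure (only do this after testing, as it affects every database on the instance):

-- Lower the cost threshold for parallelism so plans costing 14 or more can go parallel
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 14;
RECONFIGURE;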

As you can see, we lowered the total execution time of our query by 30-40% by altering a server setting! If we look at the plan,

you can see that this specific query benefits from the parallel execution. This, however, can also have some downsides!

Conclusion

When dealing with big read queries you have a few tricks up your sleeve in SQL Server. In most cases you have 3 different scenarios:
1. You can change the code
2. You can't change the code but can add indexes
3. You can't change the code and can't add indexes

For the first scenario things are "easy": you have multiple ways of tuning your query, and only one was shown here. For the second scenario things get more complicated: you might be able to add an index to gain a seek, but a silver bullet for big reads does not exist. The COLUMNSTORE trick does help in most cases, but you will have to evaluate this for yourself. In the last scenario things are the trickiest: you can alter your server settings, but this might come at a price. In the example given today the parallel plan was beneficial for this query, but it might slow down other processes or even overload your processor, slowing everything down.

Thank you for reading & stay tuned!

 

 


SQL Server Problem Tombola: Too Many Indexes

When auditing SQL Server databases, one of the more common problems I see is that people add too many indexes to their tables. This has a significant performance impact on your SQL Server: first off, your data inserts will be slower, because for every update/delete/insert all indexes added to that specific table have to be maintained. But it also has an impact on your read performance. As we all know, SQL Server is pretty fast when it comes to reading data out of memory; however, when you have a lot of indexes, the data for those indexes also needs to be stored in memory, which increases the amount of memory needed to keep the full table in memory. If the data has to be fetched from disk, we will see a performance impact. So while indexes are created to increase read performance, you might actually hurt read performance by adding indexes while your system is under memory pressure.

To show you what the actual effect of having too many indexes on your table is, I have created a few tests.

  1. We are going to test how fast we can insert/update/delete data with different numbers of indexes on our table (1 index, 2 indexes, 4 indexes).
    The indexes will be the biggest kind, aka covering indexes.
  2. We are going to test how much extra memory we are using for each index we add (a query to measure this is sketched below).
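The post reads buffer usage per index from the buffer pool DMVs (the exact query isn't shown); a sketch along these lines, based on sys.dm_os_buffer_descriptors, produces the buffer_pages and buffer_mb numbers used in the results further down:

-- Approximate buffer pool usage per index in the current database
SELECT
    o.name          AS [Object],
    o.type_desc     AS [Type],
    i.name          AS [Index],
    i.type_desc     AS [Index_Type],
    COUNT(*)        AS buffer_pages,
    COUNT(*) / 128  AS buffer_mb          -- 128 pages of 8 KB per MB
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions       AS p  ON p.hobt_id = au.container_id
JOIN sys.indexes          AS i  ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.objects          AS o  ON o.object_id = p.object_id
WHERE bd.database_id = DB_ID()
  AND o.is_ms_shipped = 0
GROUP BY o.name, o.type_desc, i.name, i.type_desc
ORDER BY buffer_pages DESC;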

 

Inserts, Updates and Deletes

I have created a script which is going to insert 100 000 rows into a table

CREATE TABLE IndexMania
(
    IndexManiaID int identity (1,1),
    IndexColumn1 datetime,
    IndexColumn2 char(1000),
    IndexColumn3 char(500),
    IndexColumn4 decimal (18,2),
    IndexColumn5 int
)

You can see that I added a char(1000) & char(500) field to make sure I get a good amount of data to insert. If my data inserts are too small, SQL Server will make use of the lazy writer and not show any writes with the insert statement. By increasing the size of one row & importing 100 000 rows I bypass this issue. I am also clearing the SQL Server memory before every statement.
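The post doesn't show the insert/update/delete statements or the cache-clearing step itself; the following is a minimal sketch of what such a test harness could look like (the workload statements are my assumption, not the author's exact script), run once per index configuration:

-- Clear the buffer pool between tests (test systems only!)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
GO

-- Insert 100 000 rows, one per batch execution
INSERT INTO dbo.IndexMania (IndexColumn1, IndexColumn2, IndexColumn3, IndexColumn4, IndexColumn5)
VALUES (GETDATE(), REPLICATE('a', 1000), REPLICATE('b', 500), 10.50, 1);
GO 100000

-- Update and then delete the same 100 000 rows
UPDATE dbo.IndexMania SET IndexColumn4 = IndexColumn4 + 1;
DELETE FROM dbo.IndexMania;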

The indexes I created are the following

-- CLUSTERED INDEX ON ID
CREATE UNIQUE CLUSTERED INDEX [CI_IndexMania1] ON [dbo].[IndexMania]
(
    [IndexManiaID] ASC
)

-- CREATE COVERING DATE INDEX
CREATE NONCLUSTERED INDEX [NCI_1] ON [dbo].[IndexMania]
(
    [IndexColumn1] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn2],
    [IndexColumn3],
    [IndexColumn4],
    [IndexColumn5])
GO

-- 2 additional NCI
CREATE NONCLUSTERED INDEX [NCI_2] ON [dbo].[IndexMania]
(
    [IndexColumn2] ASC,
    [IndexColumn3] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn1],
    [IndexColumn4],
    [IndexColumn5])

CREATE NONCLUSTERED INDEX [NCI_3] ON [dbo].[IndexMania]
(
    [IndexColumn4] ASC,
    [IndexColumn3] ASC
)
INCLUDE (
    [IndexManiaID],
    [IndexColumn1],
    [IndexColumn2],
    [IndexColumn5])

 

The results

HEAP TABLE

Amount of writes (about the same for delete, insert and update):

total_logical_writes: 20020

Memory Usage:

Object     | Type       | Index          | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE |                | HEAP       | 20001        | 156

Time statistics:

INSERT: CPU time = 688 ms, elapsed time = 1299 ms (parse and compile: CPU time = 0 ms, elapsed time = 19 ms)
UPDATE: CPU time = 407 ms, elapsed time = 623 ms (parse and compile: CPU time = 15 ms, elapsed time = 20 ms)
DELETE: CPU time = 468 ms, elapsed time = 1129 ms (parse and compile: CPU time = 0 ms, elapsed time = 0 ms)

Clustered Index Table

Amount of writes (about the same for delete, insert and update):

total_logical_writes: 20040

Memory Usage:

Object     | Type       | Index          | Index_Type | buffer_pages | buffer_mb
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED  | 20082        | 156

Time statistics:

INSERT: CPU time = 1171 ms, elapsed time = 2745 ms (parse and compile: CPU time = 16 ms, elapsed time = 25 ms)
UPDATE: CPU time = 297 ms, elapsed time = 550 ms (parse and compile: CPU time = 0 ms, elapsed time = 20 ms)
DELETE: CPU time = 734 ms, elapsed time = 3201 ms (parse and compile: CPU time = 0 ms, elapsed time = 0 ms)

Clustered Index & 1 Covering Index

Amount of writes (about the same for delete, insert and update):

total_logical_writes: 40694

Memory Usage:

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20131        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
TOTAL      |            |                |              | 40214        | 313

Time statistics:

INSERT: CPU time = 1953 ms, elapsed time = 4575 ms (parse and compile: CPU time = 16 ms, elapsed time = 32 ms)
UPDATE: CPU time = 2375 ms, elapsed time = 5780 ms (parse and compile: CPU time = 16 ms, elapsed time = 23 ms)
DELETE: CPU time = 1718 ms, elapsed time = 7761 ms (parse and compile: CPU time = 0 ms, elapsed time = 0 ms)

Clustered Index & 3 Covering Indexes

Amount of writes:

Insert Writes | Update Writes | Delete Writes
124154        | 182542        | 125677

Memory Usage:

Object     | Type       | Index          | Index_Type   | buffer_pages | buffer_mb
IndexMania | USER_TABLE | NCI_2          | NONCLUSTERED | 40275        | 314
IndexMania | USER_TABLE | NCI_3          | NONCLUSTERED | 23331        | 182
IndexMania | USER_TABLE | NCI_1          | NONCLUSTERED | 20145        | 157
IndexMania | USER_TABLE | CI_IndexMania1 | CLUSTERED    | 20083        | 156
Total      |            |                |              | 103834       | 809

Time statistics:

INSERT: CPU time = 42468 ms, elapsed time = 47094 ms (parse and compile: CPU time = 0 ms, elapsed time = 21 ms)
UPDATE: CPU time = 108063 ms, elapsed time = 125766 ms (parse and compile: CPU time = 10 ms, elapsed time = 10 ms)
DELETE: CPU time = 54922 ms, elapsed time = 73162 ms (parse and compile: CPU time = 0 ms, elapsed time = 0 ms)

Conclusion

As seen in our test, we can conclude that each added index has an impact on our write performance & on the amount of data stored in memory to facilitate the index.

Amount of writes

When inserting data into a heap table we needed about 20020 writes to insert 100 000 records, and there was almost no difference in the number of writes when writing to a clustered index. When adding a covering index, the number of writes doubled, which is normal because we keep the data twice. When adding another two covering indexes we do about 4 times the original number of writes. There are some discrepancies in the write counts, because some of the queries had to spill to TEMPDB, which added writes as the number of indexes increased. But after multiple tests we can see that we need about 4 times the number of writes purely for the indexes (DELETE, UPDATE & INSERT).

Time consumed

When inserting into a heap table we averaged about 1 second; adding a clustered index increased this to an average of 2-3 seconds. Adding 1 covering index increased it to 4-6 seconds, and with two additional covering indexes the time exploded to a whopping 45-50 seconds for the same workload.

Memory Used by object

After inserting 100 000 rows into the heap table we used 156MB of memory for this table; when adding the clustered index this amount stays exactly the same. When adding 1 covering index this doubled, which means we are now keeping 2 copies of the data in memory. When adding the 2 additional covering indexes we see that we are keeping the same data 4 times in memory.

General Conclusion

We should take care when adding additional indexes: this not only has an impact on our insert/update/delete performance but also on the amount of memory used for that specific table. We want to serve reads from cache, which takes nanoseconds instead of the milliseconds needed to retrieve data from disk, so we have to keep in mind that every index we add takes additional space in the cache, potentially pushing other objects out of the cache & slowing them down.

In today's example we were using covering indexes, which effectively means duplicating the data. Keep in mind, however, that this post is not a plea to stop using indexes, not even covering indexes. Indexes will help your performance tremendously if used in the correct way. As a general recommendation, try to keep your indexes as small as possible & only include the fields you need, to reduce the overhead of your index.

Thank you for reading and stay tuned!

 

 

 


SQL Server Problem Tombola: MAXDOP 1

As you all may know, many vendors force you to set the MAXDOP of your SQL Server to 1 (SAP, SharePoint, AX). This is by design for these applications, because another MAXDOP setting might interfere with the otherwise "performant" workload of the application. Today I would like to show you what the impact of setting your MAXDOP to 1 on your server actually is.

To start off I would like to give you a small introduction to what MAXDOP actually does. The MAXDOP setting limits the number of cores used by one query in your SQL Server instance (on SQL Server 2016 you can do this at database level, thank God!). For example, if you set your MD to 2, the query gets the option to create a parallel plan using 2 cores. The MD setting works together with another instance-level setting: Cost Threshold for Parallelism (default 5). The CTFP is the value at which your instance starts evaluating the possibility of using a parallel plan when you have MD 2/4/8/…. For example, if your CTFP is set to 60 and your query plan only has a cost of 10, your query will not be considered for a parallel plan; if, however, you have another query with a cost of 80, the query optimizer will check if it is possible to run this query in parallel (depending on your MD setting).
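For reference, a minimal sketch of where these two settings live: both are instance-level options changed through sp_configure, and from SQL Server 2016 onwards MAXDOP can also be set per database with a database scoped configuration (the values 2 and 6 below are just examples matching the tests further down):

-- Instance level: max degree of parallelism and cost threshold for parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
EXEC sp_configure 'cost threshold for parallelism', 6;
RECONFIGURE;

-- Database level (SQL Server 2016 and up)
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 2;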

To show you the impact on your system, I am going to run an OLTP workload & an OLAP workload against a virtual machine running on my laptop. The OLTP workload will do updates, inserts and deletes; the OLAP workload will run some read-intensive queries against the system. The OLTP workload consists of small queries with a low cost; the OLAP queries involve more data and calculations, which gives them a higher cost.

The virtual machine will be configured with 4 cores

To run the workload I will be using SQLQueryStress, which can be found at https://github.com/ErikEJ/SqlQueryStress. It will use 1 thread and do 100 iterations for each of the two workloads.

I will be running all my tests against in-memory tables to reduce any disk overhead and purely look at CPU usage.

I will be running 5 tests with the following setups.

  1. MAXDOP 1, OLAP & OLTP together
  2. MAXDOP 2, COST THRESHOLD FOR PARALLELISM 1, OLAP & OLTP together
  3. MAXDOP 4, COST THRESHOLD FOR PARALLELISM 1, OLAP & OLTP together
  4. MAXDOP 2, COST THRESHOLD FOR PARALLELISM 6, OLAP & OLTP together
  5. MAXDOP 4, COST THRESHOLD FOR PARALLELISM 6, OLAP & OLTP together

I have altered my CTFP setting to 6 because then only my OLAP queries will be eligible for a parallel plan since they have a plan cost of 7.

Here are the results

Workload | MAXDOP | COST THRESHOLD | CPU/ITERATION | SECONDS/ITERATION | ELAPSED TIME | Client Seconds | LogicalReads
OLAP     | 1      | NONE           | 0,6654        | 1,0881            | 22,305       | 0,2113         | 21,4595
OLTP     | 1      | NONE           | 0,2831        | 0,5072            | 27,17        | 0,2522         | NONE
OLAP     | 2      | 1              | 0,6595        | 0,4862            | 25,263       | 0,2301         | 19,4194
OLTP     | 2      | 1              | 0,2898        | 0,2495            | 19,1247      | 0,1479         | NONE
OLAP     | 4      | 1              | 0,5921        | 0,4178            | 17,8526      | 0,1658         | 25,3333
OLTP     | 4      | 1              | 0,1635        | 0,1648            | 13,1542      | 0,1046         | NONE
OLAP     | 2      | 6              | 0,4739        | 0,3581            | 17,9613      | 0,1618         | 31,7011
OLTP     | 2      | 6              | 0,1929        | 0,1869            | 12,7859      | 0,102          | NONE
OLAP     | 4      | 6              | 0,4576        | 0,3861            | 21,2969      | 0,1918         | 32,5532
OLTP     | 4      | 6              | 0,3335        | 0,3118            | 19,9687      | 0,174          | NONE

 

When looking at the full elapsed time, we can see that the following configurations were the fastest.

Shortest OLTP run | 12,7859 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6
Shortest OLAP run | 17,8526 seconds | MAXDOP 4 | COST THRESHOLD FOR PARALLELISM 1
Shortest full run | 30,7472 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6

 

When looking at the CPU used for every iteration, we can see that the following configurations used the least amount of CPU.

Smallest OLTP CPU/iteration | 0,1929 | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6
Smallest OLAP CPU/iteration | 0,4739 | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6
Smallest full CPU/iteration | 0,6668 | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6

 

When looking at the number of seconds used by the client for every iteration, we see that the following tests took the least amount of time.

Shortest OLTP client/iteration | 0,1020 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6
Shortest OLAP client/iteration | 0,1618 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6
Shortest full client/iteration | 0,2638 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 6

 

When looking at the full elapsed time, the following configurations were the slowest.

Slowest OLTP run | 27,1700 seconds | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE
Slowest OLAP run | 25,2630 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 1
Slowest full run | 49,4750 seconds | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE

 

When looking at the CPU usage for every iteration, we can see that the following configurations used the most CPU.

Largest OLTP CPU/iteration | 0,3335 | MAXDOP 4 | COST THRESHOLD FOR PARALLELISM 6
Largest OLAP CPU/iteration | 0,6654 | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE
Largest full CPU/iteration | 0,9485 | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE

 

When looking at the number of seconds used by the client for every iteration, we see that the following tests were the slowest.

Slowest OLTP client/iteration | 0,2522 seconds | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE
Slowest OLAP client/iteration | 0,2301 seconds | MAXDOP 2 | COST THRESHOLD FOR PARALLELISM 1
Slowest full client/iteration | 0,4635 seconds | MAXDOP 1 | COST THRESHOLD FOR PARALLELISM NONE

 

Conclusion OLTP & OLAP

 

As we can see in the results, the MAXDOP 1 setting is the slowest overall compared to all other tests. This means that in most cases you will limit the throughput of your SQL Server if you set your MAXDOP to 1.

The fastest configuration for my laptop has proven to be MAXDOP 2 with COST THRESHOLD FOR PARALLELISM 6 for this workload and this hardware configuration. This is, however, not an exact science; it has to be tested for every system.

MAXDOP 1 was the slowest because we limit our read-intensive workload to one core for every query it executes. This increases the time for every read query, which makes the total workload the slowest.

The fastest configuration is when we allow the read-intensive workload to use 2 cores and keep the write/update/delete workload on 1 core. Because we are running both workloads together with 1 thread each, we are letting our read-intensive workload use 2 cores and leaving the 2 other cores available for our OLTP workload, which means we get an ideal throughput for this workload on this hardware.

What about data warehouses?

Our data warehouse will primarily have a read-intensive, aka OLAP, workload, with one or more daily loads (inserts/updates/deletes). This means that we should get the most throughput on our read queries. I have been to some clients who set their MAXDOP to 1 for their data warehouse; this will have a negative effect on the speed with which your queries execute. I will show you that now by testing just the OLAP workload without the OLTP workload interfering.

For this I will do 3 tests

  1. 100 Iterations 1 thread OLAP workload MAXDOP 1
  2. 100 Iterations 1 thread OLAP workload MAXDOP 2
  3. 100 Iterations 1 thread OLAP workload MAXDOP 4

These are the results on my laptop

Workload | MAXDOP | CPU/Iteration | Seconds/Iteration | Elapsed Time
OLAP     | 1      | 0,5297        | 0,5353            | 27,5624
OLAP     | 2      | 0,3703        | 0,2058            | 10,9843
OLAP     | 4      | 0,3117        | 0,1769            | 9,3593

 

As you can see in the table above, the fastest way of doing read queries is giving the workload as many cores as possible, to increase the number of reads which can be done in the least amount of time.

MAXDOP 1 is the slowest run, with MAXDOP 2 being about 17 seconds faster. There is not a lot of difference between the MAXDOP 4 and MAXDOP 2 runs because of hardware limitations.

Conclusion OLAP

Because of the read-intensive workload of a data warehouse, we should not set our MAXDOP to 1. As seen in the test above, the more cores you have, the more read throughput you get, which will make the queries in your DWH go faster. In these tests we were about 60% faster when using multiple cores for our workload instead of one core.

What about a pure write workload?

When you are just writing data (update/insert/delete), I would also like to know which MAXDOP setting is best. To test this, I have created 3 test scenarios.

  1. 100 Iterations 1 thread OLTP workload MAXDOP 1
  2. 100 Iterations 1 thread OLTP workload MAXDOP 2
  3. 100 Iterations 1 thread OLTP workload MAXDOP 4

These are the results on my laptop

MAXDOP | CPU SECONDS | ACTUAL SECONDS | ELAPSED TIME | CLIENT SECONDS
1      | 0,0013      | 0,0048         | 1,1576       | 0,0467
2      | 0,0037      | 0,0183         | 1,3125       | 0,0568
4      | 0,006       | 0,0305         | 1,2988       | 0,0579

 

Conclusion OLTP

When inserting/updating/deleting data we are the fastest when running these queries in a serial way. When running these processes in parallel, we increase the amount of CPU needed for the query, and this increases the elapsed time. When writing to in-memory tables, as we are doing in this test, we see that writing data takes less than half the CPU time on MAXDOP 1 compared to the parallel configurations. This means that going serial will be faster than going parallel.

Overall Conclusion

We should use caution when choosing our MAXDOP setting; we can see that the two different workloads have different needs. The read-intensive workload will in most cases benefit from parallel execution, while the write-intensive workload will benefit from serial execution. Because most systems are not read-only or write-only, I generally think the way to go is to filter out the read-intensive, high-cost plans and allow those plans to make use of parallel execution, thus limiting the number of queries executed in parallel but not removing the option by default. This way we do not hold back our OLTP queries and we are able to speed up the queries that can really benefit from parallel execution. This means you should always use the MAXDOP setting in combination with your COST THRESHOLD FOR PARALLELISM setting. There is, however, no silver bullet for these settings and they will always need to be tested against the workload of your system.

Thank you for reading and stay tuned!


SQL Saturday #602 Iceland

This week I got the great news that I will be presenting 2 sessions at SQL Saturday #602 in Iceland!

On Saturday the 18th of March in Reykjavik I will be talking about operational analytics & presenting a whole new session on SQL Server problems!

The session on operational analytics will be a full hour showing you how to combine your OLTP & OLAP workload in one database!

The tombola is a whole new concept! I will be bringing a tombola machine stacked full of SQL Server issues I've seen in the field throughout the years. An impartial observer will pick a ball, containing an issue, from the bowl. This issue will then be explained & we will show the impact it has on the system. After this we will show you how to fix it in the long term & the short term. We will keep fishing issues out of the bowl until the hour is finished.

All information concerning the SQL Saturday can be found on http://www.sqlsaturday.com/602/EventHome.aspx

I would like to thank the SQL Saturday team of Iceland for giving me the opportunity to present at their SQL Saturday!

Hope to see a lot of you there!

Stay tuned!

SQL Saturday #536 Gothenburg

I am happy to inform you that I will be speaking at SQL Saturday #536 in Gothenburg!

On Saturday I will be explaining the do's and don'ts of operational/real-time analytics. A whole hour of fun concerning columnstore indexes & how to combine OLTP & OLAP!

All information concerning the SQL Saturday can be found on http://www.sqlsaturday.com/536/eventhome.aspx

I would like to thank the SQL Saturday team of Gothenburg for giving me the opportunity to do my first international speaking slot!

Hope to see a lot of you there!

Stay tuned!

Natively Stored Procs & In memory Tables: NextGen loading!

Today I was creating a dummy database and wanted to fill it with a certain amount of data. Since I was going to use this database for demo purposes (for my next blog on real-time analytics), I created the tables with primary keys & foreign keys, and some indexes. But during the creation of this database I found that the way I was loading my data was quite slow, so I decided to load the data into my database using natively compiled stored procedures and in-memory tables, since this should be the fastest way. The result really baffled me!

Some background information: I'm using the DELL Inspirion 5220 (http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Dell_Precision_15_5000_Series_5510_Spec_Sheet.pdf) with 16GB of RAM. My virtual machine has access to all cores but is capped at 90%; it has 4096MB of memory, of which 3584MB is allocated to SQL Server. The VM is a Server Core installation with SQL Server 2016, running on SSDs.

The first scenario I used was loading my data into SQL Server without using in-memory tables & natively compiled stored procedures. (A lot of code will follow, but this way you can test it yourself.)

The following script was used.

-- CREATE DATABASE PART
ALTER DATABASE REALTIMEANALYTICS SET SINGLE_USER WITH ROLLBACK IMMEDIATE
USE MASTER
GO
DROP DATABASE IF EXISTS RealtimeAnalytics
GO
CREATE DATABASE [RealtimeAnalytics] CONTAINMENT = NONE
ON PRIMARY
( NAME = N'RealtimeAnalytics_1', FILENAME = N'G:\DATA\RealtimeAnalytics_1.mdf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_2', FILENAME = N'G:\DATA\RealtimeAnalytics_2.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_3', FILENAME = N'G:\DATA\RealtimeAnalytics_3.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_4', FILENAME = N'G:\DATA\RealtimeAnalytics_4.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB )
LOG ON
( NAME = N'RealtimeAnalytics_log', FILENAME = N'C:\Data\LOG\RealtimeAnalytics_log.ldf', SIZE = 131072KB, FILEGROWTH = 131072KB )
GO

I created a database called RealtimeAnalytics with 4 data files. Afterwards I add 4 tables with foreign keys to each other.

-- CREATE TABLE PART
CREATE TABLE dbo.Clients
(
ClientID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
GO
CREATE TABLE dbo.SalesPersons
(
SalesPersonID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
GO
CREATE TABLE dbo.Item
(
ItemID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
GO
CREATE TABLE dbo.Production
(
ProductionID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ClientID int FOREIGN KEY REFERENCES Clients(ClientID) NOT NULL,
SalesPersonID int FOREIGN KEY REFERENCES SalesPersons(SalesPersonID) NOT NULL,
AmountOfItemsSold int,
ItemID int FOREIGN KEY REFERENCES Item(ItemID) NOT NULL,
DateOfSale datetime not null DEFAULT(Getdate())
)
GO

After the creation of the tables I'm going to insert data into them in a while loop (not the best solution, but hey, I will do the same thing with the natively compiled procedure).

-- FILL THE TABLES WITH DATA
DECLARE @Count int = 1
-- FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients
select 'Name' + cast(@Count as varchar), '1111-1111-111' + cast(@Count as varchar)
SET @Count = @Count + 1
END
SET @Count = 1
-- FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons
select 'Name' + cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count + 1
END
SET @Count = 1
-- FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item
select 'Item' + cast(@Count as varchar), cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count + 1
END
SET @Count = 1
DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
-- FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients where ClientName = 'Name' + CAST(FLOOR(RAND()*(1000-1)+1) AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons where SalesPersonName = 'Name' + CAST(FLOOR(RAND()*(10-1)+1) AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item where ItemName = 'Item' + CAST(FLOOR(RAND()*(800-1)+1) as varchar))
INSERT INTO dbo.Production
select @ClientID, @SalesPersonID, FLOOR(RAND()*(100-1)+1), @ItemID, dateadd(second, floor(rand()*(604800-1)+1), getdate())
SET @Count = @Count + 1
END

When we run this script we see the following workload on the machine: my C drive (SSD) is getting hammered, but CPU usage is quite fine.

I decided to stop the script after about an hour and a half, because this was taking way too long.

When I check how many records were inserted into the Production table I see the following result

During all that time only about 1 000 000 rows were inserted, which is unacceptable. So I thought: let's use in-memory tables and natively compiled stored procedures and see what the difference is. I edited the script with the following code.

On the database I enabled memory-optimized data:

use master
go
alter database RealtimeAnalytics add filegroup RealtimeAnalytics_mod contains memory_optimized_data
go
-- adapt filename as needed
alter database RealtimeAnalytics add file (name='RealtimeAnalytics_mod', filename='G:\data\RealtimeAnalytics_mod')
to filegroup RealtimeAnalytics_mod
go

I then created the same table structure, but memory-optimized.

-- MEMORY OPTIMIZED
use RealtimeAnalytics
go
CREATE TABLE dbo.Clients_MON
(
ClientID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.SalesPersons_MON
(
SalesPersonID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.Item_MON
(
ItemID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.Production_MON
(
ProductionID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientID int FOREIGN KEY REFERENCES Clients_MON(ClientID) NOT NULL,
SalesPersonID int FOREIGN KEY REFERENCES SalesPersons_MON(SalesPersonID) NOT NULL,
AmountOfItemsSold int,
ItemID int FOREIGN KEY REFERENCES Item_MON(ItemID) NOT NULL,
DateOfSale datetime not null DEFAULT(Getdate())
)
with (memory_optimized=on)

Notice that I can’t set my ID to -2147483648, more info on that here : (https://msdn.microsoft.com/en-us/library/dn247640.aspx). Why would they do that?

But this is out of scope, so I'm going to continue by giving you the stored proc that I created. It does exactly the same thing as the previous test, only now as a natively compiled stored procedure.

USE RealtimeAnalytics
GO
create procedure dbo.FillDatabase
with native_compilation, schemabinding, execute as owner
as
begin atomic
with (transaction isolation level = snapshot, language = N'us_english')
DECLARE @Count int = 1
-- FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients_MON
select 'Name' + cast(@Count as varchar), '1111-1111-111' + cast(@Count as varchar)
SET @Count = @Count + 1
END
SET @Count = 1
-- FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons_MON
select 'Name' + cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count + 1
END
SET @Count = 1
-- FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item_MON
select 'Item' + cast(@Count as varchar), cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count + 1
END
SET @Count = 1
DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
-- FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients_MON where ClientName = 'Name' + CAST(FLOOR(RAND()*(1000-1)+1) AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons_MON where SalesPersonName = 'Name' + CAST(FLOOR(RAND()*(10-1)+1) AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item_MON where ItemName = 'Item' + CAST(FLOOR(RAND()*(800-1)+1) as varchar))
INSERT INTO dbo.Production_MON
select @ClientID, @SalesPersonID, FLOOR(RAND()*(100-1)+1), @ItemID, dateadd(second, floor(rand()*(604800-1)+1), getdate())
SET @Count = @Count + 1
END
end

After this was created, I wrote the following statement to load the data into the in-memory tables and then move it to my normal tables using INSERT INTO ... SELECT.

EXEC dbo.FillDatabase
GO
SET IDENTITY_INSERT dbo.Clients ON
insert into dbo.Clients(ClientID,ClientName,ClientCreditcard)
select ClientID,ClientName,ClientCreditcard from Clients_MON
SET IDENTITY_INSERT dbo.Clients OFF
GO
SET IDENTITY_INSERT dbo.SalesPersons ON
insert into dbo.SalesPersons (SalesPersonID,SalesPersonName,SalesPersonDiscount)
select SalesPersonID,SalesPersonName,SalesPersonDiscount from SalesPersons_MON
SET IDENTITY_INSERT dbo.SalesPersons OFF
GO
SET IDENTITY_INSERT dbo.Item ON
insert into dbo.Item(ItemID,ItemName,ItemPrice)
select ItemID,ItemName,ItemPrice from Item_MON
SET IDENTITY_INSERT dbo.Item OFF
GO
SET IDENTITY_INSERT dbo.Production ON
insert into dbo.Production (ProductionID,ClientID,SalesPersonID,AmountOfItemsSold,ItemID,DateOfSale)
select ProductionID,ClientID,SalesPersonID,AmountOfItemsSold,ItemID,DateOfSale from Production_MON
SET IDENTITY_INSERT dbo.Production OFF

While executing the last statement we can see that the workload has changed. We no longer see our disk getting hammered (normal, because we are now hammering our memory) and we see more CPU usage!

After about 8 minutes I got a ping that my query had executed. 8 MINUTES!!!!

So by doing exactly the same thing, but first loading the data into memory, I got the result in my tables in 8 minutes and 48 seconds, while the other way (no in-memory tables & native procs) had only inserted 1 000 000 rows after 1 hour and 30 minutes. That is an excellent performance gain if you ask me! To conclude, let's just say that memory-optimized tables & natively compiled stored procs are fast, very fast, extremely fast! If you have a high insert rate on your table and the table can't keep up, I would suggest trying this scenario, because you might greatly benefit from it.

Thanks for reading and stay tuned!

Index maintenance on a VLDB: How to tame the beast!

When working with large data volumes, normal day-to-day maintenance becomes more difficult. In a previous post (https://swyssql.wordpress.com/2015/08/06/dbcc-how-to-check-the-uncheckable-dbcc-on-vldb/) I talked about how to handle DBCC CHECKDB on a multi-terabyte database. Today I will tackle another issue I came across while dealing with multi-terabyte databases: index maintenance.

Like most of you, my absolute favorite solution when it comes to index maintenance is Ola Hallengren's solution. This is a well-written and solid maintenance solution, but I noticed that on VLDBs it does not get past the fragmentation checking stage of your indexes.

So in order to tame the VLDB beast, I wrote my own script for handling Index Maintenance on a VLDB. This script can be found on https://www.dropbox.com/s/mmhlxjx7fhln2ky/IndexColStatMaintenance.sql?dl=0

Now first things first: How does the script work?

When using other maintenance solutions, your maintenance will start all over every day, even if it did not reach all indexes during its maintenance window. To cope with this, the VLDB maintenance script creates a few tables which keep track of the progress of your index maintenance (checks & rebuilds/reorganizes) and do not start over every day. By doing this we are able to check all indexes & rebuild all indexes; when the whole cycle has been done, it starts over again. What we actually do is divide the full index maintenance workload over a few days.
Another issue I was having was that when rebuilding an index which is larger than your TLOG size, your TLOG will grow & fill up your disk, with all the problems that come with it. So the script has a built-in check which will skip the index if not enough space is available at this moment or in total. The index is then reported in the reporting table, so that you can see which indexes are not getting rebuilt and try to rebuild them in another way.
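The actual space check lives in the script linked above; purely as an illustration of the idea (my own sketch, not the script's code), the available transaction log space can be inspected before issuing a rebuild like this:

-- Rough sketch: check current free space in the transaction log of the current database
SELECT
    total_log_size_in_bytes / 1024 / 1024                                  AS total_log_mb,
    used_log_space_in_bytes / 1024 / 1024                                  AS used_log_mb,
    (total_log_size_in_bytes - used_log_space_in_bytes) / 1024 / 1024      AS free_log_mb
FROM sys.dm_db_log_space_usage;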

The parameters

As with any script, I have defined some parameters which you will have to fill in to create the index maintenance.

SET @ManagementDatabase = 'master' -- Database in which the objects have to be created

This will be the database which you will use to create the objects needed for the script to work.

SET @Database = 'WideWorldImporters' -- Database which will be maintained

This will be the database on which the maintenance has to be done.

SET @AmountOfMemory = 30 -- Percentage of memory to be used for online rebuild operations (0-100) (Enterprise only)

This is the amount of memory available for online rebuilds, to ensure you don't use all your memory for one rebuild.

SET @ColumnStatisticsMaintenance = 'Yes' -- Do you want to maintain column statistics? (Yes/No)

If you also want to maintain the column statistics, you will have to say yes here.

SET @MaxRunTime = 300 -- Amount of minutes the maintenance can run each night (300 = 5 hours)

The number of minutes your script can run every night; after this amount of time no new statements will be launched.

SET @JobStartHour = 1 -- e.g. 1: the job will start at 01:00 (possible options 1 2 3 4 5 6 7 8 9 10 11 12 13 … 24)

The start hour of your job; this will be the hour the jobs created by the script are scheduled at.

SET @LockTimeout = 360 -- Set the lock timeout in seconds

The amount of time in seconds a lock can be held by the index rebuild/reorganize before cancelling the operation.

SET @WithOnline = 'YES' -- Rebuild indexes with the ONLINE rebuild option (YES or NO)

This gives you the choice to use online rebuilds or not (on Enterprise edition I would suggest YES, but I had one case where it was NO).

Objects that will be created

Tables

The ColumnStatistics table will contain the workload which will be done for all Column Statistics. The script uses this table to define whether or not it should resample the statistics

The IndexFragTable will contain all the information the script needs to rebuild or reorganize your indexes & where it has to continue after the last run.

The IndexRebuildColumnStatLoggingTable is the most important table for you because it will contain all information on failed rebuilds and error messages and indexes which are not being checked.

Stored Procedures

The dbo.IndexStatsOptimize is the main stored procedure which will be executed to handle your index maintenance.

The dbo.UpdateIndexStatsTable procedure will update your table, add new indexes & reset the list after the cycle has been run.

Job


This job will execute the stored procedures. If you need to make alterations to your schedule, this is the place to go; if you want to change your parameters you can also do it here or by altering the dbo.IndexStatsOptimizer stored procedure.

Availability Groups

The script has been made AG-aware; the only thing you have to do is execute it on both nodes.

I made a mistake

If you made a mistake with your parameters, just re-execute the script; it will recreate all objects. If you made an error in the name of your database, then you have to manually remove all objects listed above.

I have had a lot of table changes in my database.

The script checks daily for new tables and indexes and adds them to the list of indexes to be checked and rebuilt; it will also look for deleted tables and remove them from the list.

If you have any questions feel free to ask.

Thanks for reading and enjoy the script!

Stay tuned!
