Monthly Archives: July 2016

Natively Stored Procs & In memory Tables: NextGen loading!

Today I was creating a dummy database and wanted to fill it with a certain amount of data. Since I was going to use this database for demo purposes (for my next blog on real-time analytics), I created the tables with primary keys, foreign keys, and some indexes. But while building this database I found that the way I was loading my data was quite slow, so I decided to load the data using natively compiled stored procedures and in-memory tables, since this should be the fastest way. The result really baffled me!

Some background information: I’m using a Dell Precision 5510 (http://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Dell_Precision_15_5000_Series_5510_Spec_Sheet.pdf) with 16GB of RAM. My virtual machine has access to all cores but is capped at 90%; it has 4096MB of memory, of which 3584MB is allocated to SQL Server. The VM is a Server Core installation running SQL Server 2016, and it runs on SSDs.

The first scenario loads the data into SQL Server without in-memory tables or natively compiled stored procedures. (A lot of code will follow, but this way you can test it yourself.)

The following script was used.

-- CREATE DATABASE PART

ALTER DATABASE RealtimeAnalytics SET SINGLE_USER WITH ROLLBACK IMMEDIATE
USE master
GO
DROP DATABASE IF EXISTS RealtimeAnalytics
GO
CREATE DATABASE [RealtimeAnalytics] CONTAINMENT = NONE
ON PRIMARY
( NAME = N'RealtimeAnalytics_1', FILENAME = N'G:\DATA\RealtimeAnalytics_1.mdf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_2', FILENAME = N'G:\DATA\RealtimeAnalytics_2.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_3', FILENAME = N'G:\DATA\RealtimeAnalytics_3.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB ),
( NAME = N'RealtimeAnalytics_4', FILENAME = N'G:\DATA\RealtimeAnalytics_4.ndf', SIZE = 131072KB, FILEGROWTH = 131072KB )
LOG ON
( NAME = N'RealtimeAnalytics_log', FILENAME = N'C:\Data\LOG\RealtimeAnalytics_log.ldf', SIZE = 131072KB, FILEGROWTH = 131072KB )
GO

This creates a database called RealtimeAnalytics with 4 data files. Afterwards I add 4 tables with foreign keys between them.

-- CREATE TABLE PART

CREATE TABLE dbo.Clients
(
ClientID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
GO
CREATE TABLE dbo.SalesPersons
(
SalesPersonID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
GO
CREATE TABLE dbo.Item
(
ItemID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
GO
CREATE TABLE dbo.Production
(
ProductionID int identity(-2147483648,1) PRIMARY KEY NOT NULL,
ClientID int FOREIGN KEY REFERENCES dbo.Clients(ClientID) NOT NULL,
SalesPersonID int FOREIGN KEY REFERENCES dbo.SalesPersons(SalesPersonID) NOT NULL,
AmountOfItemsSold int,
ItemID int FOREIGN KEY REFERENCES dbo.Item(ItemID) NOT NULL,
DateOfSale datetime not null DEFAULT(getdate())
)
GO

After the creation of the tables I insert data into them in a while loop (not the best solution, but I will do exactly the same thing with the natively compiled procedure, so the comparison is fair).

-- FILL THE TABLES WITH DATA
DECLARE @Count int = 1
--FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients
select 'Name'+cast(@Count as varchar),'1111-1111-111'+cast(@Count as varchar)
SET @Count = @Count +1
END
SET @Count = 1
--FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons
select 'Name'+cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count +1
END
SET @Count = 1
--FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item
select 'Item'+cast(@Count as varchar),cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count +1
END
SET @Count = 1
DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
--FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients where ClientName = 'Name'+CAST(FLOOR(RAND()*(1000-1)+1) AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons where SalesPersonName = 'Name'+CAST(FLOOR(RAND()*(10-1)+1) AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item where ItemName = 'Item'+CAST(FLOOR(RAND()*(800-1)+1) as varchar))
INSERT INTO dbo.Production
select @ClientID,@SalesPersonID,FLOOR(RAND()*(100-1)+1),@ItemID,dateadd(second,floor(rand()*(604800-1)+1),getdate())
SET @Count = @Count +1
END

When we run this script we see the following workload on the machine: my C drive (an SSD) is getting hammered, but CPU usage is quite fine.

I decided to stop the script after about an hour and a half, because this was taking way too long.

When I check how many records were inserted into the Production table, I see the following result.
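
A quick count against the Production table is enough to check progress (the original result was a screenshot; the query itself is trivial):

SELECT COUNT(*) AS RowsInserted
FROM dbo.Production;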

During all that time only about 1,000,000 rows were inserted; this was unacceptable. So I thought: let’s use those improved in-memory tables and natively compiled stored procedures and see what the difference is. I edited the script with the following code.

On the CREATE DATABASE side, I enabled the database for memory-optimized tables:

use master
go
alter database RealtimeAnalytics add filegroup RealtimeAnalytics_mod contains memory_optimized_data
go
-- adapt filename as needed
alter database RealtimeAnalytics add file (name='RealtimeAnalytics_mod', filename='G:\data\RealtimeAnalytics_mod')
to filegroup RealtimeAnalytics_mod
go

I then created the same table structure, but memory-optimized (suffixed _MON).

--MEMORY OPTIMIZED
use RealtimeAnalytics
go
CREATE TABLE dbo.Clients_MON
(
ClientID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientName varchar(200),
ClientCreditcard varchar(200)
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.SalesPersons_MON
(
SalesPersonID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
SalesPersonName varchar(300),
SalesPersonDiscount int
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.Item_MON
(
ItemID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ItemName varchar(300),
ItemPrice numeric(15,2)
)
with (memory_optimized=on)
GO
CREATE TABLE dbo.Production_MON
(
ProductionID int identity(1,1) PRIMARY KEY NONCLUSTERED NOT NULL,
ClientID int FOREIGN KEY REFERENCES dbo.Clients_MON(ClientID) NOT NULL,
SalesPersonID int FOREIGN KEY REFERENCES dbo.SalesPersons_MON(SalesPersonID) NOT NULL,
AmountOfItemsSold int,
ItemID int FOREIGN KEY REFERENCES dbo.Item_MON(ItemID) NOT NULL,
DateOfSale datetime not null DEFAULT(getdate())
)
with (memory_optimized=on)

Notice that I can’t seed my identity at -2147483648: memory-optimized tables only support identity(1,1). More info on that here: https://msdn.microsoft.com/en-us/library/dn247640.aspx. Why would they do that?

But that is out of scope, so let’s continue with the stored procedure I created. It does exactly the same thing as the previous test, only now as a natively compiled stored procedure.

USE RealtimeAnalytics
GO
create procedure dbo.FillDatabase with native_compilation, schemabinding, execute as owner
as
begin atomic
with (transaction isolation level = snapshot, language = N'us_english')
DECLARE @Count int = 1
--FILL CLIENTS
WHILE @Count <= 1000
BEGIN
INSERT INTO dbo.Clients_MON
select 'Name'+cast(@Count as varchar),'1111-1111-111'+cast(@Count as varchar)
SET @Count = @Count +1
END
SET @Count = 1
--FILL SalesPersons
WHILE @Count <= 10
BEGIN
INSERT INTO dbo.SalesPersons_MON
select 'Name'+cast(@Count as varchar), FLOOR(RAND()*(50-1)+1)
SET @Count = @Count +1
END
SET @Count = 1
--FILL Items
WHILE @Count <= 800
BEGIN
INSERT INTO dbo.Item_MON
select 'Item'+cast(@Count as varchar),cast(RAND()*(200000-1) as numeric(15,2))
SET @Count = @Count +1
END
SET @Count = 1
DECLARE @ClientID int
DECLARE @SalesPersonID int
DECLARE @ItemID int
--FILL Production
WHILE @Count <= 2000000
BEGIN
SET @ClientID = (select ClientID from dbo.Clients_MON where ClientName = 'Name'+CAST(FLOOR(RAND()*(1000-1)+1) AS VARCHAR))
SET @SalesPersonID = (select SalesPersonID from dbo.SalesPersons_MON where SalesPersonName = 'Name'+CAST(FLOOR(RAND()*(10-1)+1) AS VARCHAR))
SET @ItemID = (select ItemID from dbo.Item_MON where ItemName = 'Item'+CAST(FLOOR(RAND()*(800-1)+1) as varchar))
INSERT INTO dbo.Production_MON
select @ClientID,@SalesPersonID,FLOOR(RAND()*(100-1)+1),@ItemID,dateadd(second,floor(rand()*(604800-1)+1),getdate())
SET @Count = @Count +1
END
end

Once that was created, I used the following statements to load the data into memory and then move it to my regular disk-based tables using INSERT INTO ... SELECT.

EXEC dbo.FillDatabase
GO
SET IDENTITY_INSERT dbo.Clients ON
insert into dbo.Clients(ClientID,ClientName,ClientCreditcard)
select ClientID,ClientName,ClientCreditcard from Clients_MON
SET IDENTITY_INSERT dbo.Clients OFF
GO
SET IDENTITY_INSERT dbo.SalesPersons ON
insert into dbo.SalesPersons (SalesPersonID,SalesPersonName,SalesPersonDiscount)
select SalesPersonID,SalesPersonName,SalesPersonDiscount from SalesPersons_MON
SET IDENTITY_INSERT dbo.SalesPersons OFF
GO
SET IDENTITY_INSERT dbo.Item ON
insert into dbo.Item(ItemID,ItemName,ItemPrice)
select ItemID,ItemName,ItemPrice from Item_MON
SET IDENTITY_INSERT dbo.Item OFF
GO
SET IDENTITY_INSERT dbo.Production ON
insert into dbo.Production (ProductionID,ClientID,SalesPersonID,AmountOfItemsSold,ItemID,DateOfSale)
select ProductionID,ClientID,SalesPersonID,AmountOfItemsSold,ItemID,DateOfSale from Production_MON
SET IDENTITY_INSERT dbo.Production OFF

While executing the last statement we can see that the workload has changed. Our disk is no longer getting hammered (which makes sense, since we are now hammering memory) and we see more CPU usage!
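
If you want to watch how much memory the in-memory tables are consuming while this runs, the XTP DMVs can show it (a quick sketch; run it in the RealtimeAnalytics database):

SELECT object_name(object_id) AS TableName,
       memory_allocated_for_table_kb,
       memory_used_by_table_kb
FROM sys.dm_db_xtp_table_memory_stats;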

After about 8 minutes I got a ping that my query had executed. 8 MINUTES!!!!

So, doing exactly the same thing but loading into memory first, I got the result into my tables in 8 minutes and 48 seconds, while the other way (no in-memory tables and no natively compiled procedures) had inserted only 1,000,000 rows after 1 hour and 30 minutes. That is an excellent performance gain if you ask me! To conclude: in-memory tables and natively compiled stored procedures are fast, very fast, extremely fast! If you have a high insert rate and your table can’t keep up, I would suggest trying this scenario, because you might benefit from it greatly.

Thanks for reading and stay tuned!


Index maintenance on a VLDB: How to tame the beast!

When working with large data volumes, normal day-to-day maintenance becomes more difficult. In a previous post (https://swyssql.wordpress.com/2015/08/06/dbcc-how-to-check-the-uncheckable-dbcc-on-vldb/) I talked about handling DBCC CHECKDB on a multi-terabyte database. Today I will tackle another issue I came across while dealing with multi-terabyte databases: index maintenance.

Like most of you, my absolute favorite solution for index maintenance is the Ola Hallengren solution. It is perfectly written and solid, but I noticed that on VLDBs it does not get past the fragmentation-checking stage of your indexes.

So in order to tame the VLDB beast, I wrote my own script for handling index maintenance on a VLDB. The script can be found at https://www.dropbox.com/s/mmhlxjx7fhln2ky/IndexColStatMaintenance.sql?dl=0

Now first things first: How does the script work?

When using other maintenance solutions, your maintenance starts all over every day, even if it did not reach all indexes during its maintenance window. To cope with this, the VLDB maintenance script creates a few tables which keep track of the progress of your index maintenance (checks and rebuilds/reorganizes), so it does not start over every day. This way we eventually check and rebuild all indexes; when the whole cycle has been done, it starts over again. In effect, we divide the full index maintenance workload over a few days.
Another issue I was having: when rebuilding an index which is larger than your transaction log, the log grows and fills up your disk, with all the problems that come with it. So the script has a check built in which skips an index if not enough space is available at that moment, or in total. The skipped index is then reported in the reporting table, so you can see which indexes are not getting rebuilt and rebuild them in another way.
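
Conceptually, the resumable part boils down to a loop like the one below. This is a simplified sketch of the idea only, not the actual script; the table and column names (IndexFragTable aside) are illustrative:

-- Sketch: resume-able index maintenance within a time budget
DECLARE @StopAt datetime = DATEADD(minute, 300, GETDATE()); -- the @MaxRunTime window
DECLARE @SchemaName sysname, @TableName sysname, @IndexName sysname, @sql nvarchar(max);

WHILE GETDATE() < @StopAt
BEGIN
    -- pick the next index that has not been handled in the current cycle
    SELECT TOP (1) @SchemaName = SchemaName, @TableName = TableName, @IndexName = IndexName
    FROM dbo.IndexFragTable
    WHERE HandledInThisCycle = 0;

    IF @@ROWCOUNT = 0 BREAK; -- cycle complete; the reset happens on the next run

    -- rebuild (the real script also checks fragmentation and free log space here)
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@IndexName) + N' ON '
             + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N' REBUILD;';
    EXEC (@sql);

    -- remember the progress so tomorrow continues where we stopped today
    UPDATE dbo.IndexFragTable
    SET HandledInThisCycle = 1
    WHERE SchemaName = @SchemaName AND TableName = @TableName AND IndexName = @IndexName;
END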

The parameters

As with any script of this kind, there are some parameters you will have to set to create the index maintenance.

SET @ManagementDatabase = 'master' --> Database in which objects have to be created

This will be the database in which the objects needed for the script are created.

SET @Database = 'WideWorldImporters' --> Database which will be maintained

This will be the database on which the maintenance has to be done.

SET @AmountOfMemory = 30 --> Percentage of memory to be used for Online Rebuild operations (0-100) (Enterprise only)

This is the percentage of memory available for an online rebuild, to make sure you don't use all your memory for one rebuild operation.

SET @ColumnStatisticsMaintenance = 'Yes' --> Do you want to maintain Column Statistics? (Yes/No)

If you also want to maintain the column statistics, set this to Yes.

SET @MaxRunTime = 300 --> Amount of minutes the maintenance can run each night (300 = 5 hours)

The number of minutes the maintenance can run every night; after this time, no new statements will be launched.

SET @JobStartHour = 1 --> e.g. 1: the job will start at 01:00 (possible options: 1 to 24)

The start hour of your job; the jobs created by the script will be scheduled at this hour.

SET @LockTimeout = 360 --> Set the lock timeout in seconds

The amount of time, in seconds, a lock can be held by the index rebuild/reorganize before the operation is cancelled.

SET @WithOnline = 'YES' --> Rebuild indexes with the ONLINE rebuild option (YES or NO)

This gives you the choice whether or not to use online rebuilds (on Enterprise Edition I would suggest YES, but I had one case where it was NO :) ).
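
Put together, a typical configuration block looks like this (just the example values from above):

SET @ManagementDatabase = 'master'
SET @Database = 'WideWorldImporters'
SET @AmountOfMemory = 30
SET @ColumnStatisticsMaintenance = 'Yes'
SET @MaxRunTime = 300
SET @JobStartHour = 1
SET @LockTimeout = 360
SET @WithOnline = 'YES'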

Objects that will be created

Tables

The ColumnStatistics table contains the workload to be done for all column statistics. The script uses it to decide whether or not a statistic should be resampled.

The IndexFragTable contains all the information the script needs to rebuild or reorganize your indexes, and where it has to continue since the last run.

The IndexRebuildColumnStatLoggingTable is the most important table for you: it contains all information on failed rebuilds, error messages, and indexes which are not being checked.
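
A simple select on that logging table tells you what to follow up on (the ordering below is an assumption; check the table definition in the script for the exact column names):

SELECT *
FROM dbo.IndexRebuildColumnStatLoggingTable
ORDER BY 1 DESC; -- most recent entries first, assuming the first column is a date/id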

Stored Procedures

dbo.IndexStatsOptimize is the main stored procedure, which is executed to handle your index maintenance.

dbo.UpdateIndexStatsTable updates your tracking table, adds new indexes, and resets the list after a full cycle has run.

Job


This job executes the stored procedures. If you need to alter the schedule, this is the place to go; if you want to change your parameters, you can also do it here, or by altering the dbo.IndexStatsOptimize stored procedure.

Availability Groups

The script has been made AG-aware; the only thing you have to do is execute it on both nodes.
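
For context, the standard pattern for making a maintenance job AG-aware (I'm not claiming this is literally what the script does internally) is to test whether the current node holds the primary replica before doing any work:

IF sys.fn_hadr_is_primary_replica('WideWorldImporters') = 1
BEGIN
    EXEC dbo.IndexStatsOptimize; -- only the primary does the actual work
END
-- on the secondary the job runs, does nothing, and reports success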

I made a mistake

If you made a mistake with your parameters, just re-execute the script: it will recreate all objects. If you made an error in the name of your database, however, you will have to manually remove all the objects listed above.
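
For reference, a manual cleanup would look roughly like this (run it in the management database; the SQL Agent job name is a placeholder, take it from your own instance):

DROP TABLE IF EXISTS dbo.ColumnStatistics;
DROP TABLE IF EXISTS dbo.IndexFragTable;
DROP TABLE IF EXISTS dbo.IndexRebuildColumnStatLoggingTable;
DROP PROCEDURE IF EXISTS dbo.IndexStatsOptimize;
DROP PROCEDURE IF EXISTS dbo.UpdateIndexStatsTable;
-- EXEC msdb.dbo.sp_delete_job @job_name = N'<your index maintenance job>';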

I have had a lot of table changes in my database.

The script checks daily for new tables and indexes and adds them to the list of indexes to be checked and rebuilt; it also looks for deleted tables and removes them from the list.

If you have any questions feel free to ask.

Thanks for reading and enjoy the script!

Stay tuned!


Distributed Availability Groups: Availability groupception!

Brand new in SQL Server 2016 is a feature which reminds me of the movie Inception with Leonardo DiCaprio and Tom Hardy. There you had dreams within dreams within dreams; with availability groups we now have AGs within AGs. This makes distributed availability groups a dream come true for geographically separated availability groups. Today’s post shows how to set up a distributed availability group and what the options are.

I started by setting up 3 nodes in a Hyper-V environment on my laptop, with SQL Server 2016 Developer Edition installed on each of them. I then joined them to my local domain SQLSTIJN and created a failover cluster containing the 3 nodes.

Now I will start off with setting up my primary availability group on the SS2016 & SS2016P nodes.

CREATE AVAILABILITY GROUP [AG_Ception]
FOR DATABASE [AGTest]
REPLICA ON
N'SS2016\SS2016'
WITH (
ENDPOINT_URL = N'TCP://SS2016.SQLSTIJN.COM:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
BACKUP_PRIORITY = 50,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO),
SEEDING_MODE = AUTOMATIC),
N'SS2016P\SS2016P'
WITH (
ENDPOINT_URL = N'TCP://SS2016P.SQLSTIJN.COM:5022',
FAILOVER_MODE = AUTOMATIC,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
BACKUP_PRIORITY = 50,
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO),
SEEDING_MODE = AUTOMATIC);

Because we are using automatic seeding, we still have to join the secondary replica and grant the availability group the right to create a database on it. In my case I connect to the SS2016P node and execute the following commands:

ALTER AVAILABILITY GROUP [AG_Ception] JOIN
ALTER AVAILABILITY GROUP [AG_Ception] GRANT CREATE ANY DATABASE
GO

Now the database should be created on the secondary node and start synchronizing using automatic seeding (so easy without the backup/restore dance … :) ). (In case of an error, check your error log and make sure your folders exist and are named the same on both nodes, e.g. if you have D:\Data on the primary and the folder on the secondary is called S:\Data, it will not work. So make sure all folders and drives exist and have the same name. :) )

After you have executed all the commands, you should see something like this. This is our normal setup so far: apart from the automatic seeding, there is no difference with the SQL Server 2014 version.

Now we are going to create a listener for the first availability group.

USE [master]
GO
ALTER AVAILABILITY GROUP [AG_Ception]
ADD LISTENER N'AGCeption1'
(
WITH IP((N'33.0.0.90', N'255.255.255.0')), PORT = 65000);
GO

After this we are going to create a single-node secondary availability group for the SS2016DR node. We are not going to specify a database this time, because we will again use the automatic seeding functionality of SQL Server 2016. We do, however, create it with SEEDING_MODE = AUTOMATIC, so it can be seeded once we join it through a distributed AG. IMPORTANT: use the same mirroring endpoint; otherwise, after a local failover you will not be able to replicate.

USE [master]
GO
CREATE AVAILABILITY GROUP [AG_Ception2]
FOR REPLICA ON N'SS2016DR\SS2016DR'
WITH (
ENDPOINT_URL = N'TCP://SS2016DR.SQLSTIJN.COM:5022',
FAILOVER_MODE = MANUAL,
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
BACKUP_PRIORITY = 50,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL),
SECONDARY_ROLE(ALLOW_CONNECTIONS = NO),
SEEDING_MODE = AUTOMATIC);
GO

Now we create a listener for the second availability group, AG_Ception2.

USE [master]
GO
ALTER AVAILABILITY GROUP [AG_Ception2]
ADD LISTENER N'AGCeption2'
(
WITH IP((N'33.0.0.91', N'255.255.255.0')), PORT = 65000);
GO

When both availability groups are configured, it is finally time to add our distributed availability group. We execute this on our current primary node, SS2016\SS2016.

CREATE AVAILABILITY GROUP [DISTRICEPTION]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON 'AG_Ception'
WITH (
LISTENER_URL = 'TCP://AGCeption1:5022',
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = AUTOMATIC
),
'AG_Ception2'
WITH
(
LISTENER_URL = 'TCP://AGCeption2:5022',
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = AUTOMATIC
);
GO

Notice that I use the endpoint port and not the listener port in the LISTENER_URL. This is needed; otherwise you will get errors. :)

Now I’m going to join the secondary availability group. I will execute this on SS2016DR\SS2016DR (the secondary AG).

Before doing this, check your endpoints; apparently mine was not created. A quick way to check is to query the sys.tcp_endpoints catalog view:
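
SELECT name, type_desc, port, state_desc
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING';
-- no row returned means the mirroring endpoint is missing

If nothing comes back, execute this on the secondary node: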

USE [master]
GO
CREATE ENDPOINT [Hadr_endpoint]
STATE = STARTED
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATABASE_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE, ENCRYPTION = REQUIRED ALGORITHM AES)
GO

After this you can join your secondary availability group to the Distributed AG

ALTER AVAILABILITY GROUP [DISTRICEPTION]
JOIN
AVAILABILITY GROUP ON 'AG_Ception'
WITH
(
LISTENER_URL = 'TCP://AGCeption1:5022',
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = AUTOMATIC
),
'AG_Ception2'
WITH
(
LISTENER_URL = 'TCP://AGCeption2:5022',
AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
FAILOVER_MODE = MANUAL,
SEEDING_MODE = AUTOMATIC);
GO

After this, check the error log and the databases on the secondary AG: you will see that automatic seeding has started and that you now have a database in sync on the secondary AG.
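
You can also follow the seeding itself through the DMV that SQL Server 2016 provides for it:

SELECT start_time, completion_time, current_state, failure_state_desc
FROM sys.dm_hadr_automatic_seeding
ORDER BY start_time DESC;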

If you see this, then you have successfully set up your Distributed Availability group.

One major downside I find in the current version is that you cannot create a listener on the distributed availability group, and there is no synchronous mode between the two availability groups. This means the whole failover process is manual and will need some editing of DNS.
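
For completeness, the manual failover of a distributed AG boils down to demoting the global primary and then failing over on the other side, roughly like this (handle with care: with asynchronous commit this is a forced failover and can lose data):

-- on the current global primary (SS2016\SS2016):
ALTER AVAILABILITY GROUP [DISTRICEPTION] SET (ROLE = SECONDARY);
-- then on the primary replica of AG_Ception2 (SS2016DR\SS2016DR):
ALTER AVAILABILITY GROUP [DISTRICEPTION] FORCE_FAILOVER_ALLOW_DATA_LOSS;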

This is, however, a very promising feature and I hope to see it evolve!

Thanks for reading and stay tuned!
