Load your data into your PDW (part 1 the DWLOADER)

Today I will enlighten you about the incredible fast load times which you will get with a Microsoft Parallel Data Warehouse!

The first thing you have to do when you work with a MSPDW is load your data into the PDW. This initial load will be the most “time consuming” so with this post I hope to help you avoid common mistakes which I made myself with the DWLOADER statement.

The first way to insert data is by developing SSIS packages and then inserting the data into the PDW using the package, I will not be going in deeper on this subject since I prefer working with the second way.

The second way is an exclusive feature for PDW: DWLOADER.EXE. This Command Line based loading tool will ensure the fastest loading times using the infiniband connection. There are many properties to configure DWLOADER, and in this post I will explain how changing these properties will make your load a success and affect your load speeds.

I will start off with giving you guys a sample dwloader statement and from this statement I’ll explain the different values you can configure to increase your loading speed, and I’ll explain the key properties of the DWLOADER (without these properties your load will fail!)

  •  dwloader.exe -M fastappend -m -b 2000000 -i FILEPATH\FILE -T TABLE -R FILEPATH\FILENAME.bad -t \t -r \r\n -dt FILEPATH\dateformatDR.txt -E -S 172.16.254.5 -U BatchUser -P B@tch1234 > “FILEPATH\DWLOADLOGvx.LOG”

  1. Dwloader.exe : this is used to call the dwloader program
  2. –M Fastappend: using fastappend will increase your loadrate significantly! Other options are Append Upsert & reload.
  3. –b: this will define after how many rows you want to insert this load in the columnstore table. It is better to split this up using a custom batch size to avoid tupple mover(tupple mover = stores data in rowstore to dump them afterwards in columnstore) . So it might be interesting to tweak your batch size so that you don’t trigger this rowstore and directly insert into the columnstore.

  4. –i: The path of the file you want to load into your PDW (Best on a location which is connected to the Infiniband to avoid the bottleneck of your ethernet connection)
  5. –T: Is the destination table created in your PDW in which DWLOADER will insert the data.
  6. –R: is the destinationpath were you will keep your log file if something goes wrong with your load (Values that can’t be loaded f.e. wrong dateformat)
  7. –t: is the delimiter which is used in the file (in the case of the sample statement above this is /t which means it’s a tab delimited file)
  8. \r: will define the end of your row. This is defined by \n in the example file which is enter.
  9. –D: If you don’t have multiple dateformats in your file, you can define the default format for your date fields by using the –D property f.e. –D “DD-mm-YYYY”
  10. –dt: If you have multiple date formats in your file you will have to add the format for every datefield in a txt file in which you put the name of the field followed by the format it uses.
    f.e. Birthdate= yyyy-MM-dd
    Startdate= yyyy-MM-dd HH:mm:ss.fff

  11. –S: will define which connection will be used to transfer the data from your flat file to your PDW, be sure to check the IP’s of your Ethernet & Infiniband connection. Using the IP-address of your Infiniband will ensure much faster load speeds
  12. –E: if you have empty values in your Flat File you will have to use –E which will convert these empty values to NULL. Without this property your load will fail!
  13. –U & -P: will define your username and password for your PDW
  14. For additional logging you will have to use > this way you can monitor your load speed a lot better and spot problems a lot faster!

This was an example of a sample DWLOADER statement, however this statement does not guarantee the best loading speed. This is defined by a combination of your dwloader configuration / flat file configuration / Table design. In my next post I will be going in deeper on table design in a PDW explaining the Distribution & Replication theory and how this will affect your load speed.

Advertisements
Tagged , , ,

12 thoughts on “Load your data into your PDW (part 1 the DWLOADER)

  1. Yogesh says:

    Hi can u create SSIS Package for same what u explained.
    Plz. can u do as soon as posible.

    • swyskoh says:

      Hi Yogesh,

      You can also create this with a SSIS package, but this will be a bit slower then when you use DWloader. Are you familiar with SSIS Packages?

      If you keep in mind the 4 pillars(Query, Cardinality, Aggregates, Workload Balance) for your destination Table design, your Load will still have a great speed with SSIS.

      If you want I can upload a tutorial on how I use SSIS for my load?

  2. Sharath S says:

    Our project is trying to migrate to PDW. I have created a PDW database in VS 2013 having the same tables that we have in SQL Server 2012. I am now trying to do a simple load into PDW using dwloader.

    Can you help me with the syntax of loading the data from one table from SQL Server 2012 into PDW. (The table structures are the same).

    Thanks in advance.
    Sharath

  3. alteryxscott says:

    Great article. Very useful!

  4. A says:

    Hi, thanks for the info. I assume upsert will only do updates and inserts, not a merge, i.e. insert, update or delete, correct?

  5. Aditi says:

    Hello.. I am interested in loading a flat file which is pipe delimited into a PDW table. I know how to do this for fixed width and tab delimited.
    on command line if I enter this:
    DWLOADER -M fastappend -m -b 200000 -rv 100 -i -t \| .txt -T .dbo. -R BAD_RESULTS.BAD -w T.txT -fh 0 -r \r\n -D ymd -E -S 10.45.17.19 -U -P LOG.LOG

    And the format file is say
    col1=6
    col2=1
    col3=11
    col4=1
    col5=7
    col6=5
    col7=6

    Each column size is the size of just the datafield. The pipe is going be in between Example: 201207|2|53447062130|1|1040|146|161

    How do you suggest I have the dwloader command set as for this?

  6. Basava says:

    Hi Stijn,

    Can you help me with loading a DBCC output to a table in PDW.

    Regards,
    Basava

  7. leo says:

    can we load a zip file into PDW using dwloader.exe command? is there a special flag for it?

    • SQL Stijn says:

      As far as I know this is not possible, but this could have changed. Best you contact someone at microsoft to ask this. Try to tweet James Rowland Jones, he is the man to go to.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: