Monday, November 23, 2015

Title: Microsoft Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract Transform and Load (ETL) for Multiple Tables:
Author: Toraj Khavari
Date: November 23, 2015

Objective: Multiple DB (Data Base) table ETL, Extracting meaningful data from Microsoft Dynamics AX, Transform the source data, and Loading the data in an Enterprise Data Warehouse (EDW) are an efficient design. This article will outline a practical solution for multiple tables from source Microsoft Dynamics AX to EDW using SQL Server Integration Services (SSIS). Although the focus is on the Microsoft Dynamics AX tables, the solution described in this article can be applied to any DB ETL. In this article, we refer to “Staging” DB, “ControlIncremental” and “DimAudit” tables, and SSIS packages. To fully follow, understand, and take advantage of this article, refer to “Microsoft Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract”.

Article Body: Very seldom single table ETL meets business needs. The high level business requirements to meet, in this case, are as follows.
Ø  ETL multiple tables from Microsoft Dynamics AX to the EDW as fast as possible with minimum impact on the source.
Ø  Perform the multiple ETL with high quality, tractability, and production support.
Ø  The design solution must be simple and scalable. The solution can be applied to other multiple ETL packages. In addition, the existing multiple ETL packages can have additional packages (2 to n packages).
Ø  The ETL for multiple tables must be flexible to meet different refresh periods based on the business needs.
The outlined solution is not the only answer or claim to be the best possible design. It is a solid solution that meets our needs. To meet the above objectives, the solution is in three components.
Ø  Pre-Processing
Ø  Perform the multiple ETL
Ø  Post–Processing

1-      In the Visual Studio (VS) “SSIS Packages” add a “New SSIS Package”. Rename it to “MASTER EXTRACT <Your Business Area> PRODUCTION.dtsx”.
2-      In the “Variable” tab add the following variables outlined in the following picture. They will be used in the subsequent sections.
3-      Pre-Processing: The Pre-Processing, gets and prepares information for the actual multiple ETL tasks. It contains four components.
a.      Pre-Processing - Get Incremental Date Range
b.      Get LastSuccessfulExtractDateTime
c.       Get CurrentExtractDateTime
d.      Get AuditKey

a.      Pre-Processing - Get Incremental Date Range – From the “SSIS Toolbox” add a “Sequence Container”, rename it to “Pre-Processing - Get Incremental Date Range”. Groups its member components into subsystem.

b.      Get LastSuccessfulExtractDateTime - From the “SSIS Toolbox” add an “Execute SQL Task” inside the “Pre-Processing - Get Incremental Date Range”, rename it to “Get LastSuccessfulExtractDateTime”. Edit the “Get LastSuccessfulExtractDateTime”. In the “General” “SQL Statement” field copy and paste the following SQL statements.
SELECT  MAX(ETLLoadDateTime) AS LastSuccessfulExtractDateTime
FROM        ControlIncremental
WHERE  SourceTableName = 'MASTEREXTRACTADECPRODUCTION'

In the “Result Set” identify “Result Name” 0 and “Variable Name” User::LastSuccessfulExtractDateTime.
c.       Get CurrentExtractDateTime - From the “SSIS Toolbox” add an “Execute SQL Task” inside the “Pre-Processing - Get Incremental Date Range”, rename it to “Get CurrentExtractDateTime”. Connect “Get LastSuccessfulExtractDateTime” and “Get CurrentExtractDateTime”. In the “General” “SQL Statement” field copy and paste the following SQL statements.
SELECT     GETDATE() AS CurrentlExtractDateTime

In the “Result Set” identify “Result Name” 0 and “Variable Name” User::CurrentExtractDateTime.

Very much like section 3b.

d.      Get AuditKey - From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing - Get Incremental Date Range”, rename it to “Get AuditKey”. Connect “Pre-Processing - Get Incremental Date Range” and “Get AuditKey”.  In the “General” “SQL Statement” field copy and paste the following SQL statements.

INSERT INTO [DimAudit]
           (ParentAuditKey
           ,[TableName]
           ,[PkgName]
   --        ,[PkgGUID]
           --,[PkgVersionGUID]
           --,[PkgVersionMajor]
           --,[PkgVersionMinor]
           ,[ExecStartDT]
           ,[ExecStopDT]
           ,[ExtractRowCnt]
           ,[InsertRowCnt]
           ,[UpdateRowCnt]
           ,[ErrorRowCnt]
           ,[TableInitialRowCnt]
           ,[TableFinalRowCnt]
           ,[TableMaxSurrogateKey]
           ,[SuccessfulProcessingInd])
     VALUES
           (0
           ,'MASTER EXTRACT <Your Business> PRODUCTION'
 ,'MASTER EXTRACT <Your Business> PRODUCTION'
    --      ,@PkgGUID
           --,@PkgVersionGUID
           --,@PkgVersionMajor
           --,@PkgVersionMinor
           ,?
           ,NULL
           ,NULL
           ,NULL
           ,NULL
           ,NULL
           ,NULL
           ,NULL
           ,NULL
           ,'N')
SELECT CAST(SCOPE_IDENTITY() AS INT) AS AuditKey

In the “Result Set” identify “Result Name” 0 and “Variable Name” User::AuditKey.

Very much like section 3b and 3c.

4-      Perform the multiple ETL: At this point, we have all the needed key values. We can start the setup to execute the existing ETL SSIS packages. From the “SSIS Toolbox” add an “Execute Package Task” outside the “Pre-Processing - Get Incremental Date Range”, rename it to “ROUTE”. Connect “Get AuditKey” and “ROUTE”. Edit the “ROUTE”.  Select “Package”. In the “Package Name From Project Refrence”, select “ROUTE.dtsx”.

Select the “Parameter bindings” and setup the “CurrentExtractDateTime” and “MasterAuditKey”.
You can repeat the above steps as many times as you want, and create parallel ETL execution.
Hints:
·         Balancing the ETL execution parallelism may require a few runs. The tables’ size, number of rows, and network load affect the ETL execution run time. During the experimental executions, I recommend performing them off peak hours. It will give you a more realistic run time view.
·         The “SQL command text” in the source “OLE DB Source” for each table ETL can have “Parameters” to provide values for incremental ETL used in “where [MODIFIEDDATETIME] between ? and ?” statement. E.g.  SELECT …. FROM [dbo].[ROUTE] where [MODIFIEDDATETIME] between ? and ?
·         Limit the number of parallelism. At some point, parallelism will cause more issues than just improving performance.

5-      Post–Processing: In the Post-Process, we update the audit information.
a.      Update AuditDim Row - From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing - Get Incremental Date Range”, rename it to “Update AuditDim row”. Connect all the last table ETL branches, in this case 3, and “Update AuditDim row”. Edit the “Update AuditDim row”.  In the “General” “SQL Statement” copy and paste the following SQL statements.
UPDATE [DimAudit]
SET ExecStopDT = SYSDATETIME()
            ,SuccessfulProcessingInd='Y'
WHERE AuditKey = ?--@AuditKey;
Select the “Parameter Mapping” and setup the “User::AuditKey”.
b.      Update Control Table. From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing - Get Incremental Date Range”, rename it to “Update Control Table”. Connect “Update AuditDim row” and “Update Control Table”. Edit the “Update Control Table”.  In the “General” “SQL Statement” copy and paste the following SQL statements.
update [dbo].[ControlIncremental]
  set ETLLoadDateTime=?, PriorETLLoadDateTime=?
  where SourceTableName=?
Select the “Parameter Mapping” and setup the “User::CurrentExtractDateTime”, “User::TableName” and “User::LastSuccessfulExtractDateTime”.
You are done.  Build the solution and test your Master ETL.

Master ETL SSIS packages are powerful design methods to logically group tables ETL. Each Master ETL SSIS can run on different planned schedules to meet the business needs. Some tables require daily refresh. While others may require more frequent refresh cycles, e.g., 3-4 times a day, every hour, etc.

My spacial thanks to Steve Moss and his involvement in this project.

Happy coding.
Cheers, Toraj

References:
-          Microsoft Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract Transform and Load (ETL) –
-          Sequence Container - https://msdn.microsoft.com/en-us/library/ms139855.aspx


1 comment: