Title: Microsoft
Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract Transform and Load
(ETL) for Multiple Tables:
Author: Toraj
Khavari
Date: November
23, 2015
Objective: Multiple
DB (Data Base) table ETL, Extracting meaningful data from Microsoft Dynamics
AX, Transform the source data, and Loading the data in an Enterprise Data
Warehouse (EDW) are an efficient design. This article will outline a practical solution
for multiple tables from source Microsoft Dynamics AX to EDW using SQL Server
Integration Services (SSIS). Although the focus is on the Microsoft Dynamics AX
tables, the solution described in this article can be applied to any DB ETL. In
this article, we refer to “Staging” DB, “ControlIncremental” and “DimAudit”
tables, and SSIS packages. To fully follow, understand, and take advantage of
this article, refer to “Microsoft Dynamics AX to Enterprise Data Warehouse
(EDW) SQL Extract”.
Article Body: Very
seldom single table ETL meets business needs. The high level business
requirements to meet, in this case, are as follows.
Ø ETL
multiple tables from Microsoft Dynamics AX to the EDW as fast as possible with
minimum impact on the source.
Ø Perform
the multiple ETL with high quality, tractability, and production support.
Ø The
design solution must be simple and scalable. The solution can be applied to
other multiple ETL packages. In addition, the existing multiple ETL packages
can have additional packages (2 to n packages).
Ø The ETL
for multiple tables must be flexible to meet different refresh periods based on
the business needs.
The
outlined solution is not the only answer or claim to be the best possible
design. It is a solid solution that meets our needs. To meet the above
objectives, the solution is in three components.
Ø Pre-Processing
Ø Perform
the multiple ETL
Ø Post–Processing
1-
In
the Visual Studio (VS) “SSIS Packages” add a “New SSIS Package”. Rename it to
“MASTER EXTRACT <Your Business Area> PRODUCTION.dtsx”.
2-
In
the “Variable” tab add the following variables outlined in the following picture.
They will be used in the subsequent sections.
3-
Pre-Processing: The
Pre-Processing, gets and prepares information for the actual multiple ETL tasks.
It contains four components.
a.
Pre-Processing
- Get Incremental Date Range
b.
Get
LastSuccessfulExtractDateTime
c.
Get
CurrentExtractDateTime
d.
Get
AuditKey
a.
Pre-Processing
- Get Incremental Date Range – From the “SSIS Toolbox” add a “Sequence
Container”, rename it to “Pre-Processing - Get Incremental Date Range”. Groups
its member components into subsystem.
b.
Get
LastSuccessfulExtractDateTime - From the “SSIS Toolbox” add an “Execute SQL
Task” inside the “Pre-Processing - Get Incremental Date Range”, rename it to “Get
LastSuccessfulExtractDateTime”. Edit the “Get LastSuccessfulExtractDateTime”.
In the “General” “SQL Statement” field copy and paste the following SQL
statements.
SELECT MAX(ETLLoadDateTime) AS
LastSuccessfulExtractDateTime
FROM ControlIncremental
WHERE SourceTableName = 'MASTEREXTRACTADECPRODUCTION'
In the
“Result Set” identify “Result Name” 0 and “Variable Name” User::LastSuccessfulExtractDateTime.
c.
Get
CurrentExtractDateTime - From the “SSIS Toolbox” add an “Execute SQL Task”
inside the “Pre-Processing - Get Incremental Date Range”, rename it to “Get
CurrentExtractDateTime”. Connect “Get LastSuccessfulExtractDateTime” and “Get
CurrentExtractDateTime”. In the “General” “SQL Statement” field copy and paste
the following SQL statements.
SELECT GETDATE() AS CurrentlExtractDateTime
In the
“Result Set” identify “Result Name” 0 and “Variable Name” User::CurrentExtractDateTime.
Very
much like section 3b.
d.
Get
AuditKey - From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing
- Get Incremental Date Range”, rename it to “Get AuditKey”. Connect “Pre-Processing
- Get Incremental Date Range” and “Get AuditKey”. In the “General” “SQL Statement” field copy
and paste the following SQL statements.
INSERT
INTO [DimAudit]
(ParentAuditKey
,[TableName]
,[PkgName]
--
,[PkgGUID]
--,[PkgVersionGUID]
--,[PkgVersionMajor]
--,[PkgVersionMinor]
,[ExecStartDT]
,[ExecStopDT]
,[ExtractRowCnt]
,[InsertRowCnt]
,[UpdateRowCnt]
,[ErrorRowCnt]
,[TableInitialRowCnt]
,[TableFinalRowCnt]
,[TableMaxSurrogateKey]
,[SuccessfulProcessingInd])
VALUES
(0
,'MASTER EXTRACT <Your
Business> PRODUCTION'
,'MASTER EXTRACT <Your Business>
PRODUCTION'
--
,@PkgGUID
--,@PkgVersionGUID
--,@PkgVersionMajor
--,@PkgVersionMinor
,?
,NULL
,NULL
,NULL
,NULL
,NULL
,NULL
,NULL
,NULL
,'N')
SELECT
CAST(SCOPE_IDENTITY() AS INT) AS AuditKey
In the
“Result Set” identify “Result Name” 0 and “Variable Name” User::AuditKey.
Very
much like section 3b and 3c.
4-
Perform
the multiple ETL:
At this point, we have all the needed key values. We can start the setup to
execute the existing ETL SSIS packages. From the “SSIS Toolbox” add an “Execute
Package Task” outside the “Pre-Processing - Get Incremental Date Range”, rename
it to “ROUTE”. Connect “Get AuditKey” and “ROUTE”. Edit the “ROUTE”. Select “Package”. In the “Package Name From
Project Refrence”, select “ROUTE.dtsx”.
Select the “Parameter bindings” and setup the
“CurrentExtractDateTime” and “MasterAuditKey”.
You can repeat the above steps as many times
as you want, and create parallel ETL execution.
Hints:
·
Balancing the ETL execution parallelism may
require a few runs. The tables’ size, number of rows, and network load affect
the ETL execution run time. During the experimental executions, I recommend
performing them off peak hours. It will give you a more realistic run time
view.
·
The “SQL command text” in the source “OLE DB
Source” for each table ETL can have “Parameters” to provide values for
incremental ETL used in “where [MODIFIEDDATETIME] between ? and ?” statement. E.g. SELECT …. FROM [dbo].[ROUTE] where
[MODIFIEDDATETIME] between ? and ?
·
Limit the number of parallelism. At some
point, parallelism will cause more issues than just improving performance.
5-
Post–Processing: In the
Post-Process, we update the audit information.
a.
Update
AuditDim Row - From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing
- Get Incremental Date Range”, rename it to “Update AuditDim row”. Connect all
the last table ETL branches, in this case 3, and “Update AuditDim row”. Edit
the “Update AuditDim row”. In the
“General” “SQL Statement” copy and paste the following SQL statements.
UPDATE
[DimAudit]
SET
ExecStopDT = SYSDATETIME()
,SuccessfulProcessingInd='Y'
WHERE
AuditKey = ?--@AuditKey;
Select the “Parameter Mapping” and setup the
“User::AuditKey”.
b.
Update
Control Table. From the “SSIS Toolbox” add an “Execute SQL Task” outside the “Pre-Processing
- Get Incremental Date Range”, rename it to “Update Control Table”. Connect “Update
AuditDim row” and “Update Control Table”. Edit the “Update Control Table”. In the “General” “SQL Statement” copy and
paste the following SQL statements.
update
[dbo].[ControlIncremental]
set ETLLoadDateTime=?, PriorETLLoadDateTime=?
where SourceTableName=?
Select the “Parameter Mapping” and setup the
“User::CurrentExtractDateTime”, “User::TableName” and “User::LastSuccessfulExtractDateTime”.
You
are done. Build the solution and test
your Master ETL.
Master
ETL SSIS packages are powerful design methods to logically group tables ETL.
Each Master ETL SSIS can run on different planned schedules to meet the
business needs. Some tables require daily refresh. While others may require
more frequent refresh cycles, e.g., 3-4 times a day, every hour, etc.
My
spacial thanks to Steve Moss and his involvement in this project.
Happy
coding.
Cheers,
Toraj
References:
-
Microsoft
Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract Transform and Load (ETL)
–
-
Sequence
Container - https://msdn.microsoft.com/en-us/library/ms139855.aspx
Nice information, this is will helpfull a lot, Thank for sharing, Keep do posting i like to follow this informatica online training
ReplyDeleteinformatica online course
informatica bdm training
informatica developer training