Wednesday, October 18, 2017

Title: SharePoint SQL ETL Extract SSIS Package:

Version: 1.0.1018.01.00 - Toraj Khavari – October 18, 2017

Objective: The following process outlines how to develop SharePoint SQL ETL Extract SSIS Package.

Article Body: I faced a business requirement to have lists maintained by data stewards with the following requirements.
-        Easy to use and secure.
-        The lists are outside Data Bases (Prefer Corporate SharePoint).
-        The lists must be available in the corporate Enterprise Data Warehouse (EDW) as dimensions for SQL Cubes and Business Inelegance (BI) reporting.
-        The EDW dimensions must be updated nightly.
-        Do not use Microsoft Access.
-        Do not use custom application.
-        Do not use Master Data Services (MDS). It proved to be difficult to maintain and use.

In this article, the readers should be knowledgeable about Microsoft Visual Studio (VS), SQL Server Integration Services (SSIS), Adjacency List, SharePoint List, SharePoint Security, SQL Security, Identify Management System (IDM), Active Directory (AD), RESTful Web Services, OData, EDW, BI, and SQL Job.

<Side> There are a lot of technologies to bring together for SharePoint SQL ETL Extract SSIS Package solution architecture. It is worth it! Powerful.

This article will focus on an A-dec customer hierarchy adjacency SharePoint list using corporate standard SharePoint Security (in our case IDM empowerID and AD). The list contains customer hierarchy. It is used in reports for customer information rollup and maintained by very specific data stewards.

The solution has multiple steps.
1-     Setup
2-     SSIS Package Development
3-     SSIS Package deployment and SQL job setup (refer to one of my other article). Out of scope of this article.

1.     Setup
1.1.  Create a service ID and password. Have the Service ID be a member of the SharePoint and SQL group with appropriate roles to read and write between source (SharePoint) and destination (SQL).
1.2.  Down load the 32 bits version of additional SSIS tools needed.
SQL Server 2014 - https://www.microsoft.com/en-us/download/details.aspx?id=42295

Hint:
If the SQL Server uses 64 bits, installed the 32Bit OData Source (ODataSourceForSQLServer2012-x86) run the package from visual studio. Once released on the SSIS server, the 64Bit run-time is required 64 (ODataSourceForSQLServer2012-amd64).

1.3.  Identify the SharePoint list URL. In my case as follows.
https://<Corporate Team Site>/sites/adecbi/Lists/Sales%20Hierarchy
1.4.  Validate the Corporate SharePoint is RESTFull enabled. Add “_vti_bin/listdata.svc” to the SharePoint base URL in your favorite internet explorer. Example as follows.
https://<Corporate Team Site>/sites/adecbi/_vti_bin/listdata.svc
A SOAP xml outlines all the components available for Web Services interfaces.

<Side> I validated that Microsoft SharePoint 2010, 2013, and above supports RESTful OData Web Services.

2.     SSIS Package Development
2.1.  I) Add a new SSIS Project to your solution in VS. II) Add a Data Flow Task to the project Control Flow.
2.2.  Open Data Flow for the added Data Flow Task. Click and drag the new OData Source to the Data Flow design tab.
 
2.3.  Open the OData Source. I) Select New, II) Enter the “Service document location” (list URL), III) Select “Use this user name and password”. Enter the service IDs authentication information (from step 1.1.), IV) select “Test Connection” and select “OK”.

2.4.  I) Select “Collection” for “Use collection or resource path:”, II) Select the SharePoint List name, in my case “CustomerHierarchy”.

2.5.  At this point, when selecting the “Preview…” button and “Columns” tabs, there are enough information to add a SSIS DB Destination, SQL Connection Manager and persist the SharePoint data into a SQL table. Refer to “Microsoft Dynamics AX to Enterprise Data Warehouse (EDW) SQL Extract Transform and Load (ETL): article steps 8 d. for data persistence.

You are done. You have SharePoint data in SQL table.

Hints:
1-     The SharePoint Site is imbedded in the SSIS Package OData Source. If the SharePoint site changes, upgrade, the package must be updated. I like to make the SharePoint URL a variable or table driven. Thereafter, the production DBA can update the URL without software release and system development life cycle. I do not have a fix for this, yet.
2-     The SharePoint Lists have version control, in a way, Slow Changing data information. In my case, the SharePoint Lists are not version controlled. The ETL Package nightly will build the EDW dimension from staging data base using SDC (Slow Changing Dimension) technology. Therefore, I found the SharePoint version control not necessary.
3-     The impact of OData Source to SharePoint is like adding another user to the SharePoint site (minimum). In addition, the ETL Extract is done off work demanding hours.

Service Oriented Architecture (SOA) at its finest. Multi-layers, multi-systems, and across technologies interface. RESTful OData great standards.


Happy coding. Toraj