Title: SharePoint SQL ETL Extract SSIS Package:
Version: 1.0.1018.01.00
- Toraj Khavari – October 18, 2017
Objective: The
following process outlines how to develop SharePoint SQL ETL Extract SSIS Package.
Article
Body: I
faced a business requirement to have lists maintained by data stewards with the
following requirements.
-
Easy
to use and secure.
-
The
lists are outside Data Bases (Prefer Corporate SharePoint).
-
The
lists must be available in the corporate Enterprise Data Warehouse (EDW) as dimensions
for SQL Cubes and Business Inelegance (BI)
reporting.
-
The
EDW dimensions must be updated nightly.
-
Do
not use Microsoft Access.
-
Do
not use custom application.
In
this article, the readers should be knowledgeable about Microsoft Visual Studio
(VS), SQL Server Integration Services (SSIS), Adjacency List, SharePoint List, SharePoint Security, SQL Security, Identify
Management System (IDM), Active Directory (AD), RESTful Web Services, OData, EDW, BI, and SQL Job.
<Side>
There are a lot of technologies to bring together for SharePoint
SQL ETL Extract SSIS Package solution architecture. It is worth it! Powerful.
This
article will focus on an A-dec customer hierarchy adjacency SharePoint list
using corporate standard SharePoint Security (in our case IDM empowerID and AD). The list contains customer
hierarchy. It is used in reports for customer information rollup and maintained
by very specific data stewards.
The
solution has multiple steps.
1-
Setup
2-
SSIS
Package Development
3-
SSIS
Package deployment and SQL job setup (refer to one of my other article). Out of scope of
this article.
1.
Setup
1.1.
Create
a service ID and password. Have the Service ID be a member of the SharePoint
and SQL group with appropriate roles to read and write between source (SharePoint)
and destination (SQL).
1.2.
Down
load the 32 bits version of additional SSIS tools needed.
SQL Server 2014 - https://www.microsoft.com/en-us/download/details.aspx?id=42295
Hint:
Hint:
If the SQL Server uses 64 bits, installed the 32Bit OData
Source (ODataSourceForSQLServer2012-x86) run the package from visual studio. Once
released on the SSIS server, the 64Bit run-time is required 64 (ODataSourceForSQLServer2012-amd64).
1.3. Identify the SharePoint list URL. In my case as follows.
https://<Corporate
Team Site>/sites/adecbi/Lists/Sales%20Hierarchy
1.4. Validate the Corporate SharePoint is RESTFull enabled. Add “_vti_bin/listdata.svc” to the
SharePoint base URL in your favorite internet explorer. Example as follows.
https://<Corporate
Team Site>/sites/adecbi/_vti_bin/listdata.svc
A SOAP xml outlines all the components available for Web
Services interfaces.
<Side> I validated that Microsoft
SharePoint 2010, 2013, and above supports RESTful OData Web Services.
2.
SSIS
Package Development
2.1.
I)
Add a new SSIS Project to your solution in VS. II) Add a Data Flow Task to the
project Control Flow.
2.2.
Open
Data Flow for the added Data Flow Task. Click and drag the new OData Source to
the Data Flow design tab.
2.3.
Open
the OData Source. I) Select New, II) Enter the “Service document location”
(list URL), III) Select “Use this user name and password”. Enter the service
IDs authentication information (from step 1.1.), IV) select “Test Connection”
and select “OK”.
2.4.
I)
Select “Collection” for “Use collection or resource path:”, II) Select the
SharePoint List name, in my case “CustomerHierarchy”.
2.5.
At
this point, when selecting the “Preview…” button and “Columns” tabs, there are enough
information to add a SSIS DB Destination, SQL Connection Manager and persist
the SharePoint data into a SQL table. Refer to “Microsoft Dynamics AX to Enterprise Data Warehouse
(EDW) SQL Extract Transform and Load (ETL):” article steps 8
d. for data persistence.
You are done. You have SharePoint data in SQL
table.
Hints:
1- The SharePoint Site is imbedded in the SSIS Package OData
Source. If the SharePoint site changes, upgrade, the package must be updated. I
like to make the SharePoint URL a variable or table driven. Thereafter, the
production DBA can update the URL without software release and system
development life cycle. I do not have a fix for this, yet.
2- The SharePoint Lists have version control, in a way, Slow
Changing data information. In my case, the SharePoint Lists are not version
controlled. The ETL Package nightly will build the EDW dimension from staging data
base using SDC (Slow Changing Dimension) technology. Therefore, I found the
SharePoint version control not necessary.
3- The impact of OData Source to SharePoint is like adding
another user to the SharePoint site (minimum). In addition, the ETL Extract is
done off work demanding hours.
Service Oriented Architecture (SOA) at its
finest. Multi-layers, multi-systems, and across technologies interface. RESTful
OData great standards.
Happy coding. Toraj