In my last blog post, I provided “how-to” instructions for installing HP Vertica 7.0.1 Client Packages for the Community Edition. As stated previously, the client packages include utilities for you to connect client facing applications to your HP Vertica Community Edition Virtual Machine. An ODBC driver is also included that, once configured properly, will allow you to import data into many of the popular data visualization and analysis tools such as Microsoft Power BI, QlikView, and Tableau Software. From there you’ll be able to design extremely compelling, and information rich content from the semi and unstructured data that can be easily shared with your information stakeholders.
For this demonstration, I will show you how to load data from HP Vertica into a Microsoft Power Pivot data model, and visualize it using Pivot Tables and Pivot Charts. I make the assumption that you have access to a running version of the HP Vertica 7.0.1 Community Edition Virtual Machine, have installed the Client Packages for the Community Edition, configured an ODBC driver to connect to your VM, that you have Excel 2013 Professional Plus Edition installed on your computer, and have a basic understand of Microsoft Power Pivot data models and Pivot Tables.
First, ensure that your HP Vertica Community Edition VM is up a running, and then open a new Excel 2013 workbook. Click the Power Pivot ribbon tab, and choose [Data Model] -> [Manage]. This will bring up the Power Pivot design surface. Next you will choose the [Get External Data] -> [From Other Sources] option where you will be presented with the screen below:
Selecting the [Others (OLEDBODBC)] option brings up the Table Import Wizard. Once the Wizard is up, click on the [Build] button to access the Data Link Properties window will open, and select the [Provider] tab as displayed below:
Highlight the Microsoft OLE DB Provider for ODBC Drivers option and click the next button. This action will direct you to the connection tab where you must specify your connection properties. Use the drop-down arrow to choose the OBDC driver that you configured to connect to your HP Vertica VM then enter the User name and Password, and the catalog name for the database that is running. I am using the VMart example database that ships with HP Vertica which is based on a fictitious department store chain that has an online storefront as well as traditional brick and mortar stores.
It is a good practice to click the [Test Connection] button after entering the user and password credentials to ensure that the specified connection properties are correct. After the Data Link Properties are set, clicking the OK button will take you back to the Table Import Wizard where you can enter a friendly name for the connection.
Click next to choose the method of importing the data (either by selecting from a list of entities, or by writing an SQL query).
I initially chose the first option to select from a list of entities, selected the store.store_sales_fact entity, and clicked the [Finish] button.
This produces the following error message:
Researching the error I was able to find an explanation in the Version 6.0 documentation that states:
“This occurs because, when using the odbc bridge, SSIS doesn’t know the flavor of SQL required to create a new table, so it defaults to SQL Server syntax. If you want to use the ODBC bridge and create a new table with SSIS, then you must manually edit the SQL in the create statement to conform to valid Vertica syntax.”
Although the documentation is referencing SSIS, it is still apropos to importing data into Power Pivot because we are using the same ODBC bridge that defaults to SQL Server syntax. The error message that we received is not very clear, but what is actually happening more specifically is that the bridge is omitting the terminating semicolon that is required when querying an HP Vertica database.
The simple solution is to close the dialog, return to the Table Import Wizard, and use the second option and manually write a SQL query to import the data.
From the Power Pivot design surface, choose [Get External Data] -> [Existing Connections], select the previous data connection, click the Open button.
Select the second option, enter a friendly query name, and a SQL Statement (remembering to terminate the query with a semicolon)
Click finish and see the data load into Power Pivot.
Follow the same steps above and import data about VMart’s products, call center, and a date dimension to obtain attributes to describe the sales date.
- select * from public.product_dimension; (Products)
- select * from online_sales.call_center_dimension; (Call_Center)
- select * from public.date_dimension; (Sale_Date)
- select * from public.customer_dimension; (Customers)
After you have loaded all of the tables, you should have four tables in your Power Pivot model:
Create a new calculated column named [ProductReference] in both the online_sales_fact and products tables using the following DAX:
Now you should be able to successfully relate the two tables on the new calculated column. The resulting data model will look like this in Design View:
At this point it would be best practice to provide friendly names for the columns that will be most useful to your users. You will also want to hide from your users any columns that should not be accessible such as the [ProductReference] calculated column that we created to relate the products and sales tables.
Below is a Power View visualization that I was able to create in Excel 2013: