Business Intelligence is all about using data to construct meaningful insights that can be consumed by the business. As IT practitioners, we encounter situations that test our ingenuity on a daily basis, and must dig into our toolbox of techniques to overcome any technical challenges.
The Use Case
If you work on the ETL (Extraction/Transformation/Load) side of Business Intelligence, you’ll recognize the use case as a common one: detecting changes in data to enable retrieval and loading into an environment. This is the key to allowing incremental loads. When dealing with small datasets, the issue is not so critical. But when you scale, the data to be loaded must be more targeted in order to keep your system from glowing red with activity. When you have several millions of rows of data to evaluate, it’s a different scenario than when you’re looking at a few thousand.
The Specific Challenge/Goal
The specific challenge/goal is straightforward: get a MAX change date from multiple columns—usually across multiple tables—and get it all done in a performance-friendly manner. My personal scenario involved nine tables in a query (order header, order detail, and seven reference tables with 95 million rows) to check.
Prerequisite/Considerations
- Include accurate CreateDate or ModifiedDate columns in the source to be able to interrogate
- If you don’t have a usable change date, generate one using something like Hashbytes to compare between the stored data and the previous incoming data. Otherwise, you must forgo performing an incremental load.
Even with usable CreateDate or ModifiedDate columns, as the number of columns increases, development (along with the ability to maintain accurate code) becomes harder as the checks grow almost exponentially.
Example:
Just going from two to three columns increases the required checks and complexity.
2 Columns
IF column1 >= Column 2 THEN column 1 IF column2 > Column 1 THEN column 2
3 Columns
IF column1 >= Column 2 AND column 1 >= Column 3 THEN column 1 IF column2 >= Column 3 AND column 2 >= Column 1 THEN column 2 IF column3 >= Column 2 AND column 3 >= Column 1 THEN column 3
… and it keeps on growing with each added column!
Again, my scenario included nine tables in a query (order header, order detail, and seven reference tables) with more than 95 million rows that had potentially changed information I needed to evaluate to determine if it had to be supplied to my data warehouse load. Losing a transaction is not an option.
Options
To let the cat out of the bag: I used a VALUES clause in a column sub-select to let me quickly get the max change date from nine columns across nine tables without adversely affecting performance or creating a coding nightmare. But before we take a closer look, let’s consider some options for performing our check.
In-Line Code
SELECT case WHEN column1 > Column 2 AND column 1 > Column 3 then column1 WHEN column2 > Column 3 AND column 2 > Column 1 then column2 WHEN column3 > Column 2 AND column 3 > Column 1 then column3 END FROM table1, table2, table3
User Defined Function (UDF)
SELECT dbo.GetMaxChangeDate(column1, column2, column3) FROM table1 LEFT JOIN table2 ON … LEFT JOIN table3 ON …
Sub-Select
SELECT (SELECT MAX([Date]) FROM (SELECT column1 AS [Date] FROM table1 where table1.PK_ID = drvTable.PK_ID UNION ALL SELECT column2 FROM table2 where table2.PK_ID = drvTable.PK_ID UNION ALL SELECT column3 FROM table3 where table3.PK_ID = drvTable.PK_ID ) D ) AS MaxChangeDate FROM table1 as drvTable
Sub-Select with a VALUES Clause
SELECT ( SELECT MAX(ChangeDate) FROM (VALUES (column1) ,(column2) ,(column3) ) AS value(ChangeDate) ) AS MaxChangeDate FROM table1 LEFT JOIN table2 ON … LEFT JOIN table3 ON …
Here’s a simple, working SQL example as a take-away (use the AdventureWorks DB).
SELECT od.[SalesOrderID]
,od.[SalesOrderDetailID]
,od.[CarrierTrackingNumber]
,od.[OrderQty]
,od.[ProductID]
,od.[SpecialOfferID]
,od.[UnitPrice]
,od.[UnitPriceDiscount]
,od.[LineTotal]
,od.[rowguid]
,od.[ModifiedDate]
,p.ModifiedDate
,( SELECT MAX(ChangeDate)
FROM (VALUES (od.[ModifiedDate])
,(p.ModifiedDate )
) AS value(ChangeDate)
) AS MaxChangeDate
FROM [Sales].[SalesOrderDetail] od
INNER JOIN [Production].[Product] p
on p.ProductID = od.ProductID
Conclusion
The VALUES clause in a sub-select as part of the column select is a straightforward construct. This usage is a little-known, but powerful, documented TSQL trick. The VALUES clause allows you to materialize the check-in memory to let you perform a streamlined sub-select with the column values resident for the row being processed. Though my developer background would have me leaning towards using the UDF to provide a common code module to support the logic, poor performance on SQL Server negates it. When executed as part of the column list in a SELECT statement, SQL Server must instantiate a copy of the UDF for every row processed. Performance degrades as the number of rows increases, substantial even when the number is in the thousands. The VALUES clause processes in memory and the effect on performance is negligible. Plus, the great thing is that you can use any aggregate function in conjunction with the construct.
This is just one solution (albeit a critical one) to a problem that is itself just one piece of the overall picture. As IT practitioners, we live and die by our experience and the techniques and methods in our toolbox. So, what is in yours? If you have any questions around SQL Server best practices, the simplest and quickest way to solve them is to schedule a consult with Anexinet’s experienced team. To learn more about our Business Intelligence services, click here.
by Vince Napoli

Vince Napoli
Business Intelligence Architect
Vince Napoli is a Data Warehouse and Business Intelligence Architect with over 20 years’ focus on DW/BI and SQL Server. A Mainframe developer since 1987, Vince specializes in MVS technical support, System Testing, Data Warehousing/ Business Intelligence architecture and development, SQL Server DBA/ ETL and database development. He is a Professional Association of SQL Server member and former president of the Philadelphia SQL Server User group.
Let’s get the conversation started
Reach out now to begin your digital transformation
+ 16,659
ZOOM MEETINGS
+ 9,789
HAPPY CLIENTS
+ 5,075
FINISHED PROJECTS
+ 133,967,432
LINES OF CODE
© 2000 - 2021 Anexinet Corp., All rights reserved | Privacy Policy | Cookie Policy
This website uses cookies
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL cookies.
Manage consent
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
Cookie | Duration | Description |
---|---|---|
cookielawinfo-checbox-analytics | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics". |
cookielawinfo-checbox-functional | 11 months | The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional". |
cookielawinfo-checbox-others | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other. |
cookielawinfo-checkbox-necessary | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary". |
cookielawinfo-checkbox-performance | 11 months | This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance". |
viewed_cookie_policy | 11 months | The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data. |
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.