A common data warehouse design will incorporate foreign keys on the fact table related to its dimension tables. These keys improve the performance of inner joins and make the joins reliable by insuring rows will not be dropped in a query because we know the primary key table has the referenced values. However, they do come with a price during the loading process — the database needs to validate that each and every value exists in the primary key table before they are inserted. Using Adventure works, I’ll mock up a scenario to demonstrate the point. First, let’s create a new table:
Create table dbo.FactInternetProductSales
(productkey intnot null,
TotalProduct money not null,
constraint [fk_FactInternetProductSales_Product]Foreign key (productkey) references [dbo].[DimProduct] (productkey)
Let’s suppose that we need to insert lots of data into this table each night. Now, factInternetSales has only a whopping 60k rows. But let’s suppose that we have to insert that into our new table each night (with a few additional rows from a cross join to exaggerate the point):
Insert into dbo.FactInternetProductSales
FROM [AdventureWorks].[Sales].[SalesOrderDetail] d
INNER JOIN AdventureWorks.Production.Product pr
ON d.ProductID = pr.ProductID
INNER JOIN DimProduct p on pr.ProductNumber = p.ProductAlternateKey
cross join (SELECT 1 as col UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1) as t;
This produces a query plan like this:
Note the sort which prepares the dataset for a merge join with the product dimension. This lookup validates the foreign key, even though we just did an inner to get those very values. This whole operation is about half the cost of the query plan.
On my system, this query executes in about 5s. However, if I first disable the Foreign key and then re-enable them, doing all the checking at once I get a total duration of 2.6 seconds.
The disabling and re-enabling looks like this:
Alter table dbo.FactInternetProductSales withcheck nocheck constraint all;
Alter table dbo.FactInternetProductSales withcheck check constraint all;
In this case, it was about a 50% cost savings, but it may be a bit more moderate than that depending on the volume of data. A couple of seconds do not matter on a small fact table, but if you expand this same scenario out to wider tables with more foreign keys and larger row counts, the cost savings may be much more dramatic. In the case of very large tables, the task of checking foreign key constraints can be done on a staging table and then switched into the main table.
Considering that this operation needs to happen for every foreign key, the incoming dataset will be sorted for each foreign key. It’s worth comparing for your largest tables and longest running jobs.