This week’s blog topic will be In Memory Row Composition. It’s important to establish a quick basic understanding of how In Memory Data is stored as a primer to future blog entries.
Before going into the topic of this blog post (the primer for future posts), let’s establish a quick primer on SQL Server’s traditional disk based table storage. So a quick primer for the primer!
In traditional disk based tables, table data is stored on 8 KB pages within the database mdf file. At the very high level, these pages contain header information (metadata about the page), data rows and a row offset table. Data rows are written sequentially to these pages. The 8 contiguous pages are stored in extents. This is the basic unit of space managed and read in SQL Server. Some takeaways here. Rows are stored sequentially in the pages and because reads are performed in 64kb blocks, similarly keyed data will be retrieved together.
In Memory Row Composition
In memory data is stored directly in memory. There are no pages or extents. This fundamentally changes the way that data may be read and stored. In Memory rows are composed differently than disk based tables to allow optimization for byte addressable memory. The data itself is not stored in any particular order, in no necessary proximity to adjacent rows. Additionally, In Memory employs are row versioning system (a later blog topic) to eliminate the locking and blocking of data. In order to overcome locking issues, in memory rows are never updated. They are only created, read and destroyed. SQL Server accomplishes all this using metadata for each row.
Each In Memory row is composed of two parts, a Row Header and a Payload. The Payload is the data, all the index keys and all the other columns in the table. The Row Header is where it gets interesting.
As you can see in the image above the row header is divided into at least 5 sections. A Begin Timestamp, End Timestamp, StatementID, Index Link Count and Index Pointers.
The timestamp columns describe the life of a row. When the row is initially created, the BeginTS field is populated with the timestamp of the transaction that created the record. At this point, the EndTS field is populated with a special infinity value and updated with the timestamp of the transaction that deletes the record.
Following these timestamp fields is the StatementID field. This is the unique statement id that created the row.
The next set of fields are the Index Link Count, the number of indexes that reference this row, followed by an Index Pointer field for each index in the table. The index data structure in conjunction with these Index pointer fields link the data together in a Memory Optimized table. This is how data is accessed and order in In Memory tables. As a result every Memory Optimized tables require at least one index.
This was a quick overview of how In Memory rows are composed. It opens the door to future discussions about In Memory Indexes, isolation levels and row versioning In the upcoming blogs we’ll look into how the row header helps SQL Server achieve these various enhancements with In Memory tables.
Jeff Shurak is an Anexinet SQL Server Architect focused on Disaster Recovery and Cloud Migration with over 14 years of data technology experience in ETL development, three-dimensional cube development, and database administration.