Slotted Pages: The Backbone of PostgreSQL’s Data Storage

Database management systems (DBMS) are marvels of engineering, designed to store and retrieve massive amounts of data efficiently. One critical concept that underpins their storage layer is the slotted page.

Slotted pages are among the most common on-disk page layouts in database systems. They organize records within each page on disk, balancing space management, performance, and flexibility. In this post, I’ll dive into how slotted pages work and how PostgreSQL uses them.


What Are Slotted Pages?

A slotted page is a data structure used to manage records within a fixed-size block of storage, typically referred to as a page or block. Pages are the smallest unit of I/O for most DBMSs, meaning data is read from and written to disk one page at a time.

A slotted page organizes its contents into two main areas. The first, at the start of the page, is a header holding metadata about the page, followed by the slot array: a list of pointers (offsets) to the records stored in the data area. The second, the data area, holds the actual records, stored contiguously from the end of the page back toward the slot array.

The slot array is like a recipe box of index cards, and the data area is the cabinet that stores the actual ingredients. Each card tells you where the ingredients for one dish are located in the cabinet. If you reorganize the cabinet (the data area), you only need to update the cards to reflect the new locations, rather than rewriting every recipe just because you moved some ingredients around.

The slot array, a list of fixed-size pointers to records, grows from the start of the page toward the end. The data area, a list of variable-length tuples, grows from the end of the page toward the start. The space in between is free space.

(Slot Array) -----> (Free Space) <----- (Data Area)
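To make the two-ended growth concrete, here is a minimal Python sketch of a generic slotted page. It is not PostgreSQL’s on-disk format: the 128-byte page, the 4-byte header holding only lower and upper, the 4-byte (offset, length) slots, and the class name SlottedPage are all illustrative assumptions.

```python
import struct

PAGE_SIZE = 128                 # toy page size, small enough to follow by hand
HEADER = struct.Struct("<HH")   # hypothetical header: lower, upper
SLOT = struct.Struct("<HH")     # hypothetical slot: record offset, record length


class SlottedPage:
    """Header, then a slot array growing forward, free space, then data growing backward."""

    def __init__(self, size: int = PAGE_SIZE):
        self.buf = bytearray(size)
        # An empty page: free space runs from just after the header to the end.
        HEADER.pack_into(self.buf, 0, HEADER.size, size)

    def bounds(self) -> tuple:
        """Return (lower, upper), the edges of the free space."""
        return HEADER.unpack_from(self.buf, 0)

    def free_space(self) -> int:
        lower, upper = self.bounds()
        return upper - lower

    def slot_count(self) -> int:
        lower, _ = self.bounds()
        return (lower - HEADER.size) // SLOT.size

    def insert(self, record: bytes) -> int:
        """Copy a record into the page and return its slot number."""
        lower, upper = self.bounds()
        if upper - lower < SLOT.size + len(record):
            raise ValueError("page full")
        upper -= len(record)                                 # data area grows toward the front
        self.buf[upper:upper + len(record)] = record
        SLOT.pack_into(self.buf, lower, upper, len(record))  # slot array grows toward the back
        slot_no = self.slot_count()                          # computed from the old lower
        HEADER.pack_into(self.buf, 0, lower + SLOT.size, upper)
        return slot_no

    def get(self, slot_no: int) -> bytes:
        """Find a record via its slot: one fixed-size read, no scan of the data area."""
        if not 0 <= slot_no < self.slot_count():
            raise IndexError(slot_no)
        offset, length = SLOT.unpack_from(self.buf, HEADER.size + slot_no * SLOT.size)
        return bytes(self.buf[offset:offset + length])


page = SlottedPage()
print(page.insert(b"alice"), page.insert(b"bob"))  # 0 1
print(page.bounds(), page.free_space())            # (12, 120) 108
print(page.get(1))                                 # b'bob'
```

After two inserts, lower has advanced by two 4-byte slots while upper has retreated by the 8 bytes of record data, and get(1) reaches "bob" through its slot alone.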

This layout keeps fragmentation in check. When a record is deleted or updated, the freed space can be reclaimed by compacting the records within the page, without touching anything outside it. The slot array keeps track of where each record is stored, so a record keeps its slot number even when its position in the data area changes.
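The sketch below illustrates that indirection with a toy 64-byte page. For brevity it leaves out the header and keeps the slot array as a Python list, and the class and method names are hypothetical. Deleting a record only clears its slot; a later compaction slides the surviving records together. Records move, but each keeps its slot number.

```python
PAGE_SIZE = 64   # toy page size
SLOT_SIZE = 4    # assumed bytes per slot entry


class Page:
    """Toy slotted page; the header is omitted and the slot array is a Python list."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.slots = []          # slot array: (offset, length), or None once deleted
        self.upper = PAGE_SIZE   # start of the data area

    def free_space(self) -> int:
        return self.upper - len(self.slots) * SLOT_SIZE

    def insert(self, rec: bytes) -> int:
        if self.free_space() < SLOT_SIZE + len(rec):
            raise ValueError("page full")
        self.upper -= len(rec)
        self.data[self.upper:self.upper + len(rec)] = rec
        self.slots.append((self.upper, len(rec)))
        return len(self.slots) - 1

    def get(self, slot_no: int) -> bytes:
        if self.slots[slot_no] is None:
            raise KeyError(f"slot {slot_no} was deleted")
        off, length = self.slots[slot_no]
        return bytes(self.data[off:off + length])

    def delete(self, slot_no: int) -> None:
        # Only the slot is cleared; the record's bytes stay behind as a hole.
        self.slots[slot_no] = None

    def compact(self) -> None:
        """Slide live records against the end of the page, closing the holes.
        Records move, but each keeps its slot number; only offsets change."""
        live = [(i, self.get(i)) for i, s in enumerate(self.slots) if s is not None]
        self.upper = PAGE_SIZE
        for i, rec in live:
            self.upper -= len(rec)
            self.data[self.upper:self.upper + len(rec)] = rec
            self.slots[i] = (self.upper, len(rec))


page = Page()
for name in (b"alice", b"bob", b"carol"):
    page.insert(name)
page.delete(1)
print(page.free_space())              # 39: bob's bytes are still a hole
page.compact()
print(page.free_space(), page.slots)  # 42 [(59, 5), None, (54, 5)]
print(page.get(2))                    # b'carol'
```

Compaction recovers the 3 bytes "bob" occupied and moves "carol" from offset 51 to 54, yet slot 2 still leads to it.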

The slot array also makes record access efficient. Given a slot number, the record’s location is a single fixed-size array lookup away, so there is no need to search the entire page to find a specific record.


Slotted Pages in PostgreSQL

PostgreSQL uses a slotted-page layout to manage data within its fixed-size pages, which are 8 KB by default (the block size can be changed when PostgreSQL is compiled). Let’s take a closer look at the layout and components of a PostgreSQL page:

1. PageHeaderData

The PageHeaderData struct sits at the beginning of every page and stores metadata about the page. It is 24 bytes long; key fields include:

  • pd_lsn (8 bytes): Log sequence number (LSN) of the most recent WAL record that modified the page, used for crash recovery.
  • pd_lower (2 bytes): Offset to start of free space.
  • pd_upper (2 bytes): Offset to end of free space.
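The fixed header can be decoded with Python’s struct module. The field order below follows PostgreSQL’s bufpage.h as I understand it, including fields not listed above (pd_checksum, pd_flags, pd_special, pd_pagesize_version, pd_prune_xid). The little-endian byte order and the helper name parse_header are assumptions of this sketch. On an empty heap page, free space is simply pd_upper minus pd_lower.

```python
import struct

BLCKSZ = 8192  # PostgreSQL's default block size

# Simplified mirror of PageHeaderData's 24 fixed bytes (assumes a little-endian build).
PAGE_HEADER = struct.Struct(
    "<II"  # pd_lsn, stored as two 32-bit halves (high, low)
    "H"    # pd_checksum
    "H"    # pd_flags
    "H"    # pd_lower: offset to start of free space
    "H"    # pd_upper: offset to end of free space
    "H"    # pd_special: offset to start of special space
    "H"    # pd_pagesize_version: page size OR'd with layout version
    "I"    # pd_prune_xid
)


def parse_header(page: bytes) -> dict:
    """Decode the fixed page header (hypothetical helper)."""
    (lsn_hi, lsn_lo, _checksum, _flags, lower, upper,
     special, size_version, _prune_xid) = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": (lsn_hi << 32) | lsn_lo,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "page_size": size_version & 0xFF00,
        "layout_version": size_version & 0x00FF,
        "free_space": upper - lower,
    }


# An empty heap page: no line pointers yet and no special space.
empty = bytearray(BLCKSZ)
PAGE_HEADER.pack_into(empty, 0, 0, 0, 0, 0,
                      PAGE_HEADER.size, BLCKSZ, BLCKSZ, BLCKSZ | 4, 0)
print(PAGE_HEADER.size, parse_header(empty)["free_space"])  # 24 8168
```

On a live server, the pageinspect extension’s page_header() function (applied to the output of get_raw_page()) reports these same fields, which makes it a convenient way to cross-check a decoder like this.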

2. ItemIdData (Slot Array)

PostgreSQL’s slot array is an array of ItemIdData entries, usually called line pointers. Each entry is a fixed 4 bytes and records the offset and length of one tuple (or record) in the data area, plus a few status bits. The array starts right after the page header and grows toward the data area as line pointers are added.
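In PostgreSQL, each 4-byte line pointer packs three bit-fields: a 15-bit offset (lp_off), 2 bits of state (lp_flags: unused, normal, redirect, or dead), and a 15-bit length (lp_len). Fifteen bits can address any byte of a page up to 32 KB, PostgreSQL’s largest supported block size. The sketch below packs and unpacks such a word; because C bit-field order is compiler-dependent, the low-bits-first order here is an assumption, and the helper names are hypothetical. It also derives the number of line pointers from pd_lower.

```python
LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD = range(4)  # lp_flags states
SIZE_OF_PAGE_HEADER = 24
SIZE_OF_ITEM_ID = 4


def pack_item_id(lp_off: int, lp_flags: int, lp_len: int) -> int:
    """Pack a line pointer into 32 bits, low bits first (assumed bit order)."""
    if not (0 <= lp_off < 1 << 15 and 0 <= lp_flags < 4 and 0 <= lp_len < 1 << 15):
        raise ValueError("field out of range")
    return lp_off | (lp_flags << 15) | (lp_len << 17)


def unpack_item_id(word: int) -> tuple:
    """Return (lp_off, lp_flags, lp_len)."""
    return word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF


def max_offset_number(pd_lower: int) -> int:
    """How many line pointers the page has, derived from pd_lower."""
    return max(0, (pd_lower - SIZE_OF_PAGE_HEADER) // SIZE_OF_ITEM_ID)


word = pack_item_id(8152, LP_NORMAL, 40)
print(unpack_item_id(word), max_offset_number(36))  # (8152, 1, 40) 3
```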

3. Items (Data Area)

The items, PostgreSQL’s data area, are the actual tuples. They are stored starting at the end of the page (on index pages, just before the special space reserved there) and grow back toward the slot array.
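When a tuple is added, pd_upper moves toward the header by the tuple’s size rounded up to the platform’s maximum alignment, and pd_lower moves forward by one 4-byte line pointer. The sketch below follows that bookkeeping on an empty heap page. It assumes 8-byte maximum alignment (typical of 64-bit builds), ignores reuse of existing unused line pointers, and uses a hypothetical helper name.

```python
MAXIMUM_ALIGNOF = 8   # assumption: typical of 64-bit builds
SIZE_OF_ITEM_ID = 4


def maxalign(n: int) -> int:
    """Round n up to a multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) // MAXIMUM_ALIGNOF * MAXIMUM_ALIGNOF


def place_tuple(pd_lower: int, pd_upper: int, tuple_len: int):
    """Return the new (pd_lower, pd_upper) after adding one tuple and a new line
    pointer, or None if they do not fit. The tuple starts at the new pd_upper."""
    new_upper = pd_upper - maxalign(tuple_len)  # data area grows toward the header
    new_lower = pd_lower + SIZE_OF_ITEM_ID      # slot array grows toward the data
    if new_lower > new_upper:
        return None
    return new_lower, new_upper


lower, upper = 24, 8192  # empty heap page
for size in (61, 40, 100):
    lower, upper = place_tuple(lower, upper, size)
    print(f"{size}-byte tuple at offset {upper}, pd_lower now {lower}")
```

The three tuples land at offsets 8128, 8088, and 7984; the 61-byte tuple occupies 64 bytes because of alignment.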

4. Free Space

Between the slot array and the data area lies the free space, bounded by pd_lower and pd_upper. PostgreSQL uses it for new tuples and for new versions of updated tuples. Managing this space well is critical for minimizing fragmentation and, on index pages, for avoiding page splits.
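Whether a new tuple fits comes down to arithmetic on pd_lower and pd_upper: the tuple needs its aligned size plus room for one new line pointer. The sketch below uses hypothetical helper names, assumes 8-byte alignment, and treats tuple_len as the whole tuple, header included. It also estimates how many equal-sized tuples an empty 8 KB heap page can hold.

```python
BLCKSZ = 8192
SIZE_OF_PAGE_HEADER = 24
SIZE_OF_ITEM_ID = 4
MAXIMUM_ALIGNOF = 8  # assumption: typical of 64-bit builds


def maxalign(n: int) -> int:
    return (n + MAXIMUM_ALIGNOF - 1) // MAXIMUM_ALIGNOF * MAXIMUM_ALIGNOF


def fits(pd_lower: int, pd_upper: int, tuple_len: int) -> bool:
    """A new tuple needs its aligned size plus one new 4-byte line pointer."""
    return maxalign(tuple_len) + SIZE_OF_ITEM_ID <= pd_upper - pd_lower


def tuples_per_empty_page(tuple_len: int) -> int:
    """Equal-sized tuples that fit on an empty heap page with no special space."""
    free = BLCKSZ - SIZE_OF_PAGE_HEADER
    return free // (maxalign(tuple_len) + SIZE_OF_ITEM_ID)


print(fits(24, 8192, 100), fits(36, 40, 8))  # True False
print(tuples_per_empty_page(100))            # 75
```

Each 100-byte tuple costs 104 + 4 = 108 bytes of the 8168 free bytes, so at most 75 fit on one page.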


Conclusion

The slotted page is a cornerstone of PostgreSQL’s storage architecture, enabling efficient management of tuples within fixed-size blocks. By decoupling a record’s logical location (its slot number) from its physical position in the page, PostgreSQL can move tuples within a page without invalidating references to them. Separating the slot array from the data area keeps lookups cheap and lets free space be reclaimed as tuples are updated and deleted.

2025-01-24