Why PostgreSQL's B-Tree is nbtree (Non-Balanced)

As part of our book club diving deep into database internals, my software engineering friends and I recently explored B-Trees, one of the most fundamental data structures used in relational database indexes.

While the classic B-Tree we read about in academic papers describes a "balanced" structure where every node is kept precisely half-full, we were surprised to discover that PostgreSQL implements a variant called nbtree (non-balanced B-Tree) (See src/backend/access/nbtree).

This deviation led us to question: why does PostgreSQL's B-Tree behave differently in practice? This blog post delves into that question.


What Is a B-Tree?

A B-Tree is a self-balancing tree structure that maintains sorted data and enables fast lookup, insertion, and deletion operations in logarithmic time. It is widely used in database indexes because it minimizes disk I/O by ensuring that each node (or page) is filled to a predetermined level before being split. In theory, a classic B-Tree has the following characteristics:

  • Each node contains a sorted list of keys and pointers to child nodes.
  • Internal nodes maintain at least half-full occupancy to ensure balanced tree height.
  • When a node becomes full, it splits into two nodes, pushing a key up to the parent node. When deletion creates an underfilled node, it may merge on the other hand.

This structure ensures that tree height grows logarithmically with data size, optimizing search performance.


Why Is PostgreSQL's B-Tree "Non-Balanced"?

Unlike the theoretical B-Tree, PostgreSQL's nbtree (non-balanced B-Tree) does not strictly enforce the half-full rule. Instead, it allows pages to remain less than half-full and optimizes for real-world workloads rather than theoretical balance. Key differences include:

  1. Avoiding Frequent Page Splits: PostgreSQL does not split a page immediately when it reaches exactly half capacity. Instead, it allows a page to fill up beyond 50%, reducing unnecessary page splits and maintaining better disk locality.

  2. Fill Factor-Based Optimization: PostgreSQL deliberately leaves pages with some free space to accommodate future inserts and updates efficiently. This reduces index fragmentation and prevents frequent page splits in workloads with heavy writes.

  3. No Immediate Merging on Deletion: Unlike strict B-Trees that aggressively merge underfilled nodes, PostgreSQL delays merging unless absolutely necessary. This avoids unnecessary page movements and write amplification, improving performance for high-throughput applications.

These optimizations lead to less frequent tree restructuring, better cache locality, and reduced disk writes, which are crucial for real-world database workloads.


Understanding Fill Factor

One of PostgreSQL’s key optimizations is the concept of fill factor, which defines how full an index page can be before PostgreSQL starts inserting into a new page. By default:

  • Leaf pages have a fill factor of 90%.
  • Internal pages have a fill factor of 70%.

Here is a code comment from the source code:

 * It is not wise to pack the pages entirely full, since then *any*
 * insertion would cause a split (and not only of the leaf page; the need
 * for a split would cascade right up the tree).  The steady-state load
 * factor for btrees is usually estimated at 70%.  We choose to pack leaf
 * pages to the user-controllable fill factor (default 90%) while upper pages
 * are always packed to 70%.  This gives us reasonable density (there aren't
 * many upper pages if the keys are reasonable-size) without risking a lot of
 * cascading splits during early insertions.

The fill factor controls the trade-off between space utilization and write performance:

  • Higher Fill Factor (~100%): Maximizes space efficiency but may lead to more frequent page splits.
  • Lower Fill Factor (<90%): Leaves extra space in each page, reducing the likelihood of page splits and improving write performance.

PostgreSQL allows developers to modify the fill factor for specific indexes based on workload patterns. The following table outlines scenarios where adjusting the fill factor can be beneficial:

Scenario Recommended Benefit
Insert-Heavy Workloads Lower (e.g., 70%) Reduces page splits by keeping space available for new entries
Frequent Updates Lower (e.g., 70%) Avoids page splits and fragmentation by allowing space for new tuple versions
Read-Optimized Indexes Higher (e.g., 95%) Improves storage efficiency and cache performance for read-heavy queries
Bulk Loading Data Adjust as needed Ensures efficient space utilization without excessive fragmentation

To modify an index's fill factor, you can use:

CREATE INDEX my_index ON my_table(column_name) WITH (fillfactor = 70);

or adjust an existing index:

ALTER INDEX my_index SET (fillfactor = 70);

Conclusion

While the B-Tree is often described as a "balanced" data structure in textbooks, PostgreSQL takes a pragmatic approach by implementing nbtree (non-balanced B-Tree) with optimizations for real-world workloads. The fill factor plays a crucial role in managing storage efficiency, reducing page splits, and improving overall database performance. Understanding these nuances allows database engineers to fine-tune PostgreSQL indexes for optimal performance in large-scale applications.

2025-01-27