PostgreSQL: CLOCK-Sweep and Buffer Management

Recently, while reading Chapter 5 of "Database Internals", I came across the CLOCK-sweep algorithm as an alternative to Least Recently Used (LRU) for page replacement. The book explained why LRU, despite being conceptually simple, can be inefficient at scale due to frequent list updates and contention in multi-threaded environments. The mention of CLOCK-sweep piqued my interest, so I decided to check PostgreSQL’s source code to see how it manages its buffer pool.

What I found was fascinating - PostgreSQL implements a CLOCK-sweep algorithm, an efficient approach for handling frequently accessed data pages. This blog post explores what I've found by diving deep into PostgreSQL's internal.


How CLOCK-Sweep Works

"The Internals of PostgreSQL" greatly helped me understanding how it conceptually works.

The CLOCK-sweep algorithm is a lightweight approximation of LRU, designed to minimize the overhead of maintaining an ordered list of recently used pages. It maintains a circular list (clock) of buffer pages, with each page assigned a usage counter instead of a strict recency order.

Here is steps:

  1. Each buffer page has a usage counter, which tracks how frequently it has been accessed.
  2. When a page is accessed, its counter is incremented, indicating recent usage. (source)
  3. When a new page needs to be loaded into memory, the algorithm begins a circular scan of the buffer pool. (source)
  4. If the page has a nonzero usage counter, the counter is decremented and the scan continues. (source)
  5. The scan continues until a suitable victim page is found.

This mechanism ensures that frequently accessed pages stay in memory longer, while older, less-used pages are gradually phased out in a fair and efficient manner.


Why CLOCK-Sweep is Better Than LRU

Traditional LRU maintains pages in a strict order of access, requiring frequent list operations (removing and reinserting pages). While LRU works well in simple cases, it has scalability issues in high-concurrency environments:

When LRU CLOCK-sweep
Multi-Threaded Environments Requires locking every time a page is accessed A simple counter, reducing memory contention
List Manipulations Constant reordering of pages Sweeps through pages when needed
Workload Patterns Evict frequently used pages if access patterns change suddenly Gradually reduces page priority instead of immediately discarding older pages

In short, CLOCK-sweep provides LRU-like behavior but at a fraction of the cost, making it ideal for databases handling large workloads.


When VACUUM-ing...

It's worthwhile to note that PostgreSQL unlikely uses the same page replacement strategy when VACUUM, or a large sequential scan.

This is an intentional design and was implemented for a good reason.

During sequential scans requiring a large pages, page-in will likely does more harm than good as most of pages will not be needed again soon. Leveraging the normal CLOCK-sweep algorithm will end up swapping pages a couple of times by wasting resources for no reasons.

PostgreSQL uses a small (256KB) ring instead, which makes transferring pages from OS cache to shared buffer cache efficient by leveraging L2 cache.


Conclusion

The CLOCK-sweep algorithm is a powerful alternative to traditional LRU, offering similar benefits while avoiding high memory contention and CPU overhead. PostgreSQL’s use of CLOCK-sweep in its buffer manager ensures that frequently accessed pages remain cached, while older, less-used pages are efficiently evicted.

For anyone managing PostgreSQL at scale, understanding how the buffer manager works provides valuable insights into performance tuning and query optimization. Next time you notice high disk I/O or slow queries, remember that PostgreSQL’s CLOCK-sweep algorithm is quietly optimizing memory usage behind the scenes.

2025-02-06