Understanding Bitmap Index Scan and Bitmap Heap Scan in PostgreSQL

When working with PostgreSQL, understanding query execution plans is crucial for optimizing performance. One of the fascinating components of PostgreSQL's query execution is the use of Bitmap Index Scan and Bitmap Heap Scan, particularly for queries involving large datasets or multiple conditions. This blog explores their workings, explains their efficiency, and demonstrates them with diagrams and examples.


What are Bitmap Index Scan and Bitmap Heap Scan?

Bitmap Index Scan uses an index to identify matching rows for a query condition. Instead of directly fetching rows, it creates a bitmap in memory where each bit corresponds to a row in the table. For example, A set bit (1) indicates that the corresponding row matches the condition.

Bitmap Heap Scan uses the bitmap produced by the Bitmap Index Scan to efficiently fetch the actual rows from the heap (table data files). It groups these accesses by data pages, reducing random disk I/O.

In PostgreSQL, Bitmap Index Scan and Bitmap Heap Scan are typically used together, as they are part of a two-step process to efficiently retrieve data.


Think in Analogy: Library

Let's think this in analogy.

Imagine you're in a library and you're searching for books based on certain criteria, like the author's name and publication year. The library has a catalog system (index), say an old Windows computer sitting next to the entrance, that helps you locate books without physically browsing all the shelves (heap).

You first look at the library catalog to find books. Instead of directly collecting the books every time you hit the result, you instead write down the shelf numbers and positions on the shelves. This checklist is a "bitmap".

Apparently, instead of collecting all the books on every search and then pick only what you're looking for, creating a checklit (bitmap) and pick only what you want to read during your summer holiday is much cost-effective (in terms of your calories).


Step-by-Step Example

Let's say you want to execute the following query:

EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 100;

Here is an example query plan output with Bitmap Index Scan and Bitmap Heap Scan.

Bitmap Heap Scan on orders
  Recheck Cond: (customer_id = 100)
  -> Bitmap Index Scan on idx_customer_id
       Index Cond: (customer_id = 100)

This is how they are retrieved. First, Bitmap is created based on the search result from Index File(s). Then actudal data are retrieved from Heap by looking at the bitmap checklist.

Query: SELECT * FROM orders WHERE customer_id = 100;

   +----------------------+        +---------------------------+
   |      Index File      |        |         Heap (Table)      |
   |  (customer_id index) |        |  Unordered Table Storage  |
   +----------------------+        +---------------------------+
            |                                  |
            v                                  |
  1. Bitmap Index Scan                         |
     - Scan index to find                      |
       matching rows                           |
     - Produce bitmap:                         |
       [0, 1, 0, 1, 0, 0, 1]                   |
            |                                  |
            v                                  |
   +-----------------------+                   |
   |     Bitmap (in RAM)   |                   |
   |  [0, 1, 0, 1, 0, 0, 1] |                  |
   +-----------------------+                   |
            |                                  |
            v                                  |
  2. Bitmap Heap Scan                          |
     - Use bitmap to access                    |
       specific rows in the heap:              |
       Fetch TID (e.g., 2, 4, 7)               |
            |                                  |
            v                                  |
   +---------------------------+               |
   |         Heap Data          | <------------+
   |   Row 2: customer_id=100   |
   |   Row 4: customer_id=100   |
   |   Row 7: customer_id=100   |
   +---------------------------+

Combining Bitmaps: BitmapAnd and BitmapOr

PostgreSQL can combine multiple bitmaps for queries involving multiple conditions using BitmapAnd (intersection) or BitmapOr (union).

Here is another query example:

EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 100 AND order_date > '2023-01-01';

In this case, BitmapAnd is expected to combine two bitmap checklists that are created from each WHERE clauses.

Bitmap Heap Scan on orders
  Recheck Cond: (customer_id = 100 AND order_date > '2023-01-01')
  -> BitmapAnd
        -> Bitmap Index Scan on idx_customer_id
             Index Cond: (customer_id = 100)
        -> Bitmap Index Scan on idx_order_date
             Index Cond: (order_date > '2023-01-01')

Before picking actual data from the heap, two bitmaps are combined by AND operation to avoid unnecessary disk I/O.

Query: SELECT * FROM orders WHERE customer_id = 100 AND order_date > '2023-01-01';

  Bitmap Index Scan 1 (customer_id = 100): [0, 1, 0, 1, 0, 0, 1]
  Bitmap Index Scan 2 (order_date > '2023-01-01'): [1, 1, 0, 1, 1, 0, 0]
  -------------------------------------------------------------------
                   BitmapAnd Result: [0, 1, 0, 1, 0, 0, 0]

  Bitmap Heap Scan:
    - Fetch rows 2 and 4 based on the combined bitmap.

Why Are Bitmap Scans Efficient?

  1. Efficient Multi-Condition Queries:

    • Bitmap scans allow combining multiple indexes (e.g., via BitmapAnd/BitmapOr), optimizing queries with multiple conditions.
  2. Reduced Disk I/O:

    • By grouping row accesses by data pages, Bitmap Heap Scan minimizes random I/O operations.
  3. Memory Efficiency:

    • Bitmaps use an in-memory representation that is compact and can handle large datasets effectively.

Key Takeaways

Bitmap Index Scan identifies matching rows using indexes and produces a bitmap representation. Comparingly, Bitmap Heap Scan uses this bitmap to efficiently retrieve the actual rows from the table. Even in case of multi-condition queries, Bitmap operations are effective for large datasets and queries involving multiple conditions by use of BitmapAnd/BitmapOr.

2025-01-12