When working with PostgreSQL, understanding query execution plans is crucial for optimizing performance. One of the fascinating components of PostgreSQL's query execution is the use of Bitmap Index Scan and Bitmap Heap Scan, particularly for queries involving large datasets or multiple conditions. This blog explores their workings, explains their efficiency, and demonstrates them with diagrams and examples.
Bitmap Index Scan uses an index to identify matching rows for a query condition. Instead of directly fetching rows, it creates a bitmap in memory where each bit corresponds to a row in the table. For example, A set bit (1
) indicates that the corresponding row matches the condition.
Bitmap Heap Scan uses the bitmap produced by the Bitmap Index Scan to efficiently fetch the actual rows from the heap (table data files). It groups these accesses by data pages, reducing random disk I/O.
In PostgreSQL, Bitmap Index Scan and Bitmap Heap Scan are typically used together, as they are part of a two-step process to efficiently retrieve data.
Let's think this in analogy.
Imagine you're in a library and you're searching for books based on certain criteria, like the author's name and publication year. The library has a catalog system (index), say an old Windows computer sitting next to the entrance, that helps you locate books without physically browsing all the shelves (heap).
You first look at the library catalog to find books. Instead of directly collecting the books every time you hit the result, you instead write down the shelf numbers and positions on the shelves. This checklist is a "bitmap".
Apparently, instead of collecting all the books on every search and then pick only what you're looking for, creating a checklit (bitmap) and pick only what you want to read during your summer holiday is much cost-effective (in terms of your calories).
Let's say you want to execute the following query:
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 100;
Here is an example query plan output with Bitmap Index Scan and Bitmap Heap Scan.
Bitmap Heap Scan on orders
Recheck Cond: (customer_id = 100)
-> Bitmap Index Scan on idx_customer_id
Index Cond: (customer_id = 100)
This is how they are retrieved. First, Bitmap is created based on the search result from Index File(s). Then actudal data are retrieved from Heap by looking at the bitmap checklist.
Query: SELECT * FROM orders WHERE customer_id = 100;
+----------------------+ +---------------------------+
| Index File | | Heap (Table) |
| (customer_id index) | | Unordered Table Storage |
+----------------------+ +---------------------------+
| |
v |
1. Bitmap Index Scan |
- Scan index to find |
matching rows |
- Produce bitmap: |
[0, 1, 0, 1, 0, 0, 1] |
| |
v |
+-----------------------+ |
| Bitmap (in RAM) | |
| [0, 1, 0, 1, 0, 0, 1] | |
+-----------------------+ |
| |
v |
2. Bitmap Heap Scan |
- Use bitmap to access |
specific rows in the heap: |
Fetch TID (e.g., 2, 4, 7) |
| |
v |
+---------------------------+ |
| Heap Data | <------------+
| Row 2: customer_id=100 |
| Row 4: customer_id=100 |
| Row 7: customer_id=100 |
+---------------------------+
PostgreSQL can combine multiple bitmaps for queries involving multiple conditions using BitmapAnd (intersection) or BitmapOr (union).
Here is another query example:
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 100 AND order_date > '2023-01-01';
In this case, BitmapAnd is expected to combine two bitmap checklists that are created from each WHERE clauses.
Bitmap Heap Scan on orders
Recheck Cond: (customer_id = 100 AND order_date > '2023-01-01')
-> BitmapAnd
-> Bitmap Index Scan on idx_customer_id
Index Cond: (customer_id = 100)
-> Bitmap Index Scan on idx_order_date
Index Cond: (order_date > '2023-01-01')
Before picking actual data from the heap, two bitmaps are combined by AND operation to avoid unnecessary disk I/O.
Query: SELECT * FROM orders WHERE customer_id = 100 AND order_date > '2023-01-01';
Bitmap Index Scan 1 (customer_id = 100): [0, 1, 0, 1, 0, 0, 1]
Bitmap Index Scan 2 (order_date > '2023-01-01'): [1, 1, 0, 1, 1, 0, 0]
-------------------------------------------------------------------
BitmapAnd Result: [0, 1, 0, 1, 0, 0, 0]
Bitmap Heap Scan:
- Fetch rows 2 and 4 based on the combined bitmap.
Efficient Multi-Condition Queries:
Reduced Disk I/O:
Memory Efficiency:
Bitmap Index Scan identifies matching rows using indexes and produces a bitmap representation. Comparingly, Bitmap Heap Scan uses this bitmap to efficiently retrieve the actual rows from the table. Even in case of multi-condition queries, Bitmap operations are effective for large datasets and queries involving multiple conditions by use of BitmapAnd/BitmapOr.