PostgreSQL is known for its robust indexing options, including the Bloom filter index (bloom
), which can improve query performance for specific workloads. This blog will discuss how to read the EXPLAIN
output, why rechecking is needed, and when to choose a Bloom filter index over a B-tree index.
A Bloom filter is a space-efficient, probabilistic data structure designed to test whether an element is part of a set. It’s highly efficient in terms of memory but allows for a small probability of false positives (i.e., reporting that an element exists when it doesn’t).
I'm going to skip explaining Bloom filter in details, as there are many other good online resources explaining how Bloom filter works with nice diagrams or clean reference implementations.
TLDR, if you can accept lossy predictions by having smaller index sizes, you might want to consider Bloom filter index over creating B-tree indexes for every columns in conditions.
Let’s analyze a sample EXPLAIN
output:
Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Index Recheck: 29
Heap Blocks: exact=28
-> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Read my another blog if you want to know more about Bitmap Index Scan and Bitmap Heap Scan.
Bitmap Heap Scan:
actual time
shows the time taken for this step, with identical start and end times (0.388..0.388
), meaning the operation completed in a single step.Recheck Cond:
Recheck Cond
specifies conditions applied to verify the accuracy of rows fetched by the index.Rows Removed by Index Recheck:
Heap Blocks:
exact=28
means 28 blocks of the table were accessed to fetch the required rows.Bitmap Index Scan:
Please note that Bloom filters are probabilistic and may return false positives. Rechecking ensures accuracy. The index returns a superset of rows that might match the query, so PostgreSQL must verify each row to confirm whether it meets the conditions. For example, in the output above, 29 rows were identified as candidates but were later discarded. In other words, the Bloom filter reduces the I/O cost of scanning large datasets but introduces a small CPU cost for rechecking.
In practical world, there are some cases you might want to choose
Feature | B-tree Index | Bloom Filter Index |
---|---|---|
Precision | Exact | Approximate (allows false positives) |
Space Usage | Larger | Compact |
Updates | Efficient | Inefficient (static) |
Query Types | Range and equality conditions | Equality conditions only |
Bloom filter indexes in PostgreSQL provide a space-efficient solution for read-heavy workloads involving multi-column equality queries. Understanding their behavior, as seen in EXPLAIN
output, is essential to leverage their strengths and mitigate their limitations. The trade-offs between precision and efficiency make them ideal for certain scenarios but unsuitable for others.
When designing indexes, always consider the query patterns, update frequency, and data characteristics to choose the best indexing strategy. By doing so, you can ensure optimal database performance and resource utilization.