Why Column Orders in Multi-Column Indexes in PostgreSQL Matter

As a Site Reliability Engineer diving deep into the internals of PostgreSQL, I've come to appreciate the nuances of index design. Among these, the topic of multi-column indexes and their column order has stood out as a fascinating and often misunderstood aspect. This blog aims to shed light on why column order in multi-column indexes matters and how it can significantly impact query performance.

What Are Multi-Column Indexes?

A multi-column index (also known as a composite index) is an index that covers two or more columns of a table. These indexes allow PostgreSQL to efficiently search and sort based on combinations of columns, which can significantly speed up queries that filter or sort on multiple columns simultaneously.

Here’s how you might create a multi-column index:

CREATE INDEX idx_example ON my_table (column1, column2);

This index enables efficient lookups for queries that involve column1 and column2 in specific ways, which we’ll explore in more detail below.

Commands on this blog were tested on Supabase with this version: PostgreSQL 15.8 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit

Why Column Order Matters

PostgreSQL implements multi-column indexes with a specific order in mind. The order of columns in the CREATE INDEX clause determines how the index is structured and what types of queries it can optimize effectively. Multi-column indexes in PostgreSQL support lexicographic ordering, which means that the first column is the most significant, followed by the second, and so on.

This design has important implications for query performance. PostgreSQL can use a multi-column index if the query filters on a prefix of the indexed columns. For example, an index on (column1, column2) can optimize queries that filter on column1 alone or on both column1 and column2. However, it will not help for queries that filter only on column2.

Therefore, when designing a multi-column index, choosing the correct column order can make the difference between lightning-fast queries and sluggish performance. Let’s explore with examples.

Examples

Consider the following table and multi-column index:

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    region TEXT,
    product TEXT,
    amount NUMERIC
);

CREATE INDEX idx_sales_region_product ON sales (region, product);

Now, insert some test data:

INSERT INTO sales (region, product, amount)
SELECT
    CASE
        WHEN i % 3 = 0 THEN 'North'
        WHEN i % 3 = 1 THEN 'South'
        ELSE 'East'
    END,
    CASE
        WHEN i % 2 = 0 THEN 'Gadget'
        ELSE 'Widget'
    END,
    (i % 100) * 10
FROM generate_series(1, 10000) s(i);

Next, let’s compare the query plans for different queries using EXPLAIN to observe how the column order in the index affects query performance.

Query 1: Filtering by `region`

EXPLAIN SELECT * FROM sales WHERE region = 'North';

Output:

| QUERY PLAN                                                                             |
| -------------------------------------------------------------------------------------- |
| Bitmap Heap Scan on sales  (cost=1.53..20.63 rows=20 width=100)                        |
|   Recheck Cond: (region = 'North'::text)                                               |
|   ->  Bitmap Index Scan on idx_sales_region_product  (cost=0.00..1.53 rows=20 width=0) |
|         Index Cond: (region = 'North'::text)                                           |

Here, the index is fully utilized since region is the first column in the index.

Query 2: Filtering by `product`

EXPLAIN SELECT * FROM sales WHERE product = 'Gadget';

Output:

| QUERY PLAN                                                |
| --------------------------------------------------------- |
| Seq Scan on sales  (cost=0.00..189.00 rows=5000 width=20) |
|   Filter: (product = 'Gadget'::text)                      |

The index is not used because product is not the first column in the index.

Query 3: Filtering by `region` and `product`

EXPLAIN SELECT * FROM sales WHERE region = 'North' AND product = 'Gadget';

Output:

| QUERY PLAN                                                                                 |
| ------------------------------------------------------------------------------------------ |
| Index Scan using idx_sales_region_product on sales  (cost=0.29..103.60 rows=1666 width=20) |
|   Index Cond: ((region = 'North'::text) AND (product = 'Gadget'::text))                    |

Both columns are utilized efficiently, as the query matches the column order in the index.

This comparison highlights the importance of column order in multi-column indexes. Designing indexes that align with your query patterns ensures optimal performance and minimizes unnecessary scans.

Practical Guidelines for Column Order

To determine the optimal column order for a multi-column index, consider the following:

Query Patterns: Analyze your workload and identify the most common query patterns. Prioritize the column used most frequently in filters or sorts as the first column.
Selectivity: Place the most selective column (i.e., the column that filters out the most rows) first. This reduces the number of rows PostgreSQL needs to scan.
Test with EXPLAIN: Use the EXPLAIN command to analyze query plans and ensure that your indexes are being used effectively.

Conclusion

Multi-column indexes are a powerful tool in PostgreSQL, but their effectiveness depends heavily on the order of columns in the index. By understanding how PostgreSQL uses multi-column indexes and carefully designing indexes to match your query patterns, you can significantly improve query performance and reduce resource consumption.

I hope this blog helps you better understand the importance of column order in multi-column indexes and inspires you to dive deeper into PostgreSQL performance optimization.

2025-01-16

Ken W.

Programmer. Generalist.

Why Column Orders in Multi-Column Indexes in PostgreSQL Matter

What Are Multi-Column Indexes?

Why Column Order Matters

Examples

Query 1: Filtering by region

Query 2: Filtering by product

Query 3: Filtering by region and product

Practical Guidelines for Column Order

Conclusion

Query 1: Filtering by `region`

Query 2: Filtering by `product`

Query 3: Filtering by `region` and `product`